soft and program: Digest for comp.lang.c++@googlegroups.com

Strip Whitespace - 6 Updates
"C++ Creator Bjarne Stroustrup Weighs in on Distributed Systems, Type Safety and Rust" - 8 Updates
Strip Whitespace - 1 Update
Does std::regex need to be so large? - 8 Updates
Can similar programs be legel, warning or error? - 2 Updates

Mike Copeland <mrc2323@cox.net>: Aug 24 05:14PM -0700

> If all you want to do is skip the leading whitespace it would probably
> be more efficient to scan for the first non-space character in your read
> line, then output the rest of the line. Less in-memory copying.

That's my intent...but how do I do this? TIA


Siri Cruise <chine.bleu@yahoo.com>: Aug 24 05:43PM -0700

In article
<MPG.39ae05b3efcc46529896ee@news.eternal-september.org>,
> > be more efficient to scan for the first non-space character in your read
> > line, then output the rest of the line. Less in-memory copying.

> That's my intent...but how do I do this? TIA

C++ is so much better than C.

resetbuffer();
int c = 0; while (c=fgetc(fn), c!=EOF && isspace(c)) ;
while (c!='\n' && c!=EOF) {addbuffer(c); c = fgetc(fn);}
if (emptybuffer() && c==EOF) eofbuffer();

How crude and inefficient.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
The first law of discordiamism: The more energy This post / \
to make order is nore energy made into entropy. insults Islam. Mohammed

Sam <sam@email-scan.com>: Aug 25 07:07AM -0400

Mike Copeland writes:

> > be more efficient to scan for the first non-space character in your read
> > line, then output the rest of the line. Less in-memory copying.

> That's my intent...but how do I do this? TIA

line.erase(line.begin(),
           std::find_if(line.begin(), line.end(),
                        [](auto c) { return c != ' '; }));

A complete description and explanation of the above C++ library algorithms
and container methods can be found in pretty much every C++ textbook.

Paavo Helde <eesnimi@osa.pri.ee>: Aug 25 02:19PM +0300

25.08.2020 14:07 Sam kirjutas:

> A complete description and explanation of the above C++ library
> algorithms and container methods can be found in pretty much every C++
> textbook.

This is much more concise and actually strips all leading whitespace,
not just spaces:

line.erase(0, line.find_first_not_of(" \t\r\n"));
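One detail of this one-liner that is easy to miss: on a line consisting only of whitespace, find_first_not_of returns std::string::npos, and erase(0, npos) then clears the whole string, which is presumably the behaviour wanted here. A minimal sketch of that, where the wrapper name strip_leading and the sample strings are illustrative assumptions and not from the thread:

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper around the one-liner above.
// Removes leading spaces, tabs, carriage returns and newlines in place.
void strip_leading(std::string& line)
{
    // find_first_not_of returns npos when every character is in the set;
    // erase(0, npos) then removes everything, leaving an empty string.
    line.erase(0, line.find_first_not_of(" \t\r\n"));
}

// Small self-check showing the intended behaviour.
void strip_leading_examples()
{
    std::string a = " \t hello there";
    strip_leading(a);
    assert(a == "hello there");

    std::string b = " \t\r\n";          // whitespace only
    strip_leading(b);
    assert(b.empty());

    std::string c = "no leading space";
    strip_leading(c);
    assert(c == "no leading space");    // unchanged
}
```

Trailing whitespace is left alone; only the prefix is erased.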

Jorgen Grahn <grahn+nntp@snipabacken.se>: Aug 25 01:45PM

On Tue, 2020-08-25, Siri Cruise wrote:
> while (c!='\n' && c!=EOF) {addbuffer(c); c = fgetc(fn);}
> if (emptybuffer() && c==EOF) eofbuffer();

> How crude and inefficient.

I'd do it like this:

std::string s;
while(std::getline(is, s)) {
    lstrip(s);
    do_stuff_with(s);
}

lstrip(std::string&) is trivial to write.
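For readers who want to see it spelled out, one possible lstrip is sketched here. The body is an assumption about what such a helper might look like, not the poster's code. It uses std::isspace, so vertical tab and form feed also count as whitespace, and the lambda takes unsigned char because passing a negative char value to std::isspace is undefined.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// One possible lstrip: erase everything before the first character
// that std::isspace does not classify as whitespace.
void lstrip(std::string& s)
{
    auto first_non_space = std::find_if(s.begin(), s.end(),
        [](unsigned char c) { return !std::isspace(c); });
    s.erase(s.begin(), first_non_space);
}

// Small self-check showing the intended behaviour.
void lstrip_examples()
{
    std::string a = "  \t\v indented";
    lstrip(a);
    assert(a == "indented");

    std::string b = "   ";              // nothing but whitespace
    lstrip(b);
    assert(b.empty());

    std::string c = "flush left  ";
    lstrip(c);
    assert(c == "flush left  ");        // trailing spaces are kept
}
```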

Or if I really didn't want the copying, I'd use std::string_view, or
iterators and implement:

It lstrip(It a, It b);
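A rough sketch of both non-copying variants, assuming C++17 for std::string_view. The name lstrip_view and the function bodies are illustrative assumptions. Neither function modifies or copies the characters; each only reports where the non-whitespace part starts.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Iterator form: returns the position of the first non-whitespace
// character in [a, b), or b if there is none.
template <typename It>
It lstrip(It a, It b)
{
    return std::find_if(a, b,
        [](unsigned char c) { return !std::isspace(c); });
}

// string_view form: the result refers to the same buffer as the argument,
// so the underlying string must outlive the returned view.
std::string_view lstrip_view(std::string_view s)
{
    auto first = lstrip(s.begin(), s.end());
    s.remove_prefix(static_cast<std::size_t>(first - s.begin()));
    return s;
}

// Small self-check showing the intended behaviour.
void lstrip_view_examples()
{
    const std::string line = "  \t key = value";   // four leading whitespace chars
    std::string_view rest = lstrip_view(line);
    assert(rest == "key = value");
    assert(rest.data() == line.data() + 4);        // same buffer, no copy

    assert(lstrip(line.begin(), line.end()) == line.begin() + 4);
    assert(lstrip_view("   ").empty());            // whitespace only
}
```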

I don't think I've ever done this though. Normally, once you've
stripped leading whitespace you still have plenty of parsing to do,
and you haven't won a lot by doing one tiny part of it.

It's e.g. more commonly useful to have a split() function which splits
a string on whitespace, while removing said whitespace.
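Such a split() can be sketched with std::istringstream, whose operator>> already skips whitespace between words. This is only one way to write it, and the function body is an assumption, not something posted in the thread:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits s on runs of whitespace; the whitespace itself is discarded.
// Leading and trailing whitespace produce no empty entries.
std::vector<std::string> split(const std::string& s)
{
    std::istringstream in(s);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Small self-check showing the intended behaviour.
void split_examples()
{
    const std::vector<std::string> words = split("  alpha\tbeta  gamma \n");
    assert(words.size() == 3);
    assert(words[0] == "alpha");
    assert(words[1] == "beta");
    assert(words[2] == "gamma");

    assert(split("   ").empty());   // whitespace only: no words
}
```

This version copies each word into its own std::string, which is convenient but not the cheapest option.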

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Siri Cruise <chine.bleu@yahoo.com>: Aug 25 08:27AM -0700

In article <slrnrka5f5.1pog.grahn+nntp@frailea.sa.invalid>,

> Or if I really didn't want the copying, I'd use std::string_view, or
> iterators and implement:

> It lstrip(It a, It b);

It's simple deeds should be done cheap.


"C++ Creator Bjarne Stroustrup Weighs in on Distributed Systems, Type Safety and Rust"

Ike Naar <ike@rie.sdf.org>: Aug 25 05:21AM

> [...] If DBL_EPSILON*INT_MAX < 0, it's guaranteed to do so.

If both DBL_EPSILON and INT_MAX are positive, can their product ever be < 0 ?

"Öö Tiib" <ootiib@hot.ee>: Aug 24 11:19PM -0700

On Tuesday, 25 August 2020 08:21:08 UTC+3, Ike Naar wrote:
> On 2020-08-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> > [...] If DBL_EPSILON*INT_MAX < 0, it's guaranteed to do so.

> If both DBL_EPSILON and INT_MAX are positive, can their product ever be < 0 ?

He apparently meant < 1 not < 0.

David Brown <david.brown@hesbynett.no>: Aug 25 10:42AM +0200

On 24/08/2020 18:10, Ben Bacarisse wrote:

> I would be prepared to have a go! Could you argue that this /is/ a
> type error? Conversions between arithmetic types are very common. Are
> they all detrimental to type safety?

"Type safety", I would say, really means that the only conversions done
are those that you know have a well defined meaning. It is about
objects having clearly defined types that can be used in certain ways,
and the language making it difficult to use them in other ways. In the
BCPL example, the language is not type safe. I don't know BCPL, so I
don't know if it is because there is no distinction between types
"integer" and "array of integer", or because there is nothing stopping
you using an integer as though it were an array.

In this sense, conversions between bool and other integer types are
type-safe - the behaviour is clearly specified and it is an allowed
operation on the types.

But you can certainly argue that it is not a good operation to allow,
and that C++ would be a "safer" language (not "type safer") if these
operations were not allowed. There are good arguments for saying that
"bool" should not have been considered an arithmetic type, especially as
conversions from other types to bool do not work in the same way as
conversions to other integer types. (Of course it is too late to change
things now.)
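To make the difference concrete: converting to bool asks whether the value is non-zero, while converting to another integer type converts the value itself (truncating or wrapping). A small sketch, with variable names chosen only for illustration:

```cpp
#include <cassert>
#include <climits>

// Contrasts conversion to bool with conversion to other integer types.
void bool_conversion_examples()
{
    const int big = UCHAR_MAX + 1;      // one more than unsigned char can hold
    const double half = 0.5;

    bool b1 = big;                                       // non-zero -> true
    unsigned char u1 = static_cast<unsigned char>(big);  // wraps around -> 0
    assert(b1 == true);
    assert(u1 == 0);

    bool b2 = half;                     // non-zero -> true
    int i2 = static_cast<int>(half);    // fractional part discarded -> 0
    assert(b2 == true);
    assert(i2 == 0);

    // bool also takes part in arithmetic: both operands are promoted to int.
    assert(true + true == 2);
}
```

All of these conversions have specified results, which is the sense in which they are well defined, even though none of them is flagged by the type system.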

People also use the term "type safety" to mean "using types to make code
safer" - with the aim that more kinds of error in code can be caught as
compile-time errors rather than found at run-time or in testing. That
is, of course, a laudable aim - and automatic conversions between bool
and other arithmetic types go against that aim.

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Aug 25 12:54PM +0100

On Tue, 25 Aug 2020 10:42:19 +0200
> "Type safety", I would say, really means that the only conversions done
> are those that you know have a well defined meaning.

"Type safety" describes a quality rather than anything particularly
quantifiable and so is subject to opinion. Even so I wouldn't agree
with your suggested definition. Given the address of an object of type
T held by a pointer to T, conversion of that pointer to void*, so
suppressing all static type information in the compiler, has a well
defined meaning in C++ but I don't think anyone would say it was type
safe.

Expressions like "strictly typed" or "soundly typed" are somewhat
clearer in the computer science world. In a strictly typed/soundly
typed language, which C++ isn't, every object representation must
actually be of the type explicitly indicated by the programmer or
unambiguously inferred by the compiler on type unification, or the
program will not compile. "Strongly typed" is somewhat vaguer.
Implicit conversion from integer to bool and vice versa is ill-advised,
as also is implicit conversion from wider integers to narrower integers
more generally, and would offend many people's views of strong typing
(including it appears Richard's).

Fully type-checked dynamically typed languages such as python or lisp
can also be said to be strongly typed.

"daniel...@gmail.com" <danielaparker@gmail.com>: Aug 25 05:19AM -0700

On Tuesday, August 25, 2020 at 4:42:28 AM UTC-4, David Brown wrote:
> In this sense, conversions between bool and other integer types are
> type-safe - the behaviour is clearly specified and it is an allowed
> operation on the types.

I think that's too narrow considering the common use of the term.
For example, a Microsoft document has "type safety ... means that
every variable, function argument, and function return value is storing
an acceptable kind of data, and that operations that involve values
of different types "make sense" and don't cause data loss, incorrect
interpretation of bit patterns, or memory corruption."

> conversions from other types to bool do not work in the same way as
> conversions to other integer types. (Of course it is too late to change
> things now.)

At the time bool entered the language, the rationale was compatibility
with and easy conversion of C and C++ usage patterns like

#define bool int
#define true 1
#define false 0

bool once = 0;
if (!once++) {}

"type safety" was a far lesser consideration.

Daniel
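As a side note on the once++ idiom: incrementing a bool was deprecated from the first C++ standard and removed in C++17, so the pattern only keeps working where bool is a macro for int. A sketch of the same "run only the first time" test with a real bool, using std::exchange from C++14 (the helper name first_time is hypothetical):

```cpp
#include <cassert>
#include <utility>

// Returns true the first time it is called for a given flag and false
// afterwards; the flag is set as a side effect, like !once++ above.
bool first_time(bool& once)
{
    return !std::exchange(once, true);
}

// Small self-check showing the intended behaviour.
void first_time_examples()
{
    bool once = false;
    int runs = 0;
    for (int i = 0; i < 3; ++i) {
        if (first_time(once)) {
            ++runs;                 // executed on the first iteration only
        }
    }
    assert(runs == 1);
    assert(once == true);
}
```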

James Kuyper <jameskuyper@alumni.caltech.edu>: Aug 25 08:35AM -0400

On 8/25/20 1:21 AM, Ike Naar wrote:
> On 2020-08-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
>> [...] If DBL_EPSILON*INT_MAX < 0, it's guaranteed to do so.

> If both DBL_EPSILON and INT_MAX are positive, can their product ever be < 0 ?

Sorry - that was supposed to be 1, not 0.
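For a sense of scale, assuming an IEEE 754 double (DBL_EPSILON is 2^-52) and a 32-bit int (INT_MAX is 2^31 - 1), the product is a little under 2^-21, roughly 4.77e-7, so the corrected condition holds comfortably on such platforms. A sketch that checks it at compile time; other platforms may have different values:

```cpp
#include <cassert>
#include <cfloat>
#include <climits>

// Both macros are constant expressions, so the product can be checked
// by the compiler. The variable name is illustrative only.
constexpr double epsilon_times_int_max = DBL_EPSILON * INT_MAX;

static_assert(epsilon_times_int_max > 0.0,
              "product of two positive values is positive");
static_assert(epsilon_times_int_max < 1.0,
              "holds for IEEE 754 double and 32-bit int");
```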

James Kuyper <jameskuyper@alumni.caltech.edu>: Aug 25 09:41AM -0400

On 8/25/20 4:42 AM, David Brown wrote:
...
> "Type safety", I would say, really means that the only conversions done
> are those that you know have a well defined meaning.

I agree with Wikipedia's opening description of type safety "In computer
science, type safety is the extent to which a programming language
discourages or prevents type errors." An implicit conversion that allows
you to use the wrong type is NOT an example of a feature that makes a
language type safe, no matter how well-defined the conversion is.

David Brown <david.brown@hesbynett.no>: Aug 25 04:51PM +0200

On 25/08/2020 13:54, Chris Vine wrote:
>> are those that you know have a well defined meaning.

> "Type safety" describes a quality rather than anything particularly
> quantifiable and so is subject to opinion.

Agreed.

> suppressing all static type information in the compiler, has a well
> defined meaning in C++ but I don't think anyone would say it was type
> safe.

C++ is not an entirely type-safe language. Conversions of pointers like
this are a way to get around the type-safety features the language
provides. But generally these things are undefined behaviour if used to
break type safety. Converting a pointer to a different pointer type
(directly, via void*, via uintptr_t, etc.) can have a well-defined
meaning in itself. But use of the converted pointer is highly
restricted - there is often little you can do (in the sense of having
well-defined behaviour) other than convert it back again.

However, I was inaccurate in the way I described it. Often the
conversions between incompatible pointer types can have well-defined
meanings - it is the /use/ of those converted pointers that is not well
defined.
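The distinction can be sketched as follows. The round trip through void* is fine; the commented-out lines show a conversion the compiler accepts just as readily, where it is the later use of the pointer that is undefined. Function names are illustrative assumptions.

```cpp
#include <cassert>

// Well-defined: T* -> void* -> T* gives back a pointer to the original object.
int* round_trip(int* p)
{
    void* erased = p;                   // implicit; the static type is dropped
    return static_cast<int*>(erased);   // explicit conversion back to the real type
}

// Small self-check showing the intended behaviour.
void round_trip_examples()
{
    int value = 42;
    int* back = round_trip(&value);
    assert(back == &value);
    assert(*back == 42);

    // This would compile too, but reading through the result is undefined
    // behaviour because no float object lives at that address:
    //   float* wrong = static_cast<float*>(static_cast<void*>(&value));
    //   float f = *wrong;
}
```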

> (including it appears Richard's).

> Fully type-checked dynamically typed languages such as python or lisp
> can also be said to be strongly typed.

Yes.

(It often surprises people to hear Python described as strongly typed -
it's easy to make the mistake of thinking strongly typed implies
statically typed.)

Strip Whitespace

ram@zedat.fu-berlin.de (Stefan Ram): Aug 25 02:35PM

>void skip_whitespace( ::std::istringstream & s )

One could also use a "drop_while_view" ([range.drop.while]):

constexpr auto source = " \t \t \t hello there";
auto is_invisible = [](const auto x) { return x == ' ' || x == '\t'; };
auto skip_ws = drop_while_view{source, is_invisible};
for (auto c : skip_ws) {
    cout << c; // prints hello there with no leading space
}

(source code quoted from n4849).

Does std::regex need to be so large?

legalize+jeeves@mail.xmission.com (Richard): Aug 25 03:46AM

[Please do not mail me a copy of your followup]

Cholo Lennon <chololennon@hotmail.com> spake the secret code
>constrained RAM environment like a blockchain, that was a no no. I ended
>up using methods from std::string to parse the data. The resulting code
>was awful, but the size stayed around 40 Kb.

You may want to look at the compile-time regex library:
<https://github.com/hanickadot/compile-time-regular-expressions>
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Bonita Montero <Bonita.Montero@gmail.com>: Aug 25 07:21AM +0200

> Well the guys at Microsoft told me that they compile with optimisation
> for size. Because "A page fault can ruin your day".

We're talking about I-cache hit-rates and not about page-faults.

> I've got 32Gb on my dev machine, and 16 on this thing. If the code is
> bigger I'll have less free to use as disc cache. Given that our compiled
> code base is 144G I'm not going to fit it all into RAM any time soon.

144G code in memory - LOL. Even oracle isn't that large.

Christian Gollwitzer <auriocus@gmx.de>: Aug 25 08:23AM +0200

Am 25.08.20 um 05:46 schrieb Richard:

> You may want to look at the compile-time regex library:
> <https://github.com/hanickadot/compile-time-regular-expressions>

Wow, this looks really good. Compiling is slow, but the generated code
is very compact (the links to godbolt result in something like 100
instructions for one of the demo functions).

Christian

legalize+jeeves@mail.xmission.com (Richard): Aug 25 06:29AM

[Please do not mail me a copy of your followup]

Christian Gollwitzer <auriocus@gmx.de> spake the secret code

>Wow, this looks really good. Compiling is slow, but the generated code
>is very compact (the links to godbolt result in something like 100
>instructions for one of the demo functions).

Yes, it's quite neat. Compile time regex requires a modern compiler as
it has a dependency on some compiler extensions that have been proposed for
standardization but not yet adopted. If that makes it infeasible to
use (e.g. in the embedded scenario where you may be reliant on an
externally supplied toolchain), then you might want to look at the
boost.spirit parsing library. Matching a regex is essentially a
parsing problem. With boost.spirit, you can create parsers that are
very efficient in terms of code space and runtime. Boost.spirit v2
has been around a long time and supports older compilers. V3 assumes
C++11 IIRC and requires more modern compilers for some features.

Juha Nieminen <nospam@thanks.invalid>: Aug 25 06:56AM

>>itself, but it's not that far off either.)

> Next you'll be telling me node.js is a superb server side system because of
> its speed and efficiency. Note I have worked somewhere that used it - it isn't.

I notice that you didn't actually refute what I said.

>>even having to rely on any sort of browser bug or security hole.)

> The fact that the javascript API allows this unnecessary low
> level activity shows the massive feature creep browsers have incurred.

And you are still not refuting what I said.

So, is a program running in a browser "1/100 the speed of a native program"
or not? Or are you just going to dodge some more?

boltar@nuttyella.co.uk: Aug 25 08:39AM

On Mon, 24 Aug 2020 13:20:05 -0300

>> Not that I care about bitcoin in the slightest, it's just another form of
>> financial speculation for the gullible and a currency for criminals.

>Troll detected

How tediously predictable and pathetic. Anyone who has an opinion that differs
from the hipster groupthink is a troll. Whatever.

boltar@nuttyella.co.uk: Aug 25 08:42AM

On Mon, 24 Aug 2020 12:09:06 -0700
>Love WebGL. Here is some of my work:

>http://funwithfractals.atspace.cc/ct_gfield_test/3d_user/ct_wormhole_exp.html

Very pretty. Here's an arcade game that did something similar in 1983 using
a 1.5 MHz 6809 CPU:

https://en.wikipedia.org/wiki/Star_Wars_(1983_video_game)

boltar@nuttyella.co.uk: Aug 25 08:44AM

On Tue, 25 Aug 2020 06:56:41 +0000 (UTC)

>And you are still not refuting what I said.

>So, is a program running in a browser "1/100 the speed of a native program"
>or not? Or are you just going to dodge some more?

It was exaggeration to make a point as you well know. But my experience of
node.js in a production environment compared to a C/C++ equivalent is that
the latter runs approx 4-5x the speed. I don't see why a browser would be
any different.

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Aug 25 12:34AM +0100

On Mon, 24 Aug 2020 17:35:18 -0400
> > undefined behaviour in C++, although it is supported by gcc and clang.
> > You can only legally access the currently active member of the union in
> > C++ (it is different in C).

[provisions of C standard concerning unions snipped]
> Given that history, I'm comfortable about making the same assumption in
> C++ code, despite the fact that the wording of the C++ standard is still
> closer to what C90 said than to what C2011 says.

The reputed effect of [class.union]/2 of C++ is that you can only read
the active member of a union, which is the member last written to, so
type punning through them is illegal. At least that is what Stroustrup
claims in TC++PL, as also does CppCoreGuidelines, and I have seen others
claim the same. The gcc developers treat union type punning as an
extension, not as a requirement arising from the standard. I do not
know whether or not Visual Studio allows type punning in C++ through
unions.

This has been gone into on this newsgroup before and I do not
particularly want to do so again. Are there contrary arguments?
Probably.

> std::memcpy(&i, &l, sizeof i);

> does the C++ standard guarantee anything more about the value stored in
> 'i' than it does about the corresponding union code? If so, where?

The C++ standard incorporates relevant parts of the C standard library
into C++ in [library.c]/1 and [cstring.syn], as modified by
[diff.library]. That includes memcpy(), which is required by C to copy
the object representation of the value copied into the destination,
which is also more or less what in C++ [basic.types]/1 to /4 say.
Padding isn't an issue in this case, but for it to work, the object
representations of uint64_t and a 2-array of uint32_t must also in any
other respects correspond to each other.

In C, the same is also true of the use of unions for type punning. So
in C, the answer to your question is no. The issue in C++ is about the
supposed requirement relating to there being only one active type.

On the issue of memcpy() versus a cast, the point is that with a cast
the effective type (C) and dynamic type (C++) of the result of the cast
remains that of the source. With memcpy() it is that of the
destination. So memcpy() does not give rise to strict aliasing issues,
whereas a cast does. Using reinterpret_cast in this case clearly
generates undefined behaviour.
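A sketch of the memcpy() approach for the uint64_t / two uint32_t case discussed here. The struct and function names are assumptions, since the original code was snipped from the quote. Which half lands in which array element depends on the platform's byte order, so the self-check accepts either.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical holder for the two 32-bit halves.
struct Halves {
    std::uint32_t word[2];
};

static_assert(sizeof(Halves) == sizeof(std::uint64_t),
              "both representations must occupy the same number of bytes");

// Copies the object representation of l into two 32-bit words.
// No pointer of the wrong type is ever dereferenced.
Halves split_u64(std::uint64_t l)
{
    Halves h;
    std::memcpy(h.word, &l, sizeof l);
    return h;
}

// Inverse operation: reassembles the 64-bit value from the two words.
std::uint64_t join_u64(const Halves& h)
{
    std::uint64_t l;
    std::memcpy(&l, h.word, sizeof l);
    return l;
}

// Small self-check showing the intended behaviour.
void split_u64_examples()
{
    const std::uint64_t value = 0x0123456789abcdefULL;
    const Halves h = split_u64(value);

    // Little-endian puts the low half first, big-endian the high half.
    const bool low_first  = h.word[0] == 0x89abcdefu && h.word[1] == 0x01234567u;
    const bool high_first = h.word[0] == 0x01234567u && h.word[1] == 0x89abcdefu;
    assert(low_first || high_first);

    assert(join_u64(h) == value);       // the round trip is exact
}
```

By contrast, something like *reinterpret_cast<std::uint32_t*>(&value) reads a uint64_t object through an lvalue of an unrelated type, which is the strict aliasing problem described above.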

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Aug 25 12:41AM +0100

On Tue, 25 Aug 2020 00:34:36 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> [diff.library]. That includes memcpy(), which is required by C to copy
> the object representation of the value copied into the destination,
> which is also more or less what in C++ [basic.types]/1 to /4 say.

By the way, one interesting factlet is that you cannot implement
std::memcpy() using standard C++ conforming to C++17/20, because of
restrictions on pointer arithmetic in those standards. Its
implementation is therefore compiler-primitive specific, mandated by
the provisions I have referred to.


Tuesday, August 25, 2020

Digest for comp.lang.c++@googlegroups.com - 25 updates in 5 topics
