Monday, January 23, 2023

Digest for comp.lang.c++@googlegroups.com - 25 updates in 1 topic

Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 23 06:15AM -0800

On Monday, 23 January 2023 at 13:42:53 UTC, David Brown wrote:
> primitive than) that found in many other programming languages. But
> that does not mean C does not have strings, defined in the standards and
> as part of the language and standard library.
 
The language states that a text literal in double quotes produces a nul-terminated
string. I think that's the only place the C language itself defines a string. Otherwise
it is purely a standard library concept.
scott@slp53.sl.home (Scott Lurndal): Jan 23 02:44PM

>it is not a "snag" or a "problem". It is an unavoidable artefact of the
>simple way C strings are implemented. And it is easily solved by
>calling "strlen".
 
Easily solved, but performance for large strings is poor.
scott@slp53.sl.home (Scott Lurndal): Jan 23 02:46PM

>> return hw + 5;
 
>You're returning a newly created string object, thereby inducing
>the issue a second allocation
 
No, he's simply returning the final characters of the C-style
sequence of 'char' entities. About as efficient as possible for
the stated example.
Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 23 07:34AM -0800

On Monday, 23 January 2023 at 14:45:05 UTC, Scott Lurndal wrote:
> >simple way C strings are implemented. And it is easily solved by
> >calling "strlen".
> Easily solved, but performance for large strings is poor.
 
Yes. Normally when you assign a string, you don't need to keep the old
copy hanging about. So using std::strings and move assignment will
allow the assignment to be implemented in a few machine instructions.
(You can probably do this in C by reusing the buffer, but the code has to
be written very carefully to ensure that the pointers are pointing to the right
type of memory).
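
A minimal sketch of the move assignment described above. The helper name assign_by_move is hypothetical and exists only for illustration. On the mainstream standard library implementations, a heap-allocated source buffer is handed over to the destination rather than copied; the standard does not promise the exact mechanism, so treat the returned pointer as a way to observe typical behaviour rather than as a guarantee.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: replaces `dst` with the contents of `src` using move
// assignment and returns the address of the buffer `dst` holds afterwards,
// so a caller can check whether the buffer was taken over or copied.
const char* assign_by_move(std::string& dst, std::string&& src)
{
    assert(&dst != &src);   // self-move is not the case being illustrated
    dst = std::move(src);   // no character-by-character copy for long strings
    return dst.data();      // src is left valid but unspecified
}
```

Called as assign_by_move(dst, std::move(big)) with a long `big`, the old contents of `dst` are discarded, which matches the "you don't need to keep the old copy" case.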
Bonita Montero <Bonita.Montero@gmail.com>: Jan 23 04:48PM +0100

Am 23.01.2023 um 15:46 schrieb Scott Lurndal:
 
> No, he's simply returning the final characters of the C-style
> sequence of 'char' entities. About as efficient as possible for
> the stated example.
 
Maybe, but maybe he suggested that
as the body of my function-definition.
Paavo Helde <eesnimi@osa.pri.ee>: Jan 23 05:56PM +0200


> Another reason to use char* is that a lot of parsers will memory map a file
> R/W and MAP_PRIVATE which gives you a char* pointing to the beginning of the
> file and which you can manipulate as you see fit.
 
Seems like a non-portable hack.
Muttley@dastardlyhq.com: Jan 23 04:09PM

On Mon, 23 Jan 2023 16:48:13 +0100
>> the stated example.
 
>Maybe, but maybe he suggested that
>as the body of my function-definition.
 
I was simply pointing out that for simple operations such as that char* is
usually more efficient than std::string.
Muttley@dastardlyhq.com: Jan 23 04:11PM

On Mon, 23 Jan 2023 17:56:15 +0200
>> R/W and MAP_PRIVATE which gives you a char* pointing to the beginning of the
>> file and which you can manipulate as you see fit.
 
>Seems like a non-portable hack.
 
Portable where? It's been standard POSIX functionality for decades and is
anything but a hack. The whole point of a private map is so you can manipulate
the file contents in memory without having to laboriously read it all in
first and without changing the file itself. It's extremely useful.
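
A sketch of the private mapping described above, assuming a POSIX system. The function name first_line_private is hypothetical. It maps the file with PROT_READ | PROT_WRITE and MAP_PRIVATE, overwrites the first newline with a null character in the mapping, and leaves the file on disk untouched because the write lands on a copy-on-write page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns the first line of `path`, terminating it in
// place inside a private mapping. Returns an empty string on any failure.
std::string first_line_private(const char* path)
{
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return {};

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return {}; }
    std::size_t len = static_cast<std::size_t>(st.st_size);

    // With MAP_PRIVATE, PROT_WRITE is permitted on a read-only descriptor:
    // stores go to private copy-on-write pages, never to the file.
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                       // the mapping outlives the descriptor
    if (p == MAP_FAILED) return {};

    char* text = static_cast<char*>(p);
    char* nl = static_cast<char*>(std::memchr(text, '\n', len));
    std::string line;
    if (nl != nullptr) {
        *nl = '\0';                  // modifies the mapping only
        line = text;                 // now usable as a C string
    } else {
        line.assign(text, len);      // no newline: do not assume a terminator
    }
    munmap(p, len);
    return line;
}
```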
Bonita Montero <Bonita.Montero@gmail.com>: Jan 23 05:13PM +0100

>> as the body of my function-definition.
 
> I was simply pointing out that for simple operations such as that char* is
> usually more efficient than std::string.
 
You put that in a context where we discussed further allocations
if you extract a substring, and I took that into that context.
With your further explanations your code fits even less.
Muttley@dastardlyhq.com: Jan 23 04:20PM

On Mon, 23 Jan 2023 13:53:55 +0100
>> :
>> return hw + 5;
 
>You're returning a newly created string object, thereby inducing
 
This is C, not C++. There are no objects. It's returning a memory address
which in this case will probably be pointing to part of the program text area.
 
>blem. And you're returning the exclamation mark I also stripped.
 
hw[10] = '\0';
 
Sorted.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Jan 23 08:22AM -0800

On Monday, January 23, 2023 at 9:15:58 AM UTC-5, Malcolm McLean wrote:
 
> The language states that a text literal in double quotes produces a nul-terminated
> string. I think that's the only place the C language itself defines a string. Otherwise
> it is purely a standard library concept.
 
The language doesn't state any such thing. The standard does, but there is no separate standard for the C language. The C standard describes both the C language and the C standard library. The part that describes the language defines the syntax for a string literal (NOT a text literal), and defines the corresponding semantics, which often (but not always) create a null-terminated string. The part that describes the C standard library starts, as its very first sentence, with a definition of a C string: "A string is a contiguous sequence of characters terminated by and including the first null character." (7.1.1p1). In that sentence, the term "string" is italicized, an ISO convention indicating that the sentence in which that italicized term appears constitutes the official definition of that term.
This makes sense, because nothing in the language itself depends upon strings; they matter only because various functions in the C standard library take pointers to strings as arguments, or give such pointers as the return value of the function.
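
One way to see the difference between the array a string literal creates and a "string" in the library's sense is a literal with an embedded null character. This is only an illustrative sketch; the names two_parts, array_size, string_length and second_string are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The literal creates an array of 8 chars: 'a' 'b' 'c' '\0' 'd' 'e' 'f' '\0'.
// By the library definition, the string starting at element 0 ends at the
// first null character, so the array holds two strings back to back.
constexpr char two_parts[] = "abc\0def";

std::size_t array_size()    { return sizeof two_parts; }        // whole array
std::size_t string_length() { return std::strlen(two_parts); }  // up to first null

// Pointer to the string that begins just after the first terminator.
const char* second_string()
{
    assert(string_length() + 1 < sizeof two_parts);
    return two_parts + string_length() + 1;
}
```

Here sizeof reports the array the language created, while strlen reports the length of the string the library sees.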
Bonita Montero <Bonita.Montero@gmail.com>: Jan 23 05:25PM +0100

>>> return hw + 5;
 
>> You're returning a newly created string object, thereby inducing
 
> This is C, not C++. ...
 
In a C++-newsgroup and we discussed the topic of copy-allocations ...
Muttley@dastardlyhq.com: Jan 23 04:37PM

On Mon, 23 Jan 2023 17:25:49 +0100
 
>>> You're returning a newly created string object, thereby inducing
 
>> This is C, not C++. ...
 
>In a C++-newsgroup and we discussed the topic of copy-allocations ...
 
When comparing C vs C++ a discussion of C is entirely appropriate.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 23 05:40PM +0100


>>> This is C, not C++. ...
 
>> In a C++-newsgroup and we discussed the topic of copy-allocations ...
 
> When comparing C vs C++ a discussion of C is entirely appropriate.
 
Yes, somewhere else in this thread.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Jan 23 08:40AM -0800

> >> return hw + 5;
 
> >You're returning a newly created string object, thereby inducing
> This is C, not C++. ...
 
Actually, this IS C++. While this discussion has been about C, it's actually taking place on comp.lang.c++.
 
> ... There are no objects. ...
 
As C defines the term "object", both hw and the array that "hello world!" points at are objects. However, you are correct in saying that hw+5 would not be an object in C.
 
> >blem. And you're returning the exclamation mark I also stripped.
> hw[10] = '\0';
 
hw points at the array that was created because of the existence of the string literal "hello world!".
 
"If the program attempts to modify such an array, the behavior is undefined." (C standard, 6.4.5p7)
Paavo Helde <eesnimi@osa.pri.ee>: Jan 23 06:58PM +0200

>>> file and which you can manipulate as you see fit.
 
>> Seems like a non-portable hack.
 
> Portable where?
 
This is a C++ group.
 
 
> The whole point of a private map is so you can manipulate
> the file contents in memory without having to laboriously read it all in
> first
 
This is achieved by a read-only memory map. On which a string_view would
work fine, coincidentally.
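
A sketch of that alternative, assuming a POSIX system: the file is mapped with PROT_READ only and parsed through a std::string_view, so nothing in the mapping is modified and no terminator is needed. The function name count_lines and the choice to count non-empty lines are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: counts the non-empty lines of `path` by walking a
// string_view over a read-only mapping. Returns 0 on any failure.
std::size_t count_lines(const char* path)
{
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 0; }
    std::size_t len = static_cast<std::size_t>(st.st_size);

    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       // the mapping outlives the descriptor
    if (p == MAP_FAILED) return 0;

    // The view carries its own length, so the data never needs a '\0'.
    std::string_view rest(static_cast<const char*>(p), len);
    std::size_t lines = 0;
    while (!rest.empty()) {
        std::size_t nl = rest.find('\n');
        std::string_view line = rest.substr(0, nl);  // npos means "to the end"
        if (!line.empty()) ++lines;
        if (nl == std::string_view::npos) break;
        rest.remove_prefix(nl + 1);
    }
    munmap(p, len);                  // views into the mapping are dead after this
    return lines;
}
```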
 
> and without changing the file itself. It's extremely useful.
 
Why should I want to manipulate the file contents when parsing it? Ah, I
know the answer, it comes from the camp who thinks copying a virtual
main memory page will be faster than passing some extra register
variables for keeping better track of the parsing process. Maybe 40
years ago on some hardware it had a point.
Muttley@dastardlyhq.com: Jan 23 05:00PM

On Mon, 23 Jan 2023 17:40:30 +0100
 
>>> In a C++-newsgroup and we discussed the topic of copy-allocations ...
 
>> When comparing C vs C++ a discussion of C is entirely appropriate.
 
>Yes, somewhere else in this thread.
 
Oh ok, now you're subdividing threads are you? You're the one who suggested
that using erase() and substr() was somehow just as efficient as returning
a pointer.
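
For comparison, a small sketch of the two approaches under discussion. The helper names tail_copy and tail_view are hypothetical. std::string::substr returns a new string with its own storage (a heap allocation once the result exceeds the implementation's small-string buffer), whereas a std::string_view, like a raw pointer, only refers to the original characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Copying approach: the result owns its own characters.
std::string tail_copy(const std::string& s, std::size_t pos)
{
    assert(pos <= s.size());
    return s.substr(pos);
}

// Non-owning approach: the result points into the caller's buffer and is
// only valid while that buffer is alive and unmodified.
std::string_view tail_view(std::string_view s, std::size_t pos)
{
    assert(pos <= s.size());
    return s.substr(pos);
}
```

The view costs a pointer and a length, which is the closest C++ equivalent to returning hw + n from a C string.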
David Brown <david.brown@hesbynett.no>: Jan 23 06:02PM +0100

On 23/01/2023 15:15, Malcolm McLean wrote:
 
> The language states that a text literal in double quotes produces a nul-terminated
> string. I think that's the only place the C language itself defines a string. Otherwise
> it is purely a standard library concept.
 
You are jumbling several things a bit. I would recommend you open a
copy of the C standards (I don't think anything here has changed since
at least C99) and have a look.
 
You'll find there is /one/ C standard document (in different versions) -
the standard library is considered an integral part of the language.
Very occasionally it is useful to distinguish a "core C language" (that
is not a term from the standard) from things defined in the standard
library - this is not such an occasion.
 
A sequence of characters inside double quotation marks is a "string
literal". The section describing these lexical elements, 6.4.5,
describes the array and character sequence generated. The /definition/
of the term "string" is found in chapter 7, describing the library, not
in the section defining the term "string literal". The terms "string"
and "pointer to string" are mentioned in a number of places throughout
the document, not just in the library or in connection with string literals.
 
It's fair to say that there is little that you can do with a string in C
that does not involve library calls - basically, you can take a pointer
to a string and use it as a pointer to a character, and you have
initialisation from string literals. String handling is done using
library functions. But that does not in any way mean strings are not
defined in the C language, or not part of the C language.
Muttley@dastardlyhq.com: Jan 23 05:02PM

On Mon, 23 Jan 2023 08:40:36 -0800 (PST)
>literal "hello world!".
 
>"If the program attempts to modify such an array, the behavior is undefined."
>(C standard, 6.4.5p7)
 
That's true, my mistake. So we change the char* to
 
char hw[] = "hello world!";
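
A sketch of how that corrected version might look when wrapped in a function. The helper name strip_last_and_skip is hypothetical. Two assumptions are worth stating: the array is owned by the caller, because returning a pointer into a local array would dangle, and the terminator position is computed with strlen rather than hard-coded (in "hello world!" the '!' sits at index 11, and 'w' at index 6).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: removes the last character of the string in `buf`
// and returns a pointer `offset` characters into what remains. `buf` must
// be a modifiable array, never a string literal.
char* strip_last_and_skip(char* buf, std::size_t offset)
{
    assert(buf != nullptr);
    std::size_t len = std::strlen(buf);
    if (len > 0) {
        buf[len - 1] = '\0';     // well defined: buf is the caller's own array
        --len;
    }
    return buf + (offset < len ? offset : len);   // clamp to the terminator
}
```

With char hw[] = "hello world!"; the call strip_last_and_skip(hw, 6) yields a pointer to "world" inside hw, with no allocation and no copy.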
David Brown <david.brown@hesbynett.no>: Jan 23 06:05PM +0100

On 23/01/2023 15:44, Scott Lurndal wrote:
>> simple way C strings are implemented. And it is easily solved by
>> calling "strlen".
 
> Easily solved, but performance for large strings is poor.
 
Sure.
 
There is no "perfect" way to implement a way of holding (general)
strings in a language. There are lots of ways to do it, but they all
have their disadvantages as well as advantages. If you want to work
efficiently with large strings, standard C strings are a poor choice of
format.
Muttley@dastardlyhq.com: Jan 23 05:18PM

On Mon, 23 Jan 2023 18:58:58 +0200
 
>>> Seems like a non-portable hack.
 
>> Portable where?
 
>This is a C++ group.
 
And? You are allowed to use standard OS C APIs in C++ in case you were
unaware.
 
>> first
 
>This is achieved by a read-only memory map. On which a string_view would
>work fine, coincidentally.
 
Maybe it would. But since you get char* out of the box why bother?
 
>main memory page will be faster than passing some extra register
>variables for keeping better track of the parsing process. Maybe 40
>years ago on some hardware it had a point.
 
Putting a \0 at the end of some text you want to pass to a sub function is
standard practice. But the main use is having the entire file (from the program's
perspective) available as text in memory without having to read some or all of it
yourself manually or do file seeking with fstream or some kind of file pointer,
which is far less efficient and still involves paging into memory anyway.
 
Shall we assume you've never heard of mmap() because you know nothing about
unix systems programming and got confused?
scott@slp53.sl.home (Scott Lurndal): Jan 23 05:30PM


>>> You're returning a newly created string object, thereby inducing
 
>> This is C, not C++. ...
 
>In a C++-newsgroup and we discussed the topic of copy-allocations ...
 
What Muttley posted is perfectly legal C++ code.
Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 23 09:37AM -0800

On Monday, 23 January 2023 at 17:05:27 UTC, David Brown wrote:
> have their disadvantages as well as advantages. If you want to work
> efficiently with large strings, standard C strings are a poor choice of
> format.
 
It depends how large.
If strings are very large then flat memory buffers are often the best way to go.
That prevents inadvertent copies.
If strings are short, then it usually doesn't matter from a performance perspective,
so it's whether C strings or another format is easier to integrate with the existing
code.
If strings are medium length then yes, C format is probably a poor choice,
because the strings are not so long that copying is prohibitively expensive,
but not so short that inefficient copying hardly matters. A format that allows
for efficient assignment, but is also easy to use, is likely better.
scott@slp53.sl.home (Scott Lurndal): Jan 23 05:37PM

>have their disadvantages as well as advantages. If you want to work
>efficiently with large strings, standard C strings are a poor choice of
>format.
 
Yes. It's really application dependent.
 
For example, when parsing a sequence of bytes, one may not actually
care about the length and rather just start parsing at the first byte
and stop when the nul-byte (or a parse error) is encountered.
 
One pass through the string, instead of a pass every time strlen() is called.
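
A minimal sketch of that single-pass style, with a contrasting loop that calls strlen() in its condition. Both function names are hypothetical, and the parser ignores overflow to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Single pass: parses decimal digits until the terminating nul (success)
// or the first non-digit (parse error). The length is never computed.
bool parse_digits(const char* s, unsigned long& value)
{
    assert(s != nullptr);
    value = 0;
    if (*s == '\0') return false;            // treat empty input as an error
    for (; *s != '\0'; ++s) {
        if (*s < '0' || *s > '9') return false;
        value = value * 10 + static_cast<unsigned long>(*s - '0');
    }
    return true;
}

// For contrast: strlen() in the loop condition may rescan the whole string
// on every iteration unless the compiler manages to hoist the call.
std::size_t count_digits_slow(const char* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    for (std::size_t i = 0; i < std::strlen(s); ++i)
        if (s[i] >= '0' && s[i] <= '9') ++n;
    return n;
}
```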
Bonita Montero <Bonita.Montero@gmail.com>: Jan 23 06:39PM +0100

Am 23.01.2023 um 18:30 schrieb Scott Lurndal:
 
>>> This is C, not C++. ...
 
>> In a C++-newsgroup and we discussed the topic of copy-allocations ...
 
> What Muttley posted is perfectly legal C++ code.
 
Yes, but not in that context.