soft and program: Digest for comp.lang.c++@googlegroups.com

I think references should have been const by default - 25 Updates

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 26 12:53AM +0100

On Mon, 25 Oct 2021 04:51:35 -0000 (UTC)

> "const means thread-safe" is not said in the context of const references,
> but in the context of const member functions, which is a completely
> different thing.

I think you are confused: the two go together. Where a const reference
references an object, the only member functions of the object that you
may call via that reference are const ones. const member functions are
not thread safe in the general case. If you are suggesting otherwise
you are wrong.

> (And, in this case, the idea is "const member functions *should be*
> re-entrant", rather than "const member functions are thread-safe".)

No, I was referring to misguided suggestions as to the latter.

> And when I said "const can make the program more efficient" I'm
> referring to compile-time literals.

That _is_ a completely different thing: compilers can certainly make
assumptions about literals.

Juha Nieminen <nospam@thanks.invalid>: Oct 26 05:18AM

> characteristic of the environment (with or without the issuance of a
> diagnostic message), to terminating a translation or execution (with the
> issuance of a diagnostic message)."

No better example of "undefined behavior" causing a major problem than
that bug in the Linux kernel discovered some years ago, where the kernel
would deliberately dereference a null pointer (I don't remember anymore
for what reason), and gcc saw that it was a null pointer dereference,
which according to the C standard is undefined behavior, and since that
allows the compiler to do with it whatever it wants, it (if I remember
correctly) just optimized it away, causing the extraordinarily hard-to-find
bug in the kernel.

(Also, if I remember correctly, it caused quite a discussion about
whether compilers should actually be allowed to "do whatever they want"
with such code, or whether they should do as they are told.)

Juha Nieminen <nospam@thanks.invalid>: Oct 26 05:29AM

> Any attempt to write to a read only program text area will result in a crash
> regardless of the language.

There's absolutely nothing requiring C (or C++) compilers to put string
literals in a read-only memory segment. They are free to put them in a
normal read/write memory segment if they so wish.

Nothing guarantees that the target architecture even *has* such a thing
as "read-only memory segments".

This means that your program may well work "correctly" in one target
architecture but not in another.

> C also provides the following initialisation which places the string
> (presumably) on the heap:

> char str[] = "hello world";

It cannot place it on the heap because that would just be a memory leak
(there would be nothing freeing it). It would allocate that array on
the stack, if it's inside a function (and if it's at the global scope,
whichever segment is dedicated to those).

And that string literal there, if it actually gets generated into the
final binary, will still be in read-only memory (if the architecture
supports such a thing). It's just that its contents are copied to the
array when the array is allocated on the stack.
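
A minimal sketch of that copy-on-initialisation behaviour, assuming a hosted C99 or later compiler. The function names are made up for illustration. Because the automatic array is re-initialised from the literal each time the function is entered, a write made during one call never shows up in the next one:

```c
#include <stddef.h>

/* Returns the first character seen on entry, then overwrites it.
   str is re-initialised from the literal on every call, so the
   overwrite never carries over to the next call. */
char first_char_then_clobber(void)
{
    char str[] = "hello world";   /* automatic array: 11 chars + '\0' */
    char first = str[0];

    str[0] = 'X';                 /* modifies this call's copy only */
    return first;
}

/* Size of such an array: the literal's length plus the terminator. */
size_t array_size(void)
{
    char str[] = "hello world";
    return sizeof str;
}
```

Calling first_char_then_clobber() repeatedly should give 'h' every time, and array_size() should give 12.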

(Btw, this is the reason why I say that C has "strings", rather than
strings. They are just char arrays, with a zero byte as an element
that by convention indicates the final character. This causes a
lot of confusion, especially since it induces many people to
think that a char* is a "string". Which it isn't. It's a pointer
to a value of type char. It *might* point to a null-terminated
char array, or it might not. It's not guaranteed that it's a
"string".)

Juha Nieminen <nospam@thanks.invalid>: Oct 26 05:36AM

> I noticed you deftly bypassed the fact that 'const' for 'int' can be
> written either side of 'int', or both!

That's not really a problem in the right-to-left reading.

int const *ptr;

can be read as:

"ptr is a pointer to a (const int)."

and:

const int *ptr;

can be read as:

"ptr is a pointer to an int that's const (ie. a const int)."

David Brown <david.brown@hesbynett.no>: Oct 26 08:53AM +0200

On 26/10/2021 07:29, Juha Nieminen wrote:
> as "read-only memory segments".

> This means that your program may well work "correctly" in one target
> architecture but not in another.

There is also nothing to guarantee that attempting to write to read-only
memory will result in a "crash". It could result in nothing happening
at all (the write being ignored), or a hang, or a reset of the entire
system, or a write to somewhere different in memory. (I've worked with
systems with all four such behaviours to at least some extent.)

A particular /OS/ might guarantee that attempting to write to read-only
memory segments results in a particular handling of the process, but it
is certainly not guaranteed by C or C++.

And of course, the C compiler might not actually attempt to make the
write, but act as though it had. (I think that would be unlikely in
practice, but it could be done for strings local to a function.)

> (there would be nothing freeing it). It would allocate that array on
> the stack, if it's inside a function (and if it's at the global scope,
> whichever segment is dedicated to those).

(Hypothetically, it /could/ be allocated on the heap, or elsewhere, if
the compiler also generated code to free it appropriately. While almost
all C implementations use a stack for local data, there are a few
exceptions.)

David Brown <david.brown@hesbynett.no>: Oct 26 09:07AM +0200

On 26/10/2021 07:18, Juha Nieminen wrote:
> allows the compiler to do with it whatever it wants, it (if I remember
> correctly) just optimized it away, causing the extraordinarily hard-to-find
> bug in the kernel.

The compiler did not cause a bug in the kernel. There was a bug in the
source code - the programmer got the order of the code wrong, and
checked the pointer after using it. This was a simple mistake in the
code, and should have been spotted by the reviewer - it was an
embarrassing failure in the development chain of the kernel. (The review
and moderation process in the kernel development usually maintains very
high standards.)

The new optimisation in gcc did not /cause/ the bug, it merely changed
the /consequences/ of the bug. The optimisation was entirely valid.
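
The pattern being described can be sketched roughly as follows. This is not the kernel's actual code: the struct and function names are hypothetical, chosen only to show a pointer being used before it is checked, next to the corrected order.

```c
#include <stddef.h>

/* Hypothetical type for this sketch; not taken from the kernel. */
struct device { int flags; };

/* Buggy order: p is dereferenced before it is tested. The dereference
   is undefined for a null p, so a compiler may assume p is non-null
   from that point on and drop the later check as dead code. */
int read_flags_buggy(struct device *p)
{
    int flags = p->flags;   /* undefined behaviour if p is null */
    if (p == NULL)          /* may legitimately be optimised away */
        return -1;
    return flags;
}

/* Fixed order: test first, dereference only once p is known valid. */
int read_flags_fixed(struct device *p)
{
    if (p == NULL)
        return -1;
    return p->flags;
}
```

Only read_flags_fixed() may safely be called with a null pointer. For valid pointers the two functions behave the same, which is part of why such a mistake can survive testing.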

It is, however, also reasonable for a project like an OS kernel to
accept that there is a risk of human error leading to bugs in the code,
and want to reduce the consequences that might result from such bugs.

But we can learn from our mistakes - the kernel gained the feature of
having a memory page at address zero mapped with no access, so that any
later attempt to dereference a null pointer would be caught. At that
point, -fdelete-null-pointer-checks can (and should) be re-enabled,
along with the warning "-Wnull-dereference" that was also added as a
consequence of this issue.

> (Also, if I remember correctly, it caused quite a discussion about
> whether compilers should actually be allowed to "do whatever they want"
> with such code, or whether they should do as they are told.)

The compiler /did/ do as it was told. It was not told to do what the
programmer wanted to tell it.

RacingRabbit@watershipdown.co.uk: Oct 26 08:18AM

On Mon, 25 Oct 2021 13:14:30 -0400

>> char str[] = "hello world";

>Such code cannot result in the string being placed in read-only memory,
>because it's perfectly legal to modify str. On the other hand, both of

Yes, that was my point. [] means modifiable, * means read only in every
C implementation I've ever used.

RacingRabbit@watershipdown.co.uk: Oct 26 08:21AM

On Mon, 25 Oct 2021 10:48:57 -0700

>> char str[] = "hello world";

>I suggest that you would benefit more here from asking questions than
>from making assertions.

I suggest you ease up on being patronising.

>That declaration does not place anything on the heap. The contents of
>str is placed on the stack if it appears within a function definition.
>or in the static data area if it appears outside a function definition.

Wherever it's placed, the point is it's modifiable, unlike *str = which isn't.

>Others have addressed your errors regarding "const".

Not really. They're just trying to make a case for const being useful in C.
I've yet to see that.

RacingRabbit@watershipdown.co.uk: Oct 26 08:23AM

On Tue, 26 Oct 2021 05:29:31 -0000 (UTC)

>> char str[] = "hello world";

>It cannot place it on the heap because that would just be a memory leak
>(there would be nothing freeing it). It would allocate that array on

It wouldn't need to be free'd if it existed for the lifetime of the program.

>strings. They are just char arrays, with a zero byte as an element
>that by convention indicates the final character. This causes a

Wow, really? Who knew!

Juha Nieminen <nospam@thanks.invalid>: Oct 26 08:37AM

> Not really. They're just trying to make a case for const being useful in C.
> I've yet to see that.

It can catch errors where you accidentally try to modify the contents of,
for example, a string literal.

(This doesn't mean that you do like
char* str = "hello"; str[0] = 'H';
but it does mean that you might do like
doSomething("hello");
where that doSomething() actually modifies the data behind the pointer
it's given.)
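
As a rough illustration of that point, here is a sketch of a function that is safe to call with a literal because its input parameter is const-qualified. The name upper_copy and its parameters are invented for this example; the assumption is simply that the caller supplies a separate writable buffer for the result.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-cases src into dst (at most dst_size - 1
   characters) and returns the number of characters written. The const
   on src documents, and lets the compiler enforce, that the input is
   only read, so passing a string literal is safe. */
size_t upper_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = (char)toupper((unsigned char)src[i]);
        i++;
    }
    dst[i] = '\0';
    /* src[0] = 'H'; here would be rejected at compile time. */
    return i;
}
```

With a buffer such as char buf[8], the call upper_copy(buf, sizeof buf, "hello") should leave "HELLO" in buf, and any accidental write through src inside the function is diagnosed by the compiler rather than at run time.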

It can also make code more efficient.

What more do you need?

Juha Nieminen <nospam@thanks.invalid>: Oct 26 08:40AM

>>strings. They are just char arrays, with a zero byte as an element
>>that by convention indicates the final character. This causes a

> Wow, really? Who knew!

A lot of beginner C programmers don't.

And some not-so-beginner C programmers either. (Well, they do tend to know
about the trailing-zero-byte thing, but otherwise they may have a
surprisingly poor grasp of what a "string" in C actually is, and may
even think that a char* is a "string" (which it most definitely is not).)
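
A small sketch of the distinction, with names invented for the example: both objects below are char arrays and both can be pointed at by a char*, but only one of them holds a string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the first size bytes at p contain a terminating
   zero byte, i.e. if p points to a string that fits in the buffer. */
bool is_terminated(const char *p, size_t size)
{
    return memchr(p, '\0', size) != NULL;
}

/* Four chars, no terminator: a valid char array, but not a string. */
const char raw[4] = { 'a', 'b', 'c', 'd' };

/* Three chars plus the terminator: a string of length 3. */
const char text[4] = "abc";
```

Passing raw to strlen() or puts() would be undefined behaviour, whereas text is fine. The pointer type alone does not say which case you have, which is why the size has to be passed alongside it here.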

RacingRabbit@watershipdown.co.uk: Oct 26 09:02AM

On Tue, 26 Oct 2021 08:37:13 -0000 (UTC)
> doSomething("hello");
>where that doSomething() actually modifies the data behind the pointer
>it's given.)

fenris$ cat t.c
#include <stdio.h>

void func(char *str)
{
str[0] = 0;
}

int main()
{
char *str = "hello";
func(str);
puts("Worked");
return 0;
}
fenris$ cc t.c
fenris$ a.out
Bus error: 10
fenris$

Juha Nieminen <nospam@thanks.invalid>: Oct 26 11:14AM

> fenris$ a.out
> Bus error: 10

For starters, that's in no way guaranteed to happen. Learn standard C.

Secondly, if you think that a runtime diagnostic is as good as a compile-time
diagnostic, then you still have a LOT to learn about software development.
The earlier in the development process that a bug can be caught, the better.
This is basic software development 101.

The writing-to-a-string-literal might happen only in some cases, not always.
For example, it could depend on the particular contents of some input
file, or a particular action by the user, a particular command line
parameter, or a myriad of other things that can vary from execution to
execution. In the worst case scenarios the error may happen sporadically
and without a clear pattern, which can make it extraordinarily difficult to
debug. Countless hours could be spent in trying to find such an elusive
and obscure bug.

All of which could have been avoided if you just used 'const' and
turned on compiler warnings, and paid attention to them.

There's literally zero reason not to use 'const' for pointers that
are not intended to be used to modify the values they are pointing to.

Bart <bc@freeuk.com>: Oct 26 12:39PM +0100

>> because it's perfectly legal to modify str. On the other hand, both of

> Yes, that was my point. [] means modifiable, * means read only in every
> C implementation I've ever used.

You've misunderstood then.

But * and [] types are modifiable:

char* s = "ABC";
puts(s);
*s = 'Z';

This shows ABC the first time it's executed. The second time it shows
ZBC; the code has changed the string literal! Where the same literal is
shared across the program, it will change the value of "ABC" everywhere.

This is on those implementations that don't put ABC into readonly memory
(eg. tcc, bcc, DMC, lcc, msvc). Ones like gcc and clang will crash.

You can't compare that with this:

char t[] = "ABC";
puts(t);
t[0] = 'Z';

Here, "ABC" is left unmolested. But the reason is because the
initialisation /copies/ the literal string to the array. So it modifies
a copy. The declaration of s directly points it to the literal.
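
A sketch of that difference which stays within defined behaviour (the function name is made up): two arrays initialised from the same literal are separate objects, so writing to one affects neither the other nor the literal itself.

```c
#include <stdbool.h>

/* Returns true if modifying one array initialised from "ABC" leaves
   a second array, and the literal itself, untouched. */
bool copies_are_independent(void)
{
    char t[] = "ABC";        /* t holds its own copy of the characters */
    char u[] = "ABC";        /* u holds another, separate copy */
    const char *s = "ABC";   /* s points at a literal; const blocks writes */

    t[0] = 'Z';              /* changes t only */
    return t[0] == 'Z' && u[0] == 'A' && s[0] == 'A';
}
```

Declaring s as const char* rather than char* is a choice made for this sketch, so that an attempted write through it is a compile-time error instead of undefined behaviour.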

Ben Bacarisse <ben.usenet@bsb.me.uk>: Oct 26 02:29PM +0100

>> Yes, that was my point. [] means modifiable, * means read only in every
>> C implementation I've ever used.

> You've misunderstood then.

Yes, RR has misunderstood (or is expressing the point in a confusing
way).

> But * and [] types are modifiable:

And this is bad wording. Some objects with pointer type are modifiable
and some are not. No objects with array types are modifiable. But in
fact you seem to be referring to the /target/ of pointer types (again,
some of which are modifiable and some are not) and to array /elements/
about which the same is also true. There is no general rule about "*
and [] types".

> char* s = "ABC";

This relies on a conversion that is valid (but unwise) in C and not
permitted in C++.

> puts(s);
> *s = 'Z';

This is undefined behaviour in both C and C++. The target of the
assignment (the first character of the string) is not a modifiable
object.

> This shows ABC the first time it's executed. The second time it shows
> ZBC; the code has changed the string literal!

It might show ABC again, or it may not get that far. Or, formally,
anything at all could happen.

> This is on those implementations that don't put ABC into readonly
> memory (eg. tcc, bcc, DMC, lcc, msvc). Ones like gcc and clang will
> crash.

It may vary depending on the command-line options, the platform and
compiler version. Talking about what "gcc" or "tcc" does is not very
helpful. Anyway, people should be encouraged to write, where possible,
code that does not depend on such things.

> initialisation /copies/ the literal string to the array. So it
> modifies a copy. The declaration of s directly points it to the
> literal.

Yes.

--
Ben.

Bart <bc@freeuk.com>: Oct 26 03:11PM +0100

On 26/10/2021 14:29, Ben Bacarisse wrote:
> way).

>> But * and [] types are modifiable:

> And this is bad wording.

It's a modification of what RR said.

> Some objects with pointer type are modifiable
> and some are not. No objects with array types are modifiable.

I don't know what you mean by that. Unless it is that you can't directly
assign to a whole array object at once; only an element at a time. Or,
going the other way, when the array is a member of a struct and you
assign to the whole struct.
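
The cases mentioned here can be sketched like this. The struct type and function name are hypothetical; the point is only that an array as a whole cannot be the target of an assignment, while its elements, or a struct that contains it, can.

```c
#include <string.h>

/* Hypothetical wrapper type for this sketch. */
struct triple { int v[3]; };

/* Copies three ints in three different ways and returns the sum of the
   last element of each copy, to show that every approach copied data. */
int copy_demo(void)
{
    int src[3] = { 1, 2, 3 };
    int by_element[3];
    int by_memcpy[3];
    struct triple a = { { 1, 2, 3 } };
    struct triple b;

    /* by_element = src; would not compile: an array is not a
       modifiable lvalue, although its elements are. */
    for (int i = 0; i < 3; i++)
        by_element[i] = src[i];          /* one element at a time */

    memcpy(by_memcpy, src, sizeof src);  /* all bytes at once */

    b = a;                               /* struct assignment copies v */

    return by_element[2] + by_memcpy[2] + b.v[2];
}
```

Each copy ends with 3 in its last element, so copy_demo() should return 9.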

(I know that you can't make a whole array const, only the elements.)

>> char* s = "ABC";

> This relies on a conversion that is valid (but unwise) in C and not
> permitted in C++.

I tried it in C++ before posting (as I'd thought that "ABC" would have
type const char*) but it seemed to work. (Using -Wall -std=c++14.)

> compiler version. Talking about what "gcc" or "tcc" does is not very
> helpful. Anyway, people should be encouraged to write, where possible,
> code that does not depend on such things.

I'm writing about what is typically observed.

(I don't put string literals into a readonly segment because I haven't
got round to it yet.

It is surprising that a big compiler like MSVC doesn't do so either, but
apparently that's only done when optimising; rather odd.)

James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 10:31AM -0400

> On Mon, 25 Oct 2021 13:14:30 -0400
> James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
>> On 10/25/21 12:14 PM, RacingRabbit@watershipdown.co.uk wrote:
...
>> because it's perfectly legal to modify str. On the other hand, both of

> Yes, that was my point. [] means modifiable, * means read only in every
> C implementation I've ever used.

Incorrect. In most declarations, [] means array, and * means pointer.
Neither one means "read only".
I think you may be thinking of a different fact that has nothing to do
with read-only memory. Within the scope of an identifier that identifies
an array, that identifier can only ever identify that particular array.
An identifier that identifies a pointer to an object type need not point
at any actual object, and unless it itself is declared const, can be
changed to point at a different object. But that difference between
arrays and pointers has nothing to do with read-only memory. The address
of a named array is not necessarily stored in any pointer - it is
normally hard-coded into the machine language instructions that refer to
the array, so the fact that you can't change that address is not because
the address is stored in read-only memory.

Exception 1: it's not permitted to declare functions that take arrays as
arguments, but it is permitted to declare a function parameter as if it
were an array. Such a declaration is automatically converted into a
declaration of a pointer to the element type of an array. Thus, the
following two function declarations are functionally identical, despite
being syntactically different:

void func(int array[]);
void func(int *ptr);
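
One way to see this, sketched with a hypothetical function sum and an added count parameter: both spellings can appear as declarations of the same function in one translation unit, and inside the definition the "array" parameter can be incremented, which a real array could not be.

```c
#include <stddef.h>

/* The same function declared twice; the declarations are compatible
   because the array parameter is adjusted to a pointer. */
int sum(int array[], size_t count);
int sum(int *ptr, size_t count);

/* Inside the definition the parameter behaves as a pointer. */
int sum(int array[], size_t count)
{
    int total = 0;

    while (count-- > 0)
        total += *array++;   /* incrementing is legal: array is an int* */
    return total;
}
```

For int v[] = { 1, 2, 3, 4 }, the call sum(v, 4) should return 10.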

Exception 2: in a function parameter declaration, the construct [*]
marks the corresponding dimension of the relevant array as having a
variably modified type with an unknown length for that dimension. This
feature cannot be used in the defining declaration for a function,
because the function definition requires that the variable length be
explicitly specified. It is still an array, and not in any sense a
pointer (unless the relevant dimension is the top-most one, in which
case exception 1 described above also applies).
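
A sketch of the [*] notation, assuming a compiler that supports variable length arrays (they are optional from C11 onwards). The function trace is invented for the example: the prototype leaves both dimensions unspecified, and the definition then has to name them.

```c
/* Prototype only: [*] means "variable length, not specified here". */
int trace(int n, int m[*][*]);

/* The definition must give the actual dimensions. The outermost one
   is still adjusted away, so m is a pointer to an array of n ints. */
int trace(int n, int m[n][n])
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += m[i][i];    /* sum of the main diagonal */
    return total;
}
```

For int m[2][2] = { { 1, 2 }, { 3, 4 } }, the call trace(2, m) should return 5.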

Any attempt to modify the contents of a string literal is undefined. Any
attempt to modify an object whose definition is const-qualified is also
undefined. Those facts permit, but do not require, that those objects be
stored in read-only memory.

RacingRabbit@watershipdown.co.uk: Oct 26 02:35PM

On Tue, 26 Oct 2021 11:14:58 -0000 (UTC)
>> fenris$ a.out
>> Bus error: 10

>For starters, that's in no way guaranteed to happen. Learn standard C.

It is on *nix and that's good enough for me.

>Secondly, if you think that a runtime diagnostic is as good as a compile-time
>diagnostic, then you still have a LOT to learn about software development.

All I'm saying is the bug would exhibit itself pretty quickly.

>All of which could have been avoided if you just used 'const' and
>turned on compiler warnings, and paid attention to them.

I always have warnings on so const not required.

RacingRabbit@watershipdown.co.uk: Oct 26 02:36PM

On Tue, 26 Oct 2021 12:39:54 +0100
> *s = 'Z';

>This shows ABC the first time it's executed. The second time it shows
>ZBC; the code has changed the string literal! Where the same literal is

I suggest you actually try running that code and see what happens.

RacingRabbit@watershipdown.co.uk: Oct 26 02:42PM

On Tue, 26 Oct 2021 10:31:41 -0400
>Neither one means "read only".
>I think you may be thinking of a different fact that has nothing to do
>with read-only memory. Within the scope of an identifier that identifies

No I'm not. The pointer will be pointing to a string literal in the program
static text area which is usually non modifiable.

>Exception 1: it's not permitted to declare functions that take arrays as
>arguments,

Since when?

fenris$ cat t.c
#include <stdio.h>

void func(int a[2][3])
{
    printf("%d\n",a[1][2]);
}

int main()
{
    int a[2][3];
    a[1][2] = 123;
    func(a);
    return 0;
}
fenris$ cc t.c; a.out
123
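
For what it's worth, that output does not show an array being passed. A sketch using C11's _Generic (the function name is made up) can report what type the parameter has after the language's adjustment of array parameters:

```c
/* Same parameter declaration as in the program above. */
int param_is_pointer(int a[2][3])
{
    /* _Generic selects on the type a has after adjustment; the
       operand is not evaluated. */
    return _Generic(a, int (*)[3]: 1, default: 0);
}
```

With a C11 compiler this should return 1: the parameter is a pointer to an array of 3 ints, and the leading 2 in the declaration plays no part in its type.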

James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 10:42AM -0400

>> I suggest that you would benefit more here from asking questions than
>>from making assertions.

> I suggest you ease up on being patronising.

You'll get less patronizing responses when you cease displaying such an
abysmal understanding of C, while believing you understand it better
than others.

>> str is placed on the stack if it appears within a function definition.
>> or in the static data area if it appears outside a function definition.

> Wherever it's placed, the point is it's modifiable, unlike *str = which isn't.

You're right, but for the wrong reasons. It's true that *str isn't
modifiable, but that's not just because of the "*", it's because str has
been initialized to point at the first character of a string literal. It
could equally easily have been initialized to point at modifiable
memory. Nothing about the str itself makes it read-only.

James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 10:43AM -0400

On 10/26/21 5:02 AM, RacingRabbit@watershipdown.co.uk wrote:
...
> }

> int main()
> {

Try changing the following line:
> char *str = "hello";

to
char greeting[] = "hello";
char *str = greeting;

> puts("Worked");
> return 0;
> }

You shouldn't get a bus error this time. Do you understand why?
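
Put together, the suggested change amounts to something like the sketch below. The wrapper truncate_greeting is added here only so the result can be checked; func is the same as in the quoted program.

```c
/* Same func as in the quoted program: truncates the string in place. */
void func(char *str)
{
    str[0] = 0;
}

/* Returns 1 if the write through the pointer reached the array. */
int truncate_greeting(void)
{
    char greeting[] = "hello";   /* modifiable copy of the literal */
    char *str = greeting;        /* same pointer type as before */

    func(str);                   /* well defined: writes greeting[0] */
    return greeting[0] == 0;
}
```

The type of str is unchanged; only what it points at differs, and that is what decides whether the write is allowed.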

RacingRabbit@watershipdown.co.uk: Oct 26 02:48PM

On Tue, 26 Oct 2021 10:42:24 -0400

>You'll get less patronizing responses when you cease displaying such an
>abysmal understanding of C, while believing you understand it better
>than others.

Says the preening fool.

>You're right, but for the wrong reasons. It's true that *str isn't
>modifiable, but that's not just because of the "*", it's because str has
>been initialized to point at the first character of a string literal. It

So you disagree with what I said then say exactly the same thing yourself.

Ok, well thanks for that. Helpful.

Bart <bc@freeuk.com>: Oct 26 03:55PM +0100

>> This shows ABC the first time it's executed. The second time it shows
>> ZBC; the code has changed the string literal! Where the same literal is

> I suggest you actually try running that code and see what happens.

What makes you think I didn't?

I actually listed the 7 compilers I tried it on, just at the point where
you must have stopped reading.

Oh, you mean you only tried it on one implementation?

James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 11:03AM -0400

On 10/26/21 1:18 AM, Juha Nieminen wrote:

> (Also, if I remember correctly, it caused quite a discussion about
> whether compilers should actually be allowed to "do whatever they want"
> with such code, or whether they should do as they are told.)

That bug serves to illustrate a very important point about undefined
behavior. UB only refers to behavior that is not defined by the C
standard; the behavior might in fact be defined by some other document.
If such a document has authority over every place that your code needs
to work, there's absolutely nothing wrong with writing code with
undefined behavior that relies upon the definition of that behavior
provided by that document. But when you write such code, it's absolutely
essential that you know what the relevant definition is.

The developers in question were absolutely certain that they knew what
the defined behavior was for the platform that they were using: the
hardware had an instruction that could be used to load the data from a
specified memory location, and nothing problematic would occur if that
instruction was passed an address of 0.

There were two key mistakes in this thinking:
1. It's not the hardware that defines the behavior, it's the
implementation of C.

2. Popular misconceptions to the contrary notwithstanding, C is NOT a
portable assembler. C code does not instruct the compiler to generate a
particular set of machine code instructions. It only tells the
implementation what the desired behavior of the program is. An
implementation has no obligation to produce any specific set of machine
code instructions to achieve that goal. The only requirement is that the
observable behavior of the program (a term defined in 5.1.2.3p2, and
that definition is significantly more complicated than "behavior which
can be observed") must meet the requirements of the rest of the standard.

When the behavior is undefined, the standard doesn't impose any
requirements. In this particular case, the implementation provided its
own definition of the behavior, and it was significantly more complex
than merely the single machine instruction that they expected to be
generated. Specifically, the defined behavior was to optimize all code
between the last time the pointer was updated, until the next time it
was updated, on the assumption that the pointer's value was not null.
Such an optimization is normally a good idea - if they had taken proper
care to prevent the pointer from having a null value, the code that was
removed would have been dead code - removing it sped up the program.

But they didn't take proper care to make sure the pointer was not null.
They dealt with that possibility only after dereferencing it, and as a
result, the code that they wrote to deal with that possibility got
optimized away.

The most ironic aspect of this problem was that the optimization that
revealed the bug in their code was not on by default. It was turned on
because they had explicitly requested it. Your obligation to know how an
implementation defines behavior that is undefined by the standard is
even higher when choosing non-default optimizations.

Tuesday, October 26, 2021