- I think references should have been const by default - 25 Updates
| Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 26 12:53AM +0100 On Mon, 25 Oct 2021 04:51:35 -0000 (UTC) > "const means thread-safe" is not said in the context of const references, > but in the context of const member functions, which is a completely > different thing. I think you are confused: the two go together. Where a const reference references an object, the only member functions of the object that you may call via that reference are const ones. const member functions are not thread safe in the general case. If you are suggesting otherwise you are wrong. > (And, in this case, the idea is "const member functions *should be* > re-entrant", rather than "const member functions are thread-safe".) No, I was referring to misguided suggestions as to the latter. > And when I said "const can make the program more efficient" I'm > referring to compile-time literals. That _is_ a completely different thing: compilers can certainly make assumptions about literals. |
| Juha Nieminen <nospam@thanks.invalid>: Oct 26 05:18AM > characteristic of the environment (with or without the issuance of a > diagnostic message), to terminating a translation or execution (with the > issuance of a diagnostic message)." No better example of "undefined behavior" causing a major problem than that bug in the Linux kernel discovered some years ago, where the kernel would deliberately dereference a null pointer (I don't remember anymore for what reason), and gcc saw that it was a null pointer dereference, which according to the C standard is undefined behavior, and since that allows the compiler to do with it whatever it wants, it (if I remember correctly) just optimized it away, causing the extraordinarily hard-to-find bug in the kernel. (Also, if I remember correctly, it caused quite a discussion about whether compilers should actually be allowed to "do whatever they want" with such code, or whether they should do as they are told.) |
| Juha Nieminen <nospam@thanks.invalid>: Oct 26 05:29AM > Any attempt to write to a read only program text area will result in a crash > regardless of the language. There's absolutely nothing requiring C (or C++) compilers to put string literals in a read-only memory segment. They are free to put them in a normal read/write memory segment if they so wish. Nothing guarantees that the target architecture even *has* such a thing as "read-only memory segments". This means that your program may well work "correctly" on one target architecture but not on another. > C also provides the following initialisation which places the string > (presumably) on the heap: > char str[] = "hello world"; It cannot place it on the heap because that would just be a memory leak (there would be nothing freeing it). It would allocate that array on the stack, if it's inside a function (and if it's at the global scope, whichever segment is dedicated to those). And that string literal there, if it actually gets generated into the final binary, will still be in read-only memory (if the architecture supports such a thing). It's just that its contents are copied to the array when the array is allocated on the stack. (Btw, this is the reason why I say that C has "strings", rather than strings. They are just char arrays, with a zero byte as an element that by convention indicates the final character. This causes a lot of confusion, especially since it induces many people to think that a char* is a "string". Which it isn't. It's a pointer to a value of type char. It *might* point to a null-terminated char array, or it might not. It's not guaranteed that it's a "string".) |
| Juha Nieminen <nospam@thanks.invalid>: Oct 26 05:36AM > I noticed you deftly bypassed the fact that 'const' for 'int' can be > written either side of 'int', or both! That's not really a problem in the right-to-left reading. int const *ptr; can be read as: "ptr is a pointer to a (const int)." and: const int *ptr; can be read as: "ptr is a pointer to an int that's const (ie. a const int)." |
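To make the right-to-left reading concrete, here is a minimal sketch in C. The function names (`read_west`, `read_east`, `bump`, `demo_const_placement`) are made up for illustration and do not come from the thread. It assumes only that `const int *` and `int const *` name the same type, while `int *const` makes the pointer itself const.

```c
#include <assert.h>

/* "p is a pointer to a const int": the int cannot be changed through p,
   but p itself may be re-pointed. Both spellings declare the same type. */
int read_west(const int *p) { return *p; }
int read_east(int const *p) { return *p; }

/* "p is a const pointer to an int": p cannot be re-pointed,
   but the int it points to may be modified. */
void bump(int *const p) { *p += 1; }

/* Hypothetical helper that exercises all three declarations. */
int demo_const_placement(void)
{
    int a = 1, b = 10;
    const int *w = &a;
    int const *e = &a;

    w = &b;            /* fine: the pointer itself is not const */
    e = w;             /* fine: identical types, no conversion needed */
    /* *w = 5; */      /* would not compile: the pointee is const */
    bump(&a);          /* a becomes 2 */

    assert(read_west(w) == read_east(e));
    return a + read_west(w);   /* 2 + 10 */
}
```

Calling `demo_const_placement()` should return 12. Uncommenting the `*w = 5;` line is a quick way to see the compiler reject a write through either spelling.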
| David Brown <david.brown@hesbynett.no>: Oct 26 08:53AM +0200 On 26/10/2021 07:29, Juha Nieminen wrote: > as "read-only memory segments". > This means that your program may well work "correctly" in one target > architecture but not in another. There is also nothing to guarantee that attempting to write to read-only memory will result in a "crash". It could result in nothing happening at all (the write being ignored), or a hang, or a reset of the entire system, or a write to somewhere different in memory. (I've worked with systems with all four such behaviours to at least some extent.) A particular /OS/ might guarantee that attempting to write to read-only memory segments results in a particular handling of the process, but it is certainly not guaranteed by C or C++. And of course, the C compiler might not actually attempt to make the write, but act as though it had. (I think that would be unlikely in practice, but it could be done for strings local to a function.) > (there would be nothing freeing it). It would allocate that array on > the stack, if it's inside a function (and if it's at the global scope, > whichever segment is dedicated to those). (Hypothetically, it /could/ be allocated on the heap, or elsewhere, if the compiler also generated code to free it appropriately. While almost all C implementations use a stack for local data, there are a few exceptions.) |
| David Brown <david.brown@hesbynett.no>: Oct 26 09:07AM +0200 On 26/10/2021 07:18, Juha Nieminen wrote: > allows the compiler to do with it whatever it wants, it (if I remember > correctly) just optimized it away, causing the extraordinarily hard-to-find > bug in the kernel. The compiler did not cause a bug in the kernel. There was a bug in the source code - the programmer got the order of the code wrong, and checked the pointer after using it. This was a simple mistake in the code, and should have been spotted by the reviewer - it was an embarrassing failure in the development chain of the kernel. (The review and moderation process in the kernel development usually maintains very high standards.) The new optimisation in gcc did not /cause/ the bug, it merely changed the /consequences/ of the bug. The optimisation was entirely valid. It is, however, also reasonable for a project like an OS kernel to accept that there is a risk of human error leading to bugs in the code, and want to reduce the consequences that might result from such bugs. But we can learn from our mistakes - the kernel gained the feature of having a memory page at address zero mapped with no access, so that any later attempt to dereference a null pointer would be caught. At that point, -fdelete-null-pointer-checks can (and should) be re-enabled, along with the warning "-Wnull-dereference" that was also added as a consequence of this issue. > (Also, if I remember correctly, it caused quite a discussion about > whether compilers should actually be allowed to "do whatever they want" > with such code, or whether they should do as they are told.) The compiler /did/ do as it was told. It was not told to do what the programmer wanted to tell it. |
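The ordering mistake described above can be sketched as follows. This is not the kernel code: `struct device`, `device_flags` and `demo_device_flags` are hypothetical names invented for the example. The buggy order appears only in a comment, since actually running it with a null pointer would be undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

struct device { int flags; };   /* hypothetical type, for illustration only */

/* Buggy order, shown as a comment because calling it with NULL is undefined:
 *
 *     int flags = dev->flags;       // dereference first ...
 *     if (dev == NULL) return -1;   // ... so the compiler may assume
 *                                   // dev != NULL and delete this test
 */

/* Corrected order: test the pointer before any dereference. */
int device_flags(const struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}

/* Small usage check for both paths. */
void demo_device_flags(void)
{
    struct device d = { 7 };
    assert(device_flags(NULL) == -1);
    assert(device_flags(&d) == 7);
}
```

With the check placed first, the null test is reachable on every path, so an optimiser has no grounds to treat it as dead code.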
| RacingRabbit@watershipdown.co.uk: Oct 26 08:18AM On Mon, 25 Oct 2021 13:14:30 -0400 >> char str[] = "hello world"; >Such code cannot result in the string being placed in read-only memory, >because it's perfectly legal to modify str. On the other hand, both of Yes, that was my point. [] means modifiable, * means read only in every C implementation I've ever used. |
| RacingRabbit@watershipdown.co.uk: Oct 26 08:21AM On Mon, 25 Oct 2021 10:48:57 -0700 >> char str[] = "hello world"; >I suggest that you would benefit more here from asking questions than >from making assertions. I suggest you ease up on being patronising. >That declaration does not place anything on the heap. The contents of >str is placed on the stack if it appears within a function definition, >or in the static data area if it appears outside a function definition. Wherever it's placed, the point is it's modifiable unlike *str = which isn't. >Others have addressed your errors regarding "const". Not really. They're just trying to make a case for const being useful in C. I've yet to see that. |
| RacingRabbit@watershipdown.co.uk: Oct 26 08:23AM On Tue, 26 Oct 2021 05:29:31 -0000 (UTC) >> char str[] = "hello world"; >It cannot place it on the heap because that would just be a memory leak >(there would be nothing freeing it). It would allocate that array on It wouldn't need to be free'd if it existed for the lifetime of the program. >strings. They are just char arrays, with a zero byte as an element >that by convention indicates the final character. This causes a Wow, really? Who knew! |
| Juha Nieminen <nospam@thanks.invalid>: Oct 26 08:37AM > Not really. They're just trying to make a case for const being useful in C. > I've yet to see that. It can catch errors where you accidentally try to modify the contents of, for example, a string literal. (This doesn't mean that you do like char* str = "hello"; str[0] = 'H'; but it does mean that you might do like doSomething("hello"); where that doSomething() actually modifies the data behind the pointer it's given.) It can also make code more efficient. What more do you need? |
| Juha Nieminen <nospam@thanks.invalid>: Oct 26 08:40AM >>strings. They are just char arrays, with a zero byte as an element >>that by convention indicates the final character. This causes a > Wow, really? Who knew! A lot of beginner C programmers don't. And some not-so-beginner C programmers either. (Well, they do tend to know about the trailing-zero-byte thing, but otherwise they may have a surprisingly poor grasp of what a "string" in C actually is, and may even think that a char* is a "string" (which it most definitely is not).) |
| RacingRabbit@watershipdown.co.uk: Oct 26 09:02AM On Tue, 26 Oct 2021 08:37:13 -0000 (UTC) > doSomething("hello"); >where that doSomething() actually modifies the data behind the pointer >it's given.) fenris$ cat t.c #include <stdio.h> void func(char *str) { str[0] = 0; } int main() { char *str = "hello"; func(str); puts("Worked"); return 0; } fenris$ cc t.c fenris$ a.out Bus error: 10 fenris$ |
| Juha Nieminen <nospam@thanks.invalid>: Oct 26 11:14AM > fenris$ a.out > Bus error: 10 For starters, that's in no way guaranteed to happen. Learn standard C. Secondly, if you think that a runtime diagnostic is as good as a compile-time diagnostic, then you still have a LOT to learn about software development. The earlier in the development process that a bug can be caught, the better. This is basic software development 101. The writing-to-a-string-literal might happen only in some cases, not always. For example, it could depend on the particular contents of some input file, or a particular action by the user, a particular command line parameter, or a myriad of other things that can vary from execution to execution. In the worst case scenarios the error may happen sporadically and without a clear pattern, which can make it extraordinarily difficult to debug. Countless hours could be spent in trying to find such an elusive and obscure bug. All of which could have been avoided if you just used 'const' and turned on compiler warnings, and paid attention to them. There's literally zero reason not to use 'const' for pointers that are not intended to be used to modify the values they are pointing to. |
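A minimal sketch of the compile-time check argued for above, with made-up function names (`upcase_first`, `count_char`, `demo_const_params`). The function that writes takes `char *`, the one that only reads takes `const char *`, and the literal is held through a `const char *` so that handing it to the writing function is diagnosed at compile time instead of failing (or not) at run time.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Modifies its argument in place, so the parameter is a plain char *. */
void upcase_first(char *s)
{
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}

/* Only reads its argument, so the parameter is const-qualified. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

void demo_const_params(void)
{
    const char *greeting = "hello";   /* literal held through a const pointer */
    char buffer[] = "hello";          /* writable copy of the literal */

    assert(count_char(greeting, 'l') == 2);
    upcase_first(buffer);
    assert(buffer[0] == 'H');
    /* upcase_first(greeting); */     /* diagnosed: discards the const qualifier */
}
```

Uncommenting the last call is the interesting experiment: the compiler flags it before the program ever runs, whatever the target platform does with string literals.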
| Bart <bc@freeuk.com>: Oct 26 12:39PM +0100 >> because it's perfectly legal to modify str. On the other hand, both of > Yes, that was my point. [] means modifiable, * means read only in every > C implementation I've ever used. You've misunderstood then. But * and [] types are modifiable: char* s = "ABC"; puts(s); *s = 'Z'; This shows ABC the first time it's executed. The second time it shows ZBC; the code has changed the string literal! Where the same literal is shared across the program, it will change the value of "ABC" everywhere. This is on those implementations that don't put ABC into readonly memory (eg. tcc, bcc, DMC, lcc, msvc). Ones like gcc and clang will crash. You can't compare that with this: char t[] = "ABC"; puts(t); t[0] = 'Z'; Here, "ABC" is left unmolested. But the reason is that the initialisation /copies/ the literal string to the array. So it modifies a copy. The declaration of s directly points it to the literal. |
| Ben Bacarisse <ben.usenet@bsb.me.uk>: Oct 26 02:29PM +0100 >> Yes, that was my point. [] means modifiable, * means read only in every >> C implementation I've ever used. > You've misunderstood then. Yes, RR has misunderstood (or is expressing the point in a confusing way). > But * and [] types are modifiable: And this is bad wording. Some objects with pointer type are modifiable and some are not. No objects with array types are modifiable. But in fact you seem to be referring to the /target/ of pointer types (again, some of which are modifiable and some are not) and to array /elements/ about which the same is also true. There is no general rule about "* and [] types". > char* s = "ABC"; This relies on a conversion that is valid (but unwise) in C and not permitted in C++. > puts(s); > *s = 'Z'; This is undefined behaviour in both C and C++. The target of the assignment (the first character of the string) is not a modifiable object. > This shows ABC the first time it's executed. The second time it shows > ZBC; the code has changed the string literal! It might show ABC again, or it may not get that far. Or, formally, anything at all could happen. > This is on those implementations that don't put ABC into readonly > memory (eg. tcc, bcc, DMC, lcc, msvc). Ones like gcc and clang will > crash. It may vary depending on the command-line options, the platform and compiler version. Talking about what "gcc" or "tcc" does is not very helpful. Anyway, people should be encouraged to write, where possible, code that does not depend on such things. > initialisation /copies/ the literal string to the array. So it > modifies a copy. The declaration of s directly points it to the > literal. Yes. -- Ben. |
| Bart <bc@freeuk.com>: Oct 26 03:11PM +0100 On 26/10/2021 14:29, Ben Bacarisse wrote: > way). >> But * and [] types are modifiable: > And this is bad wording. It's a modification of what RR said. > Some objects with pointer type are modifiable > and some are not. No objects with array types are modifiable. I don't know what you mean by that. Unless it is that you can't directly assign to a whole array object at once; only an element at a time. Or, going the other way, when the array is a member of a struct and you assign to the whole struct. (I know that you can't make a whole array const, only the elements.) >> char* s = "ABC"; > This relies on a conversion that is valid (but unwise) in C and not > permitted in C++. I tried it in C++ before posting (as I'd thought that "ABC" would have type const char*) but it seemed to work. (Using -Wall -std=c++14.) > compiler version. Talking about what "gcc" or "tcc" does is not very > helpful. Anyway, people should be encouraged to write, where possible, > code that does not depend on such things. I'm writing about what is typically observed. (I don't put string literals into a readonly segment because I haven't got round to it yet. It is surprising that a big compiler like MSVC doesn't do so either, but apparently that's only done when optimising; rather odd.) |
| James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 10:31AM -0400 > On Mon, 25 Oct 2021 13:14:30 -0400 > James Kuyper <jameskuyper@alumni.caltech.edu> wrote: >> On 10/25/21 12:14 PM, RacingRabbit@watershipdown.co.uk wrote: ... >> because it's perfectly legal to modify str. On the other hand, both of > Yes, that was my point. [] means modifiable, * means read only in every > C implementation I've ever used. Incorrect. In most declarations, [] means array, and * means pointer. Neither one means "read only". I think you may be thinking of a different fact that has nothing to do with read-only memory. Within the scope of an identifier that identifies an array, that identifier can only ever identify that particular array. An identifier that identifies a pointer to an object type need not point at any actual object, and unless it itself is declared const, can be changed to point at a different object. But that difference between arrays and pointers has nothing to do with read-only memory. The address of a named array is not necessarily stored in any pointer - it is normally hard-coded into the machine language instructions that refer to the array, so the fact that you can't change that address is not because the address is stored in read-only memory. Exception 1: it's not permitted to declare functions that take arrays as arguments, but it is permitted to declare a function parameter as if it were an array. Such a declaration is automatically converted into a declaration of a pointer to the element type of an array. Thus, the following two function declarations are functionally identical, despite being syntactically different: void func(int array[]); void func(int *ptr); Exception 2: in a function parameter declaration, the construct [*] marks the corresponding dimension of the relevant array as having a variably modified type with an unknown length for that dimension.
This feature cannot be used in the defining declaration for a function, because the function definition requires that the variable length be explicitly specified. It is still an array, and not in any sense a pointer (unless the relevant dimension is the top-most one, in which case exception 1 described above also applies). Any attempt to modify the contents of a string literal is undefined. Any attempt to modify an object whose definition is const-qualified is also undefined. Those facts permit, but do not require, that those objects be stored in read-only memory. |
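Exception 1 can be checked with a short sketch. The names below (`sum`, `last_cell`, `demo_array_params`) are made up for illustration. Each function is declared with array syntax and then defined with pointer syntax; the compiler accepts the pair as the same function because an array parameter is adjusted to a pointer to its element type, so `int grid[2][3]` becomes `int (*grid)[3]`.

```c
#include <assert.h>
#include <stddef.h>

/* Declared with array syntax ... */
int sum(int values[], size_t n);
int last_cell(int grid[2][3]);

/* ... and defined with pointer syntax. The definitions are compatible with
   the declarations above because the parameters are adjusted to pointers. */
int sum(int *values, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}

int last_cell(int (*grid)[3])
{
    return grid[1][2];
}

void demo_array_params(void)
{
    int v[] = { 1, 2, 3 };
    int g[2][3] = { { 0, 0, 0 }, { 0, 0, 123 } };

    assert(sum(v, 3) == 6);
    assert(last_cell(g) == 123);
}
```

This is also why a function written as `void func(int a[2][3])` compiles and works when passed an array: what it receives is a pointer to the first row, not a copy of the array.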
| RacingRabbit@watershipdown.co.uk: Oct 26 02:35PM On Tue, 26 Oct 2021 11:14:58 -0000 (UTC) >> fenris$ a.out >> Bus error: 10 >For starters, that's in no way guaranteed to happen. Learn standard C. It is on *nix and that's good enough for me. >Secondly, if you think that a runtime diagnostic is as good as a compile-time >diagnostic, then you still have a LOT to learn about software development. All I'm saying is the bug would exhibit itself pretty quickly. >All of which could have been avoided if you just used 'const' and >turned on compiler warnings, and paid attention to them. I always have warnings on so const not required. |
| RacingRabbit@watershipdown.co.uk: Oct 26 02:36PM On Tue, 26 Oct 2021 12:39:54 +0100 > *s = 'Z'; >This shows ABC the first time it's executed. The second time it shows >ZBC; the code has changed the string literal! Where the same literal is I suggest you actually try running that code and see what happens. |
| RacingRabbit@watershipdown.co.uk: Oct 26 02:42PM On Tue, 26 Oct 2021 10:31:41 -0400 >Neither one means "read only". >I think you may be thinking of a different fact that has nothing to do >with read-only memory. Within the scope of an identifier that identifies No I'm not. The pointer will be pointing to a string literal in the program static text area which is usually non modifiable. >Exception 1: it's not permitted to declare functions that take arrays as >arguments, Since when? fenris$ cat t.c #include <stdio.h> void func(int a[2][3]) { printf("%d\n",a[1][2]); } int main() { int a[2][3]; a[1][2] = 123; func(a); return 0; } fenris$ cc t.c; a.out 123 |
| James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 10:42AM -0400 >> I suggest that you would benefit more here from asking questions than >>from making assertions. > I suggest you ease up on being patronising. You'll get less patronizing responses when you cease displaying such an abysmal understanding of C, while believing you understand it better than others. >> str is placed on the stack if it appears within a function definition, >> or in the static data area if it appears outside a function definition. > Wherever it's placed, the point is it's modifiable unlike *str = which isn't. You're right, but for the wrong reasons. It's true that *str isn't modifiable, but that's not just because of the "*", it's because str has been initialized to point at the first character of a string literal. It could equally easily have been initialized to point at modifiable memory. Nothing about the str itself makes it read-only. |
| James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 10:43AM -0400 On 10/26/21 5:02 AM, RacingRabbit@watershipdown.co.uk wrote: ... > } > int main() > { Try changing the following line: > char *str = "hello"; to char greeting[] = "hello"; char *str = greeting; > puts("Worked"); > return 0; > } You shouldn't get a bus error this time. Do you understand why? |
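The suggested change can be sketched as a self-contained check. `truncate_string` and `demo_array_copy` are hypothetical names standing in for the `func()` and `main()` of the earlier test program. The array `greeting` holds its own copy of the literal's characters, so writing through a pointer to it is well defined.

```c
#include <assert.h>
#include <string.h>

/* Same shape as func() in the thread: writes through its pointer argument. */
void truncate_string(char *str)
{
    str[0] = '\0';
}

void demo_array_copy(void)
{
    char greeting[] = "hello";   /* array initialised with a copy of the literal */
    char *str = greeting;        /* points at the writable array, not the literal */

    assert(sizeof greeting == 6);    /* five characters plus the terminator */
    truncate_string(str);
    assert(strlen(greeting) == 0);   /* first byte is now the terminator */
    assert(greeting[1] == 'e');      /* the rest of the copy is untouched */
}
```

The pointer type is the same `char *` as in the failing program; only what it points at has changed, which is the point being made about the `*` not implying read-only.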
| RacingRabbit@watershipdown.co.uk: Oct 26 02:48PM On Tue, 26 Oct 2021 10:42:24 -0400 >You'll get less patronizing responses when you cease displaying such an >abysmal understanding of C, while believing you understand it better >than others. Says the preening fool. >You're right, but for the wrong reasons. It's true that *str isn't >modifiable, but that's not just because of the "*", it's because str has >been initialized to point at the first character of a string literal. It So you disagree with what I said then say exactly the same thing yourself. Ok, well thanks for that. Helpful. |
| Bart <bc@freeuk.com>: Oct 26 03:55PM +0100 >> This shows ABC the first time it's executed. The second time it shows >> ZBC; the code has changed the string literal! Where the same literal is > I suggest you actually try running that code and see what happens. What makes you think I didn't? I actually listed the 7 compilers I tried it on, just at the point where you must have stopped reading. Oh, you mean you only tried it on one implementation? |
| James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 26 11:03AM -0400 On 10/26/21 1:18 AM, Juha Nieminen wrote: > (Also, if I remember correctly, it caused quite a discussion about > whether compilers should actually be allowed to "do whatever they want" > with such code, or whether they should do as they are told.) That bug serves to illustrate a very important point about undefined behavior. UB only refers to behavior that is not defined by the C standard; the behavior might in fact be defined by some other document. If such a document has authority over every place that your code needs to work, there's absolutely nothing wrong with writing code with undefined behavior that relies upon the definition of that behavior provided by that document. But when you write such code, it's absolutely essential that you know what the relevant definition is. The developers in question were absolutely certain that they knew what the defined behavior was for the platform that they were using: the hardware had an instruction that could be used to load the data from a specified memory location, and nothing problematic would occur if that instruction was passed an address of 0. There were two key mistakes in this thinking: 1. It's not the hardware that defines the behavior, it's the implementation of C. 2. Popular misconceptions to the contrary notwithstanding, C is NOT a portable assembler. C code does not instruct the compiler to generate a particular set of machine code instructions. It only tells the implementation what the desired behavior of the program is. An implementation has no obligation to produce any specific set of machine code instructions to achieve that goal. The only requirement is that the observable behavior of the program (a term defined in 5.1.2.3p2, and that definition is significantly more complicated than "behavior which can be observed") must meet the requirements of the rest of the standard. 
When the behavior is undefined, the standard doesn't impose any requirements. In this particular case, the implementation provided its own definition of the behavior, and it was significantly more complex than merely the single machine instruction that they expected to be generated. Specifically, the defined behavior was to optimize all code between the last time the pointer was updated, until the next time it was updated, on the assumption that the pointer's value was not null. Such an optimization is normally a good idea - if they had taken proper care to prevent the pointer from having a null value, the code that was removed would have been dead code - removing it sped up the program. But they didn't take proper care to make sure the pointer was not null. They dealt with that possibility only after dereferencing it, and as a result, the code that they wrote to deal with that possibility got optimized away. The most ironic aspect of this problem was that the optimization that revealed the bug in their code was not on by default. It was turned on because they had explicitly requested it. Your obligation to know how an implementation defines behavior that is undefined by the standard is even higher when choosing non-default optimizations. |