- std::hexfloat - 24 Updates
- std::hexfloat - 1 Update
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 07:33AM -0400

>>> Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>>>> type punning through casting pointers." It doesn't work because it
>>>> doesn't work, end of story.
...
>>>> x = foo(reinterpret_cast<float*>(&x), &x);
>>>> std::cout << x << "\n"; // Expect 0?
>>>> }
...
>>>> You will find many similar examples in articles on the internet about
>>>> strict aliasing.
...
>> 1
> All that proves is what we already know - optimisers occasionally try to be
> too clever and get it wrong.

Is that it? The code has undefined behavior - the standard imposes no
requirements of any kind on the behavior of your program. As a result,
there's no behavior that the optimizers could produce which would violate
the standard's requirements. Since it doesn't violate those requirements,
on what basis do you describe the behavior as "wrong"? Perhaps it doesn't
meet your personal requirements for the behavior? ISO has the relevant
authority to say what requirements apply to this code, you don't. |
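For reference, the complete example being discussed looks essentially like
this - reconstructed from the fragments quoted above, so treat it as a
sketch of the canonical strict-aliasing demo rather than a verbatim copy
of the original posting:

    #include <iostream>

    // foo's parameters have incompatible pointed-to types, so the
    // compiler may assume they never refer to the same object.
    int foo(float* f, int* i)
    {
        *i = 1;
        *f = 0.0f;   // undefined behaviour when f and i actually alias
        return *i;   // the optimiser may assume *i is still 1
    }

    int main()
    {
        int x = 0;
        x = foo(reinterpret_cast<float*>(&x), &x);
        std::cout << x << "\n"; // Expect 0? Optimising builds print 1.
    }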
blt_v1r412g9x@6qoiu1wf218aali.ac.uk: May 23 11:35AM

On Thu, 23 May 2019 23:32:20 +1200
>> Since when is dereferencing a memory address undefined behaviour? The memory
>> value is zero yet the binary is returning 1. The compiler fucked up.
> Read the bit you snipped.

I did. The compiler fucked up. Not producing assembly with logic that
matches the C++, and then claiming it's undefined behaviour as some kind
of get out of jail free card, is unacceptable. |
David Brown <david.brown@hesbynett.no>: May 23 01:36PM +0200

> at the pointer address you'll see it is zero so the optimiser is actually
> putting the value 1 aside and returning that instead of returning the
> dereferenced memory value. That is incorrect behaviour.

The memory dump is irrelevant. We are talking about C and C++, high
level languages - not assembly. You are wrong, the compiler is right.

>> unreliable, buggy, and inconsistent.
> Oh spare me, I was probably programming assembly when you were still learning
> what PRINT "Hello world" did.

That is unlikely, but not impossible. I started programming in assembly
some 35 years ago. But a pissing competition will not help here - reading
what the C standards actually say might help, and so will looking at the
output from gcc to see what the compiler actually generates.

Do you really think this is a bug in gcc's optimiser? And coincidentally,
clang has the same bug despite a completely different architecture? A bug
that has lasted for decades, in simple code?

As I have said to others, it is fine to be of the opinion that
optimisation based on type-based alias analysis is a bad idea. But it is
sheer obstinate idiocy to claim it doesn't occur in real code with real
compilers, or that it is not allowed by the C and C++ standards.

>> Good programmers make mistakes too, of course, but they try to learn
>> from them - and from information they are given by other people.
> If you say so.

You don't agree? |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 23 12:37PM +0100

On Thu, 23 May 2019 11:03:37 +0000 (UTC)
> to someone else - dump the memory at the address just before the return and
> you'll find that it is zero yet the optimised code incorrectly returns the
> value of 1. The compiler obviously did a look ahead and got it wrong.

You have a bizarrely eccentric view of how compilers work, and how C++
works. What both compilers _probably_ did (they aren't obliged to explain
themselves, only to produce code conforming with the C++ standard) is to
see that the function 'foo' takes two pointer arguments, one float* and
one int*. In reliance on the aliasing rules in the standard, they
therefore presume in compiling 'foo' that the two pointer arguments do not
refer to the same object (here is the programmer's error). In analysing
the function 'foo' they are therefore entitled to reorder the assignments
to the memory regions pointed to, and put the assignment through the int*
last, which gives the return value of 'foo'. On analysing 'main' they see
that the assignment to the memory region pointed to by the float* is never
used, so they elide it completely. So you are just left with an assignment
of 1 to 'x', which the program duly prints.

> Yes, thats what it did and it got it wrong. Its naively assuming the
> value returned should be 1 whereas the correct dereferenced value is zero.
> Its things like this that make using optimisation a trial and error process.

No, it is your pig-headed incompetence which makes it a trial and error
process for you: pig-headed because you have now been told numerous times
why it doesn't work and you continue to argue against all reason;
incompetent because you deliberately ignore the compiler warnings and
think you know better (and deliberately ignore the standard also, it
appears).

Such optimizations are not unusual. C (but not C++) takes it further by
introducing a 'restrict' keyword which informs the compiler that two or
more arguments of the _same_ type do not alias the same object. I presume
you feel entitled to ignore the restrict keyword on C library function
signatures as well, if you code in C. |
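C++ has no standard restrict, but gcc, clang and MSVC all accept
__restrict as an extension with the same meaning. A minimal sketch, with
illustrative names:

    // The compiler may assume dst and src never overlap, so it is free
    // to reorder or vectorise these accesses.
    void scale(float* __restrict dst, const float* __restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }

Calling scale(p, p, n) breaks that promise, with undefined results -
exactly analogous to passing the same address as both float* and int*
in the example above.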
David Brown <david.brown@hesbynett.no>: May 23 01:40PM +0200

>>> No, its optimisers being too clever and getting it wrong.
>> How can the compiler get undefined behaviour wrong?
> Since when is dereferencing a memory address undefined behaviour?

Since you tried to do it in an undefined way.

> The memory
> value is zero yet the binary is returning 1. The compiler fucked up.

No, the programmer got it wrong. Welcome to the world of programming,
where you have to follow the rules of the language you are using. If you
don't follow the rules, you have undefined behaviour. When you have
undefined behaviour, weird things can happen. That includes inconsistent
results, or results that appear to be two different things at the same
time. It also includes demons coming out of your nose. And it includes
people getting in a fluster when they find that some of their assumptions
about programming are completely wrong. |
blt_zr@iim5jolly.ac.uk: May 23 11:44AM

On Thu, 23 May 2019 07:33:45 -0400
> of any kind on the behavior of your program. As a result, there's no
> behavior that the optimizers could produce which would violate the
> standard's requirements. Since it doesn't violate those requirements, on

So the optimisers just make it up as they go along? A sane approach would
be to raise an error and produce no output, but instead it produces
assembly code whose logic differs from the source code. |
blt_kt57lqEv@pn06iw4mz5_tuua48tr.co.uk: May 23 11:47AM

On Thu, 23 May 2019 13:36:54 +0200
>> dereferenced memory value. That is incorrect behaviour.
> The memory dump is irrelevant. We are talking about C and C++, high
> level languages - not assembly. You are wrong, the compiler is right.

When did C become a high level language? Its whole raison d'être is
close-to-the-metal coding, you doughnut. It's mid-level at best.

>> what PRINT "Hello world" did.
> That is unlikely, but not impossible. I started programming in assembly
> some 35 years ago.

About the same time as me then.

> Do you really think this is a bug in gcc's optimiser? And

gcc has had plenty of bugs in the past, what makes you think it's bug
free now?

> You don't agree?

Have a guess. |
David Brown <david.brown@hesbynett.no>: May 23 01:59PM +0200

On 23/05/2019 11:27, Bonita Montero wrote:
>> microcontrollers with paged memory, near and far pointers, and pointers
>> to different types of memory. ...
> Name some of them.

I have already mentioned the AVR, as one of the smallest architectures
supported by gcc. Amongst the attributes (gcc extensions) for variables
that affect pointer types here are "progmem", "io", and "absdata". These
pointers are completely incompatible with each other. It also supports
named address spaces, with "__memx" object pointers being 24-bit while
normal object pointers are 16-bit.

On 8051 processors, you usually have a range of different memory types,
with corresponding compiler extensions and pointer types.

<http://www.keil.com/support/man/docs/c51/c51_le_memtypes.htm>
<http://www.keil.com/support/man/docs/c51/c51_le_ptrs.htm>

That compiler supports "idata" pointers that are 1 byte long, "xdata"
pointers that are 2 bytes, and "generic" pointers that are 3 bytes long.

Such limited 8-bit microcontrollers are becoming less common (as small
ARM devices take over), but as I understand it, revenue in the 8-bit cpu
market is still a good deal higher than in the 32-bit and 64-bit cpu
markets. |
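In Keil's C51 dialect the difference shows up directly in the
declarations - a sketch based on the documentation linked above (C rather
than C++, since C51 is a C compiler):

    char idata *p1;   /* 1-byte pointer into on-chip RAM       */
    char xdata *p2;   /* 2-byte pointer into external RAM      */
    char       *p3;   /* 3-byte "generic" pointer (any space)  */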
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:08AM -0400

On 5/23/19 7:27 AM, blt_4zlek_0g@ivbo5ok_l3.ac.uk wrote:
...
> Since when is dereferencing a memory address undefined behaviour?

Whenever the standard says that it is. In this case, it's undefined
behavior because it violates the aliasing rules (6.10p8). Other cases
where dereferencing a memory address has undefined behavior include
(a few are sketched in code below):

1. The memory address is a null pointer.
2. The memory address used to point at an object whose lifetime has ended.
3. The memory address points at a location outside of the array whose
location was used to calculate the address.
4. A side effect on the specified memory location is unsequenced relative
to another side effect or a value computation on the same object, and they
are not potentially concurrent with each other. Either the side effect or
the value computation might involve dereferencing a memory address.
5. The dereference involves modifying the memory location, and the
location is either a string literal or an object whose definition is const.
6. The memory address was the result of a request to allocate 0 bytes.
7. The dereference was for reading the value of an object, and the object
is uninitialized.
8. The definition of the object the pointer points at is
volatile-qualified, and the pointer itself is not. |
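A few of those cases as minimal C++ sketches (all deliberately broken;
the names are illustrative):

    int* p0 = nullptr;
    // *p0 = 1;            // case 1: dereferencing a null pointer

    int* p1 = new int(5);
    delete p1;
    // int a = *p1;        // case 2: the object's lifetime has ended

    int arr[4];
    int* p2 = arr + 4;     // forming a one-past-the-end pointer is fine,
    // int b = *p2;        // case 3: dereferencing it is not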
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:11AM -0400

On 5/23/19 7:35 AM, blt_v1r412g9x@6qoiu1wf218aali.ac.uk wrote:
...
> I did. The compiler fucked up. Not producing assembly with logic that
> matches the C++

The C++ code has undefined behavior. That means that matching the C++ is
trivial: any behavior the compiler wishes to produce matches the C++.

> ... and then claiming it's undefined behaviour as some kind of get out
> of jail free card, is unacceptable.

Except, of course, when the behavior actually is undefined, as it is in
this case. As far as the compiler is concerned, "undefined behavior" is a
"get out of jail free" card. The standard imposes no requirements on the
behavior, of any kind. If that isn't "free", what is? |
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:18AM -0400

> On Thu, 23 May 2019 07:33:45 -0400
> James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
...
>> behavior that the optimizers could produce which would violate the
>> standard's requirements. Since it doesn't violate those requirements, on
> So the optimisers just make it up as they go along?

No - the compilers do whatever is convenient for them to do which produces
behavior matching the standard's requirements, when it imposes any, while
not caring about what happens in those cases where the standard imposes no
requirements. You're in just as much trouble if they produce results you
didn't intend deliberately as if they simply didn't bother caring about
what happens in cases where no requirements apply. The latter is by far
the more likely case.

> A sane approach would be to raise an error and produce no output, but
> instead it produces assembly code whose logic differs from the source
> code.

The source code has no logic - it has undefined behavior. The only reason
why a+b has the behavior of adding the value of a to the value of b is
that the standard requires that to be the behavior of that construct. The
standard doesn't require any particular behavior for the construct you're
using - it's meaningless to talk about the "logic" described by that
source code. |
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:20AM -0400

On 5/23/19 4:41 AM, blt_naqv25mlmM@xl5ztir5.gov.uk wrote:
...
> Fine, if we're moving away from von Neumann architectures then you can
> obviously have different sized instruction and data pointers. But even then
> a float and an int would have the same sized pointer.

There's no such requirement in the standard, and real-world
implementations have had pointers of different sizes, most notably
implementations where T* was larger if alignof(T) was not a multiple of
the word size. |
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:23AM -0400

On 5/23/19 5:04 AM, David Brown wrote:
...
> In C, pointers to float and int (and any other object type, and void)
> are the same size.

That is not required by the C standard, and a fully conforming
implementation is permitted to violate that expectation. Some real-world
implementations have had sizeof(T*) depend upon whether or not
_Alignof(T) is a multiple of the word size. |
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:29AM -0400

On 5/23/19 5:13 AM, David Brown wrote:
> On 23/05/2019 03:12, James Kuyper wrote:
...
> the conversions are only valid if alignments are valid - this would, I
> think, allow a system such as you describe to have longer pointers to
> char (and void) than pointers to larger objects.

The alignment issue certainly does allow that, but it's not the only
thing that does so. Keep in mind that you can convert any char value to
long double and back again without change of value - conversion doesn't
require that the sizes be the same. The key issue is compatibility - the
standard doesn't say so directly, but what it does say about compatible
types implies that they must basically have the same size and
representation. If types are not specified as being compatible with each
other, they are generally allowed to have different sizes and
representations (unless specified otherwise). Most pairs of pointer types
are incompatible with each other. |
David Brown <david.brown@hesbynett.no>: May 23 02:33PM +0200

>> level languages - not assembly. You are wrong, the compiler is right.
> When did C become a high level language? Its whole raison d'être is
> close-to-the-metal coding, you doughnut. It's mid-level at best.

From day 1 of its inception. It is a high level language that can be used
to avoid the need for low level languages (basically, assembly) in many
situations. The language is defined in terms of the behaviour of an
abstract machine, not in terms of the target hardware (though many details
are left to the implementation, and expected to match the hardware - so
that it can be fast and efficient).

> About the same time as me then.
>> Do you really think this is a bug in gcc's optimiser? And
> gcc has had plenty of bugs in the past, what makes you think it's bug
> free now?

I don't think gcc is bug-free. But this is not a bug - it is a feature.
It was discussed at length before being introduced to gcc about 20 years
ago, there is an option to enable or disable the feature, and it is
documented. It is not a bug. (Warnings about undefined behaviour could
always be improved in the compiler - this is something the gcc developers
are always working on.)

The C language standard says that accessing data via pointers of
incompatible types is undefined behaviour - it does this /precisely/ to
offer more optimisation opportunities to compilers. Many quality
compilers take advantage of this when generating code. For the most part,
you never notice - these optimisations are just one of a huge number of
optimisations that have little influence individually, but combined make a
big difference to code performance.

Some compilers, on the other hand, assume that their users don't
understand the finer details of the language and get this kind of thing
wrong. So they don't do as much optimisation as they could - they make
such undefined behaviour defined. Implementations can do that, of course,
but it has the side-effect of teaching programmers that this is what is
supposed to happen, or how C is supposed to behave - as well as limiting
the opportunities for generating more efficient code and for helping to
debug code.

>> You don't agree?
> Have a guess.

Let's break this down. Which part do you disagree with?

1. "Good programmers make mistakes too".
2. "Good programmers try to learn from their mistakes".
3. "Good programmers try to learn from information given by other people."

Or perhaps you only disagree with the inferences you took from that - that
you are a bad programmer because you refuse to learn from your mistakes
and refuse to learn from what others here have told you? I didn't say as
much, but if you want to read that between the lines, I won't disagree.
(Of course, if this thread has made you think, and you have realised that
you are wrong here, then I will be happy.) |
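For gcc specifically, the option referred to is -fstrict-aliasing, which
is enabled automatically at -O2 and above; -fno-strict-aliasing turns the
optimisation off, and -Wstrict-aliasing asks for (best-effort, imperfect)
warnings. Typical invocations, as a sketch:

    g++ -O2 prog.cpp                        # type-based aliasing assumed
    g++ -O2 -fno-strict-aliasing prog.cpp   # optimisation disabled
    g++ -O2 -Wstrict-aliasing=2 prog.cpp    # more aggressive warnings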
James Kuyper <jameskuyper@alumni.caltech.edu>: May 23 08:35AM -0400

> On Thu, 23 May 2019 13:36:54 +0200
> David Brown <david.brown@hesbynett.no> wrote:
...
>> The memory dump is irrelevant. We are talking about C and C++, high
>> level languages - not assembly. You are wrong, the compiler is right.
> When did C become a high level language?

At its conception. It's one of the lowest-level high level languages, but
it has always been a high level language. An implementation is free to
translate C code into any sequence of machine code it likes, so long as
the observable behavior of the program matches the requirements of the
standard. That wouldn't be the case if C were a low level language, and
it's precisely that feature that you're objecting to. The C standard
imposes no requirements on the code you're talking about, so producing
observable behavior that's consistent with those requirements is trivial.
...
>> Do you really think this is a bug in gcc's optimiser? And
> gcc has had plenty of bugs in the past, what makes you think it's bug
> free now?

He's not saying it's bug-free, he's saying that this can't be a bug. The
behavior is undefined, so nothing that the compiler does with the code can
count as erroneous. |
David Brown <david.brown@hesbynett.no>: May 23 02:41PM +0200

> So the optimisers just make it up as they go along? A sane approach would
> be to raise an error and produce no output, but instead it produces
> assembly code whose logic differs from the source code.

The assembly code produced has the same logic as the source code - /any/
assembly code would be equally valid for source code with undefined
behaviour.

I think we can all agree that the best situation is when the compiler can
warn you about the undefined behaviour. This is, unfortunately,
surprisingly difficult. In simple test cases like this, it may be obvious
to the human reader that there is a problem. But warnings on obvious bad
code are not particularly helpful - the useful thing is warnings on
non-obvious bad code. And generating warnings that catch something like
this, without also reporting false positives, is extremely difficult.
(There is nothing wrong with the "foo" function itself, for example. And
an explicit cast like this is often viewed by tools as an indication that
the programmer knows they are doing something dangerous, but knows what
they are doing.)

Feel free to report the code to gcc as a "missing warning" issue. |
scott@slp53.sl.home (Scott Lurndal): May 23 01:05PM

> Fine, if we're moving away from von Neumann architectures then you can
> obviously have different sized instruction and data pointers. But even then
> a float and an int would have the same sized pointer.

Who said anything about a non-von Neumann architecture? The Burroughs
system in question is certainly von Neumann. Here's the description of
the function call (Virtual Enter) and return instructions:

http://vseries.lurndal.org/doku.php?id=instructions:ven
http://vseries.lurndal.org/doku.php?id=instructions:ret |
scott@slp53.sl.home (Scott Lurndal): May 23 01:14PM

>> what PRINT "Hello world" did.
> That is unlikely, but not impossible. I started programming in assembly
> some 35 years ago.

43 years for me (PAL-D on a PDP-8 running the TSS/8.24 timesharing
operating system). |
Bart <bc@freeuk.com>: May 23 02:16PM +0100

On 23/05/2019 13:33, David Brown wrote:
> From day 1 of its inception. It is a high level language that can be
> used to avoid the need for low level languages (basically, assembly) in
> many situations.

I agree with blt_kt57lqEv. I feel there is a need for a language which is
two or three steps up from ASM (one step up would be a variety of HLA).
You don't want a language to get above itself and make out that it's
higher level than it really is.

(That's C. I don't actually know whether C++ is that much higher level or
not.)

BTW here's an example of mixed pointer access with my scripting language,
which is probably higher level than either C or C++. Raw pointer access
is rare, but it is possible:

    x := 1.0
    px := makeref(&x, real64)
    pa := makeref(&x, int64)

So px is a double* pointer, pa is an int64* pointer, both pointing to the
same 64-bit memory location. This is what C/C++ says is undefined
behaviour if you try and access that memory? Let's try it:

    println px^         # (^ means deref) read as float
    println pa^:"h"     # read as int and display as hex

Output is:

    1.000000
    3ff0000000000000

This is exactly what you might expect. Why isn't this UB in this
language? Why does it not cause demons to fly out of people's noses?

UB appears to be just an invention of C, and adopted by C++, to make
programming harder, and to make people who understand it all think
they're smarter. |
David Brown <david.brown@hesbynett.no>: May 23 04:13PM +0200

On 23/05/2019 15:16, Bart wrote:
>> used to avoid the need for low level languages (basically, assembly) in
>> many situations.
> I agree with blt_kt57lqEv.

Well, I don't think there is any fixed definition of what is a "high level
language" and what is a "low level language", so any discussion of these
terms is going to be subjective.

There are some points on which there is no doubt, however, given that they
are recorded in the history of the C language and/or the standards:

C has never been a kind of "high level assembler", but was designed to
greatly reduce the need to write code in assembly. (This is the intention
- despite this, a certain proportion of C programmers believe it /is/ a
"high level assembler".)

C is defined in terms of an abstract machine, not any physical machine,
and explicitly makes a number of points "undefined behaviour" even though
a behaviour could reasonably be defined for most or all real
implementations.

C has many implementation-dependent features that are intended to match
closely to hardware, so that efficient code can be generated by compilers.

C supports portable coding using standardised features, and it also
supports non-portable coding using implementation-specific or
target-specific features and extensions. This is intentional in the
design of the language, and both kinds of coding are perfectly valid.

> from ASM (one step up would be a variety of HLA). You don't want a
> language to get above itself and make out that it's higher level than it
> really is.

That may be the case, though I have not felt the need of such a language
myself. C is low enough level for most of my use. (There are always a
few things that can be done in assembly but not in C.) I am reliant on
being able to do non-portable implementation-specific programming in C,
however - portable standards-only C coding would not be sufficient for my
use.

You might like to view "gcc C with extensions targeting specifically the
ARM Cortex-M4" as an example of a language that lies between assembly and
"standards-compliant portable C". If you do, then I appreciate your
point. (Obviously for different targets, or different compilers, there
are different variants of C - this is just the particular variant I am
using most at the moment.)

> (That's C. I don't actually know whether C++ is that much higher level
> or not.)

That depends on how you define "higher level", I would say. C++ lets you
do anything you can do in C (and "gcc C++ with extensions targeting a
particular cpu" lets you do anything you can do in the equivalent C
variant). But C++ also supports many higher level language features and
programming styles. It is intended to be a language that supports
programming in a wide variety of levels, styles and paradigms.

> 3ff0000000000000
> This is exactly what you might expect. Why isn't this UB in this
> language? Why does it not cause demons to fly out of people's noses?

Turn the question on its head. Why would one want to do this in the first
place? Floating point formats are completely different from integer
formats - it very rarely makes sense to mix the two. So the language
assumes by default that you do /not/ mix the two. For the unusual cases,
it provides methods that /do/ allow cross-type access (primarily using
unions and char* access, especially memcpy). Compilers will invariably
also support using "volatile" accesses here, though I am not sure the
standard actually requires it until C17.
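The memcpy route for the usual "inspect a float's bits" case looks like
this - a sketch, assuming the common 64-bit double:

    #include <cstring>
    #include <cstdint>

    std::uint64_t bits_of(double d)
    {
        static_assert(sizeof(std::uint64_t) == sizeof(double),
                      "assumes a 64-bit double");
        std::uint64_t u;
        std::memcpy(&u, &d, sizeof u);  // well-defined cross-type access;
        return u;                       // compilers optimise the copy away
    }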
For efficient code generation, it is very useful for the compiler to know
when pointers cannot alias. The type-based rules are just part of this
system. Other rules include those on pointer arithmetic and bounds, those
covering "const", and the "restrict" modifier. Any language will have
some control over what pointers can alias.

Consider code like this:

    void inc_array(int * p, int n)
    {
        int i;
        for (i = 0; i < n; i++) {
            p[i]++;
        }
    }

A reasonable implementation for that would be:

    inc_array(int*, int):
            test    esi, esi
            jle     .L1
            lea     eax, [rsi-1]
            lea     rax, [rdi+4+rax*4]
    .L3:
            add     DWORD PTR [rdi], 1
            add     rdi, 4
            cmp     rdi, rax
            jne     .L3
    .L1:
            ret

A compiler might also choose to hold "n" in a register and decrement it.
You would not expect the compiler to generate code that puts "i" and "n"
on the stack and re-loads them from memory at every loop iteration, just
in case "p" happened to point to the memory location of these two objects
on the stack. You expect that they can go in registers and will not be
aliased by p. But very often, the compiler doesn't have as good
information as it would like regarding aliasing. It needs all the help it
can get, including from types.

> UB appears to be just an invention of C, and adopted by C++, to make
> programming harder, and to make people who understand it all think
> they're smarter.

No, it is a way of making it clear that some things that can be expressed
syntactically in a language have no sensible meaning - and of letting the
compiler use that to generate more efficient code and aid the developer in
finding bugs. |
David Brown <david.brown@hesbynett.no>: May 23 04:19PM +0200

On 23/05/2019 15:14, Scott Lurndal wrote:
>> That is unlikely, but not impossible. I started programming in assembly
>> some 35 years ago.
> 43 years for me (PAL-D on a PDP-8 running the TSS/8.24 timesharing
> operating system).

Unfortunately for me, they did not have a PDP-8 at my kindergarten, as I
would have been 3 at that time! |
David Brown <david.brown@hesbynett.no>: May 23 04:23PM +0200

On 23/05/2019 14:23, James Kuyper wrote:
> implementation is permitted to violate that expectation. Some real-world
> implementations have had sizeof(T*) depend upon whether or not
> _Alignof(T) is a multiple of the word size.

Yes, I have just recently realised that, as a result of other posts in
this thread. The conversion rules of 6.3.2.3 say that you have to be able
to convert a pointer to one type of object to a pointer to a different
type and back again, getting the original value unchanged. I had taken
that to imply that the sizes have to match. But there is the restriction
that the conversion is only valid if the alignments are valid, and that
allows you to have different sizes for the pointers. (Types with greater
alignment requirements can have smaller pointers.) |
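In code, the round trip that rule guarantees looks like this - a C++
sketch of the same idea (6.3.2.3 itself is C wording):

    #include <cassert>

    void round_trip()
    {
        double d;
        // Converting &d to char* and back must yield the original
        // pointer; char has alignment 1, so the alignment condition
        // in the rule is always satisfied here.
        char* c = reinterpret_cast<char*>(&d);
        double* p = reinterpret_cast<double*>(c);
        assert(p == &d);
    }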
David Brown <david.brown@hesbynett.no>: May 23 04:30PM +0200

On 23/05/2019 14:29, James Kuyper wrote:
> thing that does so. Keep in mind that you can convert any char value to
> long double and back again without change of value - conversion doesn't
> require that the sizes be the same.

The conversion requirement goes both ways, however. You can't convert a
long double to a char and back again without a change of value (unless
you had a /really/ odd system!).

Thinking a little more, however, even two-way conversion requirements are
not enough to force the sizes to be the same - it would be possible for
pointers to different types with the same alignment requirements to have
different representations or encodings, with different sizes, as long as
they each supported the same range of valid values. |
ram@zedat.fu-berlin.de (Stefan Ram): May 23 01:27PM

> Whenever the standard says that it is. In this case, it's undefined
> behavior because it violates the aliasing rules (6.10p8).

6.10p8? It might have been 3.10p8, but now (in n4800) it is 7.2.1p11
(if I did correctly guess what you intended to refer to). |