- std::hexfloat - 24 Updates
- Now what? - 1 Update
Paavo Helde <myfirstname@osa.pri.ee>: May 20 07:19AM +0300 On 20.05.2019 0:21, Alf P. Steinbach wrote: > general type punning except by way of `memcpy`. > I disagree with that interpretation: it's totally impractical, so IMO it > can't be the /intent/. And what makes this so impractical? IMO, a 'memcpy' is a great way to tell the reader that one is doing something hackish. Also, memcpy is recognized specially by compilers and can be optimized away where appropriate. You are against 'memcpy' only because you are *used to* doing the type punning some other way and are now reluctant to change your habits. From https://en.cppreference.com/w/cpp/language/reinterpret_cast : "The purpose of strict aliasing and related rules is to enable type-based alias analysis, which would be decimated if a program can validly create a situation where two pointers to unrelated types (e.g., an int* and a float*) could simultaneously exist and both can be used to load or store the same memory." In C++20 we will have std::bit_cast() which will make the intent even clearer than 'memcpy'. |
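The memcpy approach described above can be sketched as follows. This is a minimal illustration, assuming a 32-bit IEEE-754 `float`; the helper names `float_bits` and `bits_to_float` are hypothetical and not from the thread.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch: float is a 32-bit IEEE-754 type.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE-754");

// Hypothetical helper: return the object representation of a float.
// memcpy copies bytes, so no object is accessed through a pointer
// of an unrelated type.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical inverse helper: rebuild a float from its 32 bits.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With a C++20 compiler, `std::bit_cast<std::uint32_t>(f)` from `<bit>` expresses the same conversion in a single call.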
David Brown <david.brown@hesbynett.no>: May 20 08:15AM +0200 On 19/05/2019 23:21, Alf P. Steinbach wrote: >> types as you did does not work on anything but the most limited of >> compilers, and usually only with optimisations disabled.) > C supports type punning via unions. This is actually not allowed in C90, IIRC. > C++ does not. You are right, I should have thought of that. Using unions here is not strictly portable, but works on all practical compilers. (Let me know if I am wrong here!) The code could never be fully portable anyway. > C++ does not, in the strictest interpretation of the formal, support > general type punning except by way of `memcpy`. Or other char* access. > I disagree with that interpretation: it's totally impractical, so IMO it > can't be the /intent/. It is very rare that you need to mess with types like this. memcpy would have worked fine. Modern compilers would optimise memcpy away entirely in a situation like this. (But it can be a pain for older or weaker compilers.) > impractically rigid literal interpretations of the formal rules, so I > would absolutely not be surprised if they assume the aforementioned one > too. Nonsense - that's just an excuse people use when their incorrect code fails on optimising compilers. And in this case it is particularly wrong - gcc makes it clear that "type punning via unions" is a common technique and supported by the compiler (even without "-fno-strict-aliasing"). <https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html> <https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning> > I don't know of any way to tell the C++ compiler that look, these > two pointers are of different types but access the same bytes in memory. There is no standard way that I know about. (There are non-standard ways, such as gcc's "may_alias" attribute and its "-fno-strict-aliasing" flag.) 
> dereferencing that 2nd pointer and accessing the pointee, will never be > executed. Then the compiler can optimize it away. That's what g++ does > in a number of cases, so perhaps also in this one, if one's unlucky. That is correct regarding UB. But if the behaviour is defined in the documentation, it is not undefined. clang follows gcc in such cases, but I can't be entirely sure that other C++ compilers say they allow union-based type punning. I /believe/ they do, but there may be exceptions. |
blt_8G1dvs@eifgienqb.org: May 20 09:19AM On Sun, 19 May 2019 21:10:44 +0200 >> virtual function table would be fun to implement for a start) and the STL and >> 2011+ would be next to impossible. >Well, it turns out that C++ /is/ used regularly on such systems. People I've worked with PICs. I've yet to see anyone use C++ on them. >(except perhaps std::array) will be limited. But there is a lot of C++ >used nonetheless. (A great many C++11 features are zero cost at >run-time, at least with a reasonable compiler.) And a lot aren't. >> You could, but my example was 1 line which was the point. >One /incorrect/ line is not very useful. It is better to have a few >lines that work than a single line that does not. Just because you don't like it doesn't make it incorrect. >(And that's not just theory - faffing around with casting pointer types >as you did does not work on anything but the most limited of compilers, >and usually only with optimisations disabled.) Oh rubbish. It's worked on every compiler I've ever used. |
Bonita Montero <Bonita.Montero@gmail.com>: May 20 11:20AM +0200 >> useless manipulator. What on earth is the point of it? > You can serialize floating-point values loss-free in ASCII files > with it. I have to correct myself: floating-point values are de facto always encoded with base 2, and every base-2 value, even with fractions, is representable in base 10 because both share the prime factor 2 of the base. So the advantage is simply that the value might be encoded shorter than in base-10. |
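Both serialisation routes mentioned here can be tried side by side. The following is a rough sketch, assuming an IEEE-754 `double` and a correctly rounding `std::strtod`; the helper names `to_hex_text`, `to_decimal_text` and `from_text` are hypothetical.

```cpp
#include <cstdlib>
#include <iomanip>
#include <ios>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: hexadecimal floating-point text, e.g. "0x1p-1" for 0.5.
std::string to_hex_text(double value) {
    std::ostringstream out;
    out << std::hexfloat << value;
    return out.str();
}

// Hypothetical helper: decimal text with max_digits10 significant digits,
// which is enough for a double to survive the trip through text.
std::string to_decimal_text(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}

// std::strtod parses both decimal and hexadecimal floating-point input.
double from_text(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

Reading is done with `std::strtod` in this sketch because it is specified to accept hexadecimal floating-point input since C99/C++11.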
blt_f429k@5r_jdp24fz5.gov.uk: May 20 09:23AM On Mon, 20 May 2019 07:19:41 +0300 >validly create a situation where two pointers to unrelated types (e.g., >an int* and a float*) could simultaneously exist and both can be used to >load or store the same memory." Do you ever get the feeling that a lot of C++ people are afraid of pointers and direct memory access? One wonders why they don't just use Java. >In C++20 we will have std::bit_cast() which will make the intent even >clearer than 'memcpy'. Doesn't sound any clearer to me. |
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 12:29PM +0100 > On 19/05/2019 23:21, Alf P. Steinbach wrote: <cut> >> C supports type punning via unions. > This is actually not allowed in C90, IIRC. I don't think that's true. Can you say more? >> C++ does not. > You are right, I should have thought of that. Can someone point at the bit (or bits) in the standard that make the difference here? I find the C++ standard just big enough that I'm never sure I've seen all the relevant parts to answer any particular question! <cut> >> I don't know of any way to tell the C++ compiler that look, these two >> pointers are of different types but access the same bytes in memory. > There is no standard way that I know about. That made me sit up! I think some words have got lost. I think you and Alf are talking about pointers /declared/ to have different "target" types, though I am not 100% sure. Given union { char b[sizeof (float)]; float f; } u; u.b and &u.f are pointers of different type that access the same bytes in memory. It would be very odd if this were not allowed. Unions /are/ the standard way to tell the compiler that different lvalue expressions will access the same memory. <cut> -- Ben. |
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 12:41PM +0100 > the base. > So the advantage is simply that the value might be encoded shorter > than in base-10. Small nit: they all use a power of 2 as the base (old IBM FP used 16). The encoding is a bit of a red-herring. If some system used a power of 3, even if encoded with base 2, the values would not be finitely representable in base 10 (without using tricks like "0.3..."). -- Ben. |
Bo Persson <bo@bo-persson.se>: May 20 01:44PM +0200 On 2019-05-20 at 13:29, Ben Bacarisse wrote: > the standard way to tell the compiler that different lvalue expressions > will access the same memory. > <cut> You are allowed to access the different members, but not at the same time. C++ specifies that an "active member" is the one that was most recently written to. That is then the only member that can be read. https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior A complication is that the major compilers chose to implement the C rules as an extension (for C compatibility?). So it might work anyway... Bo Persson |
David Brown <david.brown@hesbynett.no>: May 20 02:00PM +0200 >>> 2011+ would be next to impossible. >> Well, it turns out that C++ /is/ used regularly on such systems. People > I've worked with PICs. I've yet to see anyone use C++ on them. PICs (the traditional PICs - not things like the PIC32) are at the bottom end of 8-bit devices. They can barely be used with normal C. People use C++ on the AVRs, which are 8-bit, and have good gcc support. (But no exceptions or RTTI, and much of the C++ standard library is missing.) Vendors like IAR have C++ for a range of different 8-bit microcontrollers. 16-bit processors are quite rare these days, but devices like the msp430 can be very useful. And they can be programmed in C++, again using gcc or IAR. I am not by any means saying that C++ is the most common choice of language for 8-bit and 16-bit devices - nor am I saying that 8-bit and 16-bit are common targets for C++. I am merely saying that quite a lot of people successfully and regularly use C++ on such devices. >> used nonetheless. (A great many C++11 features are zero cost at >> run-time, at least with a reasonable compiler.) > And a lot aren't. Looking briefly through a C++11 feature list (from the Wikipedia article - it's a reasonable summary), I can see that /all/ the language features are zero cost. Lambdas, auto, constexpr, range-based for, template aliases, literals, static assertions, etc. - all zero cost. Threading may have indirect costs (like thread-safe static initialisation). Some standard library changes may have a cost, but many of these actually improve efficiency. Can you give examples of C++11 features that are costly for small systems, compared to how they might have been written in C or pre-C++11? Noting, as I said, that things like the container libraries are often not implemented on small C++ targets - and are rarely used even if they /are/ implemented. >> One /incorrect/ line is not very useful.
It is better to have a few >> lines that work than a single line that does not. > Just because you don't like it doesn't make it incorrect. It is incorrect because it is incorrect - whether you or I like it or not. In C and C++, you can't use a pointer to one type to access data of a different type (outside certain exceptions) - that applies even if you cast the pointer types. In the C standards, this is in 6.5p7. I don't know where it is expressed in the C++ standards - perhaps someone more familiar with them can say. (And I'm sure someone will say if I'm wrong.) >> as you did does not work on anything but the most limited of compilers, >> and usually only with optimisations disabled.) > Oh rubbish. It's worked on every compiler I've ever used. Then you haven't used many compilers - certainly not two of the three most used C++ compilers (gcc and clang). The details of what they do with type alias violations depend on the exact code and the option flags - if it is just as efficient to give you the code you apparently expect, then they usually will. But try this code - it's simple code so you can use <https://godbolt.org> to test it with different compilers and options:

    int foo(void) {
        float f = 31.234;
        *(int*) &f = 123;
        return f;
    }

With -O2, gcc and clang both give:

    xor eax, eax
    ret

clang doesn't even give you a warning, even with -Wall -Wextra enabled. It is quite simple - do not cast the types of pointers like this. The result will, at best, be entirely dependent on the compiler and flags. At worst, it will work "fine" until you make other changes to other parts of the code, and knock-on effects cause the compilation here to change. |
David Brown <david.brown@hesbynett.no>: May 20 03:07PM +0200 On 20/05/2019 13:29, Ben Bacarisse wrote: >>> C supports type punning via unions. >> This is actually not allowed in C90, IIRC. > I don't think that's true. Can you say more? I didn't have a C90 reference handy, but I've looked it up now. In 3.3.2.3, accessing a member of a union when a value has been stored in a different member is implementation defined. (In C99 and C11, in 6.5.2.3, there is a footnote saying exactly how "type punning" behaves.) It would have been better to say that type punning via unions is not clearly allowed by the C90 standard, than to say the standard disallows it (since it is implementation defined, not undefined). > Can someone point at the bit (or bits) in the standard that make the > difference here? I find the C++ standard just big enough that I'm never > sure I've seen all the relevant parts to answer any particular question! I am in the same position regarding C++ - and it doesn't help that the standard changes every three years. My understanding is that C++ followed C90 here, and did not update with the footnote and clarification from C99. I would also be happy to see a clear reference from the standards. (en.cppreference.com is clear on the matter, and they are usually right. Look under "Explanation" in <https://en.cppreference.com/w/cpp/language/union> ) > in memory. It would be very odd if this were not allowed. Unions /are/ > the standard way to tell the compiler that different lvalue expressions > will access the same memory. Yes, I think we are talking about slightly different things. unions tell the compiler that you have different objects at the same address, and thus a piece of memory can be accessed by different lvalue expressions. They don't necessarily say what happens when you access something that was written as a different field in the union (I can't find a description of that anywhere in the C++14 document). 
But I don't think you can take a pointer-to-float, and a pointer-to-int, and tell the compiler that they point to the same object in any standard way. |
Paavo Helde <myfirstname@osa.pri.ee>: May 20 04:52PM +0300 On 20.05.2019 15:00, David Brown wrote: > xor eax, eax > ret > clang doesn't even give you a warning, even with -Wall -Wextra enabled. This example does not demonstrate what you think it does. If it returned 31 then you could say the compilers have "unexpectedly" optimized away the invalid "*(int*) &f = 123;" line. However, returning 0 is perfectly expected because writing int 123 over a float produces a float value 1.724e-43#DEN which of course gets converted to 0 in the output. So the compiler has dutifully carried out the type punning as written, even if it was not obliged to. What's there to complain about? |
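The arithmetic in this reply can be checked without the aliasing violation. A minimal sketch, assuming a 32-bit IEEE-754 `float` and default floating-point settings (no flush-to-zero); `float_with_bits` and `foo_defined` are hypothetical names chosen for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: build a float whose object representation is `bits`.
float float_with_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Same effect as the quoted foo(), but with defined behaviour:
// the bit pattern of the integer 123 is a tiny subnormal float
// (123 * 2^-149, roughly 1.724e-43).
int foo_defined() {
    float f = float_with_bits(123u);
    return static_cast<int>(f);  // float-to-int conversion truncates toward zero
}
```

Under these assumptions a result of 0 is what a faithful type pun produces, so the `xor eax, eax` output by itself does not show that the store was optimised away.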
blt_i0d@ij4x0pfb0g9yrlo3p9w.com: May 20 03:00PM On Mon, 20 May 2019 13:44:18 +0200 >A complication is that the major compilers chose to implement the C >rules as an extension (for C compatibility?). So it might work anyway... Given how many C++ programs #include C code, any C++ compiler that didn't follow the rules of C - as long as they didn't conflict with those of C++ - would soon be supplanted by something else. |
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 04:37PM +0100 > You are allowed to access the different members, but not at the same > time. C++ specifies that an "active member" is the one that was most > recently written to. That is then the only member that can be read. Sure, but isn't writing to a member considered to be "access"? It is in C parlance. > A complication is that the major compilers chose to implement the C > rules as an extension (for C compatibility?). So it might work > anyway... I'm not talking about type punning. It was the blanket ban on access using pointers of different types that made me take note. -- Ben. |
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 04:47PM +0100 > different member is implementation defined. (In C99 and C11, in > 6.5.2.3, there is a footnote saying exactly how "type punning" > behaves.) I consider that to be allowed. There's no way that C can specify the result, so this is as "defined" a construct as it can be. > It would have been better to say that type punning via unions is not > clearly allowed by the C90 standard, than to say the standard disallows > it (since it is implementation defined, not undefined). C does not use the terms allowed and disallowed. Did you mean it would have been better to make it undefined? If so, I disagree (but that's hardly important). > (en.cppreference.com is clear on the matter, and they are usually right. > Look under "Explanation" in > <https://en.cppreference.com/w/cpp/language/union> ) That says it's undefined, but the C++ standard, unlike the C one, does not have an annex listing all UB constructs with normative references. Oh well... I don't doubt anyone's word about this, I'd just like to see the wording. <cut> -- Ben. |
"Öö Tiib" <ootiib@hot.ee>: May 20 09:40AM -0700 On Monday, 20 May 2019 18:37:29 UTC+3, Ben Bacarisse wrote: > > recently written to. That is then the only member that can be read. > Sure, but isn't writing to a member considered to be "access"? It is in C > parlance. In C++ the [class.union] is relatively short IIRC. Union has at most one /active/ non-static data member at any one time. Most generally we must use placement new expressions (and if there was previous active member then explicit destructor calls) to change the active members. However if the previous active member was of standard layout then there is no need to call (pseudo)destructor to it and when newly active member is of standard layout then we can simply assign to it instead of that placement new. Reading from not active member is undefined ... besides that standard layout members that have common initial sequence can be used to inspect each other within limits of that common initial sequence. For common usage the std::variant is a lot more convenient to use than union (but takes more memory) so union can be perhaps used as performance optimization or for compatibility with other programming languages. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 20 05:49PM +0100 On Mon, 20 May 2019 14:00:16 +0200 > On 20/05/2019 11:19, blt_8G1dvs@eifgienqb.org wrote: [snip] > don't know where it is expressed in the C++ standards - perhaps someone > more familiar with them can say. (And I'm sure someone will say if I'm > wrong.) It is §3.10/10 of C++14, and §6.10/8 of C++17. They are modelled directly on the C equivalent. The number of people who don't trouble themselves to understand the strict aliasing rules of C and C++ is surprising. The standard- conforming way of dealing with aliasing is to use memcpy() instead of a cast. As memcpy() is an intrinsic on VS and a built-in in gcc/clang, it will be optimized out where a cast would (but for strict aliasing) work, and will still work where casting wouldn't (such as when casting would result in misalignment). An alternative if using gcc or clang is to type-pun through a union (the code emitted will be identical to using memcpy()). As a worst option you can use the -fno-strict-aliasing gcc/clang extension and have all aliasing optimizations switched off. That still wouldn't save you when alignment is an issue on the platform in question though. |
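The alignment point can be illustrated with a small sketch. The helper names `load_u32` and `store_u32` are hypothetical; the idea is only that memcpy places no alignment requirement on the byte address, whereas dereferencing a cast `std::uint32_t*` would.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value starting at any byte address.
// The bytes are interpreted in the host's native byte order.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Hypothetical helper: write a 32-bit value starting at any byte address.
void store_u32(unsigned char* p, std::uint32_t value) {
    std::memcpy(p, &value, sizeof value);
}
```

For example, `store_u32(buf + 1, x)` followed by `load_u32(buf + 1)` returns `x` even though `buf + 1` is generally not suitably aligned for a `std::uint32_t`.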
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 20 06:08PM +0100 On Mon, 20 May 2019 09:23:28 +0000 (UTC) > Do you ever get the feeling that a lot of C++ people are afraid of pointers > and direct memory access? One wonders why they don't just use Java. That's the wrong way around. What is a lot more frightening is incompetent programmers writing code which depends on pointer casts without realising that their code (a) gives undefined behaviour, and (ii) is total crap, and might break on a compiler upgrade or on a change of optimization level. You said elsewhere that "It's worked on every compiler I've ever used" but it absolutely doesn't work on recent versions of gcc and clang without the (non-standard) -fno-strict-aliasing switch, if you have optimizations switched on. |
scott@slp53.sl.home (Scott Lurndal): May 20 05:24PM >without realising that their code (a) gives undefined behaviour, and >(ii) is total crap, and might break on a compiler upgrade or on a >change of optimization level. Perhaps if one didn't hire incompetent programmers, one would need not worry about programmers writing "total crap"? |
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 06:26PM +0100 Öö Tiib <ootiib@hot.ee> writes: <I've cut attributions because some seem to have got lost> >> >>> This is actually not allowed in C90, IIRC. >> >> I don't think that's true. Can you say more? >> >>>> C++ does not. I've cut the quoted text because your reply appears to be about the above, not what I wrote (though I may have misunderstood). <cut> > (pseudo)destructor to it and when newly active member is of > standard layout then we can simply assign to it instead of > that placement new. Thanks, yes, I read that part, but I could not find where a plain write sets the active member. It may be wrapped up in other more general text about assignment, placement new, or some such. > Reading from not active member > is undefined ... Do you know where this is stated? -- Ben. |
blt_xfnfdm@x1c.ac.uk: May 20 05:46PM On Mon, 20 May 2019 18:08:19 +0100 >without realising that their code (a) gives undefined behaviour, and >(ii) is total crap, and might break on a compiler upgrade or on a >change of optimization level. Perhaps use Java. Seriously. >but it absolutely doesn't work on recent versions of gcc and clang >without the (non-standard) -fno-strict-aliasing switch, if you have >optimizations switched on.

fenris$ cc -v
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
fenris$ cat t.c
#include <stdio.h>

int main()
{
    float f = 1.234;
    int i = *(int *)&f;
    printf("%f\n",*(float *)&i);
    return 0;
}
fenris$ cc t.c -O3
fenris$ ./a.out
1.234000

baldur$ cc --version
gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
baldur$ cc -O3 t.c
baldur$ ./a.out
1.234000

Sorry, what was that you were saying? |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 20 07:04PM +0100 On Mon, 20 May 2019 17:46:56 +0000 (UTC) > baldur$ ./a.out > 1.234000 > Sorry, what was that you were saying? Sorry, what exactly do you think you were proving? Your crap code with undefined behaviour looks as if it is too inconsequential for g++ to optimize against it. gcc/g++ will however warn that your code is not fit for purpose - it tells you that it breaks strict aliasing rules. |
Tim Rentsch <tr.17687@z991.linuxsc.com>: May 20 11:20AM -0700 > stored in a different member is implementation defined. (In C99 > and C11, in 6.5.2.3, there is a footnote saying exactly how "type > punning" behaves.) Despite that, the semantics for union member access is exactly the same in C90 as it is in C99. Discussion notes in some of the documents in www.open-std.org (sorry I don't have any more specific reference) make this clear. The wording in C90 saying "implementation-defined" is there because what value results depends on the representations of the types involved. Note that the footnote uses the term "type punning" (without quotes), much like the footnote in N1256. The type punning semantics is what was (implicitly?) assumed in K&R C. If this rule had been changed in C90, or changed between C90 and C99, such a change surely would have been mentioned in the Rationale documents. AFAICT there isn't any. |
Tim Rentsch <tr.17687@z991.linuxsc.com>: May 20 12:06PM -0700 > the difference here? I find the C++ standard just big enough that > I'm never sure I've seen all the relevant parts to answer any > particular question! In N4659: 12.3 p1 says in part: In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended (6.8). For a union member or subobject thereof, 6.8 p1 says in part: [I]ts lifetime only begins if that union member is the initialized member in the union (11.6.1, 15.6.2), or as described in 12.3. 12.3 p5 says in part: In an assignment expression of the form E1 = E2 that uses either the built-in assignment operator (8.18) or a trivial assignment operator (15.8), for each element X of S(E1), if modification of X would have undefined behavior under 6.8, an object of the type of X is implicitly created in the nominated storage; no initialization is performed and the beginning of its lifetime is sequenced after the value computation of the left and right operands and before the assignment. [ Note: This ends the lifetime of the previously-active member of the union, if any (6.8). --end note ] To see the note is right, we return to section 6.8 p1, which says in part: The lifetime of an object o of type T ends when: [...] (1.4) -- the storage which the object occupies is released, or is reused by an object that is not nested within o (4.5) Note also 6.8 p5, which says in part: A program may end the lifetime of any object by reusing the storage which the object occupies [...] So assigning to a union member, which reuses the storage of any other (non-static) members in the same union, ends their lifetimes, which consequently makes them not be active. Does this mean trying to read them is undefined behavior? Yes, it does, under the general rule that objects may not be accessed after their lifetimes are ended. These cases are spelled out in excruciating detail in section 6.8, paragraphs 4, 6, and 7. 
Section 6.8 p4 says in part: The properties ascribed to objects and references throughout this International Standard apply for a given object or reference only during its lifetime. [ Note: In particular, [...] there are significant restrictions on the use of the object, as described below [...] --end note ] Section 6.8 p6 says in part: [A]fter the lifetime of an object has ended [...] any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. [...] The program has undefined behavior if: [...] (6.2) -- the pointer is used to access a non-static data member or call a non-static member function of the object, [...] Section 6.8 p7 says in part: Similarly, [...] after the lifetime of an object has ended [...], any glvalue that refers to the original object may be used but only in limited ways. [...] The program has undefined behavior if: (7.1) -- the glvalue is used to access the object, [...] Taken together I think these passages make the case pretty airtight. |
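The rules quoted above can be seen in a very small sketch. The union `IntOrFloat` and the function `store_then_read_float` are hypothetical names; the comments restate the N4659 reading given in the message and are not an independent interpretation.

```cpp
// A union with two trivial members, as in the discussion.
union IntOrFloat {
    int i;
    float f;
};

// Hypothetical helper showing how plain assignment switches the active member.
float store_then_read_float(float value) {
    IntOrFloat u;
    u.i = 7;      // built-in assignment: i becomes the active member
    u.f = value;  // reuses the storage: i's lifetime ends, f becomes active
    // Reading u.i at this point would access an object whose lifetime has
    // ended, which is undefined behaviour in C++ (C treats this differently).
    return u.f;   // reading the active member is well defined
}
```

Compilers that document union-based type punning as an extension may still give the expected result for a read of `u.i`, but that is a property of the implementation, not of the standard text.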
Manfred <noname@invalid.add>: May 20 09:38PM +0200 On 5/20/19 7:26 PM, Ben Bacarisse wrote: > I've cut the quoted text because your reply appears to be about the > above, not what I wrote (though I may have misunderstood). > <cut> <recut> >> Reading from not active member >> is undefined ... > Do you know where this is stated? I think it is a consequence of the first sentence: 9.5 p1: "...at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time." Which would imply that reading from a non-active member is the same as reading from an uninitialized variable. But, 9.2 p19 explicitly allows the case of reading the common initial sequence. |
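The common initial sequence exception mentioned at the end can be sketched as follows. The struct and member names (`IntMsg`, `FloatMsg`, `Msg`, `tag_of`) are hypothetical; the sketch assumes both structs are standard-layout, which the `static_assert` checks.

```cpp
#include <type_traits>

// Two standard-layout structs that both start with an int member.
struct IntMsg {
    int tag;
    int value;
};

struct FloatMsg {
    int tag;
    float value;
};

static_assert(std::is_standard_layout<IntMsg>::value &&
              std::is_standard_layout<FloatMsg>::value,
              "the exception only applies to standard-layout structs");

union Msg {
    IntMsg as_int;
    FloatMsg as_float;
};

// Reading `tag` through as_int is permitted even when as_float is the
// active member, because `tag` is part of the common initial sequence.
// Reading as_int.value in that situation would not be covered.
int tag_of(const Msg& m) {
    return m.as_int.tag;
}
```

A typical use is a discriminated message: write `m.as_float = FloatMsg{2, 1.5f};`, inspect `tag_of(m)`, and only then decide which member to read in full.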
"Öö Tiib" <ootiib@hot.ee>: May 20 05:49AM -0700 On Saturday, 18 May 2019 01:15:40 UTC+3, Daniel wrote: > message format. I don't understand why this project has to have its own > format, for what reason does it have to have its own format? That pretty much > rules out having users. When a message is say (in average 30 times) smaller than (say that of protobuf) then it is good. However we can't even measure any of that. I've told that to woodbrian several times. The code he generates neither takes in nor outputs something useful. It doesn't really matter if the "useful" is widespread exchange format (CBOR or JSON or ASN.1 or etc.) used directly or there is "woodbrian unique protocol" with round-trip-conversion module to/from some real thing. I think it is like Flibble said that woodbrian does not want advice. He just wants to advertise his site and software. |