Monday, February 15, 2021

Digest for comp.lang.c++@googlegroups.com - 3 updates in 1 topic

mickspud@potatofield.co.uk: Feb 15 05:16PM

On Mon, 15 Feb 2021 17:55:58 +0100
David Brown <david.brown@hesbynett.no> wrote:
>memcpy is going to give you the same code.
 
>The key difference is that casting pointer types then using them to
>access data is often lying to the compiler - for all but a handful of
 
Sorry? It's standard C. Perhaps it's frowned on in C++ but I've been doing
network programming for a couple of decades and this method is used all over
the place. No one does 50 memcpys if there's a memory structure with 50
fields in it just for the sake of ivory tower correctness, you'd have to
be insane. A structure only has to be correct once in the header, memcpys have
to be correct everywhere you use them.
 
If you don't believe me have a look in any of the /usr/include/linux network
header files and then go through this and check out the casting to structs:
 
https://github.com/torvalds/linux/blob/master/net/ipv4/tcp.c
 
>exceptions, it is behaviour undefined by the standard. This means you
>can easily get something that works fine in your simple tests, but fails
>in more complex situations when code is inlined, link-time optimised, or
 
Rubbish. Maybe in Windows but that doesn't concern me.
 
>> for numeric values but why bother plus it's unlikely to be very efficient.
 
>The point is that you have to have code for accessing the fields, you
>can't just use them directly. And when you have an accessor function
 
Wtf are you talking about? You just access them as structure fields. There
may be a small cost in dereferencing but there's a large gain in code
readability and correctness.
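
For readers following the disagreement, the accessor style being argued over can be sketched roughly as below. This is only an illustration: the names load_u32 and load_be16 are hypothetical and come from neither poster. The first copies bytes into a properly typed object with memcpy, the second assembles a network-order field byte by byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accessor: copy 4 bytes out of a buffer into a properly typed
   object. It works at any alignment and never forms a uint32_t lvalue that
   refers to the buffer itself. The result is in host byte order. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical accessor for a 16-bit big-endian (network order) field,
   assembled byte by byte so it does not depend on host endianness. */
uint16_t load_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}
```

Whether a given compiler turns the memcpy into a single load is an optimisation question that this sketch does not settle.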
 
>more portable. People have been writing code to access network-defined
>or file format defined structures since C has been in existence, and
>#pragma pack is neither necessary nor sufficient for the task.
 
Whether it's pragma pack or attribute packed, it's used a lot in Linux.
 
$ pwd
/usr/include/linux
$ grep __attribute__ *.h | grep packed | wc -l
239
 
But what do they know?

James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 15 03:54PM -0500

> On Mon, 15 Feb 2021 17:55:58 +0100
> David Brown <david.brown@hesbynett.no> wrote:
...
 
> Sorry? It's standard C. Perhaps it's frowned on in C++ but I've been doing
> network programming for a couple of decades and this method is used all over
> the place.
 
The C++ rules are stricter than the C rules, but it's also a problem in
C. Type punning is standard C, but there are restrictions on when it can
safely be used. Those restrictions are defined in terms of the
"effective type" of a piece of memory. For objects with a declared type,
the effective type is the same as the declared type. For memory with no
declared type (which basically means dynamically allocated memory), the
effective type is set by the last store into that memory using a
non-character type T. If you used an lvalue of type T to store the
value, then the memory has an effective type of T. If you use methods
such as memcpy() or memmove(), to copy an entire object over into such
memory, or if you copied it over as an array of character type, that
memory acquires the same effective type as the object it was copied from.
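
A minimal C sketch of those two ways for allocated memory to get an effective type, assuming nothing beyond the standard library. The function names make_int and copy_int are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Storing through an int lvalue gives the allocated memory the
   effective type int. Returns NULL if allocation fails. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

/* Copying a whole int object with memcpy gives the allocated memory
   the effective type of the source object, which is int here. */
int *copy_int(const int *src)
{
    int *p;
    assert(src != NULL);
    p = malloc(sizeof *p);
    if (p != NULL)
        memcpy(p, src, sizeof *p);
    return p;
}
```

In both cases a later read through an int lvalue is consistent with the effective type. The caller is expected to free() the result.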
 
The relevant rule violated by many kinds of type punning is the
anti-aliasing rule:
 
"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
  the object,
- a type that is the signed or unsigned type corresponding to the
  effective type of the object,
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
  types among its members (including, recursively, a member of a
  subaggregate or contained union), or
- a character type." (C standard, 6.5p7).
 
Since that "shall" occurs outside of a constraints section, type punning
that violates the above rule has undefined behavior. Here's an example
that shows what can go wrong as a result of violating that rule. Given:
 
U func(T *pt, U *pu){
*pt = 0;
return *pu;
}
 
then *pt acquires the effective type of T. If U is not one of the types
permitted by the anti-aliasing rule, a compiler is not obligated to
consider the possibility that pt and pu might point to overlapping
blocks of memory. It could, therefore, delay the write to *pt until
after it has read the value of *pu. In such a simple piece of code, it's
unlikely to do so, but in more complicated code there's a very good
chance of such optimizations occurring.
 
Unions provide a way to avoid this problem (see 6.5.2.3p3, and pay
attention to footnote 95), but that way only works if the object in
question is actually of the union type, and only if the declaration of
that union is in scope at the point where the problem could otherwise occur.
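
A small sketch of the union route in C, assuming float is a 32-bit IEEE 754 type (the size is checked at compile time, the representation is not). The union float_bits and the function bits_of_float are hypothetical names. This applies to C only; in C++, reading a union member other than the one last written is undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* The object really is of the union type, and the union's declaration
   is visible where the access happens. */
union float_bits {
    float    f;
    uint32_t u;
};

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Write one member, read the other: in C the bytes are reinterpreted
   as the type of the member being read. */
uint32_t bits_of_float(float value)
{
    union float_bits fb;
    fb.f = value;
    return fb.u;
}
```

Under the IEEE 754 assumption, bits_of_float(1.0f) yields 0x3F800000.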
 
...
>> can easily get something that works fine in your simple tests, but fails
>> in more complex situations when code is inlined, link-time optimised, or
 
> Rubbish. Maybe in Windows but that doesn't concern me.
 
It's not just Windows - compilers that take advantage of the
anti-aliasing rules to optimize code generation are quite common.

Paavo Helde <myfirstname@osa.pri.ee>: Feb 16 12:00AM +0200

>> can easily get something that works fine in your simple tests, but fails
>> in more complex situations when code is inlined, link-time optimised, or
 
> Rubbish. Maybe in Windows but that doesn't concern me.
 
FYI, the biggest "culprit" in this area has been gcc in recent years. It
is keen to optimize away things which are formally UB, like infinite
loops. For some pointer conversions it helpfully warns you that it is
planning to break your code ("dereferencing type-punned pointer will
break strict-aliasing rules"). For some other kind of UB one might not
get so lucky.
 
MSVC, on the other hand, is generally much more careful to keep alive
tons of crap code produced by hordes of cowboy programmers during the last
decades, only because such code accidentally happened to work at some
time in the past.
 
And yes, this is C++, not C, the rules are different.