Saturday, February 20, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

Manfred <noname@add.invalid>: Feb 17 05:20PM +0100

On 2/16/2021 10:35 PM, Chris Vine wrote:
> obscure technical rules launched at programmers because a compiler
> vendor has asserted that it might make 1% of code 0.5% faster seems to
> me to be the wrong balance.
 
Valid point.
std::launder is IMO a good example of bad design.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 17 11:22AM -0500

> from one pointer type to another the resulting pointer has the same
> memory address and I'm trying to think in what circumstances it
> wouldn't. I've asked for an example and no one has provided one.
 
No one is suggesting that the pointer would point to a different memory
address (though that is in fact a potential problem in some cases - see
below). They're saying that you cannot safely use the pointer that
results from the conversion to access the memory it points at.
Implementations are allowed to perform optimizations that ignore the
possibility that two pointers to different types might point at the same
location in memory, depending upon what relationships those two types
have to each other.
 
With regards to the "same memory address": conversion of a pointer to
one type into a pointer to a different type which has alignment
requirements violated by the original pointer has undefined behavior,
which in particular allows for the possibility that the resulting
pointer does not point at the same memory location.
As an example of the reason why this rule exists, there have been
implementations targeting machines with large word sizes which have
pointers to word-aligned types that have fewer bits (and even, in some
cases, fewer bytes) than pointers to types with smaller alignment
requirements. Conversion of a pointer that doesn't point at the
beginning of a word to a pointer type that can only represent positions
at the beginning of a word CANNOT result in a pointer to the same location.
 
<pedantic>Even when pointer conversions have defined behavior, in
general that definition only says that if the resulting pointer value
gets converted back to the original pointer type, it will compare equal
to the original.
There are subtle differences between the C and C++ standards about these
issues, but neither standard specifies where the resulting pointer
points except in some special cases. Personally, I think they should
specify that it points at the same location, but they don't. The C
exceptions are easier to describe, so I'll list them here:
1. Converting a pointer to an array into a pointer to the array's
element type results in a pointer to the first element of the array.
2. Converting a pointer to a struct type into a pointer to the type of
the struct's first member results in a pointer to that member.
3. Converting any object pointer into a pointer to a character type
results in a pointer to the first byte of the object.
Each of these conversions is reversible.
</pedantic>
David Brown <david.brown@hesbynett.no>: Feb 17 09:35PM +0100

On 17/02/2021 19:43, Manfred wrote:
 
>>> Except for any cache line(s) evicted as a result of the memcpy, which
>>> may indeed have an overall performance cost.
 
> I think this still does not solve the issue at the language level.
 
What issue? The "cache lines" Scott referred to don't exist at the
"language level".
 
> The standard does not mandate the behaviour that is shown by your
> example, so even if compilers do compensate for inefficiency of the
> code, this does not make the language good.
 
That is correct. The standard only mandates that they will work with
the same effect (if the non-standard pragma were removed).
 
> less efficient than getb1 because it introduces some unneeded extra
> storage and an extra function call - albeit at the level of the abstract
> machine, with no benefit in readability or robustness against bugs.
 
In these examples, getb1() is the simplest and clearest. Consider
instead that we had:
 
int32_t getb1(const void* p) {
    return *(const int32_t *)p;
}

int32_t getb2(const void* p) {
    int32_t x;
    memcpy(&x, p, sizeof x);
    return x;
}
 
(and so on for the others).
 
The results from gcc are the same - a single "mov" instruction. Here
getb2() /does/ have an advantage over getb1() in that it is valid and
fully defined behaviour even if the parameter did not point to a
properly aligned int - perhaps it points into a buffer of unsigned char
of incoming raw data. In that case, it /is/ more robust - it will work
even if changes to the code and build process (such as using LTO) give
the compiler enough information to see that a particular call to
getb1() has undefined behaviour, which can lead to unexpected failures. I
don't like code that appears to work in a simple test case, but has
subtle flaws that mean it might not work in all cases.
 
And no, at least for this compiler, getb2() is not less efficient than
getb1(). I really don't care about a couple of microseconds of the
compiler's time - the efficiency I care about is at run-time.
 
(There's no argument about the readability.)
 
 
> The fact that the compiler puts a remedy to this by applying some
> operation that is hidden to the language specification does not make the
> language itself any more efficient.
 
This particular branch of the thread was in response to Scott's strange
claim that using memcpy() in the code caused cache lines to be evicted.
David Brown <david.brown@hesbynett.no>: Feb 17 09:41PM +0100

On 17/02/2021 18:28, Chris Vine wrote:
> (In fact, in C++17 it has to be an intrinsic because memcpy cannot be
> implemented as a function using standard C++17 without undefined
> behaviour, but that is a bug in the standard rather than a feature.)
 
That is not quite accurate. In gcc, memcpy with known small sizes
(usually regardless of alignment) is handled using a built-in version
that is going to be as good as it gets - memcpy(&a, &b, sizeof a) is
going to be roughly like "a = b" (but skipping any assignment operator
stuff). That won't necessarily apply to all other compilers, though gcc
is not alone in handling this.
 
I don't think memcpy can be implemented (or duplicated) in pure C or C++
of any standard, with all aspects of the way it copies effective types,
but I am not sure on that. However, that doesn't mean it has to be an
"intrinsic" - it just means it has to be implemented using extensions in
the compiler, or treated specially in some other way by the tool.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 17 01:07PM -0800

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]
> generally not a problem: the implementation defines the behavior that
> the standard leaves undefined, in precisely the way you presumably
> thought it was required to be defined.
 
Do most implementations actually *define* (i.e., document) the behavior
of passing a non-void* pointer with a %p format specifier? I haven't
checked, but it's plausible that most of them implement it in the
obvious way but don't bother to mention it. Since the behavior is
undefined, not implementation-defined, implementations are not required
to document it.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 17 10:49PM -0500

On 2/17/21 4:07 PM, Keith Thompson wrote:
>> thought it was required to be defined.
 
> Do most implementations actually *define* (i.e., document) the behavior
> of passing a non-void* pointer with a %p format specifier?
 
I meant "define" only in the sense that they actually do something
useful, not that they've necessarily publicized that fact. In practice,
on systems where all pointer types have the same representation, they'd
have to go out of their way to make such code break.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 17 10:57PM -0500

On 2/17/21 1:43 PM, Manfred wrote:
> On 2/17/2021 5:30 PM, David Brown wrote:
>> On 17/02/2021 17:00, Scott Lurndal wrote:
...
> The standard does not mandate the behaviour that is shown by your
> example, so even if compilers do compensate for inefficiency of the
> code, this does not make the language good.
 
The point is, the standard doesn't mandate that behavior for ANY of the
functions he defined. The standard doesn't even address the issue,
beyond giving implementations the freedom to generate any code that has
the same required observable behavior.
 
> The fact that the compiler puts a remedy to this by applying some
> operation that is hidden to the language specification does not make the
> language itself any more efficient.
 
The language itself is efficient because it's been quite deliberately
and carefully designed to allow implementations that are as efficient as
this one is. It's also inefficient, in that it doesn't prohibit
implementations that would convert all four functions into the same code
you'd naively expect to see generated for getb3().
David Brown <david.brown@hesbynett.no>: Feb 18 09:05AM +0100

On 17/02/2021 22:17, Chris Vine wrote:
> And I don't think that C++03 was the source of this but if it was, I am
> absolutely certain that the authors of C++03 didn't think they were
> declaring past (and future) accepted practice to be invalid.
 
To my understanding (and I freely admit I didn't follow the changes in
C++ over time to the same extent as I did C), one of the changes in
C++03 was to give a clear (well, as clear as anything in these standards
documents...) specification of the memory model. Until then, a lot more
had been left up to chance of implementation.
 
At this time, compilers were getting noticeably smarter and doing more
optimisation - and processors were getting more complex (speculative
execution, out of order, multiprocessing, and so on). The old "it's all
obvious, right?" attitude to memory and object storage was not good enough.
 
It was not about trying to make old code invalid, it was about trying to
say what guarantees the compiler had to give you no matter what
optimisations it used or what processor you ran the code on. And like
many aspects of the C and C++ standards, it was based on trying to
understand what compilers did at the time, and what a substantial number
of programmers wrote - trying to get a consistent set of rules from
existing practice. Of course it is inevitable that some existing
practices and compilers would have to be changed.
 
(Again, I am /not/ claiming they got everything right or ideal here.)
 
> want to construct an object in uninitialized memory and cast your
> pointer to the type of that object? Fine, you complied with the
> strict-aliasing rule.
 
The "-fno-strict-aliasing" flag in gcc is a great idea, and I would be
happy to see a standardised pragma for the feature in all (C and C++)
compilers. But it is not standard. So in the beginning, there was the
"strict aliasing rule" ("effective type rules" is perhaps more accurate,
or "type-based aliasing rules"). There was no good standard way to get
around them - memcpy was slow (compilers were not as smart at that
time), there was no "-fno-strict-aliasing" flag to give you guarantees
of new semantics, and even union-based type punning was at best
"implementation defined". People wrote code that had no
standards-defined behaviour but worked in practice because compilers
were limited.
 
The code was /wrong/ - but practicalities and real life usually trump
pedantry and nit-picking, and code that /works/ is generally all you need.
 
 
Later C and C++ standards have gradually given more complete
descriptions of how things are supposed to work in these languages.
They have added things that weren't there in the older versions. C++03
did not make changes so that "X x(1); new (&x) X(2); x.foo();" is
suddenly wrong. It made changes to say when it is /right/ - older
versions didn't give any information and you relied on luck and weak
compilers. And C++17 didn't change the meanings here either - it just
gave you a new feature to help you write such code if you want it.
 
 
> Examples appeared in all the texts and websites,
> including Stroustrup's C++PL 4th edition. No one thought C++03 had the
> effect you mention.
 
Lots of C and C++ books - even written by experts - have mistakes. They
also have best practices of the day, which do not necessarily match best
practices of now. And they are certainly limited in their prediction of
the future.
 
Remember, the limitation that is resolved by adding std::launder stems
from C++03 (because before that, no one knew what was going on as it
wasn't specified at all), but the need for a fix was not discovered
until much later. That's why it is in C++17, not C++03. C++ and C are
complex systems - defect reports are created all the time.
 
> conceptual error, casting (!) the compiler's job (deciding whether code
> can be optimized or not) onto the programmer. Write a better compiler
> algorithm or don't do it at all.
 
I appreciate your point, and I am fully in favour of having the compiler
figure out the details rather than forcing the programmer to do so. I
don't know if the standard could have been "fixed" to require this here,
but certainly that would have been best.
 
However, it is clear that the use-cases of std::launder are rare - you
only need it in a few circumstances. And the consequences of letting
the compiler assume that const and reference parts of an object remain
unchanged are huge - devirtualisation in particular is a /massive/ gain
on code with lots of virtual functions. Do you think it is worth
throwing that out because some function taking a reference or pointer to
an object might happen to use placement new on it? How many times have
you used classes with virtual functions in your code over the years?
How many times have you used placement new on these objects?
 
If people needed to add std::launder a dozen times per file, I could
understand the complaints. If the new standards had actually changed
existing specified behaviour, I could understand. If they had changed
the meaning or correctness of existing code, I'd understand. (And that
/has/ happened with C++ changes.) But here it is just adding a new
feature that will rarely be needed.
David Brown <david.brown@hesbynett.no>: Feb 18 10:09AM +0100

On 17/02/2021 22:05, Bo Persson wrote:
 
 
> https://en.cppreference.com/w/cpp/numeric/bit_cast
 
> constexpr double f64v = 19880124.0;
> constexpr auto u64v = std::bit_cast<std::uint64_t>(f64v);
 
Yes, std::bit_cast will be useful in some cases.
 
I don't know if the practice will be much neater than memcpy for cases
such as reading data from a buffer - you'd still need somewhat ugly
casts for accessing the data (such as to reference a 4-byte subsection
of a large unsigned char array). Of course that kind of stuff can be
written once in a template or function and re-used.
 
It would have the advantage over memcpy of being efficient even on
weaker compilers, as it will not (should not!) lead to a function call.
But are there compilers that can't optimise simple memcpy and also
support C++20?
 
I think the clearest use-case for bit_cast will be in situations where
you would often use union-based type punning in C:
 
uint32_t float_bits_C(float f) {
    union { float f; uint32_t u; } u;
    u.f = f;
    return u.u;
}

uint32_t float_bits_Cpp(float f) {
    return std::bit_cast<uint32_t>(f);
}
 
or if you want to combine them :-) :
 
uint32_t float_bits(float f) {
    union { float f; uint32_t u; } u;
    u.f = f;
#ifdef __cplusplus
    u.u = std::bit_cast<uint32_t>(u.f);
#endif
    return u.u;
}
