Sunday, March 1, 2020

Digest for comp.lang.c++@googlegroups.com - 11 updates in 3 topics

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Feb 29 07:52PM -0500

Chris Vine wrote:
 
> This helpfully identifies the source of your thinking. I am somewhat
> reluctant to continue this because I think from our previous exchanges
> it is likely to be fruitless,
I hope not but it depends on your definition of fruitless.
 
 
> §6.8/10 (an object may be accessed through a glvalue of, and
> therefore aliased by a pointer to, its dynamic type or amongst other
> things a char, unsigned char or std::byte type)
This does not seem relevant: reinterpret_cast does not try to access any
object through a glvalue; it simply converts that glvalue itself (the pointer).
The subsequent attempt to access the object is done through a pointer already
converted to the "object dynamic type", so you have nothing to prove here.
 
(*) Also, just for the record, &s in the above code snippet is *not* a pointer
to an object of type std::byte which you seem to imply -- please also see below.
 
> operand - say, in an array of unsigned char or std::byte, as the
> corollary of §4.5/3).
 
> You will I suspect be inclined to dismiss all this.
No. I would rather use it to try to pinpoint the root cause of the disagreement.
I think (please correct me if I am wrong) that you believe that given this code:
 
(**)
struct T2 { double d = 2.0; }; // line 1
struct alignas(T2) T1 { char ca[8]; }; // line 2
static_assert(sizeof(T1) == sizeof(T2)); // line 3
static_assert(alignof(T1) == alignof(T2)); // line 4
T1* const p1 = new T1; // line 5
p1->~T1(); // line 6
new (p1) T2; // line 7
T2* const p2 = reinterpret_cast<T2*>(p1); // line 8
cout << p2->d * 2.0 << endl; // line 9
 
and assuming it compiles (i.e. size and alignment are ok; also let's pretend
that launder and aliasing do not exist; we may have separate disagreement about
these but IMHO they are irrelevant here), the Standard guarantees that p2 points
to a [complete] object o2 of class T2 and hence the code "will print 777".
 
To the contrary, I think that the closest thing to the above that the Standard
guarantees is that the "pointer values" of p1 and p2 are same. This is not
enough and here is the proof: (text below are direct citations, from n4849):
 
(a) [definition of memory address]:
6.7
Memory and objects
6.7.1
Memory model
1 The fundamental storage unit in the C++ memory model is the byte. ... The
memory available to a C++ program consists of one or more sequences of
contiguous bytes. Every byte has a unique address.
 
...
6.8.2 Compound types:
 
(b) [definition of pointer value via memory address (a value of a pointer
type)] (continuation of paragraph 3 after (3.4))
3
...
A value of a pointer type that is a pointer to or past the end of an object
represents the address of the first byte in memory (6.7.1) occupied by the
object ...
 
(c) [definition of when reinterpret_cast can be used to convert pointers to
objects] end-of
6.8.2-4:
If two objects are pointer-interconvertible, then they have the same address,
and it is possible to obtain a pointer to one from a pointer to the other via a
reinterpret_cast (7.6.1.9).
 
(d) [an example of objects' having same address (hence, by definition, pointers
to these objects have same pointer value) but being not
pointer-interconvertible], ibid
[Note: An array object and its first element are not pointer-interconvertible,
even though they have the same address. — end note]
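
To make the distinction in (c) and (d) concrete, here is a minimal sketch. `Wrapper` and both function names are hypothetical, introduced only for illustration. The first function relies on a standard-layout class object being pointer-interconvertible with its first non-static data member; the second only checks that an array and its first element share an address, which, per the note quoted above, does not by itself make them pointer-interconvertible.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical standard-layout type, used only for this illustration.
struct Wrapper {
    int first;
    int second;
};

static_assert(std::is_standard_layout<Wrapper>::value,
              "the first-member rule applies to standard-layout classes");

// A standard-layout class object and its first non-static data member are
// pointer-interconvertible, so reinterpret_cast yields a pointer to the member.
bool first_member_roundtrip() {
    Wrapper w{1, 2};
    int* p = reinterpret_cast<int*>(&w);
    *p = 42;
    return w.first == 42 && p == &w.first;
}

// An array and its first element have the same address. This function checks
// only the address; it says nothing about pointer-interconvertibility.
bool array_and_first_element_share_address() {
    int arr[4] = {};
    const void* whole = &arr;
    const void* elem = &arr[0];
    return whole == elem;
}
```

Both functions return true on a conforming implementation; the point of (d) is that equal addresses alone are not what licenses the cast in the first function.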
 
BTW see (*) above about the &s from cppreference example under discussion not
being a pointer to std::byte. Only mentioning this here as you seem to imply
(please correct me if I am wrong) that accessing an object via the result of
reinterpret_cast from a pointer to std::byte, char etc. when the object is not
pointer-interconvertible with any of these types is somehow "more legal". (I
happen to think these accesses have undefined behaviors; these types are only
discussed specially because of being minimal units of memory and having minimal
alignment).
 
You may object that in practice the pointers representing same addresses are
identical and hence differentiating between the pointers of type "pointer to T"
with same "pointer values" and those pointing to same object does not make sense
but consider these 2 counterarguments first:
 
(e) compiler can produce better (e.g. faster or smaller) code if it does not
need to guarantee any defined behavior to the code that may access objects
through a pointer obtained via reinterpret_cast.
 
(f) consider CPU architecture (simplified) with:
- 64-bit pointers and 63-bit memory addresses;
- each instruction (OP) consisting of OP code, a register operand and a memory
operand (MEND);
- CPU operating on the objects of types naturally mapping to C++ integral,
pointer and floating point types where U64 maps to uint64_t, MEND to pointer and
DBL to double
- OP codes specifying operation and operand size but not type
- two ALUs, ALU0 handling floating point objects and ALU1 all others
- most significant bit (MSB) of a MEND specifying the ALU where CPU scheduler
should direct the OP for processing; and the 63 LSBs representing the address in
memory.
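
The operand layout in (f) can be modelled in a few lines. This is only a sketch of the hypothetical CPU described above, not of any real hardware; the names `make_mend`, `address_of` and `alu_bit_of` are made up for the illustration. It shows that two operands can designate the same 63-bit address while differing in representation.

```cpp
#include <cassert>
#include <cstdint>

// Model of the hypothetical MEND operand: the most significant bit selects
// the ALU, the remaining 63 bits hold the memory address.
constexpr std::uint64_t kAluBit = std::uint64_t{1} << 63;
constexpr std::uint64_t kAddressMask = kAluBit - 1;

// Packs an address and an ALU-selector bit into one 64-bit operand.
constexpr std::uint64_t make_mend(std::uint64_t address, bool alu_bit) {
    return (address & kAddressMask) | (alu_bit ? kAluBit : 0);
}

// Extracts the 63-bit memory address.
constexpr std::uint64_t address_of(std::uint64_t mend) {
    return mend & kAddressMask;
}

// Extracts the ALU-selector bit.
constexpr bool alu_bit_of(std::uint64_t mend) {
    return (mend & kAluBit) != 0;
}

// Same address, different operand representation.
static_assert(address_of(make_mend(0x10, false)) ==
                  address_of(make_mend(0x10, true)),
              "both operands designate address 0x10");
static_assert(make_mend(0x10, false) != make_mend(0x10, true),
              "but their 64-bit representations differ");
```

On such a machine a cast that leaves the bit pattern untouched would keep the selector bit chosen when the storage was allocated.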
 
Now try to mentally run the code snippet (**) on a computer system with the CPU
described above:
 
(g) assume for the sake of example that the operator new in line 5 returns
0x0000000000000010 (wisely basing the decision to not set MSB on knowing that
the first member of T1 is an array of char (i.e. not a double or array thereof)
and hence it will likely be sent to ALU0);
 
(h) what should reinterpret_cast in line 8 return? It is clearly more efficient
to do nothing than to do something, so unless the Standard requires it to return a
pointer that is pointer-interconvertible to a pointer to double (which the
Standard does not), it will return 0x0000000000000010.
NOTE: In optimized code this reinterpret_cast will be a no-op (which is how
reinterpret_cast has always been and how it has been intended to be -- a rare
unity); compiler will simply take note that to get hold of `p2' from this point
in the source code and on it needs to use same register that it already
allocated for `p1' and to which it has already loaded the `p1' (Save a register
-- save the World (C) :-) ).
 
(i) what effect should line 9 produce? I think it may even print something..
maybe.. unless it crashes before.. but unlikely 4.. but still possible.
"Classic" UB as it is often explained in C++ primers (if they explain it at all).
 
(j) You may argue that an architecture like the one in (f) above, although
possible (why not?), is weird and uncommon. I do not think it is weird, but ok
for the sake of the example; as for how common it is, I will refer you to the
very broadly used i8086 in "real CPU mode", where every address in its 1-MB
address space has a variable number of representations (usually >1 and high)
depending on its value and, furthermore, all representations of an address can
be used for some purposes but only certain ones for other purposes.
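
The real-mode point in (j) comes down to simple arithmetic: a segment:offset pair maps to the linear address segment * 16 + offset. A small sketch follows; the function name `linear_address` is chosen for the illustration, and the 20-bit mask reflects the 8086's 20 address lines.

```cpp
#include <cassert>
#include <cstdint>

// Real-mode 8086 address translation: linear = segment * 16 + offset,
// truncated to 20 bits because the 8086 has 20 address lines.
constexpr std::uint32_t linear_address(std::uint16_t segment,
                                       std::uint16_t offset) {
    return ((static_cast<std::uint32_t>(segment) << 4) + offset) & 0xFFFFFu;
}

// Two different segment:offset pairs designating the same byte.
static_assert(linear_address(0x0001, 0x0000) == 0x00010, "0001:0000");
static_assert(linear_address(0x0000, 0x0010) == 0x00010, "0000:0010");
```

Two far pointers can therefore compare unequal bit for bit and still refer to the same byte, which is the situation the argument above appeals to.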
 
> similar example (using std::uninitialized_copy()) and I suspect he
> knows more about it than you do.
 
> Thirdly you might then respond "that's an argument ad verecundiam",
Yes. People make mistakes. A productive person makes more mistakes than an
average person. Of course s/he also produces many more useful
goods/services/ideas than an average person.
 
> taken the effort to cover this, particularly given that the standard
> explicitly permits the construction of objects in arrays of unsigned
> char or std::byte.
 
Let me try to address this argument. Its essence is (please correct me if I am
wrong) that it is impossible to access the results of placement new without
stashing the returned pointer or using reinterpret_cast somewhere and by that
over-complicating the program. I happen to believe that it is, on the contrary,
possible wherever the desired behavior is actually achievable in C++. I would
not take on proving such a broad statement "in general" but I am willing to
solve a specific puzzle (i.e. refactor given code with reasonable *expected*
behavior to ditch the use of reinterpret_cast on not pointer-interconvertible
objects and (hopefully, maybe) even improve its readability) as long as it is
short and to the point, and do it without resorting to stashing the result of
the placement new (other than maybe for illustration).
 
As an example, below is a copy of the code from the thread "Union type punning
in C++ redux" where the OP asked about a variant class code they implemented
with reinterpret_cast as follows:
 
// original OP's code begins ----------------
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>
 
struct B
{
    uint8_t tag;
    uint8_t extra;
    uint64_t n;
 
    // Initializers listed in declaration order (tag, extra, n).
    B(uint64_t n, uint8_t extra = 0)
        : tag{1}, extra(extra), n(n)
    {
    }
};
 
struct C
{
    uint8_t tag;
    double d;
 
    C(double d)
        : tag{2}, d(d)
    {
    }
};
 
class V
{
    static constexpr size_t data_size = std::max(sizeof(B), sizeof(C));
    static constexpr size_t data_align = std::max(alignof(B), alignof(C));
 
    typedef typename std::aligned_storage<data_size, data_align>::type data_t;
 
    data_t data_;
 
public:
    V(uint8_t n)
    {
        ::new(&data_) B(n);
    }
    V(double d)
    {
        // Constructs a C (tag 2) so that the destructor's case 2 matches.
        ::new(&data_) C(d);
    }
    ~V()
    {
        switch (tag())
        {
        case 1:
            reinterpret_cast<const B*>(&data_)->~B();
            break;
        case 2:
            reinterpret_cast<const C*>(&data_)->~C();
            break;
        default:
            break;
        }
    }
    // Reads the leading tag byte without going through a B* or C*.
    uint8_t tag() const
    {
        uint8_t t;
        std::memcpy(&t, &data_, sizeof(uint8_t));
        return t;
    }
};
// original OP's code ends ----------------
 
// the replacement for code of V begins ----------------
// (the rest of the original code can be used "as-is";
// you will notice I changed the type of parameter
// `n' of V2() but that is only because I believe
// it was a typo on the OP's part, so that change seems
// irrelevant to the topic)
class V2 {
    union {
        B b;
        C c;
    };
public:
    V2(uint64_t n): b(n) { }
    V2(double d): c(d) { }
    ~V2() {
        switch (tag()) {
        case 1:
            b.~B();
            break;
        case 2:
            c.~C();
            break;
        default:
            break;
        }
    }
    uint8_t tag() const { return b.tag; }
};
// replacement for code of V ends ----------------
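
For readers who want to try the replacement, here is a self-contained usage sketch. `B`, `C` and `V2` are restated from the post (with `std::`-qualified integer types); the helper functions `tag_for_integer` and `tag_for_double` are hypothetical additions for the demonstration. Reading `b.tag` while `c` is the active member relies on the two structs being standard-layout and sharing `tag` as a common initial sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

struct B {
    std::uint8_t tag;
    std::uint8_t extra;
    std::uint64_t n;
    B(std::uint64_t n, std::uint8_t extra = 0) : tag{1}, extra(extra), n(n) {}
};

struct C {
    std::uint8_t tag;
    double d;
    C(double d) : tag{2}, d(d) {}
};

static_assert(std::is_standard_layout<B>::value &&
                  std::is_standard_layout<C>::value,
              "the common-initial-sequence rule needs standard-layout structs");

class V2 {
    union {
        B b;
        C c;
    };
public:
    V2(std::uint64_t n) : b(n) {}
    V2(double d) : c(d) {}
    ~V2() {
        switch (tag()) {
        case 1: b.~B(); break;
        case 2: c.~C(); break;
        default: break;
        }
    }
    // Both alternatives start with a uint8_t tag, so it may be read via b.
    std::uint8_t tag() const { return b.tag; }
};

// Hypothetical helpers: build a V2 of each kind and report its tag.
std::uint8_t tag_for_integer() {
    V2 v(std::uint64_t{7});
    return v.tag();
}

std::uint8_t tag_for_double() {
    V2 v(3.5);
    return v.tag();
}
```

No reinterpret_cast and no stored placement-new result appear anywhere; the union members are named directly.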
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Feb 29 07:56PM -0500

Pavel wrote:
 
[snipped]
> that launder and aliasing do not exist; we may have separate disagreement about
> these but IMHO they are irrelevant here), the Standard guarantees that p2 points
> to a [complete] object o2 of class T2 and hence the code "will print 777".
[snipped]
correction, I meant to say the code "will print 4".
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Mar 01 11:06AM

On Sat, 29 Feb 2020 19:52:18 -0500
> To the contrary, I think that the closest thing to the above that the Standard
> guarantees is that the "pointer values" of p1 and p2 are same. This is not
> enough and here is the proof: (text below are direct citations, from n4849):
 
[snip]
 
> If two objects are pointer-interconvertible, then they have the same address,
> and it is possible to obtain a pointer to one from a pointer to the other via a
> reinterpret_cast (7.6.1.9).
 
[snip]
 
 
> (e) compiler can produce better (e.g. faster or smaller) code if it does not
> need to guarantee any defined behavior to the code that may access objects
> through a pointer obtained via reinterpret_cast.
 
The C++ standard does very definitely guarantee that the pointer values
are the same - §8.2.9/13 of C++17 read with the other provisions to
which I have referred. Were the types to be pointer-interconvertible a
reinterpret_cast would be enough - §8.2.9/13. If they are not, as here,
then you have to use std::launder also (I am now satisfied that
cppreference.com is right about that and my complaints on that were
wrong). §21.6.4 directly covers this case. It prevents the kind of
optimization to which you refer.
 
In fact, until C++23 comes out, constructing nested objects using
placement new within char arrays constructed on free store and then
accessing them with reinterpret_cast and std::launder is the only
technically correct way you can construct containers which allocate
their memory dynamically, including implementing std::vector:
https://stackoverflow.com/questions/60465235/does-stdunitialized-copy-have-undefined-behavior
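
The pattern described here can be sketched as a tiny fixed-capacity holder. `StringSlots` is a hypothetical class written for this illustration, not code from the thread or from any library, and it needs C++17 for std::launder and std::destroy_at. Each element is located by byte offset within the unsigned char array and only then converted, so no pointer arithmetic is performed on a `std::string*`. Whether every step is fully blessed by the standard is exactly what the linked question debates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical fixed-capacity holder, for illustration only (C++17).
class StringSlots {
    std::size_t capacity_;
    std::size_t size_ = 0;
    std::unique_ptr<unsigned char[]> storage_;  // raw bytes on the free store

    // First byte of slot i inside the unsigned char array.
    unsigned char* bytes_of(std::size_t i) const {
        return storage_.get() + i * sizeof(std::string);
    }
    // Pointer to the std::string that placement new created in slot i.
    std::string* slot(std::size_t i) const {
        return std::launder(reinterpret_cast<std::string*>(bytes_of(i)));
    }

public:
    explicit StringSlots(std::size_t capacity)
        : capacity_(capacity),
          storage_(new unsigned char[capacity * sizeof(std::string)]) {}
    StringSlots(const StringSlots&) = delete;
    StringSlots& operator=(const StringSlots&) = delete;

    ~StringSlots() {
        // Destroy only the slots that were actually constructed.
        for (std::size_t i = 0; i < size_; ++i) std::destroy_at(slot(i));
    }

    void push_back(const std::string& s) {
        assert(size_ < capacity_);
        ::new (static_cast<void*>(bytes_of(size_))) std::string(s);
        ++size_;
    }

    const std::string& operator[](std::size_t i) const {
        assert(i < size_);
        return *slot(i);
    }

    std::size_t size() const { return size_; }
};
```

A real container would also handle growth, moves and exception safety; those are left out to keep the access pattern visible.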
 
I am satisfied that you are wrong and everyone else is right.
 
Try to trim your posts to a reasonable size.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Mar 01 11:44AM

On Sun, 1 Mar 2020 11:06:08 +0000
> The C++ standard does very definitely guarantee that the pointer values
> are the same - §8.2.9/13 of C++17 read with the other provisions to
> which I have referred.
 
By the way the pointer values to which I was referring were those in
the cppreference.com. Your version has the problem that the type
providing storage is neither the same type as the replacement object nor
an array of unsigned char or std::byte.
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Mar 01 02:02PM -0500

Chris Vine wrote:
> the cppreference.com. Your version has the problem that the type
> providing storage is neither the same type as the replacement object nor
> an array of unsigned char or std::byte.
 
I addressed the insignificance of the difference for the matter in hand in my
previous point.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Mar 01 10:13PM

On Sun, 1 Mar 2020 14:02:00 -0500
> > an array of unsigned char or std::byte.
 
> I addressed the insignificance of the difference for the matter in hand in my
> previous point.
 
I don't think so. Your version has undefined behaviour. Instead of
assigning and dereferencing a pointer value your program could play
Yankee Doodle on your sound card. On the other hand, the
cppreference.com code (constructing the object nested in an array of
std::byte and accessing it through std::launder) has defined behaviour.
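
A minimal sketch of the kind of code being referred to, written for this digest rather than copied from cppreference.com; the struct `Y` and the function name are hypothetical. An object is created by placement new inside a suitably aligned array of std::byte and then reached again through std::launder (C++17).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Y {  // hypothetical type for the illustration
    int z;
};

int read_through_launder() {
    // Storage provided by an array of std::byte with Y's alignment.
    alignas(Y) std::byte storage[sizeof(Y)];

    // Create the Y object nested within the array.
    Y* created = ::new (static_cast<void*>(storage)) Y{2};

    // Recover a pointer to that object from the storage address alone.
    Y* recovered = std::launder(reinterpret_cast<Y*>(storage));
    assert(recovered == created);

    int value = recovered->z;
    created->~Y();  // trivial here, shown for symmetry
    return value;
}
```

The pointer returned by placement new (`created`) could of course be used directly; the launder line matters when only the storage address is available.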
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Mar 01 10:26PM

On Sun, 1 Mar 2020 22:13:32 +0000
> Yankee Doodle on your sound card. On the other hand, the
> cppreference.com code (constructing the object nested in an array of
> std::byte and accessing it through std::launder) has defined behaviour.
 
I should add that I still think that this distinction made in C++17
between (i) pointer-interconvertible types and (ii) types constructed at
the same address with placement new in an array of unsigned char or
std::byte (which is explicitly permitted by the standard), is very
unfortunate - thank you for indirectly drawing my attention to this, as
I was not previously alive to it.
 
This is because I cannot see why in case (ii), applying
reinterpret_cast is insufficient and you need also to apply
std::launder. It should be obvious to the compiler that placement new
mutates memory and take it from there. But probably there are some
desirable compiler optimizations behind it.
David Brown <david.brown@hesbynett.no>: Mar 01 02:46PM +0100

On 29/02/2020 08:34, Mr Flibble wrote:
> by mistake. NEVER send password in the clear from the CLIENT computer
> even over encrypted links.
> Clue: you can hash a hash.
 
Exactly. It is always surprising how many people miss this point. (And
it should be needless to add that you should pick a good, random salt.)
 
The bad guys don't break into Facebook, Google, banks, and the like by
hacking those servers and reading out their data. They break into some
numpty web site that has stored user names, email addresses and clear
text passwords (because the muppet running them thinks Wireguard or DH
is magic that keeps it all safe). Then they guess that the user has the
same password on other services.
 
In virtually all cases, a standard https connection is secure enough for
transferring the password hash from the client to the server. It is
almost certainly better than making up your own system because you have
heard that Diffie-Hellman key exchange is good, or because you have read
about Wireguard while totally misunderstanding its purpose.
 
And after the initial transfer of the hash, you never have to transfer
it again, meaning the bad guys only have one chance to eavesdrop on it.
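
The "hash a hash" flow can be sketched structurally as follows. Everything here is an assumption made for illustration: the function names are invented, and `toy_digest` is FNV-1a, which is NOT a cryptographic hash and must not be used for passwords. A real system would substitute a vetted password-hashing function from an established library and random per-user salts.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in digest for illustration ONLY. FNV-1a is not cryptographically
// secure; it is used here just to keep the sketch self-contained.
std::uint64_t toy_digest(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Client side: derive the value that goes over the wire, so the clear-text
// password never leaves the client.
std::string client_hash(const std::string& password,
                        const std::string& client_salt) {
    return std::to_string(toy_digest(client_salt + password));
}

// Server side: hash the received hash again with a per-user salt before
// storing it, so a stolen database does not contain the wire value.
std::string server_record(const std::string& received_hash,
                          const std::string& server_salt) {
    return std::to_string(toy_digest(server_salt + received_hash));
}

// Login check: recompute the stored record from what the client sent.
bool verify(const std::string& received_hash, const std::string& server_salt,
            const std::string& stored_record) {
    return server_record(received_hash, server_salt) == stored_record;
}
```

The server never sees the password itself, and what it stores is one more hash away from what travels over the connection.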
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Mar 01 12:59PM -0800

On 3/1/2020 5:46 AM, David Brown wrote:
> same password on other services.
 
> In virtually all cases, a standard https connection is secure enough for
> transferring the password hash from the client to the server.
 
Agreed.
 
 
> It is
> almost certainly better than making up your own system because you have
> heard that Diffie-Hellman key exchange is good
 
Actually, DH key exchange can be very dangerous. One needs to know how
to use it. Large secure primes come to mind.
 
 
 
> , or because you have read
David Brown <david.brown@hesbynett.no>: Mar 01 10:47PM +0100

On 01/03/2020 21:59, Chris M. Thomasson wrote:
>> you have heard that Diffie-Hellman key exchange is good
 
> Actually, DH key exchange can be very dangerous. One needs to know how
> to use it.  Large secure primes come to mind.
 
Friendly reminder - it was /you/ who brought up DH!
 
Anything in the cryptographic world can be "dangerous". A little
knowledge can be a very bad thing here. You are far better off
forgetting you ever heard of "Diffie-Hellman" or key exchange - use a
good TLS implementation and let people who know what they are doing
implement the security. (The authors of such software are not
infallible either, so track updates and fixes for the software.)
gazelle@shell.xmission.com (Kenny McCormack): Mar 01 02:27PM

In article <877e0dajow.fsf@nosuchdomain.example.com>,
>[...]
 
>David, I never see this person's posts unless someone like you insists
>on quoting them. Please stop bypassing my filters.
 
This is about the most "beta" response/post I've ever seen on Usenet.
 
Blaming another poster for the inadequacies of your killfile mechanism?
 
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/GodDelusion