Thursday, June 18, 2020

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

Juha Nieminen <nospam@thanks.invalid>: Jun 18 06:51AM

> suppose is one reason why the standard committee should be careful
> before spraying it over the standard and undermining programmers' well
> established past practices.
 
Maybe they should use two categories of UB, one that means "how this should
be compiled is completely undefined and therefore the compiler can do
whatever it wants", and another that means "the compiler must do exactly
what the code is telling it to, even if the results are undefined and most
probably incorrect".
 
Dereferencing a null pointer would be of the latter category: Sure, it's
"undefined behavior", but the compiler would still have to do exactly as
told: Read what's at address 0 (or whatever a "null pointer" points to).
Don't optimize it away or do anything else.
 
Much of the leeway that the C++ standard gives compilers comes from the
same leeway that the C standard gives compilers, which originates from
the principle that the languages should be as portable as possible, even
to extremely esoteric hypothetical computers that work completely
differently from your common computer.
 
This principle might have been sound in the late 70's and early 80's,
when there was a huge variety of computer architectures and it was a
mystery which direction computer architectures would go. Nowadays,
however, there's very little need to have theoretical support for
hypothetical esoteric architectures that don't exist and are unlikely
to ever exist.
 
(I think the standardization committee is slowly moving towards narrowing
these things. I think there has been discussion that 2's complement
arithmetic should become mandated by the standard, because there's no
practical need to keep supporting anything else.)
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 18 12:09PM +0100

On Thu, 18 Jun 2020 06:51:51 +0000 (UTC)
> these things. I think there has been discussion that 2's complement
> arithmetic should become mandated by the standard, because there's no
> practical need to keep supporting anything else.)
 
That's an interesting idea. However, the compiler writers seem to have
significant input into the C++ standard (I think that is where the
rather odd requirements concerning the std::launder optimization
barrier came from) and I doubt you would get these ideas past their
desire for optimization opportunities.
 
I think my general view is that if you want to write close-to-the-metal
code these days you are better doing it in C rather than C++. There is
no way in the world that the C committee would break long-standing
practices with changes like the malloc/trivial type changes we have been
discussing with C++17. With C++, you have the worry that the committee
is going to saw your low-level code off at the knees again when the next
standard comes out.
 
And for higher level stuff where maximum efficiency is not needed,
garbage collected languages can do the business. Since nearly all such
languages have C FFIs, the sweet spot these days for some cases can be
C for any code which needs to have maximum efficiency and/or interact
with the hardware, with a garbage collected language on top of that for
the rest. However I imagine lots of people will disagree with that:
garbage collection does have disadvantages in terms of memory
requirements and latency and you have to be disciplined when using the
FFIs for such languages if interacting with malloc'ed memory at the C
end. Possibly C for the low-level stuff and C++ for the higher level
stuff may be an alternative choice for those who don't want garbage
collection.
Juha Nieminen <nospam@thanks.invalid>: Jun 18 11:34AM

> C for any code which needs to have maximum efficiency and/or interact
> with the hardware, with a garbage collected language on top of that for
> the rest.
 
I don't agree. C is too simplistic for large complex projects (mainly
due to its lack of RAII), and C code tends to be really complex,
hard-to-read and error-prone (no matter how much C programmers claim
otherwise).
 
Especially nowadays, when data-oriented programming (and design principles
closely related to it) is becoming more and more popular and is making
programs significantly faster and more efficient at number-crunching, it's
important to have a language that both allows for a very low-level
approach to handling and manipulating data and at the same time provides
a way to make your code more abstract, safer and easier to use.
Bo Persson <bo@bo-persson.se>: Jun 18 01:41PM +0200

On 2020-06-18 at 08:51, Juha Nieminen wrote:
> the principle that the languages should be as portable as possible, even
> to extremely esoteric hypothetical computers that work completely
> differently from your common computer.
 
The rules were not formulated for hypothetical computers, but for the
(then) mainstream members of the 68000 family, as well as segmented
memory on Intel 286.
 
Loading an invalid pointer into an address register would trap at the
hardware level. As would trying to load a segment descriptor for a
deallocated segment.
 
> however, there's very little need to have theoretical support for
> hypothetical esoteric architectures that don't exist and are unlikely
> to ever exist.
 
There could be an opportunity to revise this now, but why would we like
to allow new code to misbehave in ways that were not possible earlier?
 
> these things. I think there has been discussion that 2's complement
> arithmetic should become mandated by the standard, because there's no
> practical need to keep supporting anything else.)
 
This is more than discussions, and already part of the upcoming C++20
standard. The compiler writers couldn't name any 1's complement hardware
they intended to support. :-)
 
 
Bo Persson
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 17 09:53AM -0400

On 6/17/20 2:47 AM, Juha Nieminen wrote:
 
>> Not in this case.
 
> What do you mean? The compiler is *always* allowed to do whatever it wants
> if something is UB.
 
No, they aren't. Even though "undefined behavior" is defined as
"behavior for which this international standard imposes no requirements"
(3.27), there is still always the requirement that the implementation
produce the behavior that Scott Newman expects it to produce. What
happens when implementations violate Scott's expectations is unclear -
so far as I know, whatever it is hasn't happened yet - but he's made it
quite clear that it's not allowed.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 17 09:53AM -0400

On 6/17/20 5:49 AM, Chris Vine wrote:
> On Wed, 17 Jun 2020 06:36:24 +0000 (UTC)
> Juha Nieminen <nospam@thanks.invalid> wrote:
...
> suppose is one reason why the standard committee should be careful
> before spraying it over the standard and undermining programmers' well
> established past practices.
 
The thing is, if those past practices were indeed well-established, that
was a serious problem. The particular case Juha was talking about
involved C code that dereferenced a pointer that might have been null,
and not bothering to test whether or not it was null until afterwards.
 
Dereferencing a null pointer has never had defined behavior in any version
of the C standard, not even K&R C (though particular implementations of C
have sometimes defined it). In this particular case, the behavior
defined by the implementation for such a dereference was that it would
cause that pointer to be treated as if it was guaranteed to be non-null
until such time as it was next changed. This allowed the compiler to
speed up the program by skipping tests of whether or not the pointer was
null, not even generating any code for the branch that was skipped. That
was optional behavior for that compiler, not the default behavior, and
from what I've heard, the developers had deliberately turned on that option.
 
They had the false idea that the hardware defined what the behavior
would be, and didn't bother to check. It's always the implementation
that defines the behavior, not the hardware. Persistent rumor to the
contrary notwithstanding, C is not a "portable assembly language". An
implementation is under no obligation to generate the same machine code
that you naively expect it to, so long as the code it actually generates
has the required observable behavior - and there is by definition no
required observable behavior when the code has UB.
 
Good general rule: don't write code with behavior that the relevant
standard fails to define, unless something else defines the behavior.
And if something else does define the behavior, check to make sure that
the definition it provides is what you want it to be. It's entirely your
own fault if something goes wrong due to the implementation producing
exactly the behavior it documents.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 18 02:40PM +0100

On Wed, 17 Jun 2020 09:53:31 -0400
> was a serious problem. The particular case Juha was talking about
> involved C code that dereferenced a pointer that might have been null,
> and not bothering to test whether or not it was null until afterwards.
[snip]
 
My reference to past practices wasn't about dereferencing a null pointer,
and you have (I hope unintentionally) snipped the posting to look as if
it was. The particular context of my reference was about constructing a
trivial type in malloc'ed memory without placement new. Prior to C++17
that was generally thought to be OK and appears in Stroustrup's TC++PL
fourth edition.
 
The point with which I was agreeing was that compilers have liberty
to, for example, optimize out undefined behaviour instead of, in the
case of a null pointer dereference, immediately crashing with a segfault
(when crashing would be kinder). They do indeed have that liberty.
Possibly all undefined behaviour is the programmer's fault. Possibly it
isn't - I would suggest that it isn't the programmer's fault that the
functions in §23.10.10.2 to §23.10.10.6 of C++17 and the equivalents in
C++20 all have undefined behaviour, nor that further undefined
behaviour arises if you attempt to access other than the first item of
the returned collection.
 
Either way, Juha's point was that it would be nicer to do literally what
the programmer says in a case of undefined behaviour. It would be
nicer, but as I subsequently said such a rule is unlikely to happen.
It is unlikely to happen because compiler writers like to optimize and
more to the point programmers are programming to a virtual machine and
not to a hardware specification.
boltar@nowhere.co.uk: Jun 18 03:19PM

On Thu, 18 Jun 2020 11:34:47 +0000 (UTC)
>> with the hardware, with a garbage collected language on top of that for
>> the rest.
 
>I don't agree. C is too simplistic for large complex projects (mainly
 
Linus Torvalds would disagree with you. And so would the kernel teams at
Microsoft and Apple.
 
>due to its lack of RAII), and C code tends to be really complex,
>hard-to-read and error-prone (no matter how much C programmers claim
>otherwise).
 
Error prone, yes and no, depending on the developer. Hard to read? Compared
to modern C++ it's simple and elegant.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 18 11:20AM -0400

On 6/18/20 2:51 AM, Juha Nieminen wrote:
...
> whatever it wants", and another that means "the compiler must do exactly
> what the code is telling to, even if the results are undefined and most
> probably incorrect".
 
There's a fundamental problem with that - in order for any such
specification to make sense, either the standard or the implementation
needs to define what "the code is telling [the compiler] to [do]" means.
If something does define that, then the behavior is no longer undefined.
It's standard-defined or implementation-defined, respectively.
 
> "undefined behavior", but the compiler would still have to do exactly as
> told: Read what's at address 0 (or whatever a "null pointer" points to).
> Don't optimize it away or do anything else.
 
That's a good example of what I'm talking about - the standard does NOT
currently say that dereferencing a null pointer causes it to "read
what's at address 0 (or whatever a "null pointer" points to)". What it
actually says is much more abstract:
 
"... the null pointer value of that type ... is distinguishable from
every other value of object pointer or function pointer type." (7.11p1)
 
Section 8.3.1p1 describes the semantics of dereferencing a pointer: "the
result is an lvalue referring to the object or function to which the
expression points.". Since a null pointer cannot point at any function
or object, neither of those cases apply, and the behavior of
dereferencing a null pointer is undefined because "... this
International Standard omits any explicit definition of behavior ..."
(3.27).
 
This means that the null pointer of a given type could be something
quite distinct from any other pointer to that type - it could, for
instance, refer to the address 0xFFFFFFFFFFFFFFFF on a machine where
valid addresses are required to be less than 0x8000000000000000 - there
isn't any actual memory at that location. However, on many (most?)
implementations, the null pointer of any given type does point at a
specific address in memory, and dereferencing a null pointer has the
effect of treating the memory in that location as though it contained an
object or function of that type. This is permitted, since the behavior
of such code is undefined, but only if the C++ implementation makes sure
that any object or function whose address can be obtained by well-formed
code is distinguishable from the corresponding null pointer.
It might, for instance, reserve that address for some other use that is
hidden to user code with defined behavior.
 
...
> however, there's very little need to have theoretical support for
> hypothetical esoteric architectures that don't exist and are unlikely
> to ever exist.
 
I think one of the LEAST likely possibilities is that all hardware
in the future will always have an architecture fully compatible with
that of our current machines. One of the committee's explicit goals is to
avoid unnecessarily restricting the range of platforms where it's
possible to create an efficient fully-conforming implementation of C++.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 18 12:07PM -0400

On 6/18/20 9:40 AM, Chris Vine wrote:
 
> My reference to past practices wasn't about derefencing a null pointer,
> and you have (I hope unintentionally) snipped the posting to look as if
> it was.
 
I kept everything you wrote, and the entire paragraph written by Juha
that immediately preceded it. If that wasn't sufficient context to
clarify what you were referring to, you should have been more specific.
I certainly had no idea that you were referring to anything that wasn't
mentioned in Juha's paragraph that I quoted.
 
> ... The particular context of my reference was about constructing a
> trivial type in malloc'ed memory without placement new. Prior to C++17
...
> the returned collection.
 
> Either way, Juha's point was that it would be nicer to do literally what
> the programmer says in a case of undefined behaviour.
 
"What the programmers says" when the programmer writes some C++ code is
what the C++ standard says that the code means. When the behavior is
undefined, the standard explicitly fails to say what the code means, so
"literally what the programmer says" becomes meaningless.
 
For example, consider E1 << E2 for a case where the promoted operands
both have signed integral or unscoped enumeration types. In that case,
no operator overloads apply, just the following paragraphs from section 8.8:
 
"... The behavior is undefined if the right operand is
negative, or greater than or equal to the length in bits of the promoted
left operand."
"The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are zero-filled. ... If E1 has a signed type and non-negative value, and
E1 × 2 E2 is representable in the corresponding unsigned type of the
result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined."
 
Note: in the actual standard, "2 E2" is actually a 2 followed by a
superscript E2, formatting that doesn't survive cut-and-paste into my
newsreader.
 
The behavior of E1 << E2 is NOT defined as using the target platform's
native shift instruction (if it even has one for the appropriate integer
type). Only the required behavior is specified, not how that behavior is
to be achieved, and that behavior is not specified when
 
* E2 is negative
* E2 is greater than or equal to the length in bits
* E1 is negative
* E1*2^E2 isn't representable in the corresponding unsigned type.
 
Therefore, despite claims that some people have made to the contrary,
you cannot deduce that an implementation should produce the same results
in any of the undefined cases that would have occurred if it had used
that instruction.
 
> ... It would be
> nicer, but as I subsequently said such a rule is unlikely to happen.
> It is unlikely to happen because compiler writers like to optimize
 
Compiler writers like to optimize because many compiler users like their
code optimized. You might not like any particular optimization, but I
guarantee you that if it is supported, someone does like it. If it's
optional (as many are), you can also deduce that there's someone who
doesn't like it.
 
> and
> more to the point programmers are programming to a virtual machine and
> not to a hardware specification.
 
I agree - that is very much "more to the point".
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 18 06:05PM +0100

On Thu, 18 Jun 2020 12:07:04 -0400
> clarify what you were referring to, you should have been more specific.
> I certainly had no idea that you were referring to anything that wasn't
> mentioned in Juha's paragraph that I quoted.
 
You should have had more than an idea.
 
You cut out this of mine:
 
'Technically, constructing the 'header' object in the malloc'ed buffer
is also reputed to be undefined behaviour if you do it otherwise than
through placement new, even though 'header' is a trivial type.'
 
You cut out this from Juha:
 
'Thinking about it, it might actually have merit to worry about such
things being UB, no matter how "technically" and how obscure the
rule may be.
 
One could easily just think like "who cares if it's "technically" UB?
There's no practical implementation where it would cause anything
else than intended behavior."
 
The problem is, UB allows the compiler to do whatever it wants.
Including not doing what the programmer "intended" for it to do.
Technically speaking if the compiler detects UB, it's allowed to
think "this is UB, I don't need to do anything here, I'll just skip
the whole thing and optimize it all away". Suddenly you might find
yourself with an incredibly obscure "compiler bug" where the compiler
isn't generating the code you wrote... when in fact it's not a
compiler bug at all'
 
which is exactly what I was referring to when I said I agree that is
"the problem with undefined behaviour". As a matter of comprehension,
my reference to "the problem with undefined behaviour" was clearly not
concerned only with his single paragraph of corroborative detail
concerning a null dereference in the linux kernel.
 
Your explanation is implausible.
 
> what the C++ standard says that the code means. When the behavior is
> undefined, the standard explicitly fails to say what the code means, so
> "literally what the programmer says" becomes meaningless.
[more of same snipped]
 
I know all that. I was explicitly commenting on his identification of
"the problem with undefined behaviour". I was not commenting on Juha's
proposal for what to do about it. I have already told you I don't
think that would work.
 
By the way, your time stamps seem to be wrong. Do you want to check
that you have the right time zone set?
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 18 06:20PM +0100

On Thu, 18 Jun 2020 18:05:04 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> "the problem with undefined behaviour". I was not commenting on Juha's
> proposal for what to do about it. I have already told you I don't
> think that would work.
 
In case help with context is needed on this, I hope you will have
deduced that I mean "I was explicitly commenting 'in my posting to Juha
to which you have referred' on his identification ...". I have of
course commented to you on whether his proposal would work or not (and
to him in a different posting): I don't want to start another series of
posts on that aspect which miss a point which would otherwise be
contextually clear.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 18 01:55PM -0400

On 6/18/20 1:05 PM, Chris Vine wrote:
> On Thu, 18 Jun 2020 12:07:04 -0400
> James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
...
> By the way, your time stamps seem to be wrong. Do you want to check
> that you have the right time zone set?
 
The time stamp in the message you were responding to was
 
Date: Thu, 18 Jun 2020 12:07:04 -0400
 
I've got my system set to automatically set the date and time based upon
an external time source - I'm not sure which external time source it's
using - it should be configurable, but I haven't found that
configuration option.
I have the time zone set to EDT, and it doesn't change when I tell it to
use an automatically assigned time zone. When I did a Google for "time
zone", at the top of the page Google identified EDT as the correct time
zone for my area (which it identifies down to the city level by name).
It also says that EDT is GMT-4, which matches the -0400 in my timestamp.
The time of 12:07:04 for that message is consistent with what I remember
about the time when I posted that message.
I'm curious - what specific feature of my time stamp made you think it
might be wrong?
 
What I see on <https://www.time.gov/> is consistent with what my
computer says. Shortly before sending this message, that site said that
the current time is 1:55 PM.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 18 07:14PM +0100

On Thu, 18 Jun 2020 13:55:19 -0400
 
> What I see on <https://www.time.gov/> is consistent with what my
> computer says. Shortly before sending this message, that site said that
> the current time is 1:55 PM.
 
On looking further at the timestamps, they are OK but your posting of
Wed, 17 Jun 2020 09:53:31 -0400 was not injected into the network
until Thu, 18 Jun 2020 12:51:53 -0000, nearly 23 hours later. So I
didn't reply before you sent it as I thought, but nearly 24 hours after
you sent it. You may have a problem with your posting server.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 18 03:45PM -0400

On 6/18/20 2:14 PM, Chris Vine wrote:
>> ...
>>> By the way, your time stamps seem to be wrong. Do you want to check
>>> that you have the right time zone set?
...
> until Thu, 18 Jun 2020 12:51:53 -0000, nearly 23 hours later. So I
> didn't reply before you sent it as I thought, but nearly 24 hours after
> you sent it. You may have a problem with your posting server.
 
Oh yes, now that I know what you're talking about, I understand part of what's
going on. I tried to post two different messages yesterday, both of
which resulted in messages saying that there was an "NNTP error" of some
kind. I told Thunderbird to send them later, and it periodically tried
and failed to send those messages. It didn't actually succeed until
nearly a day after the problem first happened. What I don't know is why
sending of those messages failed.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jun 18 01:10PM -0700

On 6/16/2020 4:24 PM, Chris Vine wrote:
>> thought that POD would be different in a sense.
 
> Yes, although you don't actually need to call the destructor because
> your types are trivial.
 
Okay. For some reason I was thinking that if a ctor is called, then a dtor
must be called, or else the object is still considered to be in a
constructed state. Was wondering about UB if a dtor is not called.
 
 
> std::malloc, placement new your 'header' struct into it and (as you do
> at present) cast the buffer part to char* if you really want char*
> instead of unsigned char* for the buffer.
 
Okay.
 
 
> As I have indicated in those
> posts I think that cast is valid but you never know with C++17/20: if
> the committee don't understand the rules who are we to say.
 
Damn. Just a little rant, joking in a sense: I wonder if they consider
an unsigned char an object that needs to have its ctor called?
 
So I create an array of unsigned char from std::malloc, does each
element need its ctor called? The array form of placement new?
 
 
> As I have also mentioned in other posts I agree with your sentiments
> about PODs/trivial types. That seems to me to be another fail in the
> standard. It ought to be valid in my view, but it isn't.
 
Imvvho, the POD case should be special in a sense. Just like C.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 18 10:12PM +0100

On Thu, 18 Jun 2020 13:10:37 -0700
 
> Okay. For some reason I was thinking if a ctor is called, then a dtor
> must be called, or the object is still considered in a constructed
> state? Was wondering about UB if a dtor is not called.
 
No, an object of a trivial type does not need to have its
(non-existent) destructor called. Its lifetime ends when its storage's
lifetime ends or the storage is reused, by rule.
 
> an unsigned char an object that needs to have its ctor called?
 
> So I create an array of unsigned char from std::malloc, does each
> element needs its ctor called? The array form of placement new?
 
The point here is that std::malloc does not of itself create anything
you can lawfully iterate over (ridiculous I know, but there we are). If
you use malloc to provide storage for an array you have to use array
placement new to establish an array of the type in that memory, whether
trivial (such as your array of char) or non-trivial, post C++17.
Without that you cannot lawfully iterate over it, let alone dereference it.
 
There is however a significant problem with this requirement relating
to array placement new. (Read with P0593 you could barely make it up:
readers may think I am being hard on the curators of the C++ standard
with my various posts on its inadequacies and breakages, but I am not.)
This problem is that the new[] expression is allowed by the standard to
request more memory from operator new[] than the size of the array
constructed in it, in order to accommodate a cookie, say for storing the
array size so that the correct number of destructors can be called by
any subsequent delete[] expression. The delete[] expression is never
applied to objects allocated by placement new - you call destructors by
hand - but the additional storage for cookies is still required. With
g++ under the Itanium ABI this cookie is the size of size_t (8 on
64-bit, 4 on 32-bit) for arrays of non-trivial types, and 0 for arrays of
trivial types. So with arrays of non-trivial types you need a formally
unknowable (but happily ABI-defined though not implementation-defined)
amount of additional storage for the new[] expression, including the
placement new[] expression. You can see where this is going.
 
So if you must use array placement new, add sizeof(size_t) to your
buffer size, and for preference 2 x sizeof(size_t) to take account of
possible padding, and hope for the best.
 
But you should instead do what I suggested. Either use C (see below)
or allocate your memory as 'new unsigned char[sz]' instead of
'std::malloc(sz)'. Then it _is_ an array at its inception and you can
lawfully iterate over it by char. Also the standard explicitly allows
you to construct other objects in the array of unsigned char using
placement new (but to reference that object through the buffer with
reinterpret_cast you have to use std::launder).

> > about PODs/trivial types. That seems to me to be another fail in the
> > standard. It ought to be valid in my view, but it isn't.
 
> Imvvho, the POD case should be special in a sense. Just like C.
 
To be honest, maybe the optimal approach is to put your specialist
allocator in a .c file, compile it with gcc, put an extern C guard
around its headers and call it up in your C++ code. It is then
guaranteed by the C standard to work, save for the still unresolved
problems for C++ about non-arrays in P0593 for which no compiler vendor
in the world is going to break your code, because none of their system
libraries would be usable in C++ if they did and they would disappear
as quickly as Ratner's prawn sandwiches.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jun 18 02:26PM -0700

On 6/18/2020 2:12 PM, Chris Vine wrote:
> This problem is that the new[] expression is allowed by the standard to
> request additional memory of operator new[] than the size of the array
> constructed in it in order to accomodate a cookie, [...]
 
Wait a minute. Sorry for the quick response, working on some other
things right now... However, thank you Chris Vine, I now remember that
the array version of new seems to create a header of its own. It knows
how many dtors to call! Way back, many years ago, I stumbled on a
compiler error. I posted about it. Will try to find the post. Thanks
again. Will have more time tonight to give you a proper response.
 
The error was that MSVC failed to give the original allocation size.
 
[...]
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jun 17 04:51PM -0700

On 6/17/2020 3:30 PM, Bonita Montero wrote:
>> They do for x86. Well, except the acquire and release variants.
 
> They have acquire and release semantics for x86 and ARM
 
Windows can run on PPC, and they have acquire release semantics in their
Interlocked Instructions. Did you ever think about why they have the
acquire release variants for the atomic RMW's, even though they are
useless on x86? Forget about the intrinsic for a moment:
 
https://docs.microsoft.com/en-us/previous-versions/windows/desktop/legacy/ms683594(v=vs.85)
 
On x86 this makes no sense because a LOCK'ed atomic RMW is a full
membar. It can be used for sequential consistency; it's very expensive
but it can work.
 
The Xenon CPUs in Xbox 360s are from IBM and based on PPC.
 
They created the acquire release variants to get around having to use
full membars on every damn atomic RMW.
 
>> atomic RMW's.
 
> My code runs with MSVC and gcc. And both have full membars with their
> CMPXCHG-intrinsics. So I don't know why you complain something here.
Only on x86. It's a bad line of thinking to assume atomic RMW's always
have membars.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jun 17 04:52PM -0700

On 6/17/2020 3:31 PM, Bonita Montero wrote:
 
>>> That are all dead CPUs.
 
>> PPC is dead?
 
> Almost. It is mostly replaced in embedded-hardware by ARM-chips.
 
ARM's can be weakly ordered as well.
 
https://developer.arm.com/docs/100941/0100/the-memory-model
 
It's good to learn how to program for them.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 18 08:13AM +0200

> Did you ever think about why they have the
> acquire release variants for the atomic RMW's, even though they are
> useless on x86? forget about the intrinsic for a moment:
 
That's with the additional intrinsics for ARM.
 
> On x86 this makes no sense ...
 
It would make sense, since acquire behaviour isn't only the physical
behaviour of the CPU but also the logical behaviour of the compiler,
i.e. where it places the instructions before or after the physical
barrier.
 
>> My code runs with MSVC and gcc. And both have full membars with their
>> CMPXCHG-intrinsics. So I don't know why you complain something here.
 
> only on x86.
 
No, on all CPUs.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 18 08:14AM +0200

>> Almost. It is mostly replaced in embedded-hardware by ARM-chips.
 
> ARM's can be weakly ordered as well.
 
I know, but the intrinsics in my code have a full fence on all CPUs.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jun 18 12:46PM -0700

On 6/17/2020 11:13 PM, Bonita Montero wrote:
>> acquire release variants for the atomic RMW's, even though they are
>> useless on x86? forget about the intrinsic for a moment:
 
> That's with the additional intrinsics for ARM.
 
Indeed.
 
 
> behaviour of the CPU but also the logical behaviour of the compiler,
> i.e. where it places the istructions before or after the physical
> barrier.
 
on x86, acquire release is implied wrt atomic loads and stores, and a
full membar is implied for LOCK'ed atomic RMW's.
 
Funny thing on x86: a LOCK'ed atomic RMW on memory that straddles a
cache line will implement a full bus lock. Bad, but can be useful for
certain exotic algorithms.
 
 
>>> CMPXCHG-intrinsics. So I don't know why you complain something here.
 
>> only on x86.
 
> No, on all CPUs.
 
GCC has these:
 
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
 
Which can work well with weak memory models. Wrt pure C++, try to think
about coding it up where the atomic RMW's are all relaxed. Then use
atomic_thread_fence in the right places. Remember, the acquire membar
goes _after_ the atomic RMW for acquire semantics. The release membar
goes _before_ the atomic RMW for release semantics.
 
For simple atomic loads and stores:
 
load acquire, does the load, then the membar.
 
store release, does the membar, then performs the store.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jun 18 12:48PM -0700

On 6/17/2020 11:14 PM, Bonita Montero wrote:
>>> Almost. It is mostly replaced in embedded-hardware by ARM-chips.
 
>> ARM's can be weakly ordered as well.
 
> I know, but the intrinsics in my code have a full fence on all CPUs.
 
That's fine, but it's not ideal. It's like creating an algorithm where
everything is seq_cst. Yes, it works, but it's not ideal wrt performance.
 
A full fence is expensive.
Frederick Gotham <cauldwell.thomas@gmail.com>: Jun 18 03:06AM -0700

This thread is a follow-up to two previous threads:
 
8th June - Change one function pointer in Vtable
 
https://groups.google.com/forum/#!topic/comp.lang.c++/q_QY4zNnLJ4
 
11th June - Working code for runtime Vtable alteration (MS-Windows, Linux)
 
https://groups.google.com/forum/#!topic/comp.lang.c++/ZV80wcglsNU
 
This time around I'm using Linux signals to simulate a hardware interrupt so that the original code in 'main' can remain intact.
 
I start off with the following simple program which prints incrementing numbers to the screen (either in decimal or hexadecimal):
 
 
#include <chrono>   // milliseconds
#include <thread>   // this_thread::sleep_for
#include <iostream> // cout, cin, endl, flush
#include <ios>      // dec, hex
 
using std::cout;
using std::cin;
using std::endl;
 
struct NumberPrinter {
    unsigned i;
    virtual void Print(void) = 0;
};
 
struct DecimalNumberPrinter : NumberPrinter {
    void Print(void) override;
};
 
struct HexadecimalNumberPrinter : NumberPrinter {
    void Print(void) override;
};
 
void DecimalNumberPrinter    ::Print(void) { cout << std::dec << i++ << endl; }
void HexadecimalNumberPrinter::Print(void) { cout << "0x" << std::hex << i++ << endl; }
 
auto main(void) -> int
{
    NumberPrinter *p;
 
    cout << "Enter 1 for Decimal, or 2 for Hexadecimal: " << std::flush;
 
    unsigned choice;
    cin >> choice;
 
    if ( 2 == choice )
        p = new HexadecimalNumberPrinter;
    else
        p = new DecimalNumberPrinter;
 
    p->i = 0;
 
    for ( ; /* ever */ ; )
    {
        p->Print();
 
        std::this_thread::sleep_for(std::chrono::milliseconds(100u));
    }
}
 
 
We can compile this single source file to a full program as follows:
 
g++ -o prog prog.cpp
 
We can run it at the command line, select option (1) for Decimal, and then if we hit Ctrl + C, it kills the program.
 
But instead of compiling it to a full program, we can instead just make an object file:
 
g++ -c prog.cpp
 
and so then we will have an object file "prog.o".
 
Without altering the original source file, I now want to introduce a second source file:
 
 
#include <csignal> /* signal */
#include <cstdint> /* see 'using std::' below */
using std::int32_t;
using std::uint32_t;
using std::uint64_t;
using std::uintptr_t;
 
void Interrupt_Routine(int); /* Defined at the bottom of this file */
 
// The next line will be executed before 'main'
void (*const executed_before_main)(int) = std::signal(SIGINT, Interrupt_Routine);
 
struct NumberPrinter {
    unsigned i;
    virtual void Print(void) = 0;
};
 
struct DecimalNumberPrinter : NumberPrinter {
    void Print(void) override;
};
 
struct HexadecimalNumberPrinter : NumberPrinter {
    void Print(void) override;
};
 
struct VTable {
    void (*funcptr[1u])(void);
};
 
// The following two lines are required function declarations
extern "C" uint32_t sysconf(int32_t name);
extern "C" int32_t mprotect(uint64_t addr, uint64_t len, int32_t prot);
 
void Set_Writeability_Of_Memory(void (**const p)(void), bool const writeable)
{
    uintptr_t const page_size = sysconf( 30 /*_SC_PAGE_SIZE*/ );
 
    union {
        void *p_start_of_page;
        uintptr_t i_start_of_page;
    };
 
    p_start_of_page = p;
 
    i_start_of_page -= (i_start_of_page % page_size);
    mprotect(i_start_of_page, page_size, 1u /*PROT_READ*/ | (writeable ? 2u /*PROT_WRITE*/ : 0u));
}
 
bool Try_Replace_Entry_In_VTable(VTable *const pvtable, void (*const before)(void), void (*const after)(void))
{
    unsigned const how_many_pointers_to_try = 5u;
 
    for (unsigned i = 0; i != how_many_pointers_to_try; ++i)
    {
        if ( before == pvtable->funcptr[i] )
        {
            Set_Writeability_Of_Memory(&pvtable->funcptr[i], true);
            pvtable->funcptr[i] = after;
            Set_Writeability_Of_Memory(&pvtable->funcptr[i], false);
            return true;
        }
    }
 
    return false;
}
 
void Interrupt_Routine(int)
{
    static bool alternator = false;
 
    alternator = !alternator;
 
    DecimalNumberPrinter obj; // This object is needed for its vtable pointer
 
    void (*const address_of_decimal_func)(void)     = reinterpret_cast<void(*)(void)>(&DecimalNumberPrinter    ::Print);
    void (*const address_of_hexadecimal_func)(void) = reinterpret_cast<void(*)(void)>(&HexadecimalNumberPrinter::Print);
 
    void (*const before)(void) = (alternator ? address_of_decimal_func : address_of_hexadecimal_func);
    void (*const after )(void) = (alternator ? address_of_hexadecimal_func : address_of_decimal_func );
 
    // - - - - - - - - Pointer to vtable might be at the beginning of the object
 
    if ( sizeof(obj) < sizeof(void(*)(void)) )
        return;
 
    VTable *pvtable = reinterpret_cast<VTable*>( *reinterpret_cast<void**>(&obj) );
 
    if ( Try_Replace_Entry_In_VTable(pvtable, before, after) )
    {
        return;
    }
 
    // - - - - - - - - Or let's try at the end
 
    if ( sizeof(obj) < sizeof(void(*)(void)) + 1u )
        return;
 
    pvtable = reinterpret_cast<VTable*>( reinterpret_cast<char*>(&obj + 1u) - sizeof(void*) );
 
    // Watch out for unaligned memory access in the next line
    if ( Try_Replace_Entry_In_VTable(pvtable, before, after) )
    {
        return;
    }
 
    // - - - - - - - - Or at the end but before the final padding bytes
 
    if ( sizeof(obj) < sizeof(void(*)(void)) + 2u )
        return;
 
    for (unsigned i = 0; i != (sizeof(obj) - 1u - sizeof(void(*)(void))); ++i)
    {
        pvtable = reinterpret_cast<VTable*>( reinterpret_cast<char*>(pvtable) - 1u );
 
        // Watch out for unaligned memory access in the next line
 
        if ( Try_Replace_Entry_In_VTable(pvtable, before, after) )
        {
            return;
        }
    }
}
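For comparison, the same write-enable step written against the real headers rather than hand-rolled declarations might look like this. A sketch assuming Linux/glibc; this is not from the original post:

```cpp
#include <sys/mman.h>  // mprotect, PROT_READ, PROT_WRITE
#include <unistd.h>    // sysconf, _SC_PAGE_SIZE
#include <cstdint>     // uintptr_t

void Set_Writeability_Of_Memory(void (**const p)(void), bool const writeable)
{
    uintptr_t const page_size = static_cast<uintptr_t>(sysconf(_SC_PAGE_SIZE));

    uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    addr -= addr % page_size;  // round down to the start of the page

    mprotect(reinterpret_cast<void*>(addr), page_size,
             PROT_READ | (writeable ? PROT_WRITE : 0));
}
```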
 
 
Now if we compile these two source files to object files:
 
g++ -c prog.cpp
g++ -c improviser.cpp
 
We can then create a full program:
 
g++ -o prog prog.o improviser.o
 
If we run this program at the command line and select (1) for Decimal, then every time we hit Ctrl + C it alternates between decimal and hexadecimal.
 
What I've achieved here is that I don't have to edit the object files of the original program -- which, I realise, isn't quite the same as not having to edit the original program's executable binary -- but it's certainly a step in the right direction.
 
Next, what I'll have to try to do is compile the original program:
 
g++ -o prog prog.cpp
 
And then somehow combine this executable binary with another object file (which won't be straightforward because the program's entry point has changed).