Wednesday, November 9, 2022

Digest for comp.lang.c++@googlegroups.com - 12 updates in 3 topics

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 08 03:39PM -0800

On 11/8/2022 3:29 AM, Juha Nieminen wrote:
> of, and genuinely made programming easier. C++14 and C++17 fixed and patched
> many of the minor problems and defects that turned out to exist in C++11,
> so C++17 felt like "what C++11 should have been in the first place".
[...]
 
I was really excited and happy when C++ finally made atomics and membars
part of the actual standard, C++11 iirc. Before that, I would have to
code these things up in assembly language.
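 
A minimal sketch of the kind of thing that became expressible in portable
C++11 - a release/acquire pair standing in for the hand-written barriers
(the names here are purely illustrative):
 
#include <atomic>
 
std::atomic<bool> ready{false};
int payload = 0;
 
void producer()
{
    payload = 42;                                  // plain store
    ready.store(true, std::memory_order_release);  // publishes payload
}
 
bool consumer(int& out)
{
    if (ready.load(std::memory_order_acquire)) {   // pairs with the release
        out = payload;                             // guaranteed to see 42
        return true;
    }
    return false;
}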
"Öö Tiib" <ootiib@hot.ee>: Nov 09 12:22AM -0800

On Tuesday, 8 November 2022 at 15:26:21 UTC+2, Stuart Redmann wrote:
> > std::vector is magical class now.
> Could you elaborate this? Or maybe post a link? A quick web search yields
> nothing usable :-/
 
Before C++11 an element of vector had to be CopyAssignable and
CopyConstructible and that was constraining but doable.
 
Since C++11 an element only has to be Erasable, with the remaining constraints
split between the individual member functions. At the same time the
object-lifetime rules grew murkier and more complicated, so it was unclear how
the requirements could be met without magic when only Erasable is required.
In C++14 it still felt fixable by correcting some problems.
 
C++17, however, added explicit undefined behavior in nasty places (which
compilers can exploit in optimizations) and so made it quite explicit that
what is required takes magic. The std::launder() that was added looks from
afar as if it might help, but on closer inspection it does not. For example,
P0532 by Nicolai Josuttis elaborates on how launder does not help at all,
using vector as the example.
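 
As a concrete illustration of the per-operation requirements (a sketch, not
taken from the posts or the paper): since C++11 a move-only type is a
perfectly valid element, as long as only the operations it supports are used.
 
#include <memory>
#include <vector>
 
int main()
{
    // std::unique_ptr<int> is not copyable, so the old blanket
    // CopyConstructible/CopyAssignable requirement would have rejected it.
    std::vector<std::unique_ptr<int>> v;
    v.push_back(std::make_unique<int>(1));   // needs MoveInsertable
    v.emplace_back(new int(2));              // needs EmplaceConstructible
    // auto v2 = v;                          // would not compile: needs CopyInsertable
}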
David Brown <david.brown@hesbynett.no>: Nov 09 09:29AM +0100

On 09/11/2022 00:39, Chris M. Thomasson wrote:
>> patched
>> many of the minor problems and defects that turned out to exist in C++11,
>> so C++17 felt like "what C++11 should have been in the first place".
 
I agree with that.
 
A challenge for C++ is that even when a new and better feature is added,
the older and clumsier methods still have to be supported. This also
means that syntax can be awkward because it can't conflict with existing
syntax, and the details get more complex all the time.
 
 
> I was really excited and happy when C++ finally made atomics and membars
> part of the actual standard, C++11 iirc. Before that, I would have to
> code these things up in assembly language.
 
Standard atomics would be great if they worked for my targets. The gcc
implementations (and I haven't seen any others) for "advanced" use
(read-modify-write, or sizes larger than a standard register) are
completely broken for single-core systems, and even on multi-core
systems it is limited if you use thread priorities. The trouble with
them is that no one has addressed the elephant in the room - in general,
you need OS support and locks to implement large atomics.
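 
A quick way to see whether a given target falls into that category is to ask
the compiler directly; a small sketch (the 16-byte struct is just an
illustrative worst case):
 
#include <atomic>
#include <cstdint>
#include <cstdio>
 
struct Wide { std::uint64_t lo, hi; };   // larger than a native register
 
int main()
{
    std::atomic<std::uint32_t> small{0};
    std::atomic<Wide> wide{Wide{0, 0}};
 
    // false means the operations go through the library's locks.
    std::printf("32-bit lock-free:  %d\n", (int)small.is_lock_free());
    std::printf("128-bit lock-free: %d\n", (int)wide.is_lock_free());
    std::printf("always lock-free (C++17, compile time): %d\n",
                (int)std::atomic<Wide>::is_always_lock_free);
}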
Sam <sam@email-scan.com>: Nov 09 07:59AM -0500

Juha Nieminen writes:
 
> C++20, however, doesn't feel like this anymore. It has a few new features
> that genuinely help in programming, but most of it feels like just adding
> features for the sake of adding them. C++23 even moreso.
 
Or the features were specifically added to make sucky operating systems suck
a little less. Specifically: co-routines. Microsoft hijacked the
standardization process to push through co-routines, because real multiple
execution threads on MS-Windows blow chunks, and the OS can only implement
co-routines in a passable manner.
scott@slp53.sl.home (Scott Lurndal): Nov 09 02:39PM


> The trouble with
>them is that no one has addressed the elephant in the room - in general,
>you need OS support and locks to implement large atomics.
 
IFF the target architecture doesn't have a comprehensive set of
atomic access instructions, perhaps.
 
ARMv8 LSE, for example, has individual instructions for most of the
gcc atomic intrinsics (e.g. __sync_fetch_and_add will generate a single
LDADD atomic instruction). The instructions support the common
arithmetic operations (add, or, etc).
 
Before LSE, the ARMv8 implementations were built using the ARM
LL/SC equivalent (load exclusive/store exclusive) instructions.
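 
For reference, a small sketch using the __atomic builtins (the modern
counterparts of the __sync ones); with LSE enabled (e.g. -march=armv8.1-a)
GCC can compile this to a single LDADD-family instruction - LDADDAL for the
sequentially consistent ordering used here:
 
#include <cstdint>
 
std::uint32_t counter;
 
std::uint32_t bump()
{
    // Returns the previous value, like __sync_fetch_and_add(&counter, 1u).
    return __atomic_fetch_add(&counter, 1u, __ATOMIC_SEQ_CST);
}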
David Brown <david.brown@hesbynett.no>: Nov 09 05:05PM +0100

On 09/11/2022 15:39, Scott Lurndal wrote:
> arithmetic operations (add, or, etc).
 
> Before LSE, the ARMv8 implementations were built using the arm
> LL/SC equivalent (load exclusive/store exclusive) instructions.
 
 
You are more familiar with the details of these things than most people,
so I hope you (or someone else) will correct me if my logic below is wrong.
 
 
There's no problem when the target has a single unbreakable instruction
for the action. And LL/SC are fine for atomic loads or stores of
different sizes.
 
But LL/SC is not sufficient for read-modify-write sequences of a size
larger than can be handled by a single atomic instruction.
 
Imagine you have a processor that can atomically read or write an
unsigned integer type "uint". Your sequence for "uint_inc" will be:
 
retry:
load link x = *p
x++
if (store conditional *p = x fails) goto retry
 
 
If two processes try this, they can interleave and be started or stopped
without trouble - the result will be an atomic increment.
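 
In portable C++ that retry shape is what a compare_exchange_weak loop spells
out; a minimal sketch (on LL/SC targets the compiler typically maps it onto
exactly the sequence above):
 
#include <atomic>
 
void uint_inc(std::atomic<unsigned>& a)
{
    unsigned expected = a.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail spuriously (the SC failing), so loop.
    while (!a.compare_exchange_weak(expected, expected + 1)) {
        // on failure, expected has been reloaded with the current value
    }
}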
 
Now consider a double-sized type containing two "uint" fields:
 
retry:
load link x_lo = *p
x_hi = *(p + 1)
x_lo++
if (!x_lo) x_hi++
if (store conditional *p = x_lo fails) goto retry
*(p + 1) = x_hi
 
If the process executing this is stopped after the first write, and a
second process is run that calls a similar function, then the new
process will see a half-changed value for the object resulting in a
corrupted object. Resumption of the first process will half-change the
value again. Different combinations of using "store_conditional" on the
two stores will result in similar problems.
 
The only way to make a multi-unit RMW operation work is if other
processes are /blocked/ from breaking in during the actual write
sequence. Reads and the calculation can be retried, but not the
writes - they must be made an unbreakable sequence. And that, in
general, means a lock and OS support to ensure that the locking process
gets to finish.
 
 
The gcc implementation of atomic operations (larger than can be handled
with a single instruction) uses simple user-space spin locks (the lock
can be accessed atomically - with an LL/SC sequence, for the ARM).
 
If one process tries to access the atomic while another process has the
lock, it will spin - running a busy wait loop. As long as these
processes are running on different cores, there's no problem with one
core running a few rounds of a tight loop while another core does a
quick load or store. Given that contention is rare and cores are often
plentiful, this results in a very efficient atomic operation. But it
can deadlock - a process could take the spin lock and then get
descheduled by the OS, and other threads wanting the lock could be
activated. If these fill up the cores (maybe you have multiple threads
all using the same supposedly lock-free atomic structure), you are screwed.
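 
A minimal sketch of that kind of user-space spin lock (illustrative only, not
libatomic's actual code): it works well while the owner keeps running on
another core, but it offers no forward-progress guarantee if the owner is
descheduled or interrupted.
 
#include <atomic>
 
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock()
    {
        while (locked.exchange(true, std::memory_order_acquire)) {
            // busy-wait; a real implementation might pause or yield here
        }
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};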
 
And if you have only one core (like almost all microcontrollers), and
the thread that has the lock is interrupted by an interrupt routine that
wants to access the same atomic variable, you are /really/ screwed.
This can happen with such simple code as a 64-bit atomic counter in an
interrupt routine that is also accessed atomically from a background task.
 
 
It's very unlikely that you'll hit a problem, but it is possible. To
me, that is useless - atomics need guaranteed forward progress. That
means the std::atomic<> stuff needs to use OS-level locks for advanced
cases that can't be handled directly by instructions or LL/SC sequences,
or for a microcontroller you'd want to disable interrupts around the
access. The alternative is to refuse to compile the operations and only
support atomics that are smaller or simpler.
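 
A minimal sketch of that single-core workaround, assuming hypothetical
disable_irq()/enable_irq() hooks (on a Cortex-M these would be the CMSIS
__disable_irq()/__enable_irq() intrinsics):
 
#include <cstdint>
 
extern void disable_irq();   // placeholder for the platform's intrinsic
extern void enable_irq();
 
volatile std::uint64_t tick_count;   // shared with an interrupt handler
 
std::uint64_t read_ticks()           // called from the background task
{
    disable_irq();                   // the ISR cannot see or cause a torn value
    std::uint64_t t = tick_count;
    enable_irq();
    return t;
}
 
void tick_isr()                      // the interrupt handler itself
{
    tick_count = tick_count + 1;     // not preempted by the task on a single core
}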
scott@slp53.sl.home (Scott Lurndal): Nov 09 05:58PM


>There's no problem when the target has a single unbreakable instruction
>for the action. And LL/SC are fine for atomic loads or stores of
>different sizes.
 
Here's the code generated by GCC for
 
q = __sync_fetch_and_add(&q, 1u);
 
 
Without LSE (atomics) support:
 
401034: 885ffc60 ldaxr w0, [x3]
401038: 11000401 add w1, w0, #0x1
40103c: 8804fc61 stlxr w4, w1, [x3]
401040: 35ffffa4 cbnz w4, 401034 <main+0x34>
 
 
With LSE (atomics) support:
 
12c: b8e10001 ldaddal w1, w1, [x0]
 
> if (!x_lo) x_hi++
> if (store conditional *p = x_lo fails) goto retry
> *(p + 1) = x_hi
 
For such sequences, one uses LL/SC to implement a spinlock:
acquire the spinlock, perform the non-atomic operation,
and release the spinlock. On uniprocessor systems,
alternate mechanisms like disabling interrupts are the
common solution.
 
Although in this case, using a wider type if available is a
better option.
 
>descheduled by the OS, and other threads wanting the lock could be
>activated. If these fill up the cores (maybe you have multiple threads
>all using the same supposedly lock-free atomic structure), you are screwed.
 
This is a typical priority inheritance problem.
 
 
>And if you have only one core (like almost all microcontrollers), and
>the thread that has the lock is interrupted by an interrupt routine that
>wants to access the same atomic variable, you are /really/ screwed.
 
To be fair, the programmer should be aware of these issues and not
use mechanisms subject to deadlock. As noted above, the typical
solution is to disable interrupts during a critical section.
 
 
 
>It's very unlikely that you'll hit a problem, but it is possible.
 
Famous last words, indeed.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 09 11:47AM -0800

On 11/9/2022 12:29 AM, David Brown wrote:
> systems it is limited if you use thread priorities.  The trouble with
> them is that no one has addressed the elephant in the room - in general,
> you need OS support and locks to implement large atomics.
 
Are you referring to double-width compare-and-swap (DWCAS)? C++ should
be able to handle it directly using the processor's instruction set. Take
C++ on a modern 64-bit x64 system: CMPXCHG16B should be used for a double
word. Double word in the sense that they are two _contiguous_ words. In
other words, a lock-free CAS of a double word on 64-bit x64 should use
CMPXCHG16B.
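 
A minimal sketch of that double-word case with std::atomic (the names are
illustrative); whether the compare-exchange is inlined as CMPXCHG16B or
routed through a library call depends on the compiler and on flags such as
-mcx16:
 
#include <atomic>
#include <cstdint>
 
// Two contiguous 64-bit words - the "double word" being discussed.
struct alignas(16) Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};
 
std::atomic<Pair> p{Pair{0, 0}};
 
bool bump_lo()
{
    Pair expected = p.load();
    Pair desired{expected.lo + 1, expected.hi};
    // This is the operation CMPXCHG16B implements on x86-64.
    return p.compare_exchange_strong(expected, desired);
}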
scott@slp53.sl.home (Scott Lurndal): Nov 09 08:43PM

>double word. Double word in the sense that they are two _contiguous_
>words. In other words, a lock-free CAS of a double word on a 64 bit x64
>should use CMPXCHG16B.
 
David works with low-end embedded processors, as I understand it, with
limited and/or restricted instruction sets.
olcott <polcott2@gmail.com>: Nov 09 01:12PM -0600

void D(void (*x)())
{
  H(x, x);
}
 
If H correctly simulates D would the simulated D ever stop running
without being aborted?
 
A PhD computer scientist seems to believe that (ignoring stack overflow)
D correctly simulated by H would stop running and terminate normally
without having its simulation aborted.
 
*Is he correct or just trolling me*
 
--
Copyright 2022 Pete Olcott "Talent hits a target no one else can hit;
Genius hits a target no one else can see." Arthur Schopenhauer
Michael S <already5chosen@yahoo.com>: Nov 08 03:37PM -0800

> > implementations on all x86-64 systems that I ever tested.
> The MinGW (that is, MinGW64) g++ implementation was not reasonable, and
> probably isn't today either.
 
IMO, you are confused.
From the very beginning, mingw64 std::chrono::steady_clock has been based on
QueryPerformanceCounter().
It's not the best possible clock available on Windows, but certainly not broken.
It has sub-microsecond resolution. Typically ~331ns.
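 
A small sketch for checking this on a given installation; the period below is
only the nominal tick of the representation, while the observed step comes
from sampling until the value changes:
 
#include <chrono>
#include <cstdio>
 
int main()
{
    using clock = std::chrono::steady_clock;
 
    std::printf("nominal period: %lld/%lld s\n",
                (long long)clock::period::num, (long long)clock::period::den);
 
    auto t0 = clock::now();
    auto t1 = t0;
    while (t1 == t0) t1 = clock::now();   // busy-wait for the next tick
    std::printf("observed step: %lld ns\n",
                (long long)std::chrono::duration_cast<
                    std::chrono::nanoseconds>(t1 - t0).count());
}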
 
Most likely you saw that std::chrono::high_resolution_clock was broken (and
that one is indeed broken), remembered that *something* was broken,
but forgot what was broken and what was not.
 
 
> PS: I'm sorry that I by mistake sent essentially this reply via mail,
> presumably to an unused spam-dump e-mail address, but. Not sure how
> Thunderbird managed to trick me into doing that.
 
That's o.k.
This mail address is not totally unused, but it is used maybe once or
twice per year.
"Öö Tiib" <ootiib@hot.ee>: Nov 09 06:14AM -0800

On Wednesday, 9 November 2022 at 01:37:23 UTC+2, Michael S wrote:
 
> Most likely you had seen std::chrono::high_resolution_clock broken (and
> this one is broken indeed) and remembered that *something* is broken,
> but forgot what is broken and what not.
 
From steady_clock we are only supposed to get the time since some arbitrary
point in the past, not the years and months that the OP had. That is useful
only for checking the durations of work or the time-ordering of events of
(threads of) the current process. If used as (parts of) file names (as the OP
seemed to want), the ordering and the time distances become wrong after a
restart of the system.
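 
A small sketch of the distinction: system_clock tracks calendar time and is
what a time-stamped file name needs, while steady_clock only measures elapsed
time within the current run.
 
#include <chrono>
#include <cstdio>
#include <ctime>
 
int main()
{
    // Calendar time, survives restarts - suitable for naming files.
    std::time_t now = std::chrono::system_clock::to_time_t(
        std::chrono::system_clock::now());
    char name[64];
    std::strftime(name, sizeof name, "log_%Y%m%d_%H%M%S.txt",
                  std::localtime(&now));
    std::printf("%s\n", name);
 
    // Monotonic time since an arbitrary epoch - only meaningful for durations
    // and for ordering events within the current boot/process.
    auto t0 = std::chrono::steady_clock::now();
    auto t1 = std::chrono::steady_clock::now();
    std::printf("elapsed: %lld ns\n",
                (long long)std::chrono::duration_cast<
                    std::chrono::nanoseconds>(t1 - t0).count());
}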