soft and program: Digest for comp.lang.c++@googlegroups.com

Thread-safe initialization of static objects - 25 Updates

Michael S <already5chosen@yahoo.com>: Sep 21 02:59AM -0700

On Thursday, September 21, 2023 at 12:03:01 PM UTC+3, Bonita Montero wrote:

> > Actually, considering that contention is very rare in practice, ...

> The initialization might last an arbitrary time, so spinning
> is unacceptable.

Why is it unacceptable? The impact of a needless wake-up of the waiting thread
once per tick is negligible in terms of system performance and not too
horrible in terms of power consumption. The impact of a needless sleep until the
next tick is also insignificant, under the assumption that initialization took a long time.

I'd call it non-ideal when better solutions are available, but perfectly acceptable otherwise.

> > So even use of Futex | Critical Section | SRW lock, although it makes
> > a lot of sense in practice, is not necessary from a theoretical P.O.V.
> Why should an SRW lock make sense here?

Because an SRW lock in exclusive mode is a better critical section than a critical
section. On newer versions of Windows, critical sections exist purely for
backward compatibility.
That's what I was told unofficially by a knowledgeable Microsoft guy.

> If the object isn't initialized
> it can't be read meanwhile. And a futex isn't a replacement for a mutex
> but just a faster slow path for a mutex.

Are you sure?
My impression was that the fast path (the uncontended case) is also faster due
to the absence of syscall overhead.

David Brown <david.brown@hesbynett.no>: Sep 21 01:19PM +0200

On 21/09/2023 11:59, Michael S wrote:
> horrible in terms of power consumption. The impact of a needless sleep until the
> next tick is also insignificant, under the assumption that initialization took a long time.

> I'd call it non-ideal when better solutions are available, but perfectly acceptable otherwise.

I don't know why Bonita dislikes spinning and sleeping, but I know why
/I/ don't like it as a solution. Sleeping does not affect thread
priority in the way mutexes do. That means if one low priority thread
starts the initialisation and takes the lock, then gets pre-empted by a
higher priority thread that tries to take the lock, your whole system
can deadlock. (The high priority thread may be sleeping, but there may
be a mid-priority thread that can run and blocks the low-priority thread
from running.)

This is going to be a more likely failure situation if you have fewer
(or only one) cores. But the same compilers and toolchains are used for
a wide variety of systems - you can't pick a solution that might fail on
some systems. And note that these kinds of failures never turn up in
testing - they are rare, and only happen at the worst possible moment
after deployment.

This is particularly a risk for real-time systems and embedded systems,
where you are more likely to have fewer cores (perhaps only one), or
maybe some of your cores are dedicated to real-time tasks and
unavailable for other threads. You will also likely have high priority
and real-time priority threads, which will block the low-priority
threads. And on such systems there might be very serious costs or
safety risks involved in system hangs - you want to be sure that they
are not possible, to the best of practical abilities, rather than
designing in a solution that has a definite non-zero chance of failure.

Using mutexes avoids this problem, because if a high priority thread is
waiting on a mutex held by a low priority thread, the holder's priority
gets boosted to that of the high priority thread until it releases the lock.
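
The boost described above is the behaviour of a priority-inheritance mutex. On POSIX systems it typically has to be requested explicitly, since the default mutex protocol is PTHREAD_PRIO_NONE. A minimal sketch, assuming a POSIX threads implementation that supports PTHREAD_PRIO_INHERIT; the helper name `init_priority_inheriting_mutex` is hypothetical:

```cpp
#include <pthread.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper: initialise `m` as a priority-inheritance mutex.
// Returns 0 on success, or an errno-style code (e.g. ENOTSUP) on failure.
int init_priority_inheriting_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;

    // With PTHREAD_PRIO_INHERIT, the owner temporarily runs at the priority
    // of the highest-priority thread blocked on this mutex.
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A plain `std::mutex` gives no such guarantee by itself; whether it inherits priority depends on the platform underneath.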

Spinning should only be used when you have full control of the
priorities of threads that may take the spin lock, and can be sure it is
safe.

Richard Damon <Richard@Damon-Family.org>: Sep 21 07:36AM -0400

On 9/20/23 11:42 PM, Bonita Montero wrote:
>> other threads get a chance to run, and then check the initialization
>> status again.

> A yield would be too inefficient.

Why?

You can't do anything until the object you are waiting to be initialized
gets initialized, so your "efficiency" isn't important.

Remember, the ONLY reason you need to wait is because you came to an
object that is actively in the process of being initialized, so a fairly
rare condition.

If it hasn't been initialized yet, you don't wait, you just claim the
initialization and go, or if the object's initialization is complete, you
just use it.

Also, Yield can't be less efficient than blocking on a Mutex, as the
implementation of the mutex blocking needs to yield too.

As has elsewhere been mentioned, you typically wait a little bit in a
spin wait to see if it will be quick or slow, so "Yield" is only done on
the "Slow" path, and it must be ok to be slow on the slow path, as it is
the slow path.
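
The claim / spin briefly / yield scheme described in this post can be sketched roughly as follows. This is an illustration only, not what any particular toolchain does; `InitState`, `init_once_spin_yield`, the spin limit of 100 and the demo function are all hypothetical, and an initializer that throws is not handled.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

enum class InitState : int { Uninitialized, InProgress, Done };

// Hypothetical helper: runs `init` exactly once per `state` object.
template <class Init>
void init_once_spin_yield(std::atomic<InitState>& state, Init init,
                          int spin_limit = 100) {
    // Fast path: already initialized, just use the object.
    if (state.load(std::memory_order_acquire) == InitState::Done) return;

    // Not started yet: try to claim the initialization.
    InitState expected = InitState::Uninitialized;
    if (state.compare_exchange_strong(expected, InitState::InProgress,
                                      std::memory_order_acquire)) {
        init();
        state.store(InitState::Done, std::memory_order_release);
        return;
    }

    // Another thread claimed it: spin a little, then yield on the slow path.
    int spins = 0;
    while (state.load(std::memory_order_acquire) != InitState::Done) {
        if (++spins > spin_limit) std::this_thread::yield();
    }
}

// Demo: many threads race, and the function reports how often `init` ran.
int count_initialisations(int thread_count) {
    assert(thread_count > 0);
    std::atomic<InitState> state{InitState::Uninitialized};
    std::atomic<int> runs{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(
            [&] { init_once_spin_yield(state, [&] { ++runs; }); });
    for (auto& t : threads) t.join();
    return runs.load();
}
```

Note that this sketch has exactly the weakness discussed elsewhere in the thread: if the initializing thread is preempted, the waiters keep polling until it runs again.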

Richard Damon <Richard@Damon-Family.org>: Sep 21 07:42AM -0400

On 9/21/23 4:03 AM, Bonita Montero wrote:
> return c; } ) == end( mem );
> cout << (zero ? "all zero" : "has non-zero") << endl;
> }

As it must by the standard.

Note, depending on the OS and implementation, that zeroing will either
be done by having the loaded program image just contain a great big
block of zeros, or (to save image size at the cost of CPU time) the
system might just put "static" objects that need to be zero initialized
into one common segment and zero-fill as an extension of the loading
process, or in some cases the OS can just promise that new segments for
the program are just always zero-filled.
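
A small check in the spirit of the quoted test program, as a sketch: an object with static storage duration and no initializer is zero-initialized before any other initialization takes place. The names `zero_buffer` and `is_all_zero` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Static storage duration, no initializer: the standard requires
// zero-initialization. How the zeros get there (image contents, loader
// zero-fill, fresh zero pages from the OS) is up to the implementation.
static unsigned char zero_buffer[1 << 16];

// Hypothetical helper: true if all `n` bytes starting at `p` are zero.
bool is_all_zero(const unsigned char* p, std::size_t n) {
    assert(p != nullptr);
    return std::all_of(p, p + n, [](unsigned char c) { return c == 0; });
}

bool static_buffer_is_all_zero() {
    return is_all_zero(zero_buffer, sizeof zero_buffer);
}
```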

Michael S <already5chosen@yahoo.com>: Sep 21 05:01AM -0700

On Thursday, September 21, 2023 at 2:19:59 PM UTC+3, David Brown wrote:

> Spinning should only be used when you have full control of the
> priorities of threads that may take the spin lock, and can be sure it is
> safe.

The problem you mention does not apply to general-purpose systems
like Windows, Linux, BSD-lineage Unixes, Solaris, etc. They all have
built-in avoidance of deadlocks caused by priority inversion. Most
commonly it's done by applying random priority boosts.

Also, I don't see how mutex semantics can possibly help. The common
problem scenario is that the low-priority thread doing the initialization is
preempted when it *does not* hold the mutex. The mutex is held for
two very short periods while the flags are updated. The preemption is far more
likely to happen during the middle phase, the constructor itself.

On a real-time system with 1 or 2 processors you probably want a
completely different design in which a mutex is held for the whole duration
of initialization, but you can't expect such a solution to be included
in g++/clang++/MSVC compiler support libraries.

Which again brings us to the point made by myself and a few others:
don't do it! Don't use function-local static objects with constructors.
Or at least don't use them in multitasking scenarios. The compiler's
one-time initialization infrastructure can be super-robust, but being generic,
it's unlikely to be the optimal answer in any concrete scenario.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 21 02:34PM +0200

Am 21.09.2023 um 11:59 schrieb Michael S:

> Why is it unacceptable? ...

Because the initializing thread might be scheduled away, thereby
keeping other threads spinning.

> I'd call it non-ideal when better solutions are available, but perfectly acceptable otherwise.

It's unacceptable.

> Because an SRW lock in exclusive mode is a better critical section than a critical
> section. ...

No, if you only lock exclusively there's no difference.

> Are you sure?

The fast path of mutexes is fast anyway, there's no need for further
speed-ups. The futex accelerates the slow path and is a replacement
for the binary semaphore attached to a mutex.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 21 02:37PM +0200

Am 21.09.2023 um 13:19 schrieb David Brown:

> I don't know why Bonita dislikes spinning and sleeping, ...

I only dislike spinning in userspace, except when there's a
limit after which it goes to the slow path. The thread holding a spinlock
can be scheduled away for an arbitrary time, and that's unacceptable.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 21 02:40PM +0200

Am 21.09.2023 um 13:36 schrieb Richard Damon:

> Why?

Because the initialization might only take a microsecond or less
and a yield is a full timeslice, on Linux and Windows usually one
millisecond.

David Brown <david.brown@hesbynett.no>: Sep 21 03:03PM +0200

On 21/09/2023 14:01, Michael S wrote:
> like Windows, Linux, BSD-lineage Unixes, Solaris, etc. They all have
> built-in avoidance of deadlocks caused by priority inversion. Most
> commonly it's done by applying random priority boosts.

As far as I know, such random priority boosts will not boost a
non-realtime priority thread to real-time priority on Linux (and
presumably not on other systems). That would negate the whole concept
of realtime priority threads.

> completely different design in which a mutex is held for the whole duration
> of initialization, but you can't expect such a solution to be included
> in g++/clang++/MSVC compiler support libraries.

Yes, you would need the mutex to be held during critical stages of the
initialisation.

I can understand that this would be expensive compared to spinlocks that
would work fine for "normal" systems (especially with the random boosts
you described). And I understand that toolchains have to be optimised
for normal situations, not things that are extremely rare even in
particular niche situations.

But getting threading, locking and synchronisation details right is very
difficult - most programmers don't get it right. But you can't tell
that you have a problem by testing the code, as failures typically
require extreme bad luck in timing. So it always worries me when I see
something that is hidden, which most programmers will assume "just
works, by some compiler magic", but which can be a problem in some
circumstances.

(As another example, gcc's "libatomic" uses spinlocks - if you use
std::atomic types on a microcontroller in situations where atomics would
be useful, they will hang your system any time there is a coincidental
access.)
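
One way to find out at compile time whether a given std::atomic specialization might fall back to a lock-based implementation is `std::atomic<T>::is_always_lock_free` (C++17). A minimal sketch; the `Big` type and the `may_need_lock` helper are hypothetical names for this example, and what the fallback actually does is implementation-specific:

```cpp
#include <atomic>

// A trivially copyable type that is wider than any atomic instruction on
// mainstream hardware, so std::atomic<Big> cannot be lock-free.
struct Big {
    char bytes[64];
};

// Hypothetical helper: true if std::atomic<T> is not guaranteed lock-free
// on this target, i.e. operations on it may take a hidden lock.
template <class T>
constexpr bool may_need_lock() {
    return !std::atomic<T>::is_always_lock_free;
}

// Guarding a design assumption: fail the build instead of hanging at run time.
static_assert(may_need_lock<Big>(),
              "expected std::atomic<Big> to need a lock-based fallback");
```

On a target where hidden locks are dangerous, the opposite assertion, `static_assert(!may_need_lock<T>())`, can be placed next to each atomic type that is actually used.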

> Or at least don't use them in multitasking scenarios. The compiler's
> one-time initialization infrastructure can be super-robust, but being generic,
> it's unlikely to be the optimal answer in any concrete scenario.

Yes, that is one option. You can also use "constinit" statics safely.
And gcc has a "-fno-threadsafe-statics" option (also useable as a
pragma, but unfortunately not as a neater __attribute__) that puts the
responsibility of ensuring things are threadsafe onto the user.

David Brown <david.brown@hesbynett.no>: Sep 21 03:05PM +0200

On 21/09/2023 14:40, Bonita Montero wrote:

> Because the initialization might only take a microsecond or less
> and a yield is a full timeslice, on Linux and Windows usually one
> millisecond.

"Yield" won't take a timeslice unless there is something else of the
same priority, waiting to run on the same core. And if that's the case,
then fine - that thread is clearly equally important.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 21 04:17PM +0200

Am 21.09.2023 um 15:05 schrieb David Brown:

> "Yield" won't take a timeslice unless there is something else of the
> same priority, waiting to run on the same core. ...

Ok, you're right, but the code would still be unacceptable under load.

Michael S <already5chosen@yahoo.com>: Sep 21 07:30AM -0700

On Thursday, September 21, 2023 at 4:03:46 PM UTC+3, David Brown wrote:
> non-realtime priority thread to real-time priority on Linux (and
> presumably not on other systems). That would negate the whole concept
> of realtime priority threads.

If you have enough ready real-time threads to occupy all CPUs for an
unacceptably long time then, indeed, you have a problem.
But to me it looks unlikely that deadlock due to priority inversion is your
main problem in such a scenario.

> I can understand that this would be expensive compared to spinlocks that
> would work fine for "normal" systems (especially with the random boosts
> you described).

What's advocated here by most posters, and what seems to be implemented
by major toolchains, is not a choice between spinlocks and one mutex or a pair of mutexes
or equivalents. Spinlocks are merely something that we mention to argue against
Bonita's suggestion that the C++ Standard in its current form can be impossible
to implement.
Bonita aside, the real choice is between 4 options:
(A) Mutex per object. Held for the whole duration of the constructor.

(B) Serializing all initialization of function-local static objects with one global mutex.
Again, the mutex is held throughout the whole duration of the constructor.

(C) A few flags and possibly a little more auxiliary state (like the thread ID in my
proposed solution) per object. Access to the flags is guarded by a global mutex,
but the initialization itself is not guarded. There are several possible solutions for
waking up contending threads that are waiting for initialization to finish. Among such
solutions, spinlocks, either classical or amended by a sleep() call, are possible but
least likely on full-featured OSes. The most natural solution [on a f-f OS] is to wait on
a per-object condition variable with the global mutex.

(D) Creative combinations of A and B, e.g. a small pool of mutexes distributed
dynamically between objects. As in A and B, the mutex is held throughout the whole duration
of the constructor.

All major toolchains appear to use variants of (C)
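
A rough sketch of what option (C) could look like, assuming standard C++ threading primitives. This is not the code of any real runtime library: `InitGuard`, `g_guard_mutex` and `guarded_init` are hypothetical names, and the lock-free fast path, recursion detection and handling of a throwing constructor that a real implementation would need are left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Per-object state: a few flags plus a condition variable for the waiters.
struct InitGuard {
    enum State { Uninitialized, InProgress, Done };
    State state = Uninitialized;
    std::condition_variable done_cv;
};

// One global mutex guards the flags of all guarded objects.
std::mutex g_guard_mutex;

template <class Construct>
void guarded_init(InitGuard& g, Construct construct) {
    std::unique_lock<std::mutex> lock(g_guard_mutex);
    if (g.state == InitGuard::Done) return;

    if (g.state == InitGuard::Uninitialized) {
        g.state = InitGuard::InProgress;
        lock.unlock();  // the constructor runs WITHOUT the global mutex
        construct();
        lock.lock();    // second short critical section: publish the result
        g.state = InitGuard::Done;
        g.done_cv.notify_all();
        return;
    }

    // Someone else is constructing: sleep until the flags say Done.
    g.done_cv.wait(lock, [&] { return g.state == InitGuard::Done; });
}

// Demo: several threads race for one guard; returns the construction count.
int constructions_with_guarded_init(int thread_count) {
    assert(thread_count > 0);
    InitGuard guard;
    int constructions = 0;  // written only by the one initializing thread
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(
            [&] { guarded_init(guard, [&] { ++constructions; }); });
    for (auto& t : threads) t.join();
    return constructions;
}
```

The mutex is held only for the two short flag updates, which is why a priority-inheriting mutex would not help a low-priority thread that is preempted in the middle of `construct()`.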

> > one-time initialization infrastructure can be super-robust, but being generic,
> > it's unlikely to be the optimal answer in any concrete scenario.

> Yes, that is one option. You can also use "constinit" statics safely.

I am not aware of constinit. Is it something new, like C++14 or later?

Michael S <already5chosen@yahoo.com>: Sep 21 07:48AM -0700

On Thursday, September 21, 2023 at 3:40:37 PM UTC+3, Bonita Montero wrote:

> Because the initialization might only take a microsecond or less
> and a yield is a full timeslice, on Linux and Windows usually one
> millisecond.

If initialization is short then contention is very unlikely. Wasting
a full timeslice once in a blue moon is o.k.
If initialization is long then contention is likely; however, wasting
a full timeslice after a long initialization is o.k.
Not an excellent solution, but acceptable.

scott@slp53.sl.home (Scott Lurndal): Sep 21 02:54PM

>>>> They are only set to zero if they are not initialized explicitly.

>>> I'm not talking about what the standard says but how it's implemented
>>> with all operating systems that support virtual memory.

Nonsense. As Pavel pointed out in the part of this thread you
snipped, both pthread_once and pthread_mutex can be statically
initialized (and the value may or may not be zero).
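
For reference, static initialization of these POSIX objects looks roughly like this. A minimal sketch assuming a POSIX threads implementation; `init_table`, `ensure_table_initialised` and the counter are hypothetical names for this example.

```cpp
#include <pthread.h>

#include <cassert>

// Both objects are set up by constant initializers, so no code has to run
// (and no ordering problem can arise) before their first use.
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;

static int init_calls = 0;  // how many times init_table() actually ran

static void init_table() { ++init_calls; }

// Hypothetical accessor: makes sure init_table() has run exactly once and
// returns the number of times it was executed so far.
int ensure_table_initialised() {
    int rc = pthread_once(&once_control, init_table);
    assert(rc == 0);
    (void)rc;  // keep release builds (NDEBUG) warning-free

    pthread_mutex_lock(&table_mutex);
    int calls = init_calls;
    pthread_mutex_unlock(&table_mutex);
    return calls;
}
```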

>> This shows "all zero" on all modern operating systems:

You're changing the topic. BSS has been guaranteed to
be initialized to zero since long before windows existed.

>into one common segment and zero-fill as an extension of the loading
>process, or in some cases the OS can just promise that new segments for
>the program are just always zero-filled.

If the bss section is page-aligned, demand paging will ensure
that the contents are zeroed before being accessed.

scott@slp53.sl.home (Scott Lurndal): Sep 21 02:56PM

>> Actually, considering that contention is very rare in practice, ...

>The initialization might last an arbitrary time, so spinning
>is unacceptable.

It's rare in practice because good programmers very seldom[*] use
non-global static variables in threaded applications.

[*] Effectively never.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 21 05:03PM +0200

Am 21.09.2023 um 16:48 schrieb Michael S:

> If initialization is short then contention is very unlikely.

Maybe, but no one would choose this solution if mutexes are available.
Yielding in this situation is crap.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 21 05:04PM +0200

Am 21.09.2023 um 16:56 schrieb Scott Lurndal:

> It's rare in practice because good programmers very seldom[*]
> use non-global static variables in threaded applications.

It's just your taste that this should be avoided.

David Brown <david.brown@hesbynett.no>: Sep 21 08:10PM +0200

On 21/09/2023 16:30, Michael S wrote:
> unacceptably long time then, indeed, you have a problem.
> But to me it looks unlikely that deadlock due to priority inversion is your
> main problem in such a scenario.

I always want to think through possible issues when it comes to
multi-threading - no matter how unlikely.

It is certainly not uncommon to have more ready real-time threads than
cpus in real-time systems - though perhaps not on real-time Linux
systems. In microcontroller systems, your OS (if you have one) is
usually an RTOS, and most of the threads are real-time (of different
priorities). And you usually only have one, occasionally two, cpu.

But I think for that kind of system, your recommended solution - don't
have function-local statics that need such initialisation
synchronisation - is far and away the most common solution. (It's
certainly the one I use myself.) Function-local statics with dynamic
initialisation have quite a bit of overhead, both for initialisation and
in use.

> dynamically between objects. As in A and B, the mutex is held throughout the whole duration
> of the constructor.

> All major toolchains appear to use variants of (C)

I must have a look at the language support library provided with the
embedded toolchain (32-bit arm-none-eabi-gcc) I use. (D) seems the natural choice
there, with less risk of conflict than (B) while avoiding the space
costs of (A). But my guess is that it is actually done like (D) but with
simple spinlocks, since the compiler support library is independent of
the RTOS and therefore does not have access to mutexes.

Looking at generated code (using godbolt.org) for Linux, it seems there
is just a single byte boolean guard per object - there is no thread id
or anything else stored per object. I don't know what is going on
inside "__cxa_guard_acquire" and other such functions, however.
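
For readers who want to reproduce the observation, here is a small sketch of a function-local static and the guarantee the language gives for it. The comment about the lowered code reflects the general shape defined by the Itanium C++ ABI and is only a conceptual outline; `Expensive`, `instance` and the demo function are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

static std::atomic<int> constructions{0};

struct Expensive {
    Expensive() { ++constructions; }  // counts how often construction runs
    int value = 42;
};

Expensive& instance() {
    // Conceptually, on Itanium-ABI toolchains, this declaration becomes:
    //   if (first byte of guard is not set)
    //       if (__cxa_guard_acquire(&guard)) {
    //           construct obj;            // __cxa_guard_abort on exception
    //           __cxa_guard_release(&guard);
    //       }
    static Expensive obj;
    return obj;
}

// Demo: concurrent first use still constructs the object only once.
int constructions_after_concurrent_use(int thread_count) {
    assert(thread_count > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([] { assert(instance().value == 42); });
    for (auto& t : threads) t.join();
    return constructions.load();
}
```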

>>> it's unlikely to be optimal answer in any concrete scenario.

>> Yes, that is one option. You can also use "constinit" statics safely.

> I am not aware of constinit. Is it something new, like C++14 or later?

C++20 : <https://en.cppreference.com/w/cpp/language/constinit>

If a variable is declared with the "constinit" specifier, it must have
static initialisation (so in practice it is initialised like a C static
variable, though it can be done using something like a constexpr
function call). Such statics will all be initialised before main()
starts, so there are no synchronisation issues.
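
A short sketch of a constinit function-local static. `Counter`, `initial_value` and `global_counter` are hypothetical names, and the `DEMO_CONSTINIT` macro exists only so that the example also compiles with pre-C++20 compilers, where the same constant initialization happens but is not enforced.

```cpp
#include <cassert>

struct Counter {
    int value;
    constexpr explicit Counter(int v) : value(v) {}
};

// The initializer may be computed by a constexpr function.
constexpr int initial_value() { return 6 * 7; }

#if defined(__cpp_constinit)
#define DEMO_CONSTINIT constinit  // C++20: compile error if not static init
#else
#define DEMO_CONSTINIT            // older standards: no enforcement
#endif

Counter& global_counter() {
    // Constant-initialized: the value is in place before any code runs,
    // so the compiler has no reason to emit a guard for this variable.
    DEMO_CONSTINIT static Counter counter{initial_value()};
    assert(counter.value >= 42);  // this sketch only ever increments it
    return counter;
}
```

Unlike constexpr, constinit does not make the variable const, so `global_counter().value` can still be modified at run time; any synchronisation of those later writes remains the programmer's job.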

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:13AM -0700

On 9/21/2023 1:04 AM, Michael S wrote:
> Its only disadvantage is a higher memory footprint - 6 bytes per guarded object instead of 1.
> Can be reduced to 5 bytes, but in practice the compiler will allocate 8 bytes anyway, so let it
> be 6.

Here is Relacy, a pretty nice way to model sync algorithms:

https://www.1024cores.net/home/relacy-race-detector

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:19AM -0700

On 9/21/2023 12:47 AM, Bonita Montero wrote:

>> Have you heard about an adaptive backoff, different than a
>> "traditional" adaptive mutex? ...

> We discussed yield with periodic polling.

An adaptive backoff tries to automatically set its spin limit based on
some statistics it gathers dynamically during runtime, aka before it
waits in the kernel. One way to do it, a very interesting way imvho, is
instead of spinning and yielding, it tries to do some other unrelated
work instead. Pretty nice, actually. Something akin to:

while (! try_lock_or_whatever())
{
    if (! try_to_do_some_other_unrelated_work())
    {
        // wait in the kernel, really slow path...
    }
}

{
    // We are locked! :^D
}

unlock();

If your program can handle something like this, it actually works pretty
good...
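
The pseudocode above could be fleshed out along these lines. This is a sketch under assumptions, not the poster's actual implementation: the "unrelated work" is modelled as a queue of callables owned by the calling thread, and `WorkQueue` and `lock_doing_other_work` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical queue of unrelated jobs owned by the calling thread.
using WorkQueue = std::deque<std::function<void()>>;

// Runs one queued job if there is one; returns false when nothing is left.
bool try_to_do_some_other_unrelated_work(WorkQueue& queue) {
    if (queue.empty()) return false;
    std::function<void()> job = std::move(queue.front());
    queue.pop_front();
    job();
    return true;
}

// Acquires `m`. While the lock is unavailable, the thread does other work
// instead of spinning or yielding; only when no work is left does it block.
// Returns the number of jobs that were run while waiting.
int lock_doing_other_work(std::mutex& m, WorkQueue& queue) {
    int jobs_run = 0;
    while (!m.try_lock()) {
        if (try_to_do_some_other_unrelated_work(queue)) {
            ++jobs_run;
        } else {
            m.lock();  // really slow path: wait in the kernel
            break;
        }
    }
    assert(jobs_run >= 0);
    return jobs_run;  // the caller owns `m` now and must unlock it
}
```

This only pays off when the jobs are short and truly independent of the lock being acquired; a long job delays the acquisition just as a long sleep would.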

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:21AM -0700

On 9/21/2023 5:40 AM, Bonita Montero wrote:

> Because the initialization might only take a microsecond or less
> and a yield is a full timeslice, on Linux and Windows usually one
> millisecond.

An adaptive backoff can be a mixture of the PAUSE instruction aka x86,
and yield wrt the OS.
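
A minimal sketch of such a mixture, assuming GCC or Clang on x86 for the PAUSE intrinsic (other targets get a no-op). It uses a fixed exponential schedule, so it is not adaptive in the statistics-gathering sense; `cpu_relax`, `Backoff` and the limit of 1024 are hypothetical choices.

```cpp
#include <thread>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Hint to the CPU that we are in a spin-wait loop (PAUSE on x86).
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#endif
}

// Spins with exponentially growing PAUSE bursts, then falls back to yield.
class Backoff {
public:
    // Performs one waiting step. Returns true if the step was an OS yield,
    // false if it was a burst of PAUSE instructions.
    bool wait_once() {
        if (spins_ <= kSpinLimit) {
            for (unsigned i = 0; i < spins_; ++i) cpu_relax();
            spins_ *= 2;  // bursts of 1, 2, 4, ... up to kSpinLimit
            return false;
        }
        std::this_thread::yield();
        return true;
    }

private:
    static constexpr unsigned kSpinLimit = 1024;
    unsigned spins_ = 1;
};
```

A caller would typically wrap this as `Backoff b; while (!ready()) b.wait_once();`, where `ready()` stands for whatever condition is being polled.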

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:25AM -0700

On 9/21/2023 7:17 AM, Bonita Montero wrote:

>> "Yield" won't take a timeslice unless there is something else of the
>> same priority, waiting to run on the same core. ...

> Ok, you're right, but the code would still be unacceptable under load.

Wait, are you talking about a _pure_ spin lock in user space here? For
some reason you seem to be struggling with adaptive backoffs, that can
choose to wait in the kernel for a "really slow path", so to speak. Keep
in mind that spinning is already a slow path by default. Now, there is a
way to organize a program where a yield can be replaced by a
"try_to_do_something_else()" function. It's been a while since I
implemented one. Man, how time flies!

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:27AM -0700

On 9/21/2023 8:03 AM, Bonita Montero wrote:

>> If initialization is short then contention is very unlikely.

> Maybe, but no one would choose this solution if mutexes are available.
> Yielding in this situation is crap.

You tell 'em, Bonita. You should start protesting about this in front of
Microsoft headquarters in Redmond.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:29AM -0700

On 9/19/2023 1:50 AM, Bonita Montero wrote:
>> that user logic that waits for infinity while its trying to initialize
>> itself should be handled in the std? Humm... Not sure about you.

> You often seem confused.

Really? Actually, my comments here are trying to help you. But, you like
to flush them down the toilet and call me an idiot. Well, shit happens.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 21 11:34AM -0700

On 9/21/2023 12:51 AM, Bonita Montero wrote:

> A recursive lock wouldn't make sense here since the object might have
> been partially created when it re-enters the mutex which guards the
> initialization.

Huh? I must be misunderstanding you here. A Recursive mutex means it
allows the same thread to acquire it more than once. It can be detected
rather easily.
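
As an illustration of how re-entry by the same thread can be detected, here is a sketch built on std::recursive_mutex with a nesting counter. `DepthTrackingLock` is a hypothetical name, and this is not taken from any compiler's guard implementation.

```cpp
#include <cassert>
#include <mutex>

// A recursive mutex plus a nesting depth, so that the owner can tell a
// first acquisition (depth 1) from a re-entrant one (depth > 1).
class DepthTrackingLock {
public:
    // Acquires the lock and returns the nesting depth after acquisition.
    int lock() {
        mutex_.lock();
        return ++depth_;
    }

    void unlock() {
        assert(depth_ > 0);  // unlock without a matching lock is a bug
        --depth_;
        mutex_.unlock();
    }

private:
    std::recursive_mutex mutex_;
    int depth_ = 0;  // only read or written while mutex_ is held
};
```

An initialization guard built this way could treat a returned depth greater than 1 as "this thread is already inside the initializer" and report an error instead of touching the partially constructed object.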


Thursday, September 21, 2023
