soft and program: Digest for comp.lang.c++@googlegroups.com


Thread-safe initialization of static objects - 16 Updates
What a bug - 7 Updates
from_chars vs my parse_double<> - 1 Update
It is my last post - 1 Update

Thread-safe initialization of static objects

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 04:54AM +0200

On 07.09.2023 at 00:44, Pavel wrote:
>> is initialized and a mutex which guards the initialization.

> Not needed. A test-and-set instruction on a flag -- that is itself
> constant-initialized -- is sufficient.

Using only one flag would require spin-locking. However, spin-locking
is not possible in userspace, because a thread holding a spinlock could
keep other threads spinning for a long time. Therefore, there is no
getting around a solution with a mutex. And both creating the mutex
and synchronizing on it may fail.

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Sep 07 12:18AM -0400

Bonita Montero wrote:
>> constant-initialized -- is sufficient.

> Using only one flag would require spin-locking. However, spin-locking
> is not possible in userspace
Not true, a loop with yield in the body is quite possible.

> because a thread holding a spinlock could
> keep other threads spinning for a long time. Therefore, there is no
> getting around a solution with a mutex.
It does not have to be a mutex; it can be call_once, a semaphore, anything,
actually.

> And creation and mutex synchronization may fail.
If initialization synchronization fails, the initialization can catch
and terminate. There is no need to throw a system error. There is also no
need to use C++ synchronization primitives in the initialization code;
nothing prevents the implementation from doing this in a
platform-specific manner.
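Pavel's point that the guard does not have to be a mutex can be shown with the standard's own once-only facility. This is a minimal sketch. `Config`, `get_config()` and the counter are hypothetical names. Note that `std::call_once` is itself still allowed to report failure by throwing `std::system_error`, so on its own it does not settle the failure question argued in this thread.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical type standing in for an expensive-to-construct object.
struct Config {
    int value = 42;
};

std::once_flag g_config_once;  // constexpr constructor: constant-initialized
Config* g_config = nullptr;    // written only inside the once-callable
int g_init_count = 0;          // likewise

Config& get_config() {
    // Exactly one caller runs the lambda. Concurrent callers wait until it
    // has returned, and its writes are visible to them afterwards.
    std::call_once(g_config_once, [] {
        g_config = new Config;
        ++g_init_count;
    });
    return *g_config;
}

// Calls get_config() from n threads and reports how often it initialized.
int init_count_after_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { assert(get_config().value == 42); });
    for (auto& t : threads) t.join();
    return g_init_count;
}
```

However many threads race on the first call, `init_count_after_threads()` reports a single initialization.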

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:27AM +0200

On 07.09.2023 at 06:18, Pavel wrote:

> not true, a loop with yield in the body is very possible.

No one would accept that, because it would make the waiters wait
much longer than necessary.

> does not have to be a mutex, can be call_once, a semaphore, anything,
> actually.

Every applicable facility for that would rely on kernel synchronization.
You'd need to create a binary semaphore for that, and you'd need to
synchronize on it; both may fail.

> If initialization synchronization fails, the initialization can catch
> and terminate. ...

Nothing like that is specified.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 06 11:33PM -0700

On 9/6/2023 3:44 PM, Pavel wrote:
>> is initialized and a mutex which guards the initialization.
> Not needed. A test-and-set instruction on a flag -- that is itself
> constant-initialized -- is sufficient.
[...]

You also need to use the appropriate memory barriers. An acquire after
the first check, and a release before making the object visible.

// pseudo code, it's been a while.
// Damn, I used to work with threads all of the time.
___________________________
static foo* g_foo = nullptr;

foo* local = g_foo; // atomic load

if (! local)
{
    hash_lock(&g_foo);

    local = g_foo; // atomic load

    if (! local)
    {
        local = new foo;

        // release mb #LoadStore | #StoreStore

        g_foo = local; // atomic store
    }

    else
    {
        // acquire mb #LoadStore | #LoadLoad
    }

    hash_unlock(&g_foo);
}

else
{
    // acquire mb #LoadStore | #LoadLoad
}

local->foobar();
___________________________

IIRC, that is a bare-bones DCL (double-checked locking).
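For comparison, here is a runnable C++11 version of the same double-checked locking, sketched under stated assumptions. A single `std::mutex` stands in for `hash_lock(&g_foo)`, and `foo`, `get_foo()` and the counters exist only for illustration. The explicit barrier comments become `memory_order_acquire` on the fast-path load and `memory_order_release` on the publishing store.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical payload type standing in for the post's `foo`.
struct foo {
    int ready = 1;
    int foobar() const { return ready; }
};

std::atomic<foo*> g_foo{nullptr};     // constant-initialized
std::mutex g_foo_lock;                // assumption: stands in for hash_lock(&g_foo)
std::atomic<int> g_constructions{0};  // counts how often `new foo` ran

foo* get_foo() {
    // Fast path: the acquire load pairs with the release store below, so a
    // non-null pointer implies the constructor's writes are visible.
    foo* local = g_foo.load(std::memory_order_acquire);
    if (!local) {
        std::lock_guard<std::mutex> guard(g_foo_lock);
        // Second check under the lock; the mutex already orders this load.
        local = g_foo.load(std::memory_order_relaxed);
        if (!local) {
            local = new foo;
            g_constructions.fetch_add(1, std::memory_order_relaxed);
            // Release store publishes the fully constructed object.
            g_foo.store(local, std::memory_order_release);
        }
    }
    return local;
}

// Calls get_foo()->foobar() from n threads and sums the results.
int call_from_threads(int n) {
    std::vector<std::thread> threads;
    std::atomic<int> sum{0};
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] { sum += get_foo()->foobar(); });
    for (auto& t : threads) t.join();
    return sum.load();
}
```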

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 08:53AM +0200

On 07.09.2023 at 08:33, Chris M. Thomasson wrote:

> You also need to use the appropriate memory barriers. ...
That's all in the Wikipedia example of DCL. But the discussion
was about whether thread-safe initialization may fail before or
after the object's constructor is called, because creating the mutex
or the kernel synchronization may fail.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 07 12:16AM -0700

On 9/6/2023 11:53 PM, Bonita Montero wrote:
> was about whether thread-safe initialization may fail before or
> after the object's constructor is called because the mutex creation
> or the kernel-synchronization may fail.

Usually, the hash table of mutexes is created before any of the program's
logic is executed...

Paavo Helde <eesnimi@osa.pri.ee>: Sep 07 12:55PM +0300

On 06.09.2023 at 21:15, Bonita Montero wrote:
> for all static data objects since it includes a kernel semaphore,
> which is a costly resource.
> To find out if this is true I wrote the below application:
[...]
> code. So the threads don't share a central object. If a central
> mutex were used, the above code would run for about 10 s. But the
> code runs for about one second,

There is a note in the standard: "[Note: This definition permits
initialization of a sequence of ordered variables concurrently with
another sequence. —end note]"

> i.e. there are individual mutexes per
[...]
> is done while the DCL-locked creation of the static object. So ultimately
> static initialization should be declared to throw a system_error. But
> I can't find anything about that in the standard.

Some debugging with VS2022 seems to indicate it is using a Windows
critical section for thread-safe statics initialization.
EnterCriticalSection() does not return any error code and of course does
not throw any C++ exceptions either, so it is supposed to never fail.

Yes, it's true it can throw a Windows structured exception
EXCEPTION_POSSIBLE_DEADLOCK (after 30 days by default). But this would
be considered as a fault in the program. This is what the C++ standard
says about deadlocks (again a footnote):

"The implementation must not introduce any deadlock around execution of
the initializer. Deadlocks might still be caused by the program logic;
the implementation need only avoid deadlocks due to its own
synchronization operations."

So I gather that in case the thread-safe static init synchronization
fails, there must be a bug in the implementation. No C++ exceptions
would be thrown anyway.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 02:17PM +0200

On 07.09.2023 at 09:16, Chris M. Thomasson wrote:

> Usually, the hash table of mutexes is created before
> any of the program's logic is executed...

That sounds too complex to me. Creating individual mutexes on
demand would be OK, and I think a lock-free stack holding a pool
of mutexes would be fancy. A hash table is too slow for that.

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Sep 07 10:44AM -0400

Bonita Montero wrote:

>> not true, a loop with yield in the body is very possible.

> No one would accept that because that would make the waiters wait
> much longer than necessary.
How so? Waiters will wait for the duration of the initialization. The
initializing thread will be yielded to and will receive virtually as many,
and as complete, time slices as it would under any other scheduling
discipline, so it will complete its job in the same or virtually the same
time (actually, the higher the system contention level, the more efficient
user-space waiting with yield becomes). Hence waiters will wait exactly,
or virtually, as long as they would with a mutex.

Also it should be taken into account that all above (and below) is only
relevant to the rare case when the initialization is contended; in other
words, "no one would accept" would be an exaggeration of the year even
if your speculation on "wait much longer" were true -- which it isn't.
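Pavel's suggestion is a constant-initialized flag, with contending threads yielding rather than blocking in the kernel. It can be sketched as below. This is an illustration under assumptions, not how MSVC, libc++ or libstdc++ implement their guards; `once_with_yield` and the three-state encoding are hypothetical. The sketch uses three states rather than a bare test-and-set bit so that waiters can tell "in progress" from "done". A throwing initializer resets the guard so that a later caller retries, mirroring the rule for block-scope statics.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical guard states. The atomic itself is constant-initialized,
// so no resource has to be created (and nothing can fail) before use.
enum : int { kUninit = 0, kBusy = 1, kDone = 2 };

template <class F>
void once_with_yield(std::atomic<int>& state, F&& init) {
    for (;;) {
        int s = state.load(std::memory_order_acquire);
        if (s == kDone)
            return;                                // fast path: already initialized
        if (s == kUninit &&
            state.compare_exchange_strong(s, kBusy, std::memory_order_acquire)) {
            try {
                init();
            } catch (...) {
                state.store(kUninit, std::memory_order_release); // allow a retry
                throw;
            }
            state.store(kDone, std::memory_order_release);       // publish result
            return;
        }
        std::this_thread::yield();                 // another thread is initializing
    }
}

std::atomic<int> g_state{kUninit};
int g_value = 0;       // written only by the thread that wins the CAS
int g_init_calls = 0;  // likewise

int read_value() {
    once_with_yield(g_state, [] { ++g_init_calls; g_value = 7; });
    return g_value;
}

// Calls read_value() from n threads and sums what they saw.
int sum_from_threads(int n) {
    std::vector<std::thread> threads;
    std::atomic<int> sum{0};
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] { sum += read_value(); });
    for (auto& t : threads) t.join();
    return sum.load();
}
```

The sketch does not settle whether the yield loop makes waiters wait noticeably longer than blocking would; that is the point under dispute.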

> Every applicable facility for that would rely on kernel-synchronization.
> You'd need to create a binary semaphore for that and you'd need to
> synchronize on it; both may fail.
try_lock in a loop with a sleep or yield wouldn't fail. But as said
above, it's not needed.

Regardless, your argument assumes too much C++-morphism in the
implementation whereas the implementation can use any platform-specific
approach available to it. E.g. pthread_once on Linux does not fail if
given valid arguments (which a C++ implementation can provide). Other
platforms may have tools of their own to do the job.

>> If initialization synchronization fails, the initialization can catch
>> and terminate. ...

> Nothing like that is specified.
Correct, the above was wrong. But initialization can catch and try
again. This does not have to be specified as it is not observable from
outside.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:39PM +0200

On 07.09.2023 at 16:44, Pavel wrote:

> How so? Waiters will wait for the time of initialization.
> The initializing thread will be yielded ...

Initialization is usually much faster than a whole time slice,
so yielding would be unacceptable. That's just a stupid idea.

> try_lock in a loop with a sleep or yield wouldn't fail. ...

Do you really think someone would accept spinning like that?
It means that an initializing thread which is scheduled away
while holding the mutex might keep other threads spinning for
a long time.

> implementation whereas the implementation can use any
> platform-specific approach available to it. E.g. pthread_once on Linux
> does not fail if given valid arguments ...

pthread_once could be implemented with a single central semaphore for
all operations, and if the implementers know that the synchronization
itself doesn't fail and the semaphore is pre-allocated by the runtime,
it's possible to get through that synchronization without error.
But check this code:

#include <chrono>
#include <iostream>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

using namespace std;

// One distinct type, and hence one distinct static object, per thread.
template<unsigned Thread>
struct SleepAtInitialize
{
    SleepAtInitialize()
    {
        this_thread::sleep_for( 1s );
    }
};

int main()
{
    // Calls fn.operator()<I>() for every I in the index sequence.
    auto unroll = []<size_t ... Indices>( index_sequence<Indices ...>, auto fn )
    {
        ((fn.template operator ()<Indices>()), ...);
    };
    constexpr unsigned N_THREADS = 10;
    vector<jthread> threads;
    threads.reserve( N_THREADS );
    unroll( make_index_sequence<N_THREADS>(),
        [&]<unsigned Thread>()
        {
            // Each thread initializes its own function-local static.
            threads.emplace_back(
                [&]<unsigned IObj>( integral_constant<unsigned, IObj> )
                {
                    static SleepAtInitialize<IObj> guard;
                }, integral_constant<unsigned, Thread>() );
        } );
    // The jthreads join when `threads` is destroyed.
}

This code takes about one second with MSVC, libc++ and libstdc++,
i.e. each object gets its own mutex. So when individual mutexes are
used, the initialization may fail.

> Correct, the above was wrong. But initialization can catch and try
> again. ...

That's a bungler's solution.

scott@slp53.sl.home (Scott Lurndal): Sep 07 05:27PM

>> The initializing thread will be yielded ...

>Initializing is usually much faster than a whole timeslice,
>so yielding would be unacceptable. That's just a stupid idea.

So tell us, on which operating systems will there be more than
one thread running when application static objects are
initialized (which happens generally before the application
'main' function is called, and thus before the application
has a chance to create any threads)?

Richard Damon <Richard@Damon-Family.org>: Sep 07 11:06AM -0700

On 9/7/23 10:27 AM, Scott Lurndal wrote:
> initialized (which happens generally before the application
> 'main' function is called, and thus before the application
> has a chance to create any threads)?

While GLOBAL static objects get initialized before main starts, function
local static objects don't get initialized until the first call of the
function. These will need some synchronization if the function is called
from multiple threads at "the same time".

Also, some global object could start up a thread in its constructor.
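Here is a small sketch of the block-scope case Richard describes. `Widget`, `instance()` and the counter are illustrative names. Since C++11 the compiler and runtime insert the guard themselves, so concurrent first calls see exactly one construction; GCC and Clang can turn that guard off with `-fno-threadsafe-statics`. How the guard makes the other callers wait (per-object lock, futex on the guard variable, global lock) is implementation-specific, and that is what this thread is arguing about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> g_ctor_calls{0};

// Hypothetical type whose constructor records how often it runs.
struct Widget {
    int id;
    Widget() : id(g_ctor_calls.fetch_add(1) + 1) {}
};

Widget& instance() {
    // Block-scope static: initialized the first time control passes here.
    // Concurrent first callers wait for that single initialization to
    // finish instead of racing on it.
    static Widget w;
    return w;
}

// Calls instance() from n threads; returns how many saw an unexpected id.
int first_call_from_threads(int n) {
    std::vector<std::thread> threads;
    std::atomic<int> mismatches{0};
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] {
            if (instance().id != 1) ++mismatches;
        });
    for (auto& t : threads) t.join();
    return mismatches.load();
}
```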

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 08:24PM +0200

On 07.09.2023 at 19:27, Scott Lurndal wrote:

> initialized (which happens generally before the application
> 'main' function is called, and thus before the application
> has a chance to create any threads)?

I've shown with my code that each static object gets its own mutex
with MSVC, libstdc++ and libc++. Here it is again in a simplified
version:

#include <chrono>
#include <iostream>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

using namespace std;

int main()
{
    struct SleepAtInitialize
    {
        SleepAtInitialize()
        {
            this_thread::sleep_for( 1s );
        }
    };
    // Calls fn.operator()<I>() for every I in the index sequence.
    auto unroll = []<size_t ... Indices>( index_sequence<Indices ...>, auto fn )
    {
        ((fn.template operator ()<Indices>()), ...);
    };
    constexpr unsigned N_THREADS = 10;
    vector<jthread> threads;
    threads.reserve( N_THREADS );
    unroll( make_index_sequence<N_THREADS>(),
        [&]<unsigned Thread>()
        {
            threads.emplace_back(
                // Each IObj instantiates a separate lambda body and thus
                // a separate function-local static.
                [&]<unsigned IObj>( integral_constant<unsigned, IObj> )
                {
                    static SleepAtInitialize guard;
                }, integral_constant<unsigned, Thread>() );
        } );
    // The jthreads join when `threads` is destroyed.
}

The code runs for about one second with all three implementations,
so there's a mutex per statically initialized object.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 07 12:06PM -0700

On 9/7/2023 5:17 AM, Bonita Montero wrote:

> That sounds too complex for me. Creating individual mutexes on
> demand would be o.k. and I think a lock-free stack with a pool
> of mutexes would be fancy. A hashtable is too slow for that.

No. A simple hash of a pointer into an index works out okay, not too
slow at all. Fwiw, check this out, tell me what you think:

https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/wwYQCG2hAwAJ

It is a quick and crude example simulation of one way to do it. The hash
lock table is created before program logic is executed.

Any thoughts?

Also, we are talking about the slow path of DCL.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 07 12:20PM -0700

On 9/7/2023 11:06 AM, Richard Damon wrote:
> local static objects don't get initialized until the first call of the
> function. These will need some synchronization if the function is called
> from multiple threads at "the same time".

A simple global hash lock scheme is where we can hash addresses directly
into a static locking table. The lock table is created _before_ any
program logic is executed.

https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/wwYQCG2hAwAJ
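The hash-lock idea can be sketched in a few lines. The table size (64 `std::mutex` objects) and the shift-and-xor hash of the address are illustrative assumptions, not taken from the linked post. `hash_lock()`/`hash_unlock()` reuse the names from the pseudocode earlier in the thread. Because `std::mutex` has a `constexpr` default constructor, the table is constant-initialized, which is one way to make it exist "before any program logic is executed". The price is a fixed number of locks, so unrelated objects may share one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical striped lock table (size is an assumption; a power of two).
constexpr std::size_t kLockCount = 64;
std::array<std::mutex, kLockCount> g_lock_table;  // constant-initialized

std::mutex& hash_lock_for(const void* addr) {
    auto bits = reinterpret_cast<std::uintptr_t>(addr);
    // Drop low bits that alignment makes mostly equal, then fold the
    // address into a table index.
    auto idx = static_cast<std::size_t>((bits >> 4) ^ (bits >> 12)) & (kLockCount - 1);
    return g_lock_table[idx];
}

void hash_lock(const void* addr)   { hash_lock_for(addr).lock(); }
void hash_unlock(const void* addr) { hash_lock_for(addr).unlock(); }

long g_counter = 0;  // protected by the stripe its address hashes to

// Increments g_counter from several threads under its hashed lock.
long increment_from_threads(int threads, int per_thread) {
    std::vector<std::thread> ts;
    for (int i = 0; i < threads; ++i)
        ts.emplace_back([=] {
            for (int j = 0; j < per_thread; ++j) {
                hash_lock(&g_counter);
                ++g_counter;
                hash_unlock(&g_counter);
            }
        });
    for (auto& t : ts) t.join();
    return g_counter;
}
```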

> Also, some global object could start up a thread in its constructor.

YIKES! Shit. I have had to debug other people's code that did this. Many
points of error... One was a rather common peach of a bug. The
constructor would create a thread that would in turn call a virtual
function and start using the object before its constructor had
completed. A massive race condition, a nasty one!
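One conventional way to avoid that race is sketched below. The names `Worker`, `Counter` and `start()` are illustrative, not from the thread. Never start the thread in the constructor. Start it only after the most-derived object is complete, and join it before destruction begins. Joining in the base-class destructor would be too late, because the derived part is already gone by then.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical base class: the thread is started by start(), never by the
// constructor, so the whole object (including derived parts) exists first.
class Worker {
public:
    virtual ~Worker() = default;  // std::thread terminates if still joinable
    void start() { thread_ = std::thread([this] { run(); }); }
    void join() { if (thread_.joinable()) thread_.join(); }
protected:
    virtual void run() = 0;       // safe to call: construction has finished
private:
    std::thread thread_;
};

class Counter : public Worker {
public:
    std::atomic<int> hits{0};
protected:
    void run() override { hits.fetch_add(1); }
};

int run_counter() {
    Counter c;  // construction completes here
    c.start();  // only now does another thread call the virtual run()
    c.join();   // join before destruction of Counter begins
    return c.hits.load();
}
```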

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 09:27PM +0200

On 07.09.2023 at 21:06, Chris M. Thomasson wrote:

> No. A simple hash of a pointer into an index works out okay, not
> too slow at all. Fwiw, check this out, tell me what you think.

Then the number of mutexes would be fixed.

What a bug

red floyd <no.spam.here@its.invalid>: Sep 06 05:51PM -0700

On 9/5/2023 9:14 PM, Keith Thompson wrote:

>> No, I believe that std::unordered_map has a specialization for both
>> const char* and const wchar_t*.

> I don't believe that's correct. Do you have a reference?

Isn't it based on std::less<>?

Just checked. My mistake, forget I said anything.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 04:51AM +0200

On 06.09.2023 at 20:44, Pavel wrote:

>> The string-object is created for every lookup with a string-literal
>> if the key is also a string-object.

> False. Read the standard.

Quote the standard.

>> That has nothing to do with C++20.

> False. Read the standard, e.g. how to use

Quote the standard.

> template<class K> iterator find(const K& k);
> template<class K> const_iterator find(const K& k) const;
> };

There's a templated K parameter for the lookup, but internally
this parameter needs to be converted to a string object to be
hashed and to be equality-comparable the same way.
In theory there might be overloaded specializations that take
a string_view or whatever, but that's rather unlikely.

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Sep 06 11:54PM -0400

Bonita Montero wrote:

> There's a templated K-parameter for the lookup, but internally
> this parameter needs to be converted to a string-object to be
> hashed and to be equality-comparable the same way.
correct

> In theory there might be overloaded specializations that take
> a string_view or whatever;
not needed

> but that's rather unlikely.
close but imprecise. That's simply not there.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:58AM +0200

On 07.09.2023 at 05:54, Pavel wrote:

>> In theory there might be overloaded specializations that take
>> a string_view or whatever;

> not needed

Without such a specialization, a string object needs to be created
inside find to get compatible hashing and equality comparison.

>> but that's rather unlikely.

> close but imprecise. That's simply not there.

That's a theoretically valid possibility,
so it can't be said that it doesn't exist.

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Sep 07 11:08AM -0400

Bonita Montero wrote:

>> not needed

> Without such a specialization a string object needs to be created
> inside find to have compatible hashing and equality comparison.

Not true. I was benevolently trying to make you read the standard so you
could become a better C++ programmer but you refused.

I am therefore giving up on helping you. I will become evil and post the
complete standard-compliant example that demonstrates how exactly the
unordered_map::find can be made to work on a string literal and a map with
the string keys without creating a string object (in the example, the
string is wrapped in StringKey to track constructions easily but you are
welcome to remove the wrapper). The code will work as described under
C++20 but not any previous version of the standard -- and this is the
expected behavior.

You still have a chance to improve your C++ programming skills if you
read the standard and find the explanation for why this code shall
behave like it does.

// ------------- code begin cut here -------------------------
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <numeric>
#include <string>
#include <unordered_map>

using namespace std;

// Key wrapper that logs every construction, so temporaries become visible.
class StringKey {
public:
    StringKey(const char* cStr): s_(cStr) {
        cout << "\n\t(StringKey(" << cStr << ") called)\n";
    }
    const string& getS() const { return s_; }
private:
    string s_;
};

// Transparent hash: accepts StringKey and const char* and returns the
// same value for equal character sequences.
struct StringKeyHash {
    typedef void is_transparent;
    size_t operator()(const char *s) const {
        assert(!!s);
        return Hash(s, s + strlen(s));
    }
    size_t operator()(const StringKey &s) const {
        return Hash(s.getS().data(), s.getS().data() + s.getS().size());
    }
private:
    // Deliberately trivial hash (sum of chars), good enough for the demo.
    static size_t Hash(const char* begin, const char* end) {
        return accumulate(begin, end, (size_t) 0u,
            [](size_t a, char c) -> size_t { return a + (size_t)c; });
    }
};

// Transparent equality covering every StringKey / const char* combination.
struct StringKeyEq {
    typedef void is_transparent;
    bool operator()(const StringKey& x, const StringKey& y) const {
        return IsEq(x.getS().c_str(), y.getS().c_str());
    }
    bool operator()(const StringKey& x, const char* y) const {
        assert(!!y);
        return IsEq(x.getS().c_str(), y);
    }
    bool operator()(const char* x, const StringKey& y) const {
        assert(!!x);
        return IsEq(x, y.getS().c_str());
    }
private:
    static bool IsEq(const char *x, const char *y) {
        assert(!!x);
        assert(!!y);
        for (;; ++x, ++y) {
            if (*x == *y) {
                if (!*x)
                    return true;
                continue;
            }
            assert(*x != *y);
            return false;
        }
    }
};

int
main(int, char*[]) {
    cout << "*** fill up the unordered map\n";
    unordered_map<StringKey, string, StringKeyHash, StringKeyEq> um {
        { "key1", "val1" },
        { "key2", "val2" },
    };
    const char* key3 = "key3";
    const char* key4 = "key4";
    const char* key1 = "key1";
    cout << "*** now do the the find\n";
    // In C++20 find(const K&) with K = const char* is selected, because
    // both hash and equality are transparent: no StringKey is constructed.
    const auto i1 = um.find(key1);
    const bool r1 = i1 == um.end();
    const auto i3 = um.find(key3);
    const bool r3 = i3 == um.end();
    const auto i4 = um.find(key4);
    const bool r4 = i4 == um.end();
    cout << "find(" << key1 << "):" << r1 << ' ' <<
        "find(" << key4 << "):" << r4 << ' ' <<
        "find(" << key3 << "):" << r3 << endl;

    return 0;
}

// ------------- code end cut here -------------------------

// --- example run output if compiled by g++ -std=c++20 begin ---
$ ./a.out
*** fill up the unordered map

(StringKey(key1) called)

(StringKey(key2) called)
*** now do the the find
find(key1):0 find(key4):1 find(key3):1
// --- example run output if compiled by g++ -std=c++20 end ---

// --- example run output if compiled by g++ -std=c++17 begin ---
$ ./a.out
*** fill up the unordered map

(StringKey(key1) called)

(StringKey(key2) called)
*** now do the the find

(StringKey(key1) called)

(StringKey(key3) called)

(StringKey(key4) called)
find(key1):0 find(key4):1 find(key3):1
// --- example run output if compiled by g++ -std=c++17 end ---

>> close but imprecise. That's simply not there.

> That's a theoretically valid possibility,
> so it can't be said that it doesn't exist.
This is "theoretically valid" only for those who refuse to read the
standard; the others know the specialization is not there.

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 05:56PM +0200

On 07.09.2023 at 17:08, Pavel wrote:

> Not true. I was benevolently trying to make you read the standard
> so you could become a better C++ programmer but you refused.

You're having a totally different discussion and don't notice my
point. Within find() for a string key, a string object is generated from K
if the key is a string object; that's all. What's wrong with that?
The alternative you're showing below is something completely different,
and the idea crossed my mind when I discovered what the problem
with my bug was. But a string_view could be much more performant if you
don't want any additional allocations per inserted node.
But that's not my point. Should I disable Just My Code debugging in
MSVC and show you the code which converts a string pointer to a
comparable string object?
Rest unread.
Idiot!

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:04PM +0200

On 07.09.2023 at 17:56, Bonita Montero wrote:

> You're making a totally different discussion and don't notice where's my
> point. Within find for a string key a string key is generated from K
> if the key is a string-object; that's all. What's wrong with that?

Here are the declarations for C++20 from en.cppreference.com:

template< class K >
iterator find( const K& x );             // (3) (since C++20)
template< class K >
const_iterator find( const K& x ) const; // (4) (since C++20)

MSVC has no code for that even when I use C++20. I think MS dropped
it because the conversion inside find() could just as well be
done externally when calling find(). Smart decision.
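For the common case of plain `std::string` keys, the same C++20 mechanism Pavel demonstrates works without a wrapper type. This sketch assumes a standard library that implements heterogeneous lookup for unordered containers (P0919R3/P1690R1). Whether a given library version ships these overloads is worth checking in its documentation rather than assuming. `StringHash`, `StringMap` and `lookup` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// Transparent hash: std::string, std::string_view and const char* are all
// hashed through std::hash<std::string_view>, so equal text hashes equally.
struct StringHash {
    using is_transparent = void;  // opts in to heterogeneous lookup
    std::size_t operator()(std::string_view s) const noexcept {
        return std::hash<std::string_view>{}(s);
    }
};

// std::equal_to<> is transparent as well; both are needed for find(const K&).
using StringMap = std::unordered_map<std::string, int, StringHash, std::equal_to<>>;

int lookup(const StringMap& m, const char* key) {
    // In C++20 this calls find(const K&) with K = const char*, so no
    // std::string temporary is required for the lookup.
    auto it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```

The lookup itself behaves the same before C++20. The only difference is that an older library converts `key` to a temporary `std::string` first.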

from_chars vs my parse_double<>

Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 05:55PM +0200

#include <bit>
#include <charconv>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <type_traits>
#include "char_conv.h" // the author's own header, provides parse_double<>

using namespace std;
using namespace chrono;

int main()
{
    string strValue( "-3.14159266359e-300" );
    // Parses strValue with parse_double<Precise> (Std == false) or with
    // from_chars (Std == true) and prints the result as a hexfloat. If a
    // nonzero reference is given, it also prints the 1-based position of
    // the highest bit in which the result differs from the reference.
    auto compare = [&]<bool Precise, bool Std>( char const *prefix,
        uint64_t reference, bool_constant<Precise>, bool_constant<Std> )
    {
        double value;
        if constexpr( !Std )
            parse_double<Precise>( strValue.cbegin(), strValue.cend(), value );
        else
            from_chars( strValue.data(), strValue.data() + strValue.size(), value );
        uint64_t bin = bit_cast<uint64_t>( value );
        cout << prefix << hexfloat << value;
        if( unsigned diffBits = 64 - countl_zero( bin ^ reference ); reference )
            cout << ": " << diffBits;
        cout << endl;
        return bin;
    };
    uint64_t
        dummyReference = 0,
        pdImprecise = compare( "imprecise: ", dummyReference,
            false_type(), false_type() ),
        pdPrecise = compare( "precise vs imprecise: ", pdImprecise,
            true_type(), false_type() ),
        fcVsImprecise = compare( "from_chars vs. ip: ", pdImprecise,
            false_type(), true_type() ),
        fcVsPrecise = compare( "from_chars vs. p: ", pdPrecise,
            false_type(), true_type() );
}

I implemented something like from_chars myself. I wanted maximum
precision with it. For the value given above, the highest bit in which my
precise solution differs from from_chars is bit 11, so 12 bits of
the mantissa are more exact.
If you take a naive approach and sum up the fractional digits, each multiplied
by its 10^N value, from left to right, the intermediate values you
add become smaller the further right you get, and eventually they no longer
contribute to the mantissa. So I do the math from right to left, first putting
all digits into a table (100 digits; on overflow, everything is put into
a thread-local vector that isn't reallocated for the next call). And
as you go further along the 10^N sequence, every / 10 or * 10 operation
adds an error, so there are N of them. My function has a template parameter
that makes it compute the 10^N value for each digit with my own pow10()
function, which draws on a table entry for each bit of the exponent N,
i.e. there are two tables that store the values for positive and negative
exponents (10^(2^k)).
With the more precise solution I get a result whose lower 12
mantissa bits differ for the above string value.
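The author's `pow10()` is not shown. The following is only a sketch of the general technique described: two tables holding 10^(2^k) for positive and negative exponents, with one multiplication per set bit of |N|. `pow10_table` is a hypothetical name. Each table literal is rounded once by the compiler and each multiplication may round again, so the result is not correctly rounded in general.

```cpp
#include <cassert>

// Hypothetical tables: entry k holds 10^(2^k) and 10^-(2^k).
constexpr double kPosPow[] = {1e1, 1e2, 1e4, 1e8, 1e16, 1e32, 1e64, 1e128, 1e256};
constexpr double kNegPow[] = {1e-1, 1e-2, 1e-4, 1e-8, 1e-16, 1e-32, 1e-64, 1e-128, 1e-256};

// Computes 10^n from the set bits of |n|: at most 9 multiplications instead
// of |n| repeated *10 or /10 steps.
double pow10_table(int n) {
    assert(n > -512 && n < 512);  // 9 table entries cover |n| <= 511
    const double* table = n < 0 ? kNegPow : kPosPow;
    unsigned e = static_cast<unsigned>(n < 0 ? -n : n);
    double result = 1.0;
    for (int k = 0; e != 0; ++k, e >>= 1)
        if (e & 1u)
            result *= table[k];  // multiply in 10^(2^k), or 10^-(2^k)
    return result;
}
```

For example, `pow10_table(22)` multiplies 1e2, 1e4 and 1e16. Every intermediate product is exactly representable, so the result equals the literal `1e22`.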

It is my last post

Tony Oliver <guinness.tony@gmail.com>: Sep 07 03:36AM -0700

On Wednesday, 6 September 2023 at 23:15:02 UTC+1, Amine Moulay Ramdane wrote:

> Don't worry, I have just posted three more posts about artificial intelligence, since I just wanted to explain my views on artificial intelligence further, and it is my last post here.

> Thank you,
> Amine Moulay Ramdane.

But of course it won't be, you habitual liar.



Thursday, September 7, 2023

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics
