Thursday, November 10, 2022

Digest for comp.lang.c++@googlegroups.com - 17 updates in 3 topics

David Brown <david.brown@hesbynett.no>: Nov 10 09:52AM +0100

On 09/11/2022 21:43, Scott Lurndal wrote:
>> should use CMPXCHG16B.
 
> David works with low-end embedded processors, as I understand it, with
> limited and/or restricted instruction sets.
 
Yes.
 
But the principle is the same on bigger systems too. If your processor
can do a single-instruction 64-bit write, you see the problems for
atomics bigger than 64-bit. If it can handle 128-bit writes, you see
the problems for atomics bigger than 128-bit.
 
Obviously the need for big atomics is much lower than the need for
smaller ones. Once you have a DCAS, or LL/SC, you have covered most needs.
 
However, these alone will not give you read-modify-write operations on
anything bigger than you can handle with a single read (or more
importantly, with a single unbreakable write operation). Anything where
the implementation is "use small atomics to get a spin lock, then do the
work" is /broken/. It has a small but non-zero chance of failing in
general use on big multi-core systems. On small single-core systems, it
is guaranteed broken from the outset.
 
The C++ (and C) language, standard library, common toolchains and
library implementations give the programmer the impression that they can
make atomics as they like. You can write :
 
std::atomic<std::array<int, 32>> xs;
 
and it looks like you have a big atomic object. But it will not work -
you cannot rely on it. It will /seem/ to work in all your testing,
because the chance of hitting a problem is small - but it can fail at
any time.
 
The atomics that the programmer can use should either be absolutely
correct, guaranteed by design in all circumstances, or you should get
compile-time errors when you try to use atomics that are too big, or
where the operations are too complex, for the implementation to guarantee.
 
It would be even better for the implementation to handle these
correctly. That means OS support for /real/ locks, not fake
sort-of-works userland spin locks, but futexes or something like that
for big systems, and interrupt disabling for single-core
microcontrollers. (Dual-core microcontrollers are an extra
complication.) Common library implementations could rely on an extra
library or code for their "lock" and "unlock" calls - if they are not
provided, you at least have a link error.
Michael S <already5chosen@yahoo.com>: Nov 10 03:07AM -0800

On Thursday, November 10, 2022 at 10:52:49 AM UTC+2, David Brown wrote:
> correct, guaranteed by design in all circumstances, or you should get
> compile-time errors when you try to use atomics that are too big, or
> where the operations are too complex, for the implementation to guarantee.
 
Yes, failing at compile time is the most reasonable.
 
> complication.) Common library implementations could rely on an extra
> library or code for their "lock" and "unlock" calls - if they are not
> provided, you at least have a link error.
 
It certainly would be against the spirit of 'C'.
I'm not sure about its relationship to the spirit of C++.
Also, I'm not sure that C++ has a spirit.
Michael S <already5chosen@yahoo.com>: Nov 10 03:14AM -0800

On Wednesday, November 9, 2022 at 10:44:10 PM UTC+2, Scott Lurndal wrote:
> >should use CMPXCHG16B.
> David works with low-end embedded processors, as I understand it, with
> limited and/or restricted instruction sets.
 
In my book, a Cortex-M (except the M0) is not low end.
The M7 in particular is too big and too complicated not just for the
proverbial 99%, but for a solid 100% of my microcontroller needs.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 10 11:53AM -0800

On 11/10/2022 12:52 AM, David Brown wrote:
> you cannot rely on it.  It will /seem/ to work in all your testing,
> because the chance of hitting a problem is small - but it can fail at
> any time.
 
A rule of thumb... Imvho, always check the result of:
 
https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free
 
Just to be sure... ;^) Fwiw, DWCAS is very different from DCAS. The
latter can work on two non-contiguous words; the former only works on
contiguous words. A main reason for DWCAS to exist in the first place is
to be able to handle a lock-free stack: a pointer and a version count
to combat the ABA problem. Although there are many other interesting
uses for DWCAS...
 
https://groups.google.com/g/comp.lang.c++/c/nUDtke-H1io/m/g87spoMUCgAJ
 
 
> complication.)  Common library implementations could rely on an extra
> library or code for their "lock" and "unlock" calls - if they are not
> provided, you at least have a link error.
 
If the result of is_lock_free is not true, then you should really think
about digging into how the locking is actually implemented. Hash based
address locking is one simple way to do it. Fwiw, I created one called
multi-mutex:
 
https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/Ti8LFyH4CgAJ
David Brown <david.brown@hesbynett.no>: Nov 10 09:44PM +0100

On 10/11/2022 20:53, Chris M. Thomasson wrote:
>> any time.
 
> A rule of thumb... Imvho, always check the result of:
 
> https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free
 
And if it is not, what is the point in allowing it if the locks don't work?
 
> address locking is one simple way to do it. Fwiw, I created one called
> multi-mutex:
 
> https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/Ti8LFyH4CgAJ
 
Hash-based arrays of locks are as bad as a single lock for all atomics,
in that they do not work unless they are proper OS locks. The larger your
array of locks, the lower your chances of problems, but it all comes
down to one thing - are your locks safe or not?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 10 12:53PM -0800

On 11/10/2022 12:44 PM, David Brown wrote:
> in that it does not work unless it is a proper OS lock.  The larger your
> array of locks, the lower your chances of problems, but it all comes
> down to one thing - are your locks safe or not?
 
Well, my multi-mutex uses std::mutex for the elements of its vector of
locks. It hashes an address into said table, so it's only as good as the
implementation of std::mutex...
 
std::vector<std::mutex> m_locks;
 
Fair enough?
scott@slp53.sl.home (Scott Lurndal): Nov 10 09:20PM

>implementation of std::mutex...
 
>std::vector<std::mutex> m_locks;
 
>Fair enough?
 
I'd point out that using a vector is sub-optimal if the various
elements of the vector are accessed from different cores due to
false sharing....
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 10 01:23PM -0800

On 11/10/2022 1:20 PM, Scott Lurndal wrote:
 
> I'd point out that using a vector is sub-optimal if the various
> elements of the vector are accessed from different cores due to
> false sharing....
 
Well, each mutex's state should really be isolated on its own cache
line - padding and alignment on an L2 cacheline boundary. Iirc, this can
be done in modern C++, even for the elements of a vector.
olcott <none-ya@beez-waxes.com>: Nov 10 01:31PM -0600

MIT Professor Michael Sipser has agreed that the following verbatim
paragraph is correct (he has not agreed to anything else):
 
If simulating halt decider H correctly simulates its input D until H
correctly determines that its simulated D would never stop running
unless aborted then H can abort its simulation of D and correctly report
that D specifies a non-halting sequence of configurations.
 
void D(void (*x)())
{
    int Halt_Status = H(x, x);
    if (Halt_Status)
        HERE: goto HERE;
    return;
}

int main()
{
    Output("Input_Halts = ", H(D, D));
}
 
Unless one rejects the notion of a universal Turing machine (UTM) then
they already know that H can base its halt status decision on the
behavior of D correctly simulated by H.
 
That theory of computation textbooks do not bother to mention this is
only because they never considered the notion of a simulating halt decider.
 
void D(void (*x)())
{
    H(x, x);
}
 
It is dead obvious that (ignoring stack overflow) D correctly simulated
by H will never stop running unless H aborts its simulation of D.
 
 
--
Copyright 2022 Pete Olcott
 
"Talent hits a target no one else can hit;
Genius hits a target no one else can see."
Arthur Schopenhauer
Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 04:16AM +0100

This is a little test program that proves that Windows does overcommit
stacks:
 
#include <Windows.h>
#include <iostream>
#include <atomic>
#include <vector>
 
using namespace std;
 
int main()
{
    static SYSTEM_INFO si;
    GetSystemInfo( &si );
    HANDLE hThread = CreateThread( nullptr, 0,
        []( LPVOID lpvThreadParam ) -> DWORD
        {
            ULONG_PTR lowLimit, highLimit;
            GetCurrentThreadStackLimits( &lowLimit, &highLimit );
            bool first = true;
            for( atomic_char *p = (atomic_char *)highLimit; ; )
                __try
                {
                    p -= si.dwPageSize;
                    (void)p->load( memory_order_relaxed );
                    MEMORY_BASIC_INFORMATION mbi;
                    char const *scn = (char *)lowLimit;
                    size_t allocated = 0;
                    for( ; scn < (void *)highLimit; )
                        if( VirtualQuery( scn, &mbi, sizeof mbi ) == sizeof mbi )
                            if( mbi.AllocationBase == (void *)lowLimit )
                                scn = (char *)mbi.BaseAddress + mbi.RegionSize,
                                allocated += mbi.State == MEM_COMMIT ? mbi.RegionSize : 0;
                            else
                                break;
                        else
                            return EXIT_FAILURE;
                    cout << ", " + first * 2 << allocated / si.dwPageSize;
                    first = false;
                }
                __except( EXCEPTION_EXECUTE_HANDLER )
                {
                    break;
                }
            cout << endl;
            return 0;
        }, nullptr, 0, nullptr );
    (void)WaitForSingleObject( hThread, INFINITE );
}
yx ma <myxfxtstart@gmail.com>: Nov 09 11:32PM -0800

On Thursday, November 10, 2022 at 11:16:24 AM UTC+8, <Bonita Montero> wrote:
> }, nullptr, 0, nullptr );
> (void)WaitForSingleObject( hThread, INFINITE );
> }
?
Michael S <already5chosen@yahoo.com>: Nov 10 02:59AM -0800

On Thursday, November 10, 2022 at 5:16:24 AM UTC+2, Bonita Montero wrote:
> This is a little test-program that proofs that Windows does overcommit
> stacks:
 
You obviously don't know the meaning of the word 'overcommit' as
applied to virtual memory.
FYI, unlike another popular OS, Windows *never* overcommits.
Neither stack nor heap.
Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 12:42PM +0100

Am 10.11.2022 um 11:59 schrieb Michael S:
 
> FYI, unlike another popular OS, Windows *never* overcommits.
 
Try it out yourself:
 
#include <Windows.h>
#include <iostream>
#include <thread>
#include <vector>
#include <latch>
#include <memory>
 
using namespace std;
 
using XHANDLE = unique_ptr<void, decltype([]( void *h )
{ h && h != INVALID_HANDLE_VALUE && CloseHandle( h ); })>;
 
int main()
{
    constexpr unsigned N_THREADS = 0x10000;
    vector<XHANDLE> threads;
    threads.reserve( N_THREADS );
    static latch latSync( N_THREADS );
    for( unsigned t = N_THREADS; t--; )
    {
        auto threadFn = []( LPVOID ) -> DWORD
            { latSync.arrive_and_wait(); return 0; };
        threads.emplace_back( CreateThread( nullptr, 0x1000000, threadFn,
            nullptr, 0, nullptr ) );
        if( !threads.back().get() )
            cout << "out of resources" << endl;
    }
    threads.resize( 0 );
}
 
This demo will allocate 2^16 threads with one terabyte of total stack.
But as stacks on Windows are overcommitted, i.e. actually committed
only when touched, this program won't crash!
Michael S <already5chosen@yahoo.com>: Nov 10 04:34AM -0800

On Thursday, November 10, 2022 at 1:42:24 PM UTC+2, Bonita Montero wrote:
 
> This demo will allocate 2^16 threads with one terabyte of total stack.
> But as stacks on Windows are overcommitted, i.e. actually committed
> only when touched, this program won't crash!
 
Once again, you don't know the meaning of 'overcommit'.
As long as an area that was successfully committed - either by
VirtualAlloc(..., MEM_COMMIT, ...) or by other means - is
guaranteed to be legal to access from user mode, it's not called
'overcommitment', even when the access begins with a
page fault.
 
Also, it sounds like you don't know the meaning of the word
'commit' as applied to virtual memory. It seems you think
that 'commit' means 'make the page resident in physical memory',
but that's not so, at least not in the language used by the Windows
documentation.
Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 02:08PM +0100

Am 10.11.2022 um 13:34 schrieb Michael S:
 
> guaranteed to be legal to access from user mode, it's not
> called 'overcommitment' even when the process of access
> begins by page fault.
 
With stacks, things are different. The commit is done by the kernel when
you hit the guard page. If you touch the stack's address range beyond
the guard page, the application crashes. If the kernel can't dynamically
commit the memory for the guard page you hit, you get an SEH guard-page
exception - earlier than with the last valid position of the guard page,
when the stack "successfully" extends to its maximum range. So if you
have a default stack size of one MB, you might get one MB minus the
size of the (last) guard page, but this actually might not happen if
the system runs out of memory meanwhile.
"Öö Tiib" <ootiib@hot.ee>: Nov 10 05:18AM -0800

On Thursday, 10 November 2022 at 15:08:47 UTC+2, Bonita Montero wrote:
> have a default stack size of one MB, you might get one MB minus the
> size of the (last) guard page, but this actually might not happen if
> the system runs out of memor meanwhile.
 
So if the commit is done lazily, on demand, then there is no overcommit.
The process gets an SEH exception that can even be caught and handled.
Way better than what C++ provides ... going over the automatic storage
limit is simply not defined.
Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 02:36PM +0100

Am 10.11.2022 um 14:18 schrieb Öö Tiib:
 
> So if commit is done lazily on need then there are no overcommit.
 
A lazy commit which may fail is overcommitting.
 
> Process gets SEH exception that can be even caught and handled.
 
Of course, but the corresponding guard page might not become a committed
page then.
 
Try it out yourself:
 
#include <Windows.h>
#include <iostream>
#include <thread>
#include <vector>
#include <barrier>
#include <memory>
#include <atomic>
 
using namespace std;
 
using XHANDLE = unique_ptr<void, decltype([]( void *h )
{ h && h != INVALID_HANDLE_VALUE && CloseHandle( h ); })>;
 
int main( int argc, char ** )
{
    constexpr unsigned N_THREADS = 1 << 10;
    constexpr size_t STACK_SIZE = 1 << 30;
    static SYSTEM_INFO si;
    GetSystemInfo( &si );
    static barrier barSync( N_THREADS );
    static atomic<size_t> sumComitted( 0 );
    static bool touch = argc >= 2;
    size_t sumReserved = 0;
    vector<XHANDLE> threads;
    threads.reserve( N_THREADS );
    for( unsigned t = 0; t != N_THREADS; ++t, sumReserved += STACK_SIZE )
    {
        auto threadFn = []( LPVOID ) -> DWORD
        {
            barSync.arrive_and_wait();
            if( !touch )
                return 0;
            ULONG_PTR lowLimit, highLimit;
            GetCurrentThreadStackLimits( &lowLimit, &highLimit );
            atomic_char *pScn = (atomic_char *)highLimit;
            for( ; ; )
                __try
                {
                    pScn -= si.dwPageSize;
                    (void)pScn->load( memory_order::relaxed );
                }
                __except( EXCEPTION_EXECUTE_HANDLER )
                {
                    sumComitted += highLimit - (size_t)pScn;
                    break;
                }
            barSync.arrive_and_wait();
            return 0;
        };
        threads.emplace_back( CreateThread( nullptr, STACK_SIZE, threadFn,
            nullptr, 0, nullptr ) );
        if( !threads.back() )
        {
            cout << "out of resources: " << t - 1 << " threads" << endl;
            break;
        }
    }
    threads.resize( 0 );
    if( touch )
        cout << trunc( 100.0 * (ptrdiff_t)sumReserved / (ptrdiff_t)sumComitted
            + 0.5 ) << "%" << endl;
}