- "The pool of talented C++ developers is running dry" - 8 Updates
- Validating the notion of a simulating halt decider - 1 Update
- Windows does overcommit stacks ! - 8 Updates
| David Brown <david.brown@hesbynett.no>: Nov 10 09:52AM +0100 On 09/11/2022 21:43, Scott Lurndal wrote: >> should use CMPXCHG16B. > David works with low-end embedded processors, as I understand it, with > limited and/or restricted instruction sets. Yes. But the principle is the same on bigger systems too. If your processor can do a single-instruction 64-bit write, you see the problems for atomics bigger than 64-bit. If it can handle 128-bit writes, you see the problems for atomics bigger than 128-bit. Obviously the need for big atomics is much lower than the need for smaller ones. Once you have a DCAS, or LL/SC, you have covered most needs. However, these alone will not give you read-modify-write operations on anything bigger than you can handle with a single read (or more importantly, with a single unbreakable write operation). Anything where the implementation is "use small atomics to get a spin lock, then do the work" is /broken/. It has a small but non-zero chance of failing in general use on big multi-core systems. On small single-core systems, it is guaranteed broken from the outset. The C++ (and C) language, standard library, common toolchains and library implementations give the programmer the impression that they can make atomics as they like. You can write: std::atomic<std::array<int, 32>> xs; and it looks like you have a big atomic object. But it will not work - you cannot rely on it. It will /seem/ to work in all your testing, because the chance of hitting a problem is small - but it can fail at any time. The atomics that the programmer can use should either be absolutely correct, guaranteed by design in all circumstances, or they should not be allowed - there should be compile-time errors when you try to use atomics that are too big, or where the operations are too complex, for the implementation to guarantee. It would be even better for the implementation to handle these correctly. 
That means OS support for /real/ locks, not fake sort-of-works userland spin locks, but futexes or something like that for big systems, and interrupt disabling for single-core microcontrollers. (Dual-core microcontrollers are an extra complication.) Common library implementations could rely on an extra library or code for their "lock" and "unlock" calls - if they are not provided, you at least have a link error. |
| Michael S <already5chosen@yahoo.com>: Nov 10 03:07AM -0800 On Thursday, November 10, 2022 at 10:52:49 AM UTC+2, David Brown wrote: > correct, guaranteed by design in all circumstances, or they should not > be allowed - there should be compile-time errors when you try to use atomics that are too big, or > where the operations are too complex, for the implementation to guarantee. Yes, failing at compile time is the most reasonable. > complication.) Common library implementations could rely on an extra > library or code for their "lock" and "unlock" calls - if they are not > provided, you at least have a link error. It certainly would be against the spirit of 'C'. I'm not sure about its relationship to the spirit of C++. Also I'm not sure that C++ has a spirit. |
| Michael S <already5chosen@yahoo.com>: Nov 10 03:14AM -0800 On Wednesday, November 9, 2022 at 10:44:10 PM UTC+2, Scott Lurndal wrote: > >should use CMPXCHG16B. > David works with low-end embedded processors, as I understand it, with > limited and/or restricted instruction sets. In my book Cortex-M (except M0) is not low end. The M7 in particular is too big and too complicated not just for the proverbial 99%, but for a solid 100% of my microcontroller needs. |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 10 11:53AM -0800 On 11/10/2022 12:52 AM, David Brown wrote: > you cannot rely on it. It will /seem/ to work in all your testing, > because the chance of hitting a problem is small - but it can fail at > any time. A rule of thumb... Imvho, always check the result of: https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free Just to be sure... ;^) Fwiw, DWCAS is very different from DCAS. The latter can work on two non-contiguous words; the former only works on contiguous words. A main reason for DWCAS to exist in the first place is to be able to handle a lock-free stack: a pointer and a version count to combat the ABA problem. Although, there are many other interesting uses for DWCAS... https://groups.google.com/g/comp.lang.c++/c/nUDtke-H1io/m/g87spoMUCgAJ > complication.) Common library implementations could rely on an extra > library or code for their "lock" and "unlock" calls - if they are not > provided, you at least have a link error. If the result of is_lock_free is not true, then you should really think about digging into how the locking is actually implemented. Hash-based address locking is one simple way to do it. Fwiw, I created one called multi-mutex: https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/Ti8LFyH4CgAJ |
| David Brown <david.brown@hesbynett.no>: Nov 10 09:44PM +0100 On 10/11/2022 20:53, Chris M. Thomasson wrote: >> any time. > A rule of thumb... Imvho, always check the result of: > https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free And if it is not, what is the point in allowing it if the locks don't work? > address locking is one simple way to do it. Fwiw, I created one called > multi-mutex: > https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/Ti8LFyH4CgAJ Hash-based arrays of locks are as bad as a single lock for all atomics, in that they do not work unless they are proper OS locks. The larger your array of locks, the lower your chances of problems, but it all comes down to one thing - are your locks safe or not? |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 10 12:53PM -0800 On 11/10/2022 12:44 PM, David Brown wrote: > in that it does not work unless it is a proper OS lock. The larger your > array of locks, the lower your chances of problems, but it all comes > down to one thing - are your locks safe or not? Well, my multi-mutex uses std::mutex as elements of its vector of locks. It hashes an address into said table. So, it's only as good as the implementation of std::mutex... std::vector<std::mutex> m_locks; Fair enough? |
| scott@slp53.sl.home (Scott Lurndal): Nov 10 09:20PM >implementation of std::mutex... >std::vector<std::mutex> m_locks; >Fair enough? I'd point out that using a vector is sub-optimal if the various elements of the vector are accessed from different cores due to false sharing.... |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 10 01:23PM -0800 On 11/10/2022 1:20 PM, Scott Lurndal wrote: > I'd point out that using a vector is sub-optimal if the various > elements of the vector are accessed from different cores due to > false sharing.... Well, each mutex's state should really be isolated on its own cache line - padding and alignment on an L2 cache-line boundary. Iirc, this can be done in modern C++, even for the elements of a vector. |
| olcott <none-ya@beez-waxes.com>: Nov 10 01:31PM -0600 MIT Professor Michael Sipser has agreed that the following verbatim paragraph is correct (he has not agreed to anything else): If simulating halt decider H correctly simulates its input D until H correctly determines that its simulated D would never stop running unless aborted then H can abort its simulation of D and correctly report that D specifies a non-halting sequence of configurations. void D(void (*x)()) { int Halt_Status = H(x, x); if (Halt_Status) HERE: goto HERE; return; } int main() { Output("Input_Halts = ", H(D, D)); } Unless one rejects the notion of a universal Turing machine (UTM) then they already know that H can base its halt status decision on the behavior of D correctly simulated by H. That theory of computation textbooks do not bother to mention this is only because they never considered the notion of a simulating halt decider. void D(void (*x)()) { H(x, x); } It is dead obvious that (ignoring stack overflow) D correctly simulated by H will never stop running unless H aborts its simulation of D. -- Copyright 2022 Pete Olcott "Talent hits a target no one else can hit; Genius hits a target no one else can see." Arthur Schopenhauer |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 04:16AM +0100 This is a little test-program that proves that Windows does overcommit stacks: #include <Windows.h> #include <iostream> #include <atomic> #include <vector> using namespace std; int main() { static SYSTEM_INFO si; GetSystemInfo( &si ); HANDLE hThread = CreateThread( nullptr, 0, []( LPVOID lpvThreadParam ) -> DWORD { ULONG_PTR lowLimit, highLimit; GetCurrentThreadStackLimits( &lowLimit, &highLimit ); bool first = true; for( atomic_char *p = (atomic_char *)highLimit; ; ) __try { p -= si.dwPageSize; (void)p->load( memory_order_relaxed ); MEMORY_BASIC_INFORMATION mbi; char const *scn = (char *)lowLimit; size_t allocated = 0; for( ; scn < (void *)highLimit; ) if( VirtualQuery( scn, &mbi, sizeof mbi ) == sizeof mbi ) if( mbi.AllocationBase == (void *)lowLimit ) scn = (char *)mbi.BaseAddress + mbi.RegionSize, allocated += mbi.State == MEM_COMMIT ? mbi.RegionSize : 0; else break; else return EXIT_FAILURE; cout << ", " + first * 2 << allocated / si.dwPageSize; first = false; } __except( EXCEPTION_EXECUTE_HANDLER ) { break; } cout << endl; return 0; }, nullptr, 0, nullptr ); (void)WaitForSingleObject( hThread, INFINITE ); } |
| yx ma <myxfxtstart@gmail.com>: Nov 09 11:32PM -0800 On Thursday, November 10, 2022 at 11:16:24 UTC+8, <Bonita Montero> wrote: > }, nullptr, 0, nullptr ); > (void)WaitForSingleObject( hThread, INFINITE ); > } ? |
| Michael S <already5chosen@yahoo.com>: Nov 10 02:59AM -0800 On Thursday, November 10, 2022 at 5:16:24 AM UTC+2, Bonita Montero wrote: > This is a little test-program that proofs that Windows does overcommit > stacks: You obviously don't know the meaning of the word 'overcommit' in application to virtual memory. FYI, unlike another popular OS, Windows *never* overcommits. Neither stack, nor heap. |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 12:42PM +0100 On 10.11.2022 at 11:59, Michael S wrote: > FYI, unlike another popular OS, Windows *never* overcommits. Try it out yourself: #include <Windows.h> #include <iostream> #include <thread> #include <vector> #include <latch> #include <memory> using namespace std; using XHANDLE = unique_ptr<void, decltype([]( void *h ) { h && h != INVALID_HANDLE_VALUE && CloseHandle( h ); })>; int main() { constexpr unsigned N_THREADS = 0x10000; vector<XHANDLE> threads; threads.reserve( N_THREADS ); static latch latSync( N_THREADS ); for( unsigned t = N_THREADS; t--; ) { auto threadFn = []( LPVOID ) -> DWORD { latSync.arrive_and_wait(); return 0; }; threads.emplace_back( CreateThread( nullptr, 0x1000000, threadFn, nullptr, 0, nullptr ) ); if( !threads.back().get() ) cout << "out of resources" << endl; } threads.resize( 0 ); } This demo will allocate 2 ^ 16 threads with one terabyte of stack. But as the stack on Windows is overcommitted, i.e. actually committed when it is touched, this program won't crash ! |
| Michael S <already5chosen@yahoo.com>: Nov 10 04:34AM -0800 On Thursday, November 10, 2022 at 1:42:24 PM UTC+2, Bonita Montero wrote: > This demo will allocate 2 ^ 16 threads with one terabyte of stack. > But as the stack on Windows is overcommitted, i.e. actually committed > when it is touched, this program won't crash ! Once again, you don't know the meaning of 'overcommit'. As long as an area that was successfully committed either by VirtualAlloc(..., MEM_COMMIT, ...) or by other means is guaranteed to be legal to access from user mode, it's not called 'overcommitment', even when the access begins with a page fault. Also, it sounds like you don't know the meaning of the word 'commit' in application to virtual memory. It seems you think that 'commit' means 'makes the page resident in physical memory', but that's not so, at least not in the language used by the Windows documentation. |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 02:08PM +0100 On 10.11.2022 at 13:34, Michael S wrote: > guaranteed to be legal to access from user mode, it's not > called 'overcommitment' even when the process of access > begins by page fault. With stacks things are different. The commit is done by the kernel when you hit the guard page; if you touch the stack's address range beyond the guard page, the application crashes. If the kernel can't dynamically commit the memory for the guard page you hit, you get a SEH guard-page exception - earlier than with the last valid position of the guard page when the stack "successfully" extends to its maximum range. So if you have a default stack size of one MB, you might get one MB minus the size of the (last) guard page, but this actually might not happen if the system runs out of memory meanwhile. |
| "Öö Tiib" <ootiib@hot.ee>: Nov 10 05:18AM -0800 On Thursday, 10 November 2022 at 15:08:47 UTC+2, Bonita Montero wrote: > have a default stack size of one MB, you might get one MB minus the > size of the (last) guard page, but this actually might not happen if > the system runs out of memory meanwhile. So if the commit is done lazily on need, then there is no overcommit. The process gets a SEH exception that can even be caught and handled. Way better than what C++ provides ... going over the automatic storage limit is simply not defined. |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 10 02:36PM +0100 On 10.11.2022 at 14:18, Öö Tiib wrote: > So if the commit is done lazily on need, then there is no overcommit. A lazy commit which may fail is overcommitting. > The process gets a SEH exception that can even be caught and handled. Of course, but the corresponding guard page might not become a committed page then. Try it out yourself: #include <Windows.h> #include <iostream> #include <thread> #include <vector> #include <barrier> #include <memory> #include <atomic> #include <cmath> using namespace std; using XHANDLE = unique_ptr<void, decltype([]( void * h ) { h && h != INVALID_HANDLE_VALUE && CloseHandle( h ); })> ; int main( int argc, char ** ) { constexpr unsigned N_THREADS = 1 << 10; constexpr size_t STACK_SIZE = 1 << 30; static SYSTEM_INFO si; GetSystemInfo( &si ); static barrier barSync( N_THREADS ); static atomic<size_t> sumComitted( 0 ); static bool touch = argc >= 2; size_t sumReserved = 0; vector<XHANDLE> threads; threads.reserve( N_THREADS ); for( unsigned t = 0; t != N_THREADS; ++t, sumReserved += STACK_SIZE ) { auto threadFn = []( LPVOID ) -> DWORD { barSync.arrive_and_wait(); if( !touch ) return 0; ULONG_PTR lowLimit, highLimit; GetCurrentThreadStackLimits( &lowLimit, &highLimit ); atomic_char *pScn = (atomic_char *)highLimit; for( ; ; ) __try { pScn -= si.dwPageSize; (void)pScn->load( memory_order::relaxed ); } __except( EXCEPTION_EXECUTE_HANDLER ) { sumComitted += highLimit - (size_t)pScn; break; } barSync.arrive_and_wait(); return 0; }; threads.emplace_back( CreateThread( nullptr, STACK_SIZE, threadFn, nullptr, 0, nullptr ) ); if( !threads.back() ) { cout << "out of resources: " << t - 1 << " threads" << endl; break; } } threads.resize( 0 ); if( touch ) cout << trunc( 100.0 * (ptrdiff_t)sumComitted / (ptrdiff_t)sumReserved + 0.5 ) << "%" << endl; } |
| You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |