- new benchmark for my read/write algorithm... - 12 Updates
- is there any UB in this integer math... - 6 Updates
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 03:28PM -0800 >> be performed in this test at 60 seconds. The number of threads is >> variable and is determined by std::hardware_concurrency * THREADS, set >> at 8 in the test. So on my system the setup is: [...] Wrt Version 0.1, remember my first ct_rwmutex that did not use ct_fast_mutex, but instead ct_fast_semaphore with an initial count of 1? Now, it would be beneficial to test against this implementation as well. Also, another poster, Bonita showed another algorithm that uses CAS, kind of like Windows SRW. It has a lot of explicit looping, my algorithm has no loops in ct_rwmutex. Fwiw, it seems like my benchmark code can be used to test many read/write algorithms. Humm... |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 04:15PM -0800 On 2/20/2019 10:25 AM, Bonita Montero wrote: > This is similar shared_mutex, but just for Win32 only. > My shared mutex gives shared readers always priority over > writers (don't know what std::shared_mutex does instead). [...] Need to examine it and implement it for myself, however, your read priority is _radically_ different than mine. ct_rwmutex actually fairly alternates between batches of reads, and a write, like a bakery algorithm, and does _not_ use CAS. Writes can never starve reads, and reads can never starve writes. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 10:03PM -0800 On 2/20/2019 10:25 AM, Bonita Montero wrote: > #include <Windows.h> > #include <intrin.h> > #include <cassert> [...] I implemented your algorithm, minus the recursion part... , in Relacy Race Detector. I am using all seq_cst memory ordering here because you do not provide any memory barrier documentation. It is working fine with seq_cst. Still need to test the recursion because I have seen many broken recursion schemes. I forgot if volatile on MSVC will always use that memory order, with some obscure compiler flag, anyway, here is the code: _________________________________ // Bonita Montero Read/Write Mutex // Relacy Test By Chris M. Thomasson //_______________________________________________ //#define RL_DEBUGBREAK_ON_ASSERT //#define RL_MSVC_OUTPUT //#define RL_FORCE_SEQ_CST //#define RL_GC #include <relacy/relacy_std.hpp> #include <iostream> #define MB_SEQ_CST std::memory_order_seq_cst #define READERS 3 #define WRITERS 3 #define THREADS (READERS + WRITERS) LONG ct_InterlockedCompareExchange( std::atomic<long>* Destination, long Exchange, long Comparand ) { LONG cmp = Comparand; if (Destination->compare_exchange_strong(cmp, Exchange, MB_SEQ_CST)) { RL_ASSERT(cmp == Comparand); } return cmp; } class SharedMutex { public: SharedMutex(); ~SharedMutex(); void Lock(); void Unlock(); void LockShared(); void UnlockShared(); private: std::atomic<LONG> m_lRWCounters; // lower 16 bits: writers including the one that owns the lock // bits 16-30: count of threads having read-access or waiting for reac-access // sign-bit: readers active HANDLE m_hEvtReleaseWriter; HANDLE m_hSemReleaseReaders; }; SharedMutex::SharedMutex() { m_lRWCounters.store(0, MB_SEQ_CST); m_hEvtReleaseWriter = CreateEvent(NULL, FALSE, FALSE, NULL); m_hSemReleaseReaders = CreateSemaphore(NULL, 0, 0x7FFFFFFF, NULL); } SharedMutex::~SharedMutex() { CloseHandle(m_hEvtReleaseWriter); CloseHandle(m_hSemReleaseReaders); } 
void SharedMutex::Lock() { LONG lRWCounters, lOldRWCounters; for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ; lRWCounters = lOldRWCounters) { RL_ASSERT((lRWCounters & 0x0FFFF) != 0x0FFFF); if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, lRWCounters + 1, lRWCounters)) == lRWCounters) if (lOldRWCounters == 0) { return; } else break; } WaitForSingleObject(m_hEvtReleaseWriter, INFINITE); } void SharedMutex::Unlock() { LONG lRWCounters, lOldRWCounters; // sharers have priority over writers for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ; lRWCounters = lOldRWCounters) { RL_ASSERT(lRWCounters > 0); if ((lRWCounters & 0x7FFF0000)) { if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters - 1) | (LONG)0x80000000u, lRWCounters)) == lRWCounters) { ReleaseSemaphore(m_hSemReleaseReaders, (lRWCounters & 0x7FFF0000) >> 16, NULL); return; } } else if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, lRWCounters - 1, lRWCounters)) == lRWCounters) { if ((lOldRWCounters & 0x0FFFF) > 1) SetEvent(m_hEvtReleaseWriter); return; } } } void SharedMutex::LockShared() { LONG lRWCounters, lOldRWCounters; for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ; lRWCounters = lOldRWCounters) { RL_ASSERT((lRWCounters & 0x7FFF0000) != 0x7FFF0000); // even shared access can be recursively, but in this case the sharer-count will increment if (lRWCounters <= 0) { if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters + 0x10000) | (LONG)0x80000000u, lRWCounters)) == lRWCounters) return; } else if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters + 0x10000), lRWCounters)) == lRWCounters) { WaitForSingleObject(m_hSemReleaseReaders, INFINITE); return; } } } void SharedMutex::UnlockShared() { LONG lRWCounters, lOldRWCounters; for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ; lRWCounters = lOldRWCounters) { RL_ASSERT(lRWCounters < 0 && (lRWCounters & 0x7FFF0000) != 0); if ((lRWCounters 
& 0x7FFF0000) == 0x10000) { if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters - 0x10000) & (LONG)0x7FFFFFFF, lRWCounters)) == lRWCounters) { if ((lRWCounters & 0x0FFFF) != 0) SetEvent(m_hEvtReleaseWriter); return; } } else if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters, lRWCounters - 0x10000, lRWCounters)) == lRWCounters) return; } } // Relacy Stack Test... struct ct_fast_mutex_test : rl::test_suite<ct_fast_mutex_test, THREADS> { VAR_T(long) g_shared_state; SharedMutex g_shared_mutex; void before() { VAR(g_shared_state) = 0; } void after() { } // reader void reader(unsigned int tidx) { g_shared_mutex.LockShared(); long shared_state = VAR(g_shared_state); g_shared_mutex.UnlockShared(); RL_ASSERT(shared_state > -1); } // writer void writer(unsigned int tidx) { g_shared_mutex.Lock(); ++VAR(g_shared_state); g_shared_mutex.Unlock(); } void thread(unsigned int tidx) { if (tidx < WRITERS) { writer(tidx); } else { reader(tidx); } } }; // Test away... Or fly? Humm... int main() { { rl::test_params p; p.iteration_count = 10000000; //p.execution_depth_limit = 33333; //p.search_type = rl::sched_bound; //p.search_type = rl::fair_full_search_scheduler_type; //p.search_type = rl::fair_context_bound_scheduler_type; rl::simulate<ct_fast_mutex_test>(p); } return 0; } _________________________________ Okay. Getting closer. :^) |
Christian Gollwitzer <auriocus@gmx.de>: Feb 21 07:09AM +0100

Am 20.02.19 um 21:01 schrieb Paavo Helde:
> comment was wrong, volatile is indeed needed for this code to work, but
> the code is restricted not just to Windows, but also requires the MSVC
> compiler and its /volatile:ms regime.

Wouldn't that mean the code can become standard C++ by replacing the
volatile stuff with std::atomic<> ?

Christian
Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 07:20AM +0100

> for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ; lRWCounters =
> lOldRWCounters)

Your additional barrier is not necessary here and in the other places!
And m_lRWCounters doesn't need to be an atomic variable.
And you disabled recursive locking. I think if it is possible to do
shared locking recursively implicitly, it should be possible to do
exclusive locking recursively explicitly.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 07:36AM +0100

> Wouldn't that mean the code can become standard C++ by
> replacing the volatile stuff with std::atomic<> ?

volatile is necessary in one case: with m_dwOwningThreadId. volatile just
has to keep the compiler from caching this value in a register, because
the value in memory can asynchronously change at any time when the thread
trying to get a write-lock isn't the current write-owner of the mutex.
And on all Win32 platforms, loads and stores of aligned DWORDs are atomic
by nature.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:26PM -0800 On 2/20/2019 10:20 PM, Bonita Montero wrote: >> = lOldRWCounters) > Your additional barrier is not necessary here and in the other places! > And m_lRWCounters doesn't need to be an atomic variable. Of course you are correct on this point. :^) Fwiw, I just quickly mocked it up using seq_cst. Now, I can add fine-grain membars. Fwiw, I can basically already see what is needed and where in your algorithm. I need to add them to your Interlocked*'s as well. The next step will be your algorithm using the weakest memory model that I can possibly get, and still still pass Relacy testing. Then it will be ready for testing in my read/write benchmark. Nice. :^) Does this sound okay to you Bonita? > And you disbled recursive locking. I think if it is possible to > do shared locking recursively implicitly it should be able to to > exclusive locking explicitly. I disabled recursion simply because my test does not need any, and it would just add a little extra overhead to your algorithm. So, why have it in there? Do you want me to model your recursion with Relacy? I personally don't see a real need right now, but will do it. Well, perhaps I am "biased" against recursive locking in the first place. Some think it is a bit "evil"? ;^) |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:41PM -0800 On 2/20/2019 10:20 PM, Bonita Montero wrote: > And you disbled recursive locking. I think if it is possible to > do shared locking recursively implicitly it should be able to to > exclusive locking explicitly. After I get the weakest memory model, it is going to be fun to test your work against Windows SRW. It should beat it because of the way your are waiting on the semaphore and event. In other words, they are fire and forget, and not hooked up to any looping. Nice. :^) |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:01PM -0800 On 2/20/2019 10:03 PM, Chris M. Thomasson wrote: > } > return cmp; > } [...] Notice how I am using compare_exchange_strong? I think, this thought is before testing, that your algorihtm should work fine with compare_exchange_weak. The weak version does not impose some rather hefty requirements on the CAS. It is more "friendly" on systems that use LL/SC to emulate CAS. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:04PM -0800 > writes_per_tick = 451 > Ticks = 60.0368 > ___________________________________ [...] Thank you! I need to try to organize all of the various results into a single database. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:28PM -0800 On 2/20/2019 2:33 PM, Melzzzzz wrote: >>> Who would do this in year 2019? Recompile in 64-bit and measure again! >> Just tested with a x64 build, and here are the results wrt version 0.1: >> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex [...] >> Nice! Humm... Iirc, unsigned long on Windows is only 32-bit? > Yep. And long double is 64 bits if even hardware provides 80 bits... Fwiw, on Intel, I want fetch_add to use LOCK XADD, this is very important. Let's check out a little test on GodBolt: https://godbolt.org/z/yruhAL // using long https://godbolt.org/z/NKYDeI // using short Wrt my ct_rwmutex, it better use LOCK XADD no matter what damn it. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 02:50PM -0800 On 2/21/2019 12:41 PM, Chris M. Thomasson wrote: > work against Windows SRW. It should beat it because of the way your are > waiting on the semaphore and event. In other words, they are fire and > forget, and not hooked up to any looping. Nice. :^) Ahh, my waits on these "slow-paths" are fire and forget as well. In that sense, they are "similar". This is a good property to have Bonita. |
"Öö Tiib" <ootiib@hot.ee>: Feb 21 01:24AM -0800 On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote: > >> flexible alternative and which is actively maintained. > > Sanitizers sound like debugging options. > But that's what you want, isn't it? I consider it as free sanity check in all signed integer calculations. When correctness of those calculations is of importance then I would leave -ftrapv on forever. Lets say in summer we have product with 0% crashes per usage. Then someone of our team (me) maintains some integer arithmetic incorrectly and since the defect manifests rarely it reaches users in fall. a) With -ftrapv our automated crash reporting shows 0.2% crashes per usage. The patch is up within a week. b) Without it 0.2% runs result with incorrect answers. Those are harder to notice (and in complex whole system the reasons can be also harder to find/blame). The defect will sit in product until spring. When the results of calculations matter to anything then I prefer a). |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 21 11:56AM +0200

On 21.02.2019 11:24, Öö Tiib wrote:
> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> usage. The patch is up within a week.
> b) Without it 0.2% runs result with incorrect answers.

This does not follow. As pointed out by others, intermediate overflows
can still produce correct results when balanced by opposite overflows or
casts to unsigned. Based on my limited experience, over half of -ftrapv
crashes would be false alarms. Depends on the codebase, of course.

Now it would be up to the management to decide whether the problems
caused by false alarms on customer sites justify the earlier catching of
non-false alarms.
David Brown <david.brown@hesbynett.no>: Feb 21 01:26PM +0100

On 21/02/2019 10:24, Öö Tiib wrote:
>>> Sanitizers sound like debugging options.
>> But that's what you want, isn't it?
> I consider it as free sanity check in all signed integer calculations.

But it is not free - not /remotely/ free. Please, go to
<https://godbolt.org> and try some samples with optimisation on (-O1 or
-O2), with and without "-ftrapv" and "-fsanitize=signed-integer-overflow".

Let me give you a couple of examples:

int foo1(int a, int b) {
    return 2 * a + b;
}

int foo2(int a, int b) {
    return 2 * a + b - a;
}

(Yes, I know people don't write code like foo2 directly - but such things
can easily occur due to macros, inlining, constant folding, templates,
etc.)

This is the normal code (for x86):

foo1:
        lea     eax, [rsi+rdi*2]
        ret
foo2:
        lea     eax, [rdi+rsi]
        ret

This is the code with -ftrapv:

foo1:
        push    rbp
        mov     ebp, esi
        mov     esi, 2
        call    __mulvsi3
        mov     esi, ebp
        mov     edi, eax
        call    __addvsi3
        pop     rbp
        ret
foo2:
        push    r12
        mov     r12d, esi
        mov     esi, 2
        push    rbp
        mov     ebp, edi
        sub     rsp, 8
        call    __mulvsi3
        mov     esi, r12d
        mov     edi, eax
        call    __addvsi3
        mov     esi, ebp
        mov     edi, eax
        call    __subvsi3
        add     rsp, 8
        pop     rbp
        pop     r12
        ret

This is the code with -fsanitize=signed-integer-overflow:

foo1:
        push    rbp
        push    rbx
        mov     ebx, esi
        sub     rsp, 24
        imul    ebp, edi, 2
        jo      .L8
.L2:
        mov     eax, ebx
        add     eax, ebp
        jo      .L9
.L4:
        add     rsp, 24
        pop     rbx
        pop     rbp
        ret
.L8:
        movsx   rsi, edi
        mov     edx, 2
        mov     edi, OFFSET FLAT:.Lubsan_data0
        call    __ubsan_handle_mul_overflow
        jmp     .L2
.L9:
        movsx   rdx, ebp
        movsx   rsi, ebx
        mov     edi, OFFSET FLAT:.Lubsan_data1
        mov     DWORD PTR [rsp+12], eax
        call    __ubsan_handle_add_overflow
        mov     eax, DWORD PTR [rsp+12]
        jmp     .L4

foo2:
        push    r13
        push    r12
        push    rbp
        mov     ebp, esi
        push    rbx
        mov     ebx, edi
        sub     rsp, 24
        imul    r13d, edi, 2
        jo      .L18
.L11:
        mov     r12d, ebp
        add     r12d, r13d
        jo      .L19
.L13:
        mov     eax, r12d
        sub     eax, ebx
        jo      .L20
.L15:
        add     rsp, 24
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        ret
.L18:
        movsx   rsi, edi
        mov     edx, 2
        mov     edi, OFFSET FLAT:.Lubsan_data2
        call    __ubsan_handle_mul_overflow
        jmp     .L11
.L20:
        movsx   rdx, ebx
        movsx   rsi, r12d
        mov     edi, OFFSET FLAT:.Lubsan_data4
        mov     DWORD PTR [rsp+12], eax
        call    __ubsan_handle_sub_overflow
        mov     eax, DWORD PTR [rsp+12]
        jmp     .L15
.L19:
        movsx   rdx, r13d
        movsx   rsi, ebp
        mov     edi, OFFSET FLAT:.Lubsan_data3
        call    __ubsan_handle_add_overflow
        jmp     .L13

In what world are these checks "free"?

The "sanitize" version is significantly better than the "-ftrapv"
version in that it has short paths when no overflows occur. You could
ask why the "-ftrapv" code is not similar - the answer, I think, is that
"-ftrapv" simply has not received much care or attention from the gcc
developers for a very long time.

> When correctness of those calculations is of importance then I would
> leave -ftrapv on forever.

In my world, a program that crashes with a message "overflow detected"
is /not/ correct. It is merely broken in a different way from one that
gets the calculation wrong. Whether this is better or worse depends on
the circumstances, but it is /definitely/ not a correct result.

> harder to notice (and in complex whole system the reasons can be
> also harder to find/blame). The defect will sit in product until spring.
> When the results of calculations matter to anything then I prefer a).

-ftrapv (or, preferably, -fsanitize=signed-integer-overflow) can be a
useful option for testing and debugging. But it is an aid to finding
bugs in the program - it does /not/ improve correctness, it is of no
help in a deployed system (unless you collect post-mortems), and it
comes at very significant cost. Use tools for their appropriate purpose.
Take your car to the garage when you have a problem with it - don't tow
a caravan behind your car with a couple of mechanics and a bootload of
tools on the off-chance that your car breaks down on the journey.
"Öö Tiib" <ootiib@hot.ee>: Feb 21 09:35AM -0800 On Thursday, 21 February 2019 14:26:35 UTC+2, David Brown wrote: > On 21/02/2019 10:24, Öö Tiib wrote: > In what world are these checks "free" ? Free in sense that I do not have to write and to use classes that do same thing and likely even less efficiently than those two options. > is /not/ correct. It is merely broken in a different way from one that > gets the calculation wrong. Whether this is better or worse depends on > the circumstances, but it is /definitely/ not a correct result. I have repeatedly expressed my lack of knowledge about any beings in this universe who can write correct programs. Only "programmers" who do not write incorrect programs are those who do not write programs. > when you have a problem with it - don't tow a caravan behind your car > with a couple of mechanics and a bootload of tools on the off-chance > that your car breaks down on the journey. So you would prefer b) or have some kind of c) that results with something better in that situation or what you were trying to say? |
"Öö Tiib" <ootiib@hot.ee>: Feb 21 09:50AM -0800 On Thursday, 21 February 2019 11:56:48 UTC+2, Paavo Helde wrote: > Now it would be up to the management to decide whether the problems > caused by false alarms on customer sites justify the earlier catching of > non-false alarms. Ok, fair enough. Lets say a) 0.2% crash or b) 0.05% of runs result with incorrect answers. Is it easier to decide that way? |
David Brown <david.brown@hesbynett.no>: Feb 21 10:58PM +0100

On 21/02/2019 18:35, Öö Tiib wrote:
>> In what world are these checks "free" ?
> Free in the sense that I do not have to write and use classes that do
> the same thing, likely even less efficiently than those two options.

Classes could be more flexible and more efficient (certainly more
efficient than the -ftrapv code), and give you accurate control over
when you use overflow detection, instead of an "all or nothing" choice.
But I agree that it takes effort to make and use them.

> I have repeatedly expressed my lack of knowledge about any beings in
> this universe who can write correct programs. Only "programmers" who
> do not write incorrect programs are those who do not write programs.

It is entirely possible to eliminate large classes of bugs - at least in
certain kinds of programming. For most of my work, I know that my code
has /no/ bugs with dynamic memory allocation or freeing. It has /no/
bugs from signed integer overflow. It never has divide-by-zeros, or
shifts of negative numbers. It does not have code that accidentally uses
variables without initialising them. I can't promise I have no
out-of-bounds array accesses, but they should be extraordinarily rare -
as will buffer overflows.

And even if you assume that it is plausible that you have signed integer
overflow, how do you think -ftrapv improves matters? Clearly it could be
useful in testing and debugging - that is what it is for. But in
released code, who would be happy with a crash and an overflow error
message?

I could understand this more if we were talking about a type of error
that is harder to avoid, or harder to catch in testing - things like
race conditions and deadlocks in multi-threaded code. But it is /easy/
to avoid signed integer overflow. It is /easy/ to write your code in a
way that it can't happen, or that can detect it appropriately and
respond sensibly, rather than crashing.

>> that your car breaks down on the journey.
> So would you prefer b), or do you have some kind of c) that results in
> something better in that situation, or what were you trying to say?

Sorry - it looks like you have cut part of your answer here. I am not
sure what you meant to write.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |