- new benchmark for my read/write algorithm... - 17 Updates
- an experimental read/write mutex... - 1 Update
- neos - 7 Updates
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:51PM -0800 On 2/25/2019 10:25 PM, Bonita Montero wrote: >> CAS will _always_ fail if the comparand is different than >> the destination at the point of the atomic operation. > Thank you for this insigt! I never would have known this otherwise! Sorry for that. Well, I really do like your algorihtm. It is a very nice piece of work. Thank you for sharing it. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:52PM -0800 On 2/25/2019 10:23 PM, Bonita Montero wrote: >> Writer A spins on CAS because it failed! >> Reader B releases read access. > Impossible that reader B has read-access such a little time. Tell that to a general purpose read/writer lock... ;^) > be done. > If there was really meaningful work in reader B the locking > -interval was at least that long that writer A gets to sleep. Regardless, I like your algorihtm. Is has an elegance about it. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:57PM -0800 On 2/26/2019 8:52 PM, Chris M. Thomasson wrote: >> If there was really meaningful work in reader B the locking >> -interval was at least that long that writer A gets to sleep. > Regardless, I like your algorihtm. Is has an elegance about it. Just wondering if you are interested in a version of your algorihtm that is 100% pure c++11, and using the weakest memory order? The semaphore and event would be emulated in c++11 using mutex/condvar. With that in mind, it is going to "reduce" performance wrt raw Windows primitives. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:58PM -0800 On 2/26/2019 8:51 PM, Chris M. Thomasson wrote: >> Thank you for this insigt! I never would have known this otherwise! > Sorry for that. Well, I really do like your algorihtm. It is a very nice > piece of work. Thank you for sharing it. The performance is nice. Not bad at all. |
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 08:58AM +0100

>> Regardless, I like your algorithm. It has an elegance about it.
> Just wondering if you are interested in a version of your algorithm
> that is 100% pure C++11, ...

How should that be possible? C++-whatever has no kernel synchronization facilities like events or semaphores.

> The semaphore and event would be emulated in C++11 using mutex/condvar.

That would be inefficient. It would be better to rewrite the code for POSIX with POSIX semaphores.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 11:25AM +0100

>> The semaphore and event would be emulated in C++11 using mutex/condvar.
> That would be inefficient.

I reconsidered that: it wouldn't be that inefficient, as the mutex and the condvar would be used only in contention cases, and when that happens there are kernel calls which are a lot slower than the additional overhead over raw semaphores (and possibly events). But I think it's more elegant, because more minimalistic, than realizing it with the system facilities and having more than one implementation of the rw-mutex.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 12:59PM +0200

On 27.02.2019 9:58, Bonita Montero wrote:
>> that is 100% pure C++11, ...
> How should that be possible? C++-whatever has no kernel
> synchronization facilities like events or semaphores.

By this argument a C++ program would also not be able to allocate memory or open files, as these are also kernel facilities.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 01:06PM +0100

>> synchronization facilities like events or semaphores.
> By this argument a C++ program would also not be able to allocate memory
> or open files as these are also kernel facilities.

You are unable to read. I only said that C++ has "no kernel synchronization facilities like events or semaphores". Or please tell me what the classes for semaphores and events are.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 04:29PM +0200

On 27.02.2019 14:06, Bonita Montero wrote:
> You are unable to read. I only said that C++ has "no kernel
> synchronization facilities like events or semaphores".
> Or please tell me what the classes for semaphores and events are.

C++11 contains everything needed for efficient multithreaded programming. I trust the committee in that if semaphores and events are not included, then they are not needed for that purpose.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 03:35PM +0100

> C++11 contains everything needed for efficient multithreaded
> programming.

I didn't say the opposite; what C++11 supplies is sufficient in most cases. But if you try to implement a rw-mutex, you can't do it as efficiently as with native events and semaphores.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 05:19PM +0200

On 27.02.2019 16:35, Bonita Montero wrote:
> I didn't say the opposite; what C++11 supplies is sufficient in most
> cases. But if you try to implement a rw-mutex, you can't do it as
> efficiently as with native events and semaphores.

OK, fair enough. I guess that's why std::shared_mutex was added in C++17.
scott@slp53.sl.home (Scott Lurndal): Feb 27 03:25PM

> C++11 contains everything needed for efficient multithreaded
> programming. I trust the committee in that if semaphores and events are
> not included then they are not needed for that purpose.

Over the last three decades, I've found that mutexes and condition variables have been sufficient for everything from mainframe operating systems (which had, in 1985, mutex and condition variables built right into the instruction set) to modern pthread apps.

http://vseries.lurndal.org/doku.php?id=instructions:lok
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 01:02PM -0800 >> be performed in this test at 60 seconds. The number of threads is >> variable and is determined by std::hardware_concurrency * THREADS, set >> at 8 in the test. So on my system the setup is: [snip code] > * GCC -Ofast -march=native -lstdc++ * > ************************************* > Testing: Chris M. Thomasson's Experimental Read/Write Mutex [...] Thank you! Sorry to ask, but can you try running version 0.1? Try reloading the page at: https://pastebin.com/raw/1QtPCGhV The newest version 0.1 should have: Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex Testing Version 0.1: std::shared_mutex Fwiw, here is the code: ____________________________________ /* Crude Read/Write Mutex Benchmark This tests my algorithm ct_rwmutex vs. std::shared_mutex. It shows how many reads and writes can be performed within a fixed amount of time. by: Chris M. Thomasson version 0.1 __________________________________________*/ #include <thread> #include <atomic> #include <shared_mutex> #include <condition_variable> #include <iostream> #include <functional> #include <chrono> #include <cassert> #include <cstdlib> #include <ctime> #include <climits> #include <cstdint> #include <vector> // undefine to test std::shared_mutex #define CT_TEST_FAST_MUTEX 1 #define THREADS 8 // multiplied by std::hardware_concurrency #define NODE_PRIME 1000000 // number of nodes #define CRUDE_CACHE_PAD 256 // crude cache pad #define TEST_DURATION_SEC 60 // number of seconds for the test // bare bones mutex/condvar based semaphore struct ct_slow_semaphore { unsigned long m_state; std::mutex m_mutex; std::condition_variable m_cond; ct_slow_semaphore(unsigned long state) : m_state(state) {} void inc() { m_mutex.lock(); ++m_state; m_mutex.unlock(); m_cond.notify_one(); } void add(unsigned long addend) { { std::unique_lock<std::mutex> lock(m_mutex); m_state += addend; } m_cond.notify_all(); } void dec() { 
std::unique_lock<std::mutex> lock(m_mutex); while (m_state == 0) m_cond.wait(lock); --m_state; } }; // bin-sema struct ct_auto_reset_event { bool m_state; std::mutex m_mutex; std::condition_variable m_cond; ct_auto_reset_event() : m_state(false) {} void signal() { std::unique_lock<std::mutex> lock(m_mutex); m_state = true; m_cond.notify_one(); } void wait() { std::unique_lock<std::mutex> lock(m_mutex); while (m_state == false) m_cond.wait(lock); m_state = false; // auto-reset } }; // just a layer over an auto-reset event struct ct_fast_mutex { std::atomic<unsigned int> m_state; ct_auto_reset_event m_waitset; ct_fast_mutex() : m_state(0) {} void lock() { if (m_state.exchange(1, std::memory_order_acquire)) { while (m_state.exchange(2, std::memory_order_acquire)) { m_waitset.wait(); } } } void unlock() { if (m_state.exchange(0, std::memory_order_release) == 2) { m_waitset.signal(); } } }; // Chris M. Thomassons Experimental Read/Write Mutex // Yeah, it is pretty damn fat wrt the state, however // it has some interesting properties... // The state can be compressed a bit... // btw, it has no loops... // Take a look at the lock_shared and unlock_shared functions // The number of readers _must_ never exceed LONG_MAX! #define RWMUTEX_COUNT_MAX LONG_MAX struct ct_rwmutex { unsigned char m_crude_cache_pad_0[CRUDE_CACHE_PAD]; // shared state std::atomic<long> m_count; unsigned char m_crude_cache_pad_1[CRUDE_CACHE_PAD]; std::atomic<long> m_rdwake; unsigned char m_crude_cache_pad_2[CRUDE_CACHE_PAD]; ct_slow_semaphore m_rdwset; unsigned char m_crude_cache_pad_3[CRUDE_CACHE_PAD]; ct_slow_semaphore m_wrwset; unsigned char m_crude_cache_pad_4[CRUDE_CACHE_PAD]; ct_fast_mutex m_wrlock; unsigned char m_crude_cache_pad_5[CRUDE_CACHE_PAD]; ct_rwmutex() : m_count(RWMUTEX_COUNT_MAX), m_rdwake(0), m_rdwset(0), m_wrwset(0) { } // READ, pretty slim... void lock_shared() { if (m_count.fetch_add(-1, std::memory_order_acquire) < 1) { // We need to wait for a writer. 
m_rdwset.dec(); } } void unlock_shared() { if (m_count.fetch_add(1, std::memory_order_release) < 0) { // There is a writer if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1) { // We need to wake the writer up m_wrwset.inc(); } } } // WRITE, more hefty void lock() { // Acquire exclusive access m_wrlock.lock(); // we are the only thread in here now. // Gain write access wrt m_count long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, std::memory_order_acquire); // count can never be negative. if (count < RWMUTEX_COUNT_MAX) { // We detected readers. long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count, std::memory_order_acquire); if (rdwake + RWMUTEX_COUNT_MAX - count) { // Okay, we need to wait for all of the readers // The number of readers is actually // RWMUTEX_COUNT_MAX - count m_wrwset.dec(); } } } // write unlock void unlock() { // Release write access wrt m_count long count = m_count.fetch_add(RWMUTEX_COUNT_MAX, std::memory_order_release); if (count < 0) { // We need to wake -count readers. m_rdwset.add(-count); } // Release exclusive access m_wrlock.unlock(); } }; struct ct_simple_stack { struct node { node* volatile m_next; // to suppress optimization unsigned int m_tid; node(unsigned int tid) : m_tid(tid) {} }; node* m_head; ct_simple_stack() : m_head(nullptr) {} ~ct_simple_stack() { ct_simple_stack::node* n = flush(); while (n) { ct_simple_stack::node* next = n->m_next; delete n; n = next; } } void push(node* n) { n->m_next = m_head; m_head = n; } node* pop() { node* n = m_head; if (n) { m_head = n->m_next; } return n; } node* flush() { node* n = m_head; m_head = nullptr; return n; } }; struct ct_shared { std::atomic<bool> m_run; // protected by m_std_rwmutex std::uint64_t m_reads; std::uint64_t m_writes; ct_simple_stack m_stack; ct_simple_stack m_stack_dtor; #if defined (CT_TEST_FAST_MUTEX) ct_rwmutex m_std_rwmutex; #else std::shared_mutex m_std_rwmutex;