- Lock-free LRU-cache-algorithm - 6 Updates
- Turn a functor into a function pointer - 9 Updates
- hash<size_t> - 1 Update
- Cacheline Ping-Pong - 1 Update
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 06:33PM +0200

> look like? LRU-caches are only possible with doubly-linked lists.
> So I thought this wouldn't be possible lock-free. But maybe I'm
> wrong here and someone can give me an idea.

I've found a solution that is mega-cool. I'm not gonna reveal it because I'm thinking about getting a US patent on it, but it is an exotic combination of locked and lock-free programming. It scales almost linearly with the number of threads. |
scott@slp53.sl.home (Scott Lurndal): Sep 24 04:53PM

>> LRU-caches are only possible with doubly-linked lists.

Hm. That's news, indeed. Have you produced a formal proof? You may want to consider LRU cache replacement algorithms used in processor caches. |
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 07:00PM +0200

> Hm. That's news, indeed. Have you produced a formal proof?

There's nothing to prove. The idea isn't very complex, but it is very original.

> You may want to consider LRU cache replacement algorithms
> used in processor caches.

That's a totally different issue, as those caches are basically pipelined, but within the pipeline each step is serialized. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 24 01:50PM -0700

On 9/24/2019 9:33 AM, Bonita Montero wrote:
> I've found a solution that is mega-cool. I'm not gonna reveal it
> because I'm thinking about getting a US patent on it, but it is an
> exotic combination of locked and lock-free programming.

Do a deep patent feasibility study! Build a list of as many references as you can, and read them all. It can save you some of your hard-earned $$$. Patent lawyers cost a lot of money! Or, do you work for Microsoft? For some reason, I thought about that.

Let me guess... Clever hashed locking for the slow paths, and lock-free for the fast paths? That is a basic idiom. RCU uses it a lot: locks for the writers and sync-free for the readers. The readers do not use any atomics or memory barriers at all. Well, Alpha aside for a moment...

Here is a very simple example of a mixture of lock-based and wait-free. 100% wait-free on the push side, and "sometimes" lock-based on the reader side experimental LIFO:

https://groups.google.com/d/msg/comp.arch/8Y0C8zGjtqI/bwg-hBLRAQAJ

> It scales
> almost linearly with the number of threads.

Cool! |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 24 01:56PM -0700

On 9/24/2019 1:50 PM, Chris M. Thomasson wrote:
> On 9/24/2019 9:33 AM, Bonita Montero wrote: [...]
> 100% wait-free on the push side, and "sometimes" lock-based on the
> reader side experimental LIFO:
> https://groups.google.com/d/msg/comp.arch/8Y0C8zGjtqI/bwg-hBLRAQAJ

The experimental core of the code above is:
________________________________________
// Needs <atomic> and <thread>. CT_WAIT is a sentinel pointer value,
// CT_MB_RLX/CT_MB_REL/CT_MB_ACQ alias std::memory_order_relaxed/
// release/acquire, and g_allocs is a global allocation counter;
// their definitions are in the linked post.

// A work node
struct ct_work
{
    std::atomic<ct_work*> m_next;
    std::thread::id m_data;

    ct_work(std::thread::id data) : m_next(nullptr), m_data(data) {}

    void process()
    {
        // [...]
        // User Processing For This Work
    }

    ct_work* get_next() const
    {
        ct_work* w = nullptr;
        while ((w = m_next.load(CT_MB_RLX)) == CT_WAIT)
        {
            // we can spin, or even do other work right here...
            std::this_thread::yield();
        }
        return w;
    }
};

// Easy Stack, only uses XCHG
struct ct_estack
{
    std::atomic<ct_work*> m_head;

    ct_estack() : m_head(nullptr) {}

    void push(ct_work* n)
    {
        n->m_next.store(CT_WAIT, CT_MB_RLX);
        ct_work* head = m_head.exchange(n, CT_MB_REL); // release
        n->m_next.store(head, CT_MB_RLX);
    }

    ct_work* flush_try()
    {
        return m_head.exchange(nullptr, CT_MB_ACQ); // acquire
    }
};

// Consume an Easy Stack...
void ct_consume(ct_estack& estack)
{
    ct_work* w = estack.flush_try();

    while (w)
    {
        // Process FIRST!
        w->process();

        // Now, we can gain the next pointer.
        ct_work* next = w->get_next();

        // Okay, we can delete the work
        delete w;
        g_allocs.fetch_sub(1, CT_MB_RLX); // dec

        w = next;
    }
}
________________________________________

The takeaway is that the LIFO class ct_estack only uses XCHG, and no loops. Now, this can be distributed. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 24 01:58PM -0700

On 9/24/2019 10:00 AM, Bonita Montero wrote:
>> Hm. That's news, indeed. Have you produced a formal proof?
> There's nothing to prove. The idea isn't very complex,
> but it is very original.

It seems like you realized how bad CMPXCHG can be when used in a loop. Always try XADD and/or XCHG in a loopless algorithm before going to a CAS loop. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 03:31AM -0700

Have they considered adding a one-liner to Boost that can turn a functor into a function pointer, for example so that we can use "std::atexit" with "boost::bind", like so:

extern void closeAnimation(int prefix);

int main()
{
    atexit( boost::bind(closeAnimation, 0) );
}

Such a one-liner should really be in Boost. |
Ian Collins <ian-news@hotmail.com>: Sep 24 10:39PM +1200

On 24/09/2019 22:31, Frederick Gotham wrote:
> atexit( boost::bind(closeAnimation, 0) );
> }
> Such a one-liner should really be in boost.

What's wrong with

atexit( [](){ closeAnimation(0); });

?

--
Ian |
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 04:51AM -0700

On Tuesday, September 24, 2019 at 11:39:57 AM UTC+1, Ian Collins wrote:
> What's wrong with
> atexit( [](){ closeAnimation(0); });
> ?

That will work fine when there's nothing captured by the lambda. There could be a situation where a handle is obtained inside a function and stored in a local variable, something like:

void Init_Flight_Simulator()
{
    FSim *handle = ObtainSimulator();

    if ( nullptr != handle )
        atexit( [handle](){ ReleaseSimulator(handle); } );

    /* Continue doing stuff with handle */
}

This won't compile for me. |
David Brown <david.brown@hesbynett.no>: Sep 24 02:31PM +0200

On 24/09/2019 13:51, Frederick Gotham wrote:
> /* Continue doing stuff with handle */
> }
> This won't compile for me.

It's good that it doesn't compile, because it would go horribly wrong - your handle is at block scope, but needs to be available at program exit time. You have to figure out the best way to make it live long enough. For example, you could write:

void Init_Flight_Simulator()
{
    static FSim *handle = ObtainSimulator();

    if ( nullptr != handle )
        atexit( [](){ ReleaseSimulator(handle); } );

    /* Continue doing stuff with handle */
}

(The lambda needs no capture here, because a static local has static storage duration and can be used inside a lambda without being captured.) But you'd be better off doing this with RAII and smart pointers - releasing resources with "atexit" is not a typical C++ solution. |
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 03:47PM +0300

On 24.09.2019 14:51, Frederick Gotham wrote:
> /* Continue doing stuff with handle */
> }
> This won't compile for me.

And neither would a boost::bind solution. A single function pointer fixed at compile time just cannot contain or otherwise hold another value ('handle' here) whose value will become available at run time only. Technically it could be done via trampolines by run-time code generation, but I believe it's a good thing this is not part of C++. |
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 04:02PM +0300

On 24.09.2019 15:47, Paavo Helde wrote:
> value ('handle' here) whose value will become available at run time only.
> Technically it could be done via trampolines by run-time code
> generation, but I believe it's a good thing this is not part of C++.

PS. In Linux there is also on_exit() which takes an extra callback pointer and would work fine with this example, but is arguably even more unfriendly to lambdas. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 08:05AM -0700

I've been playing around with this code that still has a few issues:

#include <cstdlib>     // for atexit
#include <functional>

/* Pretend that these next 2 functions are defined in another translation unit */
int GetHandle(void) { return 5; }
void ReleaseHandle(int) { /* Whatever */ }

template <class RetType, class T>
RetType (*(Make_Funcptr(T &&functor)))(void)
{
    class InnerClass {
    public:
        static T *Set_Or_Get_Functor(T *const arg)
        {
            static T *pfunctor = nullptr;

            if ( nullptr != arg )
                pfunctor = arg;

            return pfunctor;
        }

        static RetType ActualFunction(void)
        {
            return (*(Set_Or_Get_Functor(nullptr)))();
        }
    };

    InnerClass::Set_Or_Get_Functor(&functor);
    return InnerClass::ActualFunction;
}

void SomeFunc(void)
{
    int h = GetHandle();
    atexit( Make_Funcptr<void>( std::bind(ReleaseHandle, h) ) );
}

int main(void)
{
    SomeFunc();
}

A few problems:
(1) Taking the address of the temporary object returned from "bind"
(2) The static pointer will have one instance for each combination of T and RetType (when really we need a static pointer for every invocation of Make_Funcptr) |
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 08:55PM +0300

On 24.09.2019 18:05, Frederick Gotham wrote:
> A few problems:
> (1) Taking the address of the temporary object returned from "bind"
> (2) The static pointer will have one instance for each combination of T and RetType (when really we need a static pointer for every invocation of Make_Funcptr)

To accomplish such things with atexit() you will need a global static anyway, so better to be explicit about it:

#include <cstdlib>
#include <deque>
#include <functional>

int GetHandle(void) { return 5; }
void ReleaseHandle(int) { /* Whatever */ }

static std::deque<std::function<void()>> globalRegistry;

void SomeFunc()
{
    int h = GetHandle();
    globalRegistry.push_back([h]() { ReleaseHandle(h); });
}

int main()
{
    atexit([]() {
        for (auto& functor : globalRegistry) {
            functor();
        }
    });
    SomeFunc();
} |
legalize+jeeves@mail.xmission.com (Richard): Sep 24 07:38PM

[Please do not mail me a copy of your followup]

Frederick Gotham <cauldwell.thomas@gmail.com> spake the secret code:
> atexit( [handle](){ ReleaseSimulator(handle); } );
> /* Continue doing stuff with handle */
> }

IMO, this particular example would better be served by an RAII wrapper class rather than through the use of atexit.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com> |
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 06:27AM +0200

So I added another hashcode-generation. I don't show the changes but the full source now:

#include <iostream>
#include <cstddef>
#include <random>
#include <limits>
#include <vector>
#include <cmath>
#include <functional>
#include <algorithm>
#include <intrin.h>

using namespace std;

void showHashStatistics( vector<unsigned> &v );

int main()
{
    size_t const MAX64_PRIME = 0xFFFFFFFFFFFFFFC5u;
    size_t const N_BUCKETS = 0x100000;
    random_device rd;
    uniform_int_distribution<size_t> uidHash( 0, numeric_limits<size_t>::max() );
    vector<unsigned> v;
    v.resize( N_BUCKETS, 0 );

    cout << "simple prime hash:" << endl;
    for( size_t i = 0; i != 10'000'000; ++i )
        ++v[(size_t)(uidHash( rd ) * MAX64_PRIME) % N_BUCKETS];
    showHashStatistics( v );

    size_t low, high;
    fill( v.begin(), v.end(), 0 );
    cout << "extended prime hash:" << endl;
    for( size_t i = 0; i != 10'000'000; ++i )
        low = _mul128( uidHash( rd ), MAX64_PRIME, &(__int64 &)high ),
        ++v[(low ^ high) % N_BUCKETS];
    showHashStatistics( v );

    cout << endl;
    cout << "hash<size_t>:" << endl;
    fill( v.begin(), v.end(), 0 );
    hash<size_t> hasher;
    for( size_t i = 0; i != 10'000'000; ++i )
        ++v[hasher( i ) % N_BUCKETS];
    showHashStatistics( v );

    double mean = (double)numeric_limits<size_t>::max() / 2.0;
    normal_distribution<double> nd( mean, sqrt( mean ) );
    cout << endl;
    cout << "normal distribution:" << endl;
    fill( v.begin(), v.end(), 0 );
    for( size_t i = 0; i != 10'000'000; ++i )
        ++v[((size_t)nd( rd ) * MAX64_PRIME) % N_BUCKETS];
    showHashStatistics( v );
}

void showHashStatistics( vector<unsigned> &v )
{
    unsigned min = numeric_limits<unsigned>::max();
    unsigned max = 0;
    for( unsigned x : v )
        min = x < min ? x : min,
        max = x > max ? x : max;
    cout << "min: " << min << " max: " << max << endl;

    double avg = 0;
    for( unsigned x : v )
        avg += x;
    avg /= v.size();
    cout << "average: " << avg << endl;

    double rmsd = 0.0;
    double sq;
    for( unsigned x : v )
        sq = (x - avg),
        rmsd += sq * sq;
    rmsd = sqrt( rmsd / v.size() );
    cout << "standard deviation: " << rmsd << endl << endl;

    cout << "distribution:" << endl;
    vector<unsigned> vDist;
    vDist.resize( max - min + 1, 0 );
    for( unsigned x : v )
        ++vDist[x - min];
    for( unsigned i = 0; i != vDist.size(); ++i )   // was i < max, which overruns when min > 0
        cout << vDist[i] << endl;
}

The relevant new part here is:

    size_t low, high;
    fill( v.begin(), v.end(), 0 );
    cout << "extended prime hash:" << endl;
    for( size_t i = 0; i != 10'000'000; ++i )
        low = _mul128( uidHash( rd ), MAX64_PRIME, &(__int64 &)high ),
        ++v[(low ^ high) % N_BUCKETS];
    showHashStatistics( v );

There I'm multiplying a random value with the largest prime fitting in 64 bits, producing a 128-bit value through an intrinsic, and xoring the upper 64 bits into the lower 64 bits. One might think you'd get pure randomness here, but the distribution of bucket depths is almost the same as when simply multiplying the input value with the mentioned prime. So we have bucket depths from 0 to 29 instead of 0 to 28, the average is exactly the same and the standard deviation is almost the same; so WTF is going on here? Where's the relationship between randomness based on prime numbers and normal distribution? |
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 04:39AM +0200

> added to v with the CAS? Also, fwiw, atomic::compare_exchange_weak
> automatically returns the previous value in v on failure, no need to
> reload it on every iteration of the loop.

I'm just testing the latency of moving cachelines between cores in two variants. The source was a bit misordered because I pasted it twice, so here it is another time.

#if defined(_MSC_VER)
    #include <Windows.h>
#elif defined(__unix__)
    #include <sys/sysinfo.h>
    #include <sched.h>
    #include <pthread.h>