- is there any UB in this integer math... - 12 Updates
- an experimental read/write mutex... - 2 Updates
David Brown <david.brown@hesbynett.no>: Feb 18 07:59PM +0100 On 18/02/2019 19:11, Bonita Montero wrote: > No, that's not that what I imagined. I imagined something that > marks a variable to be excluded from the default-behaviour set > by the compiler-flag. Use the sanitizer, rather than -ftrapv, and you can mark functions with the "no_sanitize_undefined" attribute. |
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 03:22PM -0500 On 2/17/19 23:23, Chris M. Thomasson wrote: > There is a sort of funny discussion going on over on reddit about my > read/write mutex. Some people think is has UB wrt an integer overflow or > underflow. I boiled it down to the following simple program: ... > Is there any UB in there? Some on reddit seem to think so. They are most > likely trolling me. Little shi%'s. I see no undefined behavior in the code you've shown. In general, I'm less reliable when I say there's no problem than I am when saying that there is a problem - but I've got David and Ben backing me up on this. Are you sure the code you've give us is the same code they're referring to? It would help a great deal if you could explain the arguments they've given for thinking it's undefined. However, it's generally very difficult to accurately summarize an argument that hasn't convinced you, even if it's a valid argument that should have convinced you - your own biases get in the way of accurately summarizing it. It would be better yet if you could give us a link so we can review the discussion and determine the validity of their arguments ourselves. |
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 03:23PM -0500 On 2/18/19 13:11, Bonita Montero wrote: > No, that's not that what I imagined. I imagined something that > marks a variable to be excluded from the default-behaviour set > by the compiler-flag. I've no doubt that he understood exactly what you were asking for. His point is that this solution is better than the one you were asking for. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:59PM -0800 On 2/18/2019 6:34 AM, Ben Bacarisse wrote: >> Is there any UB in there? Some on reddit seem to think so. They are >> most likely trolling me. Little shi%'s. > Tell us where! I could no find your posted code on Reddit. In the following thread: https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/ Here is a start of the sub thread about integer overflow and underflow: https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/egok4pi Afaict, they did not seem to understand that fetch_add atomically returns the previous value, not the new value. Btw, Google has been using my algorithm in the GO language for over 7 years: https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/ebVWl0BCBAAJ However, Dmitry Vyukov did not give be credit for creating it, e.g., put my name in the source code. Will ask him to put my name in the GO language source code. Fwiw, I invented my algorithm back in 2008-2009. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:12PM -0800 On 2/18/2019 12:22 PM, James Kuyper wrote: > biases get in the way of accurately summarizing it. > It would be better yet if you could give us a link so we can review the > discussion and determine the validity of their arguments ourselves. It has to come from confusion about fetch_add on Reddit. Some apparently think that count can be negative. It will never be negative. So, I boiled it down to a case that shows what happens when a write access notices a batch of n readers. It simulates how the writer obtains n in order to properly wait for the n readers. ________________________ #include <iostream> #include <climits> long fetch_add(long& gcount, long addend) { long lcount = gcount; gcount += addend; return lcount; } int main() { long m_count = LONG_MAX; std::cout << "m_count = " << m_count << "\n"; // simulate three concurret readers fetch_add(m_count, -3); std::cout << "m_count = " << m_count << ", 3 readers\n"; // now m_count = LONG_MAX - 3 // simulate a writer. There can only be a single writer. long count = fetch_add(m_count, -LONG_MAX); std::cout << "m_count = " << m_count << ", 3 readers in write mode\n"; if (count < LONG_MAX) { std::cout << "\nReaders Detected!\n"; long readers = LONG_MAX - count; std::cout << "count = " << count << "\n"; std::cout << "readers = " << readers << "\n"; } return 0; } ________________________ There can only be a single writer here, because this is modeling the point where a writer detects n readers. count will never be negative. Fwiw, here is my algorithm: ________________________ // Chris M. Thomassons Experimental Read/Write Mutex // Yeah, it is pretty damn fat wrt the state, however // it has some interesting properties... // The state can be compressed a bit... // btw, it has no loops... // Take a look at the lock_shared and unlock_shared functions #define RWMUTEX_COUNT_MAX LONG_MAX struct ct_rwmutex { // shared state std::atomic<long> m_wrstate; std::atomic<long> m_count; std::atomic<long> m_rdwake; ct_slow_semaphore m_rdwset; ct_slow_semaphore m_wrwset; ct_fast_mutex m_wrlock; ct_rwmutex() : m_wrstate(1), m_count(RWMUTEX_COUNT_MAX), m_rdwake(0), m_rdwset(0), m_wrwset(0) { } // READ, pretty slim... void lock_shared() { if (m_count.fetch_add(-1, std::memory_order_acquire) < 1) { m_rdwset.dec(); } } void unlock_shared() { if (m_count.fetch_add(1, std::memory_order_release) < 0) { if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1) { m_wrwset.inc(); } } } // WRITE, more hefty void lock() { m_wrlock.lock(); long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, std::memory_order_acquire); if (count < RWMUTEX_COUNT_MAX) { long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count, std::memory_order_acquire); if (rdwake + RWMUTEX_COUNT_MAX - count) { m_wrwset.dec(); } } } // write unlock void unlock() { long count = m_count.fetch_add(RWMUTEX_COUNT_MAX, std::memory_order_release); if (count < 0) { m_rdwset.add(-count); } m_wrlock.unlock(); } }; ________________________ Can you notice any integer overflow and/or underflow? I cannot. |
jameskuyper@alumni.caltech.edu: Feb 18 02:22PM -0800 On Monday, February 18, 2019 at 4:59:29 PM UTC-5, Chris M. Thomasson wrote: > https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/egok4pi > Afaict, they did not seem to understand that fetch_add atomically > returns the previous value, not the new value. I saw no comments suggesting any such misunderstanding. I did see a claim by you that "m_count can have any value between, and including, -LONG_MAX and LONG_MAX". That is NOT true for the code you showed us: the code you showed us only allows m_count to take on three possible values: LONG_MAX, LONG_MAX-3, and -3. Therefore, your comment is about some other piece of code that you have not yet shown us. This is also supported by the fact that your main critic quotes code, presumably an excerpt from your code, that doesn't match any code you've provided in this thread. Could you please provide that code? |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:31PM -0800 > supported by the fact that your main critic quotes code, presumably > an excerpt from your code, that doesn't match any code you've > provided in this thread. Could you please provide that code? The only way my algorithm can get into a condition where m_count can equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access. This is fine. So, in my example I showed 3 readers, this takes m_count to -3. Try it with LONG_MAX readers. Now, this is critical: If _more_ than LONG_MAX readers hit my algorithm at the same time, then integer overflow and underflow _will_ occur. So, as long as the number of readers is LONG_MAX, and never exceeds this amount, then there is no UB. Basically, the math in my algorithm requires that the number of reader threads never exceeds LONG_MAX. Btw, have you seen a computer that had more than LONG_MAX threads running? Perhaps 64-bit LONG_MAX? |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:44PM -0800 On 2/18/2019 2:31 PM, Chris M. Thomasson wrote: >>> On 2/18/2019 6:34 AM, Ben Bacarisse wrote: >>>> "Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com> >>>> writes: [...] > equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access. > This is fine. So, in my example I showed 3 readers, this takes m_count > to -3. It goes to -3 when a writer subtracts LONG_MAX from m_count. This is in regard to the boiled down code: _________________________ m_count = LONG_MAX fetch_add(m_count, -3) // three readers m_count = LONG_MAX - 3 count = fetch_add(m_count, -LONG_MAX); // a single writer count = LONG_MAX - 3 m_count = -3 readers = LONG_MAX - count = 3 = perfect! _________________________ okay, now take this into account wrt LONG_MAX readers... _________________________ m_count = LONG_MAX fetch_add(m_count, -LONG_MAX) // LONG_MAX readers m_count = LONG_MAX - LONG_MAX = 0 // Okay, we are full of readers! count = fetch_add(m_count, -LONG_MAX); // a single writer count = 0 m_count = -LONG_MAX readers = LONG_MAX - count = LONG_MAX = Nice! :^D _________________________ Only a single writer is allowed to perform the: count = fetch_add(m_count, -LONG_MAX); mutation. |
jameskuyper@alumni.caltech.edu: Feb 18 02:45PM -0800 On Monday, February 18, 2019 at 5:12:52 PM UTC-5, Chris M. Thomasson wrote: ... > #include <iostream> > #include <climits> > long fetch_add(long& gcount, long addend) This is a non-member function. > if (m_count.fetch_add(-1, std::memory_order_acquire) < 1) This is a member function, whose definition you have not provided. Presumably they're related, but the member function you haven't shown us takes one argument with no analogue in the non-member function you did show us. Once again, it would be much easier to evaluate the issues if you would show all of the relevant code in an example that's compileable. I have not looked at your code in any detail - seeing comments that makes it clear that the code being commented on is significantly different from the code that has been presented makes me unwilling to invest much time on the code that has been presented. However, a quick scan lead me to the following questions: You said that m_count can have any value in the range -LONG_MAX to LONG_MAX. Under what circumstances would m_count be LONG_MAX? What happens if the following code is executed at a time when that is the case? > if (m_count.fetch_add(1, std::memory_order_release) < 0) You said that m_count can have any value in the range -LONG_MAX to LONG_MAX, inclusive. Under what circumstances would m_count be negative? What happens if the following code is executed at a time when that is the case? > long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, You said that m_count can have any value in the range -LONG_MAX to LONG_MAX, inclusive. Under what circumstances would m_count be positive? What happens if the following code is executed at a time when that is the case? > long count = m_count.fetch_add(RWMUTEX_COUNT_MAX, Since I haven't bothered examining your code in detail; there might very well be good reasons why the cases I've just mentioned above will never actually come up. However, your response to the person who asked the very same question on Reddit (in less detail) just denigrated his understanding without bother to explain what those reasons are. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:50PM -0800 >> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, > You said that m_count can have any value in the range -LONG_MAX to > LONG_MAX, inclusive. Under what circumstances would m_count be positive? m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any time that ct_rwmutex::m_count equals LONG_MAX means that there are no readers or writers, or activity whatsoever. ________________ #define RWMUTEX_COUNT_MAX LONG_MAX struct ct_rwmutex { // shared state std::atomic<long> m_wrstate; std::atomic<long> m_count; std::atomic<long> m_rdwake; ct_slow_semaphore m_rdwset; ct_slow_semaphore m_wrwset; ct_fast_mutex m_wrlock; ct_rwmutex() : m_wrstate(1), m_count(RWMUTEX_COUNT_MAX), m_rdwake(0), m_rdwset(0), m_wrwset(0) { } [...] ________________ |
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 06:06PM -0500 On 2/18/19 17:50, Chris M. Thomasson wrote: > On 2/18/2019 2:45 PM, jameskuyper@alumni.caltech.edu wrote: ... > m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any > time that ct_rwmutex::m_count equals LONG_MAX means that there are no > readers or writers, or activity whatsoever. So a call to unlock_shared() when there are no readers, writers, or activity, produces signed overflow. I asked about three different cases. Your responded to my second case with an answer to my first case, and didn't respond to either of the other two cases. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 03:16PM -0800 On 2/18/2019 3:06 PM, James Kuyper wrote: >> readers or writers, or activity whatsoever. > So a call to unlock_shared() when there are no readers, writers, or > activity, produces signed overflow. Why in the world would a thread call unlock_shared() when it did not call lock_shared() first? Your question make no sense to me. The only time you call unlock_shared() is to unlock read access that was previously acquired with lock_shared(): __________________ lock_shared() // read activity unlock_shared() lock() // write activity unlock() __________________ Think about it for a moment. > I asked about three different cases. Your responded to my second case > with an answer to my first case, and didn't respond to either of the > other two cases. Working on on another benchmark right now. A little time constrained. It will have some comments. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:22PM -0800 On 2/13/2019 11:03 PM, Chris M. Thomasson wrote: > __________________________________ > /* Simple, crude read/write mutex test > by: Chris M. Thomasson [...] Holy moly! Read all of the following post: https://groups.google.com/forum/#!original/lock-free/zzZX4fvtG04/ebVWl0BCBAAJ WOW! Btw, Dmitry Vyukov works at Google and develops the GO language! Why did I fail to put a license on my work. Damn it. Here is his response when I asked him what he thought about my algorithm: Dmitry Vyukov: I think it's one of the best algorithms out there. The complete fairness for both readers and writers is notable. And the wait-free fast-path for readers. It still suffers from high cache line contention even on read-only workload, but it's as good as one can get with a centralized design (i.e. not per-thread/cpu distributed stuff which has own problems). I would expect a CAS-loop-based read lock to degrade significantly under high read load. Btw your algorithm is used as the standard Go RWMutex for the past 7+ years= : https://github.com/golang/go/commit/daaf29cf932 (I should have mentioned your authorship somewhere!) As for potential improvements I can only think of optimizing uncontented writer lock/unlock to be 1 RMW each, i.e. not offloading writer competition resolution to a separate mutex, but rather incorporate it into the same atomic variable readers and writers use for synchronization with each other. Do you think it's possible? With CAS-loop? Or maybe with some smart atomic_fetch_or? Wait, atomic_fetch_or is compiled to a CAS loop on x86 anyway. These new C/C++ atomic interfaces sometimes make me forget that. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:43PM -0800 >> Here is the code: Still have to try it out on GCC in c++17 mode. >> Can anybody run it? >> https://pastebin.com/raw/xCBHY9qd [...] > Testing: Chris M. Thomasson's Experimental Read/Write Mutex > msec = 40027 > shared.m_state = 160000000 [...] > Testing: Chris M. Thomasson's Experimental Read/Write Mutex > msec = 37518 > shared.m_state = 160000000 [...] > VS SHARED MUTEX: > Ran for 15 minutes then i stopped it. It's not even close. Wow. Thank you so much for taking the time to test it for yourself. I really do appreciate it. It seems that some of the std::shared_mutex impls are created around pure mutex/condvar. So, it is only slow pathed (always blocking). My algorithms try to create fast-paths (non-blocking) paths that can cleverly "skip" calls into the "slow" paths, so to speak. I am working on another type of benchmark code. Will get back to you. Thanks again. |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment