Monday, February 18, 2019

Digest for comp.lang.c++@googlegroups.com - 14 updates in 2 topics

David Brown <david.brown@hesbynett.no>: Feb 18 07:59PM +0100

On 18/02/2019 19:11, Bonita Montero wrote:
 
> No, that's not that what I imagined. I imagined something that
> marks a variable to be excluded from the default-behaviour set
> by the compiler-flag.
 
Use the sanitizer, rather than -ftrapv, and you can mark functions with
the "no_sanitize_undefined" attribute.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 03:22PM -0500

On 2/17/19 23:23, Chris M. Thomasson wrote:
> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think is has UB wrt an integer overflow or
> underflow. I boiled it down to the following simple program:
...
> Is there any UB in there? Some on reddit seem to think so. They are most
> likely trolling me. Little shi%'s.
 
 
I see no undefined behavior in the code you've shown. In general, I'm
less reliable when I say there's no problem than I am when saying that
there is a problem - but I've got David and Ben backing me up on this.
Are you sure the code you've give us is the same code they're referring to?
 
It would help a great deal if you could explain the arguments they've
given for thinking it's undefined. However, it's generally very
difficult to accurately summarize an argument that hasn't convinced you,
even if it's a valid argument that should have convinced you - your own
biases get in the way of accurately summarizing it.
It would be better yet if you could give us a link so we can review the
discussion and determine the validity of their arguments ourselves.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 03:23PM -0500

On 2/18/19 13:11, Bonita Montero wrote:
 
> No, that's not that what I imagined. I imagined something that
> marks a variable to be excluded from the default-behaviour set
> by the compiler-flag.
 
I've no doubt that he understood exactly what you were asking for. His
point is that this solution is better than the one you were asking for.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:59PM -0800

On 2/18/2019 6:34 AM, Ben Bacarisse wrote:
>> Is there any UB in there? Some on reddit seem to think so. They are
>> most likely trolling me. Little shi%'s.
 
> Tell us where! I could no find your posted code on Reddit.
 
In the following thread:
 
https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/
 
Here is a start of the sub thread about integer overflow and underflow:
 
https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/egok4pi
 
Afaict, they did not seem to understand that fetch_add atomically
returns the previous value, not the new value. Btw, Google has been
using my algorithm in the GO language for over 7 years:
 
https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/ebVWl0BCBAAJ
 
However, Dmitry Vyukov did not give be credit for creating it, e.g., put
my name in the source code. Will ask him to put my name in the GO
language source code. Fwiw, I invented my algorithm back in 2008-2009.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:12PM -0800

On 2/18/2019 12:22 PM, James Kuyper wrote:
> biases get in the way of accurately summarizing it.
> It would be better yet if you could give us a link so we can review the
> discussion and determine the validity of their arguments ourselves.
 
It has to come from confusion about fetch_add on Reddit. Some apparently
think that count can be negative. It will never be negative. So, I
boiled it down to a case that shows what happens when a write access
notices a batch of n readers. It simulates how the writer obtains n in
order to properly wait for the n readers.
________________________
#include <iostream>
#include <climits>
 
 
long fetch_add(long& gcount, long addend)
{
long lcount = gcount;
gcount += addend;
return lcount;
}
 
 
int main()
{
long m_count = LONG_MAX;
std::cout << "m_count = " << m_count << "\n";
 
// simulate three concurret readers
fetch_add(m_count, -3);
std::cout << "m_count = " << m_count << ", 3 readers\n";
 
// now m_count = LONG_MAX - 3
 
// simulate a writer. There can only be a single writer.
long count = fetch_add(m_count, -LONG_MAX);
std::cout << "m_count = " << m_count << ", 3 readers in write mode\n";
 
if (count < LONG_MAX)
{
std::cout << "\nReaders Detected!\n";
long readers = LONG_MAX - count;
std::cout << "count = " << count << "\n";
std::cout << "readers = " << readers << "\n";
}
 
return 0;
}
________________________
 
 
There can only be a single writer here, because this is modeling the
point where a writer detects n readers. count will never be negative.
 
 
Fwiw, here is my algorithm:
________________________
// Chris M. Thomassons Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions
 
#define RWMUTEX_COUNT_MAX LONG_MAX
 
struct ct_rwmutex
{
// shared state
std::atomic<long> m_wrstate;
std::atomic<long> m_count;
std::atomic<long> m_rdwake;
 
ct_slow_semaphore m_rdwset;
ct_slow_semaphore m_wrwset;
ct_fast_mutex m_wrlock;
 
 
ct_rwmutex() :
m_wrstate(1),
m_count(RWMUTEX_COUNT_MAX),
m_rdwake(0),
m_rdwset(0),
m_wrwset(0) {
}
 
 
// READ, pretty slim...
void lock_shared()
{
if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
{
m_rdwset.dec();
}
}
 
void unlock_shared()
{
if (m_count.fetch_add(1, std::memory_order_release) < 0)
{
if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
{
m_wrwset.inc();
}
}
}
 
 
// WRITE, more hefty
void lock()
{
m_wrlock.lock();
 
long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
std::memory_order_acquire);
 
if (count < RWMUTEX_COUNT_MAX)
{
long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
std::memory_order_acquire);
 
if (rdwake + RWMUTEX_COUNT_MAX - count)
{
m_wrwset.dec();
}
}
}
 
// write unlock
void unlock()
{
long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
std::memory_order_release);
 
if (count < 0)
{
m_rdwset.add(-count);
}
 
m_wrlock.unlock();
}
};
________________________
 
Can you notice any integer overflow and/or underflow? I cannot.
jameskuyper@alumni.caltech.edu: Feb 18 02:22PM -0800

On Monday, February 18, 2019 at 4:59:29 PM UTC-5, Chris M. Thomasson wrote:
 
> https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/egok4pi
 
> Afaict, they did not seem to understand that fetch_add atomically
> returns the previous value, not the new value.
 
I saw no comments suggesting any such misunderstanding. I did see a
claim by you that "m_count can have any value between, and including,
-LONG_MAX and LONG_MAX". That is NOT true for the code you showed us:
the code you showed us only allows m_count to take on three possible
values: LONG_MAX, LONG_MAX-3, and -3. Therefore, your comment is about
some other piece of code that you have not yet shown us. This is also
supported by the fact that your main critic quotes code, presumably
an excerpt from your code, that doesn't match any code you've
provided in this thread. Could you please provide that code?
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:31PM -0800

> supported by the fact that your main critic quotes code, presumably
> an excerpt from your code, that doesn't match any code you've
> provided in this thread. Could you please provide that code?
 
The only way my algorithm can get into a condition where m_count can
equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
This is fine. So, in my example I showed 3 readers, this takes m_count
to -3. Try it with LONG_MAX readers. Now, this is critical: If _more_
than LONG_MAX readers hit my algorithm at the same time, then integer
overflow and underflow _will_ occur. So, as long as the number of
readers is LONG_MAX, and never exceeds this amount, then there is no UB.
 
Basically, the math in my algorithm requires that the number of reader
threads never exceeds LONG_MAX. Btw, have you seen a computer that had
more than LONG_MAX threads running? Perhaps 64-bit LONG_MAX?
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:44PM -0800

On 2/18/2019 2:31 PM, Chris M. Thomasson wrote:
>>> On 2/18/2019 6:34 AM, Ben Bacarisse wrote:
>>>> "Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>
>>>> writes:
[...]
> equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
> This is fine. So, in my example I showed 3 readers, this takes m_count
> to -3.
 
It goes to -3 when a writer subtracts LONG_MAX from m_count. This is in
regard to the boiled down code:
_________________________
m_count = LONG_MAX
 
fetch_add(m_count, -3) // three readers
 
m_count = LONG_MAX - 3
 
count = fetch_add(m_count, -LONG_MAX); // a single writer
 
count = LONG_MAX - 3
 
m_count = -3
 
 
readers = LONG_MAX - count = 3 = perfect!
_________________________
 
 
okay, now take this into account wrt LONG_MAX readers...
_________________________
m_count = LONG_MAX
 
fetch_add(m_count, -LONG_MAX) // LONG_MAX readers
 
m_count = LONG_MAX - LONG_MAX = 0 // Okay, we are full of readers!
 
count = fetch_add(m_count, -LONG_MAX); // a single writer
 
count = 0
 
m_count = -LONG_MAX
 
readers = LONG_MAX - count = LONG_MAX = Nice! :^D
_________________________
 
 
Only a single writer is allowed to perform the:
 
count = fetch_add(m_count, -LONG_MAX);
 
mutation.
 
 
 
 
jameskuyper@alumni.caltech.edu: Feb 18 02:45PM -0800

On Monday, February 18, 2019 at 5:12:52 PM UTC-5, Chris M. Thomasson wrote:
...
> #include <iostream>
> #include <climits>
 
> long fetch_add(long& gcount, long addend)
 
This is a non-member function.
 
> if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
 
This is a member function, whose definition you have not provided.
Presumably they're related, but the member function you haven't shown us
takes one argument with no analogue in the non-member function you did
show us. Once again, it would be much easier to evaluate the issues if
you would show all of the relevant code in an example that's compileable.
 
I have not looked at your code in any detail - seeing comments that
makes it clear that the code being commented on is significantly
different from the code that has been presented makes me unwilling to
invest much time on the code that has been presented. However, a quick
scan lead me to the following questions:
 
You said that m_count can have any value in the range -LONG_MAX to
LONG_MAX. Under what circumstances would m_count be LONG_MAX? What
happens if the following code is executed at a time when that is the
case?
 
> if (m_count.fetch_add(1, std::memory_order_release) < 0)
 
You said that m_count can have any value in the range -LONG_MAX to
LONG_MAX, inclusive. Under what circumstances would m_count be negative?
What happens if the following code is executed at a time when that is
the case?
 
> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
 
You said that m_count can have any value in the range -LONG_MAX to
LONG_MAX, inclusive. Under what circumstances would m_count be positive?
What happens if the following code is executed at a time when that is
the case?
 
> long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
 
Since I haven't bothered examining your code in detail; there might very
well be good reasons why the cases I've just mentioned above will never
actually come up. However, your response to the person who asked the
very same question on Reddit (in less detail) just denigrated his
understanding without bother to explain what those reasons are.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:50PM -0800


>> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
 
> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be positive?
 
m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
time that ct_rwmutex::m_count equals LONG_MAX means that there are no
readers or writers, or activity whatsoever.
 
________________
#define RWMUTEX_COUNT_MAX LONG_MAX
 
struct ct_rwmutex
{
// shared state
std::atomic<long> m_wrstate;
std::atomic<long> m_count;
std::atomic<long> m_rdwake;
 
ct_slow_semaphore m_rdwset;
ct_slow_semaphore m_wrwset;
ct_fast_mutex m_wrlock;
 
 
ct_rwmutex() :
m_wrstate(1),
m_count(RWMUTEX_COUNT_MAX),
m_rdwake(0),
m_rdwset(0),
m_wrwset(0) {
}
[...]
________________
 
 
 
 
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 06:06PM -0500

On 2/18/19 17:50, Chris M. Thomasson wrote:
> On 2/18/2019 2:45 PM, jameskuyper@alumni.caltech.edu wrote:
...
 
> m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
> time that ct_rwmutex::m_count equals LONG_MAX means that there are no
> readers or writers, or activity whatsoever.
 
So a call to unlock_shared() when there are no readers, writers, or
activity, produces signed overflow.
 
I asked about three different cases. Your responded to my second case
with an answer to my first case, and didn't respond to either of the
other two cases.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 03:16PM -0800

On 2/18/2019 3:06 PM, James Kuyper wrote:
>> readers or writers, or activity whatsoever.
 
> So a call to unlock_shared() when there are no readers, writers, or
> activity, produces signed overflow.
 
Why in the world would a thread call unlock_shared() when it did not
call lock_shared() first? Your question make no sense to me. The only
time you call unlock_shared() is to unlock read access that was
previously acquired with lock_shared():
__________________
lock_shared()
// read activity
unlock_shared()
 
 
lock()
// write activity
unlock()
__________________
 
Think about it for a moment.
 
 
> I asked about three different cases. Your responded to my second case
> with an answer to my first case, and didn't respond to either of the
> other two cases.
 
Working on on another benchmark right now. A little time constrained. It
will have some comments.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:22PM -0800

On 2/13/2019 11:03 PM, Chris M. Thomasson wrote:
> __________________________________
> /* Simple, crude read/write mutex test
>    by: Chris M. Thomasson
[...]
 
Holy moly! Read all of the following post:
 
 
 
https://groups.google.com/forum/#!original/lock-free/zzZX4fvtG04/ebVWl0BCBAAJ
 
 
 
WOW! Btw, Dmitry Vyukov works at Google and develops the GO language!
Why did I fail to put a license on my work. Damn it.
 
 
 
Here is his response when I asked him what he thought about my algorithm:
 
 
 
Dmitry Vyukov:
 
I think it's one of the best algorithms out there. The complete fairness
for both readers and writers is notable. And the wait-free fast-path for
readers. It still suffers from high cache line contention even on
read-only workload, but it's as good as one can get with a centralized
design (i.e. not per-thread/cpu distributed stuff which has own
problems). I would expect a CAS-loop-based read lock to degrade
significantly under high read load. Btw your algorithm is used as the
standard Go RWMutex for the past 7+ years= :
https://github.com/golang/go/commit/daaf29cf932 (I should have mentioned
your authorship somewhere!) As for potential improvements I can only
think of optimizing uncontented writer lock/unlock to be 1 RMW each,
i.e. not offloading writer competition resolution to a separate mutex,
but rather incorporate it into the same atomic variable readers and
writers use for synchronization with each other. Do you think it's
possible? With CAS-loop? Or maybe with some smart atomic_fetch_or? Wait,
atomic_fetch_or is compiled to a CAS loop on x86 anyway. These new C/C++
atomic interfaces sometimes make me forget that.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:43PM -0800


>> Here is the code: Still have to try it out on GCC in c++17 mode.
 
>> Can anybody run it?
 
>> https://pastebin.com/raw/xCBHY9qd
[...]
 
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
 
> msec = 40027
> shared.m_state = 160000000
 
[...]
 
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
 
> msec = 37518
> shared.m_state = 160000000
[...]
> VS SHARED MUTEX:
 
> Ran for 15 minutes then i stopped it. It's not even close.
 
Wow. Thank you so much for taking the time to test it for yourself. I
really do appreciate it. It seems that some of the std::shared_mutex
impls are created around pure mutex/condvar. So, it is only slow pathed
(always blocking). My algorithms try to create fast-paths (non-blocking)
paths that can cleverly "skip" calls into the "slow" paths, so to speak.
 
I am working on another type of benchmark code. Will get back to you.
 
Thanks again.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

No comments: