Wednesday, February 20, 2019

Digest for comp.lang.c++@googlegroups.com - 13 updates in 1 topic

Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 10:01PM +0200

On 20.02.2019 21:24, Bonita Montero wrote:
>>> volatile DWORD m_dwRecursionCount;
 
>> 'volatile' is neither needed nor sufficient for multithreading code.
 
> It is just to document that it may change asynchronously.
 
If so, I see they are accessed without proper locking all over the
place. For example, there is no memory barrier after assigning to
m_dwOwningThreadId in Lock() and before potentially reading it in
another Lock() in another thread, so there is no guarantee the other
thread ever sees the changed value.
 
However, now I looked at the MSDN site and saw that by Microsoft rules
volatile indeed causes additional memory barriers when compiled with
/volatile:ms (which is the default, except for ARM). So my original
comment was wrong, volatile is indeed needed for this code to work, but
the code is then restricted not just to Windows, but also requires the
MSVC compiler and its /volatile:ms regime.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 20 09:34PM +0100

> If so, I see they are accessed without proper locking all over the
> place.
 
The fields for the reader and writer counts and the writer flag (sign bit)
are the lock themselves. And the other two members are only modified
by the writer that owns the mutex.
 
> For example, there is no memory barrier after assigning to
> m_dwOwningThreadId in Lock() ....
 
That's not explicitly necessary because the Xchg intrinsic before it has
acquire and release behaviour.
 
> ... and before potentially reading it in another Lock()
> in another thread, so there is no guarantee the other
 
I think you mean this code at the beginning of Lock():
 
if( m_dwOwningThreadId == (dwCurrentThreadId = GetCurrentThreadId()) )
{
    m_dwRecursionCount++;
    return;
}
 
There is absolutely no barrier necessary here! If the thread that does
this check is the same one that owns the mutex, everything is OK because
the barrier is the Xchg in the outer Lock(). If the thread doesn't
own the mutex, m_dwOwningThreadId may change at any time and may
have any value different from the current thread id.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:46PM -0800

On 2/20/2019 4:50 AM, Paavo Helde wrote:
>> (0D8B00Ch)]
 
> What I see from here is that apparently you are running in 32-bit mode.
> Who would do this in year 2019? Recompile in 64-bit and measure again!
 
OUCH! ARGH.... Will do. Thanks. ;^o
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:53PM -0800

On 2/20/2019 10:25 AM, Bonita Montero wrote:
>   }
> }
 
> I'm pretty sure you won't understand the code.
 
Why not? I need to give it a good read, and then test it out in Relacy
Race Detector:
 
http://www.1024cores.net/home/relacy-race-detector
 
> But the performance should be unbeatable. ;-)
 
Nice. Okay, I will try to get it up and running in Relacy. It can
emulate Windows sync primitives, pretty nice. Once I get it running in
there and make sure it passes the tests, I can add it to my benchmark
code. It will be fun. Thanks for the algorithm Bonita. :^)
 
There is one main difference between your algorithm and mine: notice that
I avoid the use of CAS and loops. I am looking forward to giving your
algorithm a go. Thanks again. :^)
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 01:02PM -0800

>> be performed in this test at 60 seconds. The number of threads is
>> variable and is determined by std::hardware_concurrency * THREADS, set
>> at 8 in the test. So on my system the setup is:
[snip code]
 
> * GCC -Ofast -march=native -lstdc++ *
> *************************************
 
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
[...]
 
Thank you! Sorry to ask, but can you try running version 0.1?
 
Try reloading the page at: https://pastebin.com/raw/1QtPCGhV
 
The newest version 0.1 should have:
 
 
Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
 
Testing Version 0.1: std::shared_mutex
 
 
 
 
Fwiw, here is the code:
____________________________________
/* Crude Read/Write Mutex Benchmark
 
This tests my algorithm ct_rwmutex vs. std::shared_mutex.
 
It shows how many reads and writes can be performed within
a fixed amount of time.
 
by: Chris M. Thomasson
 
version 0.1
__________________________________________*/
 
 
 
#include <thread>
#include <atomic>
#include <shared_mutex>
#include <condition_variable>
#include <iostream>
#include <functional>
#include <chrono>
#include <cassert>
#include <cstdlib>
#include <ctime>
#include <climits>
#include <cstdint>
#include <vector>
 
 
// undefine to test std::shared_mutex
#define CT_TEST_FAST_MUTEX 1
 
 
#define THREADS 8 // multiplied by std::hardware_concurrency
#define NODE_PRIME 1000000 // number of nodes
#define CRUDE_CACHE_PAD 256 // crude cache pad
#define TEST_DURATION_SEC 60 // number of seconds for the test
 
 
// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    unsigned long m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        m_mutex.lock();
        ++m_state;
        m_mutex.unlock();
        m_cond.notify_one();
    }

    void add(unsigned long addend)
    {
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_state += addend;
        }

        m_cond.notify_all();
    }

    void dec()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == 0) m_cond.wait(lock);
        --m_state;
    }
};
 
 
 
 
// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_state = true;
        m_cond.notify_one();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == false) m_cond.wait(lock);
        m_state = false; // auto-reset
    }
};
 
 
 
 
// just a layer over an auto-reset event
struct ct_fast_mutex
{
    std::atomic<unsigned int> m_state;
    ct_auto_reset_event m_waitset;

    ct_fast_mutex() : m_state(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            while (m_state.exchange(2, std::memory_order_acquire))
            {
                m_waitset.wait();
            }
        }
    }

    void unlock()
    {
        if (m_state.exchange(0, std::memory_order_release) == 2)
        {
            m_waitset.signal();
        }
    }
};
 
 
 
// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

// The number of readers _must_ never exceed LONG_MAX!

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    unsigned char m_crude_cache_pad_0[CRUDE_CACHE_PAD];

    // shared state
    std::atomic<long> m_count;
    unsigned char m_crude_cache_pad_1[CRUDE_CACHE_PAD];

    std::atomic<long> m_rdwake;
    unsigned char m_crude_cache_pad_2[CRUDE_CACHE_PAD];

    ct_slow_semaphore m_rdwset;
    unsigned char m_crude_cache_pad_3[CRUDE_CACHE_PAD];

    ct_slow_semaphore m_wrwset;
    unsigned char m_crude_cache_pad_4[CRUDE_CACHE_PAD];

    ct_fast_mutex m_wrlock;
    unsigned char m_crude_cache_pad_5[CRUDE_CACHE_PAD];


    ct_rwmutex() :
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }


    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            // We need to wait for a writer.
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            // There is a writer
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                // We need to wake the writer up
                m_wrwset.inc();
            }
        }
    }


    // WRITE, more hefty
    void lock()
    {
        // Acquire exclusive access
        m_wrlock.lock();

        // we are the only thread in here now.

        // Gain write access wrt m_count
        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
            std::memory_order_acquire);

        // count can never be negative.
        if (count < RWMUTEX_COUNT_MAX)
        {
            // We detected readers.
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                // Okay, we need to wait for all of the readers
                // The number of readers is actually
                // RWMUTEX_COUNT_MAX - count
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        // Release write access wrt m_count
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
            std::memory_order_release);

        if (count < 0)
        {
            // We need to wake -count readers.
            m_rdwset.add(-count);
        }

        // Release exclusive access
        m_wrlock.unlock();
    }
};
 
 
struct ct_simple_stack
{
    struct node
    {
        node* volatile m_next; // to suppress optimization
        unsigned int m_tid;

        node(unsigned int tid) : m_tid(tid) {}
    };

    node* m_head;

    ct_simple_stack() : m_head(nullptr) {}

    ~ct_simple_stack()
    {
        ct_simple_stack::node* n = flush();

        while (n)
        {
            ct_simple_stack::node* next = n->m_next;
            delete n;
            n = next;
        }
    }

    void push(node* n)
    {
        n->m_next = m_head;
        m_head = n;
    }

    node* pop()
    {
        node* n = m_head;

        if (n)
        {
            m_head = n->m_next;
        }

        return n;
    }

    node* flush()
    {
        node* n = m_head;
        m_head = nullptr;
        return n;
    }
};
 
 
struct ct_shared
{
    std::atomic<bool> m_run;

    // protected by m_std_rwmutex
    std::uint64_t m_reads;
    std::uint64_t m_writes;

    ct_simple_stack m_stack;
    ct_simple_stack m_stack_dtor;

#if defined (CT_TEST_FAST_MUTEX)
    ct_rwmutex m_std_rwmutex;
#else
    std::shared_mutex m_std_rwmutex;
#endif