- new benchmark for my read/write algorithm... - 17 Updates
- an experimental read/write mutex... - 1 Update
- neos - 7 Updates
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:51PM -0800 On 2/25/2019 10:25 PM, Bonita Montero wrote: >> CAS will _always_ fail if the comparand is different than >> the destination at the point of the atomic operation. > Thank you for this insigt! I never would have known this otherwise! Sorry for that. Well, I really do like your algorihtm. It is a very nice piece of work. Thank you for sharing it. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:52PM -0800 On 2/25/2019 10:23 PM, Bonita Montero wrote: >> Writer A spins on CAS because it failed! >> Reader B releases read access. > Impossible that reader B has read-access such a little time. Tell that to a general purpose read/writer lock... ;^) > be done. > If there was really meaningful work in reader B the locking > -interval was at least that long that writer A gets to sleep. Regardless, I like your algorihtm. Is has an elegance about it. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:57PM -0800 On 2/26/2019 8:52 PM, Chris M. Thomasson wrote: >> If there was really meaningful work in reader B the locking >> -interval was at least that long that writer A gets to sleep. > Regardless, I like your algorihtm. Is has an elegance about it. Just wondering if you are interested in a version of your algorihtm that is 100% pure c++11, and using the weakest memory order? The semaphore and event would be emulated in c++11 using mutex/condvar. With that in mind, it is going to "reduce" performance wrt raw Windows primitives. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:58PM -0800 On 2/26/2019 8:51 PM, Chris M. Thomasson wrote: >> Thank you for this insigt! I never would have known this otherwise! > Sorry for that. Well, I really do like your algorihtm. It is a very nice > piece of work. Thank you for sharing it. The performance is nice. Not bad at all. |
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 08:58AM +0100

>> Regardless, I like your algorithm. It has an elegance about it.
> Just wondering if you are interested in a version of your algorithm
> that is 100% pure C++11, ...

How should that be possible? C++-whatever has no kernel synchronization facilities like events or semaphores.

> The semaphore and event would be emulated in C++11 using mutex/condvar.

That would be inefficient. It would be better to rewrite the code for POSIX with POSIX semaphores.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 11:25AM +0100

>> The semaphore and event would be emulated in C++11 using mutex/condvar.
> That would be inefficient.

I reconsidered that: it wouldn't be that inefficient, as the mutex and the condvar would be used only in contention cases, and when that happens there are kernel calls which are a lot slower than the additional overhead over raw semaphores (and possibly events). But I think it's more elegant, because more minimalistic, than realizing it with the system facilities and having more than one implementation of the rw-mutex.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 12:59PM +0200

On 27.02.2019 9:58, Bonita Montero wrote:
>> that is 100% pure C++11, ...
> How should that be possible? C++-whatever has no kernel
> synchronization facilities like events or semaphores.

By this argument a C++ program would also not be able to allocate memory or open files, as these are also kernel facilities.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 01:06PM +0100

>> synchronization facilities like events or semaphores.
> By this argument a C++ program would also not be able to allocate memory
> or open files as these are also kernel facilities.

You are unable to read. I only said that C++ has "no kernel synchronization facilities like events or semaphores". Or please tell me what the classes for semaphores and events are.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 04:29PM +0200

On 27.02.2019 14:06, Bonita Montero wrote:
> You are unable to read. I only said that C++ has "no kernel
> synchronization facilities like events or semaphores".
> Or please tell me what the classes for semaphores and events are.

C++11 contains everything needed for efficient multithreaded programming. I trust the committee in that if semaphores and events are not included, then they are not needed for that purpose.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 03:35PM +0100

> C++11 contains everything needed for efficient multithreaded
> programming.

I didn't say the opposite; what C++11 supplies is sufficient in most cases. But if you try to implement a rw-mutex, you can't do it as efficiently as with native events and semaphores.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 05:19PM +0200

On 27.02.2019 16:35, Bonita Montero wrote:
> I didn't say the opposite; what C++11 supplies is sufficient in most
> cases. But if you try to implement a rw-mutex, you can't do it as
> efficiently as with native events and semaphores.

OK, fair enough. I guess that's why std::shared_mutex was added in C++17.
scott@slp53.sl.home (Scott Lurndal): Feb 27 03:25PM

> C++11 contains everything needed for efficient multithreaded
> programming. I trust the committee in that if semaphores and events are
> not included then they are not needed for that purpose.

Over the last three decades, I've found that mutexes and condition variables have been sufficient for everything from mainframe operating systems (which had, in 1985, mutex and condition variables built right into the instruction set) to modern pthread apps.

http://vseries.lurndal.org/doku.php?id=instructions:lok
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 01:02PM -0800 >> be performed in this test at 60 seconds. The number of threads is >> variable and is determined by std::hardware_concurrency * THREADS, set >> at 8 in the test. So on my system the setup is: [snip code] > * GCC -Ofast -march=native -lstdc++ * > ************************************* > Testing: Chris M. Thomasson's Experimental Read/Write Mutex [...] Thank you! Sorry to ask, but can you try running version 0.1? Try reloading the page at: https://pastebin.com/raw/1QtPCGhV The newest version 0.1 should have: Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex Testing Version 0.1: std::shared_mutex Fwiw, here is the code: ____________________________________ /* Crude Read/Write Mutex Benchmark This tests my algorithm ct_rwmutex vs. std::shared_mutex. It shows how many reads and writes can be performed within a fixed amount of time. by: Chris M. Thomasson version 0.1 __________________________________________*/ #include <thread> #include <atomic> #include <shared_mutex> #include <condition_variable> #include <iostream> #include <functional> #include <chrono> #include <cassert> #include <cstdlib> #include <ctime> #include <climits> #include <cstdint> #include <vector> // undefine to test std::shared_mutex #define CT_TEST_FAST_MUTEX 1 #define THREADS 8 // multiplied by std::hardware_concurrency #define NODE_PRIME 1000000 // number of nodes #define CRUDE_CACHE_PAD 256 // crude cache pad #define TEST_DURATION_SEC 60 // number of seconds for the test // bare bones mutex/condvar based semaphore struct ct_slow_semaphore { unsigned long m_state; std::mutex m_mutex; std::condition_variable m_cond; ct_slow_semaphore(unsigned long state) : m_state(state) {} void inc() { m_mutex.lock(); ++m_state; m_mutex.unlock(); m_cond.notify_one(); } void add(unsigned long addend) { { std::unique_lock<std::mutex> lock(m_mutex); m_state += addend; } m_cond.notify_all(); } void dec() { 
std::unique_lock<std::mutex> lock(m_mutex); while (m_state == 0) m_cond.wait(lock); --m_state; } }; // bin-sema struct ct_auto_reset_event { bool m_state; std::mutex m_mutex; std::condition_variable m_cond; ct_auto_reset_event() : m_state(false) {} void signal() { std::unique_lock<std::mutex> lock(m_mutex); m_state = true; m_cond.notify_one(); } void wait() { std::unique_lock<std::mutex> lock(m_mutex); while (m_state == false) m_cond.wait(lock); m_state = false; // auto-reset } }; // just a layer over an auto-reset event struct ct_fast_mutex { std::atomic<unsigned int> m_state; ct_auto_reset_event m_waitset; ct_fast_mutex() : m_state(0) {} void lock() { if (m_state.exchange(1, std::memory_order_acquire)) { while (m_state.exchange(2, std::memory_order_acquire)) { m_waitset.wait(); } } } void unlock() { if (m_state.exchange(0, std::memory_order_release) == 2) { m_waitset.signal(); } } }; // Chris M. Thomassons Experimental Read/Write Mutex // Yeah, it is pretty damn fat wrt the state, however // it has some interesting properties... // The state can be compressed a bit... // btw, it has no loops... // Take a look at the lock_shared and unlock_shared functions // The number of readers _must_ never exceed LONG_MAX! #define RWMUTEX_COUNT_MAX LONG_MAX struct ct_rwmutex { unsigned char m_crude_cache_pad_0[CRUDE_CACHE_PAD]; // shared state std::atomic<long> m_count; unsigned char m_crude_cache_pad_1[CRUDE_CACHE_PAD]; std::atomic<long> m_rdwake; unsigned char m_crude_cache_pad_2[CRUDE_CACHE_PAD]; ct_slow_semaphore m_rdwset; unsigned char m_crude_cache_pad_3[CRUDE_CACHE_PAD]; ct_slow_semaphore m_wrwset; unsigned char m_crude_cache_pad_4[CRUDE_CACHE_PAD]; ct_fast_mutex m_wrlock; unsigned char m_crude_cache_pad_5[CRUDE_CACHE_PAD]; ct_rwmutex() : m_count(RWMUTEX_COUNT_MAX), m_rdwake(0), m_rdwset(0), m_wrwset(0) { } // READ, pretty slim... void lock_shared() { if (m_count.fetch_add(-1, std::memory_order_acquire) < 1) { // We need to wait for a writer. 
m_rdwset.dec(); } } void unlock_shared() { if (m_count.fetch_add(1, std::memory_order_release) < 0) { // There is a writer if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1) { // We need to wake the writer up m_wrwset.inc(); } } } // WRITE, more hefty void lock() { // Acquire exclusive access m_wrlock.lock(); // we are the only thread in here now. // Gain write access wrt m_count long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, std::memory_order_acquire); // count can never be negative. if (count < RWMUTEX_COUNT_MAX) { // We detected readers. long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count, std::memory_order_acquire); if (rdwake + RWMUTEX_COUNT_MAX - count) { // Okay, we need to wait for all of the readers // The number of readers is actually // RWMUTEX_COUNT_MAX - count m_wrwset.dec(); } } } // write unlock void unlock() { // Release write access wrt m_count long count = m_count.fetch_add(RWMUTEX_COUNT_MAX, std::memory_order_release); if (count < 0) { // We need to wake -count readers. m_rdwset.add(-count); } // Release exclusive access m_wrlock.unlock(); } }; struct ct_simple_stack { struct node { node* volatile m_next; // to suppress optimization unsigned int m_tid; node(unsigned int tid) : m_tid(tid) {} }; node* m_head; ct_simple_stack() : m_head(nullptr) {} ~ct_simple_stack() { ct_simple_stack::node* n = flush(); while (n) { ct_simple_stack::node* next = n->m_next; delete n; n = next; } } void push(node* n) { n->m_next = m_head; m_head = n; } node* pop() { node* n = m_head; if (n) { m_head = n->m_next; } return n; } node* flush() { node* n = m_head; m_head = nullptr; return n; } }; struct ct_shared { std::atomic<bool> m_run; // protected by m_std_rwmutex std::uint64_t m_reads; std::uint64_t m_writes; ct_simple_stack m_stack; ct_simple_stack m_stack_dtor; #if defined (CT_TEST_FAST_MUTEX) ct_rwmutex m_std_rwmutex; #else std::shared_mutex m_std_rwmutex;