Thursday, February 28, 2019

Digest for comp.lang.c++@googlegroups.com - 8 updates in 3 topics

Ike Naar <ike@sdf.lonestar.org>: Feb 28 11:04PM

> printf("value : %d \n",n);
> return n;
> }
 
What's the use of having a for loop when it always returns
during the first iteration?
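The quoted fragment can be reconstructed as below (the loop header and function names are hypothetical, since only the body survived in the quote): the unconditional return ends the function during iteration zero, so the loop is equivalent to straight-line code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical reconstruction of the quoted fragment: only the printf
// and return appeared in the quote, so the loop bounds are made up.
int f_loop(int n) {
    for (int i = 0; i < 10; ++i) {
        std::printf("value : %d \n", n);
        return n; // always taken on the first iteration; i never reaches 1
    }
    return -1; // unreachable: the body returns before a second iteration
}

// What the function reduces to once the dead loop is stripped away.
int f_plain(int n) {
    std::printf("value : %d \n", n);
    return n;
}
```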
"Öö Tiib" <ootiib@hot.ee>: Feb 28 07:47AM -0800

> > with a considerably reduced line count compared with a normal Ada
> > compiler, then that would be impressive.
 
> The implementation time / line count saving stems from the fact that you only have to implement a (generic) semantic concept once which can then be used across multiple languages. So overall there wouldn't be a saving if only one language was implemented: the saving stems from the fact that multiple languages can be described by a combination of an RJSON file and semantic concepts implemented in C++ in concept plugins. When I ship 1.0 the predefined semantic concepts should be quite rich due to me tackling Ada first.
 
Although from the uppermost concept (input)->(processing)->(output)
all use cases of all programs are the same, the details are always different.
 
Let's say for example that a C preprocessor takes about 3.5 months to
write. Let's say Ada has about 13 times more details in it. If we assume
that tinkering with every detail takes about the same time on average,
then an Ada translator takes about 4.5 man-years to write, based on the
amount of details. Actually it is not so linear, since the details
interfere with each other and complicate things: the more detailed and
complex the whole thing is, the worse it gets. But let's ignore that.
 
Now let's imagine that we have already written Ada translator. Can we
reuse much of it to produce a C preprocessor somehow faster than
with 3.5 months? I doubt that since the details are still there and are
different than those with Ada.
Bart <bc@freeuk.com>: Feb 28 06:39PM

On 28/02/2019 15:47, Öö Tiib wrote:
> reuse much of it to produce a C preprocessor somehow faster than
> with 3.5 months? I doubt that since the details are still there and are
> different than those with Ada.
 
As described the project doesn't sound viable, not if the only input for
each new language is a data file.
 
It needs to be code. In other words, you have to write a new compiler
front-end for each language. Perhaps some elements such as lexers can be
reused, or driven from grammar descriptions, but as you say something
like C is going to be different from any other language because of the
peculiar way its preprocessor works. And every language will have its
own quirks.
 
The OP mentioned a semantic module in a way that suggested code plug-ins
for each language. That's another way of saying you need a separate
compiler for each.
 
Maybe all these disparate compilers can be made to fit into a common
framework, but then that doesn't sound much different from using a
makefile to build mixed-language projects.
 
The advantage of a universal compiler, if it tries to do anything
cleverer than just invoking a dedicated compiler for each language, is
then less obvious.
 
But then we get to the fact that all these languages will have the same
target, some VM; it would need to be a VM that can cope with both
highly dynamic languages and very static ones.
 
Or perhaps (I don't know how it works) it's a more static VM, for which
certain dynamic languages will require an extra layer of code that does
the job of an interpreter.
 
So it might be that there will still be a need for compilers and
interpreters, but in some different form.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 28 09:41PM +0200

On 28.02.2019 20:39, Bart wrote:
>> different than those with Ada.
 
> As described the project doesn't sound viable, not if the only input for
> each new language is a data file.
 
I think you all have misinterpreted Mr. Flibble's intent. He said his
universal compiler will support Ada, Python, C, etc. How everybody
interpreted this was that what is supported is the union of all these
languages, so that one would get the hard real-time multithreading from
Ada, all fancy deep learning stuff like NumPy+tensorflow from Python, etc.
 
I guess what he really had in mind was the *intersection* of languages,
so you can write a program for adding 2+2 in Ada, Python, C, etc, and it
would work the same.
Bart <bc@freeuk.com>: Feb 28 09:54PM

On 28/02/2019 19:41, Paavo Helde wrote:
 
> I guess what he really had in mind was the *intersection* of languages,
> so you can write a program for adding 2+2 in Ada, Python, C, etc, and it
> would work the same.
 
No, I assumed that the universal compiler would work with either
language X or Y or Z, always one at a time, with the output always being
the same VM language.
 
I think I did ask about whether mixed source was allowed (one source
file containing a mix of languages, presumably requiring special
directives to tell you what was what), and I also asked about
interfacing between language X and language Y.
 
However, the whole thing is still a bit of a mystery.
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 28 10:09PM

On 28/02/2019 21:54, Bart wrote:
 
> No, I assumed that the universal compiler would work with either language
> X or Y or Z, always one at a time, with the output always being the same
> VM language.
 
Correct, although theoretically my universal compiler can handle
translation units written in different languages, much like a traditional
linker can link object files compiled from different languages.
 
> containing a mix of languages, presumably requiring special directives to
> tell you what was what), and I also asked about interfacing between
> language X and language Y.
 
No you cannot mix languages within the same translation unit but see above.
 
 
> However, the whole thing is still a bit of a mystery.
 
I like a good mystery.
 
/Flibble
 
--
"You won't burn in hell. But be nice anyway." – Ricky Gervais
 
"I see Atheists are fighting and killing each other again, over who
doesn't believe in any God the most. Oh, no..wait.. that never happens." –
Ricky Gervais
 
"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Byrne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 28 10:31PM

On 28/02/2019 22:09, Mr Flibble wrote:
>> directives to tell you what was what), and I also asked about
>> interfacing between language X and language Y.
 
> No you cannot mix languages within the same translation unit but see above.
 
Actually I might allow files with a .mix file extension. A .mix file would
allow individual translation units to be specified within the same source
file using special directives as you say. This would be a nice feature but
not essential.
 
/Flibble
 
Tim Rentsch <tr.17687@z991.linuxsc.com>: Feb 28 11:55AM -0800

> Under Cygwin, which uses a different library implementation, the
> problem doesn't occur with g++ (though it does occur with clang++).
 
> I'll submit a bug report.
 
I think the problem is not in the library headers but in the
translators themselves. After doing a bit of detective work,
I found that these commands
 
g++ -x c++ -std=c++11 -pedantic -U_GNU_SOURCE m-pi.c -o g++-m-pi
clang-5.0 -x c++ -std=c++11 -pedantic -U_GNU_SOURCE m-pi.c -o clang-m-pi
 
will compile the source without complaint, and produce
executables that behave as expected, but if the -U_GNU_SOURCE
option is left off then there are compilation errors, like what
you reported. (Which version of C++ standard is used seems not
to matter. The environment is a ubuntu 16.04 distribution.)
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

Wednesday, February 27, 2019

Digest for comp.lang.c++@googlegroups.com - 11 updates in 2 topics

"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:28PM -0800

On 2/20/2019 2:33 PM, Melzzzzz wrote:
>>> Who would do this in year 2019? Recompile in 64-bit and measure again!
 
>> Just tested with a x64 build, and here are the results wrt version 0.1:
 
>> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
[...]
>> Nice! Humm... Iirc, unsigned long on Windows is only 32-bit?
 
> Yep. And long double is 64 bits even if hardware provides 80 bits...
 
Fwiw, on Intel, I want fetch_add to use LOCK XADD, this is very
important. Let's check out a little test on GodBolt:
 
https://godbolt.org/z/yruhAL // using long
 
https://godbolt.org/z/NKYDeI // using short
 
Wrt my ct_rwmutex, it better use LOCK XADD no matter what damn it.
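A minimal sketch of what the GodBolt links exercise: std::atomic<long>::fetch_add, which on x86 typically compiles to LOCK XADD (a codegen observation, not something the standard guarantees). The function name is made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// fetch_add atomically adds and returns the value held *before* the add.
long demo_fetch_add() {
    std::atomic<long> counter{0};
    long prev = counter.fetch_add(1);  // prev is 0, counter is now 1
    prev = counter.fetch_add(41);      // prev is 1, counter is now 42
    assert(prev == 1);
    return counter.load();
}
```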
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:01PM -0800

On 2/20/2019 10:03 PM, Chris M. Thomasson wrote:
>     }
 
>     return cmp;
> }
[...]
 
Notice how I am using compare_exchange_strong? I think (this thought is
before testing) that your algorithm should work fine with
compare_exchange_weak. The weak version does not impose some rather
hefty requirements on the CAS, and it is more "friendly" on systems that
use LL/SC to emulate CAS.
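A sketch of the kind of retry loop where compare_exchange_weak is the natural fit: a spurious failure just causes another pass, which is cheap on LL/SC hardware. The function name and the doubling operation are invented for illustration.

```cpp
#include <atomic>
#include <cassert>

// Classic CAS loop: on failure (real or spurious), compare_exchange_weak
// reloads `expected` with the current value and we simply retry.
int atomic_double(std::atomic<int>& x) {
    int expected = x.load();
    while (!x.compare_exchange_weak(expected, expected * 2)) {
        // expected now holds the freshly observed value; loop retries.
    }
    return expected * 2; // the value this thread installed
}
```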
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 03:42PM -0800

On 2/20/2019 10:03 PM, Chris M. Thomasson wrote:
>> This is similar shared_mutex, but just for Win32 only.
>> My shared mutex gives shared readers always priority over
>> writers (don't know what std::shared_mutex does instead).
 
Our algorithms are getting similar reads, but mine and std::shared_mutex
get many more writes. This is due to your explicit writer starvation,
aka reader priority over all else. Also, your looping on CAS can give my
loopless algorithm an advantage wrt more intensive reads over longer
periods of time. The code at the end of this post uses your raw
algorithm posted as-is; I included windows.h and intrin.h.

It tests your work against std::shared_mutex, and should compile with
MSVC 2017. Have not tested it on GCC yet. I can port your algorithm to
pure C++11, but then it would have to use an event and semaphore created
with mutex/condvar, and not Windows primitives directly.
 
Keep in mind that my work is using 100% pure C++11, and yours is using
windows specific primitives.
 
Afaict, your work is faster than std::shared_mutex on my end, but
consistently suffers from writer starvation. This can be an issue. Well,
fwiw, here is code for a test of your algorithm vs std::shared_mutex; it
should compile for you wrt windows:
 
_________________________________________________
 
/* Crude Read/Write Mutex Benchmark
 
This tests Bonita's algorithm SharedMutex vs. std::shared_mutex.
 
It shows how many reads and writes can be performed within
a fixed amount of time.
 
by: Chris M. Thomasson
 
version 0.1
__________________________________________*/
 
 
 
#include <thread>
#include <atomic>
#include <shared_mutex>
#include <condition_variable>
#include <iostream>
#include <functional>
#include <chrono>
#include <cassert>
#include <cstdlib>
#include <ctime>
#include <climits>
#include <cstdint>
#include <vector>
 
 
// undefine to test std::shared_mutex
#define CT_TEST_FAST_MUTEX 1
 
 
#define THREADS 8 // multiplied by std::hardware_concurrency
#define NODE_PRIME 1000000 // number of nodes
#define CRUDE_CACHE_PAD 256 // crude cache pad
#define TEST_DURATION_SEC 60 // number of seconds for the test
 
 
 
 
 
// Bonita work start
#define WIN32_LEAN_AND_MEAN
 
 
// Bonita Montero Read/Write Mutex
// Interface modified for std::shared_mutex compatibility.
#include <Windows.h>
#include <intrin.h>
#include <cassert>
 
class SharedMutex
{
public:
SharedMutex();
~SharedMutex();
void lock();
void unlock();
void lock_shared();
void unlock_shared();
 
private:
volatile LONG m_lRWCounters; // lower 16 bits: writers, including
                             // the one that owns the lock;
                             // bits 16-30: count of threads having
                             // read-access or waiting for read-access;
                             // sign bit: readers active
volatile DWORD m_dwOwningThreadId;
volatile DWORD m_dwRecursionCount;
HANDLE m_hEvtReleaseWriter;
HANDLE m_hSemReleaseReaders;
};
 
 
SharedMutex::SharedMutex()
{
m_lRWCounters = 0;
m_dwOwningThreadId = 0;
m_dwRecursionCount = 0;
m_hEvtReleaseWriter = CreateEvent(NULL, FALSE, FALSE, NULL);
m_hSemReleaseReaders = CreateSemaphore(NULL, 0, 0x7FFFFFFF, NULL);
}
 
SharedMutex::~SharedMutex()
{
assert(m_dwOwningThreadId == 0);
assert(m_dwRecursionCount == 0);
 
CloseHandle(m_hEvtReleaseWriter);
CloseHandle(m_hSemReleaseReaders);
}
 
void SharedMutex::lock()
{
DWORD dwCurrentThreadId;
 
if (m_dwOwningThreadId == (dwCurrentThreadId = GetCurrentThreadId()))
{
m_dwRecursionCount++;
return;
}
 
LONG lRWCounters,
lOldRWCounters;
 
for (lRWCounters = m_lRWCounters; ; lRWCounters = lOldRWCounters)
{
assert((lRWCounters & 0x0FFFF) != 0x0FFFF);
 
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, lRWCounters + 1,
lRWCounters)) == lRWCounters)
if (lOldRWCounters == 0)
{
m_dwOwningThreadId = dwCurrentThreadId;
return;
}
else
break;
 
}
 
WaitForSingleObject(m_hEvtReleaseWriter, INFINITE);
m_dwOwningThreadId = dwCurrentThreadId;
}
 
void SharedMutex::unlock()
{
DWORD dwRecursionCount;
 
assert(m_dwOwningThreadId == GetCurrentThreadId());
 
if ((dwRecursionCount = m_dwRecursionCount) != 0)
{
m_dwRecursionCount = dwRecursionCount - 1;
return;
}
 
m_dwOwningThreadId = 0;
 
LONG lRWCounters,
lOldRWCounters;
 
// sharers have priority over writers
 
for (lRWCounters = m_lRWCounters; ; lRWCounters = lOldRWCounters)
{
assert(lRWCounters > 0);
 
if ((lRWCounters & 0x7FFF0000))
{
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters - 1) |
(LONG)0x80000000u, lRWCounters)) == lRWCounters)
{
ReleaseSemaphore(m_hSemReleaseReaders, (lRWCounters &
0x7FFF0000) >> 16, NULL);
return;
}
}
else
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, lRWCounters - 1,
lRWCounters)) == lRWCounters)
{
if ((lOldRWCounters & 0x0FFFF) > 1)
SetEvent(m_hEvtReleaseWriter);
return;
}
}
}
 
void SharedMutex::lock_shared()
{
LONG lRWCounters,
lOldRWCounters;
 
for (lRWCounters = m_lRWCounters; ; lRWCounters = lOldRWCounters)
{
assert((lRWCounters & 0x7FFF0000) != 0x7FFF0000);
 
// even shared access can be recursive, but in this case the
// sharer-count will increment
 
if (lRWCounters <= 0)
{
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters + 0x10000) |
(LONG)0x80000000u, lRWCounters)) == lRWCounters)
return;
}
else
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters + 0x10000),
lRWCounters)) == lRWCounters)
{
WaitForSingleObject(m_hSemReleaseReaders, INFINITE);
return;
}
}
}
 
void SharedMutex::unlock_shared()
{
LONG lRWCounters,
lOldRWCounters;
 
for (lRWCounters = m_lRWCounters; ; lRWCounters = lOldRWCounters)
{
assert(lRWCounters < 0 && (lRWCounters & 0x7FFF0000) != 0);
 
if ((lRWCounters & 0x7FFF0000) == 0x10000)
{
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, (lRWCounters - 0x10000) &
(LONG)0x7FFFFFFF, lRWCounters)) == lRWCounters)
{
if ((lRWCounters & 0x0FFFF) != 0)
SetEvent(m_hEvtReleaseWriter);
return;
}
}
else
if ((lOldRWCounters =
_InterlockedCompareExchange(&m_lRWCounters, lRWCounters - 0x10000,
lRWCounters)) == lRWCounters)
return;
}
}
// Bonita work end
 
 
 
 
 
struct ct_simple_stack
{
struct node
{
node* volatile m_next; // to suppress optimization
unsigned int m_tid;
 
node(unsigned int tid) : m_tid(tid) {}
};
 
node* m_head;
 
ct_simple_stack() : m_head(nullptr) {}
 
~ct_simple_stack()
{
ct_simple_stack::node* n = flush();
 
while (n)
{
ct_simple_stack::node* next = n->m_next;
delete n;
n = next;
}
}
 
void push(node* n)
{
n->m_next = m_head;
m_head = n;
}
 
node* pop()
{
node* n = m_head;
 
if (n)
{
m_head = n->m_next;
}
 
return n;
}
 
 
node* flush()
{
node* n = m_head;
m_head = nullptr;
return n;
}
 
};
 
 
struct ct_shared
{
std::atomic<bool> m_run;
 
// protected by m_std_rwmutex
std::uint64_t m_reads;
std::uint64_t m_writes;
 
ct_simple_stack m_stack;
ct_simple_stack m_stack_dtor;
 
#if defined (CT_TEST_FAST_MUTEX)
//ct_rwmutex m_std_rwmutex;
SharedMutex m_std_rwmutex;
#else
std::shared_mutex m_std_rwmutex;

Digest for comp.lang.c++@googlegroups.com - 25 updates in 2 topics

Paavo Helde <myfirstname@osa.pri.ee>: Feb 21 11:56AM +0200

On 21.02.2019 11:24, Öö Tiib wrote:
 
> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> usage. The patch is up within a week.
 
> b) Without it 0.2% runs result with incorrect answers.
 
This does not follow. As pointed out by others, the intermediate
overflows can still produce correct results when balanced with opposite
overflows or cast to unsigned.
 
Based on my limited experience, over half of -ftrapv crashes would be
false alarms. Depends on the codebase, of course.
 
Now it would be up to the management to decide whether the problems
caused by false alarms on customer sites justify the earlier catching of
non-false alarms.
Manfred <noname@add.invalid>: Feb 19 09:55PM +0100

On 2/19/2019 8:57 PM, Bart wrote:
>> As a confirmation of earlier arguments, here -1294967296 as a result
>> falls into the category of 'rubbish', 3000000000 does not.
 
> So just like 1u-2u, but that one apparently is not classed as rubbish.
 
1u-2u is rubbish in conventional arithmetic, but it is not in wrapping
arithmetic.
Any programmer knows that the result is 0xFFFFFFFF, and this is what C
guarantees.
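The guarantee can be checked at compile time. A minimal sketch; the concrete constant 0xFFFFFFFF assumes a 32-bit unsigned int, which the first assertion pins down rather than takes on faith.

```cpp
#include <cassert>

// Unsigned arithmetic is defined to wrap modulo 2^N, so 1u - 2u is a
// well-defined expression with a well-defined value.
static_assert(sizeof(unsigned) == 4, "example assumes 32-bit unsigned int");
static_assert(1u - 2u == 0xFFFFFFFFu, "1u - 2u wraps to UINT_MAX");

// No analogous assertion can legally be written for signed overflow:
// evaluating INT_MAX + 1 is undefined behaviour, not a wrapped value.
```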
 
>> clear value for unsigned ones (see earlier example).
 
> After 40 years of experiencing exactly that behaviour on a fair number
> of machines and languages, it's what I'm come to expect.
 
I'll rephrase: C requires, as for most program behaviours, that
wrapping be intentional for the programmer to use.
Since an intentional use of wrapping implies that all bits follow the
same semantics, the unsigned type is the correct choice.
 
Using wrapping arithmetic with a signed type is not correct, purely
from a binary-logic point of view, because that first bit would be
arbitrarily supposed to behave like the others. Thus C assumes that a
careful programmer will not use it with signed types, and leaves
compiler writers the freedom to use this feature space however they
like, by specifying undefined behavior.
 
I would guess that since this has been clearly and intentionally
specified in the standard, at least /some/ use for this 'undefinedness'
must have been foreseen by the committee.
Manfred <noname@add.invalid>: Feb 19 04:35PM +0100

On 2/19/2019 4:01 PM, Bart wrote:
> example. How is it treating the top bit? It doesn't matter. The x86 will
> set flags both for an unsigned add, and a signed one. The resulting bit
> pattern is the same in both cases.
 
I know, but in the case of 'signed' It Happens to Work™, under the
assumption of an underlying two's complement representation.
In the case of 'unsigned' It Just Works™, guaranteed by the standard and
with no assumptions required.
When dealing with any kind of workable logic (math, algorithms, etc.)
the possibility of dropping an assumption is an added value.
 
 
> C however (and presumably C++ inherits) says that the result is
> undefined for the second 'add' operation, even it is always well-defined
> for x86.
 
You know that C as a language and x86 as an architecture are two
different things.
 
 
> No signed arithmetic is going on. But if you interpret the 4294967293
> result as two's complement, with the top bit as a sign bit, then you
> will get -3 if printed.
 
This too is based on the assumption of two's complement.
 
 
> Similarly, the c+d in the first example gives -1294967296, but
> 3000000000 if the same bit pattern is interpreted as an unsigned value.
 
As a confirmation of earlier arguments, here -1294967296 as a result
falls into the category of 'rubbish', 3000000000 does not.
 
 
> The two representations are closely related, in that the corresponding
> bit patterns are interchangeable, but C refuses to acknowledge that.
 
The two representations are interchangeable under the assumption of
two's complement. C has made the choice of lifting this constraint.
 
 
> That might be because two's complement representation is not universal,
> but it means some C compiler ending up doing unexpected things even on
> the 99.99% of machines which do use it.
 
I wouldn't look at it this way. I believe the rationale is that
wrapping overflow has negligible use for signed types, while it has
clear value for unsigned ones (see earlier example).
 
More than the compiler doing unexpected things, it is that C requires
the programmer to pay attention to details.
It is not distracted-friendly.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 19 10:31PM +0200

On 19.02.2019 21:57, Bart wrote:
> upper limit of +127. But what about (120+20)-20?
 
> This should end up back as 120 using two's complement, but according to
> C, it's undefined because an intermediate value overflows.
 
And rightly so. Try e.g. (120+20)/2 - 20; with those imaginary 8-bit
wrap-around rules the result would be -78, not the expected 50.
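Paavo's arithmetic can be replayed with explicit conversions. A sketch: in C++ the literal arithmetic itself is done in int, so each step is forced back into 8 bits by hand (well-defined modulo 2^8 since C++20, implementation-defined but two's complement in practice before that).

```cpp
#include <cassert>
#include <cstdint>

// (120 + 20) / 2 - 20 under imaginary 8-bit wrap-around rules.
int wrap8_example() {
    std::int8_t a = static_cast<std::int8_t>(120 + 20); // 140 wraps to -116
    std::int8_t b = static_cast<std::int8_t>(a / 2);    // -58
    std::int8_t c = static_cast<std::int8_t>(b - 20);   // -78, not the 50
                                                        // plain math gives
    return c;
}
```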
 
In C++ 8-bit ints get promoted to normal ints, so the above is not a
real example, but there are actual problems with larger numbers.
Effectively you are saying that some sloppy code is OK because it
accidentally happens to work most of the time, whereas some other very
similar sloppy code is not OK.
 
By the same token one could claim that writing to a location after a
dynamic array of an odd number of elements is OK, as the memory allocator
would leave an unused alignment gap at that place anyway.
David Brown <david.brown@hesbynett.no>: Feb 20 12:26PM +0100

On 19/02/2019 17:12, Öö Tiib wrote:
 
> Note that "they only work on benchmark" is not quote of mine. I can't
> defend positions of straw-men that you build. ;-) Will you tell next
> that I wrote somewhere that "I do not care about performance"?
 
I did not mean to imply that you used those words - I meant it as a
general sort of remark that I have seen made many times. Some people
are convinced that compiler makers are concerned only about the speed of
benchmarks, not how their compiler works for other code.
 
> and the likes of "Meltdown and Spectre" are not result of overeager
> optimizing of those and even when it seems to be so then nothing of it
> is measured with benchmarks compiled on those compilers?
 
Meltdown and Spectre are both /hardware/ problems. Can we at least
agree on that? Compilers have been able to add additional code to
reduce the effect of the problems - but this is /not/ because the
compilers were previously generating incorrect or suspect code.
 
(And while it's easy to see in hindsight that Intel should have been
more careful with speculation across privilege levels, it was harder to
see that in advance. There is no way to make a system where code from
multiple sources can run and share resources, without information
leakage at least in theory.)
 
 
>> happy with the tools - they don't care if you use a different compiler
>> instead.
 
> The programmers they target are not overly concerned with performance?
 
Of course performance is important to many people - compiler writers and
compiler users included. Performance on /benchmarks/ is of smaller
concern, especially of artificial benchmarks. Programmers want /their/
programs to run fast.
 
> My impression is that programmers start to post about performance before
> figuring out how to turn the optimizations on.
 
It is often the case that people who understand less post more. (That
is in no way a comment about anyone posting here, in case you feel I was
trying to be insulting.)
 
>> care about benchmarks - they care about performance on their own software.
 
> They do not use compilers to build their software and so do not care about
> compiler optimizations?
 
I really don't understand your point. I said they /do/ care about
performance - and therefore also about compiler optimisations.
 
>> quality of their code, and making something worthwhile - they don't care
>> about benchmark performances.
 
> Again there are developers who don't care about performance?
 
 
I think you have been misunderstanding my point all along here. I
apologise if I have been unclear.
 
Yes, compiler writers care about performance and optimisations. I have
said this all along.
 
No, compiler writers are /not/ solely concerned about performance and
optimisations. It is merely one of many factors. The same applies to
people choosing compilers.
 
And it is definitely not the case that compiler writers care /primarily/
about the speed on artificial benchmark programs (like SPEC benchmarks).
They do not write their compilers with an aim of getting top marks on
these benchmarks. That does not mean they don't run them sometimes, and
compare figures (especially if there are regressions). It just means
that they do not use them as driving forces.
 
It is not the case that compiler writers are always looking for new ways
to take undefined behaviour in source code and turn it into a mess in
the object code. Rather, it /is/ the case that compiler writers have no
responsibility to try to generate "correct" results from incorrect
source code. They have /never/ had that responsibility. From the days
of the first C compilers 40+ years ago, signed integer overflow has been
undefined behaviour and compilers could assume it never happened if that
helped optimisation. Code that assumes wrapping behaviour is either
tied to a specific compiler (if the compiler documents that behaviour),
or it is /wrong/. As compilers have got more sophisticated, there have
been more occasions where incorrect code had predictable behaviour in
the past. That is why compiler writers add extra "this code has
mistakes, but worked with the behaviour of the old compiler" flags like
"-fwrapv" or "-fno-strict-aliasing".
 
 
 
> However it seems that there are only few weirdos like me who think that
> it does not matter how fast the wrong answers are calculated and consider
> it better when those wrong answers are not calculated at all.
 
/I/ don't care how fast wrong answers are calculated. I just don't
expect compilers to calculate right answers from wrong code. And I do
care how fast right answers are calculated from correct code.
 
>> C or C++ sources?
 
> What conspiracy theory? Where did I say that they disregard support for
> existing source code?
 
You wrote "Since compiler writers are people with extremely
benchmark-oriented head shape" and then suggested it is because that's
what their employers tell them to do - /that/ is the "conspiracy theory".
 
> Go, Swift, C# or Java. So indeed they might want to "extend" crappy
> "features" and "optimizations" into C++ that they won't ever use in
> and with their own code.
 
You think Microsoft would intentionally introduce broken features or
mistakes into their C++ compilers to push people towards using C# ?
 
> optimizations piled together. That can be tricky to get such a pile
> correct and if to prioritize correctness below performance then defects
> slip through. Same things do happen with compiler optimizations.
 
My understanding of the hardware problems here is that the operation was
perfectly correct - but there was information leakage. A certain degree
of information leakage is inevitable when there are shared resources,
but this situation could be exploited more conveniently than many others.
 
Compiler optimisations are a different game altogether. The main
similarity seems to be "they are both hard, so someone might make a
mistake".
 
> compiler options) but that might also change out of blue when
> a way to "optimize" it will be discovered, and then we need to
> add some -fplease-dont-remove-divide-by-zero-trap I suspect.
 
"-ftrapv" was somewhat poorly conceived and specified in the first place
(IMHO), and has only got worse. Originally compilers were relatively
straightforward - if your code said "x = y + z;", you'd get an "add"
assembler instruction. "-ftrapv" just meant adding a "trap if overflow
flag set" instruction after it. Then things got complicated - what
about "x = y * 5", that can be done with a single "lea" instruction (on
x86)? What about "x = y + z - y;" - do you simplify without trapping,
or do you detect overflows on the intermediary values? The original
idea was to have a cheap and simple way to spot overflow errors, but the
practice is a good deal more complicated, and subject to a lot of
opinions as to what kinds of checks should be done.
 
So if I wanted code to check for overflows, I'd do it specifically in
the situations where it was appropriate - using __builtin_add_overflow
and friends, or more manual checking (using C++ classes to keep it all
neat). And for debugging or testing, -fsanitize=undefined is a good choice.
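A sketch of the manual pre-checking David describes, phrased as a plain function rather than a full C++ operator class. checked_add is a made-up name (gcc's builtin counterpart is __builtin_add_overflow); the point is that the operands are tested before the add, so no signed overflow ever occurs.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Adds two ints, throwing instead of ever evaluating an overflowing sum.
int checked_add(int a, int b) {
    if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int>::min() - b)) {
        throw std::overflow_error("int addition would overflow");
    }
    return a + b; // safe: the range check above rules out overflow
}
```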
 
>> should be dropped in favour of the sanitizer, which is a more modern and
>> flexible alternative and which is actively maintained.
 
> Sanitizers sound like debugging options.
 
Yes.
 
> Why two almost equal features
> are developed into same tool?
 
-ftrapv is a debugging tool. It is not really appropriate for final
binaries - it is too unpredictable, and plays havoc with code
optimisation. In particular, expressions involving several parts are
either significantly slower as each bit needs to be handled separately,
or you don't get full overflow checking. (I have no idea where the
lines are drawn in practice in gcc.)
 
> and "unreliable" and then some third feature will be the "correct"
> way to make programs to crash on signed integer overflow. With
> feature creep after a while nothing is reliable.
 
There is /no/ "correct way to make programs crash on undefined
behaviour". There never has been, and never will be. The best you can
get are tools to help you find the bugs in your program when testing and
debugging. Maybe one day other tools will replace the sanitize options.
 
Options like "-ftrapv" have never been about changing the semantics of C
to give a defined behaviour to signed integer overflow. (This is
different from -fwrapv.) If you want to complain about poor or
misleading documentation in the gcc manual page, then I can agree with that.
 
C and C++ are languages which trust the programmer - they expect you to
write code that is correct and will not misbehave. They have little or
no checking unless you add it manually, which is how you can get
efficient code. But there is nothing hindering you from manual checks -
especially in C++ where you can make classes with operators to simplify
the process. (Yes, I know that means you can get overflow checking in
your own code, but can't easily add it to existing code.)
 
> How is it so self-evident that -ftrapv is unreasonable option?
> Correctness of behavior is more important than performance of behavior
> and incorrect behavior is often worse than no behavior whatsoever.
 
"-ftrapv" does not turn incorrect behaviour into correct behaviour. At
best, it helps you spot the incorrect behaviour so that you can fix the
program. (And that's a useful thing for testing and debugging.)
 
> dereference away, other optimizes null pointer check away (since it
> was "after" dereference) and result is that neither dereference does
> crash nor check work.
 
You can be sure it works by writing the code correctly. The "null
pointer" incident you are alluding to was a serious bug in the Linux
source code - changes to the compiler optimisation affected the
consequences of the bug, but did not introduce the bug. Compilers
assume that the programmer obeys the rules of the language - including
the "thou shalt not dereference a null pointer" rule.
David Brown <david.brown@hesbynett.no>: Feb 21 10:58PM +0100

On 21/02/2019 18:35, Öö Tiib wrote:
 
>> In what world are these checks "free" ?
 
> Free in sense that I do not have to write and to use classes that do
> same thing and likely even less efficiently than those two options.
 
Classes could be more flexible and more efficient (certainly more
efficient than the -ftrapv code), and give you accurate control of when
you use overflow detection instead of an "all or nothing" choice. But I
agree that it takes effort to make and use them.
 
 
> I have repeatedly expressed my lack of knowledge about any beings in
> this universe who can write correct programs. Only "programmers" who
> do not write incorrect programs are those who do not write programs.
 
It is entirely possible to eliminate large classes of bugs - at least on
certain kinds of programming. For most of my work, I know that my code
has /no/ bugs with dynamic memory allocation or freeing. It has /no/
bugs on signed integer overflow. It never has divide by zeros, or
shifts of negative numbers. It does not have code that accidentally
uses variables without initialising them. I can't promise I have no
out-of-bounds array accesses, but they should be extraordinarily rare -
as will buffer overflows.
 
And even if you assume that it is plausible that you have signed integer
overflow, how do you think -ftrapv improves matters? Clearly it could
be useful in testing and debugging - that is what it is for. But in
released code, who would be happy with a crash and an overflow error
message?
 
I could understand this more if we were talking about a type of error
that is harder to avoid, or harder to catch in testing - things like
race conditions and deadlocks in multi-threaded code. But it is /easy/
to avoid signed integer overflow. It is /easy/ to write your code in a
way that it can't happen, or that can detect it appropriately and
respond sensibly, rather than crashing.
 
>> that your car breaks down on the journey.
 
> So you would prefer b) or have some kind of c) that results with something
> better in that situation or what you were trying to say?
 
Sorry - it looks like you have cut part of your answer here. I am not
sure what you meant to write.
Rosario19 <Ros@invalid.invalid>: Feb 22 11:05AM +0100

On Sun, 17 Feb 2019 20:23:13 -0800, "Chris M. Thomasson"
> gcount += addend;
> return lcount;
>}
 
The behaviour cannot be well defined because long is not a fixed-size
type: is it 32 bit? Is it 64? And even if it were fixed at, say, 64 bit:
is the + operation defined by the standard for every value in the range,
or is overflow undefined?
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 22 12:33PM

On Thu, 2019-02-21, David Brown wrote:
> <https://godbolt.org> and try some samples with optimisation on (-O1 or
> -O2), with and without -"ftrapv" and
> "-fsanitize=signed-integer-overflow". Let me give you a couple of examples:
...
 
> In what world are these checks "free" ?
 
Devil's advocate: when you have way more computing power than you
need, you can spend it on things like this.
 
...
>> leave -ftrapv on forever.
 
> In my world, a program that crashes with a message "overflow detected"
> is /not/ correct.
 
But at least you don't get incorrect calculations, which seems to be
his point.
 
It's not hard to see what you both mean.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
David Brown <david.brown@hesbynett.no>: Feb 22 01:31PM +0100

On 22/02/2019 12:05, Öö Tiib wrote:
> tells that they found 43 sites of undefined integer overflows in
> Microsoft SafeInt library (that is supposed to be such classes and
> functions).
 
What can I say? You are right that most software has bugs - and that
many projects don't have a development methodology that is strong enough
to stop such preventable bugs. Integer overflows are just bugs like
anything else - but they are preventable bugs as long as you take
reasonable care.
 
As for the particular case of MS's "SafeInt" library - let me just say
that finding 43 bugs in it does not harm their reputation in my eyes.
 
I am only familiar with one name on that paper - John Regehr. I have
read other publications by him. He seems to make a living from making
condescending and sarcastic criticisms of C and C++ without really
appreciating the point of the languages. He is particularly critical
of compilers that support experienced developers and assume the
programmers actually understand the language they are using - he appears
to think C programmers believe the language is just Java without
classes, and compilers should behave accordingly.
 
Still, the paper had some interesting points. One surprising case is
the expression "char c = CHAR_MAX; c++;" which they say varies according
to compiler. "The question is: Does c get promoted to int before being
incremented? If so, the behaviour is well-defined. We found
disagreement between compiler vendors' implementations of this
construct". The C standards are quite clear on the matter - "c++;" has
the result of the original value of c, and the side effect of "c = c +
1;", for which c is first promoted to "int". If CHAR_MAX = INT_MAX,
this is undefined - otherwise it is well defined. Converting back to
"char" is implementation-defined if "char" is signed, well-defined
modulo conversion if it is unsigned.
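To illustrate the rule (a sketch, assuming a typical platform where CHAR_MAX < INT_MAX):

```cpp
#include <climits>

// In "c + 1" the char operand is promoted to int first, so the addition
// itself is well defined on any platform where CHAR_MAX < INT_MAX.
int increment_promoted(char c) {
    return c + 1;                 // computed in int, no overflow here
}

// "c++" additionally converts the int result back to char, which is
// implementation-defined when char is signed and the value doesn't fit.
char increment_narrowed(char c) {
    return static_cast<char>(c + 1);
}
```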
 
 
The paper rightly points out that some programmers /don't/ understand
that signed integer overflow is undefined behaviour - or that they
"know" the rules but think it is safe to break them and assume wrapping
behaviour. (Typically this is because they have tested some code and
seen that it works - for that code, with that compiler version, and
those flags.) This is one reason why gcc has the "-fwrapv" flag - if
you are faced with third-party code and you are not sure the programmer
understands that signed integers do not wrap, you can use the "-fwrapv"
flag. If the code is good, it will make no difference to the results
(but have some impact on the efficiency). If the code is bad, it will
run the way the programmer expected.
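A minimal sketch of the kind of code in question (hypothetical function names), showing the wrap-assuming test that -fwrapv legitimises next to a check that is well defined without any flags:

```cpp
#include <climits>

// Relies on wrapping: undefined behaviour in standard C++. With gcc's
// -fwrapv it behaves as the (mistaken) programmer expected; without it,
// the compiler may legally fold the whole test to "false".
bool will_overflow_wrapping(int x) {
    return x + 1 < x;
}

// Equivalent intent, well defined with any compiler and any flags.
bool will_overflow_portable(int x) {
    return x == INT_MAX;
}
```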
 
 
 
> Everybody. The sole correct handling of programming defects is to fix
> those. Crashes get fixed fastest and so the total count of that defect
> ever manifesting in practice will be smallest.
 
Wrong.
 
An alternative handling of defects is to minimise the damage, to
maximise the use of the imperfect program. It is a well established
"fact" that all complex programs have bugs in them. If they were all to
stop with an error report on each bug, the software would be useless
until all bugs were eliminated. That might be nice in a perfect world -
in the real world, however, it would mean waiting decades for testing
and development to finish on big systems. People complain about Windows
- imagine how much more they would complain if it blue-screened for
every minor bug. And imagine the "fun" if your aeroplane flight
controller stopped with an error message because of a bug in a function
for a warning light.
 
Even during testing, stopping dead on a bug is not necessarily the best
choice - it may be better to collect more data first.
 
>> respond sensibly, rather than crashing.
 
> Misbehavior on case of unusually large numbers for example because
> of corrupt data can be quite tricky to test and quite easy to miss.
 
If your data comes from a source where it could be bad, then you check it.
 
 
> Yes. I am still confused, it does read like you are not discussing the
> practical example above that I gave (and to what you seem to respond
> with above).
 
I think the direction of the discussion has slid somewhat (no doubt it
is my fault), and we may not be discussing the same things. Perhaps it
is time to wind it down a bit?
David Brown <david.brown@hesbynett.no>: Feb 22 04:35PM +0100

On 22/02/2019 13:33, Jorgen Grahn wrote:
 
>> In what world are these checks "free" ?
 
> Devil's advocate: when you have way more computing power than you
> need, you can spend it on things like this.
 
In such cases, C or C++ is probably not the language you want. And even
if it /is/ the language you want, you would be better using integer
classes that are checked and where you have clear and user-defined
control of what happens on overflow, rather than using a poorly
specified and imperfectly implemented compiler-specific flag. (Though
apparently you should avoid using MS's SafeInt library!)
 
>> is /not/ correct.
 
> But at least you don't get incorrect calculations, which seems to be
> his point.
 
Agreed. But "-ftrapv" is not, as far as I can see, a good way to get that.
 
> It's not hard to see what you both mean.
 
Yes.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 10:01PM +0200

On 20.02.2019 21:24, Bonita Montero wrote:
>>> volatile DWORD m_dwRecursionCount;
 
>> 'volatile' is neither needed nor sufficient for multithreading code.
 
> It is just to document that it may change asyncronously.
 
If so, I see they are accessed without proper locking all over the
place. For example, there is no memory barrier after assigning to
m_dwOwningThreadId in Lock() and before potential reading it in in
another Lock() in another thread, so there is no guarantee the other
thread ever sees the changed value.
 
However, now I looked at the MSDN site and saw that by Microsoft rules
indeed volatile causes additional memory barriers when compiled with
/volatile:ms (which is the default, except for ARM). So my original
comment was wrong, volatile is indeed needed for this code to work, but
the code is restricted to not just to Windows, but also requires MSVC
compiler and its /volatile:ms regime.
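For comparison, the portable C++11 way to get the barriers that plain volatile does not guarantee is std::atomic with acquire/release ordering. A hypothetical fragment, not the code under discussion (the member name is borrowed from it for illustration):

```cpp
#include <atomic>
#include <cstdint>

// Sketch: publishing an owning thread id with C++11 atomics instead of
// relying on MSVC's /volatile:ms semantics. The surrounding mutex logic
// is omitted.
struct owner_slot {
    std::atomic<std::uint32_t> m_dwOwningThreadId{0};

    void publish(std::uint32_t tid) {
        // release: earlier writes become visible to any thread that
        // performs an acquire load and sees this value
        m_dwOwningThreadId.store(tid, std::memory_order_release);
    }

    std::uint32_t observe() const {
        // acquire: pairs with the release store above
        return m_dwOwningThreadId.load(std::memory_order_acquire);
    }
};
```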
Bonita Montero <Bonita.Montero@gmail.com>: Feb 20 09:34PM +0100

> If so, I see they are accessed without proper locking all over the
> place.
 
The flags for the reader- and writer-count and the writer-flag (sign bit)
are the lock themselves. And the other two members are only modified
by the writer that owns the mutex.
 
> For example, there is no memory barrier after assigning to
> m_dwOwningThreadId in Lock() ....
 
That's not explicitly necessary because the Xchg-Intrinsic before has
acquire and release behaviour.
 
> ... and before potential reading it in in another Lock()
> in another thread, so there is no guarantee the other
 
I think you consider this code in the beginning of Lock():
 
if( m_dwOwningThreadId == (dwCurrentThreadId = GetCurrentThreadId()) )
{
    m_dwRecursionCount++;
    return;
}
 
There is absolutely no barrier necessary here! If the thread that does
this check is the same that owns the mutex everything is ok because
the barrier is the Xchg in the outer Lock(). If the thread doesn't
own the mutex, the m_dwOwningThreadId may change at any time and may
have any value different than the current thread id.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:46PM -0800

On 2/20/2019 4:50 AM, Paavo Helde wrote:
>> (0D8B00Ch)]
 
> What I see from here is that apparently you are running in 32-bit mode.
> Who would do this in year 2019? Recompile in 64-bit and measure again!
 
OUCH! ARGH.... Will do. Thanks. ;^o
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:53PM -0800

On 2/20/2019 10:25 AM, Bonita Montero wrote:
>   }
> }
 
> I'm pretty sure you won't understand the code.
 
Why not? I need to give it a good read, and then test it out in Relacy
Race Detector:
 
http://www.1024cores.net/home/relacy-race-detector
 
> But the performance should be unbeatable. ;-)
 
Nice. Okay, I will try to get it up and running in Relacy. It can
emulate Windows sync primitives, pretty nice. Once I get it running in
there, to make sure it passes tests, then I can add it to my benchmark
code. It will be fun. Thanks for the algorithm Bonita. :^)
 
There is a main different between your algorithm and mine, notice that I
avoid the use of CAS and loops. I am looking forward to giving your
algorithm a go. Thanks again. :^)
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:41PM -0800

On 2/20/2019 10:20 PM, Bonita Montero wrote:
 
> And you disabled recursive locking. I think if it is possible to
> do shared locking recursively implicitly it should be able to do
> exclusive locking explicitly.
 
After I get the weakest memory model, it is going to be fun to test your
work against Windows SRW. It should beat it because of the way you are
waiting on the semaphore and event. In other words, they are fire and
forget, and not hooked up to any looping. Nice. :^)
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:11AM -0800

On 2/19/2019 11:59 PM, Ian Collins wrote:
>> #define CRUDE_CACHE_PAD 256 // crude cache pad
>> #define TEST_DURATION_SEC 60 // number of seconds for the test
 
> General whinge:  these ugly macros are bad form in C++!
 
Yeah... Habit from C.
 
 
Wrt, THREADS being 8, well, yeah. That can get gnarly. Sorry about that.
 
 
 
> $ g++ -Ofast x.cc -lpthread -std=c++17
> $ ./a.out
> Testing: std::shared_mutex
[...]
> writes_per_tick = 8
> Ticks = 60.0055
> ___________________________________
[...]
> $ g++ -Ofast x.cc -lpthread -std=c++17
> $ ./a.out
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
[...]
> writes_per_tick = 61
> Ticks = 60.0135
> ___________________________________
 
Ahh, mine loses here by 98 reads per-second. Yet wins wrt writers by 53
writes a second. Humm... This is definitely different than my other
benchmark. Need to think about this...
 
Thanks for giving it a go Ian. :^)
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 01:21AM -0800

On 2/19/2019 10:49 PM, Chris M. Thomasson wrote:
[..]
> ___________________________________
[...]
>         unsigned int m_tid;
 
>         node(unsigned int tid) : m_tid(tid) {}
>     };
[...]
 
 
Humm... The ct_simple_stack::node::m_next should probably be a volatile
pointer...
 
 
>     {
>         shared.m_std_rwmutex.lock_shared();
 
>         ct_simple_stack::node* n = shared.m_stack.m_head;
 
///////////////////////////
>             ct_simple_stack::node* next = n->m_next;
 
>             n = next;
>         }
///////////////////////////
 
You know... I am wondering if the optimizer can omit the while loop
above. Shi%.
 
 
 
>     shared.m_reads += reads;
>     shared.m_std_rwmutex.unlock();
> }
[...]
 
Let me change it to a volatile pointer and run the tests again...
 
I edited the code at: https://pastebin.com/raw/1QtPCGhV
 
Just reload the page and you should see version 0.1 pop up. The new code
will always have Testing Version 0.1 in front of it. Here are my much
different timings:
 
 
Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
 
___________________________________
Raw Reads: 54195
Raw Writes: 3232
reads_per_tick = 902
writes_per_tick = 53
Ticks = 60.0432
___________________________________
 
 
 
Testing Version 0.1: std::shared_mutex
 
___________________________________
Raw Reads: 23452
Raw Writes: 1873
reads_per_tick = 390
writes_per_tick = 31
Ticks = 60.0513
___________________________________
 
Very different. Mine is still beating std::shared_mutex on my end. Still
have to check if std::shared_mutex is using SRW on my end to address Paavo.
 
Sorry about that. But, can you try running Version 0.1?
Bonita Montero <Bonita.Montero@gmail.com>: Feb 20 07:29PM +0100

And just ignore that there is no error-handling for CreateEvent,
CreateSemaphore, SetEvent and ReleaseSemaphore. This is just to
show the basic concept.
And no, there are no additional barriers necessary!
Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 02:50PM +0200

On 20.02.2019 11:57, Chris M. Thomasson wrote:
 
> 00D7C8F7 call dword ptr [__imp__AcquireSRWLockShared@4 (0D8B00Ch)]
 
What I see from here is that apparently you are running in 32-bit mode.
Who would do this in year 2019? Recompile in 64-bit and measure again!
Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 09:22PM +0200

On 20.02.2019 20:25, Bonita Montero wrote:
> // sign-bit: readers active
> volatile DWORD m_dwOwningThreadId;
> volatile DWORD m_dwRecursionCount;
 
'volatile' is neither needed nor sufficient for multithreading code.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 03:28PM -0800

>> be performed in this test at 60 seconds. The number of threads is
>> variable and is determined by std::hardware_concurrency * THREADS, set
>> at 8 in the test. So on my system the setup is:
[...]
 
Wrt Version 0.1, remember my first ct_rwmutex that did not use
ct_fast_mutex, but instead ct_fast_semaphore with an initial count of 1?
Now, it would be beneficial to test against this implementation as well.
Also, another poster, Bonita showed another algorithm that uses CAS,
kind of like Windows SRW. It has a lot of explicit looping, my algorithm
has no loops in ct_rwmutex. Fwiw, it seems like my benchmark code can be
used to test many read/write algorithms.
 
Humm...
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 02:26PM -0800

On 2/20/2019 4:50 AM, Paavo Helde wrote:
>> (0D8B00Ch)]
 
> What I see from here is that apparently you are running in 32-bit mode.
> Who would do this in year 2019? Recompile in 64-bit and measure again!
 
Just tested with a x64 build, and here are the results wrt version 0.1:
 
 
Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
 
___________________________________
cpu_threads_n = 4
threads_n = 32
writers = 16
readers = 16
test duration = 60 seconds
___________________________________
 
 
___________________________________
Raw Reads: 19658
Raw Writes: 1516
reads_per_tick = 327
writes_per_tick = 25
Ticks = 60.0547
___________________________________
 
 
 
 
 
Testing Version 0.1: std::shared_mutex
 
___________________________________
cpu_threads_n = 4
threads_n = 32
writers = 16
readers = 16
test duration = 60 seconds
___________________________________
 
 
___________________________________
Raw Reads: 8241
Raw Writes: 1114
reads_per_tick = 137
writes_per_tick = 18
Ticks = 60.1291
___________________________________
 
 
 
Nice! Humm... Iirc, unsigned long on Windows is only 32-bit? My
algorithm is beating 64-bit SRW on my end using only 32-bit atomics. ;^)
 
My algorithm is getting 190 more reads per-second on a linked shared
data-structure that has NODE_PRIME nodes, or 1,000,000 nodes...
 
So each read actually iterated NODE_PRIME nodes.
 
NICE!
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:04PM -0800

> writes_per_tick = 451
> Ticks = 60.0368
> ___________________________________
[...]
 
Thank you! I need to try to organize all of the various results into a
single database.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 02:50PM -0800

On 2/21/2019 12:41 PM, Chris M. Thomasson wrote:
> work against Windows SRW. It should beat it because of the way you are
> waiting on the semaphore and event. In other words, they are fire and
> forget, and not hooked up to any looping. Nice. :^)
 
Ahh, my waits on these "slow-paths" are fire and forget as well. In that
sense, they are "similar". This is a good property to have Bonita.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:26PM -0800

On 2/20/2019 10:20 PM, Bonita Montero wrote:
>> = lOldRWCounters)
 
> Your additional barrier is not necessary here and in the other places!
> And m_lRWCounters doesn't need to be an atomic variable.
 
Of course you are correct on this point. :^)
 
Fwiw, I just quickly mocked it up using seq_cst. Now, I can add
fine-grain membars. Fwiw, I can basically already see what is needed and
where in your algorithm. I need to add them to your Interlocked*'s as
well. The next step will be your algorithm using the weakest memory
model that I can possibly get, and still pass Relacy testing. Then
it will be ready for testing in my read/write benchmark. Nice. :^)
 
Does this sound okay to you Bonita?
 
 
> And you disabled recursive locking. I think if it is possible to
> do shared locking recursively implicitly it should be able to do
> exclusive locking explicitly.
 
I disabled recursion simply because my test does not need any, and it
would just add a little extra overhead to your algorithm. So, why have
it in there? Do you want me to model your recursion with Relacy? I
personally don't see a real need right now, but will do it. Well,
perhaps I am "biased" against recursive locking in the first place. Some
think it is a bit "evil"? ;^)
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:51PM -0800

On 2/25/2019 10:25 PM, Bonita Montero wrote:
>> CAS will _always_ fail if the comparand is different than
>> the  destination at the point of the atomic operation.
 
> Thank you for this insight! I never would have known this otherwise!
 
Sorry for that. Well, I really do like your algorithm. It is a very nice
piece of work. Thank you for sharing it.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:52PM -0800

On 2/25/2019 10:23 PM, Bonita Montero wrote:
>> Writer A spins on CAS because it failed!
>> Reader B releases read access.
 
> Impossible that reader B has read-access for such a little time.
 
Tell that to a general purpose read/writer lock... ;^)
 
 
> be done.
> If there was really meaningful work in reader B, the locking
> interval was at least long enough that writer A gets to sleep.
 
Regardless, I like your algorithm. It has an elegance about it.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:57PM -0800

On 2/26/2019 8:52 PM, Chris M. Thomasson wrote:
>> If there was really meaningful work in reader B, the locking
>> interval was at least long enough that writer A gets to sleep.
 
> Regardless, I like your algorithm. It has an elegance about it.
 
Just wondering if you are interested in a version of your algorithm that
is 100% pure c++11, and using the weakest memory order? The semaphore
and event would be emulated in c++11 using mutex/condvar. With that in
mind, it is going to "reduce" performance wrt raw Windows primitives.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 26 08:58PM -0800

On 2/26/2019 8:51 PM, Chris M. Thomasson wrote:
 
>> Thank you for this insight! I never would have known this otherwise!
 
> Sorry for that. Well, I really do like your algorithm. It is a very nice
> piece of work. Thank you for sharing it.
 
The performance is nice. Not bad at all.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 08:58AM +0100

>> Regardless, I like your algorithm. It has an elegance about it.
 
> Just wondering if you are interested in a version of your algorithm
> that is 100% pure c++11, ...
 
How should that be possible? C++-whatever has no kernel-synchronization
facilities like events or semaphores.
 
> The semaphore and event would be emulated in c++11 using mutex/condvar.
 
That would be inefficient.
 
It would be better to rewrite the code for POSIX with POSIX semaphores.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 11:25AM +0100

>> The semaphore and event would be emulated in c++11 using mutex/condvar.
 
> That would be inefficient.
 
I reconsidered that: it wouldn't be that inefficient, as the mutex and
the condvar would be used only in contention cases, and when that happens
there would be kernel-calls which are a lot slower than the additional
overhead over raw semaphores (and eventually events).
But I think it's more elegant, because it is more minimalistic than when
you realize it with the system facilities and have more than one
implementation of the rw-mutex.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 12:59PM +0200

On 27.02.2019 9:58, Bonita Montero wrote:
>> that is 100% pure c++11, ...
 
> How should that be possible? C++-whatever has no kernel-synchronization
> facilities like events or semaphores.
 
By this argument a C++ program would also not be able to allocate memory
or open files as these are also kernel facilities.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 01:06PM +0100

>> tion-facilities like events or semaphores.
 
> By this argument a C++ program would also not be able to allocate memory
> or open files as these are also kernel facilities.
 
You are unable to read. I only said that C++ has "no
kernel-synchronization facilities like events or semaphores".
 
Or please tell me what the classes for semaphores and events are.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 04:29PM +0200

On 27.02.2019 14:06, Bonita Montero wrote:
 
> You are unable to read. I only said that C++ has "no
> kernel-synchronization facilities like events or semaphores".
 
> Or please tell me what the classes for semaphores and events are.
 
C++11 contains everything needed for efficient multithreading
programming. I trust the committee in that if semaphores and events are
not included then they are not needed for that purpose.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 27 03:35PM +0100

> C++11 contains everything needed for efficient multithreading
> programming.
 
I didn't tell the opposite: what C++11 supplies is sufficient in
most cases. But if you try to implement a rw-mutex you can't do this
as efficiently as with native events and semaphores.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 27 05:19PM +0200

On 27.02.2019 16:35, Bonita Montero wrote:
 
> I didn't tell the opposite: what C++11 supplies is sufficient in
> most cases. But if you try to implement a rw-mutex you can't do this
> as efficiently as with native events and semaphores.
 
OK, fair enough. I guess that's why there is std::shared_mutex added in
C++17.
scott@slp53.sl.home (Scott Lurndal): Feb 27 03:25PM


>C++11 contains everything needed for efficient multithreading
>programming. I trust the committee in that if semaphores and events are
>not included then they are not needed for that purpose.
 
Over the last three decades, I've found that mutexes and condition
variables have been sufficient for everything from mainframe
operating systems (which had, in 1985, mutex and condition variables
built right into the instruction set) to modern pthread apps.
 
http://vseries.lurndal.org/doku.php?id=instructions:lok
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 01:02PM -0800

>> be performed in this test at 60 seconds. The number of threads is
>> variable and is determined by std::hardware_concurrency * THREADS, set
>> at 8 in the test. So on my system the setup is:
[snip code]
 
> * GCC -Ofast -march=native -lstdc++ *
> *************************************
 
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
[...]
 
Thank you! Sorry to ask, but can you try running version 0.1?
 
Try reloading the page at: https://pastebin.com/raw/1QtPCGhV
 
The newest version 0.1 should have:
 
 
Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
 
Testing Version 0.1: std::shared_mutex
 
 
 
 
Fwiw, here is the code:
____________________________________
/* Crude Read/Write Mutex Benchmark
 
This tests my algorithm ct_rwmutex vs. std::shared_mutex.
 
It shows how many reads and writes can be performed within
a fixed amount of time.
 
by: Chris M. Thomasson
 
version 0.1
__________________________________________*/
 
 
 
#include <thread>
#include <atomic>
#include <shared_mutex>
#include <condition_variable>
#include <iostream>
#include <functional>
#include <chrono>
#include <cassert>
#include <cstdlib>
#include <ctime>
#include <climits>
#include <cstdint>
#include <vector>
 
 
// undefine to test std::shared_mutex
#define CT_TEST_FAST_MUTEX 1
 
 
#define THREADS 8 // multiplied by std::hardware_concurrency
#define NODE_PRIME 1000000 // number of nodes
#define CRUDE_CACHE_PAD 256 // crude cache pad
#define TEST_DURATION_SEC 60 // number of seconds for the test
 
 
// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    unsigned long m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        m_mutex.lock();
        ++m_state;
        m_mutex.unlock();
        m_cond.notify_one();
    }

    void add(unsigned long addend)
    {
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_state += addend;
        }

        m_cond.notify_all();
    }

    void dec()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == 0) m_cond.wait(lock);
        --m_state;
    }
};
 
 
 
 
// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_state = true;
        m_cond.notify_one();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == false) m_cond.wait(lock);
        m_state = false; // auto-reset
    }
};
 
 
 
 
// just a layer over an auto-reset event
struct ct_fast_mutex
{
    std::atomic<unsigned int> m_state;
    ct_auto_reset_event m_waitset;

    ct_fast_mutex() : m_state(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            while (m_state.exchange(2, std::memory_order_acquire))
            {
                m_waitset.wait();
            }
        }
    }

    void unlock()
    {
        if (m_state.exchange(0, std::memory_order_release) == 2)
        {
            m_waitset.signal();
        }
    }
};
 
 
 
// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions
 
// The number of readers _must_ never exceed LONG_MAX!
 
#define RWMUTEX_COUNT_MAX LONG_MAX
 
struct ct_rwmutex
{
    unsigned char m_crude_cache_pad_0[CRUDE_CACHE_PAD];

    // shared state
    std::atomic<long> m_count;
    unsigned char m_crude_cache_pad_1[CRUDE_CACHE_PAD];

    std::atomic<long> m_rdwake;
    unsigned char m_crude_cache_pad_2[CRUDE_CACHE_PAD];

    ct_slow_semaphore m_rdwset;
    unsigned char m_crude_cache_pad_3[CRUDE_CACHE_PAD];

    ct_slow_semaphore m_wrwset;
    unsigned char m_crude_cache_pad_4[CRUDE_CACHE_PAD];

    ct_fast_mutex m_wrlock;
    unsigned char m_crude_cache_pad_5[CRUDE_CACHE_PAD];


    ct_rwmutex() :
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }


    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            // We need to wait for a writer.
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            // There is a writer
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                // We need to wake the writer up
                m_wrwset.inc();
            }
        }
    }


    // WRITE, more hefty
    void lock()
    {
        // Acquire exclusive access
        m_wrlock.lock();

        // we are the only thread in here now.

        // Gain write access wrt m_count
        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
                                       std::memory_order_acquire);

        // count can never be negative.
        if (count < RWMUTEX_COUNT_MAX)
        {
            // We detected readers.
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                                             std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                // Okay, we need to wait for all of the readers
                // The number of readers is actually
                // RWMUTEX_COUNT_MAX - count
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        // Release write access wrt m_count
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
                                       std::memory_order_release);

        if (count < 0)
        {
            // We need to wake -count readers.
            m_rdwset.add(-count);
        }

        // Release exclusive access
        m_wrlock.unlock();
    }
};
 
 
struct ct_simple_stack
{
    struct node
    {
        node* volatile m_next; // to suppress optimization
        unsigned int m_tid;

        node(unsigned int tid) : m_tid(tid) {}
    };

    node* m_head;

    ct_simple_stack() : m_head(nullptr) {}

    ~ct_simple_stack()
    {
        ct_simple_stack::node* n = flush();

        while (n)
        {
            ct_simple_stack::node* next = n->m_next;
            delete n;
            n = next;
        }
    }

    void push(node* n)
    {
        n->m_next = m_head;
        m_head = n;
    }

    node* pop()
    {
        node* n = m_head;

        if (n)
        {
            m_head = n->m_next;
        }

        return n;
    }


    node* flush()
    {
        node* n = m_head;
        m_head = nullptr;
        return n;
    }

};
 
 
struct ct_shared
{
    std::atomic<bool> m_run;

    // protected by m_std_rwmutex
    std::uint64_t m_reads;
    std::uint64_t m_writes;

    ct_simple_stack m_stack;
    ct_simple_stack m_stack_dtor;

#if defined (CT_TEST_FAST_MUTEX)
    ct_rwmutex m_std_rwmutex;
#else
    std::shared_mutex m_std_rwmutex;