Thursday, February 21, 2019

Digest for comp.lang.c++@googlegroups.com - 18 updates in 2 topics

"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 03:28PM -0800

>> be performed in this test at 60 seconds. The number of threads is
>> variable and is determined by std::hardware_concurrency * THREADS, set
>> at 8 in the test. So on my system the setup is:
[...]
 
Wrt Version 0.1, remember my first ct_rwmutex that did not use
ct_fast_mutex, but instead ct_fast_semaphore with an initial count of 1?
It would be beneficial to test against that implementation as well.
Also, another poster, Bonita, showed an algorithm that uses CAS, kind
of like Windows SRW. It has a lot of explicit looping; my ct_rwmutex
has no loops at all. Fwiw, it seems like my benchmark code can be
used to test many read/write algorithms.
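Fwiw, a minimal sketch of what such a generic harness could look like,
templated on the lock type; everything here is illustrative, not the
actual benchmark code:

```cpp
#include <atomic>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical harness: works with any lock exposing
// lock()/unlock()/lock_shared()/unlock_shared().
template<class RWLock>
long run_rw_bench(unsigned readers, unsigned writers, long iters)
{
    RWLock mtx;
    long shared = 0;                // protected by mtx
    std::atomic<long> reads{0};     // total completed read passes

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < writers; ++i)
        threads.emplace_back([&] {
            for (long n = 0; n < iters; ++n) {
                mtx.lock();
                ++shared;           // exclusive write
                mtx.unlock();
            }
        });
    for (unsigned i = 0; i < readers; ++i)
        threads.emplace_back([&] {
            for (long n = 0; n < iters; ++n) {
                mtx.lock_shared();
                long v = shared;    // read under shared lock
                mtx.unlock_shared();
                if (v >= 0) reads.fetch_add(1, std::memory_order_relaxed);
            }
        });
    for (auto& t : threads) t.join();
    return shared;                  // should equal writers * iters
}
```

Plugging in std::shared_mutex, or any of the candidate algorithms, is
then a one-line change at the call site.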
 
Humm...
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 04:15PM -0800

On 2/20/2019 10:25 AM, Bonita Montero wrote:
> This is similar shared_mutex, but just for Win32 only.
> My shared mutex gives shared readers always priority over
> writers (don't know what std::shared_mutex does instead).
[...]
 
Need to examine it and implement it for myself; however, your read
priority is _radically_ different from mine. ct_rwmutex actually fairly
alternates between batches of reads and a write, like a bakery
algorithm, and does _not_ use CAS. Writes can never starve reads, and
reads can never starve writes.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 10:03PM -0800

On 2/20/2019 10:25 AM, Bonita Montero wrote:
 
> #include <Windows.h>
> #include <intrin.h>
> #include <cassert>
 
[...]
 
I implemented your algorithm, minus the recursion part, in Relacy
Race Detector. I am using all seq_cst memory ordering here because you
do not provide any memory barrier documentation. It is working fine with
seq_cst. I still need to test the recursion because I have seen many
broken recursion schemes. I forget whether volatile on MSVC will always
use that memory order, with some obscure compiler flag. Anyway, here is
the code:
_________________________________
// Bonita Montero Read/Write Mutex
// Relacy Test By Chris M. Thomasson
//_______________________________________________


//#define RL_DEBUGBREAK_ON_ASSERT
//#define RL_MSVC_OUTPUT
//#define RL_FORCE_SEQ_CST
//#define RL_GC


#include <relacy/relacy_std.hpp>
#include <iostream>


#define MB_SEQ_CST std::memory_order_seq_cst


#define READERS 3
#define WRITERS 3
#define THREADS (READERS + WRITERS)


LONG ct_InterlockedCompareExchange(
    std::atomic<long>* Destination,
    long Exchange,
    long Comparand
) {
    LONG cmp = Comparand;

    if (Destination->compare_exchange_strong(cmp, Exchange, MB_SEQ_CST))
    {
        RL_ASSERT(cmp == Comparand);
    }

    return cmp;
}


class SharedMutex
{
public:
    SharedMutex();
    ~SharedMutex();
    void Lock();
    void Unlock();
    void LockShared();
    void UnlockShared();

private:
    // lower 16 bits: writers, including the one that owns the lock
    // bits 16-30: count of threads having read-access or waiting
    //             for read-access
    // sign-bit: readers active
    std::atomic<LONG> m_lRWCounters;

    HANDLE m_hEvtReleaseWriter;
    HANDLE m_hSemReleaseReaders;
};


SharedMutex::SharedMutex()
{
    m_lRWCounters.store(0, MB_SEQ_CST);
    m_hEvtReleaseWriter = CreateEvent(NULL, FALSE, FALSE, NULL);
    m_hSemReleaseReaders = CreateSemaphore(NULL, 0, 0x7FFFFFFF, NULL);
}

SharedMutex::~SharedMutex()
{
    CloseHandle(m_hEvtReleaseWriter);
    CloseHandle(m_hSemReleaseReaders);
}

void SharedMutex::Lock()
{
    LONG lRWCounters,
         lOldRWCounters;

    for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ;
         lRWCounters = lOldRWCounters)
    {
        RL_ASSERT((lRWCounters & 0x0FFFF) != 0x0FFFF);

        if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
             lRWCounters + 1, lRWCounters)) == lRWCounters)
        {
            if (lOldRWCounters == 0)
                return;
            else
                break;
        }
    }

    WaitForSingleObject(m_hEvtReleaseWriter, INFINITE);
}

void SharedMutex::Unlock()
{
    LONG lRWCounters,
         lOldRWCounters;

    // sharers have priority over writers

    for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ;
         lRWCounters = lOldRWCounters)
    {
        RL_ASSERT(lRWCounters > 0);

        if ((lRWCounters & 0x7FFF0000))
        {
            if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
                 (lRWCounters - 1) | (LONG)0x80000000u, lRWCounters)) == lRWCounters)
            {
                ReleaseSemaphore(m_hSemReleaseReaders,
                    (lRWCounters & 0x7FFF0000) >> 16, NULL);
                return;
            }
        }
        else
        {
            if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
                 lRWCounters - 1, lRWCounters)) == lRWCounters)
            {
                if ((lOldRWCounters & 0x0FFFF) > 1)
                    SetEvent(m_hEvtReleaseWriter);
                return;
            }
        }
    }
}

void SharedMutex::LockShared()
{
    LONG lRWCounters,
         lOldRWCounters;

    for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ;
         lRWCounters = lOldRWCounters)
    {
        RL_ASSERT((lRWCounters & 0x7FFF0000) != 0x7FFF0000);

        // even shared access can be taken recursively, but in this case
        // the sharer-count will increment

        if (lRWCounters <= 0)
        {
            if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
                 (lRWCounters + 0x10000) | (LONG)0x80000000u, lRWCounters)) == lRWCounters)
                return;
        }
        else
        {
            if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
                 lRWCounters + 0x10000, lRWCounters)) == lRWCounters)
            {
                WaitForSingleObject(m_hSemReleaseReaders, INFINITE);
                return;
            }
        }
    }
}

void SharedMutex::UnlockShared()
{
    LONG lRWCounters,
         lOldRWCounters;

    for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ;
         lRWCounters = lOldRWCounters)
    {
        RL_ASSERT(lRWCounters < 0 && (lRWCounters & 0x7FFF0000) != 0);

        if ((lRWCounters & 0x7FFF0000) == 0x10000)
        {
            if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
                 (lRWCounters - 0x10000) & (LONG)0x7FFFFFFF, lRWCounters)) == lRWCounters)
            {
                if ((lRWCounters & 0x0FFFF) != 0)
                    SetEvent(m_hEvtReleaseWriter);
                return;
            }
        }
        else
        {
            if ((lOldRWCounters = ct_InterlockedCompareExchange(&m_lRWCounters,
                 lRWCounters - 0x10000, lRWCounters)) == lRWCounters)
                return;
        }
    }
}


// Relacy SharedMutex Test...
struct ct_fast_mutex_test
    : rl::test_suite<ct_fast_mutex_test, THREADS>
{
    VAR_T(long) g_shared_state;
    SharedMutex g_shared_mutex;

    void before()
    {
        VAR(g_shared_state) = 0;
    }

    void after()
    {
    }


    // reader
    void reader(unsigned int tidx)
    {
        g_shared_mutex.LockShared();
        long shared_state = VAR(g_shared_state);
        g_shared_mutex.UnlockShared();

        RL_ASSERT(shared_state > -1);
    }


    // writer
    void writer(unsigned int tidx)
    {
        g_shared_mutex.Lock();
        ++VAR(g_shared_state);
        g_shared_mutex.Unlock();
    }


    void thread(unsigned int tidx)
    {
        if (tidx < WRITERS)
        {
            writer(tidx);
        }
        else
        {
            reader(tidx);
        }
    }
};


// Test away... Or fly? Humm...
int main()
{
    {
        rl::test_params p;

        p.iteration_count = 10000000;
        //p.execution_depth_limit = 33333;
        //p.search_type = rl::sched_bound;
        //p.search_type = rl::fair_full_search_scheduler_type;
        //p.search_type = rl::fair_context_bound_scheduler_type;

        rl::simulate<ct_fast_mutex_test>(p);
    }

    return 0;
}
_________________________________
 
 
Okay. Getting closer. :^)
Christian Gollwitzer <auriocus@gmx.de>: Feb 21 07:09AM +0100

Am 20.02.19 um 21:01 schrieb Paavo Helde:
> comment was wrong, volatile is indeed needed for this code to work, but
> the code is restricted to not just to Windows, but also requires MSVC
> compiler and its /volatile:ms regime.
 
Wouldn't that mean the code can become standard C++ by replacing the
volatile stuff with std::atomic<> ?
 
Christian
Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 07:20AM +0100

>     for (lRWCounters = m_lRWCounters.load(MB_SEQ_CST); ; lRWCounters =
> lOldRWCounters)
 
Your additional barrier is not necessary here and in the other places!
And m_lRWCounters doesn't need to be an atomic variable.
 
And you disabled recursive locking. I think if it is possible to
do shared locking recursively implicitly, it should be possible to
do exclusive locking recursively as well.
Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 07:36AM +0100

> Wouldn't that mean the code can become standard C++ by
> replacing the volatile stuff with std::atomic<> ?
 
volatile is necessary in one case: with m_dwOwningThreadId.
volatile just keeps the compiler from caching this value in a
register. But the value in memory can asynchronously change at
any time when the thread trying to get a write-lock isn't the current
write-owner of the mutex. And on all Win32 platforms, loads and stores
to aligned DWORDs are atomic by nature.
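A portable sketch of the same idea is std::atomic with relaxed ordering:
no register caching, plain aligned load/store, and no reliance on MSVC's
/volatile:ms semantics. Only the member name below is from the
discussion; the struct around it is illustrative:

```cpp
#include <atomic>
#include <cstdint>

struct owner_check {
    std::atomic<std::uint32_t> m_dwOwningThreadId{0};

    bool is_owner(std::uint32_t self) const {
        // relaxed is enough for the "am I already the writer?" probe;
        // the lock acquisition itself supplies the real ordering.
        return m_dwOwningThreadId.load(std::memory_order_relaxed) == self;
    }

    void set_owner(std::uint32_t self) {
        m_dwOwningThreadId.store(self, std::memory_order_relaxed);
    }
};
```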
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:26PM -0800

On 2/20/2019 10:20 PM, Bonita Montero wrote:
>> = lOldRWCounters)
 
> Your additional barrier is not necessary here and in the other places!
> And m_lRWCounters doesn't need to be an atomic variable.
 
Of course you are correct on this point. :^)
 
Fwiw, I just quickly mocked it up using seq_cst. Now I can add
fine-grain membars. Fwiw, I can basically already see what is needed and
where in your algorithm. I need to add them to your Interlocked*'s as
well. The next step will be your algorithm using the weakest memory
model that I can possibly get while still passing Relacy testing. Then
it will be ready for testing in my read/write benchmark. Nice. :^)
 
Does this sound okay to you Bonita?
 
 
> And you disabled recursive locking. I think if it is possible to
> do shared locking recursively implicitly, it should be possible to
> do exclusive locking recursively as well.
 
I disabled recursion simply because my test does not need any, and it
would just add a little extra overhead to your algorithm. So, why have
it in there? Do you want me to model your recursion with Relacy? I
personally don't see a real need right now, but will do it. Well,
perhaps I am "biased" against recursive locking in the first place. Some
think it is a bit "evil"? ;^)
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:41PM -0800

On 2/20/2019 10:20 PM, Bonita Montero wrote:
 
> And you disabled recursive locking. I think if it is possible to
> do shared locking recursively implicitly, it should be possible to
> do exclusive locking recursively as well.
 
After I get the weakest memory model, it is going to be fun to test your
work against Windows SRW. It should beat it because of the way you are
waiting on the semaphore and event. In other words, they are fire and
forget, and not hooked up to any looping. Nice. :^)
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:01PM -0800

On 2/20/2019 10:03 PM, Chris M. Thomasson wrote:
>     }
 
>     return cmp;
> }
[...]
 
Notice how I am using compare_exchange_strong? I think, and this is
before any testing, that your algorithm should work fine with
compare_exchange_weak. The weak version does not impose the rather
hefty requirements of the strong one on the CAS, so it is more
"friendly" on systems that use LL/SC to emulate CAS.
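Fwiw, the usual retry-loop shape for the weak version looks like the
sketch below; it mirrors the fast-path shape, not the exact algorithm:

```cpp
#include <atomic>

// In a retry loop, a spurious compare_exchange_weak failure just costs
// one more lap, so the weak form is the natural fit on LL/SC targets.
long fetch_increment_weak(std::atomic<long>& counter)
{
    long cur = counter.load(std::memory_order_relaxed);

    // On failure, cur is reloaded with the current value; just retry.
    while (!counter.compare_exchange_weak(cur, cur + 1,
                                          std::memory_order_seq_cst,
                                          std::memory_order_relaxed))
    {
    }

    return cur; // previous value
}
```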
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:04PM -0800

> writes_per_tick = 451
> Ticks = 60.0368
> ___________________________________
[...]
 
Thank you! I need to try to organize all of the various results into a
single database.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:28PM -0800

On 2/20/2019 2:33 PM, Melzzzzz wrote:
>>> Who would do this in year 2019? Recompile in 64-bit and measure again!
 
>> Just tested with a x64 build, and here are the results wrt version 0.1:
 
>> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
[...]
>> Nice! Humm... Iirc, unsigned long on Windows is only 32-bit?
 
> Yep. And long double is 64 bits if even hardware provides 80 bits...
 
Fwiw, on Intel, I want fetch_add to use LOCK XADD; this is very
important. Let's check out a little test on GodBolt:
 
https://godbolt.org/z/yruhAL // using long
 
https://godbolt.org/z/NKYDeI // using short
 
Wrt my ct_rwmutex, it better use LOCK XADD no matter what damn it.
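The shape being tested in those GodBolt links is just a plain atomic
fetch_add, which mainstream x86 compilers lower to a single LOCK XADD
rather than a CAS loop; the wrapper name here is made up:

```cpp
#include <atomic>

// fetch_add is the wait-free path: on x86 this compiles down to
// LOCK XADD, with no explicit retry loop in sight.
long bump_and_get_old(std::atomic<long>& c, long delta)
{
    return c.fetch_add(delta, std::memory_order_acq_rel);
}
```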
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 02:50PM -0800

On 2/21/2019 12:41 PM, Chris M. Thomasson wrote:
> work against Windows SRW. It should beat it because of the way your are
> waiting on the semaphore and event. In other words, they are fire and
> forget, and not hooked up to any looping. Nice. :^)
 
Ahh, my waits on these "slow-paths" are fire and forget as well. In that
sense, they are "similar". This is a good property to have, Bonita.
"Öö Tiib" <ootiib@hot.ee>: Feb 21 01:24AM -0800

On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote:
> >> flexible alternative and which is actively maintained.
 
> > Sanitizers sound like debugging options.
 
> But that's what you want, isn't it?
 
I consider it a free sanity check on all signed integer calculations.
When correctness of those calculations is of importance, I would
leave -ftrapv on forever.
 
Let's say in summer we have a product with 0% crashes per usage. Then
someone on our team (me) maintains some integer arithmetic incorrectly,
and since the defect manifests rarely, it reaches users in fall.
 
a) With -ftrapv, our automated crash reporting shows 0.2% crashes per
usage. The patch is up within a week.
 
b) Without it, 0.2% of runs result in incorrect answers. Those are
harder to notice (and in a complex whole system the reasons can
also be harder to find/blame). The defect will sit in the product until
spring.
 
When the results of the calculations matter to anything, I prefer a).
Paavo Helde <myfirstname@osa.pri.ee>: Feb 21 11:56AM +0200

On 21.02.2019 11:24, Öö Tiib wrote:
 
> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> usage. The patch is up within a week.
 
> b) Without it 0.2% runs result with incorrect answers.
 
This does not follow. As pointed out by others, intermediate
overflows can still produce correct results when balanced by opposite
overflows or a cast to unsigned.
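A concrete example of such a "false alarm" candidate: done via unsigned,
the wraparound cancels out and the final answer is exact, while the
naive signed version of the same expression would trip -ftrapv on the
intermediate sum (the function name is made up for illustration):

```cpp
// Computes a + b - a2 with well-defined unsigned wraparound in the
// middle; the signed form a + b could overflow even though the final
// result is in range.
int shifted_diff(int a, int b, int a2)
{
    return (int)((unsigned)a + (unsigned)b - (unsigned)a2);
}
```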
 
Based on my limited experience, over half of -ftrapv crashes would be
false alarms. Depends on the codebase, of course.
 
Now it would be up to the management to decide whether the problems
caused by false alarms on customer sites justify the earlier catching of
non-false alarms.
David Brown <david.brown@hesbynett.no>: Feb 21 01:26PM +0100

On 21/02/2019 10:24, Öö Tiib wrote:
 
>>> Sanitizers sound like debugging options.
 
>> But that's what you want, isn't it?
 
> I consider it as free sanity check in all signed integer calculations.
 
But it is not free - not /remotely/ free. Please, go to
<https://godbolt.org> and try some samples with optimisation on (-O1 or
-O2), with and without "-ftrapv" and
"-fsanitize=signed-integer-overflow". Let me give you a couple of examples:
 
int foo1(int a, int b) {
    return 2 * a + b;
}
 
int foo2(int a, int b) {
    return 2 * a + b - a;
}
 
(Yes, I know people don't write code like in foo2 directly - but such
things can easily occur due to macros, inlining, constant folding,
templates, etc.)
 
This is the normal code (for x86):
 
foo1:
        lea     eax, [rsi+rdi*2]
        ret
foo2:
        lea     eax, [rdi+rsi]
        ret
 
This is the code with -ftrapv:
 
foo1:
        push    rbp
        mov     ebp, esi
        mov     esi, 2
        call    __mulvsi3
        mov     esi, ebp
        mov     edi, eax
        call    __addvsi3
        pop     rbp
        ret
foo2:
        push    r12
        mov     r12d, esi
        mov     esi, 2
        push    rbp
        mov     ebp, edi
        sub     rsp, 8
        call    __mulvsi3
        mov     esi, r12d
        mov     edi, eax
        call    __addvsi3
        mov     esi, ebp
        mov     edi, eax
        call    __subvsi3
        add     rsp, 8
        pop     rbp
        pop     r12
        ret
 
This is the code with -fsanitize=signed-integer-overflow:
 
foo1:
        push    rbp
        push    rbx
        mov     ebx, esi
        sub     rsp, 24
        imul    ebp, edi, 2
        jo      .L8
.L2:
        mov     eax, ebx
        add     eax, ebp
        jo      .L9
.L4:
        add     rsp, 24
        pop     rbx
        pop     rbp
        ret
.L8:
        movsx   rsi, edi
        mov     edx, 2
        mov     edi, OFFSET FLAT:.Lubsan_data0
        call    __ubsan_handle_mul_overflow
        jmp     .L2
.L9:
        movsx   rdx, ebp
        movsx   rsi, ebx
        mov     edi, OFFSET FLAT:.Lubsan_data1
        mov     DWORD PTR [rsp+12], eax
        call    __ubsan_handle_add_overflow
        mov     eax, DWORD PTR [rsp+12]
        jmp     .L4
foo2:
        push    r13
        push    r12
        push    rbp
        mov     ebp, esi
        push    rbx
        mov     ebx, edi
        sub     rsp, 24
        imul    r13d, edi, 2
        jo      .L18
.L11:
        mov     r12d, ebp
        add     r12d, r13d
        jo      .L19
.L13:
        mov     eax, r12d
        sub     eax, ebx
        jo      .L20
.L15:
        add     rsp, 24
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        ret
.L18:
        movsx   rsi, edi
        mov     edx, 2
        mov     edi, OFFSET FLAT:.Lubsan_data2
        call    __ubsan_handle_mul_overflow
        jmp     .L11
.L20:
        movsx   rdx, ebx
        movsx   rsi, r12d
        mov     edi, OFFSET FLAT:.Lubsan_data4
        mov     DWORD PTR [rsp+12], eax
        call    __ubsan_handle_sub_overflow
        mov     eax, DWORD PTR [rsp+12]
        jmp     .L15
.L19:
        movsx   rdx, r13d
        movsx   rsi, ebp
        mov     edi, OFFSET FLAT:.Lubsan_data3
        call    __ubsan_handle_add_overflow
        jmp     .L13
 
 
 
 
In what world are these checks "free"?
 
The "sanitize" version is significantly better than the "-ftrapv"
version in that it has short paths when no overflows occur. You could
ask why the "-ftrapv" code is not similar - the answer, I think, is that
"-ftrapv" simply has not received much care or attention from gcc
developers for a very long time.
 
> When correctness of those calculations is of importance then I would
> leave -ftrapv on forever.
 
In my world, a program that crashes with a message "overflow detected"
is /not/ correct. It is merely broken in a different way from one that
gets the calculation wrong. Whether this is better or worse depends on
the circumstances, but it is /definitely/ not a correct result.
 
> harder to notice (and in complex whole system the reasons can be
> also harder to find/blame). The defect will sit in product until spring.
 
> When the results of calculations matter to anything then I prefer a).
 
-ftrapv (or, preferably, -fsanitize=signed-integer-overflow) can be a
useful option for testing and debugging. But it is an aid to finding
bugs in the program - it does /not/ improve correctness, it is of no
help in a deployed system (unless you collect post-mortems), and it
comes at a very significant cost.
 
Use tools for their appropriate purpose. Take your car to the garage
when you have a problem with it - don't tow a caravan behind your car
with a couple of mechanics and a bootload of tools on the off-chance
that your car breaks down on the journey.
"Öö Tiib" <ootiib@hot.ee>: Feb 21 09:35AM -0800

On Thursday, 21 February 2019 14:26:35 UTC+2, David Brown wrote:
> On 21/02/2019 10:24, Öö Tiib wrote:
 
> In what world are these checks "free" ?
 
Free in the sense that I do not have to write and use classes that do
the same thing, likely even less efficiently than those two options.
 
> is /not/ correct. It is merely broken in a different way from one that
> gets the calculation wrong. Whether this is better or worse depends on
> the circumstances, but it is /definitely/ not a correct result.
 
I have repeatedly expressed my lack of knowledge of any beings in
this universe who can write correct programs. The only "programmers" who
do not write incorrect programs are those who do not write programs.
 
> when you have a problem with it - don't tow a caravan behind your car
> with a couple of mechanics and a bootload of tools on the off-chance
> that your car breaks down on the journey.
 
So would you prefer b), or do you have some kind of c) that gives
something better in that situation? Or what were you trying to say?
"Öö Tiib" <ootiib@hot.ee>: Feb 21 09:50AM -0800

On Thursday, 21 February 2019 11:56:48 UTC+2, Paavo Helde wrote:
 
> Now it would be up to the management to decide whether the problems
> caused by false alarms on customer sites justify the earlier catching of
> non-false alarms.
 
Ok, fair enough. Let's say a) 0.2% of runs crash or b) 0.05% of runs
result in incorrect answers. Is it easier to decide that way?
David Brown <david.brown@hesbynett.no>: Feb 21 10:58PM +0100

On 21/02/2019 18:35, Öö Tiib wrote:
 
>> In what world are these checks "free" ?
 
> Free in the sense that I do not have to write and use classes that do
> the same thing, likely even less efficiently than those two options.
 
Classes could be more flexible and more efficient (certainly more
efficient than the -ftrapv code), and give you accurate control of when
you use overflow detection instead of an "all or nothing" choice. But I
agree that it takes effort to make and use them.
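For what such a class might look like, here is a sketch built on the
GCC/Clang __builtin_*_overflow intrinsics, where the check is a single
branch on the overflow flag instead of a libcall; the wrapper name and
interface are made up for illustration:

```cpp
#include <stdexcept>

// Minimal checked-int wrapper: traps exactly where you opt in,
// rather than the all-or-nothing -ftrapv switch.
struct checked_int {
    int v;

    checked_int operator+(checked_int rhs) const {
        int r;
        if (__builtin_add_overflow(v, rhs.v, &r))
            throw std::overflow_error("signed add overflow");
        return {r};
    }

    checked_int operator*(checked_int rhs) const {
        int r;
        if (__builtin_mul_overflow(v, rhs.v, &r))
            throw std::overflow_error("signed mul overflow");
        return {r};
    }
};

// Standalone predicate form of the same check.
bool adds_overflow(int a, int b) {
    int r;
    return __builtin_add_overflow(a, b, &r);
}
```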
 
 
> I have repeatedly expressed my lack of knowledge about any beings in
> this universe who can write correct programs. Only "programmers" who
> do not write incorrect programs are those who do not write programs.
 
It is entirely possible to eliminate large classes of bugs - at least on
certain kinds of programming. For most of my work, I know that my code
has /no/ bugs with dynamic memory allocation or freeing. It has /no/
bugs on signed integer overflow. It never has divide by zeros, or
shifts of negative numbers. It does not have code that accidentally
uses variables without initialising them. I can't promise I have no
out-of-bounds array accesses, but they should be extraordinarily rare -
as will buffer overflows.
 
And even if you assume that it is plausible that you have signed integer
overflow, how do you think -ftrapv improves matters? Clearly it could
be useful in testing and debugging - that is what it is for. But in
released code, who would be happy with a crash and an overflow error
message?
 
I could understand this more if we were talking about a type of error
that is harder to avoid, or harder to catch in testing - things like
race conditions and deadlocks in multi-threaded code. But it is /easy/
to avoid signed integer overflow. It is /easy/ to write your code in a
way that it can't happen, or that can detect it appropriately and
respond sensibly, rather than crashing.
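The "check before you leap" style described here tests the operands
rather than the (already undefined) result; an illustrative helper in
plain standard C++:

```cpp
#include <limits>

// True iff a + b would overflow int; computed entirely without
// performing the overflowing addition itself.
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > std::numeric_limits<int>::max() - b;
    return a < std::numeric_limits<int>::min() - b;
}
```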
 
>> that your car breaks down on the journey.
 
> So you would prefer b) or have some kind of c) that results with something
> better in that situation or what you were trying to say?
 
Sorry - it looks like you have cut part of your answer here. I am not
sure what you meant to write.
