Sunday, February 14, 2021

Digest for comp.lang.c++@googlegroups.com - 12 updates in 5 topics

Brian Wood <woodbrian77@gmail.com>: Feb 14 02:56PM -0800

On Friday, February 5, 2021 at 4:43:26 AM UTC-6, David Brown wrote:
> programmers - /that/ gives bragging rights. Finding possible
> improvements in ordinary code written by one ordinary programmer and
> checked by no one is merely part of the daily grind for a coder.
 
Perhaps we can at least agree that services are the most important
form of software today and that C++ is the most important language
for services.
 
> and much more likely to succeed. Or are you merely offering Biblical
> quotations and the promise of Brownie points in the next life? That's a
> harder sell for most potential code reviewers.
 
A lot of code review is done for free:
https://www.reddit.com/r/codereview
 
"Furthermore, the Israelites acted on Moses' word and asked the Egyptians
for articles of silver and gold, and for clothing. And the L-RD gave the
people such favor in the sight of the Egyptians that they granted their
request. In this way they plundered the Egyptians." Exodus 12:35-36
 
The Israelites didn't pay for the items of gold and silver. G-d was
saving them from their oppressors. Unfortunately, some of the
regulars here are oppressors.
 
 
Brian
Ebenezer Enterprises
https://github.com/Ebenezer-group/onwards
Mr Flibble <flibble@i42.REMOVETHISBIT.co.uk>: Feb 14 10:58PM

On 14/02/2021 22:56, Brian Wood wrote:
 
> Brian
> Ebenezer Enterprises
> https://github.com/Ebenezer-group/onwards
 
You might as well be a bot with that reply, fucktard.
 
/Flibble
 
--
😎
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 13 10:39PM -0800

On 2/13/2021 4:54 AM, Jorgen Grahn wrote:
> I.e. std::cout should be in line buffered mode.
 
> At least on Unix (where the whole output/error stream thing comes
> from).
 
I don't know if that's guaranteed.
Manfred <noname@add.invalid>: Feb 14 06:01PM +0100

On 2/14/2021 7:39 AM, Chris M. Thomasson wrote:
 
>> At least on Unix (where the whole output/error stream thing comes
>> from).
 
> I don't know if that's guaranteed.
 
As Öö Tiib pointed out, the difference between std::endl and '\n' in C++
is exactly that the former executes basic_ostream::flush(), the latter
doesn't.
 
From the example in https://en.cppreference.com/w/cpp/io/manip/endl :
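(The example itself was cut from the digest; as a stand-in, a minimal
sketch of the difference, not the cppreference code:)
 
____________________________
#include <iostream>
 
int main()
{
    std::cout << "with endl" << std::endl; // writes '\n', then flushes
    std::cout << "with newline\n";         // writes '\n' only, no flush
    std::cout.flush();                     // the flush that endl would do
}
____________________________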
 
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 14 08:12PM

On Sun, 2021-02-14, Chris M. Thomasson wrote:
> On 2/13/2021 4:54 AM, Jorgen Grahn wrote:
>> On Tue, 2021-02-09, Chris M. Thomasson wrote:
...
 
>> At least on Unix (where the whole output/error stream thing comes
>> from).
 
> I don't know if that's guaranteed.
 
I think it is, but it would be nice to have it confirmed. I think I
can quote W R Stevens, but he only writes about Unix.
 
If the people seeing the problems ran the code in an IDE, for example,
that would explain it. Or piped the output through less(1).
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
scott@slp53.sl.home (Scott Lurndal): Feb 14 10:22PM


>> I don't know if that's guaranteed.
 
>I think it is, but it would be nice to have it confirmed. I think I
>can quote W R Stevens, but he only writes about Unix
 
POSIX requires stdout to be line buffered if and only if the underlying
file descriptor refers to a terminal, serial port, console or
pseudoterminal device (isatty() == true).
 
Otherwise it will be fully buffered. stderr, on the other hand, is
never fully buffered; typically it is unbuffered.
 
The application controls the buffering using setvbuf(3) or setbuf(3),
and it is often useful for an application to explicitly set the
buffering mode to line-buffered so that when output is redirected to a
file, it is available line by line to other tools like tail(1).
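 
For example, a minimal sketch:
 
____________________________
#include <cstdio>
 
int main()
{
    // Request line buffering explicitly, so output reaches a
    // redirected file or pipe at every newline (handy for tail -f).
    std::setvbuf(stdout, nullptr, _IOLBF, BUFSIZ);
 
    std::printf("this line is flushed at the newline\n");
}
____________________________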
Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 14 02:59PM +0100

Am 05.02.21 um 21:07 schrieb Chris M. Thomasson:
>> fast as possible.
 
> I will read it carefully. Noticed something like this but I am not sure
> yet. Actually, porting your code over to C++17 would help me out here.
 
I did a rough port to C++17 using atomic<uintptr_t>. Unfortunately times
have changed. The code is no longer reliable in general. :-(
 
It works under OS/2 (the original, x86). It works in a Linux VM (x64).
It does /not/ work on the host of the VM (same hardware, AM4). The
stolen bits counter soon overflows. It works on my local PC (AM3+). It
does not work on a Ryzen 16 core. It does not work on a Xeon, nor on an
ARMv7 quad core.
 
The maximum stolen bits count scales nonlinearly with the number of CPU
cores. It is less on Xeon and ARM than on AMD.
Interestingly, the maximum count is at least 30% less if the code is
executed in a VM on the same hardware. The scheduler seems to have some
influence. In fact the code runs twice as fast inside the VM!
 
I tested with 300 threads hammering on the same atomic instance in an
infinite loop. The duration of the test has almost no effect. The number
of threads also has no significant effect as long as there are enough to
reach the maximum.
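 
To make the mechanism concrete, here is a minimal sketch of the
stolen-bits idea (names invented here; this is not the actual port):
 
____________________________
#include <atomic>
#include <cstdint>
 
struct alignas(8) data { int payload; }; // 8-byte aligned: low 3 bits free
 
constexpr std::uintptr_t count_mask = 0x7; // stolen bits count 0..7
 
std::atomic<std::uintptr_t> ref{0}; // pointer and counter in one word
 
data* enter()
{
    // One RMW registers the reader and fetches the current pointer.
    std::uintptr_t v = ref.fetch_add(1, std::memory_order_acquire);
    return reinterpret_cast<data*>(v & ~count_mask);
}
 
void leave()
{
    ref.fetch_sub(1, std::memory_order_release);
}
____________________________
 
If more threads sit between enter() and leave() than the stolen bits
can count (here 7), the counter overflows into the pointer bits, which
is the failure mode described above.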
 
>> counter never reached the value 3. 2 was sufficient in real life.
 
> This sounds a bit odd to me, but then again, I need to understand your
> code better.
 
It /is/ odd. My tests were done quite a long time ago. And my VM,
usually used for development, was one of the ones that worked (max count
4 of 7 allowed, on x64).
There seems to be no alternative to DWCAS for the atomic version. :-/
 
An intrusive reference counted smart pointer is still useful. But it is
no longer wait free if the platform does not support DWCAS.
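 
Whether a platform offers a lock-free DWCAS can at least be queried; a
minimal sketch (on x86-64, gcc/clang typically need -mcx16 to use
cmpxchg16b):
 
____________________________
#include <atomic>
#include <cstdint>
#include <cstdio>
 
struct counted_ptr
{
    void* ptr;
    std::uintptr_t count; // two machine words: needs DWCAS
};
 
int main()
{
    std::atomic<counted_ptr> a(counted_ptr{});
    std::printf("lock-free: %d\n", a.is_lock_free());
}
____________________________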
 
 
 
> Where R is for the reference count, and C is for the collector index.
 
> Millions of threads can increment the outer counter at the same time. No
> problem.
 
The collector index?
 
 
Marcel
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 14 12:46PM -0800

On 2/14/2021 5:59 AM, Marcel Mueller wrote:
> There seems no alternative to DWCAS for the atomic version. :-/
 
> An intrusive reference counted smart pointer is still useful. But it is
> no longer wait free if the platform does not support DWCAS.
 
Yeah. Looking at your code, I was just worrying about a shi% load of
threads all taking a reference to the strong pointer at the same time.
That would overflow it rather quickly. Now, I have some old proxy
collector code that steals enough bits to hold an 8-bit counter, so
that's 256 threads. However, if more than 256 threads take the strong
count at the same time, then it will overflow. I need to find it on
archive.org. Luckily, archive.org just might have it.
 
https://web.archive.org/web/2017*/http://webpages.charter.net/appcore
 
 
 
>> Millions of threads can increment the outer counter at the same time.
>> No problem.
 
> The collector index?
 
The collector index is embedded within the counter so I can increment
the reference count and grab the collector index in a single atomic RMW,
fetch_add in this case. Then I decode it from the return value and use
it as an index into the collector objects; there are two collectors in
this case. Take a careful look at the following code in my proxy
collector: https://pastebin.com/raw/CYZ78gVj
____________________________
collector& acquire()
{
    // increment the master count _and_ obtain the current collector.
    std::uint32_t current =
        m_current.fetch_add(ct_ref_inc, std::memory_order_acquire);
 
    // decode the collector index.
    return m_collectors[current & ct_proxy_mask];
}
____________________________
 
It returns a reference to the indexed collector.
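 
For illustration, the encoding might look like this (values assumed
here, not necessarily the ones in the linked code):
 
____________________________
// hypothetical encoding for two collectors:
static constexpr std::uint32_t ct_proxy_mask = 0x1; // index in bit 0
static constexpr std::uint32_t ct_ref_inc = 0x2;    // count in bits 1..31
____________________________
 
With that layout, fetch_add(ct_ref_inc) bumps the reference count while
leaving the collector index bit untouched.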
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 14 02:05PM -0800

On 2/7/2021 11:35 PM, Öö Tiib wrote:
>> Do weak_ptrs adjust the reference count at all? Please try to excuse my
>> ignorance here. ;^o
 
> The weak references are simply counted too (as weak references).
 
This quote is interesting to me:
 
https://en.cppreference.com/w/cpp/memory/weak_ptr
 
___________________
std::weak_ptr models temporary ownership: when an object needs to be
accessed only if it exists, and it may be deleted at any time by someone
else, std::weak_ptr is used to track the object, and it is converted to
std::shared_ptr to assume temporary ownership. If the original
std::shared_ptr is destroyed at this time, the object's lifetime is
extended until the temporary std::shared_ptr is destroyed as well.
___________________
 
Does shared_ptr have a "separate" reference count from weak_ptr's?
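 
In the usual implementation the control block holds two counters, a
strong count and a weak count; weak_ptrs touch only the latter. A
minimal sketch of the observable behaviour:
 
____________________________
#include <iostream>
#include <memory>
 
int main()
{
    auto sp = std::make_shared<int>(42);
    std::weak_ptr<int> wp = sp; // bumps the weak count only
 
    std::cout << sp.use_count() << '\n'; // 1: weak refs not included
 
    if (auto locked = wp.lock()) // promote to shared_ptr if still alive
        std::cout << *locked << '\n'; // 42
 
    sp.reset(); // the int dies; the control block lives on while wp exists
    std::cout << wp.expired() << '\n'; // 1 (true)
}
____________________________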
David Brown <david.brown@hesbynett.no>: Feb 14 11:46AM +0100

On 13/02/2021 20:42, Marcel Mueller wrote:
> holds the current value, one the next value. When you synchronize only
> writers they can safely swap the storage pointer. No need to synchronize
> readers. Often that's enough.
 
No, that won't work. It was the first thing that came to my mind too,
but it is not sufficient.
 
Let's model the object we want to access atomically as:
 
typedef struct T { uint32_t lo; uint32_t hi; } T;
 
You store two copies:
 
volatile T d[2];
 
and a pointer:
 
volatile T* volatile p = &d[0];
 
Updating will be something like:
 
void update(T x) {
    get_writer_lock();
    volatile T* q = &d[1 - (p - d)];
    *q = x;
    p = q;
    release_lock();
}
 
You are using extra synchronisation for writing, which is not ideal, but
it is not uncommon to have only a single writer.
 
You are then suggesting that this is safe for reading:
 
T read(void) {
    return *p;
}
 
Let's break this down. The implementation will be something like:
 
T read(void) {
    T x;
    volatile T* q = p;
    // point 1
    x.lo = q->lo;
    // point 2
    x.hi = q->hi;
    return x;
}
 
If the reader thread is pre-empted (or interrupted) at point 1 by the
writer, that's okay - the reader doesn't see the new data, but it gets
the consistent old one, as the writer has modified the new copy. The
same happens if it is pre-empted at point 2. Since the pointer is read
atomically, the data is consistent.
 
Except... what if the writer does two updates? Or two writer threads
run while the reader thread is paused? Then a writer is stomping all
over the data that the reader thread has partially read.
 
 
So it is not nearly as simple as you imply. It is a step towards a
solution, but requires work. A "store/load exclusive" loop can make
reading safe, but you still need a synchronisation mechanism for the
writers that requires locking (and thus fails to be a generic lock-free
mechanism).
 
 
In the common situation of a single writer and a single reader, you can
use this kind of arrangement. But you use three copies, not two, and
you have tracking of which buffer is used by the reader and writer.
Even then it's a bit fiddly, and the most efficient solutions need
knowledge of how the threads can interact and interrupt each other.
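 
For illustration, a minimal sketch of such a single-writer,
single-reader triple buffer (all names invented here, not a library
API):
 
____________________________
#include <atomic>
 
template <typename T>
class triple_buffer
{
    T slot[3];
    std::atomic<unsigned> middle{0}; // bits 0..1: index; bit 2: "fresh"
    unsigned back = 1;  // slot owned by the writer
    unsigned front = 2; // slot owned by the reader
    static constexpr unsigned fresh = 4;
 
public:
    void write(const T& x) // writer thread only
    {
        slot[back] = x;
        // Publish: swap our slot in, take the old middle slot as new back.
        back = middle.exchange(back | fresh, std::memory_order_acq_rel) & 3u;
    }
 
    const T& read() // reader thread only
    {
        if (middle.load(std::memory_order_relaxed) & fresh)
        {
            // Fresh data published: trade our slot for the middle one.
            front = middle.exchange(front, std::memory_order_acq_rel) & 3u;
        }
        return slot[front]; // always a complete, consistent value
    }
};
____________________________
 
The reader owns the slot it returns outright, so a burst of writes can
never stomp on a partially read value, which is exactly the failure in
the two-copy scheme above.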
 
 
> increment the master pointer atomically before using the storage it
> points to. Now writers know that they should not discard or modify this
> storage.
 
Your mention of "high update rate" is perhaps why I am not happy with
your solutions. You are talking about things that will likely work well
in most cases - I am trying to look at things that are guaranteed to
work in /every/ case.
 
When you have specific cases, you can pick solutions that are efficient
and work given the assumptions that are valid in that case. Maybe you
have only one writer, maybe you know about the synchronisation and which
thread can pre-empt the other - and so on.
 
But a library that comes with a toolchain that implements atomic read
and write of any sized data needs to work in /all/ cases. "Usually
sufficient" is not good enough, nor is "assuming a low update rate" or
any other assumption. It needs to work /every/ time, with no additional
assumptions (except perhaps assumptions that can be checked at compile
or link time, such as specific OS support).
 
>>> with different priorities it is not suitable for this case.
 
>> All RTOS systems are sensitive to priority inversion,
 
> Sure, but lock free algorithms are not. ;-)
 
Agreed.
 
But atomics are not exclusively about being lock-free. Lock-free
atomics let you build lock-free algorithms that can scale well across
multiple cores. However, atomics are fundamentally about making
particular operations appear unbreakable - and that applies even if the
atomic operation in question is not lock-free. Beyond a certain size or
complexity, operations invariably require some kind of lock (or special
hardware support) to be atomic - and then you have locks, and you have
sensitivity to priority inversion.
 
 
> Is it possible to raise the priority of all mutex users for the time of
> the critical section? This will still abuse a time slice if the spin
> lock does not explicitly call sleep. But at least it will not deadlock.
 
Doing something like that would negate all the benefits of trying to use
atomics rather than simply using mutexes in the first place.
 
It is far better (again, in the single core case) to simply disable
interrupts for the short code section needed to do the atomic access.
This has the effect of raising the priority of the current thread to the
maximum - but it does so for the shortest possible time.
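 
On a Cortex-M class part, for instance, that might look like this
sketch (assuming the device's CMSIS header for the intrinsics):
 
____________________________
#include <stdint.h>
// plus the device's CMSIS header for __get_PRIMASK() and friends
 
uint64_t atomic_read_u64(const volatile uint64_t *p)
{
    uint32_t primask = __get_PRIMASK(); // remember interrupt state
    __disable_irq();                    // nothing can pre-empt us now
    uint64_t v = *p;                    // plain read, now effectively atomic
    __set_PRIMASK(primask);             // restore, don't blindly re-enable
    return v;
}
____________________________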
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 14 12:55PM -0800

On 2/14/2021 2:46 AM, David Brown wrote:
> but it is not sufficient.
 
> Let's model the object we want to access atomically as:
 
> typedef struct T { uint32_t lo; uint32_t hi; } T;
 
For some reason this reminds me of Joe Seigh's 63-bit atomic counter:
 
https://groups.google.com/g/comp.lang.asm.x86/c/FScbTaQEYLc/m/X0gAskwQW44J
 
;^)
 
 
[...]
Brian Wood <woodbrian77@gmail.com>: Feb 13 06:42PM -0800

On Friday, February 12, 2021 at 2:25:27 AM UTC-6, David Brown wrote:
> > size_t stream_counter;
> > };
> Both are horrible and unreadable.
 
What I want to be sure of is that the second form is a benign
refactoring.
 
>This is C++ - try making a template
> or inherited structures, perhaps with a /single/ conditional compilation
> part at the end to give an alias to the struct you want.
 
This is a C library that I'm using.
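 
For reference, Brown's suggestion amounts to something like this sketch
(hypothetical names; for a C library the equivalent would be a typedef
selected by a single #ifdef):
 
____________________________
#include <cstddef>
 
struct stream_core
{
    std::size_t stream_counter;
    // ...members common to every build...
};
 
struct stream_extended : stream_core
{
    int extra_state; // hypothetical member present only in some builds
};
 
#ifdef EXTENDED_STREAMS
using stream_t = stream_extended;
#else
using stream_t = stream_core;
#endif
____________________________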
 
 
Brian
Ebenezer Enterprises - Enjoying programming again.
https://webEbenezer.net