soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

std::atomic<std::shared_ptr<T>>... - 9 Updates
Onwards and upwards - 7 Updates

std::atomic<std::shared_ptr<T>>...

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 04 08:22PM -0800

On 2/3/2021 12:00 AM, Marcel Mueller wrote:

> I am not 100% sure about the UB. std::atomic<uintptr_t> might solve the
> UB problem with stolen bits. I did not test this, because with a defined
> set of target platforms there was no UB.

std::atomic<uintptr_t> just might do it, and also allow me to give it a
go in Relacy. I should have some free time in a day or two. I need to
_really_ dig into your:

/// Strongly thread safe read
T* acquire() volatile const;

function instead of skimming. That's where a lot of the "magic" happens.
If I am reading this correctly, there is a place where you need strong
CAS. Wrt C++ and its weak and strong CAS variants:

InterlockedCxc in line 281:

if (InterlockedCxc((volatile unsigned*)&Data, old_outer, new_outer) !=
old_outer && new_outer)
// Someone else does the job already => undo.
InterlockedAdd(&((T*)new_outer)->access_counter(), outer_count);
// The global count cannot return to zero here, because we have an
active reference.

Afaict, this will not work with weak CAS. Need to dig into it because
its interesting. Fwiw, in my experimental proxy collector, I was very
happy that I eliminated the need for any CAS.

https://pastebin.com/raw/p1E9WN5i

>>> boost::intrusive_ptr prevented strong thread safety. A reference
>>> counter need to be exposed as atomic<some_integer> somehow for strong
>>> thread safety.

Basically, the inner count needs to be somehow transferred to the outer
count. Using a loopless wait-free method tends to perform better than a CAS.

> programming significantly. Less race conditions, no deadlocks, no
> priority inversion. Basically it is required for anything pointer like
> object to fit into a std:atomic<T>.

Well, its basically required for a thread/process to be able to take a
reference to something that it did not previously have a reference to.

> the more effectively deduplication compresses it.
> Without strong safety any read access needs to be protected by a lock.
> This is almost impossible without introducing a bottleneck.

Have you ever messed around with RCU? Pretty damn nice. A proxy
collector can provide a sort of "poor mans" rcu. A proxy collector can
be created from any strong-thread safe method. Ideally lockfree, even
better when its loopless, and waitfree.

> with an in memory DB anyway.)
> The efficiency was so high that we decided to keep the full change
> history of ten years. It just takes only little additional resources.

Excellent!

> :-)

> More likely they simply discard google groups in the future. There is no
> profit in this service.

Iirc, that the reason why they totally murdered Google+. Damn.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 04 09:09PM -0800

On 2/3/2021 12:00 AM, Marcel Mueller wrote:
>>> performs even better on 64 bit because you can safely use 3 bits. I
>>> am unsure if it can be implemented without introducing UB - although
>>> it is likely to work on any real hardware.
[...]

Humm... I did a another proxy collector, old and nowhere on the net now
since Comcast killed their webhosting, that used stolen bits for the
collectors. Was worried about overflowing them. Wrt, Line 273 in your code:

const unsigned old_outer = InterlockedXad((volatile unsigned*)&Data, 1) + 1;
const unsigned outer_count = old_outer & INT_PTR_COUNTER_MASK;
ASSERT((old_outer & INT_PTR_COUNTER_MASK) != 0); // overflow condition

The assert is here for that right? What am I missing?

The fun part about a proxy collector is that it is not a general purpose
smart pointer. Instead its more of a poor mans rcu where we can steal a
lot of bits by aligning a collector on a large boundary.

If the number of threads that take concurrent access to a strong
reference exceeds the stolen bits at any one time, then overflow can and
will occur. This condition will intrude on the actual pointer part,
corrupting things. Well, that's the way it was with my old code using
stolen bits as a reference count.

Please correct me if I am missing something from your acquire function.
Thanks.

Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 05 07:43AM +0100

Am 05.02.21 um 05:22 schrieb Chris M. Thomasson:

> function instead of skimming. That's where a lot of the "magic" happens.
> If I am reading this correctly, there is a place where you need strong
> CAS. Wrt C++ and its weak and strong CAS variants:

I never heard about a weak CAS primitive before. But yes the code relies
on strong CAS. This is essential to be loop free.

And you should know that the code was exclusively intended for the x86
platform. x86 has very little (or no?) ability of memory reordering. So
there are no further memory barriers required when doing simple atomic
reads and writes. std::atomic should be more sophisticated.

> Afaict, this will not work with weak CAS. Need to dig into it because
> its interesting. Fwiw, in my experimental proxy collector, I was very
> happy that I eliminated the need for any CAS.

CAS is not bad unless it is a CAS /loop/.
Of course, if read-modify-write memory cycles of the hardware can be
used this is even better.

> Basically, the inner count needs to be somehow transferred to the outer
> count. Using a loopless wait-free method tends to perform better than a
> CAS.

It is a long time ago since I wrote this code, but AFAIR it is loopless.
The number of atomic operations per access was limited to exactly 3.
Other approaches may need only 2, but this will not work with only 2
stolen bits.

[strong thread safety]
> Well, its basically required for a thread/process to be able to take a
> reference to something that it did not previously have a reference to.

And this is likely to happen almost every time when you deal with an
atomic instance of some referencing object.

> Have you ever messed around with RCU? Pretty damn nice.

I shortly heard the first time of RCU and only read a few articles.
AFAICS it just delays deletions a bit to keep memory access valid. This
sound a bit like Thread.Sleep or setTimeout solutions. From my
experience this kind of solutions mainly move the bug from the developer
to the customer.

Grace periods are always too short to prevent corruption under some
special cases with extreme load peaks, e.g. with swapping and I/O delays
involved or maybe some DDOS attack (or just children trying to connect
to their school :-)
Well, at some point we need to note that computers also just happen to
work from the quantum mechanics point of view. So probability /is/ a
valid approach. But probabilities of concurrency issues mainly scale
with the number of coincidences required to trigger them not with the
time taken for the step. A single thread may always block for an unknown
reason at any point. But it is rather unlikely that this happens to many
of them at the same instruction at the same time.

If you reduce RCU by just versioning then I have probably used it for
decades. Atomic pointer updates and immutable data structures were the
core of the in memory database application I mentioned. But I did not
use any grace period.
The PM123 application where the code link came from also uses this
pattern for all internal strings. They are immutable and deduplicated.

> collector can provide a sort of "poor mans" rcu. A proxy collector can
> be created from any strong-thread safe method. Ideally lockfree, even
> better when its loopless, and waitfree.

I think I did not really catch the intended use case of your proxy
collector for now.

Marcel

Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 05 08:25AM +0100

Am 05.02.21 um 06:09 schrieb Chris M. Thomasson:
> const unsigned outer_count = old_outer & INT_PTR_COUNTER_MASK;
> ASSERT((old_outer & INT_PTR_COUNTER_MASK) != 0); // overflow condition

> The assert is here for that right? What am I missing?

:-)

Since OS/2 is a 32 bit platform and I did not want to introduce further
alignment I have only two bits for the outer count. So only 3 threads
can safely acquire the pointer at the same time. The 4th one will cause
the counter to return to 0 - an overflow.

But it is not that bad. All I have to do is to clear the counter as fast
as possible. So the transfer of the counter value is done immediately
after the pointer is acquired. The main reference counter is incremented
by the snapshot value of the small counter first. And if resetting the
counter to zero fails another thread already did the job an the
increment of the main ref count is undone.

The assertion never triggered. I tested really hard with ~100 threads
each hammering the same instance in an infinite loop. In fact the
counter never reached the value 3. 2 was sufficient in real life.

This is the price for the stolen bits hack. The platform did not support
64 bit CAS because some (supported) CPUs did not implement the hardware
instruction cmpxchg8b.
The performance impact of the immediate transfer is small. Most likely
any additional atomic access to the same counters are L1 cache hits.

By the way, I could bet 3 stolen bits on 64 bit architectures will also
be sufficient for modern hardware with many cores.
Well, I never tested on SPARC T3 with hundreds of hardware threads or
something like that. But the cache line hopping effectively prevents too
many threads to be likely suspended after the first increments. Clearing
the counter is amortized faster that incrementing it. That's enough to
get no cumulative effect. The number of threads has no significant
impact on the maximum counter value.

> will occur. This condition will intrude on the actual pointer part,
> corrupting things. Well, that's the way it was with my old code using
> stolen bits as a reference count.

The latter point I avoided by clearing the counter fast enough.

> Please correct me if I am missing something from your acquire function.
> Thanks.

Absolutely not. It just did not happen ;-)

Marcel

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 05 11:52AM -0800

On 2/4/2021 10:43 PM, Marcel Mueller wrote:
>> need strong CAS. Wrt C++ and its weak and strong CAS variants:

> I never heard about a weak CAS primitive before. But yes the code relies
> on strong CAS. This is essential to be loop free.

Indeed. The "problem" we both have with loopfree is an architecture that
uses LL/SC. x86 has pessimistic atomic RMW's where a CAS failure,
actually means that it failed because the destination was not equal to
the comparand. So one can use a CAS for loopfree algorithms. However...
On a system with optimistic RMW's means that a compare_exchange_weak can
spuriously fail. This also means that compare_exchange_strong is going
to use a loop, internally. It's simply not our fault. :^)

> platform. x86 has very little (or no?) ability of memory reordering. So
> there are no further memory barriers required when doing simple atomic
> reads and writes. std::atomic should be more sophisticated.

Completely understood. Well, actually... x86 does require an explicit
membar in the form of MFENCE or a LOCK'ed RMW to implement certain
algorithms. One example is the hazard pointer algorihtm. It depends on a
store followed by a load to another location. This can be reordered on
an x86. Are you familiar with hazard pointers?

> CAS is not bad unless it is a CAS /loop/.
> Of course, if read-modify-write memory cycles of the hardware can be
> used this is even better.

Yup. I have wrote about this in the past about how loopless algorithms
can tend to lose their "luster" when implemented on an arch that uses
optimistic RMW's, like when CAS is implemented in terms of LL/SC.

> The number of atomic operations per access was limited to exactly 3.
> Other approaches may need only 2, but this will not work with only 2
> stolen bits.

Your algorihtm is loopless on x86, and SPARC, and on some others.
However, on a system with LL/SC, not so much. This is just the way it
is. Shit happens.

>> reference to something that it did not previously have a reference to.

> And this is likely to happen almost every time when you deal with an
> atomic instance of some referencing object.

It basically depends on the program, and how it accesses things. Some
programs simply do not require strong thread safety.

> sound a bit like Thread.Sleep or setTimeout solutions. From my
> experience this kind of solutions mainly move the bug from the developer
> to the customer.

What bug? RCU uses nothing like a sleep or timeout.

> special cases with extreme load peaks, e.g. with swapping and I/O delays
> involved or maybe some DDOS attack (or just children trying to connect
> to their school :-)

RCU's not really "ideal" for data-structures with high amounts of
updates. However, its not going to break because of them.

> time taken for the step. A single thread may always block for an unknown
> reason at any point. But it is rather unlikely that this happens to many
> of them at the same instruction at the same time.

You just might like this:

https://youtu.be/DZJPqTTt7MA

>> lockfree, even better when its loopless, and waitfree.

> I think I did not really catch the intended use case of your proxy
> collector for now.

My proxy collector use case is basically the same use case as RCU.
Readers can read, while writers are mutating the data structure, even
deleting things. Actually, it might be better than RCU in high writer
use cases. Humm...

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 05 12:07PM -0800

On 2/4/2021 11:25 PM, Marcel Mueller wrote:
> alignment I have only two bits for the outer count. So only 3 threads
> can safely acquire the pointer at the same time. The 4th one will cause
> the counter to return to 0 - an overflow.

Indeed.

> But it is not that bad. All I have to do is to clear the counter as fast
> as possible.

I will read it carefully. Noticed something like this but I am not sure
yet. Actually, porting your code over to C++17 would help me out here.

> by the snapshot value of the small counter first. And if resetting the
> counter to zero fails another thread already did the job an the
> increment of the main ref count is undone.

I need to dig into this a lot more.

> The assertion never triggered. I tested really hard with ~100 threads
> each hammering the same instance in an infinite loop. In fact the
> counter never reached the value 3. 2 was sufficient in real life.

This sounds a bit odd to me, but then again, I need to understand your
code better. And, I need to understand your use cases. In my proxy
collector here:

https://pastebin.com/raw/p1E9WN5i

It does not use stolen bits, but another technique. Think of wrt 32-bit:

0xRRRRRRRC

Where R is for the reference count, and C is for the collector index.

Millions of threads can increment the outer counter at the same time. No
problem.

> This is the price for the stolen bits hack.

Indeed.

> something like that. But the cache line hopping effectively prevents too
> many threads to be likely suspended after the first increments. Clearing
> the counter is amortized faster that incrementing it.

Is that _totally_ safe? I need to examine it further.

>> part, corrupting things. Well, that's the way it was with my old code
>> using stolen bits as a reference count.

> The latter point I avoided by clearing the counter fast enough.

fast enough makes we wonder.

>> Please correct me if I am missing something from your acquire
>> function. Thanks.

> Absolutely not. It just did not happen ;-)

Cool!

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 05 12:09PM -0800

On 2/2/2021 11:13 AM, Öö Tiib wrote:

> The concept is quite handy as same with intrusive pointers
> takes to manually implement some kind of "shut down" state
> and then to check and handle it explicitly.

To be quite honest, I still have never actually used weak pointers wrt
shared_ptr in real code. Need to study up. They seem a bit odd to me, oh
well.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 05 01:00PM -0800

On 2/5/2021 11:52 AM, Chris M. Thomasson wrote:

> Yup. I have wrote about this in the past about how loopless algorithms
> can tend to lose their "luster" when implemented on an arch that uses
> optimistic RMW's, like when CAS is implemented in terms of LL/SC.
[...]

For some reason, I remember writing about how a spin on an adaptive
mutex, or spinlock can be used to do other useful work. Instead of
spinning on a contended lock, use this moment in time to try to do some
real work. Iirc, the overall high-level pseudo-code was something like this:
_______________________________
void ct_special_try_lock_high_level(...)
{
while (! ct_try_lock(...))
{
if (! ct_try_to_do_some_other_work(...))
{
ct_wait_lock(...);
break;
}
}

return;
}
_______________________________

Without a "limit" on the number of times try_lock can be invoked, it
kind of turned into a little "message pump" type of scenario. Keep in
mind that each "spin" means other meaningful work was actually accomplished!

;^)

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 05 02:33PM -0800

On 2/5/2021 11:52 AM, Chris M. Thomasson wrote:
>> happens to many of them at the same instruction at the same time.

> You just might like this:

> https://youtu.be/DZJPqTTt7MA
[...]

Basically, calling a blocking operation while holding a mutex is bad!
Blocking while inside a RCU region is bad!

Onwards and upwards

Brian Wood <woodbrian77@gmail.com>: Feb 04 04:01PM -0800

On Thursday, February 4, 2021 at 4:54:11 PM UTC-6, Mr Flibble wrote:
> > resources into it. It's important to me. Anyway, I don't expect you
> > to suddenly appreciate it. Maybe in a few years.
> You mean you have invested thousands of hours and a large percentage of your resources into spamming this newsgroup about it?

I've invested much more in the software and company
than in posting here. But I'm used to your attempted
smears. They were more effective years ago in my
opinion.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 04 07:55PM -0800

On 2/4/2021 8:41 AM, Rick C. Hodgin wrote:
>>> vast. I map out very large and complex projects that I can envision
>>> and see all of the parts for, but I am able to do them in terms of
>>> labor hours due to life things.
[...]

Put your CAlive on a bootable CD/DVD, provide an iso... Therefore,
people can install your OS, tools, and everything. all in one go! Nice
and easy.

calive.iso

Where do I download it? Iirc, you said you would have one?

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Feb 05 12:15AM -0500

On Thu, 4 Feb 2021 20:53:52 +0000

> The reason I write "And Satan invented fossils? Yes?" is because it
> is a sarcastic, satirical and rhetorical question that dismisses your
> childlike creationist beliefs with little effort on my part.

That is exactly what that enemy spirit wants you to believe, Leigh.

But here's what's really happening: Jesus is seated upon His throne.
The world really is moving according to the Biblical plan. Christians
have come to the cross and met Jesus and asked forgiveness, and they
really are passing from death to life. And the warnings they give
people like you really are warnings from God through them.

That is truth. That is reality.

What the lying anti-Christ spirit wants you to believe is that you have
a handle on things when your not only really do not, but are so far
away from the truth that, unless you come to faith in this world, one
day when you stand before God in judgment you will be in awe at how
lost in sin you were, and how twisted your thinking, feelings,
emotions, beliefs, were by the influence of that lying spirit.

-----
You're facing a calamity, Leigh. The way out is the very thing that
enemy spirit entices you to not only dismiss, but also attack outright.

I care about you, Leigh. I care about all people. It's why I teach
you the truth.

Now consider your sign-off to me: "Now .. off." Here's my sign-off to
you: "You are loved, Leigh. Loved by God as part of His creation.
You have sin, and Jesus came to this world to forgive YOU personally,
to take on YOUR sin and set you free from the judgment you're facing
today. God loves you enough to look past your hatred of Him for all
these years. He loves you enough to still call out to you, still
desires you to be a part of His eternal Kingdom of glory and power."

You are beautiful and precious, Leigh. And there's a calling to a
better way for you, your life here, and in eternity. Peace.

--
Rick C. Hodgin

David Brown <david.brown@hesbynett.no>: Feb 05 11:28AM +0100

On 05/02/2021 04:55, Chris M. Thomasson wrote:
> and easy.

> calive.iso

> Where do I download it? Iirc, you said you would have one?

If you want to encourage Rick with this, please do it by email or in his
dedicated CAlive google group. (And I mean that literally - please do
support him if you think there is a future for his projects, but please
do it in appropriate ways.)

We've had more than enough about it here and in c.l.c., where it is
equally off-topic. And we have had more than enough of the pointless
repetitive shooting matches between Rick and Mr. Flibble. Neither of
them seems to recognize a lost cause when it is so clear to everyone else.

David Brown <david.brown@hesbynett.no>: Feb 05 11:43AM +0100

On 04/02/2021 23:09, Brian Wood wrote:
>>> in using?

> Survival of the fittest -- to demonstrate their talents and strengths to
> the world?

So you think people will do your work for the bragging rights of finding
mistakes or poor choices in your code? I think you overestimate the
reputation you have as a programmer. Finding issues in well-developed,
well-reviewed code that has been inspected by large numbers of
programmers - /that/ gives bragging rights. Finding possible
improvements in ordinary code written by one ordinary programmer and
checked by no one is merely part of the daily grind for a coder.

> Or to the victor go the spoils:

Are you offering money for the work? That would be entirely reasonable,
and much more likely to succeed. Or are you merely offering Biblical
quotations and the promise of Brownie points in the next life? That's a
harder sell for most potential code reviewers.

> Maybe because I'm persistent?

You think that if you bug people enough, someone is going to review your
code to make you shut up? That's /really/ stretching it.

> Or some other reason that I haven't considered.

Personally, I think the main reason is that you have not stopped and
taken the time to consider how you come across to others.

Don't get me wrong here - I appreciate that you've put a lot of effort
into your code and you want it to succeed. I'm trying to give
suggestions to how you could improve your chances. And the most obvious
suggestion is that when you are doing the same things again and again
for years, with /zero/ results, it is time to change your tactics. Get
off that "holier-than-thou" high horse of yours and try to understand
what could possibly motivate someone to use your project, or to join you
in its development. People need /concrete/ reasons, not delusions of
grandeur, and not mindless optimism about how you think everyone will
use it in the future.

Mr Flibble <flibble@i42.REMOVETHISBIT.co.uk>: Feb 05 03:24PM

On 05/02/2021 05:15, Rick C. Hodgin wrote:
> really are passing from death to life. And the warnings they give
> people like you really are warnings from God through them.

> That is truth. That is reality.

And Satan invented fossils, yes? Spammer?

/Flibble

--
😎

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 05 12:18PM -0800

On 2/5/2021 2:28 AM, David Brown wrote:
> equally off-topic. And we have had more than enough of the pointless
> repetitive shooting matches between Rick and Mr. Flibble. Neither of
> them seems to recognize a lost cause when it is so clear to everyone else.

Apparently he banned me. However, he did promise me around a year ago to
have an iso. I am willing to install it on an older x86.

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Friday, February 5, 2021

Digest for comp.lang.c++@googlegroups.com - 16 updates in 2 topics

No comments:

Blog Archive

About Me