- Lock-free LRU-cache-algorithm - 6 Updates
- Turn a functor into a function pointer - 9 Updates
- hash<size_t> - 1 Update
- Cacheline Ping-Pong - 1 Update
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 06:33PM +0200

> look like? LRU-caches are only possible with doubly-linked lists.
> So I thought this wouldn't be possible lock-free. But maybe I'm
> wrong here and someone can give me an idea.

I've found a solution that is mega-cool. I'm not gonna reveal it because I'm thinking about getting a US patent on it, but it is an exotic combination of locked and lock-free programming. It scales almost linearly with the number of threads. |
scott@slp53.sl.home (Scott Lurndal): Sep 24 04:53PM

>> LRU-caches are only possible with doubly-linked lists.

Hm. That's news, indeed. Have you produced a formal proof? You may want to consider LRU cache replacement algorithms used in processor caches. |
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 07:00PM +0200

> Hm. That's news, indeed. Have you produced a formal proof?

There's nothing to prove. The idea isn't very complex, but it is very original.

> You may want to consider LRU cache replacement algorithms
> used in processor caches.

That's a totally different issue, as those caches are basically pipelined, but within the pipeline each step is serialized. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 24 01:50PM -0700

On 9/24/2019 9:33 AM, Bonita Montero wrote:
> I've found a solution that is mega-cool. I'm not gonna reveal it
> because I'm thinking about getting a US patent on it, but it is an
> exotic combination of locked and lock-free programming.

Do a deep patent feasibility study! Build a list of as many references as you can, and read them all. It can save you some of your hard-earned $$$. Patent lawyers cost a lot of money! Or, do you work for Microsoft? For some reason, I thought about that.

Let me guess... Clever hashed locking for the slow paths, and lock-free for the fast paths? That is a basic idiom. RCU uses it a lot: locks for the writers and sync-free for the readers. The readers do not use any atomics or memory barriers at all. Well, Alpha aside for a moment...

Here is a very simple example of a mixture of lock-based and wait-free. 100% wait-free on the push side, and "sometimes" lock-based on the reader side experimental LIFO:

https://groups.google.com/d/msg/comp.arch/8Y0C8zGjtqI/bwg-hBLRAQAJ

> It scales
> almost linearly with the number of threads.

Cool! |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 24 01:56PM -0700

On 9/24/2019 1:50 PM, Chris M. Thomasson wrote:
> On 9/24/2019 9:33 AM, Bonita Montero wrote: [...]
> 100% wait-free on the push side, and "sometimes" lock-based on the
> reader side experimental LIFO:
> https://groups.google.com/d/msg/comp.arch/8Y0C8zGjtqI/bwg-hBLRAQAJ

The experimental core of the code above is:
________________________________________
// Needs <atomic> and <thread>. CT_WAIT is a sentinel pointer value,
// CT_MB_RLX/CT_MB_REL/CT_MB_ACQ alias std::memory_order_relaxed/
// release/acquire, and g_allocs is a global allocation counter;
// their definitions are in the linked post.

// A work node
struct ct_work
{
    std::atomic<ct_work*> m_next;
    std::thread::id m_data;

    ct_work(std::thread::id data) : m_next(nullptr), m_data(data) {}

    void process()
    {
        // [...]
        // User Processing For This Work
    }

    ct_work* get_next() const
    {
        ct_work* w = nullptr;
        while ((w = m_next.load(CT_MB_RLX)) == CT_WAIT)
        {
            // we can spin, or even do other work right here...
            std::this_thread::yield();
        }
        return w;
    }
};

// Easy Stack, only uses XCHG
struct ct_estack
{
    std::atomic<ct_work*> m_head;

    ct_estack() : m_head(nullptr) {}

    void push(ct_work* n)
    {
        n->m_next.store(CT_WAIT, CT_MB_RLX);
        ct_work* head = m_head.exchange(n, CT_MB_REL); // release
        n->m_next.store(head, CT_MB_RLX);
    }

    ct_work* flush_try()
    {
        return m_head.exchange(nullptr, CT_MB_ACQ); // acquire
    }
};

// Consume an Easy Stack...
void ct_consume(ct_estack& estack)
{
    ct_work* w = estack.flush_try();

    while (w)
    {
        // Process FIRST!
        w->process();

        // Now, we can gain the next pointer.
        ct_work* next = w->get_next();

        // Okay, we can delete the work
        delete w;
        g_allocs.fetch_sub(1, CT_MB_RLX); // dec

        w = next;
    }
}
________________________________________

The takeaway is that the LIFO class ct_estack only uses XCHG, and no loops. Now, this can be distributed. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 24 01:58PM -0700

On 9/24/2019 10:00 AM, Bonita Montero wrote:
>> Hm. That's news, indeed. Have you produced a formal proof?
> There's nothing to prove. The idea isn't very complex,
> but it is very original.

It seems like you realized how bad CMPXCHG can be when used in a loop. Always try XADD and/or XCHG in a loopless algorithm before going to a CAS loop. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 03:31AM -0700

Have they considered adding a one-liner to Boost that can turn a functor into a function pointer, for example so that we can use "std::atexit" with "boost::bind", like so:

extern void closeAnimation(int prefix);

int main()
{
    atexit( boost::bind(closeAnimation, 0) );
}

Such a one-liner should really be in Boost. |
Ian Collins <ian-news@hotmail.com>: Sep 24 10:39PM +1200

On 24/09/2019 22:31, Frederick Gotham wrote:
> atexit( boost::bind(closeAnimation, 0) );
> }
> Such a one-liner should really be in boost.

What's wrong with

atexit( [](){ closeAnimation(0); });

?

--
Ian |
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 04:51AM -0700

On Tuesday, September 24, 2019 at 11:39:57 AM UTC+1, Ian Collins wrote:
> What's wrong with
> atexit( [](){ closeAnimation(0); });
> ?

That will work fine when there's nothing captured by the lambda. There could be a situation where a handle is obtained inside a function and stored in a local variable, something like:

void Init_Flight_Simulator()
{
    FSim *handle = ObtainSimulator();

    if ( nullptr != handle )
        atexit( [handle](){ ReleaseSimulator(handle); } );

    /* Continue doing stuff with handle */
}

This won't compile for me. |
David Brown <david.brown@hesbynett.no>: Sep 24 02:31PM +0200

On 24/09/2019 13:51, Frederick Gotham wrote:
> /* Continue doing stuff with handle */
> }
> This won't compile for me.

It's good that it doesn't compile, because it would go horribly wrong - your handle is at block scope, but needs to be available at program exit time. You have to figure out the best way to make it live long enough. For example, you could write:

void Init_Flight_Simulator()
{
    static FSim *handle = ObtainSimulator();

    if ( nullptr != handle )
        atexit( [](){ ReleaseSimulator(handle); } );

    /* Continue doing stuff with handle */
}

(The lambda needs no capture here, because a static local has static storage duration and can be used inside a lambda without being captured.) But you'd be better off doing this with RAII and smart pointers - releasing resources with "atexit" is not a typical C++ solution. |
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 03:47PM +0300

On 24.09.2019 14:51, Frederick Gotham wrote:
> /* Continue doing stuff with handle */
> }
> This won't compile for me.

And neither would a boost::bind solution. A single function pointer fixed at compile time just cannot contain or otherwise hold another value ('handle' here) whose value will become available at run time only. Technically it could be done via trampolines by run-time code generation, but I believe it's a good thing this is not part of C++. |
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 04:02PM +0300

On 24.09.2019 15:47, Paavo Helde wrote:
> value ('handle' here) whose value will become available at run time only.
> Technically it could be done via trampolines by run-time code
> generation, but I believe it's a good thing this is not part of C++.

PS. In Linux there is also on_exit() which takes an extra callback pointer and would work fine with this example, but is arguably even more unfriendly to lambdas. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 08:05AM -0700

I've been playing around with this code that still has a few issues:

#include <cstdlib>     // for atexit
#include <functional>

/* Pretend that these next 2 functions are defined in another translation unit */
int GetHandle(void) { return 5; }
void ReleaseHandle(int) { /* Whatever */ }

template <class RetType, class T>
RetType (*(Make_Funcptr(T &&functor)))(void)
{
    class InnerClass {
    public:
        static T *Set_Or_Get_Functor(T *const arg)
        {
            static T *pfunctor = nullptr;

            if ( nullptr != arg )
                pfunctor = arg;

            return pfunctor;
        }

        static RetType ActualFunction(void)
        {
            return (*(Set_Or_Get_Functor(nullptr)))();
        }
    };

    InnerClass::Set_Or_Get_Functor(&functor);
    return InnerClass::ActualFunction;
}

void SomeFunc(void)
{
    int h = GetHandle();
    atexit( Make_Funcptr<void>( std::bind(ReleaseHandle, h) ) );
}

int main(void)
{
    SomeFunc();
}

A few problems:
(1) Taking the address of the temporary object returned from "bind"
(2) The static pointer will have one instance for each combination of T and RetType (when really we need a static pointer for every invocation of Make_Funcptr) |
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 08:55PM +0300

On 24.09.2019 18:05, Frederick Gotham wrote:
> A few problems:
> (1) Taking the address of the temporary object returned from "bind"
> (2) The static pointer will have one instance for each combination of T and RetType (when really we need a static pointer for every invocation of Make_Funcptr)

To accomplish such things with atexit() you will need a global static anyway, so better to be explicit about it:

#include <cstdlib>
#include <deque>
#include <functional>

int GetHandle(void) { return 5; }
void ReleaseHandle(int) { /* Whatever */ }

static std::deque<std::function<void()>> globalRegistry;

void SomeFunc()
{
    int h = GetHandle();
    globalRegistry.push_back([h]() { ReleaseHandle(h); });
}

int main()
{
    atexit([]() {
        for (auto& functor : globalRegistry) {
            functor();
        }
    });
    SomeFunc();
} |
legalize+jeeves@mail.xmission.com (Richard): Sep 24 07:38PM

[Please do not mail me a copy of your followup]

Frederick Gotham <cauldwell.thomas@gmail.com> spake the secret code:
> atexit( [handle](){ ReleaseSimulator(handle); } );
> /* Continue doing stuff with handle */
> }

IMO, this particular example would better be served by an RAII wrapper class rather than through the use of atexit.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com> |
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 06:27AM +0200

So I added another hashcode-generation. I don't show the changes but the full source now:

#include <iostream>
#include <cstddef>
#include <random>
#include <limits>
#include <vector>
#include <cmath>
#include <functional>
#include <algorithm>
#include <intrin.h>

using namespace std;

void showHashStatistics( vector<unsigned> &v );

int main()
{
    size_t const MAX64_PRIME = 0xFFFFFFFFFFFFFFC5u;
    size_t const N_BUCKETS = 0x100000;
    random_device rd;
    uniform_int_distribution<size_t> uidHash( 0, numeric_limits<size_t>::max() );
    vector<unsigned> v;
    v.resize( N_BUCKETS, 0 );

    cout << "simple prime hash:" << endl;
    for( size_t i = 0; i != 10'000'000; ++i )
        ++v[(size_t)(uidHash( rd ) * MAX64_PRIME) % N_BUCKETS];
    showHashStatistics( v );

    size_t low, high;
    fill( v.begin(), v.end(), 0 );
    cout << "extended prime hash:" << endl;
    for( size_t i = 0; i != 10'000'000; ++i )
        low = _mul128( uidHash( rd ), MAX64_PRIME, &(__int64 &)high ),
        ++v[(low ^ high) % N_BUCKETS];
    showHashStatistics( v );

    cout << endl;
    cout << "hash<size_t>:" << endl;
    fill( v.begin(), v.end(), 0 );
    hash<size_t> hasher;
    for( size_t i = 0; i != 10'000'000; ++i )
        ++v[hasher( i ) % N_BUCKETS];
    showHashStatistics( v );

    double mean = (double)numeric_limits<size_t>::max() / 2.0;
    normal_distribution<double> nd( mean, sqrt( mean ) );
    cout << endl;
    cout << "normal distribution:" << endl;
    fill( v.begin(), v.end(), 0 );
    for( size_t i = 0; i != 10'000'000; ++i )
        ++v[((size_t)nd( rd ) * MAX64_PRIME) % N_BUCKETS];
    showHashStatistics( v );
}

void showHashStatistics( vector<unsigned> &v )
{
    unsigned min = numeric_limits<unsigned>::max();
    unsigned max = 0;
    for( unsigned x : v )
        min = x < min ? x : min,
        max = x > max ? x : max;
    cout << "min: " << min << " max: " << max << endl;

    double avg = 0;
    for( unsigned x : v )
        avg += x;
    avg /= v.size();
    cout << "average: " << avg << endl;

    double rmsd = 0.0;
    double sq;
    for( unsigned x : v )
        sq = (x - avg),
        rmsd += sq * sq;
    rmsd = sqrt( rmsd / v.size() );
    cout << "standard deviation: " << rmsd << endl << endl;

    cout << "distribution:" << endl;
    vector<unsigned> vDist;
    vDist.resize( max - min + 1, 0 );
    for( unsigned x : v )
        ++vDist[x - min];
    for( unsigned i = 0; i != vDist.size(); ++i )   // was i < max, which overruns when min > 0
        cout << vDist[i] << endl;
}

The relevant new part here is:

    size_t low, high;
    fill( v.begin(), v.end(), 0 );
    cout << "extended prime hash:" << endl;
    for( size_t i = 0; i != 10'000'000; ++i )
        low = _mul128( uidHash( rd ), MAX64_PRIME, &(__int64 &)high ),
        ++v[(low ^ high) % N_BUCKETS];
    showHashStatistics( v );

There I'm multiplying a random value with the largest prime fitting in 64 bits, producing a 128-bit value through an intrinsic, and xoring the upper 64 bits into the lower 64 bits. One might think you'd get pure randomness here, but the distribution of bucket depths is almost the same as when simply multiplying the input value with the mentioned prime. So we have bucket depths from 0 to 29 instead of 0 to 28, the average is exactly the same and the standard deviation is almost the same; so WTF is going on here? Where's the relationship between randomness based on prime numbers and normal distribution? |
Bonita Montero <Bonita.Montero@gmail.com>: Sep 24 04:39AM +0200

> added to v with the CAS? Also, fwiw, atomic::compare_exchange_weak
> automatically returns the previous value in v on failure, no need to
> reload it on every iteration of the loop.

I'm just testing the latency of moving cachelines between cores in two variants. The source was a bit misordered because I pasted it twice, so here it is another time.

#if defined(_MSC_VER)
    #include <Windows.h>
#elif defined(__unix__)
    #include <sys/sysinfo.h>
    #include <sched.h>
    #include <pthread.h>