Friday, October 1, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 05:14PM -0700

On 10/1/2021 3:13 PM, Branimir Maksimovic wrote:
 
> Practically the same thing, except the linux futex, which is the same thing.
> Interestingly, Darwin does not have it and I am really interested in
> how Apple implements pthread_mutex?
 
Ohhhh... Good question. I am not sure about Darwin. Actually, there is a
way to implement the mutex I showed you using a binary semaphore for the
slow path. IIRC, it went like this, with the futex part commented out,
and the binary sema (an auto-reset event on windoze) initialized to
zero, i.e. non-signaled:
____________________________
struct ct_futex_mutex
{
    alignas(CT_L2_ALIGNMENT) ULONG m_state;
    HANDLE m_event; // the binary sema: auto-reset event, initially non-signaled

    ct_futex_mutex() : m_state(0)
    {
        m_event = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    }

    ~ct_futex_mutex()
    {
        CloseHandle(m_event);
    }

    void lock()
    {
        // fast path: uncontended, 0 -> 1
        if (InterlockedExchange(&m_state, 1))
        {
            // slow path: mark contended (2) and wait
            while (InterlockedExchange(&m_state, 2))
            {
                //ULONG cmp = 2;
                //WaitOnAddress(&m_state, &cmp, sizeof(ULONG), INFINITE);

                WaitForSingleObject(m_event, INFINITE);
            }
        }
    }

    void unlock()
    {
        // wake a waiter only if the lock was contended
        if (InterlockedExchange(&m_state, 0) == 2)
        {
            //WakeByAddressSingle(&m_state);

            SetEvent(m_event);
        }
    }
};
____________________________
 
That works as well. Humm...
 
https://github.com/apple/darwin-libpthread/blob/main/man/pthread_mutex_lock.3
 
https://github.com/apple/darwin-libpthread/blob/main/src/pthread_mutex.c
 
Need to examine this! There is another way to create a nice FIFO mutex
using a fast semaphore. IIRC, it was called a benaphore:
 
https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 05:16PM -0700

On 10/1/2021 5:14 PM, Chris M. Thomasson wrote:
> On 10/1/2021 3:13 PM, Branimir Maksimovic wrote:
>> On 2021-10-01, Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
[...]
 
 
> There is another way to create a nice FIFO mutex
> using a fast semaphore. Iirc, it was called a benaphore:
 
> https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26
 
Basically, it's adding the ability to avoid the kernel by using a fast
path in the semaphore logic. Also, it's wait-free on the fast path
because it can be implemented using XADD.
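
For reference, a minimal sketch of the idea (my own illustration, not
the Haiku code; it assumes C++20 std::binary_semaphore as the slow-path
semaphore):
____________________________
#include <atomic>
#include <semaphore>

// Benaphore sketch: the atomic count keeps the uncontended case entirely
// in user space; the kernel semaphore is only touched under contention.
struct ct_benaphore
{
    std::atomic<long> m_count{0};
    std::binary_semaphore m_sema{0};

    void lock()
    {
        // wait-free fast path: one XADD; only contended threads block
        if (m_count.fetch_add(1, std::memory_order_acquire) > 0)
        {
            m_sema.acquire(); // slow path: kernel wait
        }
    }

    void unlock()
    {
        // wake exactly one waiter, but only if somebody queued up behind us
        if (m_count.fetch_sub(1, std::memory_order_release) > 1)
        {
            m_sema.release();
        }
    }
};
____________________________

The fetch_add on the fast path is the XADD; whether the mutex ends up
FIFO depends on the wakeup order of the underlying semaphore.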
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 06:00PM -0700

On 9/30/2021 12:10 AM, David Brown wrote:
> synchronisation right - and a lot of /really/ tough stuff in getting it
> right and optimally efficient on big processors. But "acquire" and
> "release" semantics are helpfully named!
 
They are named nicely. Back on the SPARC acquire is:
 
MEMBAR #LoadStore | #LoadLoad
 
Release is:
 
MEMBAR #LoadStore | #StoreStore
 
Ahhh, then we have hardcore: #StoreLoad. SMR required this on the SPARC.
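
For reference, a sketch of where SMR needs that #StoreLoad
(hazard-pointer style; the names here are made up, only the fence
placement matters):
____________________________
#include <atomic>

struct node; // contents don't matter for this sketch

std::atomic<node*> g_head;   // head of some lock-free structure
std::atomic<node*> g_hazard; // hypothetical single hazard-pointer slot

node* acquire_hazard()
{
    node* n;
    do
    {
        n = g_head.load(std::memory_order_relaxed);
        g_hazard.store(n, std::memory_order_relaxed); // publish the hazard
        // the hazard store must be ordered before the re-load below:
        // MEMBAR #StoreLoad on SPARC, MFENCE or a LOCK'ed op on x86
        std::atomic_thread_fence(std::memory_order_seq_cst);
    } while (n != g_head.load(std::memory_order_relaxed)); // re-validate
    return n;
}
____________________________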
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 06:01PM -0700

On 10/1/2021 6:00 PM, Chris M. Thomasson wrote:
 
> Release is:
 
> MEMBAR #LoadStore | #StoreStore
 
> Ahhh, then we have hardcore: #StoreLoad. SMR required this on the SPARC.
 
Heck, it even required it on the x86! LOCK'ed atomic or MFENCE.
Branimir Maksimovic <branimir.maksimovic@icloud.com>: Oct 02 01:44AM

> slow path. Iirc, it went like this, with the futex part commented out,
> and the initialization of the binary sema to zero, auto-reset event on
> windoze, also out:
The problem is that Apple acts like student newbs. They don't care about
API stability or code in general. It is like pouring into an empty void;
you have to waste a lot of time when programming for macOS...
 
> };
> ____________________________
 
> That works as well. Humm...
yeah...
 
> Need to example this! There is another way to create a nice FIFO mutex
> using a fast semaphore. Iirc, it was called a benaphore:
 
> https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26
I'll take a look...
 
--
 
7-77-777
Evil Sinner!
Branimir Maksimovic <branimir.maksimovic@icloud.com>: Oct 02 02:14AM


> Need to example this! There is another way to create a nice FIFO mutex using
> a fast semaphore. Iirc, it was called a benaphore:
 
> https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26
Here is the Apple version:
#include <locale>
#include <iostream>
#include <thread>
#include <semaphore.h>
#include <functional>

#define CT_L2_ALIGNMENT 128
#define CT_THREADS 32
#define CT_ITERS 666666
using ULONG = unsigned long;

struct ct_futex_mutex
{
    sem_t sema;
    alignas(CT_L2_ALIGNMENT) ULONG m_state;

    ct_futex_mutex() : m_state(0)
    {
        // note: unnamed POSIX semaphores (sem_init) are deprecated on macOS;
        // sem_open or dispatch_semaphore_t is the usual alternative there
        sem_init(&sema, 0, 0);
    }

    void lock()
    {
        // fast path: uncontended, 0 -> 1 (__sync_swap is Clang's atomic exchange builtin)
        if (__sync_swap(&m_state, 1))
        {
            // slow path: mark contended (2) and wait on the semaphore
            while (__sync_swap(&m_state, 2))
            {
                sem_wait(&sema);
            }
        }
    }

    void unlock()
    {
        // wake a waiter only if the lock was contended
        if (__sync_swap(&m_state, 0) == 2)
        {
            sem_post(&sema);
        }
    }
};


struct ct_shared
{
    ct_futex_mutex m_mtx;
    unsigned long m_count;

    ct_shared() : m_count(0) {}

    ~ct_shared()
    {
        if (m_count != 0)
        {
            std::cout << "counter is totally fubar!\n";
        }
    }
};


void ct_thread(ct_shared& shared)
{
    for (unsigned long i = 0; i < CT_ITERS; ++i)
    {
        shared.m_mtx.lock();
        ++shared.m_count;
        shared.m_mtx.unlock();

        shared.m_mtx.lock();
        --shared.m_count;
        shared.m_mtx.unlock();
    }
}


int main()
{
    std::locale mylocale("");
    std::cout.imbue(mylocale); // use locale number formatting style
    std::thread *threads = new std::thread[CT_THREADS];

    std::cout << "Starting up...\n";

    {
        ct_shared shared;

        for (unsigned long i = 0; i < CT_THREADS; ++i)
        {
            threads[i] = std::thread(ct_thread, std::ref(shared));
        }

        std::cout << "Running...\n";

        for (unsigned long i = 0; i < CT_THREADS; ++i)
        {
            threads[i].join();
        }
    }

    delete [] threads;

    std::cout << "Completed!\n";

    return 0;
}
 
/**
;^)
 
>>> stacks.
 
>> Sure. Whatever you say Bonita. Cough.... Cough... ;^)
> Heh, rsyncing something, looking bloat and have no patience :p
 
;^)
*/
 
 
 
--
 
7-77-777
Evil Sinner!
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 07:33PM -0700

On 10/1/2021 7:14 PM, Branimir Maksimovic wrote:
>> a fast semaphore. Iirc, it was called a benaphore:
 
>> https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26
> Here is Apple version:
[...]
 
Excellent! That is basically identical to the bin-sema version! For what
it's worth, look up a man by the name of Alexander Terekhov, and look at
the code in the pthreads-win32 sources. I used to converse with this
genius way back on comp.programming.threads. He was a pleasure to talk
to. Actually, I am SenderX way back here:
 
https://groups.google.com/g/comp.programming.threads/c/KepRbFWBJA4/m/pg83oJTzPUIJ
 
;^)
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:39AM +0200

Am 01.10.2021 um 22:04 schrieb Chris M. Thomasson:
 
>> If it hasn't anything other to do than "waiting" for a new entry it
>> spins.
 
> Huh? Ever heard of a futex, or an eventcount?
 
Lock-free means no kernel structures, and polling only.
No futex.
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:40AM +0200

Am 01.10.2021 um 22:20 schrieb Chris M. Thomasson:
>> spins. Lock-free and wait-free datastructures are idiocracy except
>> from lock-free stacks.
 
> So a lock-free stack is okay with you, but not a lock-free queue? Why?
 
Because the use-cases of lock-free stacks are such that the
stacks are never polled.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 10:12PM -0700

On 10/1/2021 9:39 PM, Bonita Montero wrote:
 
>> Huh? Ever heard of a futex, or an eventcount?
 
> Lock-free is without any kernel-structures and polling only.
> No futex.
 
Lock-free on the fast-path... Ever heard of such a thing? Wow.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 10:12PM -0700

On 10/1/2021 9:40 PM, Bonita Montero wrote:
 
>> So a lock-free stack is okay with you, but not a lock-free queue? Why?
 
> Because the use-cases of lock-free stacks are so that the
> lock-free stacks are never polled.
 
Huh? What are you talking about?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 10:14PM -0700

On 10/1/2021 9:40 PM, Bonita Montero wrote:
 
>> So a lock-free stack is okay with you, but not a lock-free queue? Why?
 
> Because the use-cases of lock-free stacks are so that the
> lock-free stacks are never polled.
 
Never polled? A slow-path on a lock-free stack can be waited on.
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 07:35AM +0200

Am 02.10.2021 um 07:12 schrieb Chris M. Thomasson:
 
>> Lock-free is without any kernel-structures and polling only.
>> No futex.
 
> Lock-free on the fast-path... Ever heard of such a thing? Wow.
 
There is no slow path with lock-free structures.
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 07:36AM +0200

Am 02.10.2021 um 07:14 schrieb Chris M. Thomasson:
 
>> Because the use-cases of lock-free stacks are so that the
>> lock-free stacks are never polled.
 
> Never polled? A slow-path on a lock-free stack can be waited on.
 
Then it isn't lock-free.
I think I'm talking to a complete moron here.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 10:49PM -0700

On 10/1/2021 10:35 PM, Bonita Montero wrote:
>>> No futex.
 
>> Lock-free on the fast-path... Ever heard of such a thing? Wow.
 
> There is no slow path with lock-free structures.
 
Ummmm... You are just trolling me, right? A slow path is what one does
when one needs to wait on, say, an empty condition. Humm... Why do you
troll?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 01 10:55PM -0700

On 10/1/2021 10:36 PM, Bonita Montero wrote:
 
>> Never polled? A slow-path on a lock-free stack can be waited on.
 
> Then it isn't lock-free.
> I think I'm talking to a complete moron here.
 
 
Fast path: lock/wait-free. Slow path: might have to hit the kernel to
wait on certain conditions. You seem to have never differentiated
between the two possible paths.
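
To make the two paths concrete, here is a rough sketch (my own
illustration, nothing official; C++20 std::atomic wait/notify stands in
for the futex, and ABA/node reclamation, the SMR part, is ignored) of a
Treiber-style stack whose pop is lock-free on the fast path and only
hits the kernel when the stack is observed empty:
____________________________
#include <atomic>

template<class T>
struct ct_node
{
    ct_node* m_next;
    T m_value;
};

template<class T>
struct ct_waitable_stack
{
    std::atomic<ct_node<T>*> m_head{nullptr};

    void push(ct_node<T>* n)
    {
        // fast path: plain lock-free CAS loop, never blocks
        n->m_next = m_head.load(std::memory_order_relaxed);
        while (!m_head.compare_exchange_weak(n->m_next, n,
                   std::memory_order_release, std::memory_order_relaxed))
        {
        }
        m_head.notify_one(); // wake a waiter, if any
    }

    ct_node<T>* pop()
    {
        for (;;)
        {
            ct_node<T>* h = m_head.load(std::memory_order_acquire);

            if (!h)
            {
                // slow path: park until the head changes from nullptr
                m_head.wait(nullptr, std::memory_order_acquire);
                continue;
            }

            // fast path: lock-free pop (ABA/reclamation ignored here)
            if (m_head.compare_exchange_weak(h, h->m_next,
                    std::memory_order_acquire, std::memory_order_relaxed))
            {
                return h;
            }
        }
    }
};
____________________________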
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 08:00AM +0200

Am 02.10.2021 um 07:49 schrieb Chris M. Thomasson:
 
> Ummmm... You are just trolling me right? A slow path would be what to do
> when one needs to wait, on say, an empty condition? Humm... Why do you
> troll?
 
Lock-free is when there's no kernel-locking involved.
And a slow-path involves kernel-locking.
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 08:00AM +0200

Am 02.10.2021 um 07:55 schrieb Chris M. Thomasson:
 
> Fast-path lock/wait-free, slow-path might have to hit the kernel to wait
> on certain conditions. You seem to have never differentiated between the
> two possible paths?
 
Lock-free is when there's no kernel-locking involved.
And a slow-path involves kernel-locking.
James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 01 08:19PM -0400

On 10/1/21 2:53 PM, Branimir Maksimovic wrote:
...
> long double max = numeric_limits<long double>::max();
> long mantissa = max; // implicit conversion
"A prvalue of a floating-point type can be converted to a prvalue of an
integer type. The conversion truncates; that is, the fractional part is
discarded. The behavior is undefined if the truncated value cannot be
represented in the destination type." (7.3.10p1).
 
While it's not required to be the case, on most implementations
std::numeric_limits<long double>::max() is WAY too large to be
represented by a long, so the behavior of such code is undefined. The
minimum value of LDBL_MAX (set by the C standard, inherited by the C++
standard) is 1e37, which would require long to have at least 123 bits in
order for that conversion to have defined behavior.
And even when it has defined behavior, I can't imagine how you would
reach the conclusion that this conversion should be the value of the
mantissa.
wij <wyniijj@gmail.com>: Oct 01 05:23PM -0700

> > The frexp functions break a floating-point number into a normalized fraction and an integer exponent. They store the integer in the int object pointed to by p .
 
> So iexp should be set. When I ran the code, it got set to a value of
> 16384. That's not the problem.
 
// ----- file t.cpp -----
#include <math.h>
#include <limits>
#include <iostream>
 
using namespace std;
#define ENDL endl
 
template<typename T>
int64_t get_mant(T x) {
    int iexp;
    x = frexp(x, &iexp);
    x = ldexp(x, numeric_limits<T>::digits);
    // note: if T's significand has 64 bits (x86 long double),
    // this cast overflows int64_t and the behavior is undefined
    return static_cast<int64_t>(x);
}

int main()
{
    cout << dec << get_mant(numeric_limits<float>::max()) << ", "
         << hex << get_mant(numeric_limits<float>::max()) << ENDL;
    cout << dec << get_mant(numeric_limits<double>::max()) << ", "
         << hex << get_mant(numeric_limits<double>::max()) << ENDL;
    cout << dec << get_mant(numeric_limits<long double>::max()) << ", "
         << hex << get_mant(numeric_limits<long double>::max()) << ENDL;
    return 0;
}
// end file t.cpp -----
 
$ g++ t.cpp
$ ./a.out
16777215, ffffff
9007199254740991, 1fffffffffffff
-9223372036854775808, 8000000000000000
Branimir Maksimovic <branimir.maksimovic@icloud.com>: Oct 02 02:16AM

> And even when it has defined behavior, I can't imagine how you would
> reach the conclusion that this conversion should be the value of the
> mantissa.
It is not undefined as long as long is larger than the mantissa part.
OK, correct is long long :P
 
--
 
7-77-777
Evil Sinner!
James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 02 01:28AM -0400

On 10/1/21 10:16 PM, Branimir Maksimovic wrote:
>> reach the conclusion that this conversion should be the value of the
>> mantissa.
> It is not undefined as long as long is larger than the mantissa part.
 
No, the relevant issue isn't the size of the mantissa. As stated above
in my quote from the standard, it's whether "the truncated value cannot
be represented in the destination type". std::numeric_limits<long
double>::max() doesn't have a fractional part, so the truncated value is
the same as the actual value. On my system, for instance, that value is
1.18973e+4932.
 
> ok correct is long long :P
 
On my system, changing it to long long doesn't make any difference - the
maximum value representable by long long is still 9223372036854775807,
the same as the maximum value for long; it's still far too small to
represent 1.18973e+4932, so the behavior of the conversion is undefined.
The actual behavior on my system appears to be saturating at LLONG_MAX
== 9223372036854775807. If I change the second line to
 
long long mantissa = 0.75*max;
 
0.75*max is 8.92299e+4931, which should certainly not have the same
mantissa as max itself, but the value loaded into "mantissa" is still
9223372036854775807.
James Kuyper <jameskuyper@alumni.caltech.edu>: Oct 02 01:56AM -0400

I accidentally sent this message first to Branimir by e-mail, and he
responded in kind.
 
On 10/2/21 1:31 AM, Branimir Maksimovic wrote:
 
>> On 02.10.2021., at 07:27, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
 
>> On 10/1/21 10:16 PM, Branimir Maksimovic wrote:
...
> Problem is that neither long double nor long is defined how small
> or large can be…
> so it can fit or not…
The C++ standard cross-references the C standard for such purposes, and
the C standard imposes strict limits on how small those things can be:
LLONG_MAX is required to be at least 9223372036854775807, and LDBL_MAX
is supposed to be at least 1e37.
You are right, however, about there being no limits on how large they
can be. It is therefore permissible for an implementation to have
LLONG_MAX >= LDBL_MAX, but do you know of any such implementation?
In any event, the relevant issue is not the limits imposed on those
values by the standard, but the actual values of LLONG_MAX and LDBL_MAX
for the particular implementation you're using, and it's perfectly
feasible to determine those values from <climits>, <cfloat>, or
std::numeric_limits<>::max. What are those values on the implementation
you're using?
> but question is how to extract mantissa which was answer :P
 
No, that is not the answer. If max did have a value small enough to make
the conversion to long long have defined behavior, the result of that
conversion would be the truncated value itself (7.3.10p1), NOT the
mantissa of the truncated value. What makes you think otherwise?
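
For a value small enough that the conversion is defined, the difference
is easy to see; here is a tiny example in the spirit of wij's t.cpp
(nothing here comes from the thread except the frexp/ldexp approach):

#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>

int main()
{
    double x = 6.5;

    // the truncating conversion, per 7.3.10p1: just drops the fraction
    long long truncated = static_cast<long long>(x); // 6

    // the actual significand, scaled out to an integer with frexp/ldexp
    int exp;
    double frac = std::frexp(x, &exp); // 0.8125, exp == 3
    long long mant = static_cast<long long>(
        std::ldexp(frac, std::numeric_limits<double>::digits)); // 53-bit value

    std::cout << std::dec << "truncated = " << truncated << "\n"   // 6
              << std::hex << "mantissa  = " << mant                // 1a000000000000
              << std::dec << " (exp " << exp << ")\n";             // 3
    return 0;
}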
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 07:09AM +0200

> Modern CPUs for the last decade have included automatic prefetchers
> in the cache subsystems. Usually a mix of stride-based and/or predictive
> fetchers.
 
If they were better, my program would give its best result with
zero prefetching. And there would be no prefetching instructions
at all.
 
> It's very seldom necessary for an application to provide an
> explicit prefetching hint except in very unusual circumstances.
 
Automatic prefetchers are dumb.
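
For context, an explicit hint looks something like the following
generic sketch (made-up names, not Bonita's benchmark); __builtin_prefetch
is the GCC/Clang builtin behind such hints:

#include <cstddef>

// Sum an array while hinting the line 16 elements ahead; whether this
// beats the hardware prefetcher depends on the access pattern and machine.
double sum_with_prefetch(const double* a, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (i + 16 < n)
        {
            __builtin_prefetch(&a[i + 16], 0 /*read*/, 3 /*high locality*/);
        }
        s += a[i];
    }
    return s;
}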
Lynn McGuire <lynnmcguire5@gmail.com>: Oct 01 08:04PM -0500

"Improving Stability with Modern C++, Part 1" by Ralph Kootker

https://medium.com/factset/improving-stability-with-modern-c-part-1-getting-started-f7025e97e1c3
 
"C++ is a workhorse programming language across the financial industry.
At FactSet, it has powered many of our products for almost 30 years.
Because our code base predates C++ standardization, we have a lot of
legacy code to support. However, we've recently completed a major
compiler upgrade that enables our workstation developers to start using
modern C++ for the first time. While C++ had a reputation for a steep
learning curve, we've found the new features safer, easy to adopt, and
more stable for our clients.
While the C++11 standard recently had its 10th birthday, there are still
developers out there using modern C++ for the first time. With that in
mind, we've prepared a series of short introductory posts on the
features of C++11 and beyond. We're sharing our journey with the wider
C++ community in the hopes that others will find it as useful as we have."
 
Lynn
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
