Saturday, June 17, 2023

Digest for comp.lang.c++@googlegroups.com - 15 updates in 5 topics

David Brown <david.brown@hesbynett.no>: Jun 17 07:57PM +0200

On 17/06/2023 00:17, Alf P. Steinbach wrote:
>> would be
>> simpler and equally safe.
 
> On that we agree. :)
 
If we knew that the original pointer pointed to a uint64_t, then we
could /all/ agree that a plain assignment would be simpler, clearer, and
as efficient as possible.
 
But as far as I know, no such guarantee exists. My guess, from how
functions like this are sometimes used, is that the original data is in
an array of unsigned char - perhaps a buffer for a received network
packet or a file that has been read.
 
And if the data does not start as a uint64_t (or compatible type), then
reading it through a uint64_t glvalue is /not/ safe, even if alignment
is guaranteed.
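A minimal sketch of the memcpy approach under discussion (the function name
and the assumption of an 8-byte unsigned char buffer are illustrative, not
taken from the original code):

#include <cstdint>
#include <cstring>

// Copy eight bytes out of a byte buffer into a uint64_t. This is
// well-defined no matter what the bytes originally were, and compilers
// typically lower the memcpy to a single load when alignment allows.
std::uint64_t load_u64( unsigned char const *data )
{
    std::uint64_t value;
    std::memcpy( &value, data, sizeof value );
    return value;
}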
 
>> claiming that uint64_t is a character type?
 
> It may be that my English isn't good enough to understand a one-way
> nature of the GCC docs' wording.
 
The "cppreference" site is often clearer than the standards language:
 
<https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing>
 
The key point is that a reinterpret_cast (or equivalent via a C-style
cast) does not let you access incompatible types. It does not, in gcc
parlance, side-step the strict aliasing rules.
 
Like a C cast, reinterpret_cast is a way to tell the compiler that you
think it is safe to change the type of an object (usually a pointer).
But it does not /make/ it safe. And if the compiler can see that the
actual object type is not compatible with the way you are accessing it,
then you have a conflict - you are lying to the compiler, and no good
will come of it. A reinterpret_cast will not change that situation.
(And separate compilation will only hide the problem.)
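For instance (a deliberately broken sketch, with hypothetical names):

float f = 1.0f;
std::uint32_t *p = reinterpret_cast<std::uint32_t *>( &f ); // the cast itself compiles
std::uint32_t bits = *p; // undefined behaviour: a float object is read
                         // through a uint32_t glvalue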
 
 
> value, it can optimize, e.g. not reload that value from memory when it's
> already in a register. If T = float and U* = int*, then it can make this
> assumption. Similarly if T = int and U* = float*, it can assume this.
 
Yes.
 
> But if T = char-type, such as the first char in an array, and U* is a
> double*, then it can not reasonably make this assumption, and as I read
> the docs quote g++ doesn't make this assumption for T = char-type.
 
No. You can use a char-type pointer to access a non-char object, but
you cannot use a non-char pointer to access a char (array) object.
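In code, the asymmetry looks roughly like this (an illustrative sketch, not
from the thread):

int n = 42;
unsigned char *bytes = reinterpret_cast<unsigned char *>( &n );
unsigned char low = bytes[0]; // OK: character-type access to an int object

alignas( int ) unsigned char buf[sizeof( int )] = {};
int *pi = reinterpret_cast<int *>( buf );
int bad = *pi; // undefined behaviour: there is no int object in buf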
 
I think gcc's documentation could certainly be clearer here, and I also
think that the documentation for the "-fstrict-aliasing" flag could be
moved from the optimisation options to the "code generation options"
page. IMHO, using "-fno-strict-aliasing" is a significant change to the
semantics of the language, making previously undefined behaviour into
defined behaviour (much like "-fwrapv" does for signed integer overflow).
 
> And similarly, if T = double and U* is a char-type*, then it can not
> reasonably make this assumption, and as I read the docs quote g++
> doesn't make this assumption for U* = char-type*.
 
Correct.
 
However, gcc/g++ is not saying anything more or less than the standards
say here. The behaviour - both the defined behaviour, and the undefined
behaviour - comes straight from the standard. (The only exception is
for type-punning unions. C90 did not explicitly say they were allowed,
but the gcc documentation says they are allowed even in C90 mode. C99
onwards allows them, while C++ never has.)
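For reference, the union pun in question looks something like this (a
sketch; defined behaviour in C, undefined in standard C++, whatever
individual compilers may tolerate):

union Pun {
    float    f;
    unsigned u;
};

union Pun p;
p.f = 1.0f;
unsigned bits = p.u; // reads a member other than the one last written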
 
Bonita Montero <Bonita.Montero@gmail.com>: Jun 17 08:08PM +0200


> ... The memcpy() would only be reasonable if it were possible
> for "data' to be a misaligned pointer; otherwise simple assignment
> would be simpler and equally safe.
 
I'm aliasing a char-array.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jun 17 08:37PM +0200

On 2023-06-17 7:57 PM, David Brown wrote:
>> read the docs quote g++ doesn't make this assumption for T = char-type.
 
> No.  You can use a char-type pointer to access a non-char object, but
> you cannot use a non-char pointer to access a char (array) object.
 
Assuming no padding bits, which for clarity is not checked (AFAIK there
is no extant compiler that introduces padding bits):
 
#include <new>
#include <assert.h>
#include <stdio.h>
 
auto main() -> int
{
    alignas( int ) char chars[sizeof( int )] = {};
    int* const p = std::launder( new( (void*) chars ) int );
    assert( *p == 0 );
    chars[0] = 42;
    printf( "This system is %s-endian.\n", (*p == 42)? "little" : "big" );
}
 
One possible way to reconcile that working example with your assertion
"cannot" is that in your view the `*p` expressions do not access the
character array but the `double` object that resides there.
 
If that is the case then the assertion is meaningless nonsense,
consistent with your sequence of self-contradictions in this thread.
 
[snip]
 
 
- Alf
David Brown <david.brown@hesbynett.no>: Jun 18 12:30AM +0200

On 17/06/2023 20:37, Alf P. Steinbach wrote:
 
> One possible way to reconcile that working example with your assertion
> "cannot" is that in your view the `*p` expressions do not access the
> character array but the `double` object that resides there.
 
It is an int, in your example, not a double, but otherwise that is
/exactly/ what happens. *p is accessing the int object, not a char array.
 
Then "chars[0] = 42;" is using a character pointer to access the memory
used for storing an int.
 
<https://en.cppreference.com/w/cpp/utility/launder>
 
 
> If that is the case then the assertion is meaningless nonsense,
> consistent with your sequence of self-contradictions in this thread.
 
I haven't contradicted myself - I have only contradicted you.
David Brown <david.brown@hesbynett.no>: Jun 18 12:36AM +0200

On 16/06/2023 19:27, Alf P. Steinbach wrote:
 
> [snip]
 
>> The ball is in your court.
 
> In the above you contradict yourself:
 
Your English is better than that of most native speakers - but you have
to /try/ to read and understand posts. It is your comprehension that is at
fault, and I suspect also your assumptions about C and C++. (To be
fair, many of your assumptions are true in all realistic implementations
in the modern world.)
 
 
> * First you claim that there are no padding bits for `uint64_t`.
 
Correct. That's what the C standard says, and that's what the C++
standard says by delegation to the C standard.
 
> * Then you paraphrase the standard that "there may be padding bits".
 
You asked for references - I gave them. Look them up. You will see
that in general, unsigned integer types (other than unsigned char) may
have padding bits. Only specific types such as the size-specific types
in <stdint.h> are guaranteed not to have padding bits - /if/ they exist.
An implementation might not support these non-padded types at all. It
might support them as extended integer types, and have padding bits in
the standard integer types. It might have lots of unsigned types - some
with padding, some without.
 
But uint64_t and the other size-specific unsigned types are guaranteed
not to have padding.
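That guarantee can even be spelled out as compile-time checks (a small
sketch):

#include <climits>
#include <cstdint>
#include <limits>

// Where uint64_t exists it has exactly 64 value bits and no padding,
// so its object representation is also exactly 64 bits.
static_assert( std::numeric_limits<std::uint64_t>::digits == 64 );
static_assert( sizeof( std::uint64_t ) * CHAR_BIT == 64 );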
 
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jun 18 01:11AM +0200

On 2023-06-18 12:30 AM, David Brown wrote:
 
>> If that is the case then the assertion is meaningless nonsense,
>> consistent with your sequence of self-contradictions in this thread.
 
> I haven't contradicted myself - I have only contradicted you.
 
Well I give up, as usual when I encounter strong religious beliefs with
argumentation consisting of self-contradictions, denials and advice.
 
 
- Alf
Frederick Virchanza Gotham <cauldwell.thomas@gmail.com>: Jun 17 03:39PM -0700

I'm currently writing a program that decrypts its own machine code at runtime.
 
The function that writes to the code memory is causing a crash because it's being flagged by Address Sanitizer.
 
I want to have a way to be able to call any function and to have the choice of disabling address sanitiser for that one invocation of the function (but not for other invocations).
 
Here's what I've written:
 
template<typename Lambda>
void NoSanitizeAddress(Lambda &f) __attribute__((__no_sanitize_address__));

template<typename Lambda>
void NoSanitizeAddress(Lambda &f)
{
    f();
}

int main(void)
{
    NoSanitizeAddress( [](){ SomeFunction(); } );
}
 
This works fine. I just wonder if there's a better way of doing this though.
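One possible variation (an untested sketch): take the callable by forwarding
reference, since some compilers reject binding a temporary lambda to a
non-const lvalue reference:

#include <utility>

template<typename Callable>
__attribute__((__no_sanitize_address__))
void NoSanitizeAddress( Callable &&f )
{
    std::forward<Callable>( f )();
}

Note that the attribute only suppresses instrumentation of the wrapper
itself; whatever f calls is still compiled with its own instrumentation.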
olcott <polcott2@gmail.com>: Jun 17 02:23PM -0500

sci.logic Daryl McCullough Jun 25, 2004, 6:30:39 PM
You ask someone (we'll call him "Jack") to give a truthful
yes/no answer to the following question:
 
Will Jack's answer to this question be no?
Jack can't possibly give a correct yes/no answer to the question.
 
When the halting problem is construed as requiring a correct yes/no
answer to a self-contradictory question it cannot be solved.
 
My semantic linguist friends understand that the context of the question
must include who the question is posed to; otherwise the same
word-for-word question acquires different semantics.
 
The input D to H, like Jack's question posed to Jack, has no correct
answer, because within this context the question is self-contradictory.
 
When we ask someone else what Jack's answer will be or we present a
different TM with input D the same word-for-word question (or bytes of
machine description) acquires entirely different semantics and is no
longer self-contradictory.
 
When we construe the halting problem as determining whether or not
(a) input D will halt on its input, <or>
(b) either D will not halt or D has a pathological relationship with H,
 
then this halting problem cannot be shown to be unsolvable by any of
the conventional halting problem proofs.
 
 
The x86utm operating system
(includes several termination analyzers)
https://github.com/plolcott/x86utm
 
--
Copyright 2023 Olcott "Talent hits a target no one else can hit; Genius
hits a target no one else can see." Arthur Schopenhauer
Bonita Montero <Bonita.Montero@gmail.com>: Jun 17 06:22PM +0200

I wanted to test how much time it takes for a thread to signal
a semaphore to another thread and to wait to be signalled back.
That's essential for mutexes when they're contended. I tested
this under Windows 11 on a Ryzen 9 7950X system.
I tested different combinations of logical cores. The first
thread is always pinned to the first core and the other thread
varies. I print the x2APIC ID along with the result.
The fastest result I get is about 20,000 clock cycles for one
switch to the other thread. I think that's enormous.
A similar benchmark written for Linux using POSIX semaphores
gives about 8,000 clock cycles per switch on a 3990X system.
That's a huge difference, since that CPU is a Zen 2 CPU with a
much lower clock rate than the 7950X Zen 4 system.
 
#include <Windows.h>
#include <iostream>
#include <thread>
#include <system_error>
#include <chrono>
#include <latch>
#include <charconv>
#include <atomic>
#include <cstdint>
#include <intrin.h>
 
using namespace std;
using namespace chrono;
 
int main( int argc, char **argv )
{
    static auto errTerm = []( bool succ, char const *what )
    {
        if( succ )
            return;
        cerr << what << endl;
        terminate();
    };
    int regs[4];
    __cpuid( regs, 0 );
    errTerm( (unsigned)regs[0] >= 0xB, "max CPUID below 0xB" );
    bool fPrio = SetPriorityClass( GetCurrentProcess(), REALTIME_PRIORITY_CLASS )
                 || SetPriorityClass( GetCurrentProcess(), HIGH_PRIORITY_CLASS );
    errTerm( fPrio, "can't set process priority class" );
    unsigned nCPUs = jthread::hardware_concurrency();
    for( unsigned cpuB = 1; cpuB != nCPUs; ++cpuB )
    {
        auto init = []( HANDLE &hSem, bool set )
        {
            hSem = CreateSemaphoreA( nullptr, set, 1, nullptr );
            errTerm( hSem, "can't create semaphore" );
        };
        HANDLE hSemA, hSemB;
        init( hSemA, false );
        init( hSemB, true );
        atomic_int64_t tSum( 0 );
        latch latRun( 3 );
        auto flipThread = [&]( HANDLE hSemMe, HANDLE hSemYou, size_t n, uint32_t *pX2ApicId )
        {
            latRun.arrive_and_wait();
            auto start = high_resolution_clock::now();
            for( ; n--; )
                errTerm( WaitForSingleObject( hSemMe, INFINITE ) == WAIT_OBJECT_0, "can't wait for semaphore" ),
                errTerm( ReleaseSemaphore( hSemYou, 1, nullptr ), "can't post semaphore" );
            tSum.fetch_add( duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count(),
                            memory_order::relaxed );
            if( !pX2ApicId )
                return;
            int regs[4];
            __cpuidex( regs, 0xB, 0 );
            *pX2ApicId = regs[3];
        };
        constexpr size_t ROUNDS = 10'000;
        uint32_t x2ApicId;
        jthread
            thrA( flipThread, hSemA, hSemB, ROUNDS, nullptr ),
            thrB( flipThread, hSemB, hSemA, ROUNDS, &x2ApicId );
        errTerm( SetThreadAffinityMask( thrA.native_handle(), 1 ), "can't set CPU affinity" );
        errTerm( SetThreadAffinityMask( thrB.native_handle(), (DWORD_PTR)1 << cpuB ), "can't set CPU affinity" );
        latRun.arrive_and_wait();
        thrA.join();
        thrB.join();
        cout << x2ApicId << ": " << (double)tSum.load( memory_order::relaxed ) / (2.0 * ROUNDS) << endl;
    }
}
Bo Persson <bo@bo-persson.se>: Jun 17 06:37PM +0200

On 2023-06-17 at 18:22, Bonita Montero wrote:
> gives about 8.000 clock cylces per switch on a 3990X system.
> That's a huge difference since the CPU is a Zen2-CPU with a
> much lower clock rate than the 7950X Zen4 system.
 
 
I have Windows 10 with a Core i9 9900K, and get results between 7500 and
8000.
 
So is it the Windows version or the CPU model that is most important?
 
 
Bonita Montero <Bonita.Montero@gmail.com>: Jun 17 06:58PM +0200

Am 17.06.2023 um 18:37 schrieb Bo Persson:
 
> I have Windows 10 with a Core i9 9900K, and get results between 7500 and
> 8000.
 
That's the number of nanoseconds for each switch.
You must check the current clock rate.
Bo Persson <bo@bo-persson.se>: Jun 17 07:37PM +0200

On 2023-06-17 at 18:58, Bonita Montero wrote:
>> and 8000.
 
> That's the number of nanoseconds for each switch.
> You must check the current clock rate.
 
 
OK, so if running at 5 GHz, you multiply by 5?
 
Then it goes from good to really bad! :-)
Bonita Montero <Bonita.Montero@gmail.com>: Jun 17 07:51PM +0200

Am 17.06.2023 um 19:37 schrieb Bo Persson:
>> You must check the current clock rate.
 
> Ok, so if running at 5 Ghz, you multiply by 5?
 
> Then it goes from good to really bad!  :-)
 
I used CoreTemp to see how the clocking of
the cores develops while running that code.
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Jun 17 02:28PM -0400

Bonita Montero wrote:
> gives about 8.000 clock cylces per switch on a 3990X system.
> That's a huge difference since the CPU is a Zen2-CPU with a
> much lower clock rate than the 7950X Zen4 system.
 
Comparing the performance of POSIX mutexes and Windows semaphores for
thread signalling is an apples-to-oranges comparison. A Windows critical
section is the closest analogue to the POSIX inter-thread recursive
non-robust mutex. I don't think there is a close analogue to a
non-recursive non-robust mutex -- which is potentially the fastest.
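For concreteness, the POSIX mutex variant referred to above can be
configured roughly like this (a sketch; error checking omitted):

#include <pthread.h>

// Process-private (inter-thread), recursive, non-robust mutex.
void init_recursive_mutex( pthread_mutex_t *m )
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init( &attr );
    pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
    pthread_mutexattr_setpshared( &attr, PTHREAD_PROCESS_PRIVATE );
    pthread_mutex_init( m, &attr );
    pthread_mutexattr_destroy( &attr );
}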
 
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Jun 16 11:45PM -0400

Andrey Tarasevich wrote:
 
> What about GCC, which accepts the code? Is this a consequence of GCC
> sticking to a different interpretation of that "could possibly be"? Or
> is this a GCC-specific "extension"?
 
I am guessing the former. I did not find a documented extension, even
though the most vexing parse is documented with several examples, so
leaving such an extension undocumented seems unlikely.
 
Also, my immediate interpretation of "could possibly be" is that it is
equivalent to "could be part of a legal C++ program", and a legal C++
program has to be compilable. In other words, if the program can be
compiled in more than one way, there is an ambiguity; otherwise, there is
no ambiguity and nothing to resolve. Long story short, I am with gcc here.
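A generic illustration of the kind of declaration/expression ambiguity
involved (the snippet actually under discussion is not reproduced in this
digest):

struct Gadget {};
struct Widget { Widget( Gadget ) {} };

int main()
{
    Widget w( Gadget() );    // "most vexing parse": declares a function, not an object
    Widget w2{ Gadget() };   // list-initialization: unambiguously an object
    Widget w3(( Gadget() )); // extra parentheses: also unambiguously an object
}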
 
HTH,
-Pavel
