soft and program: March 2020

comp.lang.c++@googlegroups.com

Google Groups

Invert every 2nd byte in a container of raw data - 25 Updates

Invert every 2nd byte in a container of raw data

Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 04:09AM +0200

>> There's no way to make it simpler.

> Well, you could just go back to the original code, and let the
> compiler's optimizer unroll the loop for you...

The discussion was about the simplicity of the source and not
of the compiled code.

Melzzzzz <Melzzzzz@zzzzz.com>: Mar 31 04:38AM

> two pipelined 128 bit halves) and you said that it is wrong. I
> proved that you were wrong.
> Boy, you're so stupid.

I am talking about AVX throughput. 256-bit vdivpd is twice
as fast as on Haswell... my err.

--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
We are all witnesses: about 3 years of intensive propaganda is enough to drive a nation mad -- Zli Zec
In the Wild West there was not that much violence, precisely because everyone
was armed. -- Mladen Gogala

Jorgen Grahn <grahn+nntp@snipabacken.se>: Mar 31 07:33AM

On Tue, 2020-03-31, Melzzzzz wrote:
>> Boy, you're so stupid.

> I am talking about AVX throughput. 256-bit vdivpd is twice
> as fast as on Haswell... my err.

Twice faster if you disregard the data cache, surely?

I haven't paid attention to this thread, but won't memory bandwidth
be the bottleneck in the end anyway?

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 09:51AM +0200

>> I am talking about AVX throughput. 256-bit vdivpd is twice
>> as fast as on Haswell... my err.

> Twice faster if you disregard the data cache, surely?

Here are the VDIVPD numbers from agner.org for first-generation
Ryzen and for Haswell: Ryzen: 8-13 cycles latency, 8-9 cycles
throughput; Haswell: 19-35 cycles latency, 18-28 cycles throughput.
But Coffee Lake performs much better than Haswell: 13-14
cycles latency, 8 cycles throughput.

Melzzzzz <Melzzzzz@zzzzz.com>: Mar 31 07:58AM

> Twice faster if you disregard the data cache, surely?

> I haven't paid attention to this thread, but won't memory bandwidth
> be the bottleneck in the end anyway?

Sure. I am talking about the case where the data is in cache...


Melzzzzz <Melzzzzz@zzzzz.com>: Mar 31 08:01AM

> throughput; Haswell: 19-35 cycles latency, 18-28 cycles throughput.
> But Coffee Lake performs much better than Haswell: 13-14
> cycles latency, 8 cycles throughput.

This is because, since Skylake, Intel has had a 256-bit divpd.
Ryzen has 128-bit units, but in pairs, so a 256-bit
operation is a single op. The only thing that holds first-gen Ryzen
back is FMA, as it can execute only one per cycle...


Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 10:06AM +0200

> This is because since Skylake, Intel has 256 bit divpd.
> Ryzen has 128 bit units but in pairs, so that 256
> bits are single op. ..

Do I have to write a benchmark comparing DIVPD and VDIVPD
on my 1800X?

Melzzzzz <Melzzzzz@zzzzz.com>: Mar 31 08:22AM

>> bits are single op. ..

> Do I have to write a benchmark comparing DIVPD and VDIVPD
> on my 1800X?

If you wish:
~/.../examples/assembler >>> ./latency
recip1
15.833327168900179712 0.063157919326280312 0.063157919326280328
700.059641597344125328 0.001428449721395557 0.001428449721395557
860.050613320340289648 0.001162722268331821 0.001162722268331821
12.280964395431137600 0.081426829994884368 0.081426829994884448
144.000000 16.920612
108.000000 16.134408
108.000000 16.479540
108.000000 16.828776
144.000000 17.158536
144.000000 17.091432
108.000000 17.163324
144.000000 16.072596
108.000000 17.177688
144.000000 12.160980
72.000000 10.093536
recip2
15.833327168900179712 0.063157919326280328 0.063157919326280328
700.059641597344125328 0.001428449721395557 0.001428449721395557
860.050613320340289648 0.001162722268331821 0.001162722268331821
12.280964395431137600 0.081426829994884448 0.081426829994884448
72.000000 13.325616
72.000000 13.353768
36.000000 13.353624
72.000000 13.296960
72.000000 13.292208
72.000000 13.476024
72.000000 13.329972
72.000000 13.335264
72.000000 13.297500
72.000000 13.315464
72.000000 14.205312
recip3
15.833327168900179712 0.063157919326280328 0.063157919326280328
700.059641597344125328 0.001428449721395557 0.001428449721395557
860.050613320340289648 0.001162722268331821 0.001162722268331821
12.280964395431137600 0.081426829994884448 0.081426829994884448
72.000000 9.000108
72.000000 9.066672
72.000000 9.042948
72.000000 9.023184
72.000000 9.018360
108.000000 9.027612
72.000000 9.032760
72.000000 9.024768
72.000000 9.034740
72.000000 9.000072
72.000000 9.023256

This is a latency benchmark on my 2700X. recip3 is pure divpd, while recip1
and recip2 are Newton-Raphson approximations.
As you can see, divpd is fastest, unlike on Intel, where recip1 takes 8
cycles and recip2 12 cycles (slow FMA on Ryzen).

~/.../examples/assembler >>> cat latency.asm
; latency test
format elf64
public recip
public recip1
public recip2
public recip3
public _rdtsc
section '.text' executable
N = 1000000
recip:
recip1:
; Load constants and input
vbroadcastsd ymm1, [one]
vpbroadcastq ymm4, [magic]
mov eax, N
.loop:
vmovdqu ymm0, [rdi]
vpsubq ymm2, ymm4, ymm0
vfnmadd213pd ymm0, ymm2, ymm1
vfmadd132pd ymm2, ymm2, ymm0
vmulpd ymm0, ymm0, ymm0
vfmadd132pd ymm2, ymm2, ymm0
vmulpd ymm0, ymm0, ymm0
vfmadd132pd ymm2, ymm2, ymm0
vmulpd ymm0, ymm0, ymm0
vfmadd132pd ymm0, ymm2, ymm2
dec eax
jnz .loop
vmovups [rdi], ymm0
ret

recip2:
; Load constants and input
vbroadcastsd ymm1, [one]
mov eax, N
.loop:
vmovdqu ymm0, [rdi]
vcvtpd2ps xmm2,ymm0
vrcpps xmm2,xmm2
vcvtps2pd ymm2,xmm2
vfnmadd213pd ymm0, ymm2, ymm1
vfmadd132pd ymm2, ymm2, ymm0
vmulpd ymm0, ymm0, ymm0
vfmadd132pd ymm2, ymm2, ymm0
vmulpd ymm0, ymm0, ymm0
vfmadd132pd ymm2, ymm2, ymm0
vmulpd ymm0, ymm0, ymm0
vfmadd132pd ymm0, ymm2, ymm2
dec eax
jnz .loop
vmovups [rdi], ymm0
ret

recip3:
; Load constants and input
vbroadcastsd ymm1, [one]
mov eax, N
.loop:
vmovdqu ymm0, [rdi]
vdivpd ymm0,ymm1,ymm0
dec eax
jnz .loop
vmovups [rdi], ymm0
ret

_rdtsc:
rdtscp
shl rdx, 32
or rax, rdx
ret

section '.data' writeable align 16
align 16
one dq 3FF0000000000000h
magic dq 7FDE6238502484BAh
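
For readers who do not read assembly, the arithmetic of the recip1 loop in the listing above can be sketched in scalar C++. This is only an illustration under assumptions: the function name recip_sketch is made up, it handles a single positive, normal, finite double instead of four packed lanes, and it says nothing about timing. The constant is the value of `magic` from the listing. Subtracting the input's bit pattern from it gives a rough first guess x0, and the multiply-add chain then evaluates x0*(1+e)*(1+e^2)*(1+e^4)*(1+e^8) with e = 1 - a*x0, which is the Newton-Raphson refinement written out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar version of the recip1 idea; valid only for
// positive, normal, finite inputs.
double recip_sketch(double a)
{
    // Initial guess: subtract the bit pattern of a from the magic constant
    // (the value of `magic` in the posted listing).
    std::uint64_t bits;
    std::memcpy(&bits, &a, sizeof bits);
    bits = 0x7FDE6238502484BAull - bits;
    double x;
    std::memcpy(&x, &bits, sizeof x);

    double e = 1.0 - a * x;   // error term of the first guess
    x += x * e;               // x0 * (1 + e)
    e *= e;
    x += x * e;               // ... * (1 + e^2)
    e *= e;
    x += x * e;               // ... * (1 + e^4)
    e *= e;
    x += x * e;               // ... * (1 + e^8)
    return x;
}
```

Each squaring of e roughly doubles the number of correct bits, which is why a handful of dependent multiply-adds is enough and why the latency of FMA matters so much for this variant.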


Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 10:43AM +0200

Here's my code:

#define NOMINMAX
#if defined(_MSC_VER)
#include <Windows.h>

Monday, March 30, 2020

Digest for comp.lang.c++@googlegroups.com - 21 updates in 2 topics

Invert every 2nd byte in a container of raw data - 20 Updates
Technical paper in 6 languages (including code comments) - 1 Update

Invert every 2nd byte in a container of raw data

Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 08:32AM +0200

> for( size_t i = 0; i < n; i+=2 )
> p[i] = ~p[i];
> }

If I didn't make a mistake this should be faster:

void invertSecondD( uint8_t *p, size_t n )
{
if( !n-- )
return;
++p;
size_t head = (((intptr_t)p + 7) & (intptr_t)-8) - (intptr_t)p;
head = head <= n ? head : n;
for( uint8_t *w = p, *end = p + head; w < end; w += 2 )
*w = ~*w;
if( n == head )
return;
int64_t odd = (int64_t)p & 1;
n -= head;
p += head;
if( n / 8 )
{
union
{
uint8_t u8Mask[8] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00 };
uint64_t mask;
};
uint64_t *w = (uint64_t *)p,
*end = w + n / 8;
mask ^= -odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 8;
}
if( n <= (size_t)odd )
return;
p += (size_t)odd;
n -= (size_t)odd;
for( uint8_t *end = p + n; p < n; p += 2 )
p = ~*p;
}

Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 08:38AM +0200

> for( uint8_t *end = p + n; p < n; p += 2 )
> p = ~*p;

for( uint8_t *end = p + n; p < end; p += 2 )
*p = ~*p;

James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 29 11:54AM -0400

On 3/29/20 11:40 AM, Frederick Gotham wrote:
...
>I use "bind2nd" with "bit_xor" and "0xFF" in this snippet because I can't find the unary "bit_not" class. Is there such a thing?

The bit_not class is described in 23.14.9.4, immediately after
23.14.9.3, which describes the bit_xor class.
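
As a small illustration of that answer: std::bit_not lives in <functional> (it was added in C++14), so the XOR-with-0xFF workaround is not needed. The sketch below shows one possible way to apply it to every second element of any range with random-access iterators. The function name invert_every_second is made up for this example, and it starts at index 1 to match the loops posted elsewhere in the thread.

```cpp
#include <functional>
#include <iterator>

// Hypothetical helper: flips all bits of the elements at odd positions
// (1, 3, 5, ...) of [first, last). Requires random-access iterators.
template <class It>
void invert_every_second(It first, It last)
{
    using T = typename std::iterator_traits<It>::value_type;
    std::bit_not<T> flip;                 // unary functor, returns ~x as T
    auto n = last - first;
    for (decltype(n) i = 1; i < n; i += 2)
        first[i] = flip(first[i]);
}
```

It can be called as invert_every_second(data.begin(), data.end()) on a std::array or std::vector, or with plain pointers.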

Bonita Montero <Bonita.Montero@gmail.com>: Mar 29 06:00PM +0200

>> for( int i = 0; i < g_LEN; ++i ) { if( i%2 ) { data[i] = ~data[i]; } }

> Assuming we have a 2s-complement this would be faster:

Or maybe not because the branch-prediction is able
to predict the regular pattern of your branches.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 29 05:56PM +0200

> for( int i = 0; i < g_LEN; ++i ) { if( i%2 ) { data[i] = ~data[i]; } }

Assuming we have a 2s-complement this would be faster:

void invertSecond( uint8_t *p, size_t n )
{
for( size_t i = 0; i != n; ++i )
p[i] ^= -(int8_t)(i & 1);
}
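
The trick in that loop is that `i & 1` is 1 for odd indices, so negating it gives an all-ones byte, and XOR with all ones is a bitwise NOT. For even indices the mask is zero and the byte is left unchanged. A sketch of the mask on its own follows. It uses unsigned arithmetic (which wraps by definition) instead of the int8_t cast, so it does not depend on two's complement. The name parity_mask is made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 0xFF for odd i, 0x00 for even i.
constexpr std::uint8_t parity_mask(std::size_t i)
{
    return static_cast<std::uint8_t>(0u - static_cast<unsigned>(i & 1u));
}

static_assert(parity_mask(0) == 0x00, "even index leaves the byte alone");
static_assert(parity_mask(1) == 0xFF, "odd index inverts the byte");
```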

Bonita Montero <Bonita.Montero@gmail.com>: Mar 29 06:26PM +0200

> Or maybe not because the branch-prediction is able
> to predict the regular pattern of your branches.

I tested it; my variant is faster although the loop is
much larger:

#include <iostream>
#include <chrono>
#include <cstdint>
#include <vector>
#include <algorithm>

using namespace std;
using namespace chrono;

void invertSecondA( uint8_t *p, size_t n )
{
for( size_t i = 0; i != n; ++i )
p[i] ^= -(int8_t)(i & 1);
}

void invertSecondB( uint8_t *p, size_t n )
{
for( size_t i = 0; i != n; ++i )
if( i % 2 )
p[i] = ~p[i];
}

int main()
{
size_t const SIZE = 1024, // fits in L1
ROUNDS = 1'000'000;
vector<uint8_t> v( SIZE, 0 );
time_point<high_resolution_clock> start = high_resolution_clock::now();
for( size_t round = ROUNDS; round; --round )
invertSecondA( &v[0], SIZE );
double sA = (double)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0E9;
start = high_resolution_clock::now();
for( size_t round = ROUNDS; round; --round )
invertSecondB( &v[0], SIZE );
double sB = (double)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0E9;
cout << sA << endl << sB << endl;
}

These are the MSVC 2019 times:
0.533812
1.1952
And these are the g++ 6.3.0 times:
0.190201
0.401627

Christian Gollwitzer <auriocus@gmx.de>: Mar 29 06:39PM +0200

Am 29.03.20 um 18:26 schrieb Bonita Montero:
> if( i % 2 )
> p[i] = ~p[i];
> }

How about:
for( size_t i = 0; i < n; i+=2 )
p[i] = ~p[i];
}

?

Christian

Bonita Montero <Bonita.Montero@gmail.com>: Mar 29 06:49PM +0200

>       for( size_t i = 0; i < n; i+=2 )
>               p[i] = ~p[i];
>       }

That's too simple. ;-)

Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 11:10AM +0200

This is a blocked version that adapts to 32- and 64-bitness:

void invertSecondBlocked( uint8_t *p, size_t n )
{
if( !n-- )
return;
++p;
size_t head = (((intptr_t)p + 7) & (intptr_t)-8) - (intptr_t)p;
head = head <= n ? head : n;
for( uint8_t *w = p, *end = p + head; w < end; w += 2 )
*w = ~*w;
if( n == head )
return;
size_t odd = (size_t)p & 1; // assume size_t or ptrdiff_t is our register width
n -= head;
p += head;
if constexpr( sizeof(size_t) == 8 )
{
if( n / 8 )
{
union
{
uint8_t u8Mask[8] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00 };
size_t mask;
};
size_t *w = (size_t *)p,
*end = w + n / 8;
mask ^= -(ptrdiff_t)odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 8;
}
}
else if constexpr( sizeof(size_t) == 4 )
if( n / 4 )
{
union
{
uint8_t u8Mask[4] = { 0xFF, 0x00, 0xFF, 0x00 };
size_t mask;
};
size_t *w = (size_t *)p,
*end = w + n / 4;
mask ^= -(ptrdiff_t)odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 4;
}
if( n <= odd )
return;
p += odd;
n -= odd;
uint8_t *end = p + n;
do
*p = ~*p;
while( (p += 2) < end );
}

Melzzzzz <Melzzzzz@zzzzz.com>: Mar 30 09:39AM

> *p = ~*p;
> while( (p += 2) < end );
> }
What does this do?


Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 11:55AM +0200

Am 30.03.2020 um 11:39 schrieb Melzzzzz:
>> while( (p += 2) < end );
>> }
> What does this do?

It inverts every second byte. First the head until the first aligned
uint64_t-block, then the uint64_t-blocks and then the tail. It's more
than 10 times faster than the fastest algorithm so far.
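
The same block idea can be sketched without pointer casts. The version below is an illustration under assumptions, not the posted code: it replaces the alignment head with std::memcpy loads and stores (which compilers typically turn into plain unaligned moves), builds the mask bytewise so that byte order does not matter, and uses a made-up name, invert_odd_bytes_blocked. Whether it is as fast as the hand-aligned version has not been measured here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: inverts the bytes at odd indices (1, 3, 5, ...)
// of p[0..n), eight bytes at a time where possible.
void invert_odd_bytes_blocked(std::uint8_t* p, std::size_t n)
{
    // Mask bytes 00 FF 00 FF ...: only the odd byte of each pair is flipped.
    const std::uint8_t maskBytes[8] = {0x00, 0xFF, 0x00, 0xFF,
                                       0x00, 0xFF, 0x00, 0xFF};
    std::uint64_t mask;
    std::memcpy(&mask, maskBytes, sizeof mask);

    std::size_t i = 0;
    // Block loop: i stays a multiple of 8, so the mask phase never changes.
    for (; n - i >= 8; i += 8) {
        std::uint64_t block;
        std::memcpy(&block, p + i, sizeof block);
        block ^= mask;
        std::memcpy(p + i, &block, sizeof block);
    }
    // Tail: the next odd index after the last full block is i + 1.
    for (++i; i < n; i += 2)
        p[i] = static_cast<std::uint8_t>(~p[i]);
}
```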

Melzzzzz <Melzzzzz@zzzzz.com>: Mar 30 09:59AM

> It inverts every second byte. First the head until the first aligned
> uint64_t-block, then the uint64_t-blocks and then the tail. It's more
> than 10 times faster than the fastest algorithm so far.

Have you measured? It looks pretty complicated for the task at hand.
Also, I really doubt it is faster than an SSE2 solution...


Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 12:03PM +0200

> Have you measured? It looks pretty complicated for the task at hand.

There's no way to make it simpler.

> Also, I really doubt it is faster than an SSE2 solution...

An SSE solution would be a bit more complex, and it wouldn't
be portable.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 12:37PM +0200

Here's an SSE-enabled version:

void invertSecondBlocked( uint8_t *p, size_t n )
{
size_t const BLOCKSIZE = SSE_BLOCKS ? 16 : sizeof(size_t);
if( !n-- )
return;
++p;
size_t head = (((intptr_t)p + (BLOCKSIZE - 1)) &
-(intptr_t)BLOCKSIZE) - (intptr_t)p;
head = head <= n ? head : n;
for( uint8_t *w = p, *end = p + head; w < end; w += 2 )
*w = ~*w;
if( n == head )
return;
size_t odd = (size_t)p & 1; // assume size_t or ptrdiff_t is our register width
n -= head;
p += head;
if constexpr( SSE_BLOCKS )
{
if( n / 16 )
{
union
{
uint8_t u8Mask[16] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00,
0xFF, 0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00 };
__m128 mask;
};
static const union
{
uint8_t u8Invert[16] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF };
__m128 invert;
};
__m128 *w = (__m128 *)p,
*end = w + n / 16;
if( odd )
mask = _mm_xor_ps( mask, invert );
do
*w = _mm_xor_ps( mask, *w );
while( ++w != end );
p = (uint8_t *)w;
n = n % 16;
}
}
else if constexpr( sizeof(size_t) == 8 )
{
if( n / 8 )
{
union
{
uint8_t u8Mask[8] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00 };
size_t mask;
};
size_t *w = (size_t *)p,
*end = w + n / 8;
mask ^= -(ptrdiff_t)odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 8;
}
}
else if constexpr( sizeof(size_t) == 4 )
{
if( n / 4 )
{
union
{
uint8_t u8Mask[4] = { 0xFF, 0x00, 0xFF, 0x00 };
size_t mask;
};
size_t *w = (size_t *)p,
*end = w + n / 4;
mask ^= -(ptrdiff_t)odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 4;
}
}
if( n <= odd )
return;
p += odd;
n -= odd;
uint8_t *end = p + n;
do
*p = ~*p;
while( (p += 2) < end );
}

It's almost twice as fast on my CPU.
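
For comparison, a more compact SSE2 sketch is possible with the integer intrinsics from <emmintrin.h> and unaligned loads, which removes the need for the alignment head. This is an illustrative variant and not the code posted above: the function name is made up, the #if guard and the scalar fallback are assumptions added so that the snippet also builds on non-x86 targets, and no timing claims are attached to it.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical helper: inverts the bytes at odd indices of p[0..n).
void invert_odd_bytes_sse2(std::uint8_t* p, std::size_t n)
{
    std::size_t i = 0;
#if SKETCH_HAS_SSE2
    // Each 16-bit lane is 0xFF00; x86 is little-endian, so in memory that is
    // the byte pair 00 FF, i.e. only the odd byte of each pair is flipped.
    const __m128i mask = _mm_set1_epi16(static_cast<short>(0xFF00));
    for (; n - i >= 16; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        v = _mm_xor_si128(v, mask);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p + i), v);
    }
#endif
    // Scalar tail (or the whole buffer when SSE2 is unavailable).
    for (++i; i < n; i += 2)
        p[i] = static_cast<std::uint8_t>(~p[i]);
}
```

Because the block loop always advances by an even number of bytes, the mask never needs the odd/even adjustment that the aligned version performs.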

Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 01:57PM +0200

This version is SSE- as well as AVX-optimized.
Unfortunately, it runs only slightly faster on my old Ryzen 1800X,
because the first two generations of Ryzen CPUs split a 256-bit
AVX operation into two 128-bit operations.

void invertSecondBlocked( uint8_t *p, size_t n )
{
size_t const BLOCKSIZE = AVX_BLOCKS ? 32 : SSE_BLOCKS ? 16 :
sizeof(size_t);
if( !n-- )
return;
++p;
size_t head = (((intptr_t)p + BLOCKSIZE - 1) &
-(intptr_t)BLOCKSIZE) - (intptr_t)p;
head = head <= n ? head : n;
for( uint8_t *w = p, *end = p + head; w < end; w += 2 )
*w = ~*w;
if( n == head )
return;
size_t odd = (size_t)p & 1; // assume size_t or ptrdiff_t is our register width
n -= head;
p += head;
#if AVX_BLOCKS != 0
if( n / 32 )
{
union
{
uint8_t u8Mask[32] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00,
0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00,
0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00,
0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00 };
__m256 mask;
};
static const union
{
uint8_t u8Invert[32] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF };
__m256 invert;
};
__m256 *w = (__m256 *)p,
*end = w + n / 32;
if( odd )
mask = _mm256_xor_ps( mask, invert );
do
*w = _mm256_xor_ps( *w, mask );
while( ++w != end );
p = (uint8_t *)w;
n = n % 32;
}
#elif SSE_BLOCKS != 0
if( n / 16 )
{
union
{
uint8_t u8Mask[16] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00,
0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00,
0xFF, 0x00 };
__m128 mask;
};
static const union
{
uint8_t u8Invert[16] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF };
__m128 invert;
};
__m128 *w = (__m128 *)p,
*end = w + n / 16;
if( odd )
mask = _mm_xor_ps( mask, invert );
do
*w = _mm_xor_ps( *w, mask );
while( ++w != end );
p = (uint8_t *)w;
n = n % 16;
}
#else
if constexpr( sizeof(size_t) == 8 )
{
if( n / 8 )
{
union
{
uint8_t u8Mask[8] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00 };
size_t mask;
};
size_t *w = (size_t *)p,
*end = w + n / 8;
mask ^= -(ptrdiff_t)odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 8;
}
}
else if constexpr( sizeof(size_t) == 4 )
{
if( n / 4 )
{
union
{
uint8_t u8Mask[4] = { 0xFF, 0x00, 0xFF, 0x00 };
size_t mask;
};
size_t *w = (size_t *)p,
*end = w + n / 4;
mask ^= -(ptrdiff_t)odd;
do
*w ^= mask;
while( ++w != end );
p = (uint8_t *)w;
n = n % 4;
}
}

Digest for comp.programming.threads@googlegroups.com - 4 updates in 2 topics

About Windows and processor groups.. - 3 Updates
Fight the coronavirus 100% and save LOTS of CASH -- Combattez le coronavirus 100% et économisez BEAUCOUP d'ARGENT - 1 Update

About Windows and processor groups..

aminer68@gmail.com: Mar 29 02:19PM -0700

Hello,

About Windows and processor groups..

Microsoft processor groups enable developers of multi-threaded applications to transcend the previous 64-thread restrictions.

For any system with more than 64 logical threads, Windows will evenly divide the threads into processor groups such that no group has more than 64 threads. On a dual-socket system with two 28-core CPUs and 112 total threads, for example, Windows will create two processor groups, each with 56 threads. On a single socket system with 64 cores and 128 threads, two processor groups will be created, each with 64 threads.

Windows identifies a logical processor with a processor-number structure that contains a group field. The group is a WORD, which is defined as a 16-bit unsigned integer, so there can be at most 65,536 processor groups containing 64 threads each. So Microsoft Windows can, in principle, address up to 4,194,304 logical processors!
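
The arithmetic in the paragraphs above can be checked with a few lines of C++. This is only a sketch of the even-split rule as described there: the function names are made up, it does not call any Windows API, and real group assignment may differ (the assumption here is a simple even split with at most 64 logical processors per group).

```cpp
#include <cstdint>

constexpr std::uint32_t kMaxPerGroup = 64;

// Smallest number of groups such that no group exceeds 64 logical processors.
constexpr std::uint32_t group_count(std::uint32_t logical)
{
    return (logical + kMaxPerGroup - 1) / kMaxPerGroup;
}

// Size of each group, assuming the processors divide evenly among the groups.
// logical must be greater than zero.
constexpr std::uint32_t per_group(std::uint32_t logical)
{
    return logical / group_count(logical);
}

static_assert(group_count(112) == 2 && per_group(112) == 56, "dual 28-core example");
static_assert(group_count(128) == 2 && per_group(128) == 64, "64-core example");
// A 16-bit group number allows 65,536 groups of 64 processors each.
static_assert(65536u * kMaxPerGroup == 4194304u, "theoretical upper bound");
```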

This is why I have implemented processor groups in many of my software projects; see my website:

https://sites.google.com/site/scalable68/

Even my Parallel Conjugate Gradient Linear System Solver Library, which scales very well, supports processor groups. Here is my C++ version for Windows and Linux:

https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library

And here are my Delphi and FreePascal versions for Windows and Linux:

Parallel implementation of Conjugate Gradient Dense Linear System solver library that is NUMA-aware and cache-aware that scales very well

https://sites.google.com/site/scalable68/scalable-parallel-implementation-of-conjugate-gradient-dense-linear-system-solver-library-that-is-numa-aware-and-cache-aware

PARALLEL IMPLEMENTATION OF CONJUGATE GRADIENT SPARSE LINEAR SYSTEM SOLVER LIBRARY THAT SCALES VERY WELL

https://sites.google.com/site/scalable68/scalable-parallel-implementation-of-conjugate-gradient-sparse-linear-system-solver

And here is also why I have implemented my Parallel Conjugate Gradient Linear System Solver Library that scales very well:

The finite element method finds its place in games

Read more here:

https://translate.google.com/translate?hl=en&sl=auto&tl=en&u=https%3A%2F%2Fhpc.developpez.com%2Factu%2F288260%2FLa-methode-des-elements-finis-trouve-sa-place-dans-les-jeux-AMD-propose-la-bibliotheque-FEMFX-pour-une-simulation-en-temps-reel-des-deformations%2F

But be aware that the finite element method uses the Conjugate Gradient method to solve finite element problems; see:

Conjugate Gradient Method for Solution of Large Finite Element Problems on CPU and GPU

https://pdfs.semanticscholar.org/1f4c/f080ee622aa02623b35eda947fbc169b199d.pdf

Thank you,
Amine Moulay Ramdane.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 30 11:20AM +0200

> This is why i have implemented processor groups in many of my software projects, please look at my following website to notice it:
> https://sites.google.com/site/scalable68/

I won't trust software not being tested on a system it is designed for.

Melzzzzz <Melzzzzz@zzzzz.com>: Mar 30 09:37AM

>> This is why i have implemented processor groups in many of my software projects, please look at my following website to notice it:
>> https://sites.google.com/site/scalable68/

> I won't trust software not being tested on a system it is designed for.
Just look at the code. His C++ is rubbish as far as I can see...


Sunday, March 29, 2020

Digest for comp.lang.c++@googlegroups.com - 16 updates in 4 topics

Algorithms for integer ranges? - 4 Updates
Invert every 2nd byte in a container of raw data - 9 Updates
Some kind of unwanted optimization going on here ??? And if so, how do I prevent it ??? - 2 Updates
Fight the coronavirus 100% and save LOTS of CASH -- Combattez le coronavirus 100% et économisez BEAUCOUP d'ARGENT - 1 Update

Algorithms for integer ranges?

Jorgen Grahn <grahn+nntp@snipabacken.se>: Mar 28 08:51PM

On Sat, 2020-03-28, red floyd wrote:
> integers. For example, for the range [1, N), where N can be arbitrarily
> large. The problem to me appears to be that library algorithms only
> work on iterators.

Seems to me the real problem is you don't have iterators over [1, N).

Isn't there such a thing in Boost or something? If not, I imagine it
would be easy to write something so that:

const Range<int> r{1, 4711};
std::for_each(r.begin(), r.end(), f);
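
A minimal sketch of such a range, assuming nothing beyond the standard library: Range and its nested iterator are hypothetical names matching the snippet above, the iterator is only an input iterator that returns values by copy, and no overflow or empty-range checks are attempted.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical half-open integer range [first, last).
template <class T>
class Range {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type        = T;
        using difference_type   = std::ptrdiff_t;
        using pointer           = const T*;
        using reference         = T;   // values are produced on the fly

        explicit iterator(T v) : val_(v) {}
        T operator*() const { return val_; }
        iterator& operator++() { ++val_; return *this; }
        iterator operator++(int) { iterator tmp = *this; ++val_; return tmp; }
        bool operator==(const iterator& o) const { return val_ == o.val_; }
        bool operator!=(const iterator& o) const { return val_ != o.val_; }
    private:
        T val_;
    };

    Range(T first, T last) : first_(first), last_(last) {}
    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_); }
private:
    T first_, last_;
};
```

With this in place, both std::for_each(r.begin(), r.end(), f) and a range-based for loop over r visit 1, 2, ..., 4710 without storing the integers anywhere.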

/Jorgen


Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Mar 29 09:28PM +0100

On Sat, 28 Mar 2020 11:04:00 -0700
> [END CODE]

> I don't want to duplicate something already written, is there
> something similar in the Standard Library?

I don't think there is anything in the standard library, but I am not
up to date with everything that C++17/20 has been up to.

However, rather than provide a lazy version of for_each, you would be
better off using a lazy iterator to which you can apply std::for_each,
std::transform and so on. Something like:

class IntIter {
public:
typedef int value_type;
typedef int reference; // read only
typedef void pointer; // read only
typedef int difference_type;
typedef std::random_access_iterator_tag iterator_category;
private:
int val;
public:
explicit IntIter(value_type val_ = 0) noexcept : val(val_) {}
IntIter(const IntIter&) noexcept = default;
IntIter& operator=(const IntIter&) noexcept = default;
IntIter& operator++() noexcept {++val; return *this;}
IntIter operator++(int) noexcept {IntIter tmp = *this; ++val; return tmp;}
IntIter& operator--() noexcept {--val; return *this;}
IntIter operator--(int) noexcept {IntIter tmp = *this; --val; return tmp;}
IntIter& operator+=(difference_type n) noexcept {val += n; return *this;}
IntIter& operator-=(difference_type n) noexcept {val -= n; return *this;}
reference operator[](difference_type n) const noexcept {return val + n;}
reference operator*() const noexcept {return val;}
};

Take your pick with standard comparison operators:

inline bool operator==(IntIter iter1, IntIter iter2) noexcept {
return *iter1 == *iter2;
}
inline bool operator!=(IntIter iter1, IntIter iter2) noexcept {
return !(iter1 == iter2);
}
inline bool operator<(IntIter iter1, IntIter iter2) noexcept {
return *iter1 < *iter2;
}
inline bool operator>(IntIter iter1, IntIter iter2) noexcept {
return iter2 < iter1;
}
inline bool operator<=(IntIter iter1, IntIter iter2) noexcept {
return !(iter1 > iter2);
}
inline bool operator>=(IntIter iter1, IntIter iter2) noexcept {
return !(iter1 < iter2);
}
inline IntIter::difference_type operator-(IntIter iter1, IntIter iter2) noexcept {
return *iter1 - *iter2;
}
inline IntIter operator+(IntIter iter, IntIter::difference_type n) noexcept {
return IntIter{*iter + n};
}
inline IntIter operator-(IntIter iter, IntIter::difference_type n) noexcept {
return IntIter{*iter - n};
}
inline IntIter operator+(IntIter::difference_type n, IntIter iter) noexcept {
return iter + n;
}

jacobnavia <jacob@jacob.remcomp.fr>: Mar 29 11:18PM +0200

Le 28/03/2020 à 19:04, red floyd a écrit :

> I don't want to duplicate something already written, is there
> something similar in the Standard Library?

> -- red floyd

In C you would write:

typedef long long IntType;

IntType doRange(IntType start, IntType end,
                IntType increment, IntType (*func)(IntType))
{
    IntType i;
    for (i = start; i < end; i += increment)
        func(i);
    return i;
}

I hope I understood correctly...

danielaparker@gmail.com: Mar 29 02:52PM -0700

On Sunday, March 29, 2020 at 5:18:29 PM UTC-4, jacobnavia wrote:
> return i;
> }

> I hope I understood correctly...

That's my understanding. Also, note that

for (i=start; i<end; i+= increment)
func(i);

is more robust than the iterator alternative, for example

integer_iterator<int> first(1,10); // starting from 1, increment
// in steps of 10
integer_iterator<int> last(100, 10);

std::for_each(first, last, func);

would result in an infinite loop.
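
The reason is that std::for_each stops only when the two iterators compare equal, while a stepped counter can jump over the end value. A small sketch of the arithmetic, with made-up helper names and assuming a positive step and start <= end:

```cpp
// True if start, start+step, start+2*step, ... ever equals end exactly,
// i.e. if an equality-based loop (it != last) would terminate.
// Assumes step > 0 and start <= end.
constexpr bool lands_on_end(long long start, long long end, long long step)
{
    return (end - start) % step == 0;
}

// Number of iterations of the relational loop
//   for (i = start; i < end; i += step)
// which terminates whether or not end is hit exactly.
constexpr long long less_than_iterations(long long start, long long end,
                                         long long step)
{
    return (end - start + step - 1) / step;
}

static_assert(!lands_on_end(1, 100, 10), "1, 11, ..., 91, 101 skips 100");
static_assert(less_than_iterations(1, 100, 10) == 10, "visits 1, 11, ..., 91");
```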

Daniel

Invert every 2nd byte in a container of raw data

Frederick Gotham <cauldwell.thomas@gmail.com>: Mar 29 08:40AM -0700

size_t constexpr g_LEN = 16u;

array<uint8_t, g_LEN> data;

auto const my_range =
data
| boost::adaptors::sliced(1,15)
| boost::adaptors::strided(2);

boost::range::transform(my_range,
my_range.begin(),
std::bind2nd(std::bit_xor<uint8_t>(),0xFF)
);

It's annoying that there isn't a form of "boost::range::transform" that only takes two arguments.

Can anyone think of a cleaner way of doing this that will work with any kind of container of objects of type "uint8_t"? I use "bind2nd" with "bit_xor" and "0xFF" in this snippet because I can't find the unary "bit_not" class. Is there such a thing?

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Mar 29 05:48PM +0200

On 29.03.2020 17:40, Frederick Gotham wrote:
> any kind of container of objects of type "uint8_t"? I use "bind2nd"
> with "bit_xor" and "0xFF" in this snippet because I can't find the
> unary "bit_not" class. Is there such a thing?

Well, assuming that your code is always dealing with 8-bit bytes, try

for( int i = 0; i < g_LEN; ++i ) { if( i%2 ) { data[i] = ~data[i]; } }

Oh look, it's a one-liner. ;-)

- Alf

Bonita Montero <Bonita.Montero@gmail.com>: Mar 29 06:49PM +0200

>       for( size_t i = 0; i < n; i+=2 )
>               p[i] = ~p[i];
>       }

That's too simple. ;-)

Christian Gollwitzer <auriocus@gmx.de>: Mar 29 08:27PM +0200

Am 29.03.20 um 18:49 schrieb Bonita Montero:
>> p[i] = ~p[i];
>> }

> That's too simple. ;-)

Indeed, that version is faster than what Alf posted, but not nearly as
fast as yours. Here is the output (I'm on macOS)

Apfelkiste:Tests chris$ clang++ -O2 --std=c++17 invert.cpp -march=native
Apfelkiste:Tests chris$ ./a.out
0.193238
1.62945
1.03101
Apfelkiste:Tests chris$ clang++ -v
Apple LLVM version 10.0.0 (clang-1000.11.45.5)

Looking at the assembly output, your version is vectorized by the
compiler, and that is the secret behind the 5x speed improvement
(compared to my version). The whole thing is memory bound, and
vectorizing significantly reduces the number of memory accesses.

Christian

#include <iostream>
#include <chrono>
#include <cstdint>
#include <vector>
#include <algorithm>

using namespace std;
using namespace chrono;

void invertSecondA( uint8_t *p, size_t n )
{
for( size_t i = 0; i != n; ++i )
p[i] ^= -(int8_t)(i & 1);
}

void invertSecondB( uint8_t *p, size_t n )
{
for( size_t i = 0; i != n; ++i )
if( i % 2 )
p[i] = ~p[i];
}

void invertSecondC( uint8_t *p, size_t n )
{
for( size_t i = 0; i < n; i+=2 )
p[i] = ~p[i];
}

int main()
{
size_t const SIZE = 1024, // fits in L1
ROUNDS = 5'000'000;
vector<uint8_t> v( SIZE, 0 );
time_point<high_resolution_clock> start = high_resolution_clock::now();
for( size_t round = ROUNDS; round; --round )
invertSecondA( &v[0], SIZE );
double sA = (double)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0E9;
start = high_resolution_clock::now();
for( size_t round = ROUNDS; round; --round )
invertSecondB( &v[0], SIZE );
double sB = (double)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0E9;
start = high_resolution_clock::now();
for( size_t round = ROUNDS; round; --round )
invertSecondC( &v[0], SIZE );
double sC = (double)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0E9;
cout << sA << endl << sB << endl << sC << endl;
}

Some kind of unwanted optimization going on here ??? And if so, how do I prevent it ???

James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 28 06:25PM -0400

On 3/28/20 5:12 PM, Öö Tiib wrote:
>> regardless of which value is passed to the function.

> Has the rule been removed that a conditional expression which
> evaluates to a throw expression is not a constant expression?
In n3337.pdf dated 2012-01-16, section 5.19p2 includes "an invocation of
a constexpr function with arguments that, when substituted by function
invocation substitution (7.1.5), do not produce a constant expression;"
as one of the things that fails to qualify as a constant expression.

n3797.pdf, dated 2013-10-13, drops that item from the list.

So it would appear that the change you're complaining about took place
about 7-8 years ago.
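
As a concrete case of what is being discussed, here is a small sketch of a constexpr function whose conditional expression can evaluate to a throw-expression. The name `foo` and the values 42 and 43 follow the examples used elsewhere in the thread; the function body is an assumption made up for illustration.

```cpp
#include <stdexcept>

// Illustrative only: valid for 42, throws for anything else.
constexpr int foo(int x)
{
    return x == 42 ? x : throw std::invalid_argument("not 42");
}

// foo(42) never reaches the throw-expression, so it can be used where a
// constant expression is required.
static_assert(foo(42) == 42, "foo(42) is a constant expression");

// foo(43) would evaluate the throw-expression, so it is not a constant
// expression; uncommenting the next line is expected to fail to compile.
// static_assert(foo(43) == 43, "");
```

Called at run time with a value other than 42, the same function simply throws.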

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Mar 28 04:11PM -0700

> n3797.pdf, dated 2013-10-13, drops that item from the list.

> So it would appear that the change you're complaining about took place
> about 7-8 years ago.

Drafts of the C++ standard are available in this git repo:

https://github.com/cplusplus/draft

The commit that removed that wording is:

commit 7cb8947bec0eaf7788e75e3b36e23f4714e157a8
Author: Richard Smith <richard@metafoo.co.uk>
Date: 2012-05-05 19:59:36 -0700

N3652 Relaxing constraints on constexpr functions

Conflicts resolved as indicated by N3652.

N3652 is here: https://isocpp.org/files/papers/N3652.html

Relaxing constraints on constexpr functions

constexpr member functions and implicit const

This paper describes the subset of N3597 selected for inclusion in
C++14, relaxing a number of restrictions on constexpr
functions. These changes all received overwhelmingly strong or
unopposed support under review of the Evolution Working Group. It
also incorporates Option 2 of N3598.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Digest for comp.lang.c++@googlegroups.com - 25 updates in 2 topics

comp.lang.c++@googlegroups.com

Algorithms for integer ranges? - 10 Updates
Some kind of unwanted optimization going on here ??? And if so, how do I prevent it ??? - 15 Updates

Algorithms for integer ranges?

red floyd <no.spam@its.invalid>: Mar 28 07:01PM -0700

On 3/28/20 1:51 PM, Jorgen Grahn wrote:
>> large. The problem to me appears to be that library algorithms only
>> work on iterators.

> Seems to me the real problem is you don't have iterators over [1, N).

TBH, I'm surprised the Committee hasn't addressed this.

Ned Latham <nedlatham@woden.valhalla.oz>: Mar 28 09:14PM -0500

red floyd wrote:
> > > work on iterators.

> > Seems to me the real problem is you don't have iterators over [1, N).

> TBH, I'm surprised the Committee hasn't addressed this.

Can you construct a dummy object for the iterator's sake?

danielaparker@gmail.com: Mar 28 08:44PM -0700

On Saturday, March 28, 2020 at 2:04:11 PM UTC-4, red floyd wrote:
> I was looking for a way to use standard algorithms with a range of
> integers.

Option 1:

#include <cstddef>
#include <iterator>
#include <algorithm>
#include <type_traits>

template <class Iterator,class Enable = void>
class integer_iterator
{
};

template <class T>
class integer_iterator<T,typename std::enable_if<std::is_integral<T>::value
>::type>
{
T value_;
T step_;
public:
using iterator_category = std::random_access_iterator_tag;

using value_type = T;
using difference_type = std::ptrdiff_t;
using pointer = typename std::conditional<std::is_const<T>::value,
value_type*,
const value_type*>::type;
using reference = typename std::conditional<std::is_const<T>::value,
value_type&,
const value_type&>::type;

public:
explicit integer_iterator(T n = 0, T step = 1) : value_(n), step_(step)
{
}

integer_iterator(const integer_iterator&) = default;
integer_iterator(integer_iterator&&) = default;
integer_iterator& operator=(const integer_iterator&) = default;
integer_iterator& operator=(integer_iterator&&) = default;

template <class U,
class=typename std::enable_if<!std::is_same<U,T>::value &&
std::is_convertible<U,T>::value>::type>
integer_iterator(const integer_iterator<U>& other)
: value_(other.value_)
{
}

reference operator*() const
{
return value_;
}

pointer operator->() const
{
return &value_;
}

integer_iterator& operator++()
{
value_ += step_;
return *this;
}

integer_iterator operator++(int)
{
integer_iterator temp = *this;
++*this;
return temp;
}

integer_iterator& operator--()
{
value_ -= step_;
return *this;
}

integer_iterator operator--(int)
{
integer_iterator temp = *this;
--*this;
return temp;
}

integer_iterator& operator+=(const difference_type offset)
{
// Advance by whole steps so that it += n matches n applications of ++.
value_ += static_cast<T>(offset * step_);
return *this;
}

integer_iterator operator+(const difference_type offset) const
{
integer_iterator temp = *this;
return temp += offset;
}

integer_iterator& operator-=(const difference_type offset)
{
return *this += -offset;
}

integer_iterator operator-(const difference_type offset) const
{
integer_iterator temp = *this;
return temp -= offset;
}

difference_type operator-(const integer_iterator& rhs) const
{
// Distance measured in steps, consistent with operator+=.
return (static_cast<difference_type>(value_) - rhs.value_) / step_;
}

reference operator[](const difference_type offset) const
{
return *(*this + offset);
}

bool operator==(const integer_iterator& rhs) const
{
return value_ == rhs.value_;
}

bool operator!=(const integer_iterator& rhs) const
{
return !(*this == rhs);
}

bool operator<(const integer_iterator& rhs) const
{
return value_ < rhs.value_;
}

bool operator>(const integer_iterator& rhs) const
{
return rhs < *this;
}

bool operator<=(const integer_iterator& rhs) const
{
return !(rhs < *this);
}

bool operator>=(const integer_iterator& rhs) const
{
return !(*this < rhs);
}

inline
friend integer_iterator<T> operator+(
difference_type offset, integer_iterator<T> next)
{
return next += offset;
}
};

int main(int argc, char** argv)
{
integer_iterator<int> first(1,10);
integer_iterator<int> last(100000001, 10);

long long n = 0; // the sum is about 5e14, which would overflow int
auto f = [&n](int i) {n += i;};
std::for_each(first, last, f);
}

Option 2:

int main(int argc, char** argv)
{
long long n = 0; // the sum is about 5e14, which would overflow int
auto f = [&n](int i) {n += i;};
for (int i = 1; i < 100000001; i += 10)
{
f(i);
}
}

Decisions, decisions ...

Daniel

red floyd <no.spam@its.invalid>: Mar 28 11:04AM -0700

I was looking for a way to use standard algorithms with a range of
integers. For example, for the range [1, N), where N can be arbitrarily
large. The problem to me appears to be that library algorithms only
work on iterators. I didn't want to just create a collection containing
the integers, because I was playing with some math which might use large
values (on the order of over a million), and I didn't want to waste the
space.

So for my purposes, I came up with the following.

[BEGIN CODE]
namespace my {
template<typename Integral, typename UnaryFunction>
Integral for_each(Integral first, Integral last, UnaryFunction func)
{
while (first < last)
{
func(first);
++first;
}
return first;
}
template<typename Integral, typename UnaryFunction>
Integral for_each(Integral first, Integral last,
Integral interval, UnaryFunction func)
{
while (first < last)
{
func(first);
first += interval;
}
return first;
}
}
[END CODE]

I don't want to duplicate something already written, is there
something similar in the Standard Library?

-- red floyd

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Mar 29 02:46PM +0200

On 28.03.2020 19:04, red floyd wrote:
> [END CODE]

> I don't want to duplicate something already written, is there
> something similar in the Standard Library?

In C++20 there will be std::something::idiota, or "iota" as the Greeks
tend to shorten it, used like

for( const int i: idiota( n ) ) do { //... use `i` here

With the
under-construction-library-that-I-really-need-to-finish-sometime you can
write similar but much more clear code like

for( const int i: zero_to( n ) ) { //... Use `i` here.

<url:
https://github.com/alf-p-steinbach/cppx-core-language/blob/master/source/cppx-core-language/syntax/collection-util/Sequence_.hpp>.

This is not an `idiota` copy; the library and concept, though not this
particular implementation, predate `idiota` by many years.

There are good reasons for all the shenanigans in that code, so it
illustrates what you're dealing with for defining a general integers
iterator. There may still be bugs. It's not extensively tested.

- Alf
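
For reference, the C++20 facility alluded to above is `std::views::iota` in `<ranges>`. A hedged sketch follows; the function name `sum_range` is made up for illustration, and the preprocessor guard is there only so that the snippet still builds with a pre-C++20 toolchain by falling back to a plain loop.

```cpp
#if defined(__has_include)
#  if __has_include(<ranges>)
#    include <ranges>
#  endif
#endif

// Hypothetical helper: sums the integers in [first, last) without
// materializing them in a container. Assumes first <= last.
long long sum_range(int first, int last)
{
    long long total = 0;
#if defined(__cpp_lib_ranges)
    for (int i : std::views::iota(first, last))  // lazy sequence, C++20
        total += i;
#else
    for (int i = first; i < last; ++i)           // pre-C++20 fallback
        total += i;
#endif
    return total;
}
```

The iterators of such a view can also be passed to the iterator-based standard algorithms, which is what the original question asked for.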

Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 08:30PM +0100

You could use an iterator which wraps a normal iterator and includes
a gap-value. Something like this (don't know if everything is correct):

#include <iterator>

template<typename RandIt>
struct gap_iterator
{
using value_type = typename std::iterator_traits<RandIt>::value_type;
gap_iterator() = default;
gap_iterator( RandIt it, std::size_t gap );
gap_iterator( gap_iterator const &other );
gap_iterator &operator =( gap_iterator const &rhs );
bool operator !=( gap_iterator const &other );
value_type &operator *();
value_type *operator ->();
gap_iterator &operator ++();
gap_iterator &operator +=( std::size_t offset );

private:
RandIt m_it;
size_t m_gap;
};

template<typename RandIt>
inline
gap_iterator<RandIt>::gap_iterator( RandIt it, std::size_t gap )
{
m_it = it;
m_gap = gap;
}

template<typename RandIt>
inline
gap_iterator<RandIt>::gap_iterator( gap_iterator const &other )
{
m_it = other.m_it;
m_gap = other.m_gap;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::gap_iterator
&gap_iterator<RandIt>::operator =( gap_iterator const &rhs )
{
m_it = rhs.m_it;
m_gap = rhs.m_gap;
return *this;
}

template<typename RandIt>
inline
bool gap_iterator<RandIt>::operator !=( gap_iterator const &rhs )
{
return m_it != rhs.m_it;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::value_type
&gap_iterator<RandIt>::operator *()
{
return *m_it;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::value_type
*gap_iterator<RandIt>::operator ->()
{
return &*m_it;
}

template<typename RandIt>
inline
gap_iterator<RandIt> &gap_iterator<RandIt>::operator ++()
{
m_it += m_gap;
return *this;
}

template<typename RandIt>
inline
gap_iterator<RandIt> &gap_iterator<RandIt>::operator +=( std::size_t
offset )
{
m_it += offset * m_gap;
return *this;
}

#include <iostream>
#include <algorithm>
#include <cstdlib>

using namespace std;

int main()
{
int ai[16 * 16];
gap_iterator<int *> giiBegin( ai, 16 ),
giiEnd( giiBegin );
giiEnd += 16;
for_each( giiBegin, giiEnd, []( int &v ) { v = rand(); } );
for_each( giiBegin, giiEnd, []( int v ) { cout << v << " "; } );
}

Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 08:46PM +0100

Why does this changed version not work ?

#include <iterator>

template<typename RandIt>
struct gap_iterator
{
using value_type = typename std::iterator_traits<RandIt>::value_type;
gap_iterator() = default;
gap_iterator( RandIt it, std::size_t gap );
gap_iterator( gap_iterator const &other );
gap_iterator &operator =( gap_iterator const &rhs );
bool operator !=( gap_iterator const &other );
value_type &operator *();
value_type *operator ->();
gap_iterator &operator ++();
gap_iterator &operator +=( std::size_t offset );
friend gap_iterator operator +( gap_iterator const &gi, std::size_t
offset );

private:
RandIt m_it;
size_t m_gap;
};

template<typename RandIt>
inline
gap_iterator<RandIt>::gap_iterator( RandIt it, std::size_t gap )
{
m_it = it;
m_gap = gap;
}

template<typename RandIt>
inline
gap_iterator<RandIt>::gap_iterator( gap_iterator const &other )
{
m_it = other.m_it;
m_gap = other.m_gap;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::gap_iterator
&gap_iterator<RandIt>::operator =( gap_iterator const &rhs )
{
m_it = rhs.m_it;
m_gap = rhs.m_gap;
return *this;
}

template<typename RandIt>
inline
bool gap_iterator<RandIt>::operator !=( gap_iterator const &rhs )
{
return m_it != rhs.m_it;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::value_type
&gap_iterator<RandIt>::operator *()
{
return *m_it;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::value_type
*gap_iterator<RandIt>::operator ->()
{
return &*m_it;
}

template<typename RandIt>
inline
gap_iterator<RandIt> &gap_iterator<RandIt>::operator ++()
{
m_it += m_gap;
return *this;
}

template<typename RandIt>
inline
gap_iterator<RandIt> &gap_iterator<RandIt>::operator +=( std::size_t
offset )
{
m_it += offset * m_gap;
return *this;
}

template<typename RandIt>
inline
gap_iterator<RandIt> operator +( gap_iterator<RandIt> const &gi,
std::size_t offset )
{
gap_iterator<RandIt> ret( gi );
gi.m_it += offset * gi.m_gap;
return ret;
}

#include <iostream>
#include <algorithm>
#include <cstdlib>

using namespace std;

int main()
{
int ai[16 * 16];
gap_iterator<int *> giiBegin( ai, 16 ),
giiEnd( giiBegin + 16 );
//giiEnd += 16;
for_each( giiBegin, giiEnd, []( int &v ) { v = rand(); } );
for_each( giiBegin, giiEnd, []( int v ) { cout << v << " "; } );
}

Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 08:47PM +0100

> gi.m_it += offset * gi.m_gap;
ret.m_it += offset * ret.m_gap;

Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 08:49PM +0100

On 28.03.2020 at 20:47, Bonita Montero wrote:
>> gi.m_it += offset * gi.m_gap;
> ret.m_it += offset * ret.m_gap;
ret.m_it = ret.m_it + offset * ret.m_gap;
The contained iterator might not know +=.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 08:58PM +0100

Got it:

#include <iterator>

template<typename RandIt>
struct gap_iterator
{
using value_type = typename std::iterator_traits<RandIt>::value_type;
gap_iterator() = default;
gap_iterator( RandIt it, std::size_t gap );
gap_iterator( gap_iterator const &other );
gap_iterator &operator =( gap_iterator const &rhs );
bool operator !=( gap_iterator const &other );
value_type &operator *();
value_type *operator ->();
gap_iterator &operator ++();
gap_iterator &operator +=( std::size_t offset );
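// Distinct parameter name: re-using RandIt here would shadow the class
// template's own parameter, which standard C++ does not allow.
template<typename RandIt2>
friend gap_iterator<RandIt2> operator +( gap_iterator<RandIt2> const
&gi, std::size_t offset );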

private:
RandIt m_it;
size_t m_gap;
};

template<typename RandIt>
inline
gap_iterator<RandIt>::gap_iterator( RandIt it, std::size_t gap )
{
m_it = it;
m_gap = gap;
}

template<typename RandIt>
inline
gap_iterator<RandIt>::gap_iterator( gap_iterator const &other )
{
m_it = other.m_it;
m_gap = other.m_gap;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::gap_iterator
&gap_iterator<RandIt>::operator =( gap_iterator const &rhs )
{
m_it = rhs.m_it;
m_gap = rhs.m_gap;
return *this;
}

template<typename RandIt>
inline
bool gap_iterator<RandIt>::operator !=( gap_iterator const &rhs )
{
return m_it != rhs.m_it;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::value_type
&gap_iterator<RandIt>::operator *()
{
return *m_it;
}

template<typename RandIt>
inline
typename gap_iterator<RandIt>::value_type
*gap_iterator<RandIt>::operator ->()
{
return &*m_it;
}

template<typename RandIt>
inline
gap_iterator<RandIt> &gap_iterator<RandIt>::operator ++()
{
m_it += m_gap;
return *this;
}

template<typename RandIt>
inline
gap_iterator<RandIt> &gap_iterator<RandIt>::operator +=( std::size_t
offset )
{
m_it += offset * m_gap;
return *this;
}

template<typename RandIt>
inline
gap_iterator<RandIt> operator +( gap_iterator<RandIt> const &gi,
std::size_t offset )
{
gap_iterator<RandIt> ret( gi );
ret.m_it = ret.m_it + offset * ret.m_gap;
return ret;
}

#include <iostream>
#include <algorithm>
#include <cstdlib>

using namespace std;

int main()
{
int ai[16 * 16];
gap_iterator<int *> giiBegin( ai, 16 ),
giiEnd( giiBegin + 16 );
//giiEnd += 16;
for_each( giiBegin, giiEnd, []( int &v ) { v = rand(); } );
for_each( giiBegin, giiEnd, []( int v ) { cout << v << " "; } );
}
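
One way to sidestep the friend-template declaration entirely is to define `operator +` inside the class as a so-called hidden friend, built on top of `operator +=`. The sketch below is a reduced variant for illustration, not Bonita's code: the name `strided_iterator` and its member set are assumptions, and only a handful of operations are provided.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical reduced iterator: steps through a range `gap` elements at a time.
template<typename RandIt>
struct strided_iterator
{
    using reference = typename std::iterator_traits<RandIt>::reference;

    strided_iterator(RandIt it, std::size_t gap) : m_it(it), m_gap(gap) {}

    reference operator *() const { return *m_it; }
    strided_iterator &operator ++() { m_it += m_gap; return *this; }
    strided_iterator &operator +=(std::size_t offset)
    {
        m_it += offset * m_gap;
        return *this;
    }
    bool operator ==(strided_iterator const &rhs) const { return m_it == rhs.m_it; }
    bool operator !=(strided_iterator const &rhs) const { return m_it != rhs.m_it; }

    // Hidden friend: defined in the class body, so each instantiation gets
    // its own non-template function, found by argument-dependent lookup.
    friend strided_iterator operator +(strided_iterator lhs, std::size_t offset)
    {
        lhs += offset;  // lhs is a by-value copy, so the argument is untouched
        return lhs;
    }

private:
    RandIt m_it;
    std::size_t m_gap;
};
```

With this shape, `strided_iterator<int *> last = first + 16;` works without any separate friend declaration, because the operator has access to the private members by virtue of being defined inside the class.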

Some kind of unwanted optimization going on here ??? And if so, how do I prevent it ???

"Öö Tiib" <ootiib@hot.ee>: Mar 28 05:29PM -0700

On Sunday, 29 March 2020 00:25:27 UTC+2, James Kuyper wrote:

> n3797.pdf, dated 2013-10-13, drops that item from the list.

> So it would appear that the change you're complaining about took place
> about 7-8 years ago.

I am really not complaining about that since my constexpr functions
don't throw ... it is just another inconvenience. Are these now
ill-formed constant expressions instead of being non-constant
expressions?

I am complaining that constant expression is not detectable with code.
So I must keep track if something evaluates to constant expression or
not as monkey-coder; I can't make code that does it automatically.

The useful requirement that noexcept return true for calls that are
constant expressions was removed. Things that I do not see much use
for in their current form were added: std::is_constant_evaluated,
constinit and consteval.

David Brown <david.brown@hesbynett.no>: Mar 29 10:24AM +0200

On 28/03/2020 20:42, Öö Tiib wrote:
> constant expression". So if foo(43) is not constant expression but
> foo(42) is then noexcept(foo(43)) should be false and noexcept(foo(42))
> true.

As I said - noexcept() has never been about detecting constant expressions.

I understand that you think it's odd that a constant expression can
return "false" for noexcept. But that is beside the point.

You claimed the change to noexcept() broke the tool for detecting
constant expressions. I was hoping for a justification for that, rather
than another side-track.

David Brown <david.brown@hesbynett.no>: Mar 29 10:31AM +0200

On 28/03/2020 21:16, Öö Tiib wrote:
>> compile-time programming.

> It does not work and so you are under some kind of illusion from
> worm-tongue-wording of standard.

I haven't read the standard here yet.

But I must admit I assumed that std::is_constant_evaluated() was
basically a standardisation of the useful gcc extension
__builtin_constant_p(...). Unfortunately, it is not.

"Öö Tiib" <ootiib@hot.ee>: Mar 29 01:47AM -0700

On Sunday, 29 March 2020 11:24:47 UTC+3, David Brown wrote:
> > foo(42) is then noexcept(foo(43)) should be false and noexcept(foo(42))
> > true.

> As I said - noexcept() has never been about detecting constant expressions.

But C++ has never been what you said but it is always been what standard
said. And C++14 said that noexcept operator should detect constant
expressions:

The result of the noexcept operator is false if in a
potentially-evaluated context the expression would contain

- a potentially-evaluated call to a function, member function,
function pointer, or member function pointer that does not have
a non-throwing exception-specification ([except.spec]),
unless the call is a constant expression ([expr.const]),

> I understand that you think it's odd that a constant expression can
> return "false" for noexcept. But that is beside the point.

You misrepresent. I do think it's odd that such change was made
and hastily implemented in all compilers with retroactive effect in
C++11 and C++14 modes too. Despite in C++11 and C++14 the noexcept
operator was required to detect constant expressions.

> You claimed the change to noexcept() broke the tool for detecting
> constant expressions. I was hoping for a justification for that, rather
> than another side-track.

I do not even understand what you expect. Some kind of philosophical
debate? Facts are there. I do not know justifications to those facts.

"Öö Tiib" <ootiib@hot.ee>: Mar 29 02:08AM -0700

On Sunday, 29 March 2020 11:31:29 UTC+3, David Brown wrote:

> But I must admit I assumed that std::is_constant_evaluated() was
> basically a standardisation of the useful gcc extension
> __builtin_constant_p(...). Unfortunately, it is not.

Also __builtin_constant_p does not work. Only noexcept did but stopped.
Linus Torvalds trying to figure fixes to __builtin_constant_p :
https://lkml.org/lkml/2018/3/17/184

David Brown <david.brown@hesbynett.no>: Mar 29 11:38AM +0200

On 29/03/2020 10:47, Öö Tiib wrote:

> But C++ has never been what you said but it is always been what standard
> said. And C++14 said that noexcept operator should detect constant
> expressions:

No!

C++14 and earlier said constant expressions were /part/ of the logic in
determining the result of the noexcept operator. That does not mean, as
you have been claiming, that noexcept is a tool for determining if
something is a constant expression.

> and hastily implemented in all compilers with retroactive effect in
> C++11 and C++14 modes too. Despite in C++11 and C++14 the noexcept
> operator was required to detect constant expressions.

I guess the prevailing opinion (of those who know about these things,
unlike me) is that the earlier behaviour was a mistake.

>> than another side-track.

> I do not even understand what you expect. Some kind of philosophical
> debate? Facts are there. I do not know justifications to those facts.

Neither the facts, the standards (earlier or later versions), nor what
you have now written support the idea that noexcept() was a way to
identify constant expressions, or that the changes broke such methods,
or that the changes were made maliciously, callously, or without careful
consideration.

David Brown <david.brown@hesbynett.no>: Mar 29 11:48AM +0200

On 29/03/2020 11:08, Öö Tiib wrote:

> Also __builtin_constant_p does not work. Only noexcept did but stopped.
> Linus Torvalds trying to figure fixes to __builtin_constant_p :
> https://lkml.org/lkml/2018/3/17/184

What do you mean, "__builtin_constant_p does not work" ? It works
perfectly well, and I have used it - as have many people. It might not
do what you think in all cases, such as the one Torvalds is describing -
but see the follow-ups for an explanation of that.

Perhaps what you mean is "It does not work for my particular case, while
an abuse of noexcept() happened to work. But I won't tell anyone the
details."

"Öö Tiib" <ootiib@hot.ee>: Mar 29 03:27AM -0700

On Sunday, 29 March 2020 12:39:01 UTC+3, David Brown wrote:
> determining the result of the noexcept operator. That does not mean, as
> you have been claiming, that noexcept is a tool for determining if
> something is a constant expression.

It was the only thing in the standard that was required to detect
whether something is a constant expression, without making anything
ill-formed in doing so, and that yielded a constant expression itself.
Nothing else (including the non-standard __builtin_constant_p) had
those properties. So it was useful as such a tool, but now it isn't.

> > operator was required to detect constant expressions.

> I guess the prevailing opinion (of those who know about these things,
> unlike me) is that the earlier behaviour was a mistake.

It is void argument to me. I have just my work to do and I
want to do it well. You post zero code how I should do it now,
just popular votes and alleged opinions of your "authorities".

> identify constant expressions, or that the changes broke such methods,
> or that the changes were made maliciously, callously, or without careful
> consideration.

I would probably be "fixed" too by now if I had direct evidence about
malice. Forces that can enforce retroactive "fixes" in all compilers
have enough resources to "fix" people too. But mine is just opinion
based on facts there are.

"Öö Tiib" <ootiib@hot.ee>: Mar 29 03:33AM -0700

On Sunday, 29 March 2020 12:48:51 UTC+3, David Brown wrote:

> Perhaps what you mean is "It does not work for my particular case, while
> an abuse of noexcept() happened to work. But I won't tell anyone the
> details."

You unlike me are posting zero code, quotes and/or cites. So I feel your
posture that I am withholding details insulting.

David Brown <david.brown@hesbynett.no>: Mar 29 01:56PM +0200

On 29/03/2020 12:33, Öö Tiib wrote:
>> details."

> You unlike me are posting zero code, quotes and/or cites. So I feel your
> posture that I am withholding details insulting.

I am not posting code, because I don't have a problem with any code.
This whole discussion started with /your/ claim that C++20 breaks
volatile and makes it impossible to write low-level code that uses it.
What would you like me to post - code that has a volatile variable and
writes to it?

#include <stdint.h>

#define output_register *((volatile uint32_t *) 0x1234)

void set_output(uint32_t x) {
output_register = x;
}

That works fine in every C and C++ standard.

I don't have any sample code with noexcept, because I disable exceptions
in my embedded C++ code. (Yes, I know that is a non-conforming
extension in my compiler.)

/You/ are the one with complaints about the standards and how changes
break your code. /You/ are the one that said C++20 changes volatile and
breaks your code - the onus is on /you/ to post code that shows this
effect. /You/ are the one that said C++17 changes noexcept and breaks
your code (though I still haven't a clue how that relates to your
complaint about "volatile").

I can't give any code snippets or references to problems with volatile
or noexcept, because I don't have any problems. I can't give any code
snippets or references regarding /your/ problems with volatile and
noexcept, because you won't say what they are or show examples. You
have given some references, but no indication as to why they are relevant.

If you want to warn people about standards changes that break real,
useful code, then post some examples so that we can learn.

If you want to get help making your code robust in the face of changes,
post some samples so that we can help. (I am not guaranteeing help, and
I am sure others would be more helpful than I could be - but at least
people can try.)

If you want to complain about the changes causing you trouble, post some
examples so that we can see the problem and sympathise, and possibly
make useful guesses as to the reason behind the changes. We all know
the C++ committee and compiler writers are mere humans and make mistakes.

If you want to convince people that you have misunderstood the changes,
or that you have been writing weird code that merely happens to work by
luck, then continue posting red-herrings, mixing up topics, claiming
malice and incompetence on the part of the committees and compiler
writers, and failing to answer direct questions or show samples. If
that's the way you want to play it, I am done here - I can't see any way
to make progress.

James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 28 01:22PM -0400

On 3/28/20 11:44 AM, David Brown wrote:
> On 27/03/2020 19:52, Öö Tiib wrote:
...

> (This is based on my understanding and reasoning, rather than the
> standards - frankly, I haven't managed to get my head around the
> standards here as yet.)

"The predicate indicating whether a function cannot exit via an
exception is called the exception specification of the function. If the
predicate is false, the function has a potentially-throwing exception
specification, otherwise it has a non-throwing exception specification.
The exception specification is either defined implicitly, or defined
explicitly by using a noexcept-specifier as a suffix of a function
declarator (11.3.5).
noexcept-specifier:
noexcept ( constant-expression )
noexcept
throw ( )"
(18.4p1).

Note that the phrases "exception specification", "potentially-throwing
exception specification" and "non-throwing exception specification" are
all italicized, an ISO convention indicating that this clause
constitutes the official definition of those terms.

Therefore, the value of a noexcept() expression is determined entirely
by the exception specification of the function. The body of the function
plays no role in determining that value, not even if it's declared
inline in the same translation unit.
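
A small sketch of that point, under the rules as quoted above; the function names `plain` and `marked` are invented for illustration. Compilers that still implement the older C++11/C++14 wording may report a different result for `plain`, so no particular value is asserted for it here.

```cpp
// Same body, different exception specifications.
constexpr int plain(int x) { return x + 1; }            // potentially-throwing
constexpr int marked(int x) noexcept { return x + 1; }  // non-throwing

// The declaration alone decides this result.
static_assert(noexcept(marked(42)), "marked() is declared noexcept");

// Result of the noexcept operator for a call to plain(). Under the wording
// quoted above this depends only on the (implicit) exception specification,
// not on the body or on whether plain(42) is a constant expression.
constexpr bool plain_is_noexcept = noexcept(plain(42));
```

Printing `plain_is_noexcept` with a given compiler and `-std=` setting is a quick way to see which behaviour that toolchain implements.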
...
> consider it imaginary. (I believe that this is due to your
> misunderstandings about what the changes to volatile in C++20 mean, not
> because I think you are being deliberately awkward.)

That's why we both keep asking him to provide examples - so we can say
to him "such code will not be deprecated in C++2020".

"Öö Tiib" <ootiib@hot.ee>: Mar 29 05:30AM -0700

On Sunday, 29 March 2020 14:56:31 UTC+3, David Brown wrote:

> > You unlike me are posting zero code, quotes and/or cites. So I feel your
> > posture that I am withholding details insulting.

> I am not posting code, because I don't have a problem with any code.

So when I am posting code that stopped working for me then you reply
that it was weird trash anyway that merely worked by luck while it
was clearly mandated by C++14 standard to work. You do not post where
I supposedly misinterpreted standard nor advice how should I write
the code instead nor any explanations why it was retroactively
"fixed" in all compilers. Just insults. What "help" is that?

David Brown <david.brown@hesbynett.no>: Mar 28 08:35PM +0100

On 28/03/2020 18:42, Öö Tiib wrote:
> So fine, I am just stupid and silly when I complain that our tools were
> clearly and deliberately damaged. Damaged with unprecedented retroactive
> effect in C++11 and C++14 modes as well.

noexcept() has never been about "detecting constant expressions"!

Until C++20, there was no way of detecting constant expressions. Now
there is "is_constant_evaluated", constinit and consteval. To my mind,
C++20 has significantly improved the scope for constant expressions and
compile-time programming.

> possibility to detect volatile this with overload resolution.
> You have technically already rejected usefulness of it with your
> praise of changes and don't see value in it.

I have asked you repeatedly to show an example that /you/ think is good
code design, and which relies on a feature of volatile that is (or which
you think is) deprecated in C++20. I have not asked you to show code
you think /I/ will like - that would be entirely unreasonable.

> constant expressions weren't made detectable with some
> std::is_constexpr(foo(42)) but it wasn't needed as there was possible
> to write it using noexcept.

You are conflating three completely separate concepts here. noexcept()
has nothing to do with detecting constant expressions. And neither of
these is remotely connected with volatile or the changes to volatile in
C++20.

I am still unsure as to whether you have misunderstood something in
C++20 (or perhaps in an earlier version), or if you have a point which
you are failing to demonstrate. Until we get an example, no one can tell.

James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 28 03:38PM -0400

On 3/28/20 1:42 PM, Öö Tiib wrote:
> On Saturday, 28 March 2020 17:45:09 UTC+2, David Brown wrote:
...
>> misunderstandings about what the changes to volatile in C++20 mean, not
>> because I think you are being deliberately awkward.)

> The change deprecates volatile qualified member functions and so

P1152R0 mentions deprecating volatile-qualified member functions. The
notes concerning the changes between r0 and r1 list the results of votes
on several different issues involving volatile-qualified member
functions, all but one of which was approved. However, in r4, I can find
no other mention of deprecating them. Can you point me to the clause (in
P1152R4) where that occurs?

James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 28 03:54PM -0400

On 3/28/20 3:42 PM, Öö Tiib wrote:
> constant expression". So if foo(43) is not constant expression but
> foo(42) is then noexcept(foo(43)) should be false and noexcept(foo(42))
> true.

A call to a constexpr function whose definition has been provided, has
defined behavior, and which meets the requirements for a constexpr
function (10.1.5) doesn't match any of the items in the list under
8.20p2 - as such, it can never fail to qualify as a constant expression,
regardless of which value is passed to the function.
