Monday, January 29, 2024

Digest for comp.lang.c++@googlegroups.com - 4 updates in 3 topics

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 29 02:45PM -0800

On 1/27/2024 1:05 PM, Chris M. Thomasson wrote:
>> #include "thread_pool.h"
>> #include "invoke_on_destruct.h"
 
>> #if defined(_WIN32)
^^^^^^^^^^^^^
 
Why use _WIN32 here to disable all of those important warnings?
 
_MSC_VER instead?
 
Also, why disable all of those warnings in MSVC?
 
 
Bonita Montero <Bonita.Montero@gmail.com>: Jan 29 09:56AM +0100

Am 28.01.2024 um 20:18 schrieb Chris M. Thomasson:
 
> Try padding and aligning the blocks. iirc, std::vector works with
> alignas. Actually, it's pretty nice.
 
I'm testing all 64 offsets. If offset zero becomes physically offset
one in the cacheline doesn't matter since physical offset zero would
then be occupied by logical offset 63.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 29 02:12PM -0800

On 1/29/2024 12:56 AM, Bonita Montero wrote:
 
> I'm testing all 64 offsets. If offset zero becomes physically offset
> one in the cacheline doesn't matter since physical offset zero would
> then be occupied by logical offset 63.
 
You don't want to straddle any cache lines. Properly aligning and
padding the blocks gets around that...
Vir Campestris <vir.campestris@invalid.invalid>: Jan 29 09:31PM

I retired last year, and I haven't really written any code since. This
has turned out to be quite a fun little thing of a type I haven't had
time for for YEARS. And oddly I still don't seem to have enough time for
it... It's the garden, and the kid's gardens, and my mum's garden, and
all those holidays :)
 
But some optimisations. You'll remember in Bonita's first version the
bitmap was initialised to 0xaaaa, because it's a waste of time doing the
sieve for 2.
 
I pointed out that we don't need to even store the even numbers.
 
But there's more.
 
If you look at the bitmap when you've sieved for 2 you see
 
12 34 56 78
11 10 10 10...
which is a repeat of 2 after an initialisation word. That's the aaaa.
 
You can do the same with 3
 
123 456 789
111 110 110 110
 
except this time the repeat is 3. And annoyingly that doesn't map well
down onto a byte-based architecture. You end up with an initial word,
then a 3-word repeat. (If your word was 24 or 36 bytes it would only be
1 word, but I haven't seen that since the 1970s)
 
In hex, with lowest byte first, that is
fd b6 6d db b6 6d db
 
(That BTW is the same if you only store the odd numbers - a 3-word repeat.)
 
So rather than start off with your bitmap all set to 1s you can set it
to this repeating pattern. That replaces all the ANDs for all the values
of three with a memory fill with far fewer accesses.
 
You can do the same with 5:
 
12345 6789a bcdef
11111 11110 11110 11110
 
You can then AND your pattern for 3 with the one for 5, and get one with
a repeat length of 15, and set that into your bitmap. You've now
replaced about a fifth of all your AND operations with a flood fill.
 
This can carry on - for a while. Only a short while. You _can_ make a
sequence for lots of primes. But it gets quite long, quite quickly. For
up to 23, and not storing the evens, it's over 1E8 words long!
 
I was implementing a version of that when something else occurred to me.
You can sacrifice speed for store size if you're prepared to do an
integer divide for every prime lookup.
 
Group our numbers into 6s like this
 
012345
6789ab
cdefgh (YKWIM)
ijklmn
 
Mask out the ones divisible by 2 or 3 and you see
 
0123-5
-7---b
-d---h
-i---n
 
Except for the first "page" the pattern is identical. Your algorithm can be:
Divide your candidate by 6.
If the result is zero look up in the page zero table 0123-5
If it is non-zero index into a translation table like this
010002
If the translation is zero the number is not prime.
If it is not zero it is the index of the bit for this page which tells
me if the number is prime. And there need only be 2 bits for 6 numbers.
 
(6 is the product of the first primes 2 and 3)
 
It's still worth masking out the even numbers - they don't need the
divide, a simple AND detects them - and I suspect it's worth using a
much larger number than 6. 3*5*11*13 is 15,015, which seems as though it
might be a convenient size. And only about a third of the number aren't
known to be primes (5760 to be exact)
 
I might play with that idea some time.
 
But a note for the group of course - optimising this to the max has
nothing to do whatever with C++. The only C++ optimising I've found
myself doing is using raw pointers, not vector's operator[]. (certainly
not the at function). And also I found myself using emplace_back a lot.
It's a PITA because you can only emplace back a single item, and it is slow.
 
Andy
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

No comments: