Sunday, December 24, 2023

Digest for comp.lang.c++@googlegroups.com - 6 updates in 2 topics

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Dec 24 03:53PM -0500

Ben Bacarisse wrote:
 
> but not this one:
 
> switch (argc)
> if (1) case 1: std::cout << 1 << "\n";
I think "if (1)" *is* equivalent to no-op here or, in AT's terms, cannot
"potentially generate executable instructions". I would say "there is no
code to reach hence there is no switch-unreachable warning".
Ben Bacarisse <ben.usenet@bsb.me.uk>: Dec 24 10:21PM

> I think "if (1)" *is* equivalent to no-op here or, in AT's terms, cannot
> "potentially generate executable instructions". I would say "there is no
> code to reach hence there is no switch-unreachable warning".
 
Sounds reasonable.
 
--
Ben.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Dec 24 12:36AM -0800

>> that gives a 25% reduction in space compared to a 6n+/-1 scheme.
 
> I found that on my system the modulo operation was so slow this wasn't
> worth doing.
 
Depending on how the code is written, no modulo operations need
to be done, because they will be optimized away and done at
compile time. If we look at multiplying two numbers represented
by bits in bytes i and j, the two numbers are
 
i*30 + a
j*30 + b
 
for some a and b in { 1, 7, 11, 13, 17, 19, 23, 29 }.
 
The values we're interested in are the index of the product and
the residue of the product, namely
 
(i*30+a) * (j*30+b) / 30 (for the index)
(i*30+a) * (j*30+b) % 30 (for the residue)
 
Any term with a *30 in the numerator doesn't contribute to
the residue, and also can be combined with the by-30 divide
for computing the index. Thus these expressions can be
rewritten as
 
i*30*j + i*b + j*a + (a*b/30) (for the index)
a*b%30 (for the residue)
 
When a and b have values that are known at compile time,
neither the divide nor the remainder results in run-time
operations being done; all of that heavy lifting is
optimized away and done at compile time. Of course there
are some multiplies, but they are cheaper than divides, and
also can be done in parallel. (The multiplication a*b also
can be done at compile time.)
 
The residue needs to be turned into a bit mask to do the
logical operation on the byte of bits, but here again that
computation can be optimized away and done at compile time.
 
Does that all make sense?
Bonita Montero <Bonita.Montero@gmail.com>: Dec 24 11:03AM +0100

On 23.12.2023 at 21:52, Chris M. Thomasson wrote:
 
>> and the end of the thread-local segment overlaps with other thread; but
>> I do it anyway to have maximum performance.
 
> That is a good habit to get into. :^)
 
I experimentally removed the masking of the lower bits of the
partition bounds according to the cacheline size and there was
no measurable performance-loss.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Dec 24 10:49AM -0800

>> represented).
 
> Primes up to 1e9.
 
> I have another idea though, watch this space...
 
Does your system have enough memory to compute all the primes up
to 24e9? If it does I suggest that for your next milestone.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 24 01:24PM -0800

On 12/24/2023 2:03 AM, Bonita Montero wrote:
 
> I experimentally removed the masking of the lower bits of the
> partition bounds according to the cacheline size and there was
> no measurable performance-loss.
 
Still, imvvho, it _is_ a good practice to get into wrt padding and
aligning...