- Unglaublich, dass das funktioniert ("Unbelievable that this works") - 2 Updates
- Sieve of Eratosthenes optimized to the max - 4 Updates
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Dec 24 03:53PM -0500

Ben Bacarisse wrote:
> but not this one:
> switch (argc)
> if (1) case 1: std::cout << 1 << "\n";

I think "if (1)" *is* equivalent to no-op here or, in AT's terms, cannot "potentially generate executable instructions". I would say "there is no code to reach hence there is no switch-unreachable warning".
Ben Bacarisse <ben.usenet@bsb.me.uk>: Dec 24 10:21PM

> I think "if (1)" *is* equivalent to no-op here or, in AT's terms, cannot
> "potentially generate executable instructions". I would say "there is no
> code to reach hence there is no switch-unreachable warning".

Sounds reasonable.

--
Ben.
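The construct under discussion is legal C++: a case label may appear anywhere inside the statement controlled by a switch, including inside the body of an if. A minimal sketch of the same shape (the function name classify is hypothetical, and it returns a value instead of printing) shows what happens at run time:

```cpp
// Returns 1 when n == 1 and 0 otherwise. The entire switch body is the
// single statement `if (1) case 1: r = 1;`. For n == 1, control jumps
// straight to the `case 1:` label, so the `if (1)` condition is never
// evaluated. For any other n there is no matching label and no default,
// so the whole body is skipped.
int classify(int n) {
    int r = 0;
    switch (n)
        if (1) case 1: r = 1;
    return r;
}
```

So classify(1) yields 1 and classify(0) yields 0. For comparison, GCC documents -Wswitch-unreachable as warning about statements placed between the controlling expression and the first case label; whether "if (1)" counts as such a statement is the point Pavel and Ben settle on above.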
Tim Rentsch <tr.17687@z991.linuxsc.com>: Dec 24 12:36AM -0800

>> that gives a 25% reduction in space compared to a 6n+/-1 scheme.
> I found that on my system the modulo operation was so slow this wasn't
> worth doing.

Depending on how the code is written, no modulo operations need to be done, because they will be optimized away and done at compile time.

If we look at multiplying two numbers represented by bits in bytes i and j, the two numbers are

    i*30 + a
    j*30 + b

for some a and b in { 1, 7, 11, 13, 17, 19, 23, 29 }. The values we're interested in are the index of the product and the residue of the product, namely

    (i*30+a) * (j*30+b) / 30    (for the index)
    (i*30+a) * (j*30+b) % 30    (for the residue)

Any term with a *30 in the numerator doesn't contribute to the residue, and also can be combined with the by-30 divide for computing the index. Thus these expressions can be rewritten as

    i*30*j + i*b + j*a + (a*b/30)    (for the index)
    a*b%30                           (for the residue)

When a and b have values that are known at compile time, neither the divide nor the remainder results in run-time operations being done; all of that heavy lifting is optimized away and done at compile time. Of course there are some multiplies, but they are cheaper than divides, and also can be done in parallel. (The multiplication a*b also can be done at compile time.)

The residue needs to be turned into a bit mask to do the logical operation on the byte of bits, but here again that computation can be optimized away and done at compile time.

Does that all make sense?
Bonita Montero <Bonita.Montero@gmail.com>: Dec 24 11:03AM +0100

On 23.12.2023 at 21:52, Chris M. Thomasson wrote:
>> and the end of the thread-local segment overlaps with other thread; but
>> I do it anyway to have maximum performance.
> That is a good habit to get into. :^)

I experimentally removed the masking of the lower bits of the partition bounds according to the cacheline size and there was no measurable performance-loss.
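To make this exchange concrete, here is a minimal sketch of what masking the lower bits of the partition bounds can look like. It assumes a byte bitmap split into contiguous per-thread ranges and a 64-byte cache line; Bonita's actual code is not shown in the digest, and partition_bounds and kCacheLine are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache-line size. 64 bytes is common but not universal; C++17's
// std::hardware_destructive_interference_size (<new>) is an alternative
// where the standard library provides it.
constexpr std::size_t kCacheLine = 64;
static_assert((kCacheLine & (kCacheLine - 1)) == 0,
              "masking the low bits only works for a power of two");

// Splits a bitmap of `bytes` bytes among `threads` threads. Interior
// bounds are rounded down to a multiple of kCacheLine by masking off
// their low bits, so (provided the bitmap itself starts on a line
// boundary) no cache line is written by two threads. Returns threads + 1
// bounds; thread t owns [bounds[t], bounds[t + 1]).
std::vector<std::size_t> partition_bounds(std::size_t bytes, std::size_t threads) {
    assert(threads > 0);
    std::vector<std::size_t> bounds(threads + 1);
    for (std::size_t t = 0; t < threads; ++t) {
        // Equal to bytes * t / threads, but without overflowing bytes * t.
        std::size_t even = bytes / threads * t + bytes % threads * t / threads;
        bounds[t] = even & ~(kCacheLine - 1);  // align down to a line
    }
    bounds[threads] = bytes;  // the last range always ends at the real end
    return bounds;
}
```

For 1000 bytes and 4 threads this gives bounds 0, 192, 448, 704, 1000 instead of 0, 250, 500, 750, 1000. Without the mask, only the lines straddling each seam can be falsely shared, so with large per-thread ranges the cost is plausibly small, which would be consistent with Bonita's measurement; the mask removes that sharing entirely, which is the habit Chris is recommending.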
Tim Rentsch <tr.17687@z991.linuxsc.com>: Dec 24 10:49AM -0800

>> represented).
> Primes up to 1e9.
> I have another idea though, watch this space...

Does your system have enough memory to compute all the primes up to 24e9? If it does, I suggest that for your next milestone.
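For a rough sense of scale, the sketch below estimates the bitmap size for 24e9 under a plain bitmap, an odds-only bitmap, and the 6n+/-1 and mod-30 schemes mentioned earlier. It assumes one bit per stored candidate and the whole bitmap held in memory at once (a segmented sieve would need far less); sieve_bytes is a hypothetical helper, and the static_asserts check the arithmetic at compile time.

```cpp
#include <cstdint>

// Bytes needed for a bitmap covering [0, limit) when `kept` of every
// `period` consecutive integers get one bit each.
constexpr std::uint64_t sieve_bytes(std::uint64_t limit, std::uint64_t kept,
                                    std::uint64_t period) {
    std::uint64_t periods = (limit + period - 1) / period;  // round up
    std::uint64_t bits = periods * kept;
    return (bits + 7) / 8;  // whole bytes
}

constexpr std::uint64_t kLimit = 24'000'000'000ULL;

static_assert(sieve_bytes(kLimit, 1, 1) == 3'000'000'000ULL, "plain bitmap");
static_assert(sieve_bytes(kLimit, 1, 2) == 1'500'000'000ULL, "odds only");
static_assert(sieve_bytes(kLimit, 2, 6) == 1'000'000'000ULL, "6n+/-1");
static_assert(sieve_bytes(kLimit, 8, 30) == 800'000'000ULL, "mod 30");
```

Under these assumptions the mod-30 layout stores exactly one byte per 30 integers, 800,000,000 bytes for 24e9, against 10 bits per 30 integers (1,000,000,000 bytes) for 6n+/-1.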
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 24 01:24PM -0800

On 12/24/2023 2:03 AM, Bonita Montero wrote:
> I experimentally removed the masking of the lower bits of the
> partition bounds according to the cacheline size and there was
> no measurable performance-loss.

Still, imvvho, it _is_ a good practice to get into wrt padding and aligning...
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.