- Sieve of Eratosthenes optimized to the max - 6 Updates
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 05 07:21PM -0800 On 1/3/2024 7:37 PM, Bonita Montero wrote: > The Pentium 4's L1 data cache is between 16 and 32kB, so there > can't be a 64kB aliasing. And aliasing can be only on a set basis > and the sets are 4kB or 8kB large. Are you trying to tell me that the aliasing problem on those older Intel hyperthreaded processors and the workaround (from Intel) was a myth? lol. ;^) |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 08:18AM +0100

On 06.01.2024 at 04:21, Chris M. Thomasson wrote:
> Are you trying to tell me that the aliasing problem on those older Intel
> hyperthreaded processors and the workaround (from Intel) was a myth?
> lol. ;^)

Intel just made a nerd-suggestion. With four-way associativity there's
no frequent aliasing problem in the L1 data cache of Pentium 4.

Kaz Kylheku <433-929-6894@kylheku.com>: Jan 06 08:31AM

> Intel just made a nerd-suggestion. With four-way associativity
> there's no frequent aliasing problem in the L1 data cache of
> Pentium 4.

I think the L1 cache was 8K on that thing, and the blocks are 32 bytes.

I think how it works on the P4 is that the address is structured
like this:

  31          11 10                5 4                                 0
  |            | |                 | |                                 |
  [ 21 bit tag ] [ 6 bit cache set ] [ 5 bit offset into 32 byte block ]

Thus say we have an area of the stack with the address range nnnnFF80
to nnnnFFFF (128 bytes, 4 x 32 byte cache blocks).

These four blocks all map to the same set: they have the same six
bits in the "cache set" part of the address.

So if a thread is accessing something in all four blocks, it will
completely use that cache set, all by itself.

If any other thread has a similar block in its stack, with the same
cache set ID, it will cause evictions against this thread.

Sure, if each of these threads confines itself to working with just one
cacheline-sized aperture of the stack, it looks better.

You're forgetting that the sets are very small and that groups of
four adjacent 32 byte blocks map to the same set. Touch four adjacent
cache blocks that are aligned on a 128 byte boundary, and you have
hit full occupancy in the cache set corresponding to that block!

(I suspect the references to 64K should not be kilobytes but sets.
The 8K cache has 64 sets.)

In memory, an aligned 128 byte block maps to, and precisely covers,
a cache set. If two such blocks have addresses that are equal modulo 8K,
they collide in the same cache set. If one of those blocks is fully
present in the cache, the other must be fully evicted.

It's really easy to see how things can go south under hyperthreading.
If two hyperthreads are working with clashing 128 byte areas that each
want to hog the same cache set, and the core is switching between them
on a fine-grained basis, ... you get the picture.

It's very easy for the memory mapping allocations used for thread
stacks to produce addresses such that the delta between them is a
multiple of 8K.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

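The address split described in the post above can be sketched in a few lines of C++. This is only an illustration under the post's own assumptions (8K cache, 4 ways, 32 byte blocks, hence 64 sets, with the set index in bits 5 to 10); it is not checked against Intel documentation, and the names CacheAddress, decode and same_set are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed geometry, taken from the post (not from Intel documentation):
// 8 KiB cache, 4-way set associative, 32-byte blocks.
constexpr std::uint32_t kCacheBytes = 8 * 1024;
constexpr std::uint32_t kWays       = 4;
constexpr std::uint32_t kBlockBytes = 32;
constexpr std::uint32_t kSets       = kCacheBytes / (kWays * kBlockBytes);

constexpr std::uint32_t kOffsetBits = 5;  // bits 4..0:   offset into the block
constexpr std::uint32_t kSetBits    = 6;  // bits 10..5:  cache set index
                                          // bits 31..11: tag (21 bits)

static_assert(kSets == 64, "8K / (4 ways * 32 bytes) should give 64 sets");
static_assert(kSets == (1u << kSetBits), "set index must fit the 6-bit field");

// Hypothetical helper type: the three fields of a 32-bit address.
struct CacheAddress {
    std::uint32_t tag;
    std::uint32_t set;
    std::uint32_t offset;
};

// Split an address into tag, set index and block offset.
constexpr CacheAddress decode(std::uint32_t addr) {
    return CacheAddress{
        addr >> (kOffsetBits + kSetBits),
        (addr >> kOffsetBits) & (kSets - 1),
        addr & (kBlockBytes - 1)
    };
}

// True when both addresses select the same cache set.
constexpr bool same_set(std::uint32_t a, std::uint32_t b) {
    return decode(a).set == decode(b).set;
}
```

For example, same_set(0x1234FF80u, 0x1234FF80u + 8192u) is true: a delta that is a multiple of 8K leaves bits 5 to 10 untouched, so two thread stacks placed that far apart compete for the same sets. Taken literally, this layout puts consecutive 32 byte blocks into consecutive sets (decode(0x1234FF80u).set is 60, decode(0x1234FFA0u).set is 61), so the sketch illustrates only the modulo-8K collision.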
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 10:30AM +0100

On 06.01.2024 at 09:31, Kaz Kylheku wrote:
> It's very easy for the memory mapping allocations used for thread
> stacks to produce addresses such that the delta between them is a
> multiple of 8K.

Of course it's easy to intentionally provoke frequent aliasing
with the P4's L1 cache, but actually this doesn't happen often.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 06 01:15PM -0800 On 1/6/2024 1:30 AM, Bonita Montero wrote: >> multiple of 8K. > Of course it's easy to intentionally provoke frequent aliasing > with the P4's L1 cache, but actually this doesn't happen often. Fwiw, some people were complaining about bad performance using hyperthreading. Turning it off in bios improved performance. Hence the paper was written to show them how to vastly improve performance when hyperthreading was turned on. You call it nerd stuff, and I still cannot figure out why? |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 06 01:19PM -0800 On 1/6/2024 1:15 PM, Chris M. Thomasson wrote: > paper was written to show them how to vastly improve performance when > hyperthreading was turned on. You call it nerd stuff, and I still cannot > figure out why? Humm... I can see it know. Bonita works for Intel and received the complaints... Bonita says shut up you stupid nerds! Humm... ;^o |