
Digest for comp.lang.c++@googlegroups.com - 6 updates in 1 topic

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 05 07:21PM -0800

On 1/3/2024 7:37 PM, Bonita Montero wrote:
 
> The Pentium 4's L1 data cache is between 16 and 32kB, so there
> can't be a 64kB aliasing. And aliasing can be only on a set basis
> and the sets are 4kB or 8kB large.
 
Are you trying to tell me that the aliasing problem on those older Intel
hyperthreaded processors and the workaround (from Intel) was a myth?
lol. ;^)
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 08:18AM +0100

Am 06.01.2024 um 04:21 schrieb Chris M. Thomasson:
 
> Are you trying to tell me that the aliasing problem on those older Intel
> hyperthreaded processors and the workaround (from Intel) was a myth?
> lol. ;^)
 
Intel just made a nerd-suggestion. With four-way associativity
> there's no frequent aliasing problem in the L1 data cache of
Pentium 4.
Kaz Kylheku <433-929-6894@kylheku.com>: Jan 06 08:31AM


> Intel just made a nerd-suggestion. With four-way associativity
> there's no frequent aliasing problem in the L1 data cache of
> Pentium 4.
 
I think the L1 cache was 8K on that thing, and the blocks are 32 bytes.
 
I think how it works on the P4 is that the address is structured
like this:

31           11 10                5 4                                 0
|              | |                | |                                 |
[  21 bit tag  ] [ 6 bit cache set ] [ 5 bit offset into 32 byte block ]
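 
Assuming that layout (my reading of it, not something straight out of
an Intel manual), pulling the fields out of an address is just shifts
and masks:

#include <cstdint>
#include <cstdio>

// Field extraction under the layout above (all of this assumes my
// numbers: 8K cache, 4 ways, 32 byte lines => 5 offset bits, 6 set bits).
constexpr unsigned offset_of(std::uint32_t a) { return a & 0x1F; }
constexpr unsigned set_of(std::uint32_t a) { return (a >> 5) & 0x3F; }
constexpr unsigned tag_of(std::uint32_t a) { return a >> 11; }

int main() {
    std::uint32_t a = 0x1234FF80;
    std::printf("tag=%05X set=%u offset=%u\n",
                tag_of(a), set_of(a), offset_of(a));
    // prints: tag=2469F set=60 offset=0
}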
 
Thus say we have an area of the stack with the address
range nnnnFF80 to nnnnFFFF (128 bytes, 4 x 32 byte cache blocks).
 
These four blocks all map to the same set: they have the same six
bits in the "cache set" part of the address.
 
So if a thread is accessing something in all four blocks, it will
completely use that cache set, all by itself.
 
If any other thread has a similar block in its stack, with the same
cache set ID, it will cause evictions against this thread.
 
Sure, if each of these threads confines itself to working with just one
cacheline-sized aperture of the stack, it looks better.
 
You're forgetting that the sets are very small and that groups of
four adjacent 32 byte blocks map to the same set. Touch four adjacent
cache blocks that are aligned on a 128 byte boundary, and you have
hit full occupancy in the cache set corresponding to that block!
 
(I suspect the references to 64K should not be kilobytes but sets.
The 8K cache has 64 sets: 8192 / 32 = 256 lines, and 256 / 4 ways = 64.)
 
In memory, a 128 byte block that is aligned maps to, and precisely
covers, a cache set. If two such blocks have addresses that are equal
modulo 8K, they collide into the same cache set. If one of those
blocks is fully present in the cache, the other must be fully evicted.
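 
That modulo-8K collision is easy to check with the same
shift-and-mask arithmetic (still assuming the field layout I sketched
above):

#include <cstdint>
#include <cstdio>

// Set index under the assumed layout: bits 10..5 of the address.
constexpr unsigned set_of(std::uint32_t a) { return (a >> 5) & 0x3F; }

int main() {
    std::uint32_t a = 0x1234FF80;    // a block in one thread's stack
    std::uint32_t b = a + 8 * 1024;  // same offset in a stack 8K away
    std::printf("set(a)=%u set(b)=%u\n", set_of(a), set_of(b));
    // both are set 60: only the tag differs, so the two blocks
    // compete for lines in the same 4-way set
}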
 
It's really easy to see how things can go south under hyperthreading.
If two hyperthreads are working with clashing 128 byte areas that each
want to hog the same cache set, and the core is switching between them
on a fine-grained basis, ... you get the picture.
 
It's very easy for the memory mapping allocations used for thread
stacks to produce addresses such that the delta between them is a
multiple of 8K.
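 
You can see that for yourself; a little probe like this (the names
are mine, and the results vary by OS and allocator) prints where each
thread's stack top falls modulo 8K:

#include <cstdint>
#include <cstdio>
#include <thread>

void probe(int id) {
    int local = 0;  // lives near the top of this thread's stack
    auto addr = reinterpret_cast<std::uintptr_t>(&local);
    std::printf("thread %d: &local = %p, mod 8K = 0x%04x\n",
                id, static_cast<void*>(&local),
                static_cast<unsigned>(addr & 0x1FFF));
}

int main() {
    std::thread t1(probe, 1), t2(probe, 2);
    t1.join();
    t2.join();
    // If the "mod 8K" values match (common when stacks are placed
    // at page-aligned deltas that are multiples of 8K), the two
    // stack tops contend for the same L1 sets.
}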
 
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 10:30AM +0100

Am 06.01.2024 um 09:31 schrieb Kaz Kylheku:
 
> It's very easy for the memory mapping allocations used for thread
> stacks to produce addresses such that the delta between them is a
> multiple of 8K.
 
Of course it's easy to intentionally provoke frequent aliasing
with the P4's L1 cache, but in practice this doesn't happen often.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 06 01:15PM -0800

On 1/6/2024 1:30 AM, Bonita Montero wrote:
>> multiple of 8K.
 
> Of course it's easy to intentionally provoke frequent aliasing
> with the P4's L1 cache, but in practice this doesn't happen often.
 
Fwiw, some people were complaining about bad performance using
hyperthreading. Turning it off in the BIOS improved performance. Hence
the paper was written to show them how to vastly improve performance
when hyperthreading was turned on. You call it nerd stuff, and I still
cannot figure out why.
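 
If I remember right, the gist of the workaround was to burn a
different amount of stack in each thread at startup so the hot stack
regions stop aliasing. A rough sketch of the idea (my offsets, not
the exact numbers from Intel's paper):

#include <alloca.h>  // _alloca in <malloc.h> on MSVC
#include <thread>
#include <vector>

// Sketch of the stack-offset workaround: skew each thread's stack by
// a different sub-8K amount so their hot stack tops map to different
// L1 sets instead of all aliasing to the same ones.
void worker(int id) {
    // 512 bytes = 16 sets of 32 byte lines on the assumed layout, so
    // threads 0..3 land in four disjoint groups of sets.
    volatile char* skew = static_cast<char*>(alloca(id * 512 + 64));
    skew[0] = 0;  // touch it so the alloca can't be optimized away
    // ... real per-thread work goes here ...
}

int main() {
    std::vector<std::thread> pool;
    for (int id = 0; id < 4; ++id)
        pool.emplace_back(worker, id);
    for (auto& t : pool) t.join();
}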
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 06 01:19PM -0800

On 1/6/2024 1:15 PM, Chris M. Thomasson wrote:
> paper was written to show them how to vastly improve performance when
> hyperthreading was turned on. You call it nerd stuff, and I still cannot
> figure out why.
 
Humm... I can see it now. Bonita works for Intel and received the
complaints... Bonita says: shut up, you stupid nerds! Humm... ;^o
