- Sieve of Eratosthenes optimized to the max - 13 Updates
Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 05:42AM +0100

On 29.12.2023 at 23:12, Chris M. Thomasson wrote:
> Wait a minute! I might have found it, let's see:
> https://www.intel.com/content/dam/www/public/us/en/documents/training/developing-multithreaded-applications.pdf
> Ahhh section 5.3! Nice! I read this a while back, before 2005.

What it says there is nonsense if the cache is really four-way
associative, as in the other paper mentioned here. And of course
that has nothing to do with false sharing.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 29 08:45PM -0800 On 12/29/2023 8:42 PM, Bonita Montero wrote: > This is nonsense what it says if the cache is really four-way > associative like in the other paper mentioned here. And of > course that has nothing to do with false sharing You should write Intel a letter. ;^o |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 05:45AM +0100

On 29.12.2023 at 18:29, Kaz Kylheku wrote:
> the stack pointer moves by n bytes. If you then call a function,
> its stack frame will be offset by that much (plus any alignment if
> n is not aligned).

According to the paper Scott mentioned, the associativity of the
Pentium 4's L1 data cache is four. With that, such aliasing
precautions aren't necessary.

Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 04:51AM

> Wait a minute! I might have found it, let's see:
> https://www.intel.com/content/dam/www/public/us/en/documents/training/developing-multithreaded-applications.pdf
> Ahhh section 5.3! Nice! I read this a while back, before 2005.

Wow, I guessed that one. Elsewhere in the thread, I made a remark
similar to "imagine that thread stacks are aligned at an address like
nnnnFFFF", i.e. the top of the stack starts at the top of a 64 kB
aligned window. I was thinking of exactly a typical L1 cache size
from around that era, in fact.

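To make the aliasing concrete: whether two stack addresses collide in a
set-associative cache depends only on the bits that select the set. A
minimal sketch of that index calculation, using an illustrative geometry
(64 KiB, 4-way, 64-byte lines -- an assumption for the sketch, not the
parameters of any CPU named in the thread) and two hypothetical stack
tops sitting at the same offset inside their 64 KiB-aligned windows:

#include <cstdint>
#include <cstdio>

// Illustrative cache geometry -- an assumption, not any particular CPU.
constexpr std::size_t kLineSize  = 64;        // bytes per cache line
constexpr std::size_t kWays      = 4;         // associativity
constexpr std::size_t kCacheSize = 64 * 1024; // total data-cache size
constexpr std::size_t kSets      = kCacheSize / (kLineSize * kWays);

// The low bits of an address pick the byte within the line, the next
// bits pick the set, and the remaining high bits form the tag.
constexpr std::size_t set_of(std::uintptr_t addr)
{
    return (addr / kLineSize) % kSets;
}

int main()
{
    // Two hypothetical stack tops, each just below the top of its own
    // 64 KiB-aligned window, as in the "nnnnFFFF" picture above.
    std::uintptr_t stack_a = 0x00410000u - 16;
    std::uintptr_t stack_b = 0x00A30000u - 16;

    // Same offset within the window => same set index, so accesses to
    // both stacks compete for the same few ways of that one set.
    std::printf("set(a) = %zu, set(b) = %zu\n",
                set_of(stack_a), set_of(stack_b));
}

Both addresses land in the same set here precisely because they share the
same offset within a 64 KiB window; shift one stack by a non-multiple of
the set stride and the collision goes away.
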
Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 04:56AM

> What it says there is nonsense if the cache is really four-way
> associative, as in the other paper mentioned here. And of course
> that has nothing to do with false sharing.

If it's four-way associative, you start to have a performance problem as
soon as five things collide on it. For instance, suppose you have two
thread stack tops mapped to the same cache lines, plus three more data
structures heavily being accessed.

Oh, and the above Intel paper does actually mention alloca:

"The easiest way to adjust the initial stack address for each thread is
to call the memory allocation function, _alloca, with varying byte
amounts in the intermediate thread function."

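A minimal sketch of that kind of stack-offset adjustment, in the spirit
of the passage quoted above; the intermediate function, the 1 KiB-per-
thread step, and the alloca/_alloca selection are illustrative
assumptions, not code taken from the paper:

#include <thread>
#include <vector>

#if defined(_MSC_VER)
#  include <malloc.h>   // _alloca
#  define STACK_ALLOCA _alloca
#else
#  include <alloca.h>   // alloca (non-standard, but widely available)
#  define STACK_ALLOCA alloca
#endif

void worker(int id)
{
    (void)id;   // the real per-thread work would go here
}

// Intermediate thread function: shift this thread's stack frame by an
// amount that differs per thread, so the hot frames of different threads
// no longer sit at the same offset within identically aligned stacks.
void intermediate(int id)
{
    volatile char* pad =
        static_cast<volatile char*>(STACK_ALLOCA(id * 1024 + 64));
    pad[0] = 0;   // touch it so the allocation cannot be dropped
    worker(id);   // worker's frame now starts id*1024+64 bytes lower
}

int main()
{
    std::vector<std::thread> threads;
    for (int id = 0; id < 4; ++id)
        threads.emplace_back(intermediate, id);
    for (auto& t : threads)
        t.join();
}

Whether this actually pays off depends on the processor generation being
argued about here; the sketch only shows the mechanism the paper
describes.
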
Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 06:09AM +0100

On 30.12.2023 at 05:56, Kaz Kylheku wrote:
> If it's four-way associative, you start to have a performance problem as
> soon as five things collide on it. ...

With four-way associativity that's rather unlikely.

Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 05:51AM

>> If it's four-way associative, you start to have a performance problem as
>> soon as five things collide on it. ...
> With four-way associativity that's rather unlikely.

Under four-way associativity, five isn't greater than four?

Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 10:15AM +0100

On 30.12.2023 at 06:51, Kaz Kylheku wrote:
> Under four-way associativity, five isn't greater than four?

Two-way associativity would leave no room if both threads have
aliasing stack frames. With four-way associativity there's not
much likelihood of that happening.

David Brown <david.brown@hesbynett.no>: Dec 30 07:27PM +0100

On 29/12/2023 17:04, Bonita Montero wrote:
> vector<>. If you append more than the internal array<> can handle the
> objects are moved to an internal vector. I think Boost's small_array
> is similar to that.

I'm sure that's all very nice, but it is completely and utterly
irrelevant to the issue being discussed. Perhaps you didn't understand
your own post?

>> faster.
> You've got strange ideas. alloca() has been completely removed from the
> Linux kernel.

Citation? You are usually wrong in your claims, or at least mixed-up, so
I won't trust you without evidence. (That does not mean you are wrong
here - I don't know either way.) Of course, avoiding alloca() within the
kernel is, again, utterly irrelevant to the point under discussion.

> The point is that if there's a fixed upper limit you would
> allocate you could allocate it always statically.

No, that would be useless. The idea is to have /different/ allocation
sizes in different threads, so that the different threads' stacks end up
at addresses that differ in the bits the processor caches use for
indexing and tags. I can't tell you how helpful or not this may be in
practice. I am merely trying to explain to you what the idea is, since
you said you did not understand it.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 30 11:58AM -0800 On 12/29/2023 8:45 PM, Bonita Montero wrote: > According to the paper Scott mentioned the associativity of the > Pentium 4's L1 data cache is four. With that it's not necessary > to have such aliasing preventions. Huh? Wow, you really need to write Intel a letter about it wrt their older hyperthreaded processors! Although, it seems like you simply do not actually _understand_ what is going on here... Intel's suggestions for how to mitigate the problem in their earlier hyperhtreaded processors actually worked wrt improving performance. Keep in mind this was a while back in 2004-2005. I am happy that the way back machine has my older code. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 30 12:00PM -0800 On 12/29/2023 8:51 PM, Kaz Kylheku wrote: > nnnnFFFF" I.e. the top of the stack starts at the top of a 64 kB > aligned window. I was exactly thinking of a typical L1 cache size > from aroudn that era, in fact. Yup! You pretty much got it. Thanks Kaz ) |
Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 08:35PM

> Two-way associativity would leave no room if both threads have
> aliasing stack frames. With four-way associativity there's not
> much likelihood of that happening.

My comment makes it clear that there are two thread stacks vying for
that cache line, plus a couple of other accesses that are not thread
stacks. (By the way, set-associative caches don't always have full LRU
replacement strategies.)

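For anyone who wants to see the effect Kaz describes, a rough microbenchmark
sketch: it times repeated reads of n addresses that all index the same cache
set, and on a 4-way cache the time per access typically jumps once n exceeds
4. The 16 KiB stride (cache size divided by associativity) is an assumption,
and prefetchers, replacement policy and other cache levels can blur the step:

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    // Assumed geometry: with a 64 KiB, 4-way cache and 64-byte lines,
    // addresses 16 KiB apart fall into the same set. Adjust for the
    // machine being tested; these numbers are only an assumption.
    constexpr std::size_t kStride     = 16 * 1024;
    constexpr std::size_t kMaxStreams = 8;
    constexpr std::size_t kIters      = 2000000;

    std::vector<char> buf(kStride * (kMaxStreams + 1), 1);

    for (std::size_t n = 1; n <= kMaxStreams; ++n) {
        volatile char* p = buf.data();  // volatile so the loads really happen
        char acc = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t it = 0; it < kIters; ++it)
            for (std::size_t k = 0; k < n; ++k)
                acc = char(acc + p[k * kStride]);  // n lines, all one set
        auto t1 = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("%zu lines in one set: %.2f ns/access (acc=%d)\n",
                    n, ns / double(kIters * n), int(acc));
    }
}
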
red floyd <no.spam.here@its.invalid>: Dec 30 02:58PM -0800

On 12/30/2023 11:58 AM, Chris M. Thomasson wrote:
> hyperthreaded processors actually worked wrt improving performance. Keep
> in mind this was a while back in 2004-2005. I am happy that the Wayback
> Machine has my older code.

Oh, come on, Chris. It's clear that Bonita knows more about what's
going on inside an Intel processor than Intel does.