- Sieve of Eratosthenes optimized to the max - 13 Updates
Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 05:42AM +0100

On 29.12.2023 at 23:12, Chris M. Thomasson wrote:
> Wait a minute! I might have found it, let's see:
> https://www.intel.com/content/dam/www/public/us/en/documents/training/developing-multithreaded-applications.pdf
> Ahhh section 5.3! Nice! I read this a while back, before 2005.

What it says there is nonsense if the cache is really four-way
associative, as in the other paper mentioned here. And of course
that has nothing to do with false sharing.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 29 08:45PM -0800 On 12/29/2023 8:42 PM, Bonita Montero wrote: > This is nonsense what it says if the cache is really four-way > associative like in the other paper mentioned here. And of > course that has nothing to do with false sharing You should write Intel a letter. ;^o |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 05:45AM +0100

On 29.12.2023 at 18:29, Kaz Kylheku wrote:
> the stack pointer moves by n bytes. If you then call a function,
> its stack frame will be offset by that much (plus any alignment if
> n is not aligned).

According to the paper Scott mentioned, the associativity of the
Pentium 4's L1 data cache is four. With that, such aliasing
precautions aren't necessary.

Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 04:51AM

> Wait a minute! I might have found it, let's see:
> https://www.intel.com/content/dam/www/public/us/en/documents/training/developing-multithreaded-applications.pdf
> Ahhh section 5.3! Nice! I read this a while back, before 2005.

Wow, I guessed that one. Elsewhere in the thread, I made a remark
similar to "imagine that thread stacks are aligned at an address like
nnnnFFFF", i.e. the top of the stack starts at the top of a 64 kB
aligned window. I was thinking of exactly a typical L1 cache size
from around that era, in fact.

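To make the aliasing concrete: whether two stack addresses collide in a
set-associative cache depends only on the bits that select the set. A
minimal sketch of that index calculation, using an illustrative geometry
(64 KiB, 4-way, 64-byte lines -- an assumption for the sketch, not the
parameters of any CPU named in the thread) and two hypothetical stack
tops sitting at the same offset inside their 64 KiB-aligned windows:

#include <cstdint>
#include <cstdio>

// Illustrative cache geometry -- an assumption, not any particular CPU.
constexpr std::size_t kLineSize  = 64;        // bytes per cache line
constexpr std::size_t kWays      = 4;         // associativity
constexpr std::size_t kCacheSize = 64 * 1024; // total data-cache size
constexpr std::size_t kSets      = kCacheSize / (kLineSize * kWays);

// The low bits of an address pick the byte within the line, the next
// bits pick the set, and the remaining high bits form the tag.
constexpr std::size_t set_of(std::uintptr_t addr)
{
    return (addr / kLineSize) % kSets;
}

int main()
{
    // Two hypothetical stack tops, each just below the top of its own
    // 64 KiB-aligned window, as in the "nnnnFFFF" picture above.
    std::uintptr_t stack_a = 0x00410000u - 16;
    std::uintptr_t stack_b = 0x00A30000u - 16;

    // Same offset within the window => same set index, so accesses to
    // both stacks compete for the same few ways of that one set.
    std::printf("set(a) = %zu, set(b) = %zu\n",
                set_of(stack_a), set_of(stack_b));
}

Both addresses land in the same set here precisely because they share the
same offset within a 64 KiB window; shift one stack by a non-multiple of
the set stride and the collision goes away.
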
Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 04:56AM

> What it says there is nonsense if the cache is really four-way
> associative, as in the other paper mentioned here. And of course
> that has nothing to do with false sharing.

If it's four-way associative, you start to have a performance problem as
soon as five things collide on it. For instance, suppose you have two
thread stack tops mapped to the same cache lines, plus three more data
structures heavily being accessed.

Oh, and the above Intel paper does actually mention alloca:

"The easiest way to adjust the initial stack address for each thread is
to call the memory allocation function, _alloca, with varying byte
amounts in the intermediate thread function."

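A minimal sketch of that kind of stack-offset adjustment, in the spirit
of the passage quoted above; the intermediate function, the 1 KiB-per-
thread step, and the alloca/_alloca selection are illustrative
assumptions, not code taken from the paper:

#include <thread>
#include <vector>

#if defined(_MSC_VER)
#  include <malloc.h>   // _alloca
#  define STACK_ALLOCA _alloca
#else
#  include <alloca.h>   // alloca (non-standard, but widely available)
#  define STACK_ALLOCA alloca
#endif

void worker(int id)
{
    (void)id;   // the real per-thread work would go here
}

// Intermediate thread function: shift this thread's stack frame by an
// amount that differs per thread, so the hot frames of different threads
// no longer sit at the same offset within identically aligned stacks.
void intermediate(int id)
{
    volatile char* pad =
        static_cast<volatile char*>(STACK_ALLOCA(id * 1024 + 64));
    pad[0] = 0;   // touch it so the allocation cannot be dropped
    worker(id);   // worker's frame now starts id*1024+64 bytes lower
}

int main()
{
    std::vector<std::thread> threads;
    for (int id = 0; id < 4; ++id)
        threads.emplace_back(intermediate, id);
    for (auto& t : threads)
        t.join();
}

Whether this actually pays off depends on the processor generation being
argued about here; the sketch only shows the mechanism the paper
describes.
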
Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 06:09AM +0100

On 30.12.2023 at 05:56, Kaz Kylheku wrote:
> If it's four-way associative, you start to have a performance problem as
> soon as five things collide on it. ...

With four-way associativity that's rather unlikely.

Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 05:51AM

>> If it's four-way associative, you start to have a performance problem as
>> soon as five things collide on it. ...
> With four-way associativity that's rather unlikely.

Under four-way associativity, five isn't greater than four?

Bonita Montero <Bonita.Montero@gmail.com>: Dec 30 10:15AM +0100

On 30.12.2023 at 06:51, Kaz Kylheku wrote:
> Under four-way associativity, five isn't greater than four?

Two-way associativity would leave no room if both threads have
aliasing stack frames. With four-way associativity there's not
much likelihood of that happening.

David Brown <david.brown@hesbynett.no>: Dec 30 07:27PM +0100

On 29/12/2023 17:04, Bonita Montero wrote:
> vector<>. If you append more than the internal array<> can handle the
> objects are moved to an internal vector. I think Boost's small_array
> is similar to that.

I'm sure that's all very nice, but it is completely and utterly
irrelevant to the issue being discussed. Perhaps you didn't understand
your own post?

>> faster.
> You've got strange ideas. alloca() has been completely removed from the
> Linux kernel.

Citation? You are usually wrong in your claims, or at least mixed-up, so
I won't trust you without evidence. (That does not mean you are wrong
here - I don't know either way.) Of course, avoiding alloca() within the
kernel is, again, utterly irrelevant to the point under discussion.

> The point is that if there's a fixed upper limit you would
> allocate you could allocate it always statically.

No, that would be useless. The idea is to have /different/ allocation
sizes in different threads, so that the different threads' stacks end up
at addresses that differ in the bits the processor caches use for
indexing and tags. I can't tell you how helpful or not this may be in
practice. I am merely trying to explain to you what the idea is, since
you said you did not understand it.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 30 11:58AM -0800 On 12/29/2023 8:45 PM, Bonita Montero wrote: > According to the paper Scott mentioned the associativity of the > Pentium 4's L1 data cache is four. With that it's not necessary > to have such aliasing preventions. Huh? Wow, you really need to write Intel a letter about it wrt their older hyperthreaded processors! Although, it seems like you simply do not actually _understand_ what is going on here... Intel's suggestions for how to mitigate the problem in their earlier hyperhtreaded processors actually worked wrt improving performance. Keep in mind this was a while back in 2004-2005. I am happy that the way back machine has my older code. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 30 12:00PM -0800 On 12/29/2023 8:51 PM, Kaz Kylheku wrote: > nnnnFFFF" I.e. the top of the stack starts at the top of a 64 kB > aligned window. I was exactly thinking of a typical L1 cache size > from aroudn that era, in fact. Yup! You pretty much got it. Thanks Kaz ) |
Kaz Kylheku <433-929-6894@kylheku.com>: Dec 30 08:35PM

> Two-way associativity would leave no room if both threads have
> aliasing stack frames. With four-way associativity there's not
> much likelihood of that happening.

My comment makes it clear that there are two thread stacks vying for
that cache line, plus a couple of other accesses that are not thread
stacks. (By the way, set-associative caches don't always have full LRU
replacement strategies.)

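For anyone who wants to see the effect Kaz describes, a rough microbenchmark
sketch: it times repeated reads of n addresses that all index the same cache
set, and on a 4-way cache the time per access typically jumps once n exceeds
4. The 16 KiB stride (cache size divided by associativity) is an assumption,
and prefetchers, replacement policy and other cache levels can blur the step:

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    // Assumed geometry: with a 64 KiB, 4-way cache and 64-byte lines,
    // addresses 16 KiB apart fall into the same set. Adjust for the
    // machine being tested; these numbers are only an assumption.
    constexpr std::size_t kStride     = 16 * 1024;
    constexpr std::size_t kMaxStreams = 8;
    constexpr std::size_t kIters      = 2000000;

    std::vector<char> buf(kStride * (kMaxStreams + 1), 1);

    for (std::size_t n = 1; n <= kMaxStreams; ++n) {
        volatile char* p = buf.data();  // volatile so the loads really happen
        char acc = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t it = 0; it < kIters; ++it)
            for (std::size_t k = 0; k < n; ++k)
                acc = char(acc + p[k * kStride]);  // n lines, all one set
        auto t1 = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("%zu lines in one set: %.2f ns/access (acc=%d)\n",
                    n, ns / double(kIters * n), int(acc));
    }
}
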
red floyd <no.spam.here@its.invalid>: Dec 30 02:58PM -0800

On 12/30/2023 11:58 AM, Chris M. Thomasson wrote:
> hyperthreaded processors actually worked wrt improving performance. Keep
> in mind this was a while back in 2004-2005. I am happy that the Wayback
> Machine has my older code.

Oh, come on, Chris. It's clear that Bonita knows more about what's
going on inside an Intel processor than Intel does.