- About data races in parallel programming.. - 1 Update
- I correct a last typo, read again.. - 1 Update
- About Memory Ordering and Atomicity.. - 1 Update
aminer68@gmail.com: Jan 16 03:04PM -0800

Hello,

About data races in parallel programming..

I have just read the following webpage:

Benign Data Races: What Could Possibly Go Wrong?
https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

As that webpage shows, you have to be careful with compilers in parallel programming in order to avoid data races. A compiler may optimize accesses to a global variable that is not volatile (for example by keeping its value in a register instead of re-reading memory), and this can turn a seemingly benign data race into a real bug. I understand this issue: for example, in Freepascal and Delphi I use dynamic memory for global variables that are written by the main thread and read by many other threads. I also know how to use memory fences to force visibility, and I know roughly how long the store buffer takes to drain. Read my following thoughts to understand more:

About the store buffer and memory visibility..

More about memory visibility..

I said before:

As you know, in parallel programming you have to take care not only of memory ordering but also of memory visibility. Read this to notice it:

A store barrier, "sfence" instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued. This will make the program state "visible" to other CPUs so they can act on it if necessary.

Read more here to understand correctly:

"However under x86-TSO, the stores are cached in the store buffers, a load consult only shared memory and the store buffer of the given thread, which means it can load data from memory and ignore values from the other thread."

Read more here:

https://books.google.ca/books?id=C2R2DwAAQBAJ&pg=PA127&lpg=PA127&dq=immediately+visible+and+m+fence+and+store+buffer+and+x86&source=bl&ots=yfGI17x1YZ&sig=ACfU3U2EYRawTkQmi3s5wY-sM7IgowDlWg&hl=en&sa=X&ved=2ahUKEwi_nq3duYPkAhVDx1kKHYoyA5UQ6AEwAnoECAgQAQ#v=onepage&q=immediately%20visible%20and%20m%20fence%20and%20store%20buffer%20and%20x86&f=false

Now we can ask how long the store buffer takes to drain. Read here to notice:

https://nicknash.me/2018/04/07/speculating-about-store-buffer-capacity/

As you can see, he allows around 500 no-ops for the store buffer to drain, and I think the store buffer can drain in less time than that.

Thank you,
Amine Moulay Ramdane.
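
To make the visibility point in the post above concrete, here is a minimal sketch of one thread publishing a value that another thread reads. It is written in C++ with std::atomic, which is a swapped-in technique and not the Freepascal/Delphi code or the explicit fences the post refers to. The names Mailbox, payload, ready and publish_and_read are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; none of them come from the post.
struct Mailbox {
    int payload = 0;                 // plain data, written before the flag
    std::atomic<bool> ready{false};  // atomic flag: each load really re-reads it
};

// The writer publishes `value`; the reader waits for the flag and
// returns the payload it observed.
int publish_and_read(int value) {
    Mailbox box;
    int seen = -1;

    std::thread reader([&] {
        // Acquire load: cannot be hoisted out of the loop, and the payload
        // read below is ordered after the load that sees `true`.
        while (!box.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = box.payload;
    });

    std::thread writer([&] {
        box.payload = value;
        // Release store: the payload write is ordered before the flag
        // becomes visible to the reader.
        box.ready.store(true, std::memory_order_release);
    });

    writer.join();
    reader.join();
    return seen;
}
```

If ready were a plain non-atomic bool, the reader loop would be a data race, and the compiler would be free to read the flag once and spin forever. That is the kind of optimization the Intel article warns about.
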
aminer68@gmail.com: Jan 16 11:55AM -0800

Hello,

I have corrected a last typo, please read again..

About Memory Ordering and Atomicity..

Read the following, it says:

"The performance gain from allowing memory reordering is small, and it doesn't make up for the extra headaches that come from difficult-to-find failures."

Read more here:

http://www.informit.com/articles/article.aspx?p=1676714&seqNum=5

So I think that what I wrote previously about it is true, read it again carefully:

Here is another problem with ARM processors..

About SC and TSO and RMO hardware memory models..

I have just read the following webpage about the performance difference between the SC (sequential consistency), TSO (total store order) and RMO (relaxed memory order) hardware memory models.

I think TSO is better: it gives only around 3% to 6% less performance than RMO, and it is a simpler programming model than RMO.

So I think ARM should support TSO to be compatible with x86, which is TSO.

Read more here to notice it:

https://infoscience.epfl.ch/record/201695/files/CS471_proj_slides_Tao_Marc_2011_1222_1.pdf

About memory models and sequential consistency:

As you have noticed, I am working with the x86 architecture..

Even though x86 gives up on sequential consistency, it's among the most well-behaved architectures in terms of the crazy behaviors it allows. Most other architectures implement even weaker memory models.

ARM memory model is notoriously underspecified, but is essentially a form of weak ordering, which provides very few guarantees. Weak ordering allows almost any operation to be reordered, which enables a variety of hardware optimizations but is also a nightmare to program at the lowest levels.

Read more here:

https://homes.cs.washington.edu/~bornholt/post/memory-models.html

Memory Models: x86 is TSO, TSO is Good

Essentially, the conclusion is that x86 in practice implements the old SPARC TSO memory model.

The big take-away from the talk for me is that it confirms the observation made many times before that SPARC TSO seems to be the optimal memory model. It is sufficiently understandable that programmers can write correct code without having barriers everywhere. It is sufficiently weak that you can build fast hardware implementations that can scale to big machines.

Read more here:

https://jakob.engbloms.se/archives/1435

Thank you,
Amine Moulay Ramdane.
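
The one reordering that TSO permits, and that the post above says x86 gives up relative to sequential consistency, can be shown with the classic store-buffering test. This is a minimal C++ sketch; the function name count_both_zero and the variables x, y, r1 and r2 are hypothetical and chosen only for this illustration. It uses sequentially consistent atomics, for which the C++ memory model forbids the outcome where both threads read 0.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering test `rounds` times with seq_cst atomics and
// counts the rounds in which both threads read 0 from the other's variable.
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });

        t1.join();
        t2.join();

        // Under sequential consistency at least one store precedes both
        // loads, so r1 == 0 and r2 == 0 together is impossible.
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With std::memory_order_relaxed in place of seq_cst, the both-zero outcome becomes legal, and on x86 it can really occur, because each store may still sit in its core's store buffer while the following load executes. On a TSO machine this store-then-load case is the only place where a full barrier is needed, which is why code there does not need barriers everywhere.
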