Friday, January 17, 2020

Digest for comp.programming.threads@googlegroups.com - 3 updates in 3 topics

aminer68@gmail.com: Jan 16 03:04PM -0800

Hello,
 
 
About data races in parallel programming..
 
I have just read the following webpage:
 
Benign Data Races: What Could Possibly Go Wrong?
 
https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
 
 
As the above webpage shows, you have to be careful with compilers in parallel programming. The compiler is free to optimize accesses to a global variable that is not volatile, for example by keeping it in a register or by hoisting a load out of a loop, and this can turn a supposedly benign data race into a real bug. I understand this issue. For example, in FreePascal and Delphi I use dynamic memory for global variables that are written by the main thread and read by many other threads. I also know how to use memory fences to force visibility, and I know roughly how long the store buffer takes to drain.
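
To make the compiler issue concrete, here is a minimal sketch in C++ (not FreePascal or Delphi, and not taken from the Intel article). The names `stop_requested`, `spin_until_stopped` and `run_stop_flag_demo` are made up for illustration. With a plain `bool` the compiler would be allowed to read the flag once and hoist the load out of the loop; declaring it as `std::atomic<bool>` forces a fresh load on every iteration and removes the data race.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical shared flag, written by the main thread and read by a worker.
// If this were a plain `bool`, the loop below would be a data race and the
// compiler could legally hoist the load out of the loop and spin forever.
std::atomic<bool> stop_requested{false};

// Spins until the flag is set and returns how many times it was polled.
long spin_until_stopped() {
    long polls = 0;
    while (!stop_requested.load(std::memory_order_acquire)) {
        ++polls;
    }
    return polls;
}

// Starts a worker, lets it spin briefly, then asks it to stop.
// Returns true once the worker has observed the flag and exited.
bool run_stop_flag_demo() {
    stop_requested.store(false, std::memory_order_relaxed);
    long polls = -1;
    std::thread worker([&polls] { polls = spin_until_stopped(); });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stop_requested.store(true, std::memory_order_release);
    worker.join();  // join() makes the worker's write to `polls` visible here
    assert(polls >= 0);
    return polls >= 0;
}
```

Calling `run_stop_flag_demo()` should return `true` after roughly ten milliseconds. The same structure with a non-atomic flag may appear to work in a debug build and hang in an optimized one.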
 
Read my following thoughts to understand more:

 
About the store buffer and memory visibility..
 
 
More about memory visibility..
 
I said before:
 
As you know, in parallel programming you have to take care
not only of memory ordering, but also of memory visibility. Read this to see why:
 
A store barrier, "sfence" instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued. This will make the program state "visible" to other CPUs so they can act on it if necessary.
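
The `sfence` description above is specific to x86. As a hedged illustration, the same publish-then-observe pattern can be written portably with C++ fences; the names `payload`, `ready`, `writer`, `reader` and `publish_and_read` are assumptions made for this sketch. The fences constrain ordering: once the reader sees the flag, it is guaranteed to also see the data written before the writer's fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain data, published by the writer
std::atomic<bool> ready{false};  // hypothetical "data is ready" flag

void writer() {
    payload = 42;
    // Release fence: the store to `payload` cannot be reordered after
    // the store to `ready` below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int reader() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: pairs with the release fence in writer(), so the
    // read of `payload` below is guaranteed to see 42.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs one writer and one reader and returns the value the reader saw.
int publish_and_read() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread r([&seen] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    assert(seen == 42);
    return seen;
}
```

Without the two fences (or equivalent release/acquire operations on `ready`), reading `payload` in `reader()` would be a data race.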
 
 
Read more here to understand correctly:
 
"However under x86-TSO, the stores are cached in the store buffers,
a load consults only shared memory and the store buffer of the given thread, which means it can load data from memory and ignore values from
the other thread."
 
Read more here:
 
https://books.google.ca/books?id=C2R2DwAAQBAJ&pg=PA127&lpg=PA127&dq=immediately+visible+and+m+fence+and+store+buffer+and+x86&source=bl&ots=yfGI17x1YZ&sig=ACfU3U2EYRawTkQmi3s5wY-sM7IgowDlWg&hl=en&sa=X&ved=2ahUKEwi_nq3duYPkAhVDx1kKHYoyA5UQ6AEwAnoECAgQAQ#v=onepage&q=immediately%20visible%20and%20m%20fence%20and%20store%20buffer%20and%20x86&f=false
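
The effect described in the quote is usually demonstrated with the "store buffering" litmus test. The following is a sketch in C++, with the function name `count_both_zero` chosen for illustration. Each thread stores to one variable and then loads the other. With `std::memory_order_relaxed` both loads may return 0, because each store can still be sitting in its own core's store buffer. With `std::memory_order_seq_cst` the C++ memory model forbids that outcome.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns how
// often both threads read 0. `order` is used for every store and load and
// should be either memory_order_relaxed or memory_order_seq_cst.
int count_both_zero(int iterations, std::memory_order order) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, order);   // thread 1 writes x ...
            r1 = y.load(order);  // ... then reads y
        });
        std::thread t2([&] {
            y.store(1, order);   // thread 2 writes y ...
            r2 = x.load(order);  // ... then reads x
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}

// With seq_cst, "both threads read 0" must never be observed.
void check_seq_cst_forbids_both_zero() {
    assert(count_both_zero(200, std::memory_order_seq_cst) == 0);
}
```

Because a new pair of threads is started for every iteration, the relaxed variant will only rarely show a non-zero count in practice; a tighter harness would be needed to observe the reordering reliably.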
 
 
Now we can ask the question: how long does the
store buffer take to drain?
 
 
So read here to notice:
 
https://nicknash.me/2018/04/07/speculating-about-store-buffer-capacity/
 
 
As you can see, the author inserts around 500 no-ops to give the store
buffer time to drain, and I think the store buffer can drain in less time than that.
 
 
 
Thank you,
Amine Moulay Ramdane.

aminer68@gmail.com: Jan 16 11:55AM -0800

Hello,
 
 
I corrected one last typo, please read again..
 
About Memory Ordering and Atomicity..
 
Read the following, it says:
 
"The performance gain from allowing memory reordering is small, and it doesn't make up for the extra headaches that come from difficult-to-find failures."
 
Read more here:
 
http://www.informit.com/articles/article.aspx?p=1676714&seqNum=5
 
 
So I think what I wrote previously about it is true, read it again carefully:
 
Here is another problem with ARM processors..
 
About SC and TSO and RMO hardware memory models..
 
I have just read the following webpage about the performance difference
between the SC, TSO and RMO hardware memory models.
 
I think TSO is better: it gives only around 3% to 6% less performance
than RMO, and it is a simpler programming model than RMO. So I think ARM
should support TSO to be compatible with x86, which is TSO.
 
Read more here to notice it:
 
https://infoscience.epfl.ch/record/201695/files/CS471_proj_slides_Tao_Marc_2011_1222_1.pdf
 
About memory models and sequential consistency:
 
As you have noticed, I am working with the x86 architecture..
 
Even though x86 gives up on sequential consistency, it's among the most
well-behaved architectures in terms of the crazy behaviors it allows.
Most other architectures implement even weaker memory models.
 
ARM memory model is notoriously underspecified, but is essentially a
form of weak ordering, which provides very few guarantees. Weak ordering
allows almost any operation to be reordered, which enables a variety of
hardware optimizations but is also a nightmare to program at the lowest
levels.
 
Read more here:
 
https://homes.cs.washington.edu/~bornholt/post/memory-models.html
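
As a hedged illustration of what programming against a weaker model involves, here is a sketch of a minimal spinlock in C++ whose ordering requirements are stated explicitly. `SpinLock` and `locked_increments` are hypothetical names, not taken from the linked post. On x86 the release store in `unlock()` typically compiles to an ordinary store, because TSO already provides that ordering; on ARM the compiler has to emit a store-release or a barrier for the same source line.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Acquire: accesses in the critical section cannot move above this.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: accesses in the critical section cannot move below this.
        locked_.store(false, std::memory_order_release);
    }
};

// Increments a plain counter from several threads under the lock and
// returns the final value.
long locked_increments(int threads, int per_thread) {
    SpinLock lock;
    long counter = 0;  // plain variable, protected only by the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    assert(counter == static_cast<long>(threads) * per_thread);
    return counter;
}
```

If the two memory orders were weakened to `std::memory_order_relaxed`, the increments of `counter` would no longer be ordered by the lock and the code would have a data race under the C++ memory model on every architecture; in practice the failure is far more likely to show up on ARM than on x86.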
 
 
Memory Models: x86 is TSO, TSO is Good
 
Essentially, the conclusion is that x86 in practice implements the old
SPARC TSO memory model.
 
The big take-away from the talk for me is that it confirms the
observation made many times before that SPARC TSO seems to be the optimal
memory model. It is sufficiently understandable that programmers can
write correct code without having barriers everywhere. It is
sufficiently weak that you can build fast hardware implementation that
can scale to big machines.
 
Read more here:
 
https://jakob.engbloms.se/archives/1435
 
 
Thank you,
Amine Moulay Ramdane.