comp.programming.threads@googlegroups.com | Google Groups
- We have to be smart - 4 Updates
Ramine <ramine@1.1>: Nov 18 06:41PM -0800

Hello,

I have just read the following PhD paper about NUMA cohort locks...

http://dspace.mit.edu/handle/1721.1/72670

and I have also read the following Master's research paper about NUMA cohort locks...

https://cs.brown.edu/research/pubs/theses/masters/2012/ma.pdf

So we have to be smart please, so follow with me... I have just read the above papers and I have completely understood their algorithm: it uses local locks and a global lock, and it looks like a distributed algorithm that tries to minimize the inter-socket coherence traffic as much as possible, so I had no problem understanding it easily. But I have not been satisfied with those papers. Why?

If you read the PhD paper above carefully, you will notice that their benchmarks say that the lock cohort scales to about 6x compared with a non-NUMA lock, and they explain this 6x scaling by the fact that their lock cohort minimizes the inter-socket coherence traffic as much as possible. But I am not convinced by their explanation, because this 6x scaling comes instead from the fact that there is parallelism inside the function that permits us to enter the local locks first; this parallelism is around 6 CPU clocks or so. There is also a serial part, the cache-line transfer in their other function of two integer transfers from the L2 cache memory to the CPU, which is around two CPU clocks, and there is a serial part that spins for about 4 microseconds. So from Amdahl's law this will scale to around 6x. So the scaling doesn't come from the fact that they are minimizing the inter-socket cache coherence traffic as much as possible, as they say, but from Amdahl's law, as I have just explained to you. So if you transfer more data from the L2 cache to the CPU inside the critical section of the lock cohort, their benchmarks with a lock cohort will scale much less than 6x.

So, if you have understood what I want to say to you: the lock cohort doesn't bring you much scalability if you are transferring more than 4 bytes from the L2 cache to the CPU inside the critical section of the lock cohort, because this will scale much less than 6x. So what I want to say is that the non-NUMA locks are still useful, I think, and my scalable MLock can also be used in realtime systems; the NUMA lock cohort can not.

You can download my scalable MLock from:

https://sites.google.com/site/aminer68/scalable-mlock

Thank you for your time.

Amine Moulay Ramdane.
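To make the cohorting idea concrete, here is a minimal, hypothetical sketch in C of a lock built from per-socket local locks plus one global lock, as described above. It is not the exact algorithm from the PhD or Master's papers (their waiter detection and fairness bounds are omitted), and the simple test-and-set spinlocks, the fixed socket count, and the `pass_within_socket` decision are assumptions made purely for illustration:

```c
/* Illustrative sketch of the general lock-cohorting idea (not the papers'
   exact algorithm): each socket has a local spinlock, and only the socket
   that currently "owns" the global lock may enter the critical section
   without touching the global lock again. */

#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SOCKETS 4   /* assumed socket count, for illustration only */

typedef struct {
    atomic_flag lock;                /* simple test-and-set spinlock */
} spinlock_t;

typedef struct {
    spinlock_t local[NUM_SOCKETS];   /* one local lock per socket */
    spinlock_t global;               /* global lock shared by all sockets */
    atomic_int owner;                /* socket holding the global lock, -1 if none */
} cohort_lock_t;

static void spin_acquire(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
        ;   /* spin */
}

static void spin_release(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

void cohort_init(cohort_lock_t *c) {
    for (int i = 0; i < NUM_SOCKETS; i++)
        atomic_flag_clear(&c->local[i].lock);
    atomic_flag_clear(&c->global.lock);
    atomic_store(&c->owner, -1);
}

void cohort_acquire(cohort_lock_t *c, int socket) {
    spin_acquire(&c->local[socket]);          /* intra-socket traffic only */
    if (atomic_load(&c->owner) != socket) {   /* socket does not own the global lock yet */
        spin_acquire(&c->global);             /* inter-socket traffic happens here */
        atomic_store(&c->owner, socket);
    }
}

void cohort_release(cohort_lock_t *c, int socket, bool pass_within_socket) {
    if (pass_within_socket) {
        /* keep the global lock owned by this socket and hand over only the
           local lock, so the next waiter on the same socket avoids
           inter-socket coherence traffic */
        spin_release(&c->local[socket]);
    } else {
        atomic_store(&c->owner, -1);
        spin_release(&c->global);
        spin_release(&c->local[socket]);
    }
}
```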
Ramine <ramine@1.1>: Nov 18 07:10PM -0800

Hello,

Please read the PhD paper; it says about the benchmarks that scale to 6x that:

"The graph shows the average throughput in terms of number of critical and non-critical section pairs executed per second. The critical section accesses two distinct cache blocks (increments 4 integer counters on each block), and the non-critical section is an idle spin loop of up to 4 microseconds."

So as I have just explained to you, the serial part inside the critical section takes around 1 clock and the parallel part inside the function that enters the local locks takes around 6 clocks; this is why it gives 6x scalability from the calculation results of Amdahl's law. I hope you have understood well what I want to say: the scalability of 6x is not the result of minimizing the inter-socket coherence traffic as much as possible, but the result of the parallel part inside the function that enters the local locks and the serial part inside the critical section of the lock cohort. This is what the PhD paper doesn't explain to you. Also, you have to know that if you are transferring more than 4 bytes from the L2 (local or remote) to the CPU, the lock cohort will scale less and less than 6x. This is why my scalable MLock is still useful, and my scalable MLock can be used in realtime critical systems; the lock cohort can not, because it is unfair.

Thank you,

Amine Moulay Ramdane.
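As a worked example of the Amdahl's law calculation above, here is a small C program that plugs in the poster's own estimates (about 1 clock of serial work inside the critical section per pair, about 6 clocks of parallelizable work). These clock counts are estimates from the post, not measurements from the paper:

```c
/* Worked Amdahl's law example using the poster's estimated clock counts.
   Amdahl's law: speedup(N) = 1 / (s + (1 - s)/N),
   where s is the serial fraction of the total work. */

#include <stdio.h>

int main(void) {
    const double serial_clocks   = 1.0;  /* assumed serial work per pair        */
    const double parallel_clocks = 6.0;  /* assumed parallelizable work per pair */
    const double s = serial_clocks / (serial_clocks + parallel_clocks);

    for (int n = 1; n <= 64; n *= 2) {
        double speedup = 1.0 / (s + (1.0 - s) / n);
        printf("threads = %2d  speedup = %.2fx\n", n, speedup);
    }
    /* As n grows, the speedup approaches 1/s = 7x, i.e. a ceiling in the
       6x-7x neighborhood that the post attributes the benchmark result to. */
    return 0;
}
```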
Ramine <ramine@1.1>: Nov 18 07:46PM -0800

Hello,

The PhD paper says about the benchmarks that give 6x scalability that:

"The graph shows the average throughput in terms of number of critical and non-critical section pairs executed per second. The critical section accesses two distinct cache blocks (increments 4 integer counters on each block), and the non-critical section is an idle spin loop of up to 4 microseconds."

So I think from the benchmark that the idle loop doesn't count as a serial part, because it is not a critical section and it doesn't affect Amdahl's law by much. So the critical section that accesses the two distinct cache blocks and increments 4 integer counters on each block will use the ILP (instruction level parallelism) of the processor to lower the number of CPU clocks that it takes in the critical section, which means it will lower the time of the serial part of Amdahl's law. The parallel part is when you enter the local locks, and this will take a number of CPU clocks, and I think this is what is giving the 6x scaling with the Amdahl's law calculation. What I want to explain is that it is not the mimization of the inter-socket coherence traffic that is giving the 6x scalability, but Amdahl's law that is giving the 6x scalability.

Thank you,

Amine Moulay Ramdane.
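For illustration only, here is a rough C sketch of the critical-section work described in the quoted benchmark text (two distinct cache blocks, four integer counters incremented on each). The actual benchmark code is not shown in the papers, and the 64-byte alignment used here is an assumption; the point is simply that the eight increments are independent, so an out-of-order core can overlap them via ILP and keep the serial portion inside the critical section very short:

```c
/* Sketch of the quoted benchmark's critical-section body: two distinct
   cache blocks, four integer counters incremented on each. The 64-byte
   alignment per block is assumed, not taken from the papers. */

#include <stdint.h>

struct block {
    _Alignas(64) int64_t counter[4];   /* one counter group per cache line (assumed) */
};

static struct block blocks[2];

static void critical_section_body(void) {
    for (int b = 0; b < 2; b++)
        for (int i = 0; i < 4; i++)
            blocks[b].counter[i]++;     /* independent increments, can overlap via ILP */
}

int main(void) {
    critical_section_body();
    return 0;
}
```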
Ramine <ramine@1.1>: Nov 18 08:02PM -0800

On 11/18/2014 7:46 PM, Ramine wrote:
> will take a number of CPU clocks, and i think this is what is giving the
> 6x scaling with the Amdahl's law calculation, and what i want to explain
> is that it is not the mimization of the inter-socket coherence traffic

I mean "minimization", not mimization.