Tuesday, February 17, 2015

Digest for comp.programming.threads@googlegroups.com - 2 updates in 2 topics

Ramine <ramine@1.1>: Feb 16 05:39PM -0800

Hello,
 
 
As you have noticed i have implemented a parallel sort library and
i have done some benchmark on it, and as you have noticed it can
not scale much than 8X on multicore processors, the reason is simple,
cause by nature the parallel sorts algorithms of my parallel sort
library are limited by the serial part of the Amdahl's law that is
the memory transfers from the memory to the L2 caches, so what is
the solution then to make my parallel sort library to scale more on
multicore CPUs ? you have to made it NUMA-aware and to make it scale
on NUMA by parallelizing the memory transfers from the memory to the L2
caches, this way it will scale on NUMA architectue , so i have
decided to make this my next project, i will soon make my
parallel sort library NUMA-aware and i will then make it scale more and
more on NUMA architecture with more and more NUMA nodes, so as you see
i will render my parallel sort library a very powerful parallel sort
library, so, the sorting part and the merging part will be fully
parrallelized and fully scalable on NUMA architecture on my next
parallel sort library , that's a really good news !
 
 
 
 
Thank you for your time.
 
 
 
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Feb 16 02:17PM -0800

Hello,
 
 
A guy called Grinder on usenet have presented me the results of my
benchamrk of my parallel sort library, here it is on a 12 cores and read
my explanation bellow:
 
---
 
Number of cores is: 12
Scalability with parallel mergesort on 12 cores is: 2.38
Time of parallel mergesort on one core is: 323286 microseconds
Time of parallel mergesort on 12 cores is: 135606 microseconds
 
 
Number of cores is: 12
Scalability with parallel quicksort on 12 cores is: 2.33
Time of parallel quicksort on one core is: 330408 microseconds
Time of parallel quicksort on 12 cores is: 141813 microseconds
 
 
Number of cores is: 12
Scalability with parallel heapsort on 12 cores is: 4.41
Time of parallel heapsort on one core is: 730261 microseconds
Time of parallel heapsort on 12 cores is: 165588 microseconds
 
--
 
I have took a look at the results, and i think the explanation is the
following: Since i am sorting strings, the compare's function in my
parallel sorts algorithms that compares two strings is less expensive on
your CPU, so from the Amdahl's law, since the compare's function
constitutes the parallel part of the Amdahl's law, and the memory
transfers from the memory to the L2 caches constitutes the serial part,
so since the parallel part is less expensive on your CPU than on other
CPUs , so from the Amdahl's law this is why it scales less on your CPU
than on other CPUs.
 
But i have to be more precise as does a scientist: the compare's
function of two strings takes less CPU time on your CPU than other CPUs,
so it scales less on more cores, but the compare's function of two
strings on your CPU is faster than on other CPUs, so even if
it scales less on more cores , the two algorithms, the serial and the
parallel, are faster on your CPU than on other CPUs, since the compare's
function of two strings is faster on your CPU than on other
CPUs, that's a good news.
 
 
 
 
Thnk you,
Amine Moulay Ramdane.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com.

No comments: