- About my parallel sort algorithms - 1 Update
- More about the benchmark and my conclusion - 1 Update
| Ramine <ramine@1.1>: Feb 16 05:39PM -0800 Hello, As you have noticed i have implemented a parallel sort library and i have done some benchmark on it, and as you have noticed it can not scale much than 8X on multicore processors, the reason is simple, cause by nature the parallel sorts algorithms of my parallel sort library are limited by the serial part of the Amdahl's law that is the memory transfers from the memory to the L2 caches, so what is the solution then to make my parallel sort library to scale more on multicore CPUs ? you have to made it NUMA-aware and to make it scale on NUMA by parallelizing the memory transfers from the memory to the L2 caches, this way it will scale on NUMA architectue , so i have decided to make this my next project, i will soon make my parallel sort library NUMA-aware and i will then make it scale more and more on NUMA architecture with more and more NUMA nodes, so as you see i will render my parallel sort library a very powerful parallel sort library, so, the sorting part and the merging part will be fully parrallelized and fully scalable on NUMA architecture on my next parallel sort library , that's a really good news ! Thank you for your time. Amine Moulay Ramdane. |
| Ramine <ramine@1.1>: Feb 16 02:17PM -0800 Hello, A guy called Grinder on usenet have presented me the results of my benchamrk of my parallel sort library, here it is on a 12 cores and read my explanation bellow: --- Number of cores is: 12 Scalability with parallel mergesort on 12 cores is: 2.38 Time of parallel mergesort on one core is: 323286 microseconds Time of parallel mergesort on 12 cores is: 135606 microseconds Number of cores is: 12 Scalability with parallel quicksort on 12 cores is: 2.33 Time of parallel quicksort on one core is: 330408 microseconds Time of parallel quicksort on 12 cores is: 141813 microseconds Number of cores is: 12 Scalability with parallel heapsort on 12 cores is: 4.41 Time of parallel heapsort on one core is: 730261 microseconds Time of parallel heapsort on 12 cores is: 165588 microseconds -- I have took a look at the results, and i think the explanation is the following: Since i am sorting strings, the compare's function in my parallel sorts algorithms that compares two strings is less expensive on your CPU, so from the Amdahl's law, since the compare's function constitutes the parallel part of the Amdahl's law, and the memory transfers from the memory to the L2 caches constitutes the serial part, so since the parallel part is less expensive on your CPU than on other CPUs , so from the Amdahl's law this is why it scales less on your CPU than on other CPUs. But i have to be more precise as does a scientist: the compare's function of two strings takes less CPU time on your CPU than other CPUs, so it scales less on more cores, but the compare's function of two strings on your CPU is faster than on other CPUs, so even if it scales less on more cores , the two algorithms, the serial and the parallel, are faster on your CPU than on other CPUs, since the compare's function of two strings is faster on your CPU than on other CPUs, that's a good news. Thnk you, Amine Moulay Ramdane. |
| You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |
No comments:
Post a Comment