comp.programming.threads@googlegroups.com | Google Groups
Unsure why you received this message? You previously subscribed to digests from this group, but we haven't been sending them for a while. We fixed that, but if you don't want to get these messages, send an email to comp.programming.threads+unsubscribe@googlegroups.com.
- We have to speak frankly and clearly - 1 Update
- What about a PhD paper - 1 Update
- Scalable RWLocks - 1 Update
- About Parallel archiver - 1 Update
- Parallel archiver was updated to version 2.23 - 1 Update
Ramine <ramine@1.1>: Oct 31 05:36PM -0700

Hello,

We have to speak frankly and clearly, as the PhD papers of the IEEE do, for example...

What I want to say this time is that you have to know that parallel programming is hard, because you have to think about sequential consistency, race conditions, deadlocks, lock convoys, etc., and you have to know also that proving lock-free algorithms correct is hard to do. So how can we proceed and walk confidently in this complex area that is parallel programming?

My way of doing it is to implement a reduced number of libraries that are stable and use them as base foundations for other projects; this is my way of reducing or eliminating the risk. So what I have done, by inventing my libraries and parallel algorithms and synchronization algorithms, is reduce the complexity of parallel programming by using those same parallel projects as building blocks for other projects. I have taken a lot of care to verify that my parallel algorithms are free of bugs and free of problems, and I think I have succeeded in that regard by inventing, implementing, and bringing you some stable parallel libraries and stable synchronization algorithms.

And as you have noticed, I have not used throttling concurrency in my Threadpool like they are doing in the following link:

http://msdn.microsoft.com/en-us/magazine/ff960958.aspx

Simply because my parallel projects don't contain waiting instructions like sleep() or IO waits or that kind of waiting function, so this makes my parallel algorithms efficient when using my Threadpool, I think.

So I hope this clears up my intentions, which were to bring you stable and fast parallel algorithms, and I think I have succeeded in this regard.

You can download all my parallel projects from:

https://sites.google.com/site/aminer68/

Thank you,
Amine Moulay Ramdane.
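A minimal sketch of that point in Go (illustrative only; the author's Threadpool library and its API are not shown here, and fib() is a hypothetical stand-in task): a fixed pool with one worker per core stays fully busy without any concurrency throttling, precisely because the tasks never sleep or wait on IO.

package main

import (
	"fmt"
	"runtime"
	"sync"
)

// fib is a stand-in for pure CPU-bound work with no blocking calls.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

func main() {
	// One worker per core: when tasks never call sleep() or wait on
	// IO, this fixed-size pool keeps every core busy, so there is no
	// need to throttle (grow or shrink) the pool at runtime.
	tasks := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range tasks {
				_ = fib(n) // pure computation, never blocks
			}
		}()
	}
	for i := 0; i < 100; i++ {
		tasks <- 25
	}
	close(tasks)
	wg.Wait()
	fmt.Println("all tasks done")
}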
Ramine <ramine@1.1>: Oct 31 04:32PM -0700

Hello,

Frankly, I think I have to write a PhD paper about my Parallel Sort library...

If you look at my Parallel Quicksort, you will notice that when the main thread enters the partition() procedure for the first time, before starting the threads, it has to serially partition "all" the array, and this makes the serial part of the Amdahl equation bigger, so it makes my Parallel Quicksort less scalable... hope this is clear.

But what about my Parallel Sort library? We have to be smart: when you take a look at the PhD papers, you will notice that they benchmark their parallel algorithms empirically, but since I don't have a computer with more than 4 cores, I cannot benchmark my Parallel Sort library with more than 4 cores, so I have to do mathematical calculations to make a scalability prediction for my Parallel Sort library.

If you take a look at the source code of my Parallel Sort library, you will notice that I am using 2 steps, one step for the sorting part and one step for the merging part, and my parallel algorithm has completely parallelized the sorting part and completely parallelized the merging part. This has made my parallel algorithm more scalable, because its serial part is much smaller than the serial part of my Parallel Quicksort library.

So how can we make a scalability prediction for my Parallel Sort library? You have to do it the same way as you do it in assembler: you have to calculate the number of CPU clocks that the parallel part of the Amdahl equation takes and the number of CPU clocks that the serial part takes. I have done it for my Parallel Quicksort inside my Parallel Sort library. Although this depends on the size of the "string" when you are sorting strings, we will take as an example an average string of 10 characters; this gives around 16 clocks for the parallel part of the Amdahl equation for the compare() between strings etc., because I have done the calculation myself. For the serial part, we have to know that memory transfers are serialized in x86 computers, and since on DDR3 the memory throughput can go up to 17 GB/s, on my 2.4 GHz CPU this gives about 8 bytes of memory transfer per clock from the memory to the CPU, so it will take around 2 clocks to transfer two strings with an average size of 10 characters before making the compare().

So finally this gives a serial part of 2 clocks and a parallel part of 16 clocks, and the Amdahl equation gives a scalability ceiling of 9x on x86 multicores with DDR3 memory. This was the scalability prediction for "strings" with an average size of 10 characters; for strings of a bigger size it can scale more. But for integer sorting, two integers will be transferred from the memory to the CPU in 1 CPU clock before doing the compare(), and the compare() function between integers will take only one clock, so for integers it will take 7 CPU clocks for the parallel part and 1 CPU clock for the serial part. After doing the Amdahl equation calculation for integers, it gives a scalability on multicores of no more than around 8x.

So my Parallel Sort library is much more scalable than my Parallel Quicksort library... This is very good to know.

You can download my Parallel Sort library from:

https://sites.google.com/site/aminer68/parallel-sort-library

Thank you,
Amine Moulay Ramdane.
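The Amdahl arithmetic above can be checked in a few lines of Go (the clock counts are the post's estimates, not measurements): with s serial clocks and p parallelizable clocks per comparison, the speedup ceiling is (s + p)/s, which gives 18/2 = 9x for the string case and 8/1 = 8x for the integer case.

package main

import "fmt"

// speedup applies Amdahl's law: with s serial clocks and p
// parallelizable clocks, the speedup on n cores is (s+p)/(s+p/n);
// the ceiling as n grows without bound is (s+p)/s.
func speedup(s, p, n float64) float64 {
	return (s + p) / (s + p/n)
}

func main() {
	// Strings of ~10 chars (estimates from the post): 2 serial clocks
	// for the memory transfer, 16 parallel clocks for the compare().
	fmt.Printf("strings on 4 cores: %.2fx\n", speedup(2, 16, 4))    // 3.00x
	fmt.Printf("strings ceiling:    %.2fx\n", speedup(2, 16, 1e12)) // -> 9x
	// Integers: 1 serial clock, 7 parallel clocks.
	fmt.Printf("integers ceiling:   %.2fx\n", speedup(1, 7, 1e12))  // -> 8x
}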
Ramine <ramine@1.1>: Oct 31 02:59PM -0700

Hello,

I have just read the following PhD paper about NUMA-aware reader-writer locks. Here it is:

http://mcg.cs.tau.ac.il/papers/ppopp2013-rwlocks.pdf

You will notice that it scales better than the DV reader-writer lock because, I think, there are fewer NUMA nodes in this reader-writer lock than there are CPU ids on the DV side, so the DV side is more expensive on the writer side. That makes the serial part in the Amdahl equation bigger, so it makes the DV reader-writer lock less scalable... but the DV reader-writer lock is good also.

But I have also invented and implemented scalable RWLocks; here they are:

https://sites.google.com/site/aminer68/scalable-rwlock

The scalability of my scalable RWLocks is good at 1% to 3% writes, even more. I have also invented scalable and starvation-free RWLocks, which you will find inside the same zipfile, and this makes my scalable RWLocks attractive and powerful, I think.

Please notice that the reader-writer lock that is included inside the Omnithread library uses an expensive atomic operation on the reader side, so it is less scalable than my scalable RWLocks; so I advise you to use my scalable RWLocks, because they are more powerful and they are really good.

You can download my scalable RWLocks from:

https://sites.google.com/site/aminer68/scalable-rwlock

Thank you,
Amine Moulay Ramdane.
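The general idea behind such locks can be sketched as follows (a generic per-shard reader lock in Go; this is not the author's RWLock nor the NUMA-aware lock from the paper, and all type names are assumptions): each reader takes only its own cache-line-padded shard, so the read path has no shared atomic hot spot, while a writer must acquire every shard, which is exactly the writer-side cost being weighed above.

package main

import (
	"runtime"
	"sync"
)

// shard holds one reader-writer mutex padded out to a cache line
// so that readers on different cores don't false-share.
type shard struct {
	mu sync.RWMutex
	_  [40]byte // pad: sync.RWMutex is 24 bytes on 64-bit builds
}

// DRWLock spreads reader bookkeeping over one shard per core.
// Readers lock a single shard; a writer must take all of them,
// which is why scalability drops as the write percentage grows.
type DRWLock struct {
	shards []shard
}

func NewDRWLock() *DRWLock {
	return &DRWLock{shards: make([]shard, runtime.NumCPU())}
}

// RLock locks one shard chosen by the caller-supplied id; callers
// should use a stable per-thread token to spread the load. The
// returned shard mutex must be RUnlock()ed by the caller.
func (l *DRWLock) RLock(id int) *sync.RWMutex {
	m := &l.shards[id%len(l.shards)].mu
	m.RLock()
	return m
}

// Lock acquires every shard in order, excluding all readers.
func (l *DRWLock) Lock() {
	for i := range l.shards {
		l.shards[i].mu.Lock()
	}
}

func (l *DRWLock) Unlock() {
	for i := range l.shards {
		l.shards[i].mu.Unlock()
	}
}

func main() {
	l := NewDRWLock()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			m := l.RLock(id) // cheap read path: touches one shard only
			m.RUnlock()
		}(g)
	}
	wg.Wait()
	l.Lock() // expensive write path: visits every shard
	l.Unlock()
}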
Ramine <ramine@1.1>: Oct 31 02:07PM -0700

Hello,

My Parallel archiver is really more powerful than my Parallel compression library, so I advise you to use my Parallel archiver instead of my Parallel compression library. I hope I have brought you an interesting and powerful parallel library to add to your libraries.

You can download Parallel archiver 2.23 from:

https://sites.google.com/site/aminer68/parallel-archiver

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Oct 31 01:39PM -0700

Hello,

I have updated my Parallel archiver to version 2.23: I had forgotten to update the AddStream() method, and I have now done it, so it is rock solid and stable.

Please read this about my Parallel archiver: this software is provided on an "as-is" basis, with no warranties, express or implied. The entire risk and liability of using it is yours. Any damages resulting from the use or misuse of this software will be the responsibility of the user.

You can download Parallel archiver 2.23 from:

https://sites.google.com/site/aminer68/parallel-archiver

Thank you,
Amine Moulay Ramdane.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it, send an email to comp.programming.threads+unsubscribe@googlegroups.com.