- About task-based scheduler - 1 Update
- Let's talk more computer science - 1 Update
- Sequential consistency - 1 Update
- Parallel sort and more about scalability - 1 Update
Ramine <ramine@1.1>: Jan 02 08:54PM -0800 Hello, I have just read the following webpage by Dmitry Vyukov: http://www.1024cores.net/home/parallel-computing/radix-sort

Please read the "Scheduling" part; you will notice that Dmitry Vyukov has implemented a task-based scheduler on top of the Win32 threading API. It is essentially a classical Cilk-style work-stealing scheduler that supports system-topology awareness, hyper-threading awareness, affinity awareness, batch-spawn capability, and manual task-depth control. But let us ask an important question: why did Dmitry Vyukov invent this task-based scheduler? To optimize more, of course, but the point I am making is that a task-based scheduler that looks like the Cilk scheduler is especially designed for "recursive" calls in parallel programs. If you take a look at my new parallel sort algorithm, neither its parallel sorting part nor its parallel merging part uses recursive calls, so I don't think Dmitry's scheduler will bring improvement to my parallel algorithm. Moreover, even if I rewrote my parallel algorithm to use recursive calls, this would not bring improvement on "big" data and large industrial workloads, because what Dmitry's scheduler optimizes is only a small part of the overall big data. So Dmitry's scheduler is not suitable for the general case, and I don't think I will implement such a scheduler; I will stay with the current thread pool that I have implemented and used in my parallel algorithms. Thank you for your time. Amine Moulay Ramdane. |
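To make the "recursive calls" distinction concrete, here is a minimal sketch (not from the post; the function name and cutoff values are illustrative assumptions) of the recursive divide-and-conquer spawning pattern that Cilk-style work-stealing schedulers are designed to make cheap, written with plain `std::async` rather than any particular scheduler:

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <vector>

// Recursive divide-and-conquer sum: each call spawns a subtask for the left
// half and recurses on the right half. A work-stealing scheduler keeps such
// fine-grained recursive spawns cheap by letting idle workers steal them;
// a flat thread pool pays a queue round-trip per task instead.
long long parallel_sum(const long long* data, std::size_t n, int depth) {
    if (n < 1024 || depth > 4)  // cutoff: fall back to sequential work
        return std::accumulate(data, data + n, 0LL);
    std::size_t half = n / 2;
    auto left = std::async(std::launch::async, parallel_sum,
                           data, half, depth + 1);
    long long right = parallel_sum(data + half, n - half, depth + 1);
    return left.get() + right;
}
```

A flat, non-recursive algorithm such as the one described above (fixed partitions handed to a thread pool) has no such spawn tree, which is why a Cilk-style scheduler offers it little advantage.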
Ramine <ramine@1.1>: Jan 02 05:53PM -0800 Hello... Let's talk more computer science... I have finally arrived at an important subject, which is "optimization". Computer science teaches you one of the most important things: how to amortize and optimize by improving your time complexity, which we express mathematically with O() notation; this is what I have done with my new parallel sort algorithm, which has a "very" good time complexity. Computer science also teaches you to optimize the part or parts of the source code that take the greatest percentage of the running time, and to do the optimizations with the greatest payoff rather than the smaller ones; this is a good way to optimize your code. So if you look at my parallel conjugate gradient solver library, I am not using SIMD SSE2 instructions; instead I am parallelizing the memory transfers, because my parallel algorithm is NUMA-aware, and I am also parallelizing the multiplications and additions on the double type, and since my algorithm is cache-aware I am also parallelizing the L2 cache memory transfers. These optimizations account for the much greater percentage of all the optimization that can be done on my parallel algorithm, so for large-scale industrial problems I don't think my parallel conjugate gradient algorithm needs SIMD SSE2 or AVX optimization, because SIMD SSE2 or AVX would give only a small improvement. You can download my Parallel conjugate gradient linear system solver library that is NUMA-aware and cache-aware from: https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware Thank you for your time. Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Jan 02 04:37PM -0800 Hello... As you have seen me doing in this forum, I have invented in front of you my scalable distributed sequential lock, and I have also given a proof of correctness and a proof of sequential-consistency correctness. But my proof applies only to the FreePascal, Lazarus, and Delphi compilers, which follow and respect the strong memory model of the x86 architecture and do not force a weak memory model on you as C++ does. If you are using GCC C++, Intel C++, or Microsoft Visual C++, you have to be careful, because at the -O1, -O2, and -O3 optimization levels those C++ compilers do reorder instructions, so if you need to port my algorithm to C++ you have to insert memory-ordering constraint instructions... And I have to be frank with you: I have taken a look at C++11, and what it does is force the weak memory model on us even when we are running on x86, and that's too sad! By doing so, the C++ compilers are complicating what the x86 architecture, with its strong memory model, has tried to ease and simplify. But anyway, I hope you have understood this post correctly... You can download my scalable distributed sequential lock from: https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock Thank you for your time. Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Jan 02 03:29PM -0800 Hello, I have to be frank: we have to be smart when inventing or designing parallel sort algorithms. My new parallel sort algorithm is smart because it is more cache-aware, but you have to be careful; look at my other parallel quicksort algorithm here: https://sites.google.com/site/aminer68/parallel-quicksort

You have to know that this parallel quicksort, which sorts in the classical way, is not so good: when its partition function is used recursively, the algorithm dispatches the arrays to be partitioned to different threads each time, and this makes it less cache-aware than my new parallel sort algorithm. Since it is less cache-aware, you will not get more than 3X scalability when sorting strings, and you will get less than 3X scalability when sorting integers or doubles, for example. So I advise you to use the new parallel sort algorithm of my parallel sort library version 3.3, which is more cache-aware and which gives you super-linear scalability when sorting strings and good scalability when sorting integers and doubles. You can download my new Parallel Sort library version 3.3 from: https://sites.google.com/site/aminer68/parallel-sort-library Thank you, Amine Moulay Ramdane. |
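To illustrate the cache behavior being criticized, here is a minimal sketch of the classical recursive parallel quicksort pattern (not the author's library code; the cutoff and depth values are illustrative assumptions). After every partition, one sub-array is handed to a different thread, so data partitioned by one core is often next touched by another:

```cpp
#include <algorithm>
#include <future>

// Classical parallel quicksort in the style the post describes: each
// recursion level dispatches one sub-array to a freshly scheduled thread.
// Cache lines warmed by the partition step on one core may then be
// re-fetched by another core, hurting cache (and NUMA) locality.
void parallel_quicksort(int* first, int* last, int depth) {
    if (last - first < 2048 || depth > 3) {  // small or deep: sort in place
        std::sort(first, last);
        return;
    }
    int pivot = first[(last - first) / 2];
    int* mid = std::partition(first, last,
                              [pivot](int x) { return x < pivot; });
    // The left half migrates to another thread; its cached data goes cold.
    auto left = std::async(std::launch::async, parallel_quicksort,
                           first, mid, depth + 1);
    parallel_quicksort(mid, last, depth + 1);
    left.get();
}
```

The depth cutoff also guards against degenerate pivots (e.g. many equal keys), where `std::partition` can leave one side nearly empty.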
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |