- About my projects ... - 1 Update
- cmsg cancel <n2qitl$vad$2@dont-email.me> - 3 Updates
- Please read again, i correct - 1 Update
- About my scalable conjugate gradient linear system solver library... - 1 Update
Ramine <ramine@1.1>: Nov 21 05:52PM -0800 Hello. Ladies and gentlemen, you can now use all of my freeware projects with confidence: I have tested them thoroughly, and I believe they are stable and fast now and that I have corrected all the bugs in them. One last note: to compile my projects with 64-bit compilers and run them, you need the DLL msvcr100.dll, which you can download from here: https://sites.google.com/site/aminer68/needed-files With 32-bit compilers you need msvcr110.dll, which you will find inside the zip files. You need msvcr110.dll and msvcr100.dll because, under the Delphi XE versions, I use Intel's scalable tbbmalloc memory manager, which is very fast and efficient and which depends on them. You can download all of my freeware projects from: https://sites.google.com/site/aminer68/ So be happy with all my projects, and feel free to port them to other programming languages! Thank you, Amine Moulay Ramdane. |
bleachbot <bleachbot@httrack.com>: Nov 21 09:09PM +0100 |
bleachbot <bleachbot@httrack.com>: Nov 21 09:18PM +0100 |
bleachbot <bleachbot@httrack.com>: Nov 21 11:49PM +0100 |
Ramine <ramine@1.1>: Nov 21 03:20PM -0800 Please read again, i correct Hello... Today, ladies and gentlemen, I will talk a little about my scalable conjugate gradient linear system solver library. The important thing to understand is that it is NUMA-aware and scalable on NUMA architectures. Because I use two functions that multiply a matrix by a vector, I added a mechanism that distributes the memory allocation of the matrix rows equally across the NUMA nodes, and I made the algorithm cache-aware. Beyond that, I use a probabilistic mechanism to make it scalable on NUMA architectures: this mechanism minimizes the contention points and renders my algorithm fully scalable. I hope you will be happy with my new scalable algorithm and my scalable parallel library. Frankly, I think I would have to write something like a PhD paper to explain the new algorithm fully, but I will leave it as it is for the moment; perhaps I will do that in the near future. This parallel library is especially designed for the large-scale engineering problems found in industrial finite element analysis and the like. It has been ported to FreePascal, to all the Delphi XE versions, and even to Delphi 7. I hope you will find it really good.
Here is the simulation program that uses the probabilistic mechanism I have talked about, and that shows that my algorithm is scalable. My scalable parallel algorithm divides each array of the matrix into parts of 250 elements, and the two functions that consume the greater part of the CPU time are atsub() and asub(). Inside those functions I use a probabilistic mechanism to render the algorithm scalable on NUMA architectures: I scramble the array parts using a probabilistic function, and I have noticed that this mechanism is very efficient. To demonstrate what I am saying, please look at the following simulation, which uses a variable containing the number of NUMA nodes; it shows almost perfect scalability on NUMA architectures. For example, give the "NUMA_nodes" variable a value of 4 and the array a size of 250: the simulation below reports a number of contention points equal to about a quarter of the array. So if I am using 16 cores, in the worst case it will scale to about 4X throughput on four NUMA nodes, because with an array of 250 elements and a quarter of them being contention points, Amdahl's law gives a scalability of almost 4X throughput, and this gives almost perfect scalability on more and more NUMA nodes. So my parallel algorithm is scalable on NUMA architecture. Here is the simulation I have done; please run it and you will see for yourself that my parallel algorithm is scalable on NUMA architectures.
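The round-robin row distribution described above can be sketched as follows. This is an illustrative sketch in Python, not the library's actual code; the names `assign_rows` and `NUMA_NODES` are hypothetical, and real NUMA placement involves binding allocations to node-local memory, which this sketch does not do.

```python
from collections import Counter

NUMA_NODES = 4  # assumed node count for the illustration

def assign_rows(n_rows, numa_nodes=NUMA_NODES):
    """Map each matrix row index to a NUMA node, round-robin,
    so every node holds an equal share of the rows."""
    return [i % numa_nodes for i in range(n_rows)]

assignment = assign_rows(8)
print(assignment)           # [0, 1, 2, 3, 0, 1, 2, 3]
print(Counter(assignment))  # each node receives the same number of rows
```

Round-robin assignment keeps the per-node row counts balanced to within one row, which is the "distribute the rows equally across the NUMA nodes" idea in the paragraph above.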
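The "quarter of the array" figure can be sanity-checked independently of the Pascal program: with 4 NUMA nodes, the chance that two independently scrambled positions land on the same node is about 1/4, so about 250/4 = 62.5 contention points are expected. Here is a quick check in Python (illustrative only, not part of the library; it averages the Pascal simulation's counting step over many runs):

```python
import random

NUMA_NODES = 4   # number of NUMA nodes, as in the Pascal simulation
A = 250          # array size, as in the Pascal simulation
TRIALS = 2000    # number of simulated runs to average over

random.seed(1)   # fixed seed for reproducibility
node = [i % NUMA_NODES for i in range(A)]  # index -> NUMA node, round-robin

total = 0
for _ in range(TRIALS):
    p1 = random.sample(range(A), A)  # one random scrambling of the indices
    p2 = random.sample(range(A), A)  # an independent scrambling
    # a "contention point" is a position where both scramblings hit the same node
    total += sum(node[p1[i]] == node[p2[i]] for i in range(A))

avg = total / TRIALS
print(f"average contention points: {avg:.1f} of {A}")
# The expected value is about A / NUMA_NODES = 62.5, a quarter of the array.
```

With NUMA_nodes = 4, roughly one position in four collides, which matches the quarter-of-the-array figure quoted above.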
Here it is:

---
program test;

uses math;

var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, n2, tmp, j, numa_nodes: integer;

begin
  a := 250;
  numa_nodes := 4;
  { tab2 maps each index to a NUMA node, round-robin }
  setlength(tab2, a);
  for i := 0 to a - 1 do
    tab2[i] := i mod numa_nodes;
  { first random scrambling of the indices }
  setlength(tab, a);
  randomize;
  for k := 0 to a - 1 do
    tab[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;
  { second, independent scrambling }
  setlength(tab1, a);
  randomize;
  for k := 0 to a - 1 do
    tab1[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;
  { count the positions where both scramblings land on the same node }
  j := 0;
  for i := 0 to a - 1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;
  writeln('Number of contention points: ', j);
  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.
---

You can download my Scalable Parallel Conjugate gradient solver library from: https://sites.google.com/site/aminer68/scalable-parallel-implementation-... Thank you for your time. Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Nov 21 03:11PM -0800 Hello. Today, ladies and gentlemen, I will talk a little about my scalable conjugate gradient linear system solver library. The important thing to understand is that it is NUMA-aware and scalable on NUMA architectures. Because I use two functions that multiply a matrix by a vector, I added a mechanism that distributes the memory allocation of the matrix rows equally across the NUMA nodes, and I made the algorithm cache-aware. Beyond that, I use a probabilistic mechanism to make it scalable on NUMA architectures: this mechanism minimizes the contention points and renders my algorithm fully scalable. I hope you will be happy with my new scalable algorithm and my scalable parallel library. Frankly, I think I would have to write something like a PhD paper to explain the new algorithm fully, but I will leave it as it is for the moment; perhaps I will do that in the near future. This parallel library is especially designed for the large-scale engineering problems found in industrial finite element analysis and the like. It has been ported to FreePascal, to all the Delphi XE versions, and even to Delphi 7. I hope you will find it really good.
Here is the simulation program that uses the probabilistic mechanism I have talked about, and that shows that my algorithm is scalable. My scalable parallel algorithm divides each array of the matrix into parts of 250 elements, and the two functions that consume the greater part of the CPU time are atsub() and asub(). Inside those functions I use a probabilistic mechanism to render the algorithm scalable on NUMA architectures: I scramble the array parts using a probabilistic function, and I have noticed that this mechanism is very efficient. To demonstrate what I am saying, please look at the following simulation, which uses a variable containing the number of NUMA nodes; it shows almost perfect scalability on NUMA architectures. For example, give the "NUMA_nodes" variable a value of 4 and the array a size of 250: the simulation below reports a number of contention points equal to about a quarter of the array. So if I am using 16 cores, in the worst case it will scale to about 4X throughput on four NUMA nodes, because with an array of 250 elements and a quarter of them being contention points, Amdahl's law gives a scalability of almost 4X throughput, and this gives almost perfect scalability on more and more NUMA nodes. So my parallel algorithm is scalable on NUMA architecture. Here is the simulation I have done; please run it and you will see for yourself that my parallel algorithm is scalable on NUMA architectures.
Here it is:

---
program test;

uses math;

var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, n2, tmp, j, numa_nodes: integer;

begin
  a := 250;
  numa_nodes := 4;
  { tab2 maps each index to a NUMA node, round-robin }
  setlength(tab2, a);
  for i := 0 to a - 1 do
    tab2[i] := i mod numa_nodes;
  { first random scrambling of the indices }
  setlength(tab, a);
  randomize;
  for k := 0 to a - 1 do
    tab[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;
  { second, independent scrambling }
  setlength(tab1, a);
  randomize;
  for k := 0 to a - 1 do
    tab1[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;
  { count the positions where both scramblings land on the same node }
  j := 0;
  for i := 0 to a - 1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;
  writeln('Number of contention points: ', j);
  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.
---

You can download my Scalable Parallel Conjugate gradient solver library from: https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware Thank you for your time. Amine Moulay Ramdane. |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |