- About my projects ... - 1 Update
- cmsg cancel <n2qitl$vad$2@dont-email.me> - 3 Updates
- Please read again, i correct - 1 Update
- About my scalable conjugate gradient linear system solver library... - 1 Update
Ramine <ramine@1.1>: Nov 21 05:52PM -0800 Hello. Ladies and gentlemen, you can now use all of my freeware projects with confidence: I have tested them thoroughly, and I believe they are stable and fast now and that I have corrected all the bugs in them. One last note: to compile my projects with 64-bit compilers and run them, you need the DLL msvcr100.dll, which you can download from here: https://sites.google.com/site/aminer68/needed-files With 32-bit compilers you need msvcr110.dll, which you will find inside the zip files. You need msvcr110.dll and msvcr100.dll because, under the Delphi XE versions, I use Intel's scalable tbbmalloc memory manager, which is very fast and efficient and which depends on them. You can download all of my freeware projects from: https://sites.google.com/site/aminer68/ So be happy with all my projects, and feel free to port them to other programming languages! Thank you, Amine Moulay Ramdane. |
bleachbot <bleachbot@httrack.com>: Nov 21 09:09PM +0100 |
bleachbot <bleachbot@httrack.com>: Nov 21 09:18PM +0100 |
bleachbot <bleachbot@httrack.com>: Nov 21 11:49PM +0100 |
Ramine <ramine@1.1>: Nov 21 03:20PM -0800 Please read again, i correct Hello... Today, ladies and gentlemen, I will talk a little about my scalable conjugate gradient linear system solver library. The important thing to understand is that it is NUMA-aware and scalable on NUMA architectures. Because I use two functions that multiply a matrix by a vector, I added a mechanism that distributes the memory allocation of the matrix rows equally across the NUMA nodes, and I made the algorithm cache-aware. Beyond that, I use a probabilistic mechanism to make it scalable on NUMA architectures: this mechanism minimizes the contention points and renders my algorithm fully scalable. I hope you will be happy with my new scalable algorithm and my scalable parallel library. Frankly, I think I would have to write something like a PhD paper to explain the new algorithm fully, but I will leave it as it is for the moment; perhaps I will do that in the near future. This parallel library is especially designed for the large-scale engineering problems found in industrial finite element analysis and the like. It has been ported to FreePascal, to all the Delphi XE versions, and even to Delphi 7. I hope you will find it really good.
Here is the simulation program that uses the probabilistic mechanism I have talked about, and that shows that my algorithm is scalable. My scalable parallel algorithm divides each array of the matrix into parts of 250 elements, and the two functions that consume the greater part of the CPU time are atsub() and asub(). Inside those functions I use a probabilistic mechanism to render the algorithm scalable on NUMA architectures: I scramble the array parts using a probabilistic function, and I have noticed that this mechanism is very efficient. To demonstrate what I am saying, please look at the following simulation, which uses a variable containing the number of NUMA nodes; it shows almost perfect scalability on NUMA architectures. For example, give the "NUMA_nodes" variable a value of 4 and the array a size of 250: the simulation below reports a number of contention points equal to about a quarter of the array. So if I am using 16 cores, in the worst case it will scale to about 4X throughput on four NUMA nodes, because with an array of 250 elements and a quarter of them being contention points, Amdahl's law gives a scalability of almost 4X throughput, and this gives almost perfect scalability on more and more NUMA nodes. So my parallel algorithm is scalable on NUMA architecture. Here is the simulation I have done; please run it and you will see for yourself that my parallel algorithm is scalable on NUMA architectures.
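The round-robin row distribution described above can be sketched as follows. This is an illustrative sketch in Python, not the library's actual code; the names `assign_rows` and `NUMA_NODES` are hypothetical, and real NUMA placement involves binding allocations to node-local memory, which this sketch does not do.

```python
from collections import Counter

NUMA_NODES = 4  # assumed node count for the illustration

def assign_rows(n_rows, numa_nodes=NUMA_NODES):
    """Map each matrix row index to a NUMA node, round-robin,
    so every node holds an equal share of the rows."""
    return [i % numa_nodes for i in range(n_rows)]

assignment = assign_rows(8)
print(assignment)           # [0, 1, 2, 3, 0, 1, 2, 3]
print(Counter(assignment))  # each node receives the same number of rows
```

Round-robin assignment keeps the per-node row counts balanced to within one row, which is the "distribute the rows equally across the NUMA nodes" idea in the paragraph above.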
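The "quarter of the array" figure can be sanity-checked independently of the Pascal program: with 4 NUMA nodes, the chance that two independently scrambled positions land on the same node is about 1/4, so about 250/4 = 62.5 contention points are expected. Here is a quick check in Python (illustrative only, not part of the library; it averages the Pascal simulation's counting step over many runs):

```python
import random

NUMA_NODES = 4   # number of NUMA nodes, as in the Pascal simulation
A = 250          # array size, as in the Pascal simulation
TRIALS = 2000    # number of simulated runs to average over

random.seed(1)   # fixed seed for reproducibility
node = [i % NUMA_NODES for i in range(A)]  # index -> NUMA node, round-robin

total = 0
for _ in range(TRIALS):
    p1 = random.sample(range(A), A)  # one random scrambling of the indices
    p2 = random.sample(range(A), A)  # an independent scrambling
    # a "contention point" is a position where both scramblings hit the same node
    total += sum(node[p1[i]] == node[p2[i]] for i in range(A))

avg = total / TRIALS
print(f"average contention points: {avg:.1f} of {A}")
# The expected value is about A / NUMA_NODES = 62.5, a quarter of the array.
```

With NUMA_nodes = 4, roughly one position in four collides, which matches the quarter-of-the-array figure quoted above.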
Here it is:

---
program test;

uses math;

var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, n2, tmp, j, numa_nodes: integer;

begin
  a := 250;
  numa_nodes := 4;
  { tab2 maps each index to a NUMA node, round-robin }
  setlength(tab2, a);
  for i := 0 to a - 1 do
    tab2[i] := i mod numa_nodes;
  { first random scrambling of the indices }
  setlength(tab, a);
  randomize;
  for k := 0 to a - 1 do
    tab[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;
  { second, independent scrambling }
  setlength(tab1, a);
  randomize;
  for k := 0 to a - 1 do
    tab1[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;
  { count the positions where both scramblings land on the same node }
  j := 0;
  for i := 0 to a - 1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;
  writeln('Number of contention points: ', j);
  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.
---

You can download my Scalable Parallel Conjugate gradient solver library from: https://sites.google.com/site/aminer68/scalable-parallel-implementation-... Thank you for your time. Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Nov 21 03:11PM -0800 Hello. Today, ladies and gentlemen, I will talk a little about my scalable conjugate gradient linear system solver library. The important thing to understand is that it is NUMA-aware and scalable on NUMA architectures. Because I use two functions that multiply a matrix by a vector, I added a mechanism that distributes the memory allocation of the matrix rows equally across the NUMA nodes, and I made the algorithm cache-aware. Beyond that, I use a probabilistic mechanism to make it scalable on NUMA architectures: this mechanism minimizes the contention points and renders my algorithm fully scalable. I hope you will be happy with my new scalable algorithm and my scalable parallel library. Frankly, I think I would have to write something like a PhD paper to explain the new algorithm fully, but I will leave it as it is for the moment; perhaps I will do that in the near future. This parallel library is especially designed for the large-scale engineering problems found in industrial finite element analysis and the like. It has been ported to FreePascal, to all the Delphi XE versions, and even to Delphi 7. I hope you will find it really good.
Here is the simulation program that uses the probabilistic mechanism I have talked about, and that shows that my algorithm is scalable. My scalable parallel algorithm divides each array of the matrix into parts of 250 elements, and the two functions that consume the greater part of the CPU time are atsub() and asub(). Inside those functions I use a probabilistic mechanism to render the algorithm scalable on NUMA architectures: I scramble the array parts using a probabilistic function, and I have noticed that this mechanism is very efficient. To demonstrate what I am saying, please look at the following simulation, which uses a variable containing the number of NUMA nodes; it shows almost perfect scalability on NUMA architectures. For example, give the "NUMA_nodes" variable a value of 4 and the array a size of 250: the simulation below reports a number of contention points equal to about a quarter of the array. So if I am using 16 cores, in the worst case it will scale to about 4X throughput on four NUMA nodes, because with an array of 250 elements and a quarter of them being contention points, Amdahl's law gives a scalability of almost 4X throughput, and this gives almost perfect scalability on more and more NUMA nodes. So my parallel algorithm is scalable on NUMA architecture. Here is the simulation I have done; please run it and you will see for yourself that my parallel algorithm is scalable on NUMA architectures.
Here it is:

---
program test;

uses math;

var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, n2, tmp, j, numa_nodes: integer;

begin
  a := 250;
  numa_nodes := 4;
  { tab2 maps each index to a NUMA node, round-robin }
  setlength(tab2, a);
  for i := 0 to a - 1 do
    tab2[i] := i mod numa_nodes;
  { first random scrambling of the indices }
  setlength(tab, a);
  randomize;
  for k := 0 to a - 1 do
    tab[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;
  { second, independent scrambling }
  setlength(tab1, a);
  randomize;
  for k := 0 to a - 1 do
    tab1[k] := k;
  n2 := a - 1;
  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;
  { count the positions where both scramblings land on the same node }
  j := 0;
  for i := 0 to a - 1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;
  writeln('Number of contention points: ', j);
  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.
---

You can download my Scalable Parallel Conjugate gradient solver library from: https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware Thank you for your time. Amine Moulay Ramdane. |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |