Saturday, January 3, 2015

Digest for comp.programming.threads@googlegroups.com - 4 updates in 4 topics

Ramine <ramine@1.1>: Jan 02 08:54PM -0800

Hello,
 
 
I have just read the following webpage by Dmitry Vyukov:
 
http://www.1024cores.net/home/parallel-computing/radix-sort
 
Please read the "Scheduling" part: you will notice that Dmitry Vyukov
has implemented a task-based scheduler on top of the Win32 threading
API. In its main part it is similar to a classical Cilk-style
work-stealing scheduler that supports system-topology awareness,
hyper-threading awareness, affinity awareness, batch-spawn capability
and manual task-depth control. But let us ask an important question:
why did Dmitry Vyukov invent this task-based scheduler? Of course it
is to optimize more, but I think that this Cilk-like task-based
scheduler is especially designed for "recursive" calls in parallel
programs. If you take a look at my new parallel sort algorithm,
neither its parallel sorting part nor its parallel merging part uses
recursive calls, so I don't think Dmitry's scheduler will bring an
improvement to my parallel algorithm. Beyond that, even if I rewrote
my parallel algorithm to use recursive calls, this would not bring an
improvement on "big" data and large industrial data, because what
Dmitry's scheduler optimizes is only a small part of the overall big
data, so Dmitry's scheduler is not suitable for the general case. So I
don't think I will implement such a scheduler; I will stay with my
current threadpool that I have implemented and used in my parallel
algorithms.
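To make the contrast concrete, here is a minimal sketch in C++ of the fixed thread-pool pattern being defended above: one shared queue, a fixed set of workers. This is an illustration, not the author's actual Object Pascal threadpool; a Cilk-style work-stealing scheduler differs in giving each worker its own deque so that recursive task trees stay on one core.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool: tasks go into one shared queue and
// worker threads pull them off. This flat model suits non-recursive
// parallelism (e.g. sorting fixed chunks of an array in parallel);
// work stealing mainly pays off when tasks spawn subtasks recursively.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

A typical use is to submit one task per chunk of the array and let the destructor join the workers once all chunks are sorted.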
 
 
 
Thank you for your time.
 
 
 
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Jan 02 05:53PM -0800

Hello...
 
Let's talk more computer science...
 
Finally I have arrived at an important subject, which is
"optimization". You will notice that computer science teaches you that
one of the most important things is how to amortize and optimize by
improving your time complexity, which we express mathematically with
O() notation; that is how I have done it with my new parallel sort
algorithm, which has a "very" good time complexity. Computer science
also teaches you to optimize the part or parts of the source code that
take the greater percentage of the running time, and to do the
optimizations with the greater payoff and skip the smaller ones; this
is also a good way to optimize your code. So if you look at my
parallel conjugate gradient solver library, I am not using SIMD SSE2
instructions; I am parallelizing the memory transfers, because my
parallel algorithm is NUMA-aware, and I am also parallelizing the
multiplication and addition of the double type, etc. My algorithm is
also cache-aware, so I am parallelizing the L2 cache memory transfers
as well. These optimizations cover the greater percentage of all the
optimization that can be done on my parallel algorithm, so on
large-scale industrial problems I don't think that my parallel
conjugate gradient algorithm needs SIMD SSE2 or AVX optimization,
because SIMD SSE2 or AVX would give only a small improvement to my
parallel algorithm.
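The "do the big optimizations, skip the small ones" argument is exactly Amdahl's law, and it can be checked with a one-line formula. The 20% figure below is a hypothetical number for illustration, not a measurement of the author's solver:

```cpp
// Amdahl's law: if a fraction p of the total running time is sped up
// by a factor s, the overall program speedup is bounded by
//     1 / ((1 - p) + p / s)
double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

// Hypothetical example: suppose the arithmetic that SSE2 could
// vectorize is only 20% of the running time of a memory-bound solver,
// and SIMD speeds that part up 4x. Then the whole solver gains only
//     amdahl(0.20, 4.0) == 1 / (0.80 + 0.05), about 1.18x,
// which is why vectorizing a small fraction of a memory-bound program
// buys little compared to optimizing the memory transfers themselves.
```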
 
 
You can download my Parallel conjugate gradient linear system solver
library that is NUMA-aware and cache-aware from:
 
https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware
 
 
 
Thank you for your time.
 
 
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Jan 02 04:37PM -0800

Hello...
 
 
As you have seen me do in this forum, I have invented in front of you
my scalable distributed sequential lock, and I have also given a proof
of correctness and a proof of sequential-consistency correctness. But
my proof applies only to the FreePascal, Lazarus and Delphi compilers,
which follow and respect the strong memory model of the x86
architecture and do not force a weak memory model on you as C++ does.
If you are using GCC C++, Intel C++ or Microsoft Visual C++, you have
to be careful, because at the -O1, -O2 and -O3 optimization levels
those C++ compilers do reorder instructions; so if you need to port my
algorithm to C++, you have to insert memory-ordering constraint
instructions. But I have to be frank with you: I have taken a look at
C++11, and what it does is force the weak memory model on us even when
we are running on x86, and that's too sad, because by doing so the C++
compilers are complicating what the x86 architecture, with its strong
memory model, has tried to ease and simplify.
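To show what those memory-ordering constraint instructions look like in a C++11 port, here is a minimal single-writer seqlock sketch. This is a generic illustration of the sequence-lock idea, not the author's scalable distributed version; the payload is kept atomic so the example stays free of data races under the C++11 model:

```cpp
#include <atomic>

// Minimal single-writer sequence lock. The writer makes the sequence
// counter odd before updating the payload and even again afterwards;
// a reader retries if it saw an odd counter or the counter changed.
// The acquire/release operations and fences are exactly the explicit
// ordering constraints C++11 requires at the source level, even on
// x86 where the hardware itself is strongly ordered.
struct SeqLock {
    std::atomic<unsigned> seq{0};
    std::atomic<int> a{0}, b{0};  // payload (atomic to avoid data races)

    void write(int x, int y) {               // single writer assumed
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);  // odd: in progress
        std::atomic_thread_fence(std::memory_order_release);
        a.store(x, std::memory_order_relaxed);
        b.store(y, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);  // even: done
    }

    // Returns true and fills x, y if a consistent snapshot was read.
    bool try_read(int& x, int& y) {
        unsigned s1 = seq.load(std::memory_order_acquire);
        if (s1 & 1) return false;             // writer is mid-update
        x = a.load(std::memory_order_relaxed);
        y = b.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        return seq.load(std::memory_order_relaxed) == s1;
    }
};
```

On x86 most of these orderings compile to plain loads and stores anyway; their job is to stop the compiler itself from reordering across them, which is precisely the -O1/-O2/-O3 hazard described above.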
 
But anyway, I hope you have understood this post correctly...
 
 
You can download my scalable distributed sequential lock from:
 
https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock
 
 
Thank you for your time.
 
 
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Jan 02 03:29PM -0800

Hello,
 
 
I have to be frank: we have to be smart when inventing or writing
parallel sort algorithms. My new parallel sort algorithm is smart
because it is more cache-aware, but you have to be careful; look at my
other parallel quicksort algorithm here:
 
https://sites.google.com/site/aminer68/parallel-quicksort
 
You have to know that this parallel quicksort, which uses the
classical way of sorting, is not so good: when its partition function
is used recursively, this parallel quicksort algorithm dispatches the
arrays to be partitioned each time to different threads, and this
makes it less cache-aware than my new parallel sort algorithm. Since
it is less cache-aware, you will not get more than 3X scalability when
sorting strings, and you will get less than 3X scalability when
sorting integers or doubles, for example. So I advise you to use the
new parallel sort algorithm of my Parallel Sort Library version 3.3,
which is more cache-aware and gives you super-linear scalability when
sorting strings and good scalability when sorting integers and
doubles, etc.
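For readers who have not seen it, the "classical way" being criticized looks roughly like the following C++ sketch. This is a generic illustration of recursive parallel quicksort, not the code from the linked library: each partition step may hand one half of the array to another thread, so the same regions migrate between cores' caches as the recursion deepens.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Classical recursive parallel quicksort sketch: partition, then sort
// the two halves, spawning one half onto another thread while depth
// remains. Because successive recursion levels can land on different
// cores, array regions bounce between caches; this is the
// cache-locality cost described in the post.
void parallel_quicksort(std::vector<int>& v, int lo, int hi, int depth) {
    if (hi - lo < 2) return;
    int pivot = v[lo + (hi - lo) / 2];
    int i = lo, j = hi - 1;
    while (i <= j) {                       // Hoare-style partition
        while (v[i] < pivot) ++i;
        while (v[j] > pivot) --j;
        if (i <= j) std::swap(v[i++], v[j--]);
    }
    if (depth > 0) {
        // Dispatch the left half to another thread, sort the right
        // half locally, then wait for the spawned half to finish.
        auto left = std::async(std::launch::async, parallel_quicksort,
                               std::ref(v), lo, j + 1, depth - 1);
        parallel_quicksort(v, i, hi, depth - 1);
        left.wait();
    } else {
        parallel_quicksort(v, lo, j + 1, 0);
        parallel_quicksort(v, i, hi, 0);
    }
}
```

The `depth` cutoff caps the number of spawned threads at roughly 2^depth; without it, every recursion level would create new threads.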
 
You can download my new Parallel Sort library version 3.3 from:
 
https://sites.google.com/site/aminer68/parallel-sort-library
 
 
 
Thank you,
Amine Moulay Ramdane.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com.
