Sunday, September 22, 2019

Digest for comp.programming.threads@googlegroups.com - 3 updates in 3 topics

aminer68@gmail.com: Sep 21 03:28PM -0700

Hello,
 
 
 
Here is my new poem of Love that I have just written:
 
 
You are like my so beautiful paradise
 
And it is truly beautifully concise
 
You are like my so beautiful paradise
 
Since our love is not random from a rolling dice
 
You are like my so beautiful paradise
 
Since our love is hence beautifully precise
 
You are like my so beautiful paradise
 
Because our love knows how to be a beautiful sunrise
 
You are like my so beautiful paradise
 
Because our love knows how to empathize and to sympathize
 
You are like my so beautiful paradise
 
Since our love wants to immunize against the demoralize
 
You are like my so beautiful paradise
 
Because our love is far from the evil and the immoral vice
 
You are like my so beautiful paradise
 
Because our love is our fair price that is beautifully nice
 
You are like my so beautiful paradise
 
Because our love knows how to beautifully professionalize
 
You are like my so beautiful paradise
 
Since our love wants to beautifully "revolutionize"
 
 
 
Thank you,
Amine Moulay Ramdane.
aminer68@gmail.com: Sep 21 03:20PM -0700

Hello,
 
 
Today I will talk about data dependency and parallel loops.
 
For a loop to be parallelized, every iteration must be independent of the others. One quick check (a heuristic, not a proof) is to execute the loop in the direction of the incremented index and then in the direction of the decremented index, and verify that the results are the same. A data dependency happens when memory is modified: a loop has a data dependency if an iteration writes a variable that is read or written in another iteration of the loop. There is no data dependency if only one iteration reads or writes a variable, or if many iterations read the same variable without modifying it. So these are the "general" rules.
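
The forward/backward check can be sketched as follows. This is a minimal illustration in Python (not the Object Pascal used in this post); the helper names `run` and `order_independent` and the two loop bodies are assumptions made up for the example. Matching results only suggest that the iterations are independent, they do not prove it.

```python
def run(body, n, reverse=False):
    """Run body(i, a) for i in 0..n-1, forward or backward, on a fresh array."""
    a = list(range(n))
    order = range(n - 1, -1, -1) if reverse else range(n)
    for i in order:
        body(i, a)
    return a


def order_independent(body, n=8):
    """Heuristic check: does the loop give the same result in both directions?"""
    return run(body, n) == run(body, n, reverse=True)


def independent_body(i, a):
    # Each iteration reads and writes only its own element.
    a[i] = a[i] * 2


def dependent_body(i, a):
    # Iteration i reads a[i - 1], which iteration i - 1 writes.
    if i > 0:
        a[i] = a[i] + a[i - 1]


print(order_independent(independent_body))  # True
print(order_independent(dependent_body))    # False
```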
 
Beyond that, you also have to know how to construct the parallel for loop when there is an induction variable or a reduction operation. I will give an example of each:
 
If we have the following (the code looks like Algol or modern Object Pascal):
 
 
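IND:=0;

For I:=1 to N
Do
Begin
IND := IND + I;
A[I]:=B[IND];
End;


As you can see, IND is an induction variable: its value in iteration I depends on the previous iteration (here it is the sum 1+2+...+I). So to parallelize the loop you have to compute IND directly from I, like this:


For I:=1 to N
Do
Begin
IND:=(I*(I+1)) div 2; // closed form of 1+2+...+I, integer division because IND is an index
A[I]:=B[IND];
End;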
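
As a quick sanity check of the closed form, here is a small Python sketch (Python rather than Object Pascal, only for brevity). It assumes the running index accumulates I on each iteration (IND := IND + I), which is the case where I*(I+1)/2 applies; the function names and the test array `b` are made up for the illustration.

```python
def sequential(b, n):
    """Original loop: ind carries a value from one iteration to the next."""
    a = [0] * (n + 1)            # index 0 unused, to mirror the 1-based loop
    ind = 0
    for i in range(1, n + 1):
        ind = ind + i
        a[i] = b[ind]
    return a


def closed_form(b, n, order):
    """Rewritten loop: ind is computed from i alone, so any order works."""
    a = [0] * (n + 1)
    for i in order:
        ind = i * (i + 1) // 2   # 1 + 2 + ... + i
        a[i] = b[ind]
    return a


n = 10
b = [k * k for k in range(n * (n + 1) // 2 + 1)]
forward = closed_form(b, n, range(1, n + 1))
backward = closed_form(b, n, range(n, 0, -1))
print(sequential(b, n) == forward == backward)  # True
```

Because every iteration of the rewritten loop depends only on its own index, the iterations can be handed to different threads in any order.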
 
 
Now for the reduction operation example. My Threadpool with priorities (read about it below) supports a ParallelFor() with a "grainsize" parameter, and the grainsize can be used in ParallelFor() together with a reduction operation. My scalable Adder is also used in this scenario, here it is:
 
https://sites.google.com/site/scalable68/scalable-adder-for-delphi-and-freepascal
 
 
So here is the example with a reduction operation in modern Object Pascal:
 
 
TOTAL:=0.0;
For I := 1 to N
Do
Begin
TOTAL:=TOTAL+A[I];
End;
 
 
So with my scalable Adder and my ParallelFor(), you can parallelize the above like this:
 
procedure test1(j:integer;ptr:pointer);
begin
 
t.add(A[J]); // "t" is my scalable Adder object
 
end;
 
// Let's suppose that N is 100000
// In the following, 10000 is the grainsize
 
obj.ParallelFor(1,N,test1,10000,pointer(0));
 
TOTAL:=T.get();
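
For readers without the library at hand, the general idea of a chunked parallel reduction can be sketched in Python. This is not the scalable Adder or the ParallelFor() from this post: it is a generic version that gives each chunk its own partial sum and combines the partial sums at the end, and the names `chunks` and `parallel_sum` are assumptions for the example. In CPython the global interpreter lock prevents a real speedup here, so the sketch only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(lo, hi, grainsize):
    """Split the inclusive range lo..hi into pieces of at most grainsize iterations."""
    return [(s, min(s + grainsize - 1, hi)) for s in range(lo, hi + 1, grainsize)]


def parallel_sum(a, grainsize, workers=4):
    """Sum a by computing one partial sum per chunk, then adding the partial sums."""
    def partial(bounds):
        start, end = bounds
        return sum(a[start:end + 1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial, chunks(0, len(a) - 1, grainsize)))


# N = 100000 and grainsize = 10000, as in the example above: 10 chunks.
a = [float(i) for i in range(1, 100001)]
total = parallel_sum(a, grainsize=10000)
print(total == sum(a))  # True
```

Keeping a private partial sum per chunk avoids having all threads update one shared variable, which is the contention problem a reduction has to solve.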
 
 
And read the following to understand how to use the grainsize of my ParallelFor():
 
 
About my ParallelFor(), which uses my efficient Threadpool that scales very well:
 
With ParallelFor() you have to:
 
1- Ensure Sufficient Work
 
Each iteration of a loop involves a certain amount of work,
so you have to ensure that there is enough work per task to outweigh the scheduling overhead;
read below about the "grainsize" that I have implemented.
 
2- Know how the iterations are scheduled. In OpenMP we have:
 
Static and Dynamic Scheduling
 
One basic characteristic of a loop schedule is whether it is static or dynamic:
 
• In a static schedule, the choice of which thread performs a particular
iteration is purely a function of the iteration number and number of
threads. Each thread performs only the iterations assigned to it at the
beginning of the loop.
 
• In a dynamic schedule, the assignment of iterations to threads can
vary at runtime from one execution to another. Not all iterations are
assigned to threads at the start of the loop. Instead, each thread
requests more iterations after it has completed the work already
assigned to it.
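
The difference between the two kinds of schedule can be illustrated with a small Python sketch. It is not how OpenMP is implemented: `static_owner` uses a simple cyclic assignment and `run_dynamic` uses a shared queue, and both names are assumptions made up for the example.

```python
import queue
import threading


def static_owner(i, num_threads):
    """Static (cyclic) schedule: the owner depends only on i and the thread count."""
    return i % num_threads


def run_dynamic(n, num_threads):
    """Dynamic schedule: each thread takes the next iteration when it becomes free."""
    work = queue.Queue()
    for i in range(n):
        work.put(i)
    done = [[] for _ in range(num_threads)]

    def worker(tid):
        while True:
            try:
                i = work.get_nowait()
            except queue.Empty:
                return
            done[tid].append(i)   # which thread ran i can differ from run to run

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


print([static_owner(i, 3) for i in range(6)])  # [0, 1, 2, 0, 1, 2]

assignment = run_dynamic(12, 3)
all_iterations = sorted(i for per_thread in assignment for i in per_thread)
print(all_iterations == list(range(12)))  # True
```

In the static case the mapping can be computed before the loop starts; in the dynamic case only the set of iterations is fixed, not which thread runs each one.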
 
But my ParallelFor() uses my efficient Threadpool, so it uses round-robin scheduling and also work stealing, and I think that this is sufficient.
 
Read the rest:
 
My Threadpool engine with priorities scales very well on multicore and NUMA systems, and it comes with a ParallelFor() that also scales very well on multicore and NUMA systems.
 
You can download it from:
 
https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well
 
 
Here is the explanation of my ParallelFor(). Its method is declared like this:
 
procedure ParallelFor(nMin, nMax:integer;aProc: TParallelProc;GrainSize:integer=1;Ptr:pointer=nil;pmode:TParallelMode=pmBlocking;Priority:TPriorities=NORMAL_PRIORITY);
 
The nMin and nMax parameters of ParallelFor() are the minimum and maximum integer values of the loop variable, the aProc parameter is the procedure to call, and the GrainSize integer parameter works as follows:
 
The grainsize sets a minimum threshold for parallelization.
 
A rule of thumb is that grainsize iterations should take at least 100,000 clock cycles to execute.
 
For example, if a single iteration takes 100 clocks, then the grainsize needs to be at least 1000 iterations. When in doubt, do the following experiment:
 
1- Set the grainsize parameter higher than necessary. The grainsize is specified in units of loop iterations.
 
If you have no idea of how many clock cycles an iteration might take, start with grainsize=100,000.
 
The rationale is that each iteration normally requires at least one clock cycle, so 100,000 iterations take at least 100,000 cycles. In most cases, step 3 will guide you to a much smaller value.
 
2- Run your algorithm.
 
3- Iteratively halve the grainsize parameter and see how much the algorithm slows down or speeds up as the value decreases.
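
The three steps above can be automated with a small timing loop. The following Python sketch only shows the shape of such an experiment: it uses Python's standard thread pool instead of the Threadpool from this post, the names `chunks`, `timed_run` and `halving_experiment` are assumptions for the example, and the measured times depend entirely on the machine and on the loop body.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def chunks(lo, hi, grainsize):
    """Split the inclusive range lo..hi into pieces of at most grainsize iterations."""
    return [(s, min(s + grainsize - 1, hi)) for s in range(lo, hi + 1, grainsize)]


def timed_run(n, grainsize, body, workers=4):
    """Run body(i) for i in 1..n, one pool task per chunk, and return elapsed seconds."""
    def run_chunk(bounds):
        for i in range(bounds[0], bounds[1] + 1):
            body(i)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_chunk, chunks(1, n, grainsize)))
    return time.perf_counter() - start


def halving_experiment(n, start_grainsize, body):
    """Start with a high grainsize, then halve it and time each run."""
    results = []
    g = start_grainsize
    while g >= 1:
        results.append((g, timed_run(n, g, body)))
        g //= 2
    return results


for g, seconds in halving_experiment(4096, 4096, lambda i: i * i):
    print(f"grainsize={g:5d}  {seconds:.4f} s")
```

The grainsize to keep is the one just before the running time starts to get worse again.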
 
A drawback of setting a grainsize too high is that it can reduce parallelism. For example, if the grainsize is 1000 and the loop has 2000 iterations, the ParallelFor() method distributes the loop across only two processors, even if more are available.
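
The arithmetic behind the rule of thumb and behind this drawback can be written out in a few lines of Python. The sketch assumes that the loop is cut into at most ceil(iterations / grainsize) chunks; the real ParallelFor() may split the range differently, and the function names are made up for the example.

```python
import math


def min_grainsize(cycles_per_iteration, target_cycles=100_000):
    """Smallest grainsize whose chunk takes at least target_cycles clock cycles."""
    return math.ceil(target_cycles / cycles_per_iteration)


def num_chunks(iterations, grainsize):
    """Upper bound on the number of chunks, and so on the usable parallelism."""
    return math.ceil(iterations / grainsize)


print(min_grainsize(100))      # 1000: 100 clocks per iteration
print(num_chunks(2000, 1000))  # 2: only two processors get work
print(num_chunks(2000, 250))   # 8: a smaller grainsize exposes more parallelism
```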
 
You can also pass a parameter to ParallelFor() as a pointer in Ptr. Set the pmode parameter to pmBlocking so that ParallelFor() is blocking, or to pmNonBlocking so that it is non-blocking. The Priority parameter is the priority of ParallelFor(). Look inside the test.pas example to see how to use it.
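
To make the roles of these parameters concrete, here is a rough Python analogue. It is only a sketch under assumptions: `parallel_for` is a hypothetical function built on Python's standard thread pool, not the ParallelFor() from test.pas, it has no Priority parameter, and a `blocking` flag stands in for pmBlocking and pmNonBlocking.

```python
from concurrent.futures import ThreadPoolExecutor, wait

_pool = ThreadPoolExecutor(max_workers=4)


def parallel_for(n_min, n_max, proc, grainsize=1, ptr=None, blocking=True):
    """Call proc(i, ptr) for every i in n_min..n_max, one pool task per chunk."""
    def run_chunk(start, end):
        for i in range(start, end + 1):
            proc(i, ptr)

    futures = [
        _pool.submit(run_chunk, s, min(s + grainsize - 1, n_max))
        for s in range(n_min, n_max + 1, grainsize)
    ]
    if blocking:
        wait(futures)
        for f in futures:
            f.result()   # re-raise any exception from a chunk
    return futures


def store_square(i, ptr):
    # ptr plays the role of the Ptr parameter: shared data passed to every call.
    ptr[i] = i * i


# Blocking call: the results are complete when parallel_for returns.
squares = {}
parallel_for(1, 100, store_square, grainsize=25, ptr=squares)
print(len(squares))  # 100

# Non-blocking call: the caller gets the pending tasks and waits when it chooses.
later = {}
pending = parallel_for(1, 100, store_square, grainsize=25, ptr=later, blocking=False)
wait(pending)
print(later[7])  # 49
```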
 
 
Thank you,
Amine Moulay Ramdane.
aminer68@gmail.com: Sep 21 10:31AM -0700

Hello,
 
 
Researchers demonstrate the first hardware for a 'probabilistic computer'
 
Engineers at Purdue University and Tohoku University in Japan have built the first hardware to demonstrate how the fundamental units of what would be a probabilistic computer -- called p-bits -- are capable of performing a calculation that quantum computers would usually be called upon to perform.
 
Read more here:
 
https://www.sciencedaily.com/releases/2019/09/190918131437.htm
 
 
Thank you,
Amine Moulay Ramdane.
