- More political philosophy about whether we have to be optimists or pessimists.. - 1 Update
- Today I will talk about data dependencies and parallel loops.. - 1 Update
- Here are my new inventions: my new variants of scalable RWLocks that are powerful.. - 1 Update
- My inventions, my SemaMonitor and my SemaCondvar, were updated to version 2.3 - 1 Update
- About parallel programming and concurrency.. - 1 Update
- About FlushProcessWriteBuffers() and IPIs.. - 1 Update
aminer68@gmail.com: Dec 03 02:51PM -0800 Hello,

More political philosophy about whether we have to be optimists or pessimists..

As you have noticed, I said yesterday the following:

-----------

I have just looked at the following video that I posted previously:

Is the world getting better or worse? A look at the numbers

https://www.youtube.com/watch?v=yCm9Ng0bbEQ

But I think that the man in the video is "too" optimistic, because, as you have noticed, I defined morality by the following abstraction: "Perfection at best" (read my proof about it below), and since I am a more serious computer programmer, I think morality is like software: software needs to be stable and needs to approach perfection, so we, more serious software programmers, are not satisfied in computer programming if it does not follow high standards. I think morality is the same, because we are like in a safety-critical system that needs high standards of perfection, so we have to put those high standards of perfection into action in our system.

-------------

So now you understand better how to measure our satisfaction towards, for example, governance: I think that governance must meet high standards of quality, and our elites must meet high standards of quality. This is why I said above: "because we are like in a safety-critical system that needs high standards of perfection, so we have to put those high standards of perfection into action in our system."

And look at the following Arab proverb that explains the above further. As you have noticed, I said previously that there is an Arab proverb that says in Arabic:

"Laa taqoulou kaana abii, bal qoul haana da'"

And I translate it into English:

"Don't say my father was, but say here I am"

This is also my philosophy. The meaning of this Arabic proverb is that we have to complain less about past history, which was also "full" of weaknesses, and we have today to construct a much more enhanced system than the systems of past history. This is my philosophy in life. This is why I must be more of the "professional" who knows how to think more efficiently, and this is why I am a more serious computer developer, and this is why I invented many scalable algorithms and their implementations, and this is why I am thinking and writing my political philosophy, and this is why I am thinking and writing my poems in front of you. This is how I am constructing the one that I call "myself": like in the proverb above, I am saying "here I am" by showing that I am more capable. This Arab proverb resembles my following writing, where I said that there is the engine of "taking care of our image", and here is what I said:

More political philosophy about "Achieving Your Potential"..
I have just looked at the following video of Garry Kasparov:

Garry Kasparov on "Achieving Your Potential"

https://www.youtube.com/watch?v=NPT0vg_Jl8Q

And I think that, in this video, Garry Kasparov is not so efficient in his thinking, because, as in psychology or philosophy, he must show the essence, I mean he must show the primitives that permit you to "achieve your potential", so I will explain: First, you have to know how to take care of your "image", and taking care of your image is also an engine that pushes you forward towards more perfection. But there is also my rule that "More perfection brings satisfaction", which is also an engine that pushes you forward towards more and more perfection, since I also said that this satisfaction is a double satisfaction, as I explained to you by saying:

"When you are preparing and cooking a beautiful Moroccan couscous and eating it, you will feel doubly satisfied: satisfied by being this more perfection of preparing and cooking the beautiful Moroccan couscous, and also satisfied by eating it, even if it comes with the "difficulty" of preparing and cooking, and of learning how to prepare and cook, a beautiful Moroccan couscous. That's an efficient philosophy. And it is also my spirit."

But be smart and notice that the result of my rule, this satisfaction, is also a balance between the satisfaction of the individual and the satisfaction of the society and the world, so it constrains your image to be in accordance with a morality that takes care correctly of the society and the world. So I think it becomes clearer that those two engines are really powerful too, and they permit you to achieve your potential.

And now more political philosophy about wisdom..

I think I am a wise-man type of person, but I have to show you more of what a wise man is: a wise man is a composition of both cultural superiority and genetic superiority, and you will notice that a wise man is capable of good judgment. But I will first make you feel something important; look at the following song, UB40 - (I Can't Help) Falling In Love With You, here:

https://www.youtube.com/watch?v=vUdloUqZa7w

As you are noticing, the song says the following:

"Wise men say, only fools rush in!"

Do you understand it correctly? You have to be aware of one of the most important principles of wisdom! So I will make the job easy for you and make you understand: There is an important principle in psychology and in psychiatry that you have to know correctly to be able to understand, and here it is: you have to be able to transcend your natural instincts (from genetics) that are emotional and behavioral disorders, to be able to be reasonable and of a better quality! Do you understand it correctly? And from this we can logically infer more by asking the following question: what else in the phenomenal world looks like those natural instincts that prevent us from being reasonable?
Look, for example, at nationalism: a nationalist can, for example, have more virility, which renders him unable to be reasonable, and that's called "toxic" virility; those are natural instincts that cause him to be unable to be reasonable. But he can also lack awareness or lack quality, which renders him unable to be wise, and this looks, by "analogy", like natural instincts that cause one to be unable to be reasonable. This is why we have to be careful about nationalism, because I think that nationalists are in general more virile, and their virility can become toxic virility that can render them unable to be wise. But notice with me that I have to define more what "wisdom" is; here it is:

https://en.wikipedia.org/wiki/Wisdom

Other political parties can be more "tolerant" (and this can be caused by a lack of virility), which renders them unable to be wise, because this greater tolerance in them can prevent them from having the right safety or the right quality. So now you understand that to be able to be wise, we have to have this right quality that permits us to be wise! So now you understand my above thoughts better, and I will give an example so that you understand more: We still hate white people because of past black slavery, but I think that this is not "wisdom", because black Africans, in the past time of black slavery, were considered to be much more like "animals"; in the past time of black slavery, black Africans looked like animal monkeys to white people and were considered by white people to be much less smart than humans, who were defined as being white people. So white people, in the past time of black slavery, were "not" able to judge correctly because of a lack of "science" and a lack of quality, and this is why they practiced slavery on black Africans. So we have today, by understanding it, to be able to transcend this problem by not hating white people because of past slavery.

Thank you, Amine Moulay Ramdane. |
aminer68@gmail.com: Dec 03 02:38PM -0800 Hello,

Today I will talk about data dependencies and parallel loops..

For a loop to be parallelized, every iteration must be independent of the others. One way to be sure of it is to execute the loop in the direction of the incremented index of the loop and in the direction of the decremented index of the loop and verify that the results are the same (a short sketch of this test appears below). A data dependency happens when memory is modified: a loop has a data dependency if an iteration writes a variable that is read or written in another iteration of the loop. There is no data dependency if only one iteration reads or writes a variable, or if many iterations read the same variable without modifying it. So these are the "general" rules.

Now there remains to know, for example, how to construct the parallel for loop when there is an induction variable or a reduction operation, and I will give an example of each. Suppose we have the following (the code looks like Algol or modern Object Pascal):

IND := 0;
For I := 1 to N Do
Begin
  IND := IND + I;
  A[I] := B[IND];
End;

As you notice, IND is an induction variable, so to parallelize the loop you have to replace it with its closed form:

For I := 1 to N Do
Begin
  IND := (I*(I+1)) div 2;
  A[I] := B[IND];
End;

Now for the reduction operation example: you will notice that my invention that is my threadpool with priorities that scales very well (read about it below) supports a ParallelFor() that scales very well and that supports a "grainsize", and you will notice that the grainsize can be used in the ParallelFor() with a reduction operation, and that my following powerful scalable Adder is also used in this scenario. Here it is:

https://sites.google.com/site/scalable68/scalable-adder-for-delphi-and-freepascal

So here is the example with a reduction operation in modern Object Pascal:

TOTAL := 0.0;
For I := 1 to N Do
Begin
  TOTAL := TOTAL + A[I];
End;

So with my powerful scalable Adder and with my powerful invention that is my ParallelFor() that scales very well, you will parallelize the above like this:

procedure test1(j:integer; ptr:pointer);
begin
  t.add(A[j]); // "t" is my scalable Adder object
end;

// Let's suppose that N is 100000
// In the following, 10000 is the grainsize
obj.ParallelFor(1, N, test1, 10000, pointer(0));
TOTAL := t.get();

And read the following to understand how to use the grainsize of my ParallelFor() that scales well. With ParallelFor() you have to:

1- Ensure sufficient work. Each iteration of a loop involves a certain amount of work, so you have to ensure a sufficient amount of it; read below about the "grainsize" that I have implemented.

2- Understand scheduling. In OpenMP, one basic characteristic of a loop schedule is whether it is static or dynamic:

• In a static schedule, the choice of which thread performs a particular iteration is purely a function of the iteration number and the number of threads. Each thread performs only the iterations assigned to it at the beginning of the loop.

• In a dynamic schedule, the assignment of iterations to threads can vary at runtime from one execution to another. Not all iterations are assigned to threads at the start of the loop. Instead, each thread requests more iterations after it has completed the work already assigned to it.
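Here is a minimal sketch of the forward/backward test described at the beginning of this message; the program, the arrays and the initial values are illustrative only, not taken from the threadpool library. The loop body A[I] := A[I-1] + 1 reads a value written by another iteration, so running it with an incremented index and with a decremented index gives different results, which reveals the data dependency:

---
program DependencyCheck;

var
  A, B: array[0..10] of integer;
  I: integer;
begin
  // Initialize both arrays identically.
  for I := 0 to 10 do
  begin
    A[I] := 1;
    B[I] := 1;
  end;

  // Forward direction: each iteration reads the value written
  // by the previous iteration.
  for I := 1 to 10 do
    A[I] := A[I-1] + 1;

  // Backward direction: the same body with a decremented index
  // reads the not-yet-updated neighbor instead.
  for I := 10 downto 1 do
    B[I] := B[I-1] + 1;

  // The results differ, so the loop carries a data dependency
  // and cannot be parallelized as-is.
  for I := 0 to 10 do
    if A[I] <> B[I] then
      writeln('mismatch at ', I, ': ', A[I], ' <> ', B[I]);
end.
---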
As for scheduling: since my ParallelFor() that scales very well uses my efficient threadpool that scales very well, it uses round-robin scheduling together with work stealing, so I think that this is sufficient.

Read the rest:

My threadpool engine with priorities that scales very well is really powerful because it scales very well on multicore and NUMA systems, and it also comes with a ParallelFor() that scales very well on multicore and NUMA systems. You can download it from:

https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well

Here is the explanation of my ParallelFor() that scales very well. I have implemented a ParallelFor() that scales very well; here is the method:

procedure ParallelFor(nMin, nMax:integer; aProc: TParallelProc; GrainSize:integer=1; Ptr:pointer=nil; pmode:TParallelMode=pmBlocking; Priority:TPriorities=NORMAL_PRIORITY);

The nMin and nMax parameters of ParallelFor() are the minimum and maximum integer values of the loop variable, the aProc parameter is the procedure to call, and the GrainSize integer parameter is the following: the grainsize sets a minimum threshold for parallelization. A rule of thumb is that grainsize iterations should take at least 100,000 clock cycles to execute. For example, if a single iteration takes 100 clocks, then the grainsize needs to be at least 1000 iterations. When in doubt, do the following experiment:

1- Set the grainsize parameter higher than necessary. The grainsize is specified in units of loop iterations. If you have no idea how many clock cycles an iteration might take, start with grainsize=100,000. The rationale is that each iteration normally requires at least one clock. In most cases, step 3 will guide you to a much smaller value.

2- Run your algorithm.

3- Iteratively halve the grainsize parameter and see how much the algorithm slows down or speeds up as the value decreases.

A drawback of setting a grainsize too high is that it can reduce parallelism. For example, if the grainsize is 1000 and the loop has 2000 iterations, the ParallelFor() method distributes the loop across only two processors, even if more are available.

And you can pass a parameter in Ptr as a pointer to ParallelFor(), and you can set the pmode parameter to pmBlocking so that ParallelFor() is blocking, or to pmNonBlocking so that ParallelFor() is non-blocking, and the Priority parameter is the priority of ParallelFor(). Look inside the test.pas example to see how to use it, and see the short call sketch below.

Thank you, Amine Moulay Ramdane. |
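To make the parameters above concrete, here is a hypothetical call sketch based only on the signature documented in this message; "obj" (the threadpool instance), MyIteration and the chosen numbers are placeholder assumptions, and the threadpool unit from the download link is assumed to be in the uses clause:

---
// The callback must match TParallelProc: it receives the loop
// index and the user pointer that was passed to ParallelFor().
procedure MyIteration(i: integer; ptr: pointer);
begin
  // per-iteration work goes here
end;

// Blocking parallel loop over 1..1000000 with a grainsize of 10000
// and no user parameter; pmode and Priority keep their defaults.
obj.ParallelFor(1, 1000000, MyIteration, 10000, nil);

// Non-blocking variant: ParallelFor() returns immediately while the
// iterations run in the background at normal priority.
obj.ParallelFor(1, 1000000, MyIteration, 10000, nil, pmNonBlocking, NORMAL_PRIORITY);
---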
aminer68@gmail.com: Dec 03 01:57PM -0800 Hello,

Here are my new inventions: my new variants of scalable RWLocks that are powerful..

Author: Amine Moulay Ramdane

Description: A fast, scalable, starvation-free, fair and lightweight multiple-readers-exclusive-writer lock called LW_RWLockX (the scalable LW_RWLockX does spin-wait), and a fast, scalable, starvation-free and fair multiple-readers-exclusive-writer lock called RWLockX (the scalable RWLockX doesn't spin-wait but uses my portable SemaMonitor and portable event objects, so it is energy efficient).

The parameter of the constructors is the size of the array of the readers: if the size of the array is equal to the number of parallel readers, it will be scalable, but if the number of readers is greater than the size of the array, you will start to have contention; please look at the source code of my scalable algorithms to understand.

I have used my following hash function to make my new variants of RWLocks scalable (a sketch of how such a hash can be used follows below):

---
function DJB2aHash(key:int64):uint64;
var
  i: integer;
  key1: uint64;
begin
  Result := 5381;
  for i := 1 to 8 do
  begin
    key1 := (key shr ((i-1)*8)) and $00000000000000ff;
    Result := ((Result shl 5) xor Result) xor key1;
  end;
end;
---

You can download them from:

https://sites.google.com/site/scalable68/new-variants-of-scalable-rwlocks

Thank you, Amine Moulay Ramdane. |
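As a sketch of how such a hash can spread readers across the reader array (an illustrative guess, not code from the actual download): each reader can hash its thread id into a slot index, so concurrent readers usually land on different slots and avoid contending on a single shared counter. ReaderSlot is a hypothetical helper, and GetCurrentThreadId comes from the Windows unit:

---
uses Windows;

function ReaderSlot(ArraySize: integer): integer;
begin
  // Map the current thread id into [0, ArraySize-1]; with as many
  // slots as parallel readers, collisions (and hence contention)
  // stay rare.
  Result := integer(DJB2aHash(GetCurrentThreadId()) mod uint64(ArraySize));
end;
---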
aminer68@gmail.com: Dec 03 01:54PM -0800 Hello,

My inventions, my SemaMonitor and my SemaCondvar, were updated to version 2.3. They have become efficient and powerful; please read the readme file to know more about the changes. I have also implemented an efficient Monitor over my SemaCondvar.

Here is the description of my efficient Monitor inside the Monitor.pas file that you will find inside the zip file:

Description: This is my implementation of a Monitor over my SemaCondvar. You will find the Monitor class inside the Monitor.pas file inside the zip file. When you set the first parameter of the constructor to true, the signal will not be lost if the threads are not waiting with the wait() method, but when you set the first parameter of the constructor to false, the signal will be lost if the threads are not waiting with the wait() method. The second parameter of the constructor is the kind of lock: you can set it to ctMLock to use my scalable node-based lock called MLock, or to ctMutex to use a mutex, or to ctCriticalSection to use the TCriticalSection.

Here are the methods of my efficient Monitor that I have implemented:

TMonitor = class
private
  cache0: typecache0;
  lock1: TSyncLock;
  obj: TSemaCondvar;
  cache1: typecache0;
public
  constructor Create(bool: boolean = true; lock: TMyLocks = ctMLock);
  destructor Destroy; override;
  procedure Enter();
  procedure Leave();
  function Signal(): boolean; overload;
  function Signal(nbr: long; var remains: long): boolean; overload;
  procedure Signal_All();
  function Wait(const AMilliseconds: longword = INFINITE): boolean;
  function WaitersBlocked(): long;
end;

The wait() method is for the threads to wait on the Monitor object for the signal to be signaled. If wait() fails, it can be because the number of waiters is greater than high(longword). The signal() method will signal one waiting thread on the Monitor object; if signal() fails, the returned value is false. The signal_all() method will signal all the waiting threads on the Monitor object. The signal(nbr:long; var remains:long) method will signal nbr waiting threads, but if signal() fails, the remaining number of signals that were not delivered will be returned in the remains variable. WaitersBlocked() will return the number of waiting threads on the Monitor object. And the Enter() and Leave() methods enter and leave the monitor's lock. A small usage sketch follows below.

You can download the zip files from:

https://sites.google.com/site/scalable68/semacondvar-semamonitor

and the lightweight version is here:

https://sites.google.com/site/scalable68/light-weight-semacondvar-semamonitor

Thank you, Amine Moulay Ramdane. |
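Here is a minimal usage sketch of the Monitor methods listed above, assuming Monitor.pas from the zip file is in the uses clause; the producer/consumer roles are illustrative, and whether Wait() is meant to be called inside or outside Enter()/Leave() should be checked against the examples shipped in the zip file:

---
var
  mon: TMonitor;

// Producer side: publish under the monitor's lock, then wake
// one waiting thread.
procedure Producer;
begin
  mon.Enter();
  // ... publish the shared data here ...
  mon.Leave();
  mon.Signal();
end;

// Consumer side: wait up to 5000 milliseconds for a signal,
// then consume under the monitor's lock.
procedure Consumer;
begin
  if mon.Wait(5000) then
  begin
    mon.Enter();
    // ... consume the shared data here ...
    mon.Leave();
  end;
end;

// true: a signal sent while no thread is waiting is not lost;
// ctMLock: protect the monitor with the scalable MLock.
mon := TMonitor.Create(true, ctMLock);
---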
aminer68@gmail.com: Dec 03 01:50PM -0800 Hello,

About parallel programming and concurrency..

Look at the following concurrency abstractions of Microsoft:

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitany?view=netframework-4.8

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.8

I will soon implement waitany() and waitall() concurrency abstractions for Delphi and FreePascal, with the timeout in milliseconds of course, and they will work with my efficient implementation of a Future, so you will be able to wait for many futures with waitany() and waitall().

And about task cancelation as in Microsoft's TPL, I think it is not a good abstraction, because how do you know when you have to efficiently cancel a task or tasks? So you understand that task cancelation is not such an efficient abstraction, and I will not implement it, because I think that waitany() and waitall() with futures, with a "timeout" in milliseconds, are good concurrency abstractions (a hypothetical sketch of such an API follows below).

Thank you, Amine Moulay Ramdane. |
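Since waitany() and waitall() are not implemented yet, here is a purely hypothetical sketch of what such an API might look like for futures in Object Pascal; every name in it (TFuture, pool.Future, WaitAny, WaitAll, ComputeA, ComputeB) is an assumption, not an existing interface:

---
var
  f1, f2: TFuture;   // hypothetical future type
  idx: integer;
begin
  f1 := pool.Future(@ComputeA);   // hypothetical future creation
  f2 := pool.Future(@ComputeB);

  // Wait at most 1000 milliseconds for whichever future completes
  // first; idx receives the index of the completed future.
  if WaitAny([f1, f2], idx, 1000) then
    WriteLn('future ', idx, ' completed first');

  // Wait at most 5000 milliseconds for both futures to complete.
  if WaitAll([f1, f2], 5000) then
    WriteLn('both futures completed');
end;
---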
aminer68@gmail.com: Dec 03 01:39PM -0800 Hello,

About FlushProcessWriteBuffers() and IPIs..

It seems that the implementation of sys_membarrier on Linux 4.3 is too slow. Starting with kernel 4.14, there is a new flag, MEMBARRIER_CMD_PRIVATE_EXPEDITED, that enables a much faster implementation of the syscall using IPIs. See https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/ for some details.

And read the following about Userspace RCU; it is also using IPIs:

membarrier system call performance and the future of Userspace RCU on Linux

Read more here:

https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/

Cache-coherency protocols do not use IPIs, and as a user-space level developer you do not care about IPIs at all; one is most interested in the cost of cache coherency itself. However, the Win32 API provides a function that issues IPIs to all processors (in the affinity mask of the current process): FlushProcessWriteBuffers(). You can use it to investigate the cost of IPIs. When I did a simple synthetic test on a dual-core machine, I obtained the following numbers:

420 cycles is the minimum cost of the FlushProcessWriteBuffers() function on the issuing core.

1600 cycles is the mean cost of the FlushProcessWriteBuffers() function on the issuing core.

1300 cycles is the mean cost of the FlushProcessWriteBuffers() function on the remote core.

Note that, as far as I understand, the function issues an IPI to the remote core, then the remote core acks it with another IPI; the issuing core waits for the ack IPI and then returns. And the IPIs have the indirect cost of flushing the processor pipeline. A small timing sketch follows below.

You can download my inventions of scalable asymmetric RWLocks that use IPIs and that are costless on the reader side from here:

https://sites.google.com/site/scalable68/scalable-rwlock

Thank you, Amine Moulay Ramdane. |
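Here is a minimal timing sketch for FlushProcessWriteBuffers(), assuming Windows; it measures the mean wall-clock cost per call in nanoseconds with QueryPerformanceCounter rather than in cycles, so the numbers are not directly comparable to the cycle counts above. The function is declared manually in case the RTL's Windows unit does not expose it:

---
program MeasureFlush;

uses
  Windows;

// FlushProcessWriteBuffers() is a real Win32 API in kernel32; it
// issues IPIs to the processors in the process affinity mask.
procedure FlushProcessWriteBuffers; stdcall;
  external 'kernel32.dll' name 'FlushProcessWriteBuffers';

const
  ITERATIONS = 100000;
var
  freq, t1, t2: int64;
  i: integer;
begin
  QueryPerformanceFrequency(freq);
  QueryPerformanceCounter(t1);
  for i := 1 to ITERATIONS do
    FlushProcessWriteBuffers;
  QueryPerformanceCounter(t2);
  // Convert elapsed ticks to nanoseconds and divide by call count.
  WriteLn('mean cost per call: ',
          ((t2 - t1) * 1e9 / freq) / ITERATIONS : 0 : 1, ' ns');
end.
---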