Monday, December 8, 2014

Digest for comp.programming.threads@googlegroups.com - 18 updates in 13 topics

Ramine <ramine@1.1>: Dec 07 08:40PM -0800

Hello...
 
 
I think I have found what intelligence is...


When you look at neural networks, you will notice that there are neural
networks that can "learn", so that the speed of "recognition" becomes
faster. Problem solving in intelligence looks like this, because the way
the neurons in the brain are interconnected constructs a different kind
of architecture in each person's brain. The way the neurons are
interconnected can also make the brain very fast, and the searching
mechanism of the brain is not like breadth-first search or depth-first
search: the neurons in the brain are interconnected in a way that
constructs faster architectures that don't use the same searching
techniques as breadth-first search and depth-first search, but use
different ways of interconnecting the neurons of the brain so that the
solution of the problem comes very fast, and this is what we call
intelligence.
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 07:25PM -0800

Hello,
 
 
I have come to an interesting subject...
 
I have thought for a long time about what exactly intelligence is...
 
And here is my explanation of what intelligence is...
 
 
When you see and say in mathematics: A < B and B < C => A < C
 
I think problem solving in intelligence is in great part due to and
influenced by the "speed" of the brain, because a fast brain will
breadth-first search and depth-first search the above mathematical
implication faster: the brain will see A < B faster and B < C faster
and will see their implication A < C faster, and that is the big part
of problem solving in intelligence. Because the act of seeing the
"bigger" or the "smaller" is something easy for humans, I think that
humans compose problem solving in intelligence in a divide-and-conquer
way, but the "big" factor that influences problem solving in
intelligence is the "speed" of the brain. This is my explanation of
what intelligence is.
 
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 07:22PM -0800

Hello,
 
 
I have come to an interesting subject...
 
I have thought for a long time about what exactly intelligence is...
 
And here is my explanation of what intelligence is...
 
 
When you see and say in mathematics: A < B and B < C => A < C
 
I think problem solving in intelligence is in great part due to and
influenced by the "speed" of the brain, because a fast brain will
breadth-first search and depth-first search the above mathematical
implication faster: the brain will see A < B faster and B < C faster
and will see their implication A < C faster, and that is the big part
of problem solving in intelligence. Because the act of seeing the
"bigger" or the "smaller" is something easy for humans, I think that
humans compose problem solving in intelligence in a divide-and-conquer
way, but the "big" factor that influences problem solving in
intelligence is the "speed" of the brain. This is my explanation of
what intelligence is.
 
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 06:42PM -0800

Hello,
 
You will say that I am a smart person, and this is why I have
understood parallel programming the way I have understood it...
 
Perhaps... but when I have taken IQ tests I have noticed that I am not
good at the "speed" of solving problems, but I am good at "logic". I
think Chris M. Thomasson and Dmitry Vyukov have faster brains than me,
so if you give them a problem to solve they will breadth-first search
the solution with their brains faster than me, and this is my weakness:
I don't have as fast a brain as Chris M. Thomasson or Dmitry Vyukov,
but I am good at "logic", and the IQ tests that I have taken have
proved so.

Now I want to explain to you something important. As you have seen me
on this forum, I have told you that I have a socialist way of thinking,
because I am a socialist, so my socialist way of thinking has pushed me
to favor one of my concepts that I call "intelligent easiness".
Intelligent easiness is the act of making knowledge, processes,
learning processes, and life as easy as possible while keeping at the
same time a high degree of quality. This is my socialist way of
thinking. But we have to be frank: when you apply my concept that I
call "intelligent easiness", this act will have the effect of sharing
more and more wealth between humans, but not all people like my concept
of "intelligent easiness", because its goal is to share wealth, so it's
a kind of socialist way of thinking. This is why parallel programming
seems hard to people even though parallel programming is an easy
learning process: what you will notice is that people on the internet
don't want you to understand parallel programming easily, so the
good-"quality" knowledge that eases the learning process for you is not
there. This is the principal obstacle to learning parallel programming:
people don't want to share wealth with you by applying my concept of
"intelligent easiness" that also raises the quality, because people
today are more capitalistic. This is why the learning process of
parallel programming has taken me years even though parallel
programming is not so difficult.
 
 
 
Thank you for your time.
 
 
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 06:00PM -0800

Hello...
 
 
My dear programmers and hardware engineers, as you have seen me here on
this forum, I have shown you how to reason about parallel programming
and synchronization mechanisms. Parallel programming is not hard, I
think, since I have worked on it for some time now: it is easy to know
what a race condition is; it is simple to know what sequential
consistency is, and how to make a correct, sequentially consistent
program on weak and strong memory models and such is also not so
difficult; it is easy, I think, to know what a lock convoy or priority
inversion is; an important thing to know in parallel programming is how
to do scalability prediction from the code, by using CPU cycle
calculations for the serial part and the parallel part, and that's not
so difficult either; and knowing how to optimize code by minimizing
cache-coherence traffic is also not so difficult. So I don't think
programmers have to fear parallel programming, because the very
important part is to put your hands on quality information and quality
explanation that eases the learning process of parallel programming for
you; that's the good way to succeed in parallel programming, I think,
and this is why you have seen me doing it here on this forum. What I
have also tried to do is make parallel programming easy for you.
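
To make the scalability-prediction idea concrete, here is a minimal
sketch in Free Pascal / Delphi style (not taken from any of the
programs discussed in these posts): it plugs estimated CPU-cycle counts
for the serial part and the parallel part into Amdahl's law. The cycle
counts are invented illustration values, not measurements.

program ScalabilityPrediction;
{$mode objfpc}
{ Hedged sketch: predict speedup from estimated CPU cycles of the serial
  part and the parallel part using Amdahl's law. The cycle counts below
  are invented for illustration, not measured from any real program. }
const
  SerialCycles   = 200.0;   // cycles that cannot be parallelized (e.g. the locked part)
  ParallelCycles = 1800.0;  // cycles that scale with the number of cores
  Cores: array[1..6] of Integer = (1, 2, 4, 8, 16, 32);

function PredictedSpeedup(NbrCores: Integer): Double;
begin
  // Amdahl's law: T(n) = serial + parallel/n, speedup = T(1) / T(n)
  Result := (SerialCycles + ParallelCycles) /
            (SerialCycles + ParallelCycles / NbrCores);
end;

var
  i: Integer;
begin
  for i := Low(Cores) to High(Cores) do
    WriteLn(Cores[i]:3, ' cores -> predicted speedup ',
            PredictedSpeedup(Cores[i]):0:2);
end.

With these example numbers the serial part is 10% of the total cycles,
so the predicted speedup levels off near 10x no matter how many cores
you add; that is the kind of prediction meant above.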
 
 
 
Thank you for your time,
 
 
 
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 05:26PM -0800

Hello,
 
 
This time, since I have mastered the reader-writer algorithms to a
certain level, I will think as an architect of parallel programming to
show you something important, here it is:
 
As you have seen me saying in my previous post, with the client-server
mechanism that I have given you before we can eliminate the lock
synchronization mechanism from the writer side, and we can make the
writer side very cheap in terms of running time compared to the reader
section, so there is a very important question that we can then ask:
will transactional memory, hardware or software, buy you much when
designing a client-server parallel in-memory database, or a
client-server parallel AVL tree, or a client-server parallel red-black
tree? I think it will not buy you much in terms of running time,
because with the client-server technique that I have given you above,
which eliminates the lock synchronization from the writer side of the
reader-writer algorithm, the writer side becomes very cheap in terms of
running time compared to the reader section, which will be expensive in
this scenario. As I have said before: what I like about my scalable
distributed sequential lock is that if, for example, you want to
implement a really fast parallel in-memory database, or a parallel
red-black tree, or a parallel AVL tree, the reader section will be much
more expensive than the writer section, because on the writer side you
first write the data outside the writer section and inside the writer
section you just copy your pointer to the data structure, but in the
reader section you have to read the data from memory from inside the
reader section, and that makes the reader section more expensive. So by
Amdahl's law this will make the parallel in-memory database that uses
my scalable distributed sequential lock scale very well (on NUMA etc.),
or it will make the parallel red-black tree or parallel AVL tree that
uses my scalable distributed sequential lock scale very well (on NUMA
etc.).
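
To illustrate the "write the data outside the writer section, copy only
a pointer inside it" idea, here is a minimal hedged sketch in Free
Pascal / Delphi style. The TCriticalSection here is only a stand-in for
the writer side of whatever lock is used (in these posts it would be
the writer side of the scalable distributed sequential lock), and the
TSnapshot fields are invented for illustration.

program PublishPointer;
{$mode objfpc}
uses
  SyncObjs;
type
  PSnapshot = ^TSnapshot;
  TSnapshot = record
    Root: Pointer;     // illustrative fields: whatever the database / tree root holds
    Version: Int64;
  end;
var
  Current: PSnapshot;            // readers dereference this inside the reader section
  WriterSide: TCriticalSection;  // stand-in for the writer side of the lock
  Snap: PSnapshot;

procedure PublishSnapshot(NewSnap: PSnapshot);
begin
  { all the expensive work (building NewSnap^) is done OUTSIDE the writer
    section; inside it we only copy one pointer, so the writer side stays
    very cheap compared to the reader section }
  WriterSide.Acquire;
  try
    Current := NewSnap;
  finally
    WriterSide.Release;
  end;
end;

begin
  WriterSide := TCriticalSection.Create;
  New(Snap);                 // build the new version of the data outside the lock
  Snap^.Root := nil;
  Snap^.Version := 1;
  PublishSnapshot(Snap);     // the writer section itself is just a pointer copy
  if Current <> nil then
    WriteLn('published version ', Current^.Version);
  WriterSide.Free;
  Dispose(Snap);
end.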
 
 
I hope you have understood my way of thinking like an architect of
parallel programming...
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 04:38PM -0800

Hello,
 
 
What I like about my scalable distributed sequential lock is that if,
for example, you want to implement a really fast parallel in-memory
database, or a parallel red-black tree, or a parallel AVL tree, the
reader section will be much more expensive than the writer section,
because on the writer side you first write the data outside the writer
section and inside the writer section you just copy your pointer to the
data structure, but in the reader section you have to read the data
from memory from inside the reader section, and that makes the reader
section more expensive. So by Amdahl's law this will make the parallel
in-memory database that uses my scalable distributed sequential lock
scale very well (on NUMA etc.), or it will make the parallel red-black
tree or parallel AVL tree that uses my scalable distributed sequential
lock scale very well (on NUMA etc.).
 
 
That's very important to know.
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 05:10PM -0800

Hello,
 
 
But you have to use this optimization:
 
 
If you want to implement an in-memory database with my scalable
distributed sequential lock and you want to optimize the writer side
more, I mean if you want to eliminate the expensive full memory barrier
of the lock on the writer side, what can you do about it? You can use,
for example, a client-server mechanism: the data is put by the producer
threads on a queue, and a single consumer thread alone executes the
writer section, and this way you can delete the lock from the writer
side, which will make the writer section much, much faster.
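
Here is a minimal hedged sketch of that client-server idea in Free
Pascal / Delphi style: producer threads post updates to a queue and a
single consumer thread alone drains it and runs the writer section, so
the writer section needs no lock against other writer threads. The
TUpdate payload and the "apply the update" step are invented
placeholders, and the queue is a plain TThreadList plus an event rather
than any particular lock-free queue.

program ClientServerWriter;
{$mode objfpc}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils, SyncObjs;

type
  PUpdate = ^TUpdate;
  TUpdate = record
    Key, Value: Integer;   // illustrative payload of one pending write
  end;

  // the single consumer thread: it alone runs the writer section,
  // so the writer section needs no lock against other writer threads
  TWriterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Queue: TThreadList;   // producer threads push updates here
  WorkReady: TEvent;    // signalled whenever the queue becomes non-empty

// called from any producer thread: no writer lock is taken here
procedure PostUpdate(Key, Value: Integer);
var
  U: PUpdate;
begin
  New(U);
  U^.Key := Key;
  U^.Value := Value;
  Queue.Add(U);
  WorkReady.SetEvent;
end;

procedure TWriterThread.Execute;
var
  Pending: TList;
  i: Integer;
  U: PUpdate;
begin
  while not Terminated do
  begin
    WorkReady.WaitFor(100);        // wake up when producers post work
    Pending := Queue.LockList;
    try
      for i := 0 to Pending.Count - 1 do
      begin
        U := PUpdate(Pending[i]);
        { writer section: apply U^ to the in-memory database / tree here }
        Dispose(U);
      end;
      Pending.Clear;
    finally
      Queue.UnlockList;
    end;
  end;
end;

var
  Writer: TWriterThread;
begin
  Queue := TThreadList.Create;
  WorkReady := TEvent.Create(nil, False, False, '');
  Writer := TWriterThread.Create(False);   // start the single consumer
  PostUpdate(1, 42);                       // producers just post and return
  Sleep(200);                              // give the consumer time to drain the queue
  Writer.Terminate;
  Writer.WaitFor;
  Writer.Free;
  WorkReady.Free;
  Queue.Free;
end.

With this scheme only the queue itself needs cross-thread
synchronization (here the TThreadList's internal lock); the writer
section proper is executed by one thread only, which is exactly the
optimization described above.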
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 04:23PM -0800

Hello,
 
 
If you want to implement an in-memory database with my scalable
distributed sequential lock and you want to optimize the writer side
more, I mean if you want to eliminate the expensive full memory barrier
of the lock on the writer side, what can you do about it? You can use,
for example, a client-server mechanism: the data is put by the producer
threads on a queue, and a single consumer thread alone executes the
writer section, and this way you can delete the lock from the writer
side, which will make the writer section much, much faster.
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 03:49PM -0800

Hello,
 
 
Hello again my dear programmers and hardware engineers...
 
You have seen me talking about the reader-writer algorithms. I have
played with the Dmitry Vyukov distributed reader-writer mutex, and I
have implemented SIDT (Store Interrupt Descriptor Table Register),
which returns the IDT base, to use it as a replacement for the Windows
GetCurrentProcessorNumber() used by the distributed reader-writer
algorithm, but I have noticed that SIDT doesn't work correctly, because
from time to time it doesn't return any IDT base, so I have decided to
stay with the Windows GetCurrentProcessorNumber() even if it is
somewhat expensive. So now my question is: where can my scalable
distributed sequential lock be used, and are there any use cases for my
new algorithm? If you are using a NUMA architecture, this will
parallelize the data bus to and from memory, and this will be very good
for databases, so this will be a good use case for my scalable
distributed sequential lock if you want, for example, to implement an
in-memory database that is really fast and parallelized. But even if
you don't have NUMA, you can still benefit from my scalable distributed
sequential lock, because it doesn't use any full memory barrier as the
PThread reader-writer lock does and as the Dmitry Vyukov distributed
reader-writer mutex does when using an RWLock. So if, for example, you
have taken a look at my previous post, you will notice that on small to
medium reader sections in read-mostly scenarios my scalable distributed
sequential lock will have the same speed and throughput as Seqlock, but
it will have much higher throughput than the Dmitry Vyukov distributed
reader-writer mutex using an RWLock, and much, much higher throughput
than the PThread reader-writer mutex. Other than that, my new algorithm
of a scalable distributed sequential lock beats Seqlock on some
characteristics: it doesn't starve and it doesn't livelock in scenarios
with a greater percentage of writers, whereas Seqlock can livelock or
starve. So I hope you have understood the improvement that my new
algorithm brings over Seqlock, and I think my new algorithm of a
scalable distributed sequential lock is competitive with RCU in
read-mostly scenarios.
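
For readers who have not seen one, here is a minimal hedged sketch, in
Free Pascal / Delphi style and for Windows only, of the distributed
reader-writer mutex idea mentioned here: one reader-writer lock per
slot, readers take only the slot chosen from
GetCurrentProcessorNumber(), and the writer takes every slot. The slot
count, padding and helper names are invented for illustration; this is
neither Dmitry Vyukov's actual code nor the scalable distributed
sequential lock itself.

program DistributedRWMutexSketch;
{$mode delphi}
uses
  SysUtils, Windows;

const
  NbrSlots = 16;   // ideally at least the number of cores

type
  TSlot = record
    Lock: TMultiReadExclusiveWriteSynchronizer;
    Pad: array[0..63] of Byte;   // crude padding so slots sit on different cache lines
  end;

var
  Slots: array[0..NbrSlots - 1] of TSlot;

// kernel32 export (Windows Vista and later), declared here explicitly
function CurrentProcessorNumber: DWORD; stdcall;
  external 'kernel32.dll' name 'GetCurrentProcessorNumber';

function ReaderSlot: Integer;
begin
  // each reader uses the slot of the core it is currently running on
  Result := CurrentProcessorNumber mod NbrSlots;
end;

procedure ReadLock(Slot: Integer);
begin
  Slots[Slot].Lock.BeginRead;    // readers touch only their own slot
end;

procedure ReadUnlock(Slot: Integer);
begin
  Slots[Slot].Lock.EndRead;
end;

procedure WriteLock;
var
  i: Integer;
begin
  for i := 0 to NbrSlots - 1 do  // the writer must acquire every slot
    Slots[i].Lock.BeginWrite;
end;

procedure WriteUnlock;
var
  i: Integer;
begin
  for i := NbrSlots - 1 downto 0 do
    Slots[i].Lock.EndWrite;
end;

var
  i, MySlot: Integer;
begin
  for i := 0 to NbrSlots - 1 do
    Slots[i].Lock := TMultiReadExclusiveWriteSynchronizer.Create;

  MySlot := ReaderSlot;   // remember the slot so the same one is released,
  ReadLock(MySlot);       // even if the thread migrates to another core meanwhile
  { reader section: read the shared data structure here }
  ReadUnlock(MySlot);

  WriteLock;
  { writer section: modify the shared data structure here }
  WriteUnlock;

  for i := 0 to NbrSlots - 1 do
    Slots[i].Lock.Free;
end.

The benefit is that readers on different cores never touch the same
lock word, at the price of a writer that has to acquire every slot;
that trade-off is exactly what the throughput comparison in the
following posts is about.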
 
Finally my last word will be in the form of advice:
 
Please make sure you understand the proof of correctness that I have
presented to you, and make sure to understand my previous posts very
well, because in this era of multicore systems you must be able to
understand parallel programming and synchronization algorithms; this is
my advice.
 
 
You can download my scalable distributed sequential lock version 1.11 from:
 
https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 01:32PM -0800

Hello,
 
http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex
 
 
I have just done a calculation on small reader sections using Amdahl's
law (like when you are parallelizing red-black trees or AVL trees) for
the distributed reader-writer mutex above using an rwlock and the
GetCurrentProcessorNumber() of Windows, and I have compared it to
Seqlock: at 4 cores the Seqlock has 15 times the throughput of the
distributed reader-writer mutex above, at 8 cores the Seqlock has 4
times the throughput of the distributed reader-writer mutex above, at
16 cores the Seqlock has 2 times the throughput of the distributed
reader-writer mutex above, and at 32 cores the Seqlock has the same
throughput as the distributed reader-writer mutex above.
 
 
So Seqlock and my scalable distributed reader-writer lock are much
faster on small reader sections than the distributed reader-writer
mutex above.
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 01:45PM -0800

Hello...
 
 
This calculation was done on read-mostly scenarios.
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 11:57AM -0800

Hello,
 
I have added more to the sequential consistency proof and corrected
some typos, please read again...
 
 
Here is my proof:
 
Look at the source code: when Fcount5^.fcount5 equals 0 on the writer
side, the reader side will run in Seqlock mode. I don't need to prove
the Seqlock part, because my Seqlock algorithm is correct; just compare
it to the other Seqlock algorithms and you will notice it. What I need
to prove is the case when Fcount6^.fcount6 modulo "the number of cores"
is equal to 0, which means that we are in distributed mode. When we are
in distributed mode, the reader side of my algorithm will either enter
the distributed reader-writer lock or wait for the distributed
reader-writer lock to be released. If the reader side enters the
distributed reader-writer lock and Fcount6^.fcount6 has not changed on
the RUnlock(), the reader thread will exit with a "true" value; if
Fcount6^.fcount6 has changed, the RUnlock() method will catch it and
will roll back with the Seqlock mechanism. Now, if Fcount6^.fcount6
modulo "the number of cores" is equal to 0 on RLock() and the reader
side has to wait for the writer side that has entered the distributed
reader-writer lock, the reader side will after that enter the
distributed reader-writer lock, and if Fcount6^.fcount6 modulo "the
number of cores" has changed on RUnlock(), the reader side will roll
back in Seqlock mode; if not, RUnlock() will exit with the value
"true". So all in all my algorithm is correct.
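
Since this part of the proof leans on the Seqlock-style "rollback"
(re-read and retry when the counter has changed), here is a minimal,
generic seqlock sketch in Free Pascal / Delphi style. It is not my
source code (the real algorithm uses Fcount5/Fcount6 and a distributed
reader-writer lock), and it assumes writers are already serialized
among themselves, for example by the single consumer thread of the
client-server scheme described earlier.

program SeqlockSketch;
{$mode objfpc}

var
  SeqCount: LongInt = 0;        // even = no writer in progress, odd = writer in progress
  SharedA, SharedB: LongInt;    // the protected data (illustrative)

procedure SeqWrite(A, B: LongInt);
begin
  // assumes a single writer (or writers serialized elsewhere)
  InterLockedIncrement(SeqCount);   // make the count odd; locked add is a full barrier on x86
  SharedA := A;
  SharedB := B;
  InterLockedIncrement(SeqCount);   // make the count even again
end;

procedure SeqRead(out A, B: LongInt);
var
  S1, S2: LongInt;
begin
  repeat
    S1 := SeqCount;   // in production code these reads need volatile/atomic semantics
    A  := SharedA;
    B  := SharedB;
    S2 := SeqCount;
  until (S1 = S2) and (not Odd(S1));   // roll back and retry if a writer interfered
end;

var
  X, Y: LongInt;
begin
  SeqWrite(10, 20);
  SeqRead(X, Y);
  WriteLn('Read back: ', X, ' ', Y);
end.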
 
About the sequential consistency of my scalable distributed sequential
lock: my algorithm works on the x86 architecture, and I think my
algorithm is correct. Look at the source code of the WLock() method:
since I am using a ticket spinlock with a proportional backoff on the
writer side, and the ticket spinlock uses a "lock add" assembler
instruction to increment a counter in the Enter() method of the ticket
spinlock, this "lock add" assembler instruction is a barrier for stores
and loads on x86, and the stores of Fcount6^.fcount6 and
FCount5^.fcount5 are not reordered with the stores of the writer
section, so the WLock() method is sequentially consistent and correct.
Now look at WUnlock(): we don't need an "sfence", because stores in
WUnlock() are not reordered with older stores on x86, so the WUnlock()
method is sequentially consistent and correct. Now look at the RLock()
method: the loads inside the RLock() method are not reordered with the
loads of the reader section, and on RUnlock(), the loads of RUnlock()
are not reordered with older loads of the critical section. So all in
all I think my algorithm is sequentially consistent and correct on x86.
So be confident, because I have reasoned correctly, and I think my
algorithm is correct and is a powerful synchronization mechanism that
can replace RCU and that can replace Seqlock, because it beats Seqlock.
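
For readers who have not seen one, here is a minimal hedged sketch of a
ticket spinlock with proportional backoff in Free Pascal / Delphi
style, only to illustrate why the writer-side entry acts as a full
barrier on x86: the ticket is taken with a locked add
(InterLockedIncrement compiles to a "lock xadd"), which orders both
loads and stores. This is not the Enter() method of my actual spinlock,
and in production code the NowServing field would need volatile/atomic
read semantics so the compiler cannot cache it or drop the spin loop.

program TicketSpinlockSketch;
{$mode objfpc}

type
  TTicketSpinlock = record
    NextTicket: LongInt;   // next ticket to hand out
    NowServing: LongInt;   // ticket that currently owns the lock
  end;

var
  Lock: TTicketSpinlock;

procedure Enter(var L: TTicketSpinlock);
var
  MyTicket, Ahead, i: LongInt;
begin
  // take a ticket; InterLockedIncrement is a locked add, i.e. a full
  // barrier for loads and stores on x86
  MyTicket := InterLockedIncrement(L.NextTicket) - 1;
  while L.NowServing <> MyTicket do
  begin
    // proportional backoff: the more tickets ahead of us, the longer we pause
    Ahead := MyTicket - L.NowServing;
    for i := 1 to Ahead * 32 do
      begin end;   { busy-wait; a real implementation would execute a pause instruction }
  end;
end;

procedure Leave(var L: TTicketSpinlock);
begin
  // plain store: on x86 stores are not reordered with older stores,
  // so no sfence is needed here
  Inc(L.NowServing);
end;

begin
  Enter(Lock);
  { writer section would go here }
  Leave(Lock);
  WriteLn('ok');
end.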
 
 
So this was my proof that my algorithm is correct.
 
You can download my scalable distributed sequential lock version 1.11 from:
 
https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock
 
 
Read this about my scalable distributed sequential lock:
 
 
Disclaimer:
 
My software is provided on an "as-is" basis, with no warranties,
express or implied. The entire risk and liability of using it is yours.
Any damages resulting from the use or misuse of this software will be
the responsibility of the user.
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 12:10PM -0800

On 12/7/2014 11:57 AM, Ramine wrote:
> method are not reordered with the loads of the reader section , and on
> RUnlock(), the loads of RUnlock() are not reordered with older loads of
> the critical section , so all in all my algorithm i think my algorithm
 
 
I mean here "reader section", not "critical section".
 
Ramine <ramine@1.1>: Dec 07 11:19AM -0800

Hello,
 
I wrote this in my previous post:
 
"Now i think my algorithm is correct."
 
You will say: "Amine in computer science we need proof, so
can you prove your algorithm is correct ?"
 
 
Here is my proof:
 
Look at the source code: when Fcount5^.fcount5 equals 0 on the writer
side, the reader side will run in Seqlock mode. I don't need to prove
the Seqlock part, because my Seqlock algorithm is correct; just compare
it to the other Seqlock algorithms and you will notice it. What I need
to prove is the case when Fcount6^.fcount6 modulo "the number of cores"
is equal to 0, which means that we are in distributed mode. When we are
in distributed mode, the reader side of my algorithm will either enter
the distributed reader-writer lock or wait for the distributed
reader-writer lock to be released. If the reader side enters the
distributed reader-writer lock and Fcount6^.fcount6 has not changed on
the RUnlock(), the reader thread will exit with a "true" value; if
Fcount6^.fcount6 has changed, the RUnlock() method will catch it and
will roll back with the Seqlock mechanism. Now, if Fcount6^.fcount6
modulo "the number of cores" is equal to 0 on RLock() and the reader
side has to wait for the writer side that has entered the distributed
reader-writer lock, the reader side will after that enter the
distributed reader-writer lock, and if Fcount6^.fcount6 modulo "the
number of cores" has hanged on RUnlock(), the reader side will roll
back in Seqlock mode; if not, RUnlock() will exit with the value
"true". So all in all my algorithm is correct.
 
About the sequential consistency of my scalable distributed sequential
lock: my algorithm works on the x86 architecture, and I think my
algorithm is correct. Look at the source code of the WLock() method:
since I am using a ticket spinlock with a proportional backoff on the
writer side, and the ticket spinlock uses a "lock add" assembler
instruction to increment a counter in the Enter() method of the ticket
spinlock, this "lock add" assembler instruction is a barrier for stores
and loads on x86, so the WLock() method is sequentially consistent and
correct. Now look at WUnlock(): we don't need an "sfence", because
stores are not reordered with stores on x86, so the WUnlock() method is
sequentially consistent and correct. Now look at the RLock() method:
the loads inside the RLock() method are not reordered with the loads of
the reader section, and on RUnlock(), the loads of RUnlock() are not
reordered with older loads of the critical section. So all in all I
think my algorithm is sequentially consistent and correct on x86. So be
confident, because I have reasoned correctly, and I think my algorithm
is correct and is a powerful synchronization mechanism that can replace
RCU and that can replace Seqlock, because it beats Seqlock.
 
 
So this was my proof that my algorithm is correct.
 
You can download my scalable distributed sequential lock version 1.11 from:
 
https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 11:29AM -0800

Hello,
 
 
Read this about my scalable distributed sequential lock:
 
 
Disclaimer:
 
My software is provided on an "as-is" basis, with no warranties,
express or implied. The entire risk and liability of using it is yours.
Any damages resulting from the use or misuse of this software will be
the responsibility of the user.
 
 
 
Thank you.
 
Amine Moulay Ramdane.
 
Ramine <ramine@1.1>: Dec 07 11:36AM -0800

On 12/7/2014 11:19 AM, Ramine wrote:
> waits for the writer side to enter the distributed reader-writer lock,
> the reader side after that will enter the distributed reader-writer lock
> and if Fcount6^.fcount6 modulo "the number of cores" has hanged
 
A typo: I mean "has changed", not "has hanged".
 
 
Ramine <ramine@1.1>: Dec 07 10:46AM -0800

Hello,
 
 
I have updated my scalable distributed sequential lock to version 1.11;
I have just corrected a bug, because before, my algorithm was doing
this on the writer side:
 
In the WLock() it was doing this:
 
If Fcount5^.fcount5=0 then fcount4^.fcount4:=Fcount4^.fcount4+1;
 
And in the WUnlock() it was doing this:
 
If Fcount5^.fcount5=0 then fcount4^.fcount4:=Fcount4^.fcount4+1;
 
 
This introduced a bug: if we are just before the point where
"FCount6^.fcount6 mod nbrcores" becomes equal to 0, the RLock() can
enter the reader section without calling dw.RLock() while the writer
side enters distributed mode and enters the writer section, so both
will succeed in entering, and RUnlock() will return true, and this is a
bug. To correct this bug we must delete the "If Fcount5^.fcount5=0"
condition in front of "fcount4^.fcount4:=Fcount4^.fcount4+1", like this:
 
In the WLock() it now does this:
 
fcount4^.fcount4:=Fcount4^.fcount4+1;
 
 
And in the WUnlock() it now does this:
 
fcount4^.fcount4:=Fcount4^.fcount4+1;
 
 
Now i think my algorithm is correct.
 
 
You can download the updated version 1.11 from:
 
https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock
 
 
 
Thank you,
Amine Moulay Ramdane.
