Friday, March 20, 2015

Digest for comp.programming.threads@googlegroups.com - 18 updates in 4 topics

bleachbot <bleachbot@httrack.com>: Mar 19 10:49PM +0100

Ramine <ramine@1.1>: Mar 19 06:02PM -0700

Hello,
 
 
I have thought more about concurrent data structures, and
I think they will scale well on NUMA architectures, because with
concurrent AVL trees, concurrent red-black trees, and concurrent
skiplists the accesses to nodes allocated across different NUMA
nodes will be random, and I have thought about it and this will give
you a good result on NUMA architectures. What is my proof?
Imagine that you have 32 cores and one NUMA node for every 4 cores,
that means 8 NUMA nodes in total, and that you allocate your
nodes across the different NUMA nodes. So when 32 threads, one on
each of the 32 cores, access those concurrent data structures,
they will do it in a probabilistic way: each access lands on a given
NUMA node with probability 1/8 (1 over 8 NUMA nodes). So on average
I think you will have contention on a given NUMA node among only 4
threads, and from Amdahl's law this will scale on average to 8X
across the 8 NUMA nodes, and that's really good! So we are safe!
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Mar 19 06:00PM -0700

On 3/19/2015 6:02 PM, Ramine wrote:
> you will have a contention for a different NUMA node for every 4
> threads , so from the Amdahl's law this will scale on average to 8X
> on 8 NUMA nodes, that's really good ! So we are safe !
 
My reasoning holds for more NUMA nodes too, which means it will scale
on larger NUMA machines as well.
 
 
Ramine <ramine@1.1>: Mar 19 07:19PM -0700

On 3/19/2015 6:02 PM, Ramine wrote:
> imagine that you have 32 cores and one NUMA node for each 4 cores,
> that means 8 NUMA nodes in total, so you will allocate your
> nodes in different NUMA nodes, so when 32 threads on each of the 32
 
 
I mean 32 threads, with one thread on each of the 32 cores.
 
Ramine <ramine@1.1>: Mar 19 06:27PM -0700

Hello,
 
 
I have finished porting a beautiful skiplist algorithm to
FreePascal and Delphi... and I am now turning it into a concurrent
skiplist using the distributed reader-writer mutex
that I have talked to you about before. From my benchmarks
and from some calculations with Amdahl's law, I have
noticed that this concurrent skiplist
that I am implementing will scale to 100X on read-mostly
scenarios on a NUMA architecture when it is used
in a client-server manner using threads, and that's good.
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Mar 19 06:13PM -0700

Hello,
 
 
Hope you have read and understood my previous post titled
"About NUMA and we are safe!". What I want to say in this
post is that I have done a scalability prediction
for the following distributed reader-writer mutex:
 
https://sites.google.com/site/aminer68/scalable-distributed-reader-writer-mutex
 
As you will notice, I am using an atomic "lock add" assembler
instruction that is executed only by the threads that belong to the
same core, which makes it less expensive. I have benchmarked
it and noticed that it takes 20 CPU cycles on x86, so it is not
that expensive. And I have done a scalability prediction using
this distributed reader-writer mutex with a concurrent AVL tree
and a concurrent red-black tree, and it gives 50X scalability on a NUMA
architecture when used in a client-server way. That is because the "lock
add" assembler instruction, being executed only by the threads that
belong to the same core, takes only 20 CPU cycles on x86.
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Mar 19 06:15PM -0700

On 3/19/2015 6:13 PM, Ramine wrote:
> so expensive, and i have done a scalability prediction using
> this distributed reader-writer mutex with a concurrent AVL tree
> and a concurrent Red-Black tree, and it gives 50X scalability on NUMA
 
 
I mean it will scale to 50X on read-mostly scenarios.
 
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com.
