- cmsg cancel <mefg7j$mn4$2@dont-email.me> - 12 Updates
- About NUMA and we are safe ! - 3 Updates
- Concurrent SkipLists... - 1 Update
- About the distributed reader-writer mutex - 2 Updates
bleachbot <bleachbot@httrack.com>: Mar 19 10:49PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 19 10:54PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 19 11:06PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 19 11:08PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 19 11:20PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 20 12:13AM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 12 06:53PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 12 11:05PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 12 06:55PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 12 10:38PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 12 06:44PM +0100 |
bleachbot <bleachbot@httrack.com>: Mar 12 06:19PM +0100 |
Ramine <ramine@1.1>: Mar 19 06:02PM -0700

Hello,

I have thought more about concurrent data structures, and I think they will scale well on NUMA architectures. With concurrent AVL trees, concurrent Red-Black trees and concurrent SkipLists, accesses to the data structure's nodes, which are allocated on different NUMA nodes, will be effectively random, and I expect this to give a good result on a NUMA architecture.

What is my proof? Imagine that you have 32 cores and one NUMA node for every 4 cores, which means 8 NUMA nodes in total, and that you allocate the data structure's nodes across the different NUMA nodes. When 32 threads on each of the 32 cores access the concurrent data structures above, they do so in a probabilistic way: each access goes to a given NUMA node with probability 1/8 (1 over 8 NUMA nodes). So on average I think about 4 threads will contend for each NUMA node, and by Amdahl's law this will scale on average to 8X on 8 NUMA nodes. That's really good!

So we are safe!

Thank you,
Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Mar 19 06:00PM -0700

On 3/19/2015 6:02 PM, Ramine wrote:
> you will have a contention for a different NUMA node for every 4
> threads , so from the Amdahl's law this will scale on average to 8X
> on 8 NUMA nodes, that's really good ! So we are safe !

My reasoning also holds for more NUMA nodes, which means it will keep scaling as NUMA nodes are added. |
Ramine <ramine@1.1>: Mar 19 07:19PM -0700

On 3/19/2015 6:02 PM, Ramine wrote:
> imagine that you have 32 cores and one NUMA node for each 4 cores,
> that means 8 NUMA nodes in total, so you will allocate your
> nodes in different NUMA nodes, so when 32 threads on each of the 32

I mean 32 threads in total, with one thread running on each of the 32 cores. |
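The probabilistic argument in this thread can be checked numerically. The following is only a minimal sketch, assuming every thread picks one of the NUMA nodes uniformly at random and independently of the others; the function names are hypothetical and nothing here is measured on real hardware.

```python
import random


def expected_distinct_nodes(threads: int, numa_nodes: int) -> float:
    """Expected number of distinct NUMA nodes touched at one instant,
    assuming each thread picks a node uniformly at random."""
    miss = (1.0 - 1.0 / numa_nodes) ** threads  # chance a given node is idle
    return numa_nodes * (1.0 - miss)


def simulate_max_load(threads: int, numa_nodes: int,
                      trials: int = 10_000, seed: int = 0) -> float:
    """Average thread count on the busiest node over many random trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        load = [0] * numa_nodes
        for _ in range(threads):
            load[rng.randrange(numa_nodes)] += 1
        total += max(load)
    return total / trials


if __name__ == "__main__":
    threads, nodes = 32, 8  # the configuration used in the post
    print("mean threads per node:", threads / nodes)
    print("expected busy nodes  :", expected_distinct_nodes(threads, nodes))
    print("avg busiest-node load:", simulate_max_load(threads, nodes))
```

Under this assumption the mean load is 32 / 8 = 4 threads per node, as stated in the post. The simulation also reports the load on the busiest node, which can never be below that mean, so the figure of 4 is an average rather than a bound.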
Ramine <ramine@1.1>: Mar 19 06:27PM -0700

Hello,

I have finished porting a beautiful skiplist algorithm to FreePascal and Delphi, and I am turning it into a concurrent SkipList using the distributed reader-writer mutex that I have told you about before. From my benchmarks and some calculations with Amdahl's law, I have noticed that the concurrent SkipList I am implementing will scale to 100X in read-mostly scenarios on a NUMA architecture when it is used in a client-server manner using threads. That's good.

Thank you,
Amine Moulay Ramdane. |
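For readers who want to redo this kind of calculation, here is a small sketch of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of workers. The post does not state the parallel fraction or the core count behind the 100X figure, so the 128 workers used below are purely a hypothetical value for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup with `workers` workers when a fraction
    `parallel_fraction` of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def required_parallel_fraction(target_speedup: float, workers: int) -> float:
    """Invert Amdahl's law: the parallel fraction p needed to reach
    `target_speedup` on `workers` workers, p = (1 - 1/S) / (1 - 1/n)."""
    if workers < 2 or target_speedup > workers:
        raise ValueError("target speedup cannot exceed the worker count")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    workers = 128  # hypothetical core count, not taken from the post
    p = required_parallel_fraction(100.0, workers)
    print(f"parallel fraction needed for 100X on {workers} workers: {p:.6f}")
    print(f"check: {amdahl_speedup(p, workers):.2f}X")
```

The inverse function shows how little serial work a read-mostly workload can afford at this scale: a speedup of 100X needs more than 99% of the work to run in parallel, whatever the number of workers.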
Ramine <ramine@1.1>: Mar 19 06:13PM -0700

Hello,

I hope you have read and understood my previous post titled "About NUMA and we are safe !". What I want to say in this post is that I have done some scalability prediction for the following distributed reader-writer mutex:

https://sites.google.com/site/aminer68/scalable-distributed-reader-writer-mutex

As you will notice, I am using an atomic "lock add" assembler instruction that is executed only by the threads that belong to the same core, which makes it less expensive. I have benchmarked it and noticed that it takes 20 CPU cycles on x86, so it is not so expensive.

I have also done a scalability prediction using this distributed reader-writer mutex with a concurrent AVL tree and a concurrent Red-Black tree, and it gives 50X scalability on a NUMA architecture when used in a client-server way. That is because the "lock add" assembler instruction, executed only by the threads that belong to the same core, takes only 20 CPU cycles on x86.

Thank you,
Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Mar 19 06:15PM -0700

On 3/19/2015 6:13 PM, Ramine wrote:
> so expensive, and i have done a scalability prediction using
> this distributed reader-writer mutex with a concurrent AVL tree
> and a concurrent Red-Black tree, and it gives 50X scalability on NUMA

I mean it will scale to 50X in read-mostly scenarios. |
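To illustrate the general idea of a distributed reader-writer mutex, here is a rough sketch. It is not the implementation linked in the post: that one relies on a per-core atomic "lock add", whereas this sketch swaps in a plain mutex per slot, so two readers sharing a slot would serialize. The class name, the `slot` parameter and the `demo` helper are all hypothetical. Python cannot pin threads to cores, so the caller passes a slot index standing in for the core number.

```python
import threading


class DistributedRWLock:
    """Sketch: readers take only their own slot's lock, while a writer
    takes every slot's lock. Assumes roughly one reader thread per slot."""

    def __init__(self, slots: int) -> None:
        self._locks = [threading.Lock() for _ in range(slots)]

    def acquire_read(self, slot: int) -> None:
        self._locks[slot % len(self._locks)].acquire()

    def release_read(self, slot: int) -> None:
        self._locks[slot % len(self._locks)].release()

    def acquire_write(self) -> None:
        # Always lock in the same order so two writers cannot deadlock.
        for lock in self._locks:
            lock.acquire()

    def release_write(self) -> None:
        for lock in reversed(self._locks):
            lock.release()


def demo(threads: int = 4, rounds: int = 1000) -> int:
    """Read-mostly workload: one write for every ten operations."""
    lock = DistributedRWLock(slots=threads)
    shared = {"value": 0}

    def worker(slot: int) -> None:
        for i in range(rounds):
            if i % 10 == 0:
                lock.acquire_write()
                try:
                    shared["value"] += 1
                finally:
                    lock.release_write()
            else:
                lock.acquire_read(slot)
                try:
                    _ = shared["value"]  # read under the slot lock only
                finally:
                    lock.release_read(slot)

    pool = [threading.Thread(target=worker, args=(s,)) for s in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return shared["value"]
```

In this design a reader touches only the state belonging to its own slot, so readers on different slots do not contend with each other. The cost moves to the writer, which must visit every slot, and that is why the approach suits the read-mostly scenarios mentioned in the post.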
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |