comp.programming.threads@googlegroups.com | Google Groups |
Unsure why you received this message? You previously subscribed to digests from this group, but we haven't been sending them for a while. We fixed that, but if you don't want to get these messages, send an email to comp.programming.threads+unsubscribe@googlegroups.com. |
- About VISC architecture - 2 Updates
- About Amdahl law - 2 Updates
- More about scalability - 1 Update
- About scalability - 1 Update
Ramine <ramine@1.1>: Nov 15 04:45PM -0800 Hello, I have taken a look at the following link about the performance of the VISC architecture from Soft Machines... http://www.bit-tech.net/news/hardware/2014/10/24/soft-machines-visc/1 Notice that Soft Machines says: "The result, the team behind the technology claims, is a boost in instructions per cycle of 3-4 times compared to existing technologies resulting in a 2-4 times boost in performance per watt on both single- and multi-threaded applications." Am I misunderstanding something here? From what I know about parallel programming, scalability also depends on the serial part of the Amdahl equation, so a "more" memory-bound application cannot scale to 2-4 times. How can they claim that their VISC architecture boosts performance by 2-4 times in general? Is it a marketing move? Thank you, Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Nov 15 05:04PM -0800 Hello, The following link comes to the same conclusion as me: in a "more" memory-bound parallel application, the serial part of Amdahl's law grows because multicore machines serialize access to the memory bus, so the application will not scale, and you can even see retrograde throughput because of contention on the memory bus. That is the way it is on multicore machines, and I think it is no different with the VISC architecture from Soft Machines. Read here please: https://share.sandia.gov/news/resources/news_releases/more-chip-cores-can-mean-slower-supercomputing-sandia-simulation-shows/#.VGf2o4odhK8 Thank you, Amine Moulay Ramdane. |
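The objection above can be made concrete with the Amdahl equation, speedup(N) = 1 / (s + (1 - s) / N), where s is the serial fraction of the work and N the number of parallel units. A minimal sketch in Python; the serial fractions below are illustrative assumptions, not measurements of VISC or any real workload:

```python
# Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N)
# s = serial fraction of the work, N = number of parallel units.
# The serial fractions tried here are illustrative assumptions only.

def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup predicted by Amdahl's law for n parallel units."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

for s in (0.05, 0.25, 0.50):  # hypothetical serial fractions
    print(f"s = {s:.2f}: speedup at N=4 is {amdahl_speedup(s, 4):.2f}x, "
          f"limit is {1.0 / s:.1f}x")
```

With a serial fraction of 0.50, as a strongly memory-bound workload might have, the speedup at N=4 is only 1.60x and the limit is 2x no matter how many cores or virtual cores are added, which is why a blanket "2-4 times" claim invites scrutiny.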
Ramine <ramine@1.1>: Nov 15 03:36PM -0800 Hello, I think the Amdahl law is very important, because it lets us better understand multicore programming and distributed programming. For example, when you are doing a parallel matrix multiplication, you have to move two doubles and you also have to multiply them. Multiplying them inside your computer takes around 8 clocks if we don't use SIMD instructions, and moving the two doubles from memory to the CPU takes around 1 clock, but that memory access must be serialized on the memory bus, so the parallel matrix multiplication will not scale much beyond 8X, and that is what the Amdahl law says. For the concurrent hashtable it is the same: if the data stored under the keys of the hashtable is bigger, then moving that data from memory to the CPU takes many more CPU clocks, which makes the serial part of the Amdahl equation bigger, so the concurrent hashtable scales less and less; that too is what the Amdahl equation says. Database systems such as Oracle don't scale well on multicore systems because they are memory bound; this is why you have to scale them by distributing your database across many computers, which makes the memory system and harddisk system truly parallel for read transactions, and that is much better, and again that is what the Amdahl equation says. So all in all, the Amdahl equation is a good tool that changes our perception of parallel programming and lets us better understand its inner workings. Thank you, Amine Moulay Ramdane. |
Ramine <ramine@1.1>: Nov 15 03:38PM -0800 On 11/15/2014 3:36 PM, Ramine wrote: > multiplication, you have to move two doubles and you also have to > multiply them, and multiplying them inside your computer will > take around 8 clocks if we don't use SIMD instructions, and the act to Around 8 clocks on x86 computers, I mean. |
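Plugging the rough figures from the posts above into the Amdahl equation gives a concrete ceiling. The numbers (around 8 clocks of parallelizable multiply work per roughly 1 clock of serialized memory traffic on x86) are the posts' own estimates, not benchmarks:

```python
# Figures from the post: ~8 clocks to multiply two doubles
# (parallelizable) per ~1 clock to move them over the shared memory
# bus (serialized). Rough x86 estimates, not measured benchmarks.

def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup predicted by Amdahl's law for n parallel units."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

s = 1.0 / (8.0 + 1.0)   # serial fraction: 1 serialized clock out of 9
for n in (4, 16, 64, 1_000_000):
    print(f"N = {n}: {amdahl_speedup(s, n):.2f}x")
```

The asymptotic limit is 1/s = 9x, in line with the "around 8X" ceiling the post mentions, and at a realistic core count like N=16 the predicted speedup is only about 6x.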
Ramine <ramine@1.1>: Nov 15 02:35PM -0800 Hello, If you ask me the following question: Amine, you have counted the time to move the data stored under the corresponding key of the concurrent hashtable from memory to the CPU as part of the serial fraction of the Amdahl equation when calculating the scalability of the concurrent hashtable; is that correct? Answer: When PhD papers test their scalable concurrent hashtables or scalable Red-Black trees, they test them by moving only pointers to the data under the corresponding keys; they do not count moving the data itself from memory to the CPU, and this is why their concurrent hashtables and concurrent Red-Black trees appear to scale better. But in real-world applications, moving the data under the keys of the concurrent hashtable or the concurrent Red-Black tree from memory to the CPU always takes place, and the memory subsystem is still the bottleneck on today's multicore machines. I think it will remain the bottleneck for scalability far into the future if the data under each key grows bigger; this is why I have included this serial part in the Amdahl equation when calculating the scalability of the concurrent hashtable. Thank you, Amine Moulay Ramdane. |
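The effect described above, bigger per-key payloads inflating the serial fraction, can be sketched with the same Amdahl equation. The clock counts here are illustrative assumptions, not measurements of any real hashtable:

```python
# As the payload stored under each hashtable key grows, the serialized
# memory-transfer time grows with it, the serial fraction of the Amdahl
# equation rises, and the attainable speedup falls.
# All clock counts below are illustrative assumptions.

def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup predicted by Amdahl's law for n parallel units."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

compute_clocks = 200.0                       # parallelizable work per lookup (assumed)
for payload_clocks in (10.0, 100.0, 400.0):  # serialized memory transfer (assumed)
    s = payload_clocks / (payload_clocks + compute_clocks)
    print(f"payload = {payload_clocks:>5.0f} clocks: "
          f"speedup at N=16 is {amdahl_speedup(s, 16):.2f}x, "
          f"limit {1.0 / s:.1f}x")
```

A pointer-only benchmark corresponds to the small-payload row; once the payload transfer dominates, the limit collapses toward 1x regardless of how clever the locking is.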
Ramine <ramine@1.1>: Nov 15 01:56PM -0800 Hello, I have come to an interesting subject... This time I will speak about "scalability". As you have noticed, I have implemented a concurrent hashtable that uses lock striping, so it is more fine-grained than simply putting a single RWLock around the hashtable; here it is: https://sites.google.com/site/aminer68/parallel-hashlist But as you will notice, my concurrent hashtable uses an atomic increment inside the MREW (multiple-readers-exclusive-writer) lock that I use in each bucket of the hashtable. This atomic increment takes around 100 CPU clocks on the x86 architecture because it uses a "lock" assembler instruction, and since those 100 CPU clocks sit in the serial part of the Amdahl equation, I don't think it will scale by much. So I have to use my scalable RWLock in each bucket, but even then, this will give almost the same performance as using only one scalable RWLock around the hashtable...
But this is not the only problem. Imagine also that the data stored under each key is bigger: I think my concurrent hashtable will then be less and less scalable, because accesses to memory are serialized on x86 computers, and I think this is the big problem on multicore machines. I mean, if the serial part of the Amdahl equation grows because you are moving bigger data from the memory system to the CPU, your application will scale less and less. So I have come to the following question: As you have noticed, Intel has implemented its transactional memory and Soft Machines has shown us the performance of its VISC microprocessor, but how can we promise better scalability with those architectures when we know that the bottleneck for scalability is the memory system? As I have just explained, if the data under each key of my concurrent hashtable is bigger, my concurrent hashtable will be less and less scalable, and I think this is the same problem with databases, mathematical matrix calculations, and so on; they don't scale well on multicores either. Thank you, Amine Moulay Ramdane. |
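For readers unfamiliar with the technique discussed in the post above, here is a minimal lock-striping sketch in Python (the post's own implementation is not in Python, and the class and method names below are hypothetical, chosen only to illustrate the structure). Each lock guards one stripe of buckets, so operations on different stripes do not contend, though, as the post notes, every lock acquisition still performs an atomic read-modify-write that sits on the serial path:

```python
# Minimal lock-striping sketch: one lock per stripe of buckets,
# so writers hitting different stripes proceed in parallel.
# Names are illustrative, not taken from any real library.

import threading

class StripedHashTable:
    def __init__(self, num_stripes: int = 16):
        self.num_stripes = num_stripes
        self.locks = [threading.Lock() for _ in range(num_stripes)]
        self.buckets = [dict() for _ in range(num_stripes)]

    def _stripe(self, key) -> int:
        # Map the key to one stripe; hash() is stable within a process.
        return hash(key) % self.num_stripes

    def put(self, key, value) -> None:
        i = self._stripe(key)
        with self.locks[i]:          # only this stripe is locked
            self.buckets[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self.locks[i]:
            return self.buckets[i].get(key, default)

table = StripedHashTable()
table.put("a", 1)
table.put("b", 2)
print(table.get("a"), table.get("b"))  # -> 1 2
```

Note that striping reduces lock contention between threads, but it does nothing about the serialized memory-bus traffic for the payloads themselves, which is exactly the residual serial fraction the post is worried about.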
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it, send an email to comp.programming.threads+unsubscribe@googlegroups.com. |