Sunday, November 16, 2014

Digest for comp.programming.threads@googlegroups.com - 6 updates in 4 topics

comp.programming.threads@googlegroups.com Google Groups
Unsure why you received this message? You previously subscribed to digests from this group, but we haven't been sending them for a while. We fixed that, but if you don't want to get these messages, send an email to comp.programming.threads+unsubscribe@googlegroups.com.
Ramine <ramine@1.1>: Nov 15 04:45PM -0800

Hello,
 
 
I have taken a look at the following link about the performance
of the VISC architecture from Soft Machines...
 
http://www.bit-tech.net/news/hardware/2014/10/24/soft-machines-visc/1
 
Notice what they report Soft Machines as claiming:
 
"The result, the team behind the technology claims, is a boost in
instructions per cycle of 3-4 times compared to existing technologies
resulting in a 2-4 times boost in performance per watt on both single-
and multi-threaded applications."
 
 
Am I misunderstanding something here? From what I know about parallel
programming, scalability also depends on the serial part of the Amdahl
equation, so a "more" memory-bound application cannot scale by 2-4
times. How, then, can they say that in general their VISC architecture
boosts performance by 2-4 times? Is it a marketing move?
 
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Nov 15 05:04PM -0800

Hello,
 
 
Read the following link; they have come to the same conclusion as me:
in a "more" memory-bound parallel application, the serial part of
Amdahl's law will be bigger, because multicore machines serialize
access to the memory bus. So it will not scale, and you will even see
retrograde throughput because of contention on the memory bus. That's
the way it is with multicore machines, and I think it is no different
with the VISC architecture from Soft Machines.
 
 
Read here please:
 
 
https://share.sandia.gov/news/resources/news_releases/more-chip-cores-can-mean-slower-supercomputing-sandia-simulation-shows/#.VGf2o4odhK8
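The retrograde throughput I am describing can be sketched with a toy model (my own illustration, not from the Sandia study; the coefficient values are assumed): take Amdahl-style speedup and add a contention penalty that grows with the number of cores sharing the memory bus, as in Gunther's Universal Scalability Law.

```python
# Toy model (assumed coefficients, for illustration only): speedup on
# n cores with a serial fraction sigma plus a contention term kappa
# that grows with the number of cores sharing the memory bus.

def usl_speedup(n, sigma=0.05, kappa=0.01):
    """Universal Scalability Law: sigma = serial/contention fraction,
    kappa = coherency/bus-contention coefficient."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(n, round(usl_speedup(n), 2))
```

With these assumed coefficients the speedup peaks around 8 cores and then goes down again as more cores are added, which is exactly the retrograde throughput the Sandia simulation reports.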
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Nov 15 03:36PM -0800

Hello,
 
 
I think that Amdahl's law is very important, because it permits us to
understand multicore programming and distributed programming better.
For example, when you are doing a parallel matrix multiplication, you
have to move two doubles and you also have to multiply them.
Multiplying them inside your computer will take around 8 clocks if you
don't use SIMD instructions, and moving the two doubles from memory to
the CPU will take around 1 clock, so the parallel matrix
multiplication will not scale much beyond 8X; that's what Amdahl's law
says, and it is because of contention on the memory bus, which must
serialize the accesses to memory.

For the concurrent hashtable it's the same: if the data stored under
the corresponding keys of the hashtable is bigger, then moving that
data from memory to the CPU will take many more CPU clocks, and this
will make the serial part of the Amdahl equation bigger, so the
concurrent hashtable will scale less and less; that, too, is what the
Amdahl equation says.

Database systems such as Oracle don't scale well on multicore systems
because they are memory bound; this is why you have to scale them by
distributing your database over many computers, which makes the memory
system and hard-disk system truly parallel for the read transactions,
and this is much better. That's also what the Amdahl equation says.
So, all in all, the Amdahl equation is a good tool that modifies our
perception of parallel programming and permits us to understand the
inner side of parallel programming better.
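The matrix-multiplication arithmetic above can be put straight into Amdahl's equation (using the post's rough numbers of 8 clocks of parallelizable multiply work and 1 clock of serialized memory movement; both figures are assumptions from the text, not measurements):

```python
# Amdahl's law applied to the matrix-multiplication example above:
# the memory move (~1 clock) is treated as the serialized part and
# the multiply (~8 clocks) as the parallelizable part.

def amdahl_speedup(n, serial_fraction):
    """Speedup on n cores when serial_fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

SERIAL = 1.0 / (8.0 + 1.0)   # 1 clock out of 9 total is serialized

if __name__ == "__main__":
    for n in (2, 4, 8, 16, 64):
        print(n, round(amdahl_speedup(n, SERIAL), 2))
    # the ceiling as n grows is 1/SERIAL, i.e. about 9X
```

So with these numbers the hard ceiling is 1/(1/9) = 9X no matter how many cores you add, which matches the "not much beyond 8X" claim above.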
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Nov 15 03:38PM -0800

On 11/15/2014 3:36 PM, Ramine wrote:
> multiplication, you have to move two doubles and you also have to
> multiply them. Multiplying them inside your computer will take
> around 8 clocks if you don't use SIMD instructions, and moving
 
I mean around 8 clocks on x86 computers.
 
 
Ramine <ramine@1.1>: Nov 15 02:35PM -0800

Hello,
 
 
If you ask me the following question:
 
 
Amine, you have included the movement of the data stored under the
corresponding key of the concurrent hashtable, from memory to the CPU,
as a serial part of the Amdahl equation when calculating the
scalability of the concurrent hashtable; is it correct to do it like
this?
 
 
Answer:
 
When the PhD papers test their scalable concurrent hashtables or
scalable Red-Black trees, they test them by moving only the pointers
to the data of the corresponding keys; they do not count the movement
of the data of the corresponding keys from memory to the CPU. This is
why their concurrent hashtables and concurrent Red-Black trees scale
better. But in real-world applications, the movement of the data
corresponding to the keys of the concurrent hashtable or the
concurrent Red-Black tree from memory to the CPU always takes place,
and the memory subsystem is still the bottleneck on today's multicore
machines. I think it will remain the bottleneck for scalability far
into the future when the data under each corresponding key is bigger.
This is why I have included this serial part in the Amdahl equation
when calculating the scalability of the concurrent hashtable.
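The difference this makes can be sketched with a small calculation (the clock counts are assumptions for illustration, not measurements of any real hashtable): the Amdahl ceiling collapses once the serialized part includes copying the key's payload rather than just handing back a pointer.

```python
# Illustration (assumed clock counts): the maximum Amdahl speedup of a
# concurrent hashtable lookup when the serialized part includes copying
# the key's payload from memory, versus just moving a pointer.

def amdahl_ceiling(serial_clocks, parallel_clocks):
    """Maximum possible speedup = total work / serial work."""
    return (serial_clocks + parallel_clocks) / serial_clocks

POINTER_MOVE = 1     # clocks to hand back a pointer (assumed)
LOOKUP_WORK = 100    # clocks of parallelizable lookup work (assumed)

if __name__ == "__main__":
    # payload size expressed in clocks of serialized memory traffic
    for payload_clocks in (0, 10, 100, 1000):
        ceiling = amdahl_ceiling(POINTER_MOVE + payload_clocks,
                                 LOOKUP_WORK)
        print(payload_clocks, round(ceiling, 1))
```

With a pointer-only benchmark the ceiling is around 100X, but once the serialized payload copy costs as much as the lookup itself, the ceiling drops below 2X, which is the point of the answer above.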
 
 
 
Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Nov 15 01:56PM -0800

Hello,
 
 
I have come to an interesting subject...
 
This time I will speak about "scalability". As you have noticed, I
have implemented a concurrent hashtable that uses lock striping, so it
is more fine-grained than simply putting a single RWLock around the
whole hashtable; here it is:
 
https://sites.google.com/site/aminer68/parallel-hashlist
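The lock-striping idea can be sketched like this (a minimal Python illustration of the technique, not the actual implementation from the link above, which is written in Object Pascal): each stripe of buckets gets its own lock, so threads touching different stripes do not contend.

```python
# Minimal lock-striping sketch (illustration only, not the author's
# implementation): the table is split into stripes, each guarded by
# its own lock, instead of one lock around the whole table.

import threading

class StripedHashTable:
    def __init__(self, num_stripes=16):
        self._stripes = [dict() for _ in range(num_stripes)]
        self._locks = [threading.Lock() for _ in range(num_stripes)]

    def _index(self, key):
        # map the key to a stripe
        return hash(key) % len(self._stripes)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:            # only this stripe is locked
            self._stripes[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._stripes[i].get(key, default)
```

A plain `Lock` per stripe stands in here for the per-bucket MREW lock discussed below; the striping principle is the same.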
 
But as you will notice, my concurrent hashtable uses an atomic
increment inside the MREW (multiple-readers-exclusive-writer) lock
that I am using in each bucket of the hashtable. This atomic increment
takes around 100 CPU clocks on the x86 architecture because it uses a
"lock" assembler instruction prefix; since those 100 CPU clocks land
in the serial part of the Amdahl equation, I don't think it will scale
by much. So I have to use my scalable RWLock in each bucket, but even
if I use my scalable RWLock in each bucket, this will give almost the
same performance as using only one scalable RWLock around the whole
hashtable. And this is not the only problem: imagine also that the
data stored under each corresponding key is bigger; then I think my
concurrent hashtable will be less and less scalable, because the
accesses to memory are serialized by x86 computers. I think this is
the big problem on multicore machines: if the serial part of the
Amdahl equation is bigger because you are moving bigger data from the
memory system to the CPU, this will make your application scale less
and less. So I have come to the following question:
 
As you have noticed, Intel has implemented its transactional memory,
and Soft Machines has shown us the performance of its VISC
microprocessor. But my question is: how can they promise better
scalability with those architectures, since we know that the
bottleneck for scalability is the memory system? As I have just
explained, if the data under each corresponding key of my concurrent
hashtable is bigger, my concurrent hashtable will be less and less
scalable, and I think this is the same problem with databases and
mathematical matrix calculations, etc.; they don't scale well on
multicores either.
 
 
 
 
Thank you,
Amine Moulay Ramdane.
