Saturday, August 1, 2009

comp.programming.threads - 20 new messages in 4 topics - digest

comp.programming.threads
http://groups.google.com/group/comp.programming.threads?hl=en

comp.programming.threads@googlegroups.com

Today's topics:

* Multi-producer/multi-consumer SEH-based queue - 2 messages, 2 authors
http://groups.google.com/group/comp.programming.threads/t/c43d1197c8168b4e?hl=en
* Programming With Posix Threads book - 1 message, 1 author
http://groups.google.com/group/comp.programming.threads/t/05208d3770bd543e?hl=en
* Multi-cores and contention,,, - 10 messages, 2 authors
http://groups.google.com/group/comp.programming.threads/t/b74390b785ea0e26?hl=en
* Spinlocks - trouble with speed - 7 messages, 4 authors
http://groups.google.com/group/comp.programming.threads/t/9f5e34251ccf45e3?hl=en

==============================================================================
TOPIC: Multi-producer/multi-consumer SEH-based queue
http://groups.google.com/group/comp.programming.threads/t/c43d1197c8168b4e?hl=en
==============================================================================

== 1 of 2 ==
Date: Thu, Jul 30 2009 11:36 pm
From: "Chris M. Thomasson"


"Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
news:02c80e6f-4478-4e54-b039-bd207cd19818@j32g2000yqh.googlegroups.com...
On Jul 31, 04:45, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > I don't worry about the (***) part as it's an EXTREMELY small window.
> > Also,
> > if a programs threads can just up and die at (***) then, IMHO, the
> > program
> > has a lot more "issues" to worry about than this one...


> Yes, spuriously dying threads are indeed problematic :)

> That was just a comment on *formal* semantics. Theoretically, it's
> not a lock-free queue because lock-free algorithm must be safe wrt
> thread termination.
> But from practical POV, yes, I do not care about (***) part too.


> >
> > ;^)
> >
> > So, we now have intrusive/non-intrusive versions of a high-performance
> > MPMC
> > queue that beats the shi% out of the Michael and Scott queue:
> >
> > Very cool!

> Ah, I know what Michael and Scott will say, they will say "your queue
> is not lock-free". Damn!

Then we can be smart-asses and say:

"This is just a very simple example of how clever lock-based queue can beat
the hell out of a crappy lock-free queue..."

Ouch!

== 2 of 2 ==
Date: Fri, Jul 31 2009 2:35 pm
From: "Dmitriy V'jukov"


On Jul 31, 07:07, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > > > I think that Relacy will show the truth.
> [...]
> > > However, once I do
> > > that, rl::atomic will load the entire anchor atomically with no chance
> > > of a
> > > preemption in between the load to the anchors head pointer and the
> > > subsequent load to the anchors aba counter. Am I making any sense to
> > > you?
> > C++0x atomics do not support non-atomic accesses to atomic variables.
>
This raises another question... Let's say that a particular C++0x impl on a
32-bit system has lock-free access to a 64-bit variable. Will it have to use
a 64-bit atomic load instruction on the 32-bit system to load a 64-bit anchor?
For instance, on IA-32, would it use a MOVQ instruction?

I think that the answer is yes.

--
Dmitriy V'jukov

==============================================================================
TOPIC: Programming With Posix Threads book
http://groups.google.com/group/comp.programming.threads/t/05208d3770bd543e?hl=en
==============================================================================

== 1 of 1 ==
Date: Fri, Jul 31 2009 6:28 am
From: Duke Robillard


On Wed, 22 Jul 2009 09:27:15 -0700, Francis Moreau wrote:

> 2. I'm not really interested in an API description but rather in
> some discussion

I recently got a book by Brian Goetz called "Java Concurrency In
Practice." It's Java-centric (no pthreads here) but it has sort
of the feel you're talking about...it's more about how to do stuff
with concurrency, rather than an API description. I've found it
really clear and easy to read.

Duke


==============================================================================
TOPIC: Multi-cores and contention,,,
http://groups.google.com/group/comp.programming.threads/t/b74390b785ea0e26?hl=en
==============================================================================

== 1 of 10 ==
Date: Fri, Jul 31 2009 9:06 am
From: Amine


Hello all,


I am still *thinking*...

Please look with me carefully...

I have done for example a parallel program; please look at the
Parallel Matrix demo: http://www.colba.net/~aminer/

and this is working fine and it's scalable because the serial
part (1-P) in the Amdahl equation is small: because it's
using the cache and doing some calculations in *PARALLEL*.

But there is still a problem...

I wrote:

"Now if you have followed with me, i think there are three things:
You have to look at *scalability* from three perspectives:

ALGORITHM + HARDWARE + DATA"


and i wrote:

"If you have noticed, this problem can still occur with my lock-free
Threadpool and lockfree_SPMC."


If you have noticed, i am using a queue for every worker thread etc..


So now my question is:

How can we say that my lockfree Threadpool and lockfree_SPMC
queue are scalable in performance if every transaction on the pop()
side misses the cache (because the *RANDOM* data misses the
cache)? Then we are not servicing in an M/M/m manner, that is,
bringing the data from the caches in *PARALLEL*; we are
just servicing in an M/M/1 manner, with contention on a single
memory bus and with the lower service rate of the memory.

Do you understand ?

And what do you think about this problem ?


Regards,
Amine Moulay Ramdane.

On Jul 30, 1:29 pm, Amine <ami...@colba.net> wrote:
> Hello again,
>
> I was still *thinking*...
>
> Now if you have followed with me, i think there is three things:
> You have to look at the *scalability* from three perspectives:
>
> ALGORITHM + HARDWARE + DATA
>
> I wrote:
>
> "I mean what is the benefit of a very FAST  Ferrari car if the roads
> are not so good and they lower a lot its speed ?"
>
> Now please follow with me..
>
> I wrote:
>
> [1] "So, let say for example that we are running under an Azul system
>  with more than 700 cores: http://www.azulsystems.com/ and using
>  this lock-free Hash table: http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
>
>  and millions of transactions (let say find() transactions that search
>  and return some data) are coming with an arrival rate; those
> transactions
>  are mostly memory bound. And let us suppose for an instant that those
> are mostly
>  random transactions that *miss* the cache, so that they are serviced
> not in an
>  M/M/m manner but in M/M/1, with the memory as the server and with its
> service
>  rate lower than the caches'; that will be very bad. So in this
> situation: how can even
>  this Azul system with more than 700 cores, with let say a lock-free
> Hash-table, scale
>  in performance ?"
>
> If you have noticed, this problem can still occur with my lock-free
> Threadpool and  lockfree_SPMC.
>
> Now this problem doesn't come from the ALGORITHM but can
> come from the HARDWARE: one bus that is servicing the jobs
> in an M/M/1 manner, and/or from the essence of
> the DATA: random accesses to memory that *miss* the cache.
>
> Dave Butenhof  wrote in a post:
>
> "And maybe you can think about improving the road... "
>
> But if you have noticed the 'bad road' can be the essence of DATA in
> itself.
>
> So let's take as an example the following situation
> (which is not the worst case):
>
> Let say that access to the big in-memory data is not so
> random, and 3 MB of data is frequently used, let say 80% of
> the time; so what i want is to avoid *completely* the situation
> where the 3 MB of cached data that is used 80% of the time
> gets erased completely from the cache: a situation that
> can look like a livelock if we have a very high transaction arrival
> rate....
>
> So, what i want in this situation is also explicit control over
> the cache, that means to *force* the cache to lock 3 MB of data
> in a cache of let say 4 MB (to be able to parallelize like an
> M/M/m...)
>
> Hence my question:
>
> Which hardware can do this sort of thing in this situation ?
>
> Regards,
> Amine Moulay Ramdane.
>
> On Jul 29, 8:34 pm, Amine <ami...@colba.net> wrote:
>
>
>
> > Hello all,
>
> > I wrote:
>
> > "I mean what is the benefit of a very FAST Ferrari car if the roads
> > are
> > not so good and they lower a lot its speed ?"
>
> > I was still *thinking* ...
>
> > So, let say for example that we are running under an Azul system
> > with more than 700 cores: http://www.azulsystems.com/ and using
> > this lock-free Hash table: http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
>
> > and millions of transactions (let say find() transactions that search
> > and return
> > some data) are coming with an arrival rate; those transactions are
> > mostly
> > memory bound. And let us suppose for an instant that those are mostly random
> > transactions
> > that *miss* the cache, so that they are serviced not in an M/M/m manner
> > but in M/M/1,
> > with the memory as the server and with its service rate lower than
> > the caches';
> > that will be very bad. So in this situation: how can even this Azul
> > system with more
> > than 700 cores with let say a lock-free Hash-table scale in
> > performance ?
>
> > Regards,
> > Amine.
>
> > On Jul 29, 3:32 pm, Amine <ami...@colba.net> wrote:
>
> > > On Jul 29, 3:21 pm, Amine <ami...@colba.net> wrote:
>
> > > > I wrote:
> > > > > So, now imagine that we elevate the problem by using an M/M/m in
> > > > > the bus memory side , this will lower contention.  But imagine that
> > > > > we are using some big in memory data  and accessing it *randomly*
> > > > > with a lock-free Hash-table , what will be the performance then ?
> > > > > not so good i think...
>
> > > > I mean what is the benefit of a very FAST Ferrari car if the roads are
> > > > not so good
> > > > and they lower a lot it's speed ?
>
> > > typo:
>
> > > I mean what is the benefit of a very FAST Ferrari car if the roads are
> > > not so good and they lower a lot its speed ?
>
> > > Regards,
> > > Amine Moulay Ramdane.
>
> > > > What is the benefit of a very fast lock-free hash-table if the in-
> > > > memory data
> > > > it accesses is big and *randomly* accessed: this will lower the
> > > > speed,
> > > > *AND*
> > > > also there is contention on the bus in an M/M/1 manner, and this
> > > > will also
> > > > lower the speed ?
>
> > > > Regards,
> > > > Amine. Moulay Ramdane.
>
> > > > On Jul 29, 2:35 pm, Amine <ami...@colba.net> wrote:
>
> > > > > Hello all,
>
> > > > > I was thinking about my lock-free Threadpool and lock-free_SPMC
>
> > > > >http://www.colba.net/~aminer/
>
> > > > > i am using a queue for every thread worker ( and in the producer side
> > > > > you can push in parallel also) and i have come to the conclusion that
> > > > > this will scale very well in the software side...
>
> > > > > But i was thinking more about *scalability* and i have come to the
> > > > > conclusion that there is still a problem, please follow with me:
>
> > > > > As Amdahl's law states, Speedup = 1 / ((1-P) + P/N)  (N: number of
> > > > > cores)
> > > > > now let's take as an example we are working with a scalable lock-free
> > > > > Hashtable
> > > > > now, if the transactions(find,insert..) are coming to the lock-free
> > > > > hashtable
> > > > > and the jobs(transactions) are processed  in an M/M/m manner with
> > > > > multiple
> > > > > threads and cores servicing the jobs, that's very good...
>
> > > > > Now suppose that the application like a lock-free Hashtable is
> > > > > working
> > > > > frequently with data  and  we are accessing big data directly in
> > > > > memory
> > > > > *AND* the data is randomly accessed *AND* there is contention between
> > > > > processors over the memory bus in an M/M/1 manner, so there will
> > > > > be
> > > > > an appreciable serial part 1-P in the Amdahl equation, hence:
>
> > > > > what would you expect from the scalable lock-free Hashtable ?
>
> > > > > what would the performance be then ? i think it will be very bad.
>
> > > > > So, now imagine that we elevate the problem by using an M/M/m in
> > > > > the bus memory side , this will lower contention.  But imagine that
> > > > > we are using some big in memory data  and accessing it *randomly*
> > > > > with a lock-free Hash-table , what will be the performance then ?
> > > > > not so good i think...
>
> > > > > Also i have another question:
>
> > > > > Read this:
>
> > > > >http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
>
> > > > > How can this lock-free Hashtable scale well in
> > > > > performance
> > > > > on an Azul system with more than 700 cores, if we are using some very
> > > > > big in-
> > > > > memory data and accessing it *RANDOMLY* ? and what memory bus
> > > > > architecture
> > > > > are they using ?...
>
> > > > > What do you think about this problem ?
>
> > > > > Regards,
> > > > > Amine Moulay Ramdane.

== 2 of 10 ==
Date: Fri, Jul 31 2009 10:34 am
From: Amine

Hello all,


So now let's make it clear:

If someone says to you that an ALGORITHM is *SCALABLE*
in performance (let say a lockfree hash, a lockfree list, a
lockfree queue, a lockfree stack...),

saying so is *NOT* sufficient to make it *REALLY* scalable in
performance.

You have to look at three things:

ALGORITHM + HARDWARE + DATA


and if, for example, the *DATA* accessed is essentially
random and misses the cache completely or most of the time, you
can NOT parallelize by using multiple caches in PARALLEL
in an M/M/m manner,

=> so how can you say that it's scalable in performance ?

=> It's simply NOT scalable.

Regards,
Amine Moulay Ramdane.



== 3 of 10 ==
Date: Fri, Jul 31 2009 10:49 am
From: Amine


On Jul 31, 1:34 pm, Amine <ami...@colba.net> wrote:
> Hello all,
>
> So now let's make it clear...:
>
> If someone say to you that an ALGORITHM is *SCALABLE*
> in performance: let say a lockfree hash, a lockfree list , a
> lockfree queue , a lockfree stack...
>
> to say it is *NOT* sufficient to make it *REALLY* scalable in
> performance.
>
> You have to look at three things:
>
>  ALGORITHM + HARDWARE + DATA
>
> and if for example the essence of the *DATA* accessed is
> random and misses the cache completely or most of the time, you
> can NOT parallelize by using multiple caches in PARALLEL
> in an M/M/m manner

So they are serviced by the memory as the server in an M/M/1 manner,
with the low service rate of the memory bus....

>
> => how you can you say that it's scalable in performance ?
>
> => It's simply NOT scalable.


I was speaking about those lock-free and wait-free algorithms:
stack, queue, list, hashtable...

I was not speaking about CPU-bound algorithms.


Do you understand now ?


Regards,
Amine Moulay Ramdane.


> > > > > > > big in
> > > > > > > memory data and accessing it *RANDOMLY* ? and what memory bus
> > > > > > > architechture
> > > > > > > are they using ?...
>
> > > > > > > What do you think about this problem ?
>
> > > > > > > Regards,
> > > > > > > Amine Moulay Ramdane.
>
> ...

== 4 of 10 ==
Date: Fri, Jul 31 2009 12:24 pm
From: Amine


Hello,

We have the abstraction and the reality.

If you are designing an abstract algorithm like a lock-free
or wait-free datastructure, for example a list or hashtable,
and you are interested in *scalability*, do you take into
account the present reality?

Of course, as an example: false sharing.

Now the same thing:

If you are saying that your ALGORITHM is scalable
in performance, I think we have to take into account the
present REALITY and look at the present HARDWARE
and the essence of the DATA...

As i wrote before:

"Now if you have followed with me, i think there is three things:
You have to look at the *scalability* from three perspectives:

ALGORITHM + HARDWARE + DATA"


Now, let's say you are using those wait-free and
lock-free datastructures in a DATA-intensive manner, and
you are receiving a high rate of transactions like find() that
search and RETURN the data while missing the cache
completely or most of the time: so you are not working in an
M/M/m manner with the caches as the servers, but in an M/M/1
manner with the memory (bus) as the server => low service rate.

What will be the benefit ?

Can you still say those wait-free or lock-free datastructures
are scalable in performance ?

Do we take into account the REALITY ?

I think yes.

So, if we take into account the REALITY, I will say that
they are NOT scalable in performance.

Regards,
Amine Moulay Ramdane.


On Jul 31, 1:49 pm, Amine <ami...@colba.net> wrote:
> On Jul 31, 1:34 pm, Amine <ami...@colba.net> wrote:
>
>
>
>
>
> > Hello all,
>
> > So now let's make it clear...:
>
> > If someone say to you that an ALGORITHM is *SCALABLE*
> > in performance: let say a lockfree hash, a lockfree list , a
> > lockfree queue , a lockfree stack...
>
> > to say it is *NOT* sufficient to make it *REALLY* scalable in
> > performance.
>
> > You have to look at three things:
>
> >  ALGORITHM + HARDWARE + DATA
>
> > and if for example the essence of the *DATA* accessed is
> > random and does miss completly or a lot the cache , you
> > can NOT parallelize by using multiple cache in PARALLEL.
> > in an M/M/m manner
>
> So they are serviced by the memory as the server in an M/M/1 manner,
> with the low service rate of the memory bus....
>
>
>
> > => how you can you say that it's scalable in performance ?
>
> > => It's simply NOT scalable.
>
> I was speaking about those lock-free and wait-free algorithms:
> like: stack,queue,list,hashtable..
>
> I was not speaking about the CPU bound algorithms.
>
> Do you understand now ?
>
> Regards,
> Amine Moulay Ramdane.
>
>
>
>
>
> > Regards,
> > Amine Moulay Ramdane.
>
> > On Jul 31, 12:06 pm, Amine <ami...@colba.net> wrote:
>
> > > Hello all,
>
> > > I am still *thinking*...
>
> > > Please look with me carefully...
>
> > > I have done for example a parallel program, please look at
> > > Parallel Matrix demo:http://www.colba.net/~aminer/
>
> > > and this is working fine and it's scalable because there is
> > > an appreciable (1-P) in the Amadhal equation: because it's
> > > using the cache and doing some calculations in *PARALLEL*
>
> > > But there is still a problem...
>
> > > I  wrote:
>
> > > "Now if you have followed with me, i think there is three things:
> > > You have to look at the *scalability* from three perspectives:
>
> > > ALGORITHM + HARDWARE + DATA"
>
> > > and i wrote:
>
> > > "If you have noticed, this problem can still occur with my lock-free
> > > Threadpool and  lockfree_SPMC."
>
> > > If you have noticed, i am using a queue for every worker thread etc..
>
> > > So now my question is:
>
> > > How can we say that my lockfree Threadpool and lockfree_SPMC
> > > queue is scalable in performance if  for every transaction in the pop
> > > ()
> > > side we are missing the cache(cause of *RANDOM* data that miss
> > > the cache: so we are not servicing in an M/M/m manner: that means
> > > bringing the data from the caches in *PARALLEL*, but we are are
> > > just servicing in an M/M/1 manner with contention on a single
> > > memory bus and with the lower service rate of the memory ??
>
> > > Do you undertand ?
>
> > > And how do you think about this problem ?
>
> > > Regards,
> > > Amine Moulay Ramdane.
>
> > > On Jul 30, 1:29 pm, Amine <ami...@colba.net> wrote:
>
> > > > Hello again,
>
> > > > I was still *thinking*...
>
> > > > Now if you have followed with me, i think there is three things:
> > > > You have to look at the *scalability* from three perspectives:
>
> > > > ALGORITHM + HARDWARE + DATA
>
> > > > I wrote:
>
> > > > "I mean what is the benefit of a very FAST  Ferrari car if the roads
> > > > are not so good and they lower a lot its speed ?"
>
> > > > Now please follow with me..
>
> > > > I wrote:
>
> > > > [1] "So, let say for example that we are runnning under an Azul system
> > > >  with more than 700 cores:  http://www.azulsystems.com/andusing
> > > >  this lock-free Hash  table:http://www.azulsystems.com/events/
> > > > javaone_2007/2007_LockFreeHash.pdf
>
> > > >  and millions of transactions (let say find() transactions that search
> > > >  and return some data) are coming with an arrival rate, those
> > > > transations
> > > >  are mostly memory bound and let suppose an instant that those are
> > > > mostly
> > > >  random transactions that *miss* the cash, so that they are serviced
> > > > not in an
> > > >  M/M/m manner but in M/M/1 with the memory as the server and with its
> > > > low service
> > > >  rate lower than the caches, that will be very bad. So in this
> > > > situation: how can even
> > > >  this Azul system with more than 700 cores with let say a lock-free
> > > > Hash-table scale
> > > >  in performance ?"
>
> > > > If you have noticed, this problem can still occur with my lock-free
> > > > Threadpool and  lockfree_SPMC.
>
> > > > Now this problem doesn't come from the ALGORITHM but can
> > > > come from the HARDWARE: one bus that is servicing the jobs
> > > > in an M/M/1 manner, or/and come from the essence of
> > > > the DATA: random access to memory that *miss* the cash.
>
> > > > Dave Butenhof  wrote in a post:
>
> > > > "And maybe you can think about improving the road... "
>
> > > > But if you have noticed the 'bad road' can be the essence of DATA in
> > > > itself.
>
> > > > So let take as an example the folowing situation
> > > > (that is not the worst case):
>
> > > > Let say that the access of the big in-memory data  is not so
> > > > random and 3 meg of data is frequently used in let say 80% of
> > > > the time, so what i want is to avoid *completly* the situation
> > > > where the 3 meg cached data that is used 80% of the time will
> > > > not be erased  completly from the cache: a situation that
> > > > can look like a livelock  if we have a very high transaction arrival
> > > > rate....
>
> > > > So, what i want in this situation is also an explicit control over
> > > > the cache , that means to *force* the cache to lock 3 meg of data
> > > > in a cache of let say 4 meg (to be able to parallelize like an M/M/
> > > > m...)
>
> > > > Hence my question:
>
> > > > Wich hardware can do this sort of thing in this sort of situation ?
>
> > > > Regards,
> > > > Amine Moulay Ramdane.
>
> > > > On Jul 29, 8:34 pm, Amine <ami...@colba.net> wrote:
>
> > > > > Hello all,
>
> > > > > I wrote:
>
> > > > > "I mean what is the benefit of a very FAST  Ferari car if the roads
> > > > > are
> > > > > not so good and they lower a lot its speed ?"
>
> > > > > I was still *thinking* ...
>
> > > > > So, let say for example that we are runnning under an Azul system
> > > > > with more than 700 cores:  http://www.azulsystems.com/andusing
> > > > > this lock-free Hash  table:http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
>
> > > > > and millions of transactions (let say find() transactions that search
> > > > > and return
> > > > > some data) are coming with an arrival rate, those transations are
> > > > > mostly
> > > > > memory bound and let suppose an instant that those are mostly random
> > > > > transactions
> > > > > that *miss* the cash, so that they are serviced not in an M/M/m manner
> > > > > but in M/M/1
> > > > > with the memory as the server and with its low service rate lower than
> > > > > the caches,
> > > > > that will be very bad. So in this situation: how can even this Azul
> > > > > system with more
> > > > > than 700 cores with let say a lock-free Hash-table scale in
> > > > > performance ?
>
> > > > > Regards,
> > > > > Amine.
>
> > > > > On Jul 29, 3:32 pm, Amine <ami...@colba.net> wrote:
>
> > > > > > On Jul 29, 3:21 pm, Amine <ami...@colba.net> wrote:
>
> > > > > > > I wrote:
> > > > > > > > So, now imagine that we elevate the problem by using an M/M/m in
> > > > > > > > the bus memory side , this will lower contention.  But imagine that
> > > > > > > > we are using some big in memory data  and accessing it *randomly*
> > > > > > > > with a lock-free Hash-table , what will be the performance then ?
> > > > > > > > not so good i think...
>
> > > > > > > I mean what is the benefit of a very FAST  Ferari car if the roads are
> > > > > > > not so good
> > > > > > > and they lower a lot it's speed ?
>
> > > > > > typo:
>
> > > > > > I mean what is the benefit of a very FAST  Ferari car if the roads are
> > > > > > not so good and they lower a lot its speed ?
>
> > > > > > Regards,
> > > > > > Amine Moulay Ramdane.
>
> > > > > > > What is the benefit of a very fast lock-free hash-table if the in
> > > > > > > memory data
> > > > > > > it access is big and it's *randomly* accessed: and this will lower the
> > > > > > > speed
> > > > > > > *AND*
> > > > > > > also  there is a contention on the bus in an M/M/1  manner and this
> > > > > > > will also
> > > > > > > lower the speed ?
>
> > > > > > > Regards,
> > > > > > > Amine. Moulay Ramdane.
>
> > > > > > > On Jul 29, 2:35 pm, Amine <ami...@colba.net> wrote:
>
> > > > > > > > Hello all,
>
> > > > > > > > I was thinking about my lock-free Threadpool and lock-free_SPMC
>
> > > > > > > >http://www.colba.net/~aminer/
>
> > > > > > > > i am using a queue for every thread worker ( and in the producer side
> > > > > > > > you can push in parallel also) and i have come to the conclusion that
> > > > > > > > this will scale very well.in the software side...
>
> > > > > > > > But i was thinking more about *scalability* and i have come to the
> > > > > > > > conclusion that there is still a problem, please follow with me:
>
> > > > > > > > As Amadahl law state, the Speed = 1 / (1 -P) + P /N  (N: number of
> > > > > > > > cores)
> > > > > > > > now let's take as an example we are working with a scalable lock-free
> > > > > > > > Hashtable
> > > > > > > > now, if the transactions(find,insert..) are coming to the lock-free
> > > > > > > > hashtable
> > > > > > > > and the jobs(transactions) are processed  in an M/M/m manner with
> > > > > > > > multiple
> > > > > > > > threads and cores servicing the jobs, that's very good...
>
> > > > > > > > Now suppose that the application like a lock-free Hashtable is
> > > > > > > > working
> > > > > > > > frequently with data  and  we are accessing big data directly in
> > > > > > > > memory
> > > > > > > > *AND* the data is randomly accessed *AND* there is contention between
> > > > > > > > processors over the memory bus in and M/M/1 manner , so, there will
> > > > > > > > be
> > > > > > > > an appreciable serial part 1-P in the Amadahl equation, hence:
>
> > > > > > > > what would will you expect from the scalable lock-free Hashtable ?
>
> > > > > > > > what would be then the performance ? i think this will be very bad.
>
> > > > > > > > So, now imagine that we elevate the problem by using an M/M/m in
> > > > > > > > the bus memory side , this will lower contention.  But imagine that
> > > > > > > > we are using some big in memory data  and accessing
>
> ...
>
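Note that the Amdahl's-law expression quoted in this thread, "Speed = 1 / (1 -P) + P /N", needs parentheses: the speedup is 1 / ((1 - P) + P / N). A short Python sketch (the fractions below are made-up numbers, not measurements) shows why any appreciable serial part (1 - P) caps scaling even on a 700-core box:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: p = parallel fraction, n = number of cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallel, 700 cores buy less than 10x:
print(round(amdahl_speedup(0.90, 700), 2))   # -> 9.87
# Only a fully parallel program scales linearly:
print(amdahl_speedup(1.0, 8))                # -> 8.0
```

As n grows, the speedup approaches 1 / (1 - P), which is exactly the ceiling being discussed here: a serial memory bus contributes to (1 - P) no matter how good the lock-free algorithm is.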

== 5 of 10 ==
Date: Fri, Jul 31 2009 2:33 pm
From: "Dmitriy V'jukov"


On Jul 31, 23:24, Amine <ami...@colba.net> wrote:


What do you mean by all these "M/M/1" and "M/M/m"? I am totally
confused.


--
Dmitriy V'jukov


== 6 of 10 ==
Date: Fri, Jul 31 2009 3:33 pm
From: Amine


On Jul 31, 5:33 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
> On 31 июл, 23:24, Amine <ami...@colba.net> wrote:
>
> What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> confused.

M means Markov: because of the memoryless nature of the
exponential distribution, the arrival and departure
processes are continuous-time Markov chains.

M/M/m means that the arrival process (client side: jobs,
transactions) and the service process (server side) are
exponentially distributed, and m is the number of servers.


Regards,
Amine.
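To make the notation above concrete, here is a sketch in Python (the rates are made-up numbers): the mean response time is 1/(mu - lam) for M/M/1, and for M/M/m the expected wait comes from the Erlang C formula.

```python
import math

def mm1_response_time(lam, mu):
    """Mean time in system for M/M/1: T = 1 / (mu - lam)."""
    assert lam < mu, "unstable: arrival rate must be below service rate"
    return 1.0 / (mu - lam)

def erlang_c(m, a):
    """Erlang C: probability an arriving job must wait in M/M/m.

    a = lam / mu is the offered load; stability requires a < m.
    """
    head = sum(a ** k / math.factorial(k) for k in range(m))
    tail = a ** m / (math.factorial(m) * (1.0 - a / m))
    return tail / (head + tail)

def mmm_response_time(lam, mu, m):
    """Mean time in system for M/M/m: service time plus expected wait."""
    a = lam / mu
    assert a < m, "unstable: need lam / mu < m"
    return 1.0 / mu + erlang_c(m, a) / (m * mu - lam)

# One server at rate 10 with arrivals at rate 8: T = 0.5
print(mm1_response_time(8, 10))
# Four such servers sharing the same stream: roughly 0.1
print(round(mmm_response_time(8, 10, 4), 3))
```

This is the gap the thread keeps pointing at: m caches serving hits in parallel behave like the M/M/m case, while every miss funneled through one memory bus behaves like the single-server M/M/1 case with its much longer queue.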

>
> --
> Dmitriy V'jukov

== 7 of 10 ==
Date: Fri, Jul 31 2009 5:25 pm
From: Amine


On Jul 31, 5:33 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
> On 31 июл, 23:24, Amine <ami...@colba.net> wrote:
>
> What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> confused.
>
> --
> Dmitriy V'jukov


Suppose, for example, there is a high rate of transactions
like find('key', out obj) that are serviced by a wait-free or a
lock-free hash-table. After 'obj' is returned we copy the data
inside the obj to somewhere else over some very fast networks.

What is the benefit of those wait-free or lock-free datastructures
if the memory data accessed inside 'obj' is random and we are
missing the cache completely or most of the time: then we will not be
servicing in an M/M/m manner with the caches as the servers
but in an M/M/1 manner with the memory (bus) as the server:

=> LOW SERVICE RATE + NO SCALABILITY.


Regards,
Amine.

== 8 of 10 ==
Date: Fri, Jul 31 2009 6:45 pm
From: Amine


On Jul 31, 8:25 pm, Amine <ami...@colba.net> wrote:
> On Jul 31, 5:33 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
>
> > On 31 июл, 23:24, Amine <ami...@colba.net> wrote:
>
> > What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> > confused.
>
> > --
> > Dmitriy V'jukov
>
> Let's take for example there is a high rate of transactions
> like find('key',out obj), that are serviced by a wait-free or a
> lock-free hash-table. After 'obj' is returned we copy the data
> inside the obj to somewhere else over some very fast networks.
>
> What is the benefit of those wait-free or lock-free datastructures
> if the accessed memory data inside 'obj' is random and we are
> missing completly or a lot the cash: so that we will  not be
> servicing in an M/M/m manner with the caches as the servers
> but in an M/M/1 manner with the memory (bus) as the server:
>
>  => LOW SERVICE RATE + NO SCALABILITY.


And since there is no scalability benefit in those search+copy
transactions, what will you call that?

I have simply said:

"What will be the benefit ?

Can you still say those wait-free or lock-free datastructures
are scalable in performance ?

Do we take into account the REALITY ?

I think yes.

So, if we take into account the REALITY: we will say that
they are NOT scalable in performance."


Regards,
Amine.

>
> Regards,
> Amine.

== 9 of 10 ==
Date: Fri, Jul 31 2009 8:19 pm
From: Amine

Dmitriy V'jukov wrote:
> What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> confused.

> --
> Dmitriy V'jukov


I wrote:
"Suppose, for example, there is a high rate of transactions
like find('key', out obj) that are serviced by a wait-free or a
lock-free hash-table. After 'obj' is returned we copy the data
inside the obj to somewhere else over some very fast networks.

What is the benefit of those wait-free or lock-free datastructures
if the memory data accessed inside 'obj' is random and we are
missing the cache completely or most of the time: then we will not be
servicing in an M/M/m manner with the caches as the servers
but in an M/M/1 manner with the memory (bus) as the server:

=> LOW SERVICE RATE + NO SCALABILITY."


Is it not something like a database that doesn't scale in REALITY?

Even with those wait-free or lock-free data-structures?

What will you call that?

And tell me who is responsible for this situation, Dmitriy V'jukov?

Is it the ALGORITHM?

The HARDWARE?

Or the DATA?

Can I say the ALGORITHM?

Or is it forbidden?

So, I will say the DATA is responsible.

Am I correct to say that?

Or is it the HARDWARE?


Regards,
Amine Moulay Ramdane.


== 10 of 10 ==
Date: Fri, Jul 31 2009 9:17 pm
From: Amine

I wrote:

"Is it not something like a database that doesn't scale in REALITY?

Even with those wait-free or lock-free data-structures?

What will you call that?

And tell me who is responsible for this situation, Dmitriy V'jukov?

Is it the ALGORITHM?

The HARDWARE?

Or the DATA?

Can I say the ALGORITHM?

Or is it forbidden?

So, I will say the DATA is responsible.

Am I correct to say that?

Or is it the HARDWARE?"


And what is the solution to this catastrophe, Dmitriy V'jukov?

Think with me, my dear Dmitriy V'jukov...

Regards,
Amine.

On Jul 31, 11:19 pm, Amine <ami...@colba.net> wrote:
> Dmitriy V'jukov  wrote:
> > What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> > confused.
> > --
> > Dmitriy V'jukov
> I  wrote:
>
> "Let's take for example there is a high rate of transactions
> like find('key',out obj), that are serviced by a wait-free or a
> lock-free hash-table. After 'obj' is returned we copy the data
> inside the obj to somewhere else over some very fast networks.
>
> What is the benefit of those wait-free or lock-free datastructures
> if the accessed memory data inside 'obj' is random and we are
> missing completly or a lot the cash: so that we will  not be
> servicing in an M/M/m manner with the caches as the servers
>  but in an M/M/1 manner with the memory (bus) as the server:
>
>   => LOW SERVICE RATE + NO SCALABILITY.
>
> Is it not something like a database that doesn't scale in REALITY ?
>
> Even with those wait-free or lock-free data-strutures ?
>
> How will you call that ?
>
> And tell me who is responsible of this situation Dmitriy V'jukov  ?
>
> Is it the ALGORITHM ?
>
> The HARDWARE ?
>
> Or the  DATA ?
>
> Can i say the ALGORITHM ?
>
> Or is it forbiden ?
>
> So, i will say the DATA is reponsible.
>
> Am i correct to say that ?
>
> Or is it the HARDWARE ?
>
> Regards,
> Amine Moulay Ramdane.


==============================================================================
TOPIC: Spinlocks - trouble with speed
http://groups.google.com/group/comp.programming.threads/t/9f5e34251ccf45e3?hl=en
==============================================================================

== 1 of 7 ==
Date: Fri, Jul 31 2009 10:54 am
From: Thomas Richter


David Schwartz wrote:

> You want a jet plane and you present a jalopy. You don't "fix" the
> jalopy to make a jet plane, you get a jet plane.

Since you're not providing any hints on how to get it right, I assume
you don't know either. In that case, please stop wasting bandwidth.

Thank you.

== 2 of 7 ==
Date: Fri, Jul 31 2009 10:57 am
From: Thomas Richter


Chris M. Thomasson wrote:
>
> "Chris M. Thomasson" <no@spam.invalid> wrote in message
> news:h4t8m2$10f9$1@news.ett.com.ua...
>>
>> "Thomas Richter" <thor@math.tu-berlin.de> wrote in message
>> news:h4sgvg$kd$1@infosun2.rus.uni-stuttgart.de...
>>> Hi folks,
>>>
>>> to protect a shared resource which needs to be locked only very
>>> shortly, I'm using a very simple spinlock technique:
> [...]
>> Why are you using an atomic RMW instruction to release the lock on an
>> x86? That is unnecessary overhead as you only need a plain MOV
>> instruction:
>>
>>
>> http://www.intel.com/Assets/PDF/manual/253668.pdf
>> (read section 8.2)
>>
>>
>> Also, why are you using FAA? Its clearly not needed. BTW, what's the
>> point of `Locked_Add()'?
>
> One other really important question... Are you sure that you have
> eliminated any chance of false-sharing between the lock word and any
> state within the critical section? Or between the lock work and any
> other lock word? You need to ensure that the lock word is padded up to
> and aligned on a L2 cache line.

I would hope so - the data around the counter remains constant
throughout the remaining program and is no longer touched or required. A
cache line is around 16 bytes, is that right?

So long,
Thomas

== 3 of 7 ==
Date: Fri, Jul 31 2009 12:57 pm
From: Chris Friesen


Thomas Richter wrote:
> Chris M. Thomasson wrote:

>> One other really important question... Are you sure that you have
>> eliminated any chance of false-sharing between the lock word and any
>> state within the critical section? Or between the lock work and any
>> other lock word? You need to ensure that the lock word is padded up to
>> and aligned on a L2 cache line.
>
> I would hope so - the data around the counter remains constant
> throughout the remaining program and is no longer touched or required. A
> cache line is around 16 bytes, is that right?

I think Core 2 has a line size of 64 bytes.

Chris


== 4 of 7 ==
Date: Fri, Jul 31 2009 1:04 pm
From: "Chris M. Thomasson"


"Thomas Richter" <thor@math.tu-berlin.de> wrote in message
news:h4vb59$8tp$2@infosun2.rus.uni-stuttgart.de...
> Chris M. Thomasson wrote:
>>
>> "Chris M. Thomasson" <no@spam.invalid> wrote in message
>> news:h4t8m2$10f9$1@news.ett.com.ua...
>>>
>>> "Thomas Richter" <thor@math.tu-berlin.de> wrote in message
>>> news:h4sgvg$kd$1@infosun2.rus.uni-stuttgart.de...
>>>> Hi folks,
>>>>
>>>> to protect a shared resource which needs to be locked only very
>>>> shortly, I'm using a very simple spinlock technique:
>> [...]
>>> Why are you using an atomic RMW instruction to release the lock on an
>>> x86? That is unnecessary overhead as you only need a plain MOV
>>> instruction:
>>>
>>>
>>> http://www.intel.com/Assets/PDF/manual/253668.pdf
>>> (read section 8.2)
>>>
>>>
>>> Also, why are you using FAA? Its clearly not needed. BTW, what's the
>>> point of `Locked_Add()'?
>>
>> One other really important question... Are you sure that you have
>> eliminated any chance of false-sharing between the lock word and any
>> state within the critical section? Or between the lock work and any other
>> lock word? You need to ensure that the lock word is padded up to and
>> aligned on a L2 cache line.
>
> I would hope so - the data around the counter remains constant throughout
> the remaining program and is no longer touched or required. A cache line
> is around 16 bytes, is that right?

More like 64 bytes, or 128 bytes; it's all platform dependent. You can
enforce this by doing something like:
_____________________________________________________________________
#include <stdio.h>
#include <assert.h>


#if ! defined (UINTPTR_TYPE)
# define UINTPTR_TYPE size_t
