soft and program: comp.programming.threads - 4 new messages in 1 topic

comp.programming.threads
http://groups.google.com/group/comp.programming.threads?hl=en

comp.programming.threads@googlegroups.com

Today's topics:

* removing store/load in Peterson's algorithm? - 4 messages, 2 authors
http://groups.google.com/group/comp.programming.threads/t/6a7e7620b0cd85eb?hl=en

==============================================================================
TOPIC: removing store/load in Peterson's algorithm?
http://groups.google.com/group/comp.programming.threads/t/6a7e7620b0cd85eb?hl=en
==============================================================================

== 1 of 4 ==
Date: Tues, Dec 29 2009 7:57 pm
From: "Chris M. Thomasson"

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
news:47a995bf-fb05-4f57-a7ed-fa99bc3391a2@p8g2000yqb.googlegroups.com...
On Dec 23, 10:32 pm, "James" <n...@spam.invalid> wrote:

> > > > For the MFENCE version I am performing a 32-bit store to location
> > > > a, an MFENCE then a 32-bit load from location b.
> >
> > > > For the collocation version I am performing a 16-bit store to
> > > > location a then a 32-bit load from location a.
> >
> > > > That should be fair?
> > > Yes.
> > > > > I think that if collocation is observably faster than MFENCE
> > > > > then that means that it just does not work.
> >
> > > > The MFENCE version of the test is much slower than the collocation
> > > > version.
> > > > So, does the collocation trick not work on an x86?
> > > The first question I would answer is: does Peterson algorithm with
> > > collocation actually provide mutual exclusion on x86 (is it guaranteed
> > > or just seems to work on some processors)? If the answer is No then
> > > performance of collocation is irrelevant for me.
> >
> > I simply don't know if it works on x86. I don't know who to ask in
> > order to verify this. I could create a test which does not prove
> > anything even if it passes a hundred trillion times. Can you think of a
> > way to verify that collocation works on x86?

> Well, what about consulting those fine x86 manuals?

The performance of the collocation trick seems to be a little bit better than
the `LOCK' prefix:

http://cpt.pastebin.com/f5a2d7337

I get 71.015 seconds for collocation, and 79.578 seconds for the `LOCK' prefix
on an old P4 (look at the `PLOCK_MEMBAR' macro). Humm, I wonder if this
violates the advice against using different-sized atomic operations to mutate
a semaphore? Also, the overhead for collocation seems to kick in on
contention, while the overhead of the LOCK prefix kicks in on a per-call basis.

:^o
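
For context, the fenced baseline that collocation is being measured against
can be sketched as Peterson's two-thread lock written with C11 atomics. This
is a minimal sketch and not the code behind the pastebin link: the names
`peterson_lock', `peterson_init', `peterson_acquire' and `peterson_release'
are assumptions made for illustration. Every access uses the default
sequentially consistent ordering, which x86 compilers usually implement on
the store side with an XCHG or a MOV followed by MFENCE.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical two-thread Peterson lock; thread ids are 0 and 1. */
typedef struct {
    atomic_int flag[2]; /* flag[i] != 0: thread i wants the lock */
    atomic_int turn;    /* which thread must wait on a tie */
} peterson_lock;

static void peterson_init(peterson_lock *l)
{
    atomic_init(&l->flag[0], 0);
    atomic_init(&l->flag[1], 0);
    atomic_init(&l->turn, 0);
}

static void peterson_acquire(peterson_lock *l, int self)
{
    int other;
    assert(self == 0 || self == 1);
    other = 1 - self;

    /* Announce interest, then give priority to the other thread. */
    atomic_store(&l->flag[self], 1);
    atomic_store(&l->turn, other);

    /* The seq_cst stores above must be visible before these loads run.
       This is the store/load ordering the thread is trying to get
       without an explicit fence. */
    while (atomic_load(&l->flag[other]) != 0 &&
           atomic_load(&l->turn) == other) {
        /* spin until the other thread leaves or yields the turn */
    }
}

static void peterson_release(peterson_lock *l, int self)
{
    assert(self == 0 || self == 1);
    atomic_store(&l->flag[self], 0);
}
```

Each thread would call peterson_acquire() with its own id before the
critical section and peterson_release() with the same id afterwards.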

== 2 of 4 ==
Date: Wed, Dec 30 2009 4:23 am
From: Dmitriy Vyukov

On Dec 30, 6:57 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> The performance of collocation trick seems to be little bit better than
> `LOCK' prefix:
>
> http://cpt.pastebin.com/f5a2d7337
>
> I get 71.015 seconds for collocation, and 79.578 seconds for `LOCK' prefix
> on old P4 (look at `PLOCK_MEMBAR' macro). Humm, I wonder if this violates
> using different sized atomic operations to mutate a semaphore? Also, the
> overhead for collocation seems to kick in on contention, while overhead of
> LOCK prefix kicks in on a per-call basis.
>
> :^o

And what are the numbers w/o contention? Is collocation
significantly faster than LOCK w/o contention?

--
Dmitriy V'jukov

== 3 of 4 ==
Date: Wed, Dec 30 2009 2:24 pm
From: "Chris M. Thomasson"

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
news:348fd5cb-a7ed-499f-a7ae-8547ae4b109f@n38g2000yqf.googlegroups.com...
On Dec 30, 6:57 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > The performance of collocation trick seems to be little bit better than
> > `LOCK' prefix:
> >
> > http://cpt.pastebin.com/f5a2d7337
> >
> > I get 71.015 seconds for collocation, and 79.578 seconds for `LOCK'
> > prefix on old P4 (look at `PLOCK_MEMBAR' macro). Humm, I wonder if this
> > violates using different sized atomic operations to mutate a semaphore?
> > Also, the overhead for collocation seems to kick in on contention, while
> > overhead of LOCK prefix kicks in on a per-call basis.
> >
> > :^o

> And what are the numbers w/o contention? Does collocation
> significantly faster than LOCK w/o contention?

I removed contention by only creating a single thread. Simply comment out one
`pthread_create()'/`pthread_join()' pair in the `main()' function. E.g.,

pthread_create(&tid[0], NULL, thread_1, NULL);
/* pthread_create(&tid[1], NULL, thread_2, NULL); */

/* pthread_join(tid[1], NULL); */
pthread_join(tid[0], NULL);

I am getting 35.125 seconds for collocation, and 40.954 seconds for `LOCK'
prefix. I should also test this against an actual `MFENCE' instruction.
Also, please note that this LOCK prefix is not operating on shared state:
__________________________________________________________
__declspec(naked)
void
atomic_membar(void)
{
    _asm
    {
        MOV EAX, 0           ; add zero so the target word is unchanged
        LOCK XADD [ESP], EAX ; locked RMW on this thread's own stack slot
        RET
    }
}
__________________________________________________________

While the collocation trick is actually operating on shared state.
Therefore, it might be interesting to check the performance against a `LOCK'
instruction on a piece of shared state:
__________________________________________________________
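__declspec(naked)
void
atomic_membar(void)
{
    static uword32 g_dummy = 0; /* one word shared by every caller */

    _asm
    {
        MOV EAX, 0               ; add zero so g_dummy is unchanged
        LOCK XADD [g_dummy], EAX ; locked RMW on the shared word
        RET
    }
}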
__________________________________________________________

Humm.
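
A rough portable counterpart of these two barriers can be written with C11
atomics. This is only a sketch under stated assumptions: `membar_fence',
`membar_shared' and `g_dummy_shared' are hypothetical names, and which
instruction the compiler emits (MFENCE, a LOCK-prefixed instruction, or
something else) is up to the compiler and is not guaranteed by the C
standard. There is no portable way to force a locked operation on the
caller's own stack slot, so the private variant is represented here by a
plain sequentially consistent fence.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared word, standing in for g_dummy in the post above. */
static _Atomic uint32_t g_dummy_shared = 0;

/* Standalone full barrier that touches no shared location. */
static inline void membar_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Barrier implemented as an atomic add of zero on a shared word.
   The word's value never changes; the value observed is returned
   only so that callers and tests can inspect it. */
static inline uint32_t membar_shared(void)
{
    return atomic_fetch_add_explicit(&g_dummy_shared, 0u,
                                     memory_order_seq_cst);
}
```

Timing membar_fence() against membar_shared() in the same loop would be one
way to repeat the private versus shared comparison on other compilers.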

== 4 of 4 ==
Date: Wed, Dec 30 2009 2:46 pm
From: "Chris M. Thomasson"

"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:LoQ_m.5821$YG1.548@newsfe14.iad...
> "Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
> news:348fd5cb-a7ed-499f-a7ae-8547ae4b109f@n38g2000yqf.googlegroups.com...
> On Dec 30, 6:57 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
>> > The performance of collocation trick seems to be little bit better than
>> > `LOCK' prefix:
>> >
>> > http://cpt.pastebin.com/f5a2d7337
>> >
>> > I get 71.015 seconds for collocation, and 79.578 seconds for `LOCK'
>> > prefix on old P4 (look at `PLOCK_MEMBAR' macro). Humm, I wonder if this
>> > violates using different sized atomic operations to mutate a semaphore?
>> > Also, the overhead for collocation seems to kick in on contention, while
>> > overhead of LOCK prefix kicks in on a per-call basis.
>> >
>> > :^o
[...]
> While the collocation trick is actually operating on shared state.
> Therefore, it might be interesting to check the performance against a
> `LOCK' instruction on a piece of shared state:
> __________________________________________________________
> __declspec(naked)
> void
> atomic_membar(void)
> {
> static uword32 g_dummy = 0;
>
> _asm
> {
> MOV EAX, 0
> LOCK XADD [g_dummy], EAX
> RET
> }
> }
> __________________________________________________________

It has similar results: 40.012 seconds. `MFENCE' is basically equal to
`LOCK' at 38.125 seconds. I need to run the test presented by James to see
if I can recreate numbers similar to the ones he reported.

WRT the test code, on average, I would say that collocation is getting
around a 7 to 9 second improvement over `LOCK' and `MFENCE' in a
high-contention environment. Collocation gets around a 4 to 6 second
improvement in the zero-contention scenario on a P4. Not sure if it's worth
it. However, those seconds do add up...

;^)

I believe that one could use collocation to get around the explicit
`#StoreLoad' in the work-stealing deque algorithm. Instead of storing to the
tail and then loading the head, you can do a 32-bit store to the tail and a
single 64-bit load to get both head and tail. That should eliminate the
`#StoreLoad' in between.
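
The access pattern described here can be sketched in C as follows. The layout
and the names `deque_idx', `idx_snapshot' and `publish_tail_and_snapshot' are
assumptions made for illustration; they are not taken from any particular
deque implementation. The sketch only shows the 32-bit store followed by one
overlapping 64-bit load. Whether that pair really gives store/load ordering
on x86 is the question this thread leaves open, so treat it as an experiment
and not as a proven replacement for a fence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: head and tail collocated in one aligned 64-bit word. */
typedef union {
    struct {
        volatile uint32_t head;
        volatile uint32_t tail;
    } part;
    _Alignas(8) volatile uint64_t both;
} deque_idx;

_Static_assert(sizeof(deque_idx) == 8, "head and tail must fill 8 bytes");

/* Plain copy of both indices as seen by one 64-bit load. */
typedef struct {
    uint32_t head;
    uint32_t tail;
} idx_snapshot;

/* Owner side: store the new tail with a 32-bit write, then read head
   and tail back with a single 64-bit load that overlaps that write. */
static idx_snapshot publish_tail_and_snapshot(deque_idx *d, uint32_t new_tail)
{
    idx_snapshot s;
    uint64_t word;

    assert(((uintptr_t)d & 7u) == 0); /* the 64-bit load must be aligned */

    d->part.tail = new_tail;          /* 32-bit store to tail */
    word = d->both;                   /* one 64-bit load of head and tail */

    /* Split by copying bytes, so no endianness is assumed. */
    memcpy(&s, &word, sizeof s);
    return s;
}
```

The owner would compare s.head with s.tail from the returned snapshot where
the fenced version would have loaded the head separately after the barrier.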

==============================================================================



Thursday, December 31, 2009
