http://groups.google.com/group/comp.programming.threads?hl=en
comp.programming.threads@googlegroups.com
Today's topics:
* __asm__ cmpxchg8b/cmpxchg16b - 6 messages, 3 authors
http://groups.google.com/group/comp.programming.threads/t/791f2415da140c34?hl=en
* Would this work on even one platform? - 2 messages, 2 authors
http://groups.google.com/group/comp.programming.threads/t/de8a88ed7fa913d4?hl=en
* Lock free queue idea. - 7 messages, 2 authors
http://groups.google.com/group/comp.programming.threads/t/82066179448783da?hl=en
* Intrusive / non-intrusive - 3 messages, 2 authors
http://groups.google.com/group/comp.programming.threads/t/6a7e7e1c0f59d89e?hl=en
==============================================================================
TOPIC: __asm__ cmpxchg8b/cmpxchg16b
http://groups.google.com/group/comp.programming.threads/t/791f2415da140c34?hl=en
==============================================================================
== 1 of 6 ==
Date: Fri, Nov 27 2009 11:45 pm
From: "C Warwick"
"Chris M. Thomasson" <no@spam.invalid> wrote in message news:h7pkr2$1ed3
> void
> mpmcq_push(
>     struct mpmcq* const self,
>     struct node* node
> ) {
>     struct node* prev;
>     node->next = NULL;
>     prev = ATOMIC_SWAP_RELAXED(&self->head, node); // #1
>     ATOMIC_STORE_RELEASE(&prev->next, node); // #2
> }
Assuming I understand the ATOMIC____ functions right, wouldn't that basically
detach the two ends of the queue between lines #1 and #2? After #1 the head
points at the new node, but the last node in the queue doesn't join up yet.
So if a thread stalled between those two lines, consumers wouldn't be able to
get past that break; new items could be added, but they'd be inaccessible to
consumers until the stalled thread completed the join?
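For example, with the queue holding A -> B (head == B) and a producer
pushing C, I'd picture the interleaving like this (a sketch, assuming I
read the ATOMIC____ semantics right):
/* before:   A -> B,  head == B                                */
prev = ATOMIC_SWAP_RELAXED(&self->head, C);    /* #1 */
/* window:   A -> B,  C detached,  head == C                   */
/* consumers walking from A stop at B; C (and anything pushed  */
/* after it) is unreachable until #2 executes                  */
ATOMIC_STORE_RELEASE(&prev->next, C);          /* #2 */
/* after:    A -> B -> C,  head == C,  rejoined                */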
== 2 of 6 ==
Date: Sat, Nov 28 2009 12:36 am
From: Dmitriy Vyukov
On 28 Nov, 10:45, "C Warwick" <m...@not.here> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in message news:h7pkr2$1ed3
>
> > void
> > mpmcq_push(
> >     struct mpmcq* const self,
> >     struct node* node
> > ) {
> >     struct node* prev;
> >     node->next = NULL;
> >     prev = ATOMIC_SWAP_RELAXED(&self->head, node); // #1
> >     ATOMIC_STORE_RELEASE(&prev->next, node); // #2
> > }
>
> Assuming I understand the ATOMIC____ functions right, wouldn't that basically
> detach the two ends of the queue between lines #1 and #2? After #1 the head
> points at the new node, but the last node in the queue doesn't join up yet.
> So if a thread stalled between those two lines, consumers wouldn't be able to
> get past that break; new items could be added, but they'd be inaccessible to
> consumers until the stalled thread completed the join?
Yes, you are right. Strictly speaking, the queue is blocking (even though
it uses only wait-free primitives).
I described the caveats here:
http://groups.google.com/group/lock-free/browse_frm/thread/c8e3201da4a6a300
http://groups.google.com/group/lock-free/browse_frm/thread/55df71b87acb8201
--
Dmitriy V'jukov
== 3 of 6 ==
Date: Sat, Nov 28 2009 2:02 pm
From: "Chris M. Thomasson"
"C Warwick" <me@not.here> wrote in message
news:7nc2plF3l4imlU1@mid.individual.net...
>
> "Chris M. Thomasson" <no@spam.invalid> wrote in message news:h7pkr2$1ed3
>> void
>> mpmcq_push(
>>     struct mpmcq* const self,
>>     struct node* node
>> ) {
>>     struct node* prev;
>>     node->next = NULL;
>>     prev = ATOMIC_SWAP_RELAXED(&self->head, node); // #1
>>     ATOMIC_STORE_RELEASE(&prev->next, node); // #2
>> }
>
> Assuming I understand the ATOMIC____ functions right, wouldn't that
> basically detach the two ends of the queue between lines #1 and #2? After
> #1 the head points at the new node, but the last node in the queue doesn't
> join up yet. So if a thread stalled between those two lines, consumers
> wouldn't be able to get past that break; new items could be added, but
> they'd be inaccessible to consumers until the stalled thread completed the
> join?
I personally don't worry about that scenario because the chances a thread
will stall between those two instructions are few and far between. If it
does happen, it won't break anything*. Now, if a thread actually dies within
those instructions then the shi% has definitely hit the fan. However, threads
should not be dying at "random" places anyway. If they do, trust me, you
have got MUCH bigger problems to deal with.
Now, this will be a problem if you are using this queue with processes
instead of threads. I say this because, IMHO, processes can die, which is why
there are such things as robust mutexes.
(*) - IMPORTANT!!!
Let's assume that you have three consumer threads [C1...C3] and two producer
threads [P1...P2]. As soon as a consumer thread consumes an item it quits.
Now, imagine that `P1' stalls between the two critical instructions. In the
mean time `P2' produces two items, broadcasts because it produced more than
one item, and quits. Now, all the consumers wake up, try to consume items,
fail and go back to waiting. Then `P1' wakes back up, fixes the queue which
makes all three items visible, and signals because it only produced a single
item. Now, a single consumer, say `C1' wakes and consumes a single item and
quits. Now what? Well, you got yourself a deadlock.
So, if you are going to use this queue, I would always issue a broadcast
just to be on the safe side.
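FWIW, in POSIX terms the "always broadcast" rule would look something like
this (a sketch only; the mutex/condvar and the `mpmcq_pop()' that returns
NULL on an empty/broken queue are hypothetical names, not the posted code):
________________________________________________________________
#include <pthread.h>
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_cond = PTHREAD_COND_INITIALIZER;
void
producer_push(struct mpmcq* q, struct node* n)
{
    mpmcq_push(q, n);
    pthread_mutex_lock(&g_lock);
    /* broadcast even for a single item: a mere signal can be eaten by
       a consumer that then fails on the still-detached queue... */
    pthread_cond_broadcast(&g_cond);
    pthread_mutex_unlock(&g_lock);
}
struct node*
consumer_pop(struct mpmcq* q)
{
    struct node* n;
    pthread_mutex_lock(&g_lock);
    while (! (n = mpmcq_pop(q)))
        pthread_cond_wait(&g_cond, &g_lock);
    pthread_mutex_unlock(&g_lock);
    return n;
}
________________________________________________________________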
== 4 of 6 ==
Date: Sat, Nov 28 2009 2:17 pm
From: "Chris M. Thomasson"
"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
news:15388d14-d0f3-4d9e-89fd-8f20dbbb097c@g26g2000yqe.googlegroups.com...
On 28 Nov, 10:45, "C Warwick" <m...@not.here> wrote:
> > "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> > news:h7pkr2$1ed3
> >
> > > void
> > > mpmcq_push(
> > >     struct mpmcq* const self,
> > >     struct node* node
> > > ) {
> > >     struct node* prev;
> > >     node->next = NULL;
> > >     prev = ATOMIC_SWAP_RELAXED(&self->head, node); // #1
> > >     ATOMIC_STORE_RELEASE(&prev->next, node); // #2
> > > }
> >
> > Assuming I understand the ATOMIC____ functions right, wouldn't that
> > basically detach the two ends of the queue between lines #1 and #2?
> > After #1 the head points at the new node, but the last node in the
> > queue doesn't join up yet. So if a thread stalled between those two
> > lines, consumers wouldn't be able to get past that break; new items
> > could be added, but they'd be inaccessible to consumers until the
> > stalled thread completed the join?
> Yes, you are right. Strictly speaking, the queue is blocking (even though
> it uses only wait-free primitives).
> I described the caveats here:
[...]
Perhaps you should mention the following possible behavior:
http://groups.google.com/group/comp.programming.threads/msg/2556874a844a706e
Do you think it's a legitimate concern WRT the multi-consumer/producer case?
I am going to try and model it in Relacy.
BTW, I don't think it would be a problem in your SEH-based queue:
http://groups.google.com/group/lock-free/browse_frm/thread/c8e3201da4a6a300
AFAICT, it contains the same producer logic, however you seem to detect the
inconsistency and do a spin-wait to correct it. Am I right?
== 5 of 6 ==
Date: Sat, Nov 28 2009 2:29 pm
From: "Chris M. Thomasson"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:y5hQm.53676$de6.6307@newsfe21.iad...
> "C Warwick" <me@not.here> wrote in message
> news:7nc2plF3l4imlU1@mid.individual.net...
>>
>> "Chris M. Thomasson" <no@spam.invalid> wrote in message news:h7pkr2$1ed3
>>> void
>>> mpmcq_push(
>>>     struct mpmcq* const self,
>>>     struct node* node
>>> ) {
>>>     struct node* prev;
>>>     node->next = NULL;
>>>     prev = ATOMIC_SWAP_RELAXED(&self->head, node); // #1
>>>     ATOMIC_STORE_RELEASE(&prev->next, node); // #2
>>> }
>>
>> Assuming I understand the ATOMIC____ functions right, wouldn't that
>> basically detach the two ends of the queue between lines #1 and #2? After
>> #1 the head points at the new node, but the last node in the queue doesn't
>> join up yet. So if a thread stalled between those two lines, consumers
>> wouldn't be able to get past that break; new items could be added, but
>> they'd be inaccessible to consumers until the stalled thread completed
>> the join?
[...]
> (*) - IMPORTANT!!!
>
> Let's assume that you have three consumer threads [C1...C3] and two
> producer threads [P1...P2]. As soon as a consumer thread consumes an item
> it quits. Now, imagine that `P1' stalls between the two critical
> instructions. In the mean time `P2' produces two items, broadcasts because
> it produced more than one item, and quits. Now, all the consumers wake up,
> try to consume items, fail and go back to waiting. Then `P1' wakes back
> up, fixes the queue which makes all three items visible, and signals
> because it only produced a single item. Now, a single consumer, say `C1'
> wakes and consumes a single item and quits. Now what? Well, you got
> yourself a deadlock.
>
>
> So, if you are going to use this queue, I would always issue a broadcast
> just to be on the safe side.
I believe one can get around this by doing something like...
membars aside for a moment:
________________________________________________________________
struct node*
mpmcq_pop(
    struct mpmcq* const self
) {
    void* state;
    struct dwcas_anchor cmp, xchg;
retry:
    cmp = ATOMIC_DWLOAD(&self->tail);
    do {
        struct node* next = ATOMIC_LOAD(&cmp.node->next);
        if (! next)
        {
            struct node* head = ATOMIC_LOAD(&self->head);
            if (cmp.node == head)
            {
                return NULL;
            }
            spin_wait_yield_whatever();
            goto retry;
        }
        state = next->state;
        xchg.node = next;
        xchg.aba = cmp.aba + 1;
    } while (! ATOMIC_DWCAS(&self->tail, &cmp, &xchg));
    cmp.node->state = state;
    return cmp.node;
}
________________________________________________________________
That should detect the point at which a producer thread is in the middle of
rejoining the queue, and wait for it to complete.
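BTW, the `dwcas_anchor' type isn't spelled out above; I have in mind the
usual pointer + version-counter pair, sized for cmpxchg8b/cmpxchg16b,
presumably something like:
________________________________________________________________
struct dwcas_anchor
{
    struct node* node; /* current tail                       */
    uintptr_t aba;     /* version count, bumped on every CAS */
};
________________________________________________________________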
== 6 of 6 ==
Date: Sat, Nov 28 2009 3:15 pm
From: "Chris M. Thomasson"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:ejhQm.53677$de6.10085@newsfe21.iad...
> "Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
[...]
> Perhaps you should mention the following possible behavior:
>
> http://groups.google.com/group/comp.programming.threads/msg/2556874a844a706e
>
> Do you think it's a legitimate concern WRT the multi-consumer/producer
> case? I am going to try and model it in Relacy.
Here is the problem and a workaround modeled in Relacy 2.0:
http://relacy.pastebin.com/fd4b6e1a
The program works as-is. However, if you un-define the
`WORKAROUND_THE_PROBLEM' macro everything bites the dust rather quickly!
Well, at least Relacy is telling me that a valid workaround is possible...
Relacy ROCKS!
:^)
> BTW, I don't think it would be a problem in your SEH-based queue:
>
> http://groups.google.com/group/lock-free/browse_frm/thread/c8e3201da4a6a300
>
> AFAICT, it contains the same producer logic, however you seem to detect
> the inconsistency and do a spin-wait to correct it. Am I right?
==============================================================================
TOPIC: Would this work on even one platform?
http://groups.google.com/group/comp.programming.threads/t/de8a88ed7fa913d4?hl=en
==============================================================================
== 1 of 2 ==
Date: Sat, Nov 28 2009 1:03 am
From: Dmitriy Vyukov
On 25 Nov, 22:25, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > Stand-alone fences are more powerful in some sense,
> > because they do not bind to a particular variable. And at the same
> > time stand-alone fences (except seq_cst) are weaker, because they do
> > not establish a total order over operations (as a store/RMW on a
> > particular atomic does).
>
> I personally like stand-alone fences because IMHO they are more flexible. I
> can put them exactly where I need them; e.g.:
> ________________________________________________________________
> #define N 10
>
> static int g_values[N] = { 0 };
> static std::atomic<bool> g_flags[N] = { false };
>
> void thread_1()
> {
> for (size_t i = 0; i < N; ++i)
> {
> g_values[i] = 666;
> }
>
> std::atomic_thread_fence(std::memory_order_release);
>
> for (size_t i = 0; i < N; ++i)
> {
> g_flags[i].store(true, std::memory_order_relaxed);
> }
>
> }
>
> void other_threads()
> {
> retry:
> for (size_t i = 0; i < N; ++i)
> {
> if (! g_flags[i].load(std::memory_order_relaxed))
> {
> spin();
> goto retry;
> }
> }
>
> std::atomic_thread_fence(std::memory_order_acquire);
>
> for (size_t i = 0; i < N; ++i)
> {
> assert(g_values[i] == 666);
> }
> }
>
> ________________________________________________________________
AFAIR, Paul McKenney presented something along the lines of your example
as a motivating example for stand-alone fences.
> As far as a total order over operations goes, I am not sure I understand
> what you mean. I expect the following to be out of order:
> ________________________________________________________________
> std::atomic<int> g_value1;
> std::atomic<int> g_value2;
>
> void foo()
> {
> g_value1.store(1, std::memory_order_relaxed);
> std::atomic_thread_fence(std::memory_order_release);
> g_value2.load(std::memory_order_relaxed);
> }
>
> ________________________________________________________________
Regarding total order, I mean that your above code won't work (even if
the release fence is replaced with acq_rel).
However, if we replace the stand-alone acq_rel fence with an atomic RMW
with acq_rel ordering:
g_value1.store(1, std::memory_order_relaxed);
g_sync.exchange(0, std::memory_order_acq_rel);
g_value2.load(std::memory_order_relaxed);
it must work, because an RMW on a particular variable establishes a total
order (over all RMWs on that variable), while a stand-alone fence does not.
--
Dmitriy V'jukov
== 2 of 2 ==
Date: Sat, Nov 28 2009 11:59 am
From: "Chris M. Thomasson"
"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
news:f9e5c87c-5280-4a32-886e-b2ac9d30dd3d@a32g2000yqm.googlegroups.com...
On 25 Nov, 22:25, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> > > Stand-alone fences are more powerful in some sense,
> > > because they do not bind to a particular variable. And at the same
> > > time stand-alone fences (except seq_cst) are weaker, because they do
> > > not establish a total order over operations (as a store/RMW on a
> > > particular atomic does).
[...]
> > As far as a total order over operations goes, I am not sure I understand
> > what you mean. I expect the following to be out of order:
> > ________________________________________________________________
> > std::atomic<int> g_value1;
> > std::atomic<int> g_value2;
> >
> > void foo()
> > {
> > g_value1.store(1, std::memory_order_relaxed);
> > std::atomic_thread_fence(std::memory_order_release);
> > g_value2.load(std::memory_order_relaxed);
> > }
> >
> > ________________________________________________________________
> Regarding total order, I mean that your above code won't work (even if
> the release fence is replaced with acq_rel).
Right. When I wrote:
"I expect the following to be out of order"
I meant that I expect the load from `g_value2' to be able to rise above the
store to `g_value1'.
> However, if we replace the stand-alone acq_rel fence with an atomic RMW
> with acq_rel ordering:
> g_value1.store(1, std::memory_order_relaxed);
> g_sync.exchange(0, std::memory_order_acq_rel);
> g_value2.load(std::memory_order_relaxed);
> it must work, because an RMW on a particular variable establishes a total
> order (over all RMWs on that variable), while a stand-alone fence does not.
I would also expect this to work:
___________________________________________________
g_value1.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
g_value2.load(std::memory_order_relaxed);
___________________________________________________
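The classic store-buffering litmus test shows why seq_cst is needed there;
a sketch (the variable names are mine, not from the standard):
___________________________________________________
std::atomic<int> g_x(0), g_y(0);
static int r1, r2;
void thread_a()
{
    g_x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r1 = g_y.load(std::memory_order_relaxed);
}
void thread_b()
{
    g_y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = g_x.load(std::memory_order_relaxed);
}
// With seq_cst fences, r1 == 0 && r2 == 0 is forbidden; weaken either
// fence to acq_rel and both loads may legally observe 0.
___________________________________________________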
==============================================================================
TOPIC: Lock free queue idea.
http://groups.google.com/group/comp.programming.threads/t/82066179448783da?hl=en
==============================================================================
== 1 of 7 ==
Date: Sat, Nov 28 2009 2:33 pm
From: "DaJones"
Ok, this is not going to be the fastest; you could also argue it's not even a
simplification, it just moves complexity around. But I'm interested to know
if this has been proposed before, if there are any problems with it, or
further improvements that could be made.
Basically I started with the Michael & Scott FIFO lock-free queue. Y'all know
that, right?
I'll just pitch right at it. Instead of using 0 / NULL to signify that a
node is the terminating node in the list, use the tail counter. Specifically,
have the tail count start at 1, increase by 2s, and use that as the
terminator. Pointers should always be 4-byte aligned, so if the 'next' ptr
is odd it's the last node; if it's not, it's valid and the tail has fallen
behind.
So now the ABA problem is defeated because when you CAS the terminating
link, it must be equal to the tail count, or else the tail has fallen
behind. And should a node get recycled, the tail count will have increased
by then, so a CAS with the old tail count will fail.
struct QPtr
{
    Node* ptr;
    int tag; // initialized to 1
};
void enqueue(Node* node)
{
    QPtr tail;
    while(true)
    {
        tail = atomic_read_8b(&q_tail); // atomically read tail.
        node->next = (Node*)(tail.tag+2); // inc by 2 keeps the tail count odd
        if (CAS(&tail.ptr->next, (Node*)tail.tag, node)) break;
        CAS(&q_tail, tail, QPtr(tail.ptr->next, tail.tag+2)); // swing tail
    }
    CAS(&q_tail, tail, QPtr(node, tail.tag+2)); // swing tail
}
So now each node only needs a single pointer instead of a tag & a pointer as
in the original Michael and Scott algorithm.
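On the consumer side, telling a live link from the terminator is then just
a low-bit test; a sketch (assuming 32-bit pointers and the 4-byte alignment
mentioned above):
// odd value => encoded tail count (terminator); even => real next pointer
inline bool is_terminator(Node* next)
{
    return (int(next) & 1) != 0;
}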
== 2 of 7 ==
Date: Sat, Nov 28 2009 3:33 pm
From: "Chris M. Thomasson"
"DaJones" <no@here.com> wrote in message
news:7ndmqbF3l3jv5U1@mid.individual.net...
> Ok, this is not going to be the fastest; you could also argue it's not even
> a simplification, it just moves complexity around. But I'm interested to
> know if this has been proposed before, if there are any problems with it,
> or further improvements that could be made.
>
> Basically I started with the Michael & Scott FIFO lock-free queue. Y'all
> know that, right?
>
> I'll just pitch right at it. Instead of using 0 / NULL to signify that a
> node is the terminating node in the list, use the tail counter.
> Specifically, have the tail count start at 1, increase by 2s, and use that
> as the terminator. Pointers should always be 4-byte aligned, so if the
> 'next' ptr is odd it's the last node; if it's not, it's valid and the tail
> has fallen behind.
>
> So now the ABA problem is defeated because when you CAS the terminating
> link, it must be equal to the tail count, or else the tail has fallen
> behind. And should a node get recycled, the tail count will have increased
> by then, so a CAS with the old tail count will fail.
>
> struct QPtr
> {
>     Node* ptr;
>     int tag; // initialized to 1
> };
>
> void enqueue(Node* node)
> {
>     QPtr tail;
>     while(true)
>     {
>         tail = atomic_read_8b(&q_tail); // atomically read tail.
>         node->next = (Node*)(tail.tag+2); // inc by 2 keeps the tail count odd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I must be missing something important here, but how does this work? How are
you allocating and deallocating the nodes? Basically, my specific question
is how are you getting the actual node that `node->next' points to over on
the consumer side? Could you show some code for the consumer?
Thanks!
:^)
>         if (CAS(&tail.ptr->next, (Node*)tail.tag, node)) break;
>
>         CAS(&q_tail, tail, QPtr(tail.ptr->next, tail.tag+2)); // swing tail
>     }
>     CAS(&q_tail, tail, QPtr(node, tail.tag+2)); // swing tail
> }
>
> So now each node only needs a single pointer instead of a tag & a pointer
> as in the original Michael and Scott algorithm.
== 3 of 7 ==
Date: Sat, Nov 28 2009 6:47 pm
From: "DaJones"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:nqiQm.6324$y%5.3350@newsfe03.iad...
> I must be missing something important here, but how does this work? How
> are you allocating and deallocating the nodes? Basically, my specific
> question is how are you getting the actual node that `node->next' points
> to over on the consumer side? Could you show some code for the consumer?
Ok, knocked up a more complete example...
Nodes are allocated and freed via a freelist. Ain't included that.
==========================================================
struct Node
{
    Node* next;
    int value;
};
struct QPtr
{
    Node* ptr;
    int tag;
    QPtr(Node* p, int t) : ptr(p), tag(t) {}
};
class Queue
{
private:
    QPtr head;
    QPtr tail;
public:
    Queue()
    {
        Node* node = new_node(); // get node from free list
        node->next = (Node*) 1; // Tail count starts at 1
        head.ptr = tail.ptr = node;
        head.tag = tail.tag = 1;
    }
    // =================
    void enqueue(int v)
    {
        Node* node = new_node(); // get node from free list
        node->value = v;
        QPtr ttt(NULL, 0);
        while (true)
        {
            ttt = AtomicRead8B(&tail);
            node->next = (Node*)(ttt.tag+2);
            if (CAS(&ttt.ptr->next, (Node*)ttt.tag, node)) break;
            CAS(&tail, ttt, QPtr(ttt.ptr->next, ttt.tag+2)); // Swing tail
        }
        CAS(&tail, ttt, QPtr(ttt.ptr->next, ttt.tag+2)); // Swing tail
    }
    // =================
    int dequeue()
    {
        while (true)
        {
            QPtr hhh = AtomicRead8B(&head);
            QPtr ttt = AtomicRead8B(&tail);
            if (hhh.ptr == ttt.ptr)
            {
                // Empty queue if 'next' is an odd number, i.e. not a valid ptr.
                if ((int(hhh.ptr->next) & 1) == 1) return 0;
                CAS(&tail, ttt, QPtr(ttt.ptr->next, ttt.tag+2)); // Try to swing tail
            }
            else
            {
                int v = hhh.ptr->next->value;
                if (CAS(&head, hhh, QPtr(hhh.ptr->next, hhh.tag+1)))
                {
                    free_node(hhh.ptr); // return node to free list
                    return v;
                }
            }
        }
    }
};
== 4 of 7 ==
Date: Sat, Nov 28 2009 8:01 pm
From: "Chris M. Thomasson"
"DaJones" <no@here.com> wrote in message
news:7ne5loF3l8setU1@mid.individual.net...
>
> "Chris M. Thomasson" <no@spam.invalid> wrote in message
> news:nqiQm.6324$y%5.3350@newsfe03.iad...
>> I must be missing something important here, but how does this work? How
>> are you allocating and deallocating the nodes? Basically, my specific
>> question is how are you getting the actual node that `node->next' points
>> to over on the consumer side? Could you show some code for the consumer?
>
> Ok knocked up a more complete example...
[...]
I see what you are doing and I think it should work out.
== 5 of 7 ==
Date: Sat, Nov 28 2009 10:18 pm
From: "Chris M. Thomasson"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:3mmQm.9394$Xb5.3618@newsfe19.iad...
> "DaJones" <no@here.com> wrote in message
> news:7ne5loF3l8setU1@mid.individual.net...
>>
>> "Chris M. Thomasson" <no@spam.invalid> wrote in message
>> news:nqiQm.6324$y%5.3350@newsfe03.iad...
>>> I must be missing something important here, but how does this work? How
>>> are you allocating and deallocating the nodes? Basically, my specific
>>> question is how are you getting the actual node that `node->next' points
>>> to over on the consumer side? Could you show some code for the consumer?
>>
>> Ok knocked up a more complete example...
> [...]
>
> I see what you are doing and I think it should work out.
I think I spoke too fast here. I am not sure about ABA WRT operations on the
next pointers of the nodes. I need to flesh out some code and check the
algorithm out. Do you happen to have an example implementation?
Unfortunately, I cannot see exactly how to implement this in Relacy Race
Detector:
http://groups.google.com/group/relacy
== 6 of 7 ==
Date: Sat, Nov 28 2009 11:03 pm
From: "Chris M. Thomasson"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:mmoQm.31435$Sw5.380@newsfe16.iad...
> "Chris M. Thomasson" <no@spam.invalid> wrote in message
> news:3mmQm.9394$Xb5.3618@newsfe19.iad...
>> "DaJones" <no@here.com> wrote in message
>> news:7ne5loF3l8setU1@mid.individual.net...
>>>
>>> "Chris M. Thomasson" <no@spam.invalid> wrote in message
>>> news:nqiQm.6324$y%5.3350@newsfe03.iad...
>>>> I must be missing something important here, but how does this work? How
>>>> are you allocating and deallocating the nodes? Basically, my specific
>>>> question is how are you getting the actual node that `node->next'
>>>> points to over on the consumer side? Could you show some code for the
>>>> consumer?
>>>
>>> Ok knocked up a more complete example...
>> [...]
>>
>> I see what you are doing and I think it should work out.
>
> I think I spoke too fast here. I am not sure about ABA WRT operations on
> the next pointers of the nodes. I need to flesh out some code and check
> the algorithm out. Do you happen to have an example implementation?
> Unfortunately, I cannot see exactly how to implement this in Relacy Race
> Detector:
>
> http://groups.google.com/group/relacy
Actually now that I think about it, I can implement this in Relacy. When I
get some time, I will go ahead and do it. I will post it here when it's
finished.
Thank you for your patience.
:^)
== 7 of 7 ==
Date: Sat, Nov 28 2009 11:52 pm
From: "DaJones"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:W0pQm.7414$Lq5.2613@newsfe20.iad...
> "Chris M. Thomasson" <no@spam.invalid> wrote in message
> news:mmoQm.31435$Sw5.380@newsfe16.iad...
>> "Chris M. Thomasson" <no@spam.invalid> wrote in message
>> news:3mmQm.9394$Xb5.3618@newsfe19.iad...
>>> "DaJones" <no@here.com> wrote in message
>>> news:7ne5loF3l8setU1@mid.individual.net...
>>>>
>>>> "Chris M. Thomasson" <no@spam.invalid> wrote in message
>>>> news:nqiQm.6324$y%5.3350@newsfe03.iad...
>>>>> I must be missing something important here, but how does this work?
>>>>> How are you allocating and deallocating the nodes? Basically, my
>>>>> specific question is how are you getting the actual node that
>>>>> `node->next' points to over on the consumer side? Could you show some
>>>>> code for the consumer?
>>>>
>>>> Ok knocked up a more complete example...
>>> [...]
>>>
>>> I see what you are doing and I think it should work out.
>>
>> I think I spoke too fast here. I am not sure about ABA WRT operations on
>> the next pointers of the nodes. I need to flesh out some code and check
>> the algorithm out. Do you happen to have an example implementation?
>> Unfortunately, I cannot see exactly how to implement this in Relacy Race
>> Detector:
>>
>> http://groups.google.com/group/relacy
>
> Actually now that I think about it, I can implement this in Relacy. When I
> get some time, I will go ahead and do it. I will post it here when it's
> finished.
>
> Thank you for your patience.
If this will help, I worked on it a bit more and got it working. Amazing how
many typos and bugs there were in the original version. I really should have
coded a proper working example to begin with.
Anyways this correctly completes a series of enqueues and dequeues from a
single thread.
#include <cassert> // for the size/alignment asserts below
// ================================
template <typename T>
inline T AtomicRead4B(T* source)
{
    assert(sizeof(T) == 4);
    assert((int(source) & 3) == 0);
    __asm
    {
        MOV EAX,[source]
        MOV EAX,[EAX] ; return value is left in EAX
    }
}
// ==============================
template <typename T>
inline T AtomicRead8B(T* source)
{
    assert(sizeof(T) == 8);
    assert((int(source) & 7) == 0);
    T result;
    __asm
    {
        MOV EAX,[source]
        MOVQ MM0,[EAX]
        MOVQ [result],MM0
        EMMS
    }
    return result;
}
// ===================================
template <typename T>
inline int CmpSwap4B(T* target, T expected, T newval)
{
    assert(sizeof(T) == 4);
    __asm
    {
        MOV ECX,[target]
        MOV EAX,[expected]
        MOV EDX,[newval]
        LOCK CMPXCHG [ECX],EDX
        MOV EAX,0
        SETZ AL ; returns 1 on success, 0 on failure (in EAX)
    }
}
// ===============================
template <typename T>
inline int CmpSwap8B(T* target, T expected, T newval)
{
    assert(sizeof(T) == 8);
    __asm
    {
        MOV ESI,[target]
        MOV EAX,DWORD PTR [expected]
        MOV EDX,DWORD PTR [expected+4]
        MOV EBX,DWORD PTR [newval]
        MOV ECX,DWORD PTR [newval+4]
        LOCK CMPXCHG8B [ESI]
        MOV EAX,0
        SETZ AL ; returns 1 on success, 0 on failure (in EAX)
    }
}
// ===================================
const int INITIALNODES = 100; // Initial nodes in freelist
struct Node
{
    Node* next;
    int value;
};
struct QPtr
{
    Node* ptr;
    int tag;
    QPtr() : ptr(NULL), tag(1) {}
    QPtr(Node* p, int t) : ptr(p), tag(t) {}
};
class Queue
{
private:
    QPtr head;
    QPtr tail;
    QPtr flist;
    void moreNodes()
    {
        for (int i = 0; i < INITIALNODES; i ++)
        {
            freeNode(new Node);
        }
    }
    void freeNode(Node* node)
    {
        while (true)
        {
            QPtr top = AtomicRead8B(&flist);
            QPtr tmp(node, top.tag+1);
            node->next = top.ptr;
            if (CmpSwap8B<QPtr>(&flist, top, tmp)) return;
        }
    }
    Node* grabNode()
    {
        while (true)
        {
            QPtr top = AtomicRead8B(&flist);
            if (top.ptr == NULL)
            {
                moreNodes();
            }
            else
            {
                QPtr tmp(top.ptr->next, top.tag+1);
                if (CmpSwap8B<QPtr>(&flist, top, tmp)) return top.ptr;
            }
        }
    }
public:
    Queue() :
        head(),
        tail(),
        flist()
    {
        Node* node = grabNode(); // get node from free list
        node->next = (Node*) 1; // Tail count starts at 1
        head.ptr = tail.ptr = node;
        head.tag = tail.tag = 1;
    }
    // =================
    void enqueue(int v)
    {
        Node* node = grabNode(); // get node from free list
        node->value = v;
        QPtr ttt;
        while (true)
        {
            ttt = AtomicRead8B(&tail);
            node->next = (Node*)(ttt.tag+2);
            if (CmpSwap4B<Node*>(&ttt.ptr->next, (Node*)ttt.tag, node))
                break;
            CmpSwap8B(&tail, ttt, QPtr(ttt.ptr->next, ttt.tag+2)); // Swing tail
        }
        CmpSwap8B(&tail, ttt, QPtr(ttt.ptr->next, ttt.tag+2)); // Swing tail
    }
    // =================
    int dequeue()
    {
        while (true)
        {
            QPtr hhh = AtomicRead8B(&head);
            QPtr ttt = AtomicRead8B(&tail);
            if (hhh.ptr == ttt.ptr)
            {
                if ((int(hhh.ptr->next) & 1) == 1) return 0; // Terminating node
                CmpSwap8B<QPtr>(&tail, ttt, QPtr(ttt.ptr->next, ttt.tag+2));
            }
            else
            {
                int v = hhh.ptr->next->value;
                if (CmpSwap8B<QPtr>(&head, hhh, QPtr(hhh.ptr->next, hhh.tag+1)))
                {
                    freeNode(hhh.ptr);
                    return v;
                }
            }
        }
    }
};
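The single-threaded test driver is nothing fancy; something like this
sketch:
// ===================================
int main()
{
    Queue q;
    for (int i = 1; i <= 10; i ++) q.enqueue(i);
    for (int i = 1; i <= 10; i ++) assert(q.dequeue() == i);
    assert(q.dequeue() == 0); // empty queue returns 0
    return 0;
}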
==============================================================================
TOPIC: Intrusive / non-intrusive
http://groups.google.com/group/comp.programming.threads/t/6a7e7e1c0f59d89e?hl=en
==============================================================================
== 1 of 3 ==
Date: Sat, Nov 28 2009 7:48 pm
From: "DaJones"
What do these mean with respect to lock-free algorithms, or to concurrency?
Thanks.
== 2 of 3 ==
Date: Sat, Nov 28 2009 8:09 pm
From: "Chris M. Thomasson"
"DaJones" <no@here.com> wrote in message
news:7ne993F3lkq3jU1@mid.individual.net...
> What do these mean with respect to lock-free algorithms, or to concurrency?
WRT lock-free algorithms, well, it means you can get around using a dummy
node and use user-provided nodes directly. It also means that user-provided
nodes are subjected to the memory management requirements of the lock-free
algorithm. It means using less memory and fewer cache misses. One more thing:
intrusive lock-free algorithms can be more complex than their non-intrusive
counterparts.
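To make that concrete, here is a minimal sketch of the two layouts (the
type names are mine, not from any particular library):
/* intrusive: the link field lives inside the user's own object, so the
   queue pushes the object itself; no per-item wrapper allocation */
struct my_msg
{
    struct my_msg* next;
    int payload;
};
/* non-intrusive: the queue wraps the user's pointer in its own node,
   which costs an extra allocation and an extra cache line per item */
struct q_node
{
    struct q_node* next;
    void* user_data;
};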
== 3 of 3 ==
Date: Sat, Nov 28 2009 9:32 pm
From: "DaJones"
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:wtmQm.9395$Xb5.4246@newsfe19.iad...
> "DaJones" <no@here.com> wrote in message
> news:7ne993F3lkq3jU1@mid.individual.net...
>> What do these mean with respect to lock-free algorithms, or to
>> concurrency?
>
> WRT lock-free algorithms, well, it means you can get around using a
> dummy node and use user-provided nodes directly. It also means that
> user-provided nodes are subjected to the memory management requirements of
> the lock-free algorithm. It means using less memory and fewer cache misses.
> One more thing: intrusive lock-free algorithms can be more complex than
> their non-intrusive counterparts.
Ok thanks.
==============================================================================