Friday, July 31, 2009

comp.programming.threads - 3 new messages in 2 topics - digest

comp.programming.threads
http://groups.google.com/group/comp.programming.threads?hl=en

comp.programming.threads@googlegroups.com

Today's topics:

* Multi-producer/multi-consumer SEH-based queue - 2 messages, 1 author
http://groups.google.com/group/comp.programming.threads/t/c43d1197c8168b4e?hl=en
* Spinlocks - trouble with speed - 1 message, 1 author
http://groups.google.com/group/comp.programming.threads/t/9f5e34251ccf45e3?hl=en

==============================================================================
TOPIC: Multi-producer/multi-consumer SEH-based queue
http://groups.google.com/group/comp.programming.threads/t/c43d1197c8168b4e?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Jul 30 2009 8:07 pm
From: "Chris M. Thomasson"


"Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
news:413440f3-a557-468d-a623-d7adb9353148@n11g2000yqb.googlegroups.com...
On Jul 31, 05:07, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > > I think that Relacy will show the truth.
[...]
> > However, once I do that, rl::atomic will load the entire anchor
> > atomically with no chance of a preemption between the load of the
> > anchor's head pointer and the subsequent load of the anchor's aba
> > counter. Am I making any sense to you?

> C++0x atomics do not support non-atomic accesses to atomic variables.

This raises another question... Let's say that a particular C++0x
implementation on a 32-bit system has lock-free access to a 64-bit variable.
Will it have to use a 64-bit atomic load instruction on the 32-bit system to
load the 64-bit anchor? For instance, on IA-32, would it use a MOVQ
instruction?
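For what it's worth, here is a sketch of that question in C++0x terms. The
`load_anchor` helper and the packed 64-bit anchor layout are illustrative
only, not from either poster's code:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 64-bit "anchor": head pointer and aba counter packed
// into a single word (the layout is illustrative only).
uint64_t load_anchor(const std::atomic<uint64_t>& anchor) {
    // If std::atomic<uint64_t> is lock-free, this load is a single
    // atomic instruction. On IA-32 compilers typically emit an SSE2
    // MOVQ, an x87 FILD/FISTP pair, or a CMPXCHG8B; on x86-64 it is a
    // plain 8-byte MOV. Either way both halves of the anchor are
    // observed together -- no torn head/counter pair.
    return anchor.load(std::memory_order_relaxed);
}
```

Whether the lock-free path is actually taken on a given target can be
checked at runtime with `anchor.is_lock_free()`.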


> Hmmm.... Well, I think, you may just load anchor twice :)
> Load whole anchor, get pointer from it. Load whole anchor one more
> time, get counter from it. It must be enough for Relacy to squeeze
> into.

Load anchor twice; something like:


void pop() {
    for (;;) {
        stack_anchor real;
        stack_anchor a = m_anchor($).load(rl::memory_order_relaxed);
        real.m_head = a.m_head;
        // perhaps add an explicit yield?
        stack_anchor b = m_anchor($).load(rl::memory_order_relaxed);
        real.m_aba = b.m_aba;

        // blah; perform CAS with `real' variable...
    }
}


Well, that should do it.


:^)

Thanks Dmitriy.

== 2 of 2 ==
Date: Thurs, Jul 30 2009 11:36 pm
From: "Chris M. Thomasson"


"Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
news:02c80e6f-4478-4e54-b039-bd207cd19818@j32g2000yqh.googlegroups.com...
On Jul 31, 04:45, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > I don't worry about the (***) part as it's an EXTREMELY small window.
> > Also, if a program's threads can just up and die at (***) then, IMHO,
> > the program has a lot more "issues" to worry about than this one...


> Yes, spuriously dying threads are indeed problematic :)

> That was just a comment on *formal* semantics. Theoretically, it's
> not a lock-free queue, because a lock-free algorithm must be safe
> w.r.t. thread termination.
> But from a practical POV, yes, I do not care about the (***) part either.


> >
> > ;^)
> >
> > So, we now have intrusive/non-intrusive versions of a high-performance
> > MPMC queue that beats the shi% out of the Michael and Scott queue:
> >
> > Very cool!

> Ah, I know what Michael and Scott will say, they will say "your queue
> is not lock-free". Damn!

Then we can become smart-asses and say:

"This is just a very simple example of how a clever lock-based queue can
beat the hell out of a crappy lock-free queue..."

Ouch!


==============================================================================
TOPIC: Spinlocks - trouble with speed
http://groups.google.com/group/comp.programming.threads/t/9f5e34251ccf45e3?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jul 30 2009 11:06 pm
From: David Schwartz


On Jul 30, 3:41 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:

> > Here's the second biggest: When a CPU does acquire the spinlock after
> > spinning on it, it takes the mother of all branch mispredictions,
> > likely flushing all pipelines and resulting in a huge performance
> > penalty. So, now you have a CPU that's acquired a contested spinlock
> > -- it's absolutely vital that it perform at the very optimum to
> > preserve performance -- and you've slowed it to a crawl.

> So how does one avoid this? I.e., instead of a negative critique, provide
> an improvement. If the CPU leaves the spinlock, of course the branch cache
> is no longer up to date; but how could it be, given that it leaves the
> spin? Why is the pipeline flushed? And how can one avoid the pipeline
> stall?

His implementation is not worth fixing. It's just too broken. You
avoid it by implementing a proper spinlock from top to bottom.

> > Here's the third biggest: Consider a hyper-threaded CPU. Now imagine
> > one virtual core holds the spinlock and the other is blocked on it.
> > The "blocked" CPU will spin at full speed, depriving the other virtual
> > core of execution resources. So, again, the CPU that holds the
> > spinlock will run at a massively reduced speed at precisely the time
> > it most needs to hurry.

> I would not care about hyperthreaded CPUs right now, but I can throw in
> a couple of "nops" in the loop, or a bogus computation. Rest assured
> that I tried this; it made no difference, but then this isn't a
> hyperthreaded CPU in the first place.

You can't fix this kind of code by trying things and seeing how they
work. You're supporting CPUs past, present, and future. (Unless this is
some kind of niche code.) You have to go by *guaranteed* behavior, not
observed behavior. (Unless you're doing CPU-specific tuning, but I
don't see any CPU selection code in there.)

> > There are more, of course. You've almost literally made every possible
> > mistake. Spinlocks are phenomenally complex. You have to have a very
> > deep understanding of the CPU internals to get them right.

> Ok, so get them right, please.

You get them right by writing a proper spinlock, not by fixing a
completely broken spinlock.

You want a jet plane and you present a jalopy. You don't "fix" the
jalopy to make a jet plane, you get a jet plane.

DS


==============================================================================

You received this message because you are subscribed to the Google Groups
"comp.programming.threads" group.

To post to this group, visit http://groups.google.com/group/comp.programming.threads?hl=en

To unsubscribe from this group, send email to comp.programming.threads+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/comp.programming.threads/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
