Wednesday, January 29, 2014

comp.programming.threads - 26 new messages in 5 topics - digest

comp.programming.threads
http://groups.google.com/group/comp.programming.threads?hl=en

comp.programming.threads@googlegroups.com

Today's topics:

* Combination of sigwait() with sigaction()? - 3 messages, 3 authors
http://groups.google.com/group/comp.programming.threads/t/8fb8e48e21c67023?hl=en
* Open issues in the specification for fork()? - 18 messages, 6 authors
http://groups.google.com/group/comp.programming.threads/t/e219574ddde7649b?hl=en
* How do I look inside an .exe file to view the programming - 1 message, 1
author
http://groups.google.com/group/comp.programming.threads/t/70ea90ac4a00188c?hl=en
* How to view what commands are run by an exe file as they are running - 1
message, 1 author
http://groups.google.com/group/comp.programming.threads/t/826e0e9502290d02?hl=en
* Data copying on NUMA - 3 messages, 3 authors
http://groups.google.com/group/comp.programming.threads/t/89bfa62ecdef1706?hl=en

==============================================================================
TOPIC: Combination of sigwait() with sigaction()?
http://groups.google.com/group/comp.programming.threads/t/8fb8e48e21c67023?hl=en
==============================================================================

== 1 of 3 ==
Date: Thurs, Nov 21 2013 9:32 am
From: Rainer Weikusat


Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

> Markus Elfring <Markus.Elfring@web.de> writes:
>> I have found an interesting article about waiting for signals in a separate
>> thread. A source code skeleton is shown.
>> http://devcry.heiho.net/2009/05/pthreads-and-unix-signals.html
>>
>> The author also suggested installing an empty signal handler via the function
>> "sigaction". What do you think about this suggestion?
>> Is the mentioned indication of non-default signal actions useful here?
>
> The author is somewhat confused here: sigwait returns "pending signals",
> this means "signals which don't cause any 'action' because they're
> blocked", hence, no "default actions" will happen. But the disposition
> of a signal can be set to something other than "default action" or "signal
> handler", namely, to "ignore this signal", and signals set to be ignored
> don't become pending signals, they're just discarded (and hence, sigwait
> will never return them). The main annoyance here is that the default
> disposition of SIGCHLD is "ignore", hence, without setting that to
> something else, sigwait (and similar routines) will never return a
> SIGCHLD.

SUS actually leaves this (immediately discarding ignored signals)
unspecified, and at least on the two systems where I tested this (running
Linux 3.2.9 and Linux 2.6.36.4), this program

----------
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void child(void)
{
    sleep(1);
    fputs("\tchild going down\n", stderr);
    _exit(0);
}

/* empty handler: installed only to change the SIGCHLD disposition away
   from "ignore" */
static void dummy(int unused)
{
}

int main(void)
{
    sigset_t wanted;
    int sig;

    sigemptyset(&wanted);
    sigaddset(&wanted, SIGINT);
    sigaddset(&wanted, SIGCHLD);
    sigprocmask(SIG_BLOCK, &wanted, NULL);

    /* first child: SIGCHLD disposition still at its default */
    if (fork() == 0) child();

    sigwait(&wanted, &sig);
    fprintf(stderr, "got signal %d\n", sig);

    signal(SIGCHLD, dummy);

    /* second child: dummy handler installed */
    if (fork() == 0) child();

    sigwait(&wanted, &sig);
    fprintf(stderr, "got signal %d\n", sig);

    return 0;
}
------------

does indicate that a SIGCHLD was received in both cases. I remember
running into lost SIGCHLDs in the past, though; that's why I started to
install dummy handlers for everything I wanted to handle via
sigwait() and friends.




== 2 of 3 ==
Date: Thurs, Nov 21 2013 9:38 am
From: Markus Elfring


> The author is somewhat confused here: sigwait returns "pending signals",
> this means "signals which don't cause any 'action' because they're
> blocked", hence, no "default actions" will happen.

Thanks for your feedback.

I have just found that the section "12.8 Threads and Signals" of the book
"Advanced Programming in the UNIX® Environment" also contains a bit of
information on this issue. The corresponding handling might be
implementation-defined. Is any more clarification of such details possible?

Regards,
Markus




== 3 of 3 ==
Date: Thurs, Nov 21 2013 2:52 pm
From: "Chris M. Thomasson"


> "Markus Elfring" wrote in message
> news:bf6rviFm5kiU1@mid.individual.net...
> Hello,
> I have found an interesting article about waiting for signals in a
> separate
> thread. A source code skeleton is shown.
> http://devcry.heiho.net/2009/05/pthreads-and-unix-signals.html

> [...]

FWIW, check this crazy shi% out:

https://groups.google.com/d/topic/comp.programming.threads/lUXT4XgGzP4/discussion

;^)






==============================================================================
TOPIC: Open issues in the specification for fork()?
http://groups.google.com/group/comp.programming.threads/t/e219574ddde7649b?hl=en
==============================================================================

== 1 of 18 ==
Date: Sat, Nov 23 2013 10:05 am
From: Markus Elfring


Hello,

I have read the document
"http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html" once more.
I find that some details will need further clarification, won't they?

1. Do you interpret the wording
"... If a multi-threaded process calls fork(), ..."
from the section "DESCRIPTION" in the way that it is not specified which
behaviour should be expected for the address space in an ordinary
single-threaded process?

2. Is the wording
"... The fork() function is thus used only to run new programs, and the
effects of calling functions that require certain resources between the call to
fork() and the call to an exec function are undefined. ..."
from the section "RATIONALE" a contradiction to the previous information
"... Consequently, to avoid errors, the child process may only execute
async-signal-safe operations until such time as one of the exec functions is
called. ..."?

3. Is asynchronous-signal-safety also relevant for the single-threaded use case
here?
Do you know interesting software design challenges or "realistic problems"
for this situation?

Regards,
Markus




== 2 of 18 ==
Date: Sat, Nov 23 2013 10:32 am
From: Måns Rullgård


Markus Elfring <Markus.Elfring@web.de> writes:

> Hello,
>
> I have read the document
> "http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html"
> once more. I find that some details will need further clarification,
> won't they?
>
> 1. Do you interpret the wording
> "... If a multi-threaded process calls fork(), ..."
> from the section "DESCRIPTION" in the way that it is not specified which
> behaviour should be expected for the address space in an ordinary
> single-threaded process?

No. See the top of the DESCRIPTION section:

The fork() function shall create a new process. The new process (child
process) shall be an exact copy of the calling process (parent process)
except as detailed below.

This adequately defines the single-threaded case. The part you quote
details the behaviour in a multi-threaded process.

> 2. Is the wording
> "... The fork() function is thus used only to run new programs, and
> the effects of calling functions that require certain resources
> between the call to fork() and the call to an exec function are
> undefined. ..."
> from the section "RATIONALE" a contradiction to the previous information
> "... Consequently, to avoid errors, the child process may only execute
> async-signal-safe operations until such time as one of the exec functions is
> called. ..."?

No. The part you quote from the RATIONALE section is an observation of
the typical use of fork() in a multi-threaded program (before the comma),
and a note about the consequences of the restrictions given in the
DESCRIPTION (after the comma).

The RATIONALE section exists only as background information which can be
useful for understanding why a particular behaviour is specified. If it
appears to contradict the DESCRIPTION, the latter wins.

> 3. Is asynchronous-signal-safety also relevant for the single-threaded
> use case here?

No. In the single-threaded case the state of the child process is known
so no limitations apply beyond any already present in the parent. Note
however that ordering of operations on resources shared between the two
processes (e.g. file descriptors) is not defined unless explicitly
synchronised by other means.
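
A minimal sketch of that last point (the file name is a placeholder):
parent and child share the open file description, and hence the file
offset, so the two writes below can land in the file in either order:

----------
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* parent and child share this open file description, including
       its file offset */
    int fd = open("out.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fork() == 0) {
        write(fd, "child\n", 6);    /* may land before or after "parent\n" */
        _exit(0);
    }
    write(fd, "parent\n", 7);
    close(fd);
    return 0;
}
----------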

--
Måns Rullgård
mans@mansr.com




== 3 of 18 ==
Date: Sat, Nov 23 2013 2:13 pm
From: Chris Vine


On Sat, 23 Nov 2013 19:05:22 +0100
Markus Elfring <Markus.Elfring@web.de> wrote:
[snip]
> 3. Is asynchronous-signal-safety also relevant for the
> single-threaded use case here?
> Do you know interesting software design challenges or "realistic
> problems" for this situation?

After fork() the child process is single threaded, consisting only of
a continuation of the thread which called fork(); however, it inherits
the state of the whole parent process. In a multi-threaded program, only
the parent remains multi-threaded.

This means that if, for example, a thread other than the forking
thread holds a lock or other thread resource, it will never be released
in the child process. If the child process subsequently attempts to
take a lock so held, such as the one used by malloc(), it will deadlock. This
is the principal reason for the async-signal-safe requirement:
async-signal-safe functions do not use such resources and so will not
misbehave.

This is irrelevant to single threaded programs. They can call whatever
they like after a fork().
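
A minimal sketch of how that deadlock arises, with an ordinary mutex
standing in for malloc()'s internal lock (names are mine; build with
-lpthread):

----------
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* stands in for any lock an unrelated thread might hold at fork time */
static void *holder(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    sleep(2);                  /* hold the lock across the fork */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, holder, NULL);
    sleep(1);                  /* let holder grab the lock first */

    if (fork() == 0) {
        /* the child is single-threaded: holder does not exist here,
           so nobody will ever release the mutex it now waits on */
        pthread_mutex_lock(&lock);
        fputs("never reached\n", stderr);
        _exit(1);
    }
    pthread_join(tid, NULL);
    return 0;
}
----------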

Chris




== 4 of 18 ==
Date: Sat, Nov 23 2013 9:56 pm
From: Markus Elfring


> This is irrelevant to single threaded programs. They can call whatever
> they like after a fork().

Unless the processing context is switched to the implementation of a signal
handler. (Asynchronous-signal-safety should be considered in another situation
then, shouldn't it?) ;-)

Regards,
Markus





== 5 of 18 ==
Date: Sun, Nov 24 2013 2:27 am
From: Casper H.S. Dik


Markus Elfring <Markus.Elfring@web.de> writes:

>Hello,

>I have read the document
>"http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html" once more.
>I find that some details will need further clarification, won't they?

>1. Do you interpret the wording
> "... If a multi-threaded process calls fork(), ..."
> from the section "DESCRIPTION" in the way that it is not specified which
>behaviour should be expected for the address space in an ordinary
>single-threaded process?

Well, the description starts with:

" .... The new process (child process) shall be an exact copy of
the calling process (parent process) except as detailed below:"

So what is listed for the multi-threaded process is one of the
exceptions; specifically, I believe it says that the stacks of the other
threads aren't part of the new process.

>2. Is the wording
> "... The fork() function is thus used only to run new programs, and the
>effects of calling functions that require certain resources between the call to
>fork() and the call to an exec function are undefined. ..."
> from the section "RATIONALE" a contradiction to the previous information
> "... Consequently, to avoid errors, the child process may only execute
>async-signal-safe operations until such time as one of the exec functions is
>called. ..."?

This is all about multi-threaded programs; and in such programs the functions
you can call in the child after fork() are restricted to the async-signal-safe
functions.

>3. Is asynchronous-signal-safety also relevant for the single-threaded use case
>here?

No, except when fork() is called from a signal handler.

> Do you know interesting software design challenges or "realistic problems"
>for this situation?

Yes. Any form of malloc(), exit() instead of _exit(), etc., may lead
to problems which likely happen randomly and probably after you've
shipped the code to a customer.

Casper




== 6 of 18 ==
Date: Sun, Nov 24 2013 7:39 am
From: Rainer Weikusat


Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> writes:
> Markus Elfring <Markus.Elfring@web.de> writes:

[...]

>>2. Is the wording
>> "... The fork() function is thus used only to run new programs, and the
>>effects of calling functions that require certain resources between the call to
>>fork() and the call to an exec function are undefined. ..."
>> from the section "RATIONALE" a contradiction to the previous information
>> "... Consequently, to avoid errors, the child process may only execute
>>async-signal-safe operations until such time as one of the exec functions is
>>called. ..."?
>
> This is all about multi-threaded programs; and in such programs what functions
> you can call in the child after fork() is restricted to async-signal-safe
> functions.

This is all about the people who wrote the pthreads specification
fighting the other tribe of people, who are Violently(!!1) opposed to
multi-threading, by purposely not defining any sensible semantics for
fork in multi-threaded processes. Consequently, the UNIX(*) standard is
useless here and best ignored: For any existing platform, the behaviour
is necessarily defined and code can be written to work in such an
environment.

[...]

>> Do you know interesting software design challenges or "realistic problems"
>>for this situation?
>
> Yes. Any form of malloc(), exit() instead of _exit(), etc, may lead
> to problems which likely happen randomly and probably after you've
> shipped the code to a customer.

If the failures are demonstrably random, that is, they occur because the
people who wrote the system code added stuff like

if (random() % 15 < 3) *(int *)(random()) = 12;

with some guard to trigger only in multi-threaded programs, that would
be a problem of the customer, who should be more careful when choosing
suppliers.




== 7 of 18 ==
Date: Sun, Nov 24 2013 4:23 pm
From: Casper H.S. Dik


Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

>This is all about the people who wrote the pthreads specification
>fighting the other tribe of people, who are Violently(!!1) opposed to
>multi-threading, by purposely not defining any sensible semantics for
>fork in multi-threaded processes. Consequently, the UNIX(*) standard is
>useless here and best ignored: For any existing platform, the behaviour
>is necessarily defined and code can be written to work in such an
>environment.

I think this is completely unfair: those who worked on the
implementation and those who wrote the standard knew exactly the
two choices they had; they picked one of the two choices and you
seem to disagree with that particular choice.

>If the failures are demonstrably random, that is, they occur because the
>people who wrote the system code added stuff like

You don't quite get exactly what I'm saying; any program using threads
no longer has a well-defined order in which all the instructions are
evaluated. In such a program, errors in logic, such as not
adhering to "call only async-signal-safe functions after fork",
will occur seemingly randomly, as they depend on what the other threads
are doing.

Casper




== 8 of 18 ==
Date: Sun, Nov 24 2013 5:19 pm
From: Chris Vine


On 25 Nov 2013 00:23:05 GMT
Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> wrote:
[snip]
> I think this is completely unfair as it are those who worked on the
> implementation and those who wrote the standard knew exactly the
> two choices they had; they picked one of the two choices and you
> seem to disagree with that particular choice.

It was also in my view the better choice - I just don't like forkall(),
as it can make the program more difficult to reason about, particularly
for programs using a shared resource such as I/O (and which don't?),
which can turn forkall() into a bug factory.

The only purpose for calling POSIX fork() in a multi-threaded program is
to follow it with a call to exec*() (perhaps after setting up pipes with
dup2() and the like), in which case the async-signal-safe restriction
is fine: you do your other set up (including any memory allocation)
before the fork() and not after. For other purposes, in a
multi-threaded program conforming to POSIX you start a thread, not a
process.
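
A minimal sketch of that pattern (the command is a placeholder and error
handling is trimmed): everything that may allocate happens before fork(),
and between fork() and exec*() only async-signal-safe calls are made:

----------
#include <unistd.h>

/* start "wc -l" reading from a pipe; returns the write end, or -1 */
int spawn_with_pipe(void)
{
    int fds[2];
    char *const argv[] = { "wc", "-l", NULL };

    if (pipe(fds) == -1)
        return -1;

    switch (fork()) {
    case -1:
        return -1;
    case 0:    /* child: only async-signal-safe calls from here on */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);    /* exec failed; skip atexit() handlers */
    }
    close(fds[0]);
    return fds[1];    /* parent writes into the pipe */
}
----------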

pthread_atfork() is completely broken. That can cause problems with
single-threaded programs that use multi-threaded libraries without
taking the care to understand that what they are linking to is
multi-threaded, but there we are. Knowing what you are linking to is, I
guess, the best answer.

Chris




== 9 of 18 ==
Date: Mon, Nov 25 2013 4:08 am
From: Casper H.S. Dik


Chris Vine <chris@cvine--nospam--.freeserve.co.uk> writes:

>It was also in my view the better choice - I just don't like forkall(),
>as it can make the program more difficult to reason about, particularly
>for programs using a shared resource such as I/O (and what don't),
>which can turn forkall() into a bug factory.

Originally, Solaris came with its own threads library, and that
thread implementation had fork1() and fork() (aka forkall()).
Later they added a pthread implementation, and the pthread implementation
needed to use fork1() for fork(), so there were two different
libraries.

If you didn't link with -lthread or -lpthread, you got stubs for many
of the thread calls (such as mutex primitives) so a library could be
written as if it was multi-threaded but would work in a single threaded
application.

>The only purpose for calling POSIX fork() in a multi-threaded program is
>to follow it with a call to exec*() (perhaps after setting up pipes with
>dup2() and the like), in which case the async-signal-safe restriction
>is fine: you do your other set up (including any memory allocation)
>before the fork() and not after. For other purposes, in a
>multi-threaded program conforming to POSIX you start a thread, not a
>process.

And so we now have posix_spawn() which does much of what you generally
do between fork() and exec().
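
A minimal sketch of that (the command and the /dev/null redirection are
placeholders): the file-action list takes the place of the dup2()/open()
calls one would otherwise make between fork() and exec():

----------
#include <spawn.h>
#include <fcntl.h>
#include <unistd.h>

extern char **environ;

/* spawn a command with its stdout redirected to /dev/null;
   returns 0 on success, an errno value otherwise */
int spawn_to_devnull(pid_t *pid)
{
    posix_spawn_file_actions_t fa;
    char *const argv[] = { "ls", NULL };    /* placeholder command */
    int err;

    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO,
                                     "/dev/null", O_WRONLY, 0);
    err = posix_spawnp(pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    return err;
}
----------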

>pthread_atfork() is completely broken. That can cause problems with
>single threaded programs that use multi-threaded libraries without
>taking the care to understand that what they are linking to is
>multi-threaded, but there we are. Know what you are linking to is, I
>guess, the best answer.

That is perhaps more a question of quality of implementation and not so
much an issue with pthread_atfork(). In Solaris 10 there is no longer
a difference between a single-threaded program and a multi-threaded
program with only one thread. So there is really no issue with linking
with libraries which may use threads.

As a result of making threads the standard, fork() is
now *always* fork1() though forkall() is still available.

Casper




== 10 of 18 ==
Date: Mon, Nov 25 2013 4:21 am
From: Rainer Weikusat


Chris Vine <chris@cvine--nospam--.freeserve.co.uk> writes:
> On 25 Nov 2013 00:23:05 GMT
> Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> wrote:
> [snip]
>> I think this is completely unfair: those who worked on the
>> implementation and those who wrote the standard knew exactly the
>> two choices they had; they picked one of the two choices and you
>> seem to disagree with that particular choice.

[...]

> The only purpose for calling POSIX fork() in a multi-threaded program is
> to follow it with a call to exec*() (perhaps after setting up pipes with
> dup2() and the like),

For some people, the only purpose of fork they can possibly imagine is
as a halfway useless encumbrance on the way to executing another program
and - surprisingly and certainly by mere coincidence - the people who
wrote the pthreads specification and produced this little gem are among
them. This makes two extreme standpoints, namely, "multi-threading is
useless and evil and 'cooperating sequential processes' is all God ever
wanted men to use" and "multiple processes running the same program are
totally useless, dangerous, outlandish, alien and socialist(!!1) and our
nice, shiny threads are a much better modern solution to any problem one
could conceivably need to solve", and neither of them is particularly
helpful in the real world (as extreme standpoints are wont to be),
especially considering all the moss which has been growing on this
controversy since it originated last century.

When a process forks, the contents of the memory in the address space of
the new process will be identical to that of the parent at the time of
the fork. Because only the forking thread is duplicated in the child, the
effect on any other threads is that they stopped asynchronously at some
unpredictable (as seen from the 'sequential flow of time' of each
particular thread) time between executing two instructions. This is
exactly what also happens when a signal handler is invoked because of an
asynchronous signal, hence, without special precautions, only
async-signal-safe functions can be called safely in the new
process. Another problem would be that POSIX demands that mutexes can
only be unlocked by the thread which locked them which implies that a
mutex which was held by some other thread than the one which forked
cannot ever be unlocked in the new process.

But signal handlers are not really restricted to async-signal-safe calls
because they're free to do anything provided the 'main thread of
execution' was interrupted in an 'async-signal-safe section' of the
code (the requirement is [paraphrase] 'all functions defined by this
specification shall work as described in the presence of asynchronous
signals except when a function which is not async-signal-safe is executed
from a signal handler which interrupted another non-async-signal-safe
function. In this case, the behaviour is undefined'). Likewise, there's
no restriction regarding what the forked process may safely do provided
all threads were executing something async-signal-safe at the time of
the fork and no mutexes held by any of them will be touched in the new
process. And this is perfectly doable, especially considering that -
practically - some things are async-signal-safe despite not being
required to be, e.g., blocking on a locked mutex (or semaphore). In an
ideal universe, implementations would be required to document how they
deal with multi-threaded access to shared resources, and one could even
envision an API for safe forking modelled on the existing
facilities for dealing with asynchronous signals. In the absence of both,
it is still possible to confine all other threads of a process to some safe
location during the fork and release them afterwards (a sketch follows
below).

This is obviously not going to work for the most general imaginable
case, where 'the process' runs arbitrary, binary only third-party code
whose actions can neither be controlled nor predicted but this is not
the only case.
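
A minimal sketch of that confinement idea (the gate and its names are
mine, not any standard API): workers hold the gate shared around anything
not async-signal-safe, and the forking thread takes it exclusively, so at
the moment of the fork every other thread is parked at a safe point:

----------
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_rwlock_t fork_gate = PTHREAD_RWLOCK_INITIALIZER;

/* workers bracket anything not async-signal-safe with these */
void worker_enter_unsafe(void) { pthread_rwlock_rdlock(&fork_gate); }
void worker_leave_unsafe(void) { pthread_rwlock_unlock(&fork_gate); }

pid_t gated_fork(void)
{
    pid_t pid;

    pthread_rwlock_wrlock(&fork_gate);  /* wait until all workers parked */
    pid = fork();
    /* both parent and (now single-threaded) child are continuations of
       the thread holding the write lock, so both may release it */
    pthread_rwlock_unlock(&fork_gate);
    return pid;
}
----------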




== 11 of 18 ==
Date: Mon, Nov 25 2013 7:52 am
From: Rainer Weikusat


Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>
>>This is all about the people who wrote the pthreads specification
>>fighting the other tribe of people, who are Violently(!!1) opposed to
>>multi-threading, by purposely not defining any sensible semantics for
>>fork in multi-threaded processes. Consequently, the UNIX(*) standard is
>>useless here and best ignored: For any existing platform, the behaviour
>>is necessarily defined and code can be written to work in such an
>>environment.
>
> I think this is completely unfair: those who worked on the
> implementation and those who wrote the standard knew exactly the
> two choices they had; they picked one of the two choices and you
> seem to disagree with that particular choice.

I 'disagree' with this here:

There are two reasons why POSIX programmers call fork(). One
reason is to create a new thread of control within the same
program (which was originally only possible in POSIX by creating
a new process);

[...]

When a programmer is writing a multi-threaded program, the first
described use of fork(), creating new threads in the same
program, is provided by the pthread_create() function. The
fork() function is thus used only to run new programs,

because while this is not an outright lie, it's a carefully
weasel-worded piece of disinformation supposed to lend some appearance
of rationality to the entirely political bias of the author: Fork does
not create 'a new thread of control running the same program' (which has
a process chained to its ankle for some shitty, historical reason nobody
in his right mind could possibly understand) but it creates a new
process running the same program and this includes creating a new thread
of control among some other things, most prominently, a new address
space.

There's also the practical concern that adding a new thread to some
existing code unleashes 'immediate pthread madness' which is supposed to
change the semantics of the working code retroactively: Something which
used to be a perfectly legitimate use of fork is now considered to have
undefined behaviour, regardless of how the interactions between the
existing 'single-threaded process code' and the code running in the new
thread actually happen to work. This means 'using pthreads' is meant to
be an all-or-nothing choice: Either the process remains strictly
single-threaded. Or everything has to be rewritten to be 'strictly
pthreaded' insofar as this is possible (which might not be the case since
the code could rely on properties of fork pthread_create does not have)
and this looks a lot like a useless effort to me, IOW, 'pthreads' trying
to punish me for using 'traditionally written UNIX(*) code'.




== 12 of 18 ==
Date: Mon, Nov 25 2013 8:55 am
From: Casper H.S. Dik


Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:


>I 'disagree' with this here:

> There are two reasons why POSIX programmers call fork(). One
> reason is to create a new thread of control within the same
> program (which was originally only possible in POSIX by creating
> a new process);

The changes for the threads didn't change how fork() worked or
didn't work.

Even before threads were common, fork() had many, many issues, such
as fork()ing the whole status of the stdio library; atexit() handlers.

Using fork() to create a second thread in the same program was always full of
risk.

You almost sound like someone who laments the loss of Linux' clone
model to the pthread model.

It is true, I think, that starting threads from a library is
frowned upon, except if these threads are extremely well-behaved
(and that might include not calling malloc() after control has been
handed back to the main thread).

Casper




== 13 of 18 ==
Date: Mon, Nov 25 2013 9:36 am
From: Rainer Weikusat


Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>
>
>>I 'disagree' with this here:
>
>> There are two reasons why POSIX programmers call fork(). One
>> reason is to create a new thread of control within the same
>> program (which was originally only possible in POSIX by creating
>> a new process);
>
> The changes for the threads didn't change how fork() worked or
> didn't work.
>
> Even before threads were common, fork() had many, many issues, such
> as fork()ing the whole status of the stdio library; atexit() handlers.
>
> Using fork() to create a second thread in the same program was always full of
> risk.
>
> You almost sound like someone who laments the loss of Linux' clone
> model to the pthread model.

I wrote something specific about the whole 'semantic unit of text' the
first paragraph of which you quoted above and about another problem with
the pthreads 'take no prisoners' approach. I don't see how the contents
of your text relate to that. I'm also not aware of any 'losses'
affecting the Linux clone system call which were caused by adding the
functionality necessary for supporting the pthreads signal handling
model (which, while sensible in itself, is another 'disaster as
designed' because it runs counter to the intuition of a lot of people, who
presumably use dedicated signal handling threads, thus limiting the
scalability of their code, despite there being no real reason for that
except that they don't understand the purpose of the pthreads signal
handling approach and consider it "scary and dangerous" because of
that).

While I don't consider stdio particularly useful, I wouldn't go so far
as to label it 'risky on UNIX(*)', and the same is true for 'atexit
handlers'. Using binary-only code with unknown behaviour is 'risky' but
even these dangers can be dealt with.





== 14 of 18 ==
Date: Mon, Nov 25 2013 10:46 am
From: Drazen Kacar


Casper H.S. Dik wrote:

> If you didn't link with -lthread or -lpthread, you got stubs for many
> of the thread calls (such as mutex primitives) so a library could be
> written as if it was multi-threaded but would work in a single threaded
> application.

Not really. Things worked that way and it was generally known that stubs
in libc exist[1], but that was neither documented nor supported. So the
only entity who could safely write libraries that way was Sun.

I don't know if anyone else wrote them that way. I decided not to, because
the method was not supported and thus likely to cause trouble if the
implementation changes.

[1] And, of course, there were people who wrote a multi-threaded program
but forgot to add one of the thread libraries to the link line. Then the
program would link fine (because of the stubs in libc), but none of the
thread functions would work. That was a lot of fun for everyone. :-)

--
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr




== 15 of 18 ==
Date: Mon, Nov 25 2013 11:55 am
From: Casper H.S. Dik


Drazen Kacar <dave@fly.srk.fer.hr> writes:

>Not really. Things worked that way and it was generally known that stubs
>in libc exist[1], but that was neither documented nor supported. So the
>only entity who could safely write libraries that way was Sun.

And they didn't all work either; e.g., pthread_once() didn't work.

>I don't know if anyone else wrote them that way. I decided not to, because
>the method was not supported and thus likely to cause trouble if the
>implementation changes.

>[1] And, of course, there were people who wrote a multi-threaded program
>but forgot to add one of the thread libraries to the link line. Then the
>program would link fine (because of stubs in libc), but none of the
>thread functions would work. That was a lot of fun for everyone. :-)

Yes; fortunately, that was fixed when Solaris 10 was released (in 2005).

But that couldn't be done until we had created a multi-thread library
which behaved more sanely in the presence of just the one thread;
such a library was first shipped with Solaris 8 and was made the default
in Solaris 9.


Casper




== 16 of 18 ==
Date: Mon, Nov 25 2013 2:16 pm
From: Rainer Weikusat


Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> writes:
> Drazen Kacar <dave@fly.srk.fer.hr> writes:

[...]

>>[1] And, of course, there were people who wrote multi-threaded program,
>>but forgot to add one of the thread libraries to the link line. Then the
>>program would link fine (because of stubs in libc), but none of the
>>thread functions would work. That was a lot of fun for everyone. :-)
>
> Yes; fortunately, that was fixed when Solaris 10 was released (in 2005).
>
> But that couldn't be done until we had created a multi-thread library
> which behaved more sanely in the presence of just the one thread;
> such a library was first shipped with Solaris 8 and was made the default
> in Solaris 9.

This is described in some detail in a whitepaper named 'Multithreading
in the Solaris Operating Environment' published around the time of
Solaris 9, which is still worth a read (I actually just reread it). The
main change in order to get a 'sanely behaving threading library' was
switching from M:N threading to 1:1 threading and thus getting rid of
the necessity to use hidden userspace threads in order to try to solve
problems inherent in the M:N approach. This includes the remarkable feat
of accepting that threads are useful because they enable applications to
use multiprocessors AND NOT because they enable university eggheads from
'research assistant' upwards to avoid dealing with state machines in
their code (that's a gross oversimplification, mainly because I really
don't know how to describe this in a more accurate way).

Commercial operating systems tend to oscillate between these two models
(with the occasional idiot reimplementing cooperative userspace
threading for the umpteenth time), cf. the following statement from HP on
'MxN threading in HP-UX 11i':

MxN threads were implemented in response to the perception that
enterprise applications creating large numbers of threads
perform better with MxN threads than with 1x1.

However,

The only real way to know how using the MxN model affects the
performance of your particular application is to benchmark the
two models and compare the results.

Or, to put this in other words: "While there's a strong perception that
such applications ought to exist, there's little real evidence that they
actually do".





== 17 of 18 ==
Date: Tues, Nov 26 2013 12:33 am
From: Casper H.S. Dik


Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

>Commercial operating systems tend to oscillate between these two models
>(with the occasional idiot reimplementing cooperative userspace
>threading for the umpteenth time), cf. the following statement from HP on
>'MxN threading in HP-UX 11i':

> MxN threads were implemented in response to the perception that
> enterprise applications creating large numbers of threads
> perform better with MxN threads than with 1x1.

>However,

> The only real way to know how using the MxN model affects the
> performance of your particular application is to benchmark the
> two models and compare the results.

>Or, to put this in other words: "While there's a strong perception that
>such applications ought to exist, there's little real evidence that they
>actually do".

The argument for "M:N is better" was very strong because, theoretically,
it could be faster and synchronization wouldn't need to use the kernel.

In a "1:1" implementation, in practice, however, uncontended locks don't need a
trip to the kernel either, and the MxN implementation needed an enormous
amount of extra work (handling SIGWAITING, a second implementation of
context switching, which actually often did require kernel involvement).

It should also be noted that when a comparison was done, it was
often using the same libthread library but run so that M == N; but
that wasn't a proper comparison, as in that case you still pay
the penalty for the M:N support.

Casper




== 18 of 18 ==
Date: Tues, Nov 26 2013 7:46 am
From: Rainer Weikusat


Casper H.S. Dik <Casper.Dik@OrSPaMcle.COM> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

[M:N/ 1:1 threading]

> The argument for "M:N is better" was very strong because, theoretically,
> it could be faster and synchronization wouldn't need to use the kernel.
>
> In a "1:1" implementation, in practice, however, uncontended locks don't need a
> trip to the kernel either, and the MxN implementation needed an enormous
> amount of extra work (handling SIGWAITING, a second implementation of
> context switching, which actually often did require kernel involvement).

Some people think that 'threads are a way to structure programs', that
is, they're primarily supposed to be helpful for developers. The idea is
typically that 'a thread of execution' can be dedicated to a single task
which progresses intermittently and concurrently with a lot of other
tasks of the same kind: In a single-threaded process, this requires
storing the state of each task in some 'globally accessible' data
structure and continuing to work on some other tasks based on the
recorded state of that whenever the 'current task' can't make progress
anymore. If each task had a dedicated thread, this wouldn't
be necessary because 'the scheduler' would transparently suspend one
thread and switch to another whenever things would otherwise come to a
standstill, thus providing the illusion that each task really runs on a
dedicated computer without having to consider the needs of other tasks.

This, of course, requires creating as many threads as there are
concurrent tasks and for something like a network server, this could
easily mean thousands or even tens of thousands of them. Consequently, each
thread needs to be as cheap as possible, in particular, it shouldn't
have any kernel resources like 'a kernel stack' permanently allocated to
it. Also, in order to make programming easy, actual concurrent
execution of threads should rather be avoided: For as long as a thread
is busy, it just keeps running, and once it invokes 'a blocking call' of
some kind, another thread can start to run. In order to avoid starving
tasks, threads are supposed to be 'nice to each other', that is,
explicitly yield the processor every now and then when being busy for a
long time. This means cooperative userspace threading. Because there's
no concurrency, synchronization isn't really needed and because there's
no preemption, no complicated userspace scheduler is needed, either. At
least in theory. In practice, once the process blocks in the kernel, all
threads stop running, and hence each and every possible blocking system
call must be wrapped in some library routine which accomplishes the same
end in some non-blocking way and runs other threads in the meantime. The
net effect of this is usually a lot of borderline insane library hackery
and some rather bizarre restrictions as to what a thread can do or must
not do (the GNU pth paper provides a nice example of that).

Some other people think that threads are supposed to be useful for users
in the sense that they enable these to utilize processing resources such
as multiple processors and/or multiple cores which might be available to
them. This implies that 'a thread' runs code which looks pretty much
like that of a single-threaded process with respect to
'multi-tasking'. In order to keep scheduling overhead down, there will
be relatively few threads (somewhat more than available 'processing
things'), but they're both supposed to execute in parallel and to work
with 'shared global resources'. This means synchronization is needed and
the kernel scheduler has to deal with them because the kernel is the
privileged piece of code managing the actual hardware and thus the
only software capable of doing things like assigning threads to available
processors for execution. Further, this is supposed to work in an
environment shared with other applications which also seek to use the
available resources and which have not been written to cooperate,
hence, preemption is necessary. The most natural choice for this is 1:1
threading where each 'application thread' corresponds with a 'kernel
scheduled entity' of some sort.

Trying to satisfy both groups of people leads to M:N threading which
really just combines the disadvantages of both models without providing
the advantages of either. It ends up as an expensive way to screw everyone
alike but only in practice, not in theory.

I was planning to illustrate this using the pre-Solaris 8/9 'threading
workarounds' as an example, but this text is too long already ...





==============================================================================
TOPIC: How do I look inside an .exe file to view the programming
http://groups.google.com/group/comp.programming.threads/t/70ea90ac4a00188c?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 26 2013 4:02 am
From: goldee.milan@gmail.com


Not even one answer. Someone should not write poems here, but simply copy and paste what a piece of SOFTWARE looks like, line after line.






==============================================================================
TOPIC: How to view what commands are run by an exe file as they are running
http://groups.google.com/group/comp.programming.threads/t/826e0e9502290d02?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 26 2013 5:29 am
From: bryan.chisenhall@gmail.com


On Friday, November 15, 2013 2:03:32 PM UTC-6, Lucas Levrel wrote:
> On 15 November 2013, chiz wrote:
>
> > My problem is this: I am trying to install a program that requires me to
> > have a specific version of another program installed before completing
> > the install. When I run the .exe file, it goes through the normal steps
> > until it checks to see if I have a specific version of another program
> > installed. I have this program installed but, for whatever reason, it
> > does not recognize that I have it installed.
>
> The "whatever" here is a big problem, I think.
>
> > So, I want to see where it is looking for that file. I don't want to
> > edit the .exe file. I just want to know what commands the .exe is trying
> > to run so I can see where it is looking. Maybe a "command" is a bad way
> > to describe what I'm looking for... not sure. If I can see where it is
> > looking, I can install the program into that directory. I'm hoping that
> > would solve the problem.
>
> Normally one should not make assumptions on where a program resides, so
> your calling program should not have the path to the called program
> hardcoded. Is the called program in your %path%? Run a command window
> (Win+R, type "cmd", Enter), type there "the_called_program_name.exe". Is
> it found by Windows?
>
> But are you sure this is a path issue? The calling program could be
> running a command like "the_called_program /version" and parsing the
> output to know which version is installed.
>
> --
> LL

Thank you for the reply.

I checked the command prompt and it was not found by Windows. Thanks for mentioning that; I'll try to figure out how to get that to work.

The calling program does require a specific version, but I triple-checked that, thinking it must be the issue. I do have the correct version installed, though.

Thanks again!





==============================================================================
TOPIC: Data copying on NUMA
http://groups.google.com/group/comp.programming.threads/t/89bfa62ecdef1706?hl=en
==============================================================================

== 1 of 3 ==
Date: Thurs, Nov 28 2013 2:25 pm
From: Paavo Helde



On NUMA, as the acronym says, some memory is more easily accessible by a certain
NUMA node than by others. Now, let's say I have a deep dynamically-allocated data
structure I want to use in a thread running on another NUMA node; how
should I pass it there? Should I perform the copy in the other thread so
that the new dynamic allocations take place in the target thread? This probably
depends on the memory allocator; what would be the best choice for
Linux/Windows?

This is not a theoretical question; we actually see a large scaling
performance drop on NUMA and have to decide whether to go multi-process
somehow or whether there are ways to make multi-threaded apps behave
better. As far as I understand, there is a hard limit of 64 worker threads
per process on Windows, so probably we have to go multi-process anyway at
some point. Any insights or comments?

TIA
Paavo




== 2 of 3 ==
Date: Sun, Dec 1 2013 5:21 pm
From: Robert Wessel


On Thu, 28 Nov 2013 16:25:12 -0600, Paavo Helde
<myfirstname@osa.pri.ee> wrote:

>
>On NUMA, as the acronym says, some memory is more easily accessible by a certain
>NUMA node than by others. Now, let's say I have a deep dynamically-allocated data
>structure I want to use in a thread running on another NUMA node; how
>should I pass it there? Should I perform the copy in the other thread so
>that the new dynamic allocations take place in the target thread? This probably
>depends on the memory allocator; what would be the best choice for
>Linux/Windows?


On Windows, for example, you can use the VirtualAllocExNuma API to
allocate storage "near" a given node from any running code. By
default the allocation will be in memory local to the allocating
thread, which would not be optimal if another thread is going to do
all the work on that area.
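
A minimal sketch of that call (node number and size are placeholders):

----------
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD preferred_node = 1;          /* placeholder NUMA node */
    SIZE_T size = 1 << 20;

    void *buf = VirtualAllocExNuma(GetCurrentProcess(), NULL, size,
                                   MEM_RESERVE | MEM_COMMIT,
                                   PAGE_READWRITE, preferred_node);
    if (buf == NULL) {
        fprintf(stderr, "VirtualAllocExNuma failed: %lu\n",
                (unsigned long)GetLastError());
        return 1;
    }
    /* ... hand buf to threads running on that node ... */
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
----------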

In general, allocating a structure close to the executing node is
certainly a win for some applications. Yes, that's vague, but so is
your problem statement. If a thread (or group of threads) is going
to heavily use a structure that won't cache, running all of those
threads on a single NUMA node, and allocating that structure in memory
local to that node, can significantly improve the available bandwidth
and latency for those threads, as well as consuming less of the global
resources needed by other threads in the system.


>This is not a theoretical question, actually we see a large scaling
>performance drop on NUMA and have to decide whether to go multi-process
>somehow or are there some ways to make multi-threaded apps to behave
>better. As far as I understand there is a hard limit of 64 worker threads
>per process on Windows, so probably we have to go multi-process anyway at
>some time point. Any insights or comments?


There is no 64-thread-per-process limit in Windows. Perhaps such a
thing existed in Win9x. Win32 has a limit of 32 logical cores per
machine (which has nothing in particular to do with the number of
threads in a process), but that doesn't apply to Win64 (although if
you run Win32 applications on Win64, only the first 32 cores in the
first processor group are used to execute Win32 code).

If you're running multiple processes, then the system can fairly
easily split the workload between nodes, as you're implicitly telling
the system that the data is not shared, so the system will try to keep
a process (and hence its allocations) on a particular node. If
you're running enough threads in a process that you're going to span
more than one node, you're going to have to specify some of that
manually, or structure things so that allocations only happen on the
appropriate node (usually by only doing the required allocations on
the actual threads using the allocated areas).




== 3 of 3 ==
Date: Sat, Dec 7 2013 2:34 am
From: andrew@cucumber.demon.co.uk (Andrew Gabriel)


In article <XnsA287445E6980myfirstnameosapriee@216.196.109.131>,
Paavo Helde <myfirstname@osa.pri.ee> writes:
>
> On NUMA, as the acronym says, some memory is more easily accessible by a certain
> NUMA node than by others. Now, let's say I have a deep dynamically-allocated data
> structure I want to use in a thread running on another NUMA node; how
> should I pass it there? Should I perform the copy in the other thread so
> that the new dynamic allocations take place in the target thread? This probably
> depends on the memory allocator; what would be the best choice for
> Linux/Windows?

On Solaris, memory is always allocated preferentially local to the
core the thread is running on. When a thread becomes runnable, it
is preferentially scheduled on a core local to the data it has been
accessing. The application itself doesn't need to do anything - this
is a feature of the OS. I'm less clear how Linux/Windows handle this.

> This is not a theoretical question; we actually see a large scaling
> performance drop on NUMA and have to decide whether to go multi-process

Multi-process or multi-threaded doesn't make any difference from the
NUMA point of view, if the data is shared in both cases.

> somehow or whether there are ways to make multi-threaded apps behave
> better. As far as I understand, there is a hard limit of 64 worker threads
> per process on Windows, so probably we have to go multi-process anyway at
> some point. Any insights or comments?

You can optimize the data layout such that different cores are not
competing for the same cache lines, and continually invalidating
each others' local caches. To do this, group data accessed by a
single thread at a time into aligned chunks which are multiples of
64 bytes long.
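
A minimal sketch of that advice (64 bytes is an assumption about the
target CPU's line size):

----------
#include <stdalign.h>

#define CACHE_LINE 64    /* assumed line size of the target CPU */

/* one counter per worker thread, each on its own cache line, so that
   updates from different cores never contend for the same line */
struct per_thread {
    alignas(CACHE_LINE) long counter;
};

static struct per_thread counters[8];
----------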

Break up hot locks where possible. Check for tools to identify hot
locks (on Solaris, plockstat).

If you are heavily using malloc/free in a multi-threaded process,
make sure you are linking in a library which contains a
version of these designed for heavy multi-threaded use.

--
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]




==============================================================================

You received this message because you are subscribed to the Google Groups "comp.programming.threads"
group.

To post to this group, visit http://groups.google.com/group/comp.programming.threads?hl=en

To unsubscribe from this group, send email to comp.programming.threads+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/comp.programming.threads/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
