Saturday, February 13, 2021

Digest for comp.lang.c++@googlegroups.com - 17 updates in 5 topics

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 12 04:05PM -0800

On 2/12/2021 1:19 PM, Chris M. Thomasson wrote:
>> independent.  I don't believe that can be done.
> [...]
 
> Agreed. It's too sensitive.
 
It can be like orchestrating a symphony.
 
https://youtu.be/5znrVdAtEDI
Bonita Montero <Bonita.Montero@gmail.com>: Feb 13 01:51PM +0100

> - Raw spin locks make no sense on /any/ single core.
 
Raw spinlocks almost never make sense in userland.
David Brown <david.brown@hesbynett.no>: Feb 13 03:56PM +0100

On 12/02/2021 19:35, Marcel Mueller wrote:
>> barrier) for loads and stores.
 
> Strictly speaking this is not true if you take DMA into account. But
> this is not a common use case.
 
That depends very much on the way DMA and the memory system are
implemented. On many microcontrollers, volatile is all you need. On
others, the memory barrier instructions generated with atomics are not
sufficient - you need explicit cache flush instructions. (That's the
kind of thing that makes low-level code so much fun!)
 
>> fail completely with this implementation.
 
> Sure, when the hardware does not allow lock free access, then there are
> no generic, satisfactory solutions.
 
I think (correct me if I'm wrong) that every system will have a limit to
the lock-free sizes they support.
 
 
> In this case you cannot use DWCAS on this platform. You need to seek
> other solutions, e.g. store and replace a 32-bit pointer to the actual
> value.
 
Yes, that could perhaps be a way to handle things, but off the top of my
head I can't see how to do this safely and generically.
 
>> the C++ standard either.  The standard only says what the code should
>> do, not how it should do it (and in this case, the code does not work).
 
> So the library needs to be adjusted platform dependent.
 
Yes.
 
>> otherwise, you have misunderstood the system.  The best you can do is
>> figure out a maximum response time,
 
> There are systems with guaranteed maximum values.
 
Of course - but that guaranteed maximum is the combination of the cpu's
inherent maximum response time, maximum delays from memory systems,
maximum run-times of other interrupt functions (that have not re-enabled
interrupts), and so on. Interrupt-disable sections which are shorter
than the maximum interrupt function time will not affect the maximum
response time for interrupts.
 
>> when things will happen - they are about giving guarantees for the
>> maximum delays.
 
> Exactly. But that is enough.
 
Yes.
 
 
> Context switches can be quite expensive if your hardware has many
> registers (including MMU) and no distinct register sets for different
> priority levels.
 
Yes, context switches can be expensive - but I don't see how that is
relevant. Interrupt response times don't usually have to take full
context switch times into account, because you don't need a full context
switch to get to the point where you are able to react quickly to the
urgent event. Most cpus preserve the instruction pointer, flag
register, and perhaps one or two general purpose registers when an
interrupt occurs - they rarely do full context switches.
 
(There are more niche architectures with very fast and deterministic
response times.)
 
 
> Feel free to do so if it is suitable on your platform. You already
> mentioned that this is not sufficient on multi-core systems, which
> are becoming quite common for embedded systems too nowadays.
 
Embedded systems can broadly be divided into three groups. There are
"big" systems, for which multi-core is becoming more common - these are
dominated by Linux, and you can use whatever multi-threading techniques
you like from the Linux world. There are "small" systems, with a single
core - these are far and away the largest in quantity. In between, you
have asymmetric multiprocessing - devices with a big fast core for hard
work (or possibly multiple fast cores running Linux), and a small core
for low-power states, handling simple low-level devices, or dedicated to
things like wireless communication. Although you have more than one
core in the same package, these are running different programs (and
perhaps different OS's).
 
I see very little in the way of symmetric multicore systems running
RTOS's. That may change, however.
 
> cases/ not all cases.
> If a generic atomic library does not guarantee forward progress when used
> with different priorities it is not suitable for this case.
 
All RTOS systems are sensitive to priority inversion, as are all small
single-core embedded systems with bare metal (interrupt functions are,
in effect, high priority pre-emptive threads). It is not a special case
- it applies to just about every use of devices such as the Cortex-M or
other microcontrollers. You cannot use the gcc-provided atomics library
for even the simple situation of having a 64-bit value shared between an
interrupt function and a main loop or other thread - it will kill your
system if there is an access conflict.
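 
As a minimal sketch of the usual bare-metal alternative (not from the
original post - it assumes a single-core Cortex-M with the CMSIS
__disable_irq()/__enable_irq() intrinsics from the device header, and
the names are purely illustrative):
 
#include <cstdint>
// __disable_irq()/__enable_irq() come from the device's CMSIS header.
 
static volatile std::uint64_t g_ticks;   // written by the ISR, read by main()
 
extern "C" void SysTick_Handler(void)
{
    g_ticks = g_ticks + 1;               // only the ISR writes it
}
 
static std::uint64_t read_ticks(void)
{
    // A 64-bit load is two 32-bit loads on a Cortex-M, so briefly mask
    // interrupts; this assumes the caller runs with interrupts enabled.
    __disable_irq();
    std::uint64_t t = g_ticks;
    __enable_irq();
    return t;
}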
Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 13 08:42PM +0100

Am 13.02.21 um 15:56 schrieb David Brown:
>> value.
 
> Yes, that could perhaps be a way to handle things, but off the top of my
> head I can't see how to do this safely and generically.
 
This is quite easy. You only need two physical storage slots for the
data. One holds the current value, one the next value. When you
synchronize only the writers, they can safely swap the storage pointer.
No need to synchronize readers. Often that's enough.
 
Even more sophisticated solutions may use thread-local storage for the
values. Each thread has a current and a next storage slot. This removes
the need for the writer mutex. In the case of an IRQ handler (which
usually is not re-entrant), dedicated storage for the IRQ handler could
do the job.
 
If this is still not sufficient, because a high update rate could cause
a reader to get two versions behind, a stolen-bits hack in the referring
pointer may identify these situations. Readers need to increment the
master pointer atomically before using the storage it points to. Now
writers know that they should not discard or modify this storage.
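 
A minimal sketch of the basic two-slot variant (class and member names
are mine, purely for illustration); the caveat is exactly the one above -
if a reader can fall two versions behind, the writer may overwrite the
slot it is still reading, which is what the stolen-bits refinement
addresses:
 
#include <atomic>
#include <cstdint>
 
struct BigValue { std::uint64_t a; std::uint64_t b; };  // too wide for lock-free
 
class DoubleBuffered {
    BigValue slots_[2] {};
    std::atomic<BigValue*> current_ { &slots_[0] };     // pointer-sized, lock-free
public:
    // Writers must be serialized among themselves (single writer or a
    // writer-side mutex); readers need no synchronization at all.
    void publish(const BigValue& v) {
        BigValue* cur  = current_.load(std::memory_order_relaxed);
        BigValue* next = (cur == &slots_[0]) ? &slots_[1] : &slots_[0];
        *next = v;                                        // fill the spare slot
        current_.store(next, std::memory_order_release);  // swap the pointer
    }
    BigValue read() const {
        return *current_.load(std::memory_order_acquire);
    }
};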
 
 
>> If a generic atomic library does not guarantee forward progress when used
>> with different priorities it is not suitable for this case.
 
> All RTOS systems are sensitive to priority inversion,
 
Sure, but lock free algorithms are not. ;-)
 
> for even the simple situation of having a 64-bit value shared between an
> interrupt function and a main loop or other thread - it will kill your
> system if there is an access conflict.
 
Is it possible to raise the priority of all mutex users for the time of
the critical section? This will still abuse a time slice if the spin
lock does not explicitly call sleep. But at least it will not deadlock.
 
 
Marcel
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 13 01:24PM -0800

On 2/13/2021 6:56 AM, David Brown wrote:
>> no generic, satisfactory solutions.
 
> I think (correct me if I'm wrong) that every system will have a limit to
> the lock-free sizes they support.
 
You are correct. I remember a long time ago when CMPXCHG16B was first
introduced, it was _not_ guaranteed that every future 64-bit x86 would
support it. On a 64-bit system this instruction is the DWCAS, or
Double-Width Compare-And-Swap. IBM z arch has CDS for a DWCAS. Iirc, CDS
is guaranteed to be on all z-arch.
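 
A minimal sketch of what that looks like from portable C++ (whether it
is actually lock-free depends on the target and compiler flags, e.g.
-mcx16 with gcc/clang on x86-64; is_lock_free() tells you at run time):
 
#include <atomic>
#include <cstdint>
#include <iostream>
 
struct Pair { std::uint64_t lo; std::uint64_t hi; };   // 16 bytes, no padding
 
int main() {
    std::atomic<Pair> p{ Pair{0, 0} };
    std::cout << "lock-free: " << p.is_lock_free() << '\n';
 
    // The classic DWCAS use: bump one half, keep the other as a tag.
    Pair expected = p.load();
    Pair desired  = { expected.lo + 1, expected.hi };
    while (!p.compare_exchange_weak(expected, desired)) {
        desired = { expected.lo + 1, expected.hi };    // retry with fresh value
    }
}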
 
For fun.... Check this odd ball instruction out, David:
 
CMP8XCHG16 for the Itanium.
 
;^)
 
[...]
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 13 01:35PM -0800

On 2/13/2021 11:42 AM, Marcel Mueller wrote:
 
> Is it possible to raise the priority of all mutex users for the time of
> the critical section? This will still abuse a time slice if the spin
> lock does not explicitly call sleep. But at least it will not deadlock.
 
 
Iirc, priority inheritance can sort of "reduce" priority inversion. Have
binary semaphores on the mind.
 
https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutexattr_getprioceiling.html
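 
A minimal sketch of what that page describes (POSIX only, and only where
the implementation supports the priority-protect / priority-inherit
protocols):
 
#include <pthread.h>
 
// Initialise 'm' as a priority-ceiling mutex: any thread holding it runs
// at least at 'ceiling_prio'.  PTHREAD_PRIO_INHERIT selects the
// priority-inheritance variant instead.
void init_ceiling_mutex(pthread_mutex_t* m, int ceiling_prio)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
    pthread_mutexattr_setprioceiling(&attr, ceiling_prio);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}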
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 13 01:36PM -0800

On 2/13/2021 4:51 AM, Bonita Montero wrote:
>> - Raw spin locks make no sense on /any/ single core.
 
> Raw spinlocks almost never make sense in userland.
 
True.
mickspud@potatofield.co.uk: Feb 13 10:34AM

On Fri, 12 Feb 2021 18:18:59 GMT
 
>>>wouldn't know this of course as you are fucking clueless anachronism.
 
>>Oh dear, someone tell the child about copy-on-write.
 
>copy-on-write and overcommit are two orthogonal concepts.
 
They're really not.
mickspud@potatofield.co.uk: Feb 13 10:38AM

On Fri, 12 Feb 2021 13:08:32 -0800
 
>> as fundamental as multiplexing network sockets without threading then I guess
 
>> you may well be screwed without them.
 
>Are you familiar with IOCP?
 
Nope, but having skim-read about it I don't see anything there that multi-
process couldn't accomplish. If you know otherwise then feel free to explain.
mickspud@potatofield.co.uk: Feb 13 10:42AM

On Fri, 12 Feb 2021 22:22:41 +0000
>>> On 12/02/2021 16:20, mickspud@potatofield.co.uk wrote:
 
>> Still waiting for your proof. Take your time.
 
>Proof of what, dear?
 
Is the thread getting too complex for you to follow? Go have a lie down.
 
 
>The problem, dear, is when lots of processes start "copy-on-writing" causing
>over-committed pages to be allocated you run out of memory and Linux starts
>killing random processes. This is the omnishambles, dear.
 
Well as someone else said, you can switch overcommit off if it's an issue and
at least *nix does it page by page rather than the MS option of "Let's load
an entirely new copy of an exe into memory and start executing it from the
beginning with little parental state". Very useful. The irony of course is
that the Windows kernel can do fork but for [reasons] MS doesn't expose it
in the Win32 API.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 13 12:52PM -0800


>> Are you familiar with IOCP?
 
> Nope, but having skim read about it I don't see anything there that multi
> process couldn't accomplish. If you know otherwise then feel free to explain.
 
IOCP is basically Windows' version of POSIX aio. Just make sure never to
create a process and/or thread per connection. That's really, really bad.
David Brown <david.brown@hesbynett.no>: Feb 13 02:57PM +0100

On 12/02/2021 17:15, Manfred wrote:
 
> "The alignment of a member is on a boundary that's either a multiple of
> n, or a multiple of the size of the member, whichever is smaller."
 
> So, this compiler /does/ add padding between members depending on their order.
 
As I said, the /padding/ between members in a struct does depend on
their order. Their /alignment/ does not. That applies to MSVC, gcc,
and any other compiler.
 
If you have:
 
struct S {
    char a;
    int b;
    char c;
};
 
where "int" is size 4 and alignment 4, then S will have an alignment of
4, there will be 3 bytes of padding between "a" and "b", and three bytes
of padding after "c" - giving a total size of 12.
 
If you arrange it as:
 
struct S {
    char a;
    char c;
    int b;
};
 
then there will be 2 bytes of padding after "c", and a total size of 8
(with the same alignment of 4).
 
 
If you arrange it as:
 
struct S {
    int b;
    char a;
    char c;
};
 
then there will be 2 bytes of padding after "c" at the end of the
structure, and a total size of 8 (with the same alignment of 4).
 
Padding depends on the order, alignment does not.
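 
A quick way to check that on any given implementation (a sketch, assuming
sizeof(int) == alignof(int) == 4 as above):
 
#include <cstddef>
 
struct S1 { char a; int b; char c; };   // 1 + 3 pad + 4 + 1 + 3 pad = 12
struct S2 { char a; char c; int b; };   // 1 + 1 + 2 pad + 4         = 8
 
static_assert(alignof(S1) == alignof(int), "alignment is order-independent");
static_assert(alignof(S2) == alignof(int), "alignment is order-independent");
static_assert(offsetof(S1, b) == 4, "3 bytes of padding after S1::a");
static_assert(sizeof(S1) == 12, "3 bytes of tail padding after S1::c");
static_assert(sizeof(S2) == 8, "only 2 bytes of padding before S2::b");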
 
 
Now, compilers are free to /increase/ alignments if they want, and add
padding as needed to support that. It's not uncommon on 64-bit systems
to use 8-byte alignment on structures, and data on stacks or in
statically allocated memory can be given extra alignment - this can aid
cache-friendly memory layouts. And of course compilers can offer
options or extensions to give different alignment (and thereby padding)
arrangements, even if that breaks the platform's ABI.
mickspud@potatofield.co.uk: Feb 13 04:39PM

On Sat, 13 Feb 2021 14:57:19 +0100
>cache friendly memory layouts. And of course compilers can offer
>options or extensions to give different alignment (and thereby padding)
>arrangements, even if that breaks the platform's ABI.
 
#pragma pack is absolutely essential if you're mapping a structure directly
to a memory block, or serialising a block into a structure such as a network
packet header/data, or in a device driver, so you can let the compiler know
exactly how you want padding, if any.
David Brown <david.brown@hesbynett.no>: Feb 13 06:19PM +0100

> to a memory block or serialising the block into a structure such as network
> packet header/data or in a device driver so you let the compiler know exactly
> how you want padding if any.
 
No, it is not essential. It can be convenient, but it is far from the
only way to achieve the padding you want or to handle externally defined
structures (network packets, file formats, hardware registers, etc.).
I've seen it misused much more often than I've seen it used appropriately.
 
When you need to control padding, I usually find it cleaner and more
reliable (as well as more portable) to put the padding in explicitly,
combined with static assertions to confirm that everything is the right
size. Non-aligned data can be read or written using memcpy(), and good
compilers will optimise the results at least as efficiently as accessing
packed structures.
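 
A sketch of that approach, with a header layout invented purely for
illustration (not taken from any real protocol):
 
#include <cstdint>
#include <cstring>
 
struct PacketHeader {
    std::uint8_t  version;
    std::uint8_t  flags;
    std::uint8_t  reserved[2];   // explicit padding, where the wire format has it
    std::uint32_t sequence;
};
static_assert(sizeof(PacketHeader) == 8, "unexpected padding in PacketHeader");
 
// Read a header from a possibly misaligned, byte-oriented buffer; good
// compilers turn this memcpy into plain loads.
PacketHeader read_header(const unsigned char* buf)
{
    PacketHeader h;
    std::memcpy(&h, buf, sizeof h);
    return h;
}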
 
Some older or poorer compilers can't do a decent job of optimising
memcpy(), and there some kind of compiler-specific "pack" solution can
be helpful.
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 13 12:54PM

On Tue, 2021-02-09, Chris M. Thomasson wrote:
>>         std::thread writers[ct_writer_threads_n];
 
>>         std::cout << "Booting threads...\n";
 
> std::cout.flush();
...
 
> Some people are telling me that they cannot see the output until the
> program is finished, hence adding std::cout.flush() helps here.
 
Don't those people have a broken environment? If you write a full
line of text to std::cout, and std::cout is a terminal, surely that
line should be printed (leave the process) when operator<< returns?
I.e. std::cout should be in line buffered mode.
 
At least on Unix (where the whole output/error stream thing comes
from).
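 
A trivial sketch of forcing the line out regardless of buffering mode
(e.g. when stdout goes to a pipe rather than a terminal):
 
#include <iostream>
 
int main() {
    std::cout << "Booting threads...\n" << std::flush;   // or std::endl
}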
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 12 04:28PM -0800

On 2/11/2021 11:55 AM, Alf P. Steinbach wrote:
 
>> He is hyper smart... Big time.
 
> And a free book! Great! Except... I'm not so much into shared memory
> parallel programming, but others here may be.
 
Been into it for a long time now. Went through periods of pause, when I
got into fractals. Remember a long time ago when creating externally
assembled functions was safer. Now we have C++11... What a JOY!
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 12 04:31PM -0800

On 2/12/2021 4:28 PM, Chris M. Thomasson wrote:
 
> Been into it for a long time now. Went through periods of pause, when I
> got into fractals. Remember a long time ago when creating externally
> assembled functions was safer. Now we have C++11... What a JOY!
 
Going into the fractal world is akin to entering a large realm of
"embarrassingly" parallel algorithms. Example:
 
https://www.shadertoy.com/view/ltycRz
 
So, I really dove into GPUs and shaders. However, there can be cases and
fractals that are not so "embarrassingly" parallel...
