- Here is my new variants of Scalable RWLocks that are powerful.. - 1 Update
- Performance of unaligned memory-accesses - 19 Updates
- In the end, rason will come - 1 Update
- In the end, rason will come - 3 Updates
- Why can't I understand what coroutines are? - 1 Update
aminer68@gmail.com: Aug 07 11:18AM -0700

Hello,

Here are my new variants of Scalable RWLocks, and they are powerful.

Author: Amine Moulay Ramdane

Description: A fast, scalable, starvation-free, fair and lightweight Multiple-Readers-Exclusive-Writer Lock called LW_RWLockX (the scalable LW_RWLockX does spin-wait), and a fast, scalable, starvation-free and fair Multiple-Readers-Exclusive-Writer Lock called RWLockX. The scalable RWLockX doesn't spin-wait but uses my portable SemaMonitor and portable event objects, so it is energy efficient.

The parameter of the constructors is the size of the array of readers: if the size of the array is equal to the number of parallel readers, the lock will be scalable, but if the number of readers is greater than the size of the array, you will start to have contention. Please look at the source code of my scalable algorithms to understand.

I have used the following hash function to make my new variants of RWLocks scalable:

---
function DJB2aHash(key: int64): uint64;
var
  i: integer;
  key1: uint64;
begin
  Result := 5381;
  for i := 1 to 8 do
  begin
    key1 := (key shr ((i-1)*8)) and $00000000000000ff;
    Result := ((Result shl 5) xor Result) xor key1;
  end;
end;
---

You can download them from:

https://sites.google.com/site/scalable68/new-variants-of-scalable-rwlocks

Thank you,
Amine Moulay Ramdane. |
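[Editorial aside: for readers who want a concrete picture of the general technique being described (hashing the thread id into a per-reader slot so that readers on different cores touch different cache lines), below is a minimal, hypothetical C++ sketch of that "distributed reader" pattern. It is NOT the author's code, which is at the link above; all names and details are illustrative, and it uses plain sequentially-consistent atomics rather than the tuned spin-wait/SemaMonitor variants described in the post.]

#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical sketch: each reader bumps a counter in its own
// cache-line-sized slot (chosen by hashing the thread id), so
// concurrent readers rarely share a cache line. A writer raises a
// flag and then waits for every slot to drain.
class DistributedRWLock {
    static constexpr std::size_t kSlots = 16;          // "size of the array of readers"
    struct alignas(64) Slot { std::atomic<int> readers{0}; };

    std::array<Slot, kSlots> slots_{};
    std::atomic<bool> writer_{false};

    std::size_t my_slot() const {
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kSlots;
    }

public:
    void lock_shared() {
        Slot& s = slots_[my_slot()];
        for (;;) {
            s.readers.fetch_add(1);             // announce the reader first
            if (!writer_.load()) return;        // no writer pending: we hold the lock
            s.readers.fetch_sub(1);             // a writer is active: back off
            while (writer_.load()) std::this_thread::yield();
        }
    }
    void unlock_shared() { slots_[my_slot()].readers.fetch_sub(1); }

    void lock() {                               // exclusive writer
        bool expected = false;
        while (!writer_.compare_exchange_weak(expected, true)) {
            expected = false;
            std::this_thread::yield();
        }
        for (auto& s : slots_)                  // wait for in-flight readers to drain
            while (s.readers.load() != 0) std::this_thread::yield();
    }
    void unlock() { writer_.store(false); }
};

If the number of concurrent reader threads stays at or below kSlots, readers mostly touch disjoint cache lines; with more readers than slots, two readers can hash to the same slot and contend, which matches the contention behaviour the post describes.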
Jorgen Grahn <grahn+nntp@snipabacken.se>: Aug 07 12:37PM

On Wed, 2019-08-07, Bonita Montero wrote:
> I just wrote a little test that checks the performance of unaligned
> memory-accesses on x86 / Win32. I've run this code on my Ryzen 1800X:

What's the point of the exercise, in a C++ context? Unaligned access in portable code is always the result of a programming error.

/Jorgen

--
// Jorgen Grahn <grahn@  Oo  o.   .  .
\X/   snipabacken.se>   O  o   . |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 02:47PM +0200

> What's the point of the exercise, in a C++ context? Unaligned access
> in portable code is always the result of a programming error.

Pure theory. All platforms support unaligned access either directly through the CPU or through trapping by the operating system (very slow). |
David Brown <david.brown@hesbynett.no>: Aug 07 03:57PM +0200

On 07/08/2019 14:08, Bonita Montero wrote:
> ... is not what you'd like to tell. You wanted to tell that the
> operator is compiled in a way that the shifts and loads are bundled
> in a single load. So I misunderstood you.

I wrote what I intended to write, but you misunderstood. (That happens, sometimes, especially when you have to work with a second language. It's no problem.)

The compiler turns the shift-and-or code into optimised code using an unaligned access. I was surprised that MSVC could not do this optimisation - that compiler is often quite good at optimisations. |
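[Editorial aside: the "shift-and-or code" being discussed is the usual byte-by-byte way of reading a multi-byte value. A minimal example of my own (not the code from the thread) is below. GCC and Clang typically recognise this pattern and emit a single 32-bit load on targets where unaligned loads are safe; per the post above, MSVC reportedly does not.]

#include <cstdint>

// Read a 32-bit little-endian value one byte at a time. No unaligned
// pointer is ever dereferenced, so the behaviour is defined on every
// platform; a good optimiser can still fuse it into a single load.
std::uint32_t load_le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}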
David Brown <david.brown@hesbynett.no>: Aug 07 04:18PM +0200

On 07/08/2019 14:47, Bonita Montero wrote:
>> in portable code is always the result of a programming error.
> Pure theory. All platforms support unaligned access either directly
> through the CPU or through trapping by the operating system (very slow).

No, they don't. Some cpus support direct unaligned accesses. For others, various things could happen. On big OS's, you are likely to get a trap or exception causing the OS to kill your program with a fault - I can't imagine why an OS would bother simulating the unaligned access. On embedded systems, unaligned access may lead to a bus fault of some sort, halting the system or causing a restart. And on some systems that I have used, unaligned access will silently give you muddled reads and corrupting writes.

Unaligned access is always an error. Use code with shifts and masks, if you need it, or use memcpy. If your compiler isn't good enough to give you efficient enough code for your needs, get a better compiler. |
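[Editorial aside: the memcpy alternative mentioned here is worth showing explicitly; a small sketch of my own (not from the thread) follows. Mainstream compilers turn a fixed-size memcpy like this into a single load on targets that allow unaligned access, and fall back to byte copies elsewhere.]

#include <cstdint>
#include <cstring>

// Portable "unaligned read": copy the bytes into a properly aligned
// local variable and let the compiler pick the best instructions.
// The result is in the host's byte order.
std::uint32_t load_u32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}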
scott@slp53.sl.home (Scott Lurndal): Aug 07 02:46PM

> Try using unaligned addresses with several threads. Try doing a LOCK
> XADD on a location that straddles two cache lines, and is not aligned on
> a line, vs one that is aligned on a cache line, and properly padded.

Processor vendors work hard so that most unaligned accesses don't add significant additional latencies to the instructions. Our ARM64 processor generally has no perf difference between aligned and unaligned to DRAM (unaligned isn't supported to device memory).

Locked transactions on intel systems that straddle cache lines need to assert a system bus lock, which causes extreme performance degradation, particularly in NUMA systems. Don't do that. |
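[Editorial aside: to make the last point concrete, here is a small illustration of my own (64-byte cache lines are assumed) of keeping an atomic counter inside one cache line, so that a fetch_add - a LOCK XADD on x86 - never straddles two lines and never needs a system bus lock.]

#include <atomic>
#include <cstdint>

// Align (and implicitly pad) the counter to a 64-byte cache line so a
// locked read-modify-write always stays within a single line.
struct alignas(64) Counter {
    std::atomic<std::uint64_t> value{0};
};

void bump(Counter& c) {
    c.value.fetch_add(1, std::memory_order_relaxed);
}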
scott@slp53.sl.home (Scott Lurndal): Aug 07 02:47PM

> for performantly modifying data structures for persistence or
> transmission over the network. So this is clearly a unique
> advantage of the Intel-Architecture.

No, it's not unique. See AArch64. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 04:53PM +0200

For portability reasons almost any system tries to be compatible with x86 systems for unaligned accesses. But on some exotic systems which don't run a common operating system, or none at all, you might be right. Unaligned accesses are simply useful for data structures which are sent over the network or persisted on disk, to save the padding bytes. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 04:55PM +0200

>> transmission over the network. So this is clearly a unique
>> advantage of the Intel-Architecture.
> No, it's not unique. See AArch64.

Ok, except for atomicity: i.e. loads / stores aren't atomic and atomic RMW-instructions will always fault. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 03:11PM

>> No, it's not unique. See AArch64.
> Ok, except for atomicity: i.e. loads / stores aren't atomic and
> atomic RMW-instructions will always fault.

B2.2: Atomicity is a feature of memory accesses, described as atomic accesses. The Arm architecture description refers to two types of atomicity, single-copy atomicity and multi-copy atomicity. In the Armv8 architecture, the atomicity requirements for memory accesses depend on the memory type, and whether the access is explicit or implicit. For more information, see: B2.2.1 Requirements for single-copy atomicity.

If ARMv8.4-LSE is implemented, all loads and stores are single-copy atomic when the following conditions are true:

· Accesses are unaligned to their data size but are aligned within a 16-byte quantity that is aligned to 16 bytes.
· Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.

Otherwise it is IMPLEMENTATION DEFINED whether loads and stores are single-copy atomic. |
David Brown <david.brown@hesbynett.no>: Aug 07 05:30PM +0200

/Please/ learn to use Usenet properly! Keep attributions, and quote an appropriate amount of context!

On 07/08/2019 16:53, Bonita Montero wrote:
> For portability reasons almost any system tries to be compatible with
> x86 systems for unaligned accesses.

Total nonsense. There is rarely any reason for wanting unaligned access, and no justification for using it in code that should be portable. (An implementation can use it under the hood, when implementing memcpy, or for the kind of optimisations I showed gcc doing. But you don't write unaligned accesses in the source code.)

Other cpu designs do not attempt to copy the x86. The great majority of code that is written that is reliant on a processor working like an x86 is written for Windows, and does not need to be portable to anything other than x86. The rest of the programming world is mostly either aimed at reasonable portability, such as across different *nix systems, or targeted at smaller embedded systems. For portable code, you don't care about unaligned accesses because you don't use them in the source code - you only care that the implementation handles your source code efficiently.

Many cpus implement unaligned accesses - because the designers think the balance between use and cost makes it appropriate. It is /not/ for compatibility with x86. And for processors that don't support unaligned access in hardware, no one would bother supporting it by software emulation except if it were required for /binary/ compatibility with other processors in the same family. I don't know of any systems where that applies.

> But on some exotic systems which
> don't run a common operating system or none at all you might be right.

These "exotic" systems far and away outnumber the PC's of this world. The programming world does not revolve around x86.

> Unaligned accesses are simply useful for data structures which are
> sent over the network or persisted on disk to save the padding bytes.

Nonsense. Proper programming is useful for data structures that are sent over the network or are stored on files. Use portable coding, or implementation-dependent coding (like "packed" structs), and let the compiler use whatever instructions are supported and most efficient for the platform. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 05:39PM +0200

> There is rarely any reason for wanting unaligned access, and
> no justification for using it in code that should be portable.

I didn't say that it's always good style, but a lot of code relies on it, so most platforms support it through the CPU or the OS. So the code formerly written for x86 will also run without changes.

> Many cpus implement unaligned accesses - because the designers think
> the balance between use and cost makes it appropriate. It is /not/
> for compatibility with x86.

I'll bet no one would have implemented this on other CPUs if there weren't a lot of code written on x86 machines that relies on this.

> These "exotic" systems far and away outnumber the PC's of this world.

Most of these trap unaligned accesses through the OS.

> implementation-dependent coding (like "packed" structs), and let the
> compiler use whatever instructions are supported and most efficient
> for the platform.

That's your taste of proper programming, but not the facts. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 05:41PM +0200

>> atomic RMW-instructions will always fault.
> B2.2:
> ...

I mis-worded what I wanted to tell: I simply wanted to say that _unaligned_ loads / stores are not atomic on ARM. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 03:58PM

>> ...
> I mis-worded what I wanted to tell: I simply wanted to say that
> _unaligned_ loads / stores are not atomic on ARM.

But they are atomic on ARMv8.4. Which is what the text you elided showed. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 04:07PM

> /Please/ learn to use Usenet properly! Keep attributions, and quote an
> appropriate amount of context!

Good luck with that, it's been tried before.

>> For portability reasons almost any system tries to be compatible with
>> x86 systems for unaligned accesses.
> Other cpu designs do not attempt to copy the x86.

Certainly not the instruction set, unless you consider AMD, Cyrix, Nat Semi, Harris, IBM, TI or Transmeta :-)

On the other hand, any processor vendor attempting to make a competing server processor will have to accommodate the standard programming methods used on Intel processors if they want to gain any market share, since most of the software would be ported from x86. That means things like unaligned accesses and providing something that looks like the intel strongly (program) ordered memory model are high on the desirable feature list.

AArch64 was specifically designed to be a competing server processor and thus supports unaligned accesses. The memory model is a bit weaker but generally provides program ordering; a small percentage of software ported from x86 may require some changes (unless it uses the appropriate C11 or C++14 capabilities).

> Many cpus implement unaligned accesses - because the designers think the
> balance between use and cost makes it appropriate. It is /not/ for
> compatibility with x86.

For ARMv8 it was _specifically_ for compatibility with X86(_64). |
David Brown <david.brown@hesbynett.no>: Aug 07 06:12PM +0200

On 07/08/2019 17:39, Bonita Montero wrote:
>> no justification for using it in code that should be portable.
> I didn't say that it's always good style, but a lot of code relies
> on it, so most platforms support it through the CPU or the OS.

Any code that relies on this is very badly written. It may be that it is common in the Windows world, where it is clear that many people have quite a poor knowledge of legal C and C++, and little concept or interest in writing clear, safe, and portable code. And it may be that MSVC, knowing that its users are often unaware of the details of their programming languages, is dumbed down to support such broken code. Attempts to use unaligned access on other compilers may fail in unexpected ways due to undefined behaviour.

> So the code formerly written for x86 will also run without changes.

Again, nonsense. The x86 is quite a "programmer friendly" ISA. It supports unaligned accesses, it has a strong memory model, it has support for many types of atomic operations. People do write code that is dependent on these features, and also dependent on compilers that have extra semantics to support non-portable coding (such as guaranteeing wrapping on signed integer overflow). Code that is written "assuming an x86 processor" will often have strange breakages on other platforms and other compilers - because the code is not portable C or C++.

Other processors do not copy these x86 features. These kinds of features can often be very expensive to implement (in terms of die size, power consumption, speed, etc.) and only exist in the x86 world because of backwards compatibility with the kind of badly written code that exists in the Windows world.

>> for compatibility with x86.
> I'll bet no one would have implemented this on other CPUs if there
> weren't a lot of code written on x86 machines that relies on this.

Bet whatever you like. But don't quit your day job.

>> These "exotic" systems far and away outnumber the PC's of this world.
> Most of these trap unaligned accesses through the OS.

Can you give any kind of a reference for even a single case where you know the OS will trap unaligned accesses and emulate them in software? If not, then I think we can dispense with the fantasy that OS's provide support for unaligned access when the cpu does not.

>> compiler use whatever instructions are supported and most efficient
>> for the platform.
> That's your taste of proper programming, but not the facts.

Quote the section in the C++ standards that says unaligned access is allowed in C++, and I'll believe you. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 06:33PM +0200

> Any code that relies on this is very badly written.

Depends on which platforms you target.

> It may be that it is common in the Windows world, where it is clear
> that many people have quite a poor knowledge of legal C and C++,

Unaligned accesses do not come from straight coding. You must code with special alignment directives or with pointer casting. So the developers that use unaligned accesses know what they're doing and they know the target platforms.

> integer overflow). Code that is written "assuming an x86 processor"
> will often have strange breakages on other platforms and other compilers
> - because the code is not portable C or C++.

Maybe it will break because of other features; but the unaligned accesses themselves are mostly de-facto portable to the target platforms.

> Can you give any kind of a reference for even a single case where you
> know the OS will trap unaligned accesses and emulate them in software?

It's just a tiny task to support this in an OS and it helps to run a lot of old code; so this is very likely.

>> That's your taste of proper programming, but not the facts.
> Quote the section in the C++ standards that says unaligned access is
> allowed in C++, and I'll believe you.

My statement was related to your taste of proper programming and not to the standard. You're simply one of those compulsive and intolerant programmers. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 06:38PM +0200

> provides program ordering; a small percentage of software ported from x86
> may require some changes (unless it uses the appropriate C11 or C++14
> capabilities).

I don't think that ARM is considering AArch64 implementations as a server competitor. It's just convenient to have unaligned loads / stores for persistence and network transfers. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 05:34PM

>> capabilities).
> I don't think that ARM is considering AArch64 implementations as a
> server competitor.

It doesn't matter what you think. You can't even be troubled to properly attribute your posts.

AArch64 was specifically designed as a server-capable processor. I was there. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 08:07PM +0200

> properly attribute your posts.
> AArch64 was specifically designed as a server-capable processor.
> I was there.

All attempts to establish AArch64-based machines as servers have failed, e.g. this one: https://en.wikipedia.org/wiki/Calxeda

People simply want x86 and in rare cases SPARC or POWER. And I doubt that AArch64 was designed mainly with servers in mind. A 64-bit address space has advantages even on smartphones with <= 4GB RAM. |
ram@zedat.fu-berlin.de (Stefan Ram): Aug 07 04:52PM

> x = a/b
> versus
> x = a//b

In VBA it's

x = a/b

versus

x = a\b

which has the elegance of using the same symbols as used in mathematics. (A backslash-like symbol, in mathematics, is used for set difference, quotient group, and integral division.) |
Manfred <noname@add.invalid>: Aug 07 04:05PM +0200

Premise: I agree with you on most of your arguments about overflow.

I also note that the authors do explicitly define wrapping for signed integers (section 2, [basic.fundamental]), but also explicitly leave out of scope (section 3) /compiler/ behaviour with wrapping on signed integers, which makes the proposal inherently flawed, I believe. It sounds like they know there is a problem with the matter, yet they want signed integer wrapping as defined behaviour, but do not want to address the consequences of this decision.

On 8/6/2019 2:32 PM, David Brown wrote:
> /types/ for which overflow is defined (unsigned types) and types for
> which it is not defined. But overflow behaviour should be part of the
> operations, not the types.

Here I tend to disagree, or at least I am fine with the behaviour being attached to the types. You are probably recalling that in ASM overflow is part of the instruction, but I think this is because the ASM type system is obviously much more rudimentary than in higher level languages.

Programming languages are meant to translate human logic into machine instructions, and we are used to operations that behave differently depending on the type they are performed on - see e.g. addition on real and complex numbers. From this perspective, in binary arithmetic it does make sense that addition behaves differently for signed and unsigned integers.

On the other hand, having some sort of "+" and "ǂ" would complicate the syntax, and be overly redundant too: you would have to specify the behavior of non-wrapping addition on unsigned integers, and wrapping addition on signed integers as well - this would bring more confusion than help, IMHO. |
David Brown <david.brown@hesbynett.no>: Aug 07 05:00PM +0200

On 07/08/2019 16:05, Manfred wrote:
> It sounds like they know there is a problem with the matter, yet they
> want signed integer wrapping as defined behaviour, but do not want to
> address the consequences of this decision.

Yes. But the link was to the first draft of the proposal - I made another post with a link to a later version, where the idea of defining signed overflow is dropped.

>> operations, not the types.
> Here I tend to disagree, or at least I am fine with the behaviour being
> attached to the types.

I think it would be impractical, in general, to have overflow behaviour attached to operations rather than types - but it is the operations that have the behaviour. It would be entirely possible to put together classes and some operator overloads that would let you write things like:

int a, b, c;
c = a +wrapping+ b;
c = a -saturating- b;

and so on. But I suspect people would find that too verbose for most uses. Hence we have the current solution.

> You are probably recalling that in ASM overflow is part of the
> instruction, but I think this is because the ASM type system is
> obviously much more rudimentary than in higher level languages.

It was not what I was thinking of, no. (There are several reasons for assembly arithmetic operations working the way they do and having the flags they do, at least on some cpus.)

> and complex numbers.
> From this perspective, in binary arithmetic it does make sense that
> addition behaves differently for signed and unsigned integers.

Certainly some aspects of behaviour have to depend on the operand types and the result types. But the behaviour is not fully defined by them. In mathematics, when you divide two integers you can decide if you want the result rounded/truncated to an integer, or expressed as a rational, or perhaps as a real number. It is the operation that determines this, not the operand types. When you have two unsigned integers and subtract them, you could decide the result should be a signed integer rather than an unsigned integer - it is the operation that determines it.

Your choice of wrapping, saturating, trapping, ignoring overflow, etc., is a matter of the operation, independent of the types. For practical reasons (which I am mostly happy with), C says that when the operands are unsigned types (after integer promotion) the operation is carried out as a wrapping operation, while for signed types (after promotion), overflow is UB.

> behavior of non-wrapping addition on unsigned integers, and wrapping
> addition on signed integers as well - this would bring more confusion
> than help, IMHO.

I agree that practicality forces the language to use types to determine the operations you get from +, -, etc. But the overflow behaviour is part of the operation, not the type. |
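[Editorial aside: the `a +wrapping+ b` idea above can actually be approximated in today's C++ with a tag type and two operator overloads. The sketch below is purely illustrative (the names wrapping and wrap_lhs are mine, not from any proposal): it performs the addition in the corresponding unsigned type, which wraps modulo 2^N, then converts back.]

#include <limits>
#include <type_traits>

// Tag object that appears between the two '+' signs.
struct wrapping_t {};
inline constexpr wrapping_t wrapping{};

// Holds the left operand after `a + wrapping`.
template <typename T>
struct wrap_lhs { T lhs; };

template <typename T>
constexpr wrap_lhs<T> operator+(T lhs, wrapping_t) { return {lhs}; }

template <typename T>
constexpr T operator+(wrap_lhs<T> l, T rhs) {
    using U = std::make_unsigned_t<T>;
    // Unsigned arithmetic wraps modulo 2^N; converting back to T is
    // well defined (two's complement) since C++20 and
    // implementation-defined before that.
    return static_cast<T>(static_cast<U>(l.lhs) + static_cast<U>(rhs));
}

int main() {
    int a = std::numeric_limits<int>::max(), b = 1;
    int c = a + wrapping + b;   // wraps to INT_MIN instead of undefined behaviour
    return c == std::numeric_limits<int>::min() ? 0 : 1;
}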
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 07 06:39PM +0200 On 07.08.2019 17:00, David Brown wrote: > I think it would be unpractical, in general, to have overflow behaviour > attached to operations rather than types - but it is the operations that > have the behaviour. Consider in Python, x = a/b versus x = a//b ... where the former is always, reliably, floating point division, and the latter is always, reliably, integer division. IMO that's nice. The C++ way with operator behavior influenced by types just trips up people. > c = a +wrapping+ b; > c = a -saturating- b; > and so on. For this I would consider the C# `checked` keyword. <url: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/checked> There is AFAIK nothing corresponding to saturation, but by default integer arithmetic is modulo. Using the `checked` keyword like checked { c = a + b; } ... one specifies overflow checking, with a well defined exception, namely `System.OverflowException`, in case of overflow. The default in C++ is instead UB for overflow. So in C++ one could also have keyword `wrapping`. I.e, keywords/contexts `wrapping` and `checked`, plus maybe `unchecked` for the case where someone uses a compiler option to change default to `wrapping` or `checked`, but one really wants the possible optimization and efficiency of guaranteed UB. > But I suspect people would find that too verbose for most uses. Yeah, but it's like the old proof that doors are practically impossible, by envisioning one particular door that's obviously very impractical. For doors there is existence proof that they're not practically impossible. And ditto for type-independent reliable operator behavior, in particular the C# approach. > Hence we have the current solution. No, for sure. But more my opinion: it's more historical, that some decades ago the optimization possibilities one could give the compiler for stuff like this, did matter. Today there is existence proof, in particular of Java outperforming C++ in certain cases, that it not only does not matter but can be a directly counter-productive approach. [snip more] Cheers!, - Alf |
scott@slp53.sl.home (Scott Lurndal): Aug 07 02:41PM

>> obtain, which is likely to require use of the CPU. So why not just use
>> threads without AIO?
> It is better to use either async or blocking I/O but not to mix.

For the same file descriptor or for the program? Why? Use whatever is necessary to provide the required functionality and performance.

> difference there is if such translation from blocking to async I/O
> was made by glibc or ourselves (other than less work for
> ourselves when glibc did it)?

What's wrong with using blocking I/O for some files and async I/O for others? (Or even both on the same file, for that matter.) |