Digest for comp.lang.c++@googlegroups.com

Performance of unaligned memory-accesses - 23 Updates
Why can't I understand what coroutines are? - 1 Update
cmsg cancel <2cd1a63d-ee8d-4a48-b810-18c7983af023@googlegroups.com> - 1 Update

Performance of unaligned memory-accesses

"Chris M. Thomasson" <invalid_chris_thomasson_invalid@example.com>: Aug 11 04:37PM -0700

On 8/11/2019 8:26 AM, David Brown wrote:
> <https://godbolt.org> doesn't have it.

> If someone could compile this code, and also show the instructions used
> by the compiler for accessing obj.x and obj.z, that would be nice.

DAMN! I won a Sun Fire T2000 in the CoolThreads contest a while back,
around 2005 iirc, but sold it for $2000 several years after winning
it! I will never forgive myself for this. It sounded like a damn vacuum
cleaner.

https://www.webwire.com/ViewPressRel.asp?aId=6542

All of the Sun links are destroyed.

It was for my experimental vZoom project. Sun got purchased by Oracle
right around the same time. Actually, I was a finalist. Each one was
given a T2000.

"Chris M. Thomasson" <invalid_chris_thomasson_invalid@example.com>: Aug 11 04:42PM -0700

On 8/11/2019 4:37 PM, Chris M. Thomasson wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>> On 11/08/2019 00:59, Ian Collins wrote:
>>>>> On 11/08/2019 09:32, David Brown wrote:
[...]
> around 2005 iirc, but sold it for $2000 several years after winning
> it! I will never forgive myself for this. It sounded like a damn vacuum
> cleaner.
[...]

Here is an old post where I mention it:

http://boost.2283326.n4.nabble.com/lockfree-fifo-Review-td2648279i40.html

David Brown <david.brown@hesbynett.no>: Aug 11 11:59PM +0200

On 11/08/2019 22:11, Keith Thompson wrote:
>> unaligned pointers - but if it does not, then you have to assume that
>> they are simply not allowed.

> (Did you mean to say "C++ standard"?)

I had been quoting from the C standards, because I am more familiar with
them. James showed similar quotations from the C++ standards.

> the problem, and the run-time behavior *can* be exactly what someone
> might naively expect.

> I suggest sticking to the terms defined by the standard itself.

Fair enough. I have been a little too colloquial in my language. Yes,
I did mean "undefined behaviour". And that means the standards say
nothing about the results - there are no guarantees of getting the
effects you want (unless the compiler documentation gives additional
information), but equally nothing to say that you /won't/ get the
results you want.

Keith Thompson <kst-u@mib.org>: Aug 11 05:07PM -0700

> When done properly, then I should be able to comment out the pack(1),
> and it ought to still work, and with the same sizes of structs, and the
> same member offsets.

Then there's no point in using pack(1), is there? But I asked
specifically about a case where it causes a member to be aligned
differently than it would without the pack(1).

> But pack(1) can also be used to define a particular layout, in instances
> where I don't need to access individual misaligned members, just to get
> the right overall size.

If you have
struct foo { char c; int i; }
(assume int is 4 bytes), and #pragma pack(1) gives the struct a size of
5 bytes, then an array of struct foo will result in misaligned members.

#pragma pack(1)
struct foo { char c; int i; } array[2];

One of array[0].i and array[1].i will be at an odd address.
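The layout described above can be checked with a minimal sketch, assuming a compiler that honours #pragma pack and a 4-byte int. The names PackedFoo and count_odd_i_addresses are made up for illustration; the address of each i member is computed from the element address plus offsetof, so no pointer to a misaligned int is ever formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the compiler supports #pragma pack (GCC, Clang and MSVC do).
#pragma pack(push, 1)
struct PackedFoo { char c; int i; };
#pragma pack(pop)

// Counts how many of the two `i` members of a two-element array
// start at an odd address.
inline int count_odd_i_addresses(const PackedFoo (&arr)[2]) {
    int odd = 0;
    for (const PackedFoo& e : arr) {
        // Element address plus member offset; avoids taking &e.i directly.
        auto addr = reinterpret_cast<std::uintptr_t>(&e) + offsetof(PackedFoo, i);
        odd += static_cast<int>(addr & 1u);
    }
    return odd;
}
```

With a 5-byte element and i at offset 1, the two i members sit 5 bytes apart, so their addresses differ in parity and the function should report exactly one odd address whatever the array's base address is.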

> More difficult is supporting pack(1) when a machine doesn't have
> byte-addressability, but I don't know if that is common these days.

It depends on what you mean by "byte-addressability". Almost all
machines can access bytes at arbitrary addresses. The question is
whether they can access words at odd addresses.

> (My own compilers target x64.
[snip]

And since x64 can access words at odd addresses, there's no problem.

If you wanted to target your compiler to systems that don't support
accessing words at odd addresses, you might have a problem.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Bart <bc@freeuk.com>: Aug 11 11:58PM +0100

On 11/08/2019 21:07, Keith Thompson wrote:
> probably just work. On other targets, it might blow up -- and if
> so, I would seriously question the wisdom of claiming to support
> #pragma pack(1).

Actually, most of the time when pack(1) is used, elements will be
properly aligned anyway. Because they will be sized and arranged
manually to get the most efficient packing.

When done properly, then I should be able to comment out the pack(1),
and it ought to still work, and with the same sizes of structs, and the
same member offsets.

But pack(1) can also be used to define a particular layout, in instances
where I don't need to access individual misaligned members, just to get
the right overall size.

More difficult is supporting pack(1) when a machine doesn't have
byte-addressability, but I don't know if that is common these days.

(My own compilers target x64. A quick test making passes over an
array of 100 million of your structs took 1.33 seconds with normal
alignment to write to and read from the .i element (all unoptimised code).

Using pack(1), that increased to 1.57 seconds. But this included
multiplication by 5 for each access. When I manually optimised that out,
the timing went down to 1.26 seconds; faster than aligned access!

I guess because 37.5% less memory was used, coupled with the x64's
apparently efficient handling of misaligned reads and writes.)
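The 37.5% figure follows from the two struct sizes. Here is a small sketch of the arithmetic, assuming a 4-byte int with 4-byte alignment and a compiler that honours #pragma pack. PackedElem, PlainElem and packed_saving are illustrative names, not anything taken from the compiler discussed here.

```cpp
#include <cassert>
#include <cstddef>

#pragma pack(push, 1)
struct PackedElem { char c; int i; };   // expected size: 5 bytes
#pragma pack(pop)

struct PlainElem { char c; int i; };    // expected size: 8 bytes (3 bytes of padding)

// Fraction of memory saved by the packed layout relative to the padded one.
constexpr double packed_saving() {
    return 1.0 - static_cast<double>(sizeof(PackedElem)) /
                 static_cast<double>(sizeof(PlainElem));
}

// Total bytes needed for an array of `count` elements of type T.
template <typename T>
constexpr std::size_t array_bytes(std::size_t count) {
    return count * sizeof(T);
}
```

Under those assumptions, 100 million elements need about 800 MB padded and 500 MB packed, so the saving is 1 - 5/8 = 0.375.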

"Chris M. Thomasson" <invalid_chris_thomasson_invalid@example.com>: Aug 11 10:05PM -0700

On 8/11/2019 4:42 PM, Chris M. Thomasson wrote:
> [...]

> Here is an old post where I mention it:

> http://boost.2283326.n4.nabble.com/lockfree-fifo-Review-td2648279i40.html

I remember firing that sucker up right next to my Windows computer and
getting a Solaris GUI, running on the server, viewable on the Windows
machine once the Sun ALOM had booted. I forget which damn program it
was.

Bonita Montero <Bonita.Montero@gmail.com>: Aug 12 09:13AM +0200

>> It is ridiculous to expect a compiler for a CPU-architecture which
>> supports unaligned operations not to support this.

> Sorry, I thought we were talking about programming in C++, not in assembly.

We're talking about specific compiler-implementations.

> Certainly some people expect it to work.
> The question is whether compiler writers expect it to work.

A compiler can't distinguish between unaligned and aligned pointers.
In both cases it generates the same code. Whether the code actually
runs depends on the CPU and/or the OS.

Bonita Montero <Bonita.Montero@gmail.com>: Aug 12 09:15AM +0200

>> And as I told the Oracle C/C++ compiler on solaris can do both.

> The link does not mention packed structs!

The link says that accesses to memory larger than a byte are
(dis)assembled into smaller operations if you set the proper
compile flag. So this also applies to packed structs.

Keith Thompson <kst-u@mib.org>: Aug 11 01:07PM -0700

Bart <bc@freeuk.com> writes:
[...]

> It doesn't consider the legality of such accesses, although the whole
> compiler does assume the target is byte-addressable (offsets of struct
> members are byte-offsets from the start).

And what happens if a program accesses a misaligned member?

#pragma pack(1)
struct foo {
char c;
int i;
} obj = { 0 };
printf("%d\n", obj.i);

If your compiler targets x86 and/or x86_64, such an access will
probably just work. On other targets, it might blow up -- and if
so, I would seriously question the wisdom of claiming to support
#pragma pack(1).
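For targets where a direct misaligned load might trap, one commonly used workaround (not something proposed in this thread) is to copy the member's bytes with std::memcpy rather than dereferencing a possibly misaligned int pointer. This is a minimal sketch, assuming a compiler that honours #pragma pack; read_packed_int and write_packed_int are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#pragma pack(push, 1)
struct Foo { char c; int i; };
#pragma pack(pop)

// Copies the bytes of the (possibly misaligned) member into an aligned local.
inline int read_packed_int(const Foo& f) {
    int value;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&f);
    std::memcpy(&value, bytes + offsetof(Foo, i), sizeof value);
    return value;
}

// Copies an aligned int into the (possibly misaligned) member, byte by byte.
inline void write_packed_int(Foo& f, int value) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&f);
    std::memcpy(bytes + offsetof(Foo, i), &value, sizeof value);
}
```

On x86 and x86_64, optimising compilers typically turn each memcpy into a single ordinary load or store, so the helpers usually cost nothing there.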

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Aug 12 10:14AM +0100

On Sun, 11 Aug 2019 17:07:21 -0700
Keith Thompson <kst-u@mib.org> wrote:
[snip]
> And since x64 can access words at odd addresses, there's no problem.

> If you wanted to target your compiler to systems that don't support
> accessing words at odd addresses, you might have a problem.

Even with compiler systems that do support accessing words at odd
addresses, there is a potential problem in a multi-threaded program. The
guarantee that, in updating one scalar, the computer cannot alter an
adjacent scalar and then restore it (that is, it cannot interfere with
scalars in use by another thread) may no longer apply. The program may
no longer be thread safe.

Bonita Montero <Bonita.Montero@gmail.com>: Aug 12 11:30AM +0200

> an adjacent scalar and then restore it (that is, it cannot interfere
> with scalars in use by another thread) may no longer apply. The program
> may no longer be thread safe.

That might also be implementation-dependent.

Bart <bc@freeuk.com>: Aug 12 10:34AM +0100

On 12/08/2019 01:07, Keith Thompson wrote:
>> and it ought to still work, and with the same sizes of structs, and the
>> same member offsets.

> Then there's no point in using pack(1), is there?

It isn't always equivalent. I also like to be aware of the layout of my
structs. I just don't like padding and prefer to make use of that space;
pack(1) provides an incentive to do that to avoid misalignment.

> But I asked
> specifically about a case where it causes a member to be aligned
> differently than it would without the pack(1).

The example that you snipped showed how that could result in faster
performance, likely due to the significantly reduced memory
requirements, when the hardware complies.

> struct foo { char c; int i; }
> (assume int is 4 bytes), and #pragma pack(1) gives the struct a size of
> 5 bytes, then an array of struct foo will result in misaligned members.

Doesn't matter, as the i member is misaligned anyway. (Actually, around
every 4th array element, i will be aligned properly.)

>> (My own compilers target x64.
> [snip]

> And since x64 can access words at odd addresses, there's no problem.

It's something you want to keep in mind. As I said, usually you will
strive to keep elements aligned. And the x64 does need alignment in some
areas (the stack must be 64-bit aligned; the ABI requires it to be
128-bit aligned at call-points; and some instructions require strictly
128-bit aligned data).

> If you wanted to target your compiler to systems that don't support
> accessing words at odd addresses, you might have a problem.

Then I'll cross that bridge at the right time. Since at the moment, the
very common x64 and ARM64 architectures appear to allow misaligned
accesses for most data, then that's not something I'm likely to have to
worry about in my lifetime.

(I also concentrate on 64-bit targets. With mixed 32- or 64-bit, then
pack(1) introduces some problems, as sometimes T* is 32 bits, and
sometimes 64, screwing up your carefully crafted layout.)

Bonita Montero <Bonita.Montero@gmail.com>: Aug 12 01:03PM +0200

>> with scalars in use by another thread) may no longer apply. The program
>> may no longer be thread safe.

> That might also be implementation-dependent.

I wanted to look at that more closely and wrote a little test.
The test has a structure with two unaligned uint32_ts, one of which
optionally crosses a cacheline boundary depending on a macro. The
two values are incremented by two threads, and each thread checks in
each iteration whether the value it currently reads is what it was
before. If it isn't, something has gone wrong.
So here's the code:

#include <iostream>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <atomic>

using namespace std;

#define TEST_CACHELINE_BOUNDARIES

size_t const CACHELINE_SIZE = 64;

#pragma pack(1)
struct alignas(CACHELINE_SIZE) UnalignedStruct
{
#ifdef TEST_CACHELINE_BOUNDARIES
char fill[CACHELINE_SIZE - sizeof(uint32_t) - 1];
#else
char fill[1];

Monday, August 12, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics
