Monday, September 28, 2020

Digest for comp.lang.c++@googlegroups.com - 11 updates in 2 topics

Pankaj Jangid <pankaj.jangid@gmail.com>: Sep 28 07:44AM +0530

On Sun, Sep 27 2020, olcott wrote:
 
> internal processor operations. I am not sure of the speed of a cache fetch
> compared to internal processor operations. Because of this the minimum
> space approach may also be the maximum speed approach.
 
I was thinking on exactly these lines after getting followup posts from
other participants. Thanks for clarifying.
--
Pankaj Jangid
Juha Nieminen <nospam@thanks.invalid>: Sep 28 09:41AM


> Certainly. But I have one more query in this regard. This clearly saves
> memory. But is there an effect on speed as well? Ultimately, the
> compiler aligns it; so I guess there is no effect on speed.
 
As long as the elements don't cross word boundaries, there shouldn't be
an effect on speed.
 
(On x86 architectures reading a value that crosses word boundaries might
incur a penalty, which may be several clock cycles long. On some other
architectures it's not possible at all, as doing so would cause a CPU
interrupt. If the compiler deliberately packed members in such a way that
they cross word boundaries, it then needs to always add code that reads
and writes these values in parts, in a manner that never crosses the
word boundary, which of course incurs a speed penalty because of additional
code being executed.)
 
The struct taking less space might give a theoretical speed advantage
in that being smaller means that it consumes less cache space (and thus
more other data, including other instances of this same struct, will fit
in the cache.)
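
A rough sketch of the cache-space point above. The struct and member names are hypothetical, and the 64-byte line size is an assumption (common on current x86 and ARM64 parts, but not universal). Actual sizes depend on the ABI, so the comments say "typically".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct declared in an order that invites padding.
struct Padded {
    std::uint8_t  flag;  // 1 byte, typically followed by 7 bytes of padding
    std::uint64_t id;    // 8 bytes
    std::uint8_t  kind;  // 1 byte, typically followed by 7 bytes of tail padding
};

// Same members, largest first.
struct Reordered {
    std::uint64_t id;    // 8 bytes
    std::uint8_t  flag;  // 1 byte
    std::uint8_t  kind;  // 1 byte, typically followed by 6 bytes of tail padding
};

// Assumed cache line size in bytes; check the target CPU before relying on it.
constexpr std::size_t kAssumedCacheLine = 64;

// How many whole objects of the given size fit in one assumed cache line.
inline std::size_t objects_per_line(std::size_t object_size) {
    assert(object_size > 0);
    return kAssumedCacheLine / object_size;
}
```

On a typical 64-bit ABI, sizeof(Padded) is 24 and sizeof(Reordered) is 16, so objects_per_line gives 2 versus 4 objects per line.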
olcott <NoOne@NoWhere.com>: Sep 28 10:45AM -0500

On 9/28/2020 4:41 AM, Juha Nieminen wrote:
> in that being smaller means that it consumes less cache space (and thus
> more other data, including other instances of this same struct, will fit
> in the cache.)
 
A cache hit with a smaller struct would be much faster than a cache miss,
even if the smaller struct requires extra bitwise operations to separate
out the fields that are smaller than a word.
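
The "extra bitwise operations" can be sketched as below. The field names and bit widths are made up for illustration; this is one possible layout, not anything taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of three fields inside one 32-bit word:
//   bits  0..11 : length (12 bits)
//   bits 12..27 : offset (16 bits)
//   bits 28..31 : tag    (4 bits)
inline std::uint32_t pack_fields(std::uint32_t length,
                                 std::uint32_t offset,
                                 std::uint32_t tag) {
    assert(length <= 0xFFFu && offset <= 0xFFFFu && tag <= 0xFu);
    return length | (offset << 12) | (tag << 28);
}

// Each accessor is one shift and/or one mask on an already-loaded word.
inline std::uint32_t get_length(std::uint32_t word) { return word & 0xFFFu; }
inline std::uint32_t get_offset(std::uint32_t word) { return (word >> 12) & 0xFFFFu; }
inline std::uint32_t get_tag(std::uint32_t word)    { return word >> 28; }
```

Each accessor costs a shift and a mask on a register, which is the work being weighed against a cache miss in the paragraph above.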
 
--
Copyright 2020 Pete Olcott
scott@slp53.sl.home (Scott Lurndal): Sep 28 04:44PM

>> internal processor operations. I am not sure of the speed of a cache fetch
>> compared to internal processor operations. Because of this the minimum
>> space approach may also be the maximum speed approach.
 
A typical modern cache has a load-to-use latency of between
three and four cycles for a hit in the L1I or L1D cache.
 
With a sufficiently long pipeline and out-of-order execution, that
latency will be filled with other operations.
 
Latency to second level cache and last level caches can be up to
20ns, and to dram 65-100+ns.
scott@slp53.sl.home (Scott Lurndal): Sep 28 04:50PM

>> compiler aligns it; so I guess there is no effect on speed.
 
>As long as the elements don't cross word boundaries, there shouldn't be
>an effect on speed.
 
Assuming natural alignment.
 
>incur a penalty, which may be several clock cycles long. On some other
>architectures it's not possible at all, as doing so would cause a CPU
>interrupt.
 
Even on x86, it might cause a page fault if the word straddles
a page boundary (permission or not-present fault), or an alignment-check
fault if alignment checking is enabled (the AC flag set in the processor flags).
 
>and writes these values in parts, in a manner that never crosses the
>word boundary, which of course incurs a speed penalty because of additional
>code being executed.)
 
x86 (and ARM64 for compatibility with ported x86 apps) processors are generally
configured to support unaligned accesses (but both can be configured to
fault on unaligned accesses). Depending on the processor,
there may be a one-cycle penalty for barrel shifter.
 
Unaligned accesses to uncacheable memory or device memory on both x86 and ARM64
will fault.
 
>in that being smaller means that it consumes less cache space (and thus
>more other data, including other instances of this same struct, will fit
>in the cache.)
 
Indeed, and for large applications, that can make a noticeable difference.
olcott <NoOne@NoWhere.com>: Sep 28 12:05PM -0500

On 9/28/2020 11:44 AM, Scott Lurndal wrote:
> latency will be filled with other operations.
 
> Latency to second level cache and last level caches can be up to
> 20ns, and to dram 65-100+ns.
 
Thanks for the actual clock cycle numbers; I could not find these on a
Google search. This seems to confirm my estimate that manually arranging
the fields of a struct to minimize the need for padding would be a
really good idea.
 
Although compilers could be smart enough to do this, they must refrain
just in case the order of the fields must correspond to disk storage.
 
--
Copyright 2020 Pete Olcott
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Sep 28 11:07AM -0700

> and writes these values in parts, in a manner that never crosses the
> word boundary, which of course incurs a speed penalty because of additional
> code being executed.)
 
It's not always possible for the compiler to know that it needs to do
that. If you take the address of a misaligned member, code that deals
with the resulting pointer doesn't know that it's misaligned. That's
not much of an issue in x86, where misaligned accesses (usually?) just
impose a speed penalty, but it could be a problem on architectures that
impose strict alignment requirements.
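
One common way to sidestep the misaligned-pointer problem is to copy the bytes instead of dereferencing a typed pointer. A minimal sketch, with hypothetical helper names load_u32 and store_u32:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an address that may not be 4-byte aligned.
// std::memcpy has no alignment requirement, so this is well defined even
// where dereferencing a std::uint32_t* to the same address would not be.
inline std::uint32_t load_u32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Writes a 32-bit value to a possibly misaligned address.
inline void store_u32(unsigned char* p, std::uint32_t value) {
    assert(p != nullptr);
    std::memcpy(p, &value, sizeof value);
}
```

Compilers generally turn these small fixed-size copies into a single load or store on targets that tolerate unaligned access, and into byte-wise code where they do not.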
 
More information:
https://stackoverflow.com/q/8568432/827263
https://stackoverflow.com/a/8568441/827263
 
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
olcott <NoOne@NoWhere.com>: Sep 28 01:31PM -0500

On 9/28/2020 1:07 PM, Keith Thompson wrote:
>> code being executed.)
 
> It's not always possible for the compiler to know that it needs to do
> that.
 
All that I had to do to optimize the need for padding of the reference
struct is simply sort the members in size order from largest to
smallest. Certainly every compiler could compare the need for padding of
the specified version with the sorted version and then know whether or
not sorting reduces padding requirements.
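
The comparison described above can be sketched as a small calculation. It assumes each member's alignment equals its size, which holds for common scalar types on many ABIs but is not a language guarantee; layout_size and sorted_layout_size are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Size of a struct whose members have the given sizes, laid out in the given
// order, assuming alignment == size for every member (sizes must be nonzero).
inline std::size_t layout_size(const std::vector<std::size_t>& member_sizes) {
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (std::size_t size : member_sizes) {
        assert(size > 0);
        offset = (offset + size - 1) / size * size;  // pad up to the member's alignment
        offset += size;
        max_align = std::max(max_align, size);
    }
    // Tail padding so that consecutive array elements stay aligned.
    return (offset + max_align - 1) / max_align * max_align;
}

// Same calculation after sorting the members from largest to smallest.
inline std::size_t sorted_layout_size(std::vector<std::size_t> member_sizes) {
    std::sort(member_sizes.begin(), member_sizes.end(),
              std::greater<std::size_t>());
    return layout_size(member_sizes);
}
```

For member sizes {1, 4, 2, 8} the declared order comes to 24 bytes under these assumptions and the sorted order to 16.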
 
> If you take the address of a misaligned member, code that deals
> with the resulting pointer doesn't know that it's misaligned. That's
 
If we sort every member by largest to smallest size order then to
provide perfectly aligned access to data members might at most require
three bytes of padding at the end. The compiler can certainly be smart
enough to know that it needs to do some bitwise operations to isolate
the required field when two or more fields are loaded by one aligned
memory access.
 
The problem with the compiler doing this is when the struct must conform
to its disk storage positions. I ran into this issue recently when I
needed both Windows and Linux to parse COFF object files correctly.
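
One way to keep the in-memory struct independent of the on-disk layout is to decode each field from the raw bytes. The sketch below assumes the 20-byte little-endian file header described in Microsoft's PE/COFF specification; the struct and function names are hypothetical, and this is not necessarily how the parser mentioned above was written.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian decoders that work at any alignment and on any host byte order.
inline std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
inline std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// In-memory form; the compiler may lay this out however it likes.
struct CoffFileHeader {
    std::uint16_t machine;
    std::uint16_t number_of_sections;
    std::uint32_t time_date_stamp;
    std::uint32_t pointer_to_symbol_table;
    std::uint32_t number_of_symbols;
    std::uint16_t size_of_optional_header;
    std::uint16_t characteristics;
};

// `bytes` must point to at least 20 readable bytes (the assumed header size).
inline CoffFileHeader parse_coff_file_header(const unsigned char* bytes) {
    assert(bytes != nullptr);
    CoffFileHeader h;
    h.machine                 = read_le16(bytes + 0);
    h.number_of_sections      = read_le16(bytes + 2);
    h.time_date_stamp         = read_le32(bytes + 4);
    h.pointer_to_symbol_table = read_le32(bytes + 8);
    h.number_of_symbols       = read_le32(bytes + 12);
    h.size_of_optional_header = read_le16(bytes + 16);
    h.characteristics         = read_le16(bytes + 18);
    return h;
}
```

Because the offsets are written out explicitly, the result does not depend on compiler padding, packing pragmas, or host endianness.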
 
 
--
Copyright 2020 Pete Olcott
Vir Campestris <vir.campestris@invalid.invalid>: Sep 28 09:12PM +0100

On 28/09/2020 19:31, olcott wrote:
> enough to know that it needs to do some bitwise operations to isolate
> the required field when two or more fields are loaded by one aligned
> memory access.
 
There's no guarantee how much padding you'll get at the end if these
objects are allocated individually on the heap, rather than in an array.
 
There are all sorts of clever heap allocation routines.
One common way is to round all requests up to a multiple of some
smallish size, for example 16 bytes.
 
Another common way is to have a series of pools for items of different
sizes, and give you a block from the pool that has big enough blocks.
These blocks can be quite large, so you might find your 1025-byte
structure gets allocated from the 2048 byte pool. These pools help avoid
fragmentation, but there's a cost.
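
The two rounding schemes just described can be sketched as follows. The 16-byte granule and the power-of-two pool sizes are assumptions chosen to match the examples in the text; real allocators vary, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a request up to the next multiple of `granule` (a power of two).
inline std::size_t round_up(std::size_t request, std::size_t granule = 16) {
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return (request + granule - 1) & ~(granule - 1);
}

// Picks the smallest pool block size that can hold the request, assuming
// pools whose block sizes double starting from `smallest_block`.
inline std::size_t pool_block_size(std::size_t request,
                                   std::size_t smallest_block = 16) {
    assert(smallest_block > 0);
    std::size_t block = smallest_block;
    while (block < request) {
        block *= 2;
    }
    return block;
}
```

Under these assumptions a 17-byte request occupies 32 bytes with the first scheme, and a 1025-byte request lands in the 2048-byte pool with the second.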
 
Finally, although it's correct that ordering them largest-to-smallest
might be most efficient in space terms, the sanity of the programmer can
also be important. Not all structures are used by performance-critical code!
 
Andy
olcott <NoOne@NoWhere.com>: Sep 28 05:02PM -0500

On 9/28/2020 3:12 PM, Vir Campestris wrote:
> also be important. Not all structures are used by performance critical
> code!
 
> Andy
 
Yes, that last point is most important. If the struct has fewer than six
members, order is not crucial for sufficiently readable code.
 
If the struct has hundreds of members it is best to put related members
together for much more readable code.
 
--
Copyright 2020 Pete Olcott
Ed.Vance@f10.n1.z32882.fidonet.org (Ed Vance): Sep 27 09:28PM +1200

Note to Moderator - I wrote this in the Survivor Echo and later thought that
my idea could be of help to readers in other echos.
Thank You for allowing My OFF TOPIC Post.
 
* Originally in: Survivors
* Originally on: 09-04-20 20:02
 
@MSGID: <5F52D60D.354.survivor@capitolcityonline.net>
Howdy!,
 
I think that a "Baby Bottle Brush" is the ultimate Back Scratcher I have
tried using.
 
Just thought I'd tell about my discovery to those who read this echo.
 
The "Bottle Brush" I have, has a long handle of twisted wires.
 
I can use it all over my back.
 
As Jerry Clower tells about the feller up in the tree top with a Lynx
who had to holler out to his friends on the ground:
 
"Just shoot up here amongst us, One of us just has to have some relief".
 
The Bottle Brush is a lot better than the plastic Back Scratcher that has
the shape of a very small hand on the end of it that I won while trying
to win one of the better prizes at a Game of Chance in the Arcade at an
Amusement Park years ago.
 
The long tiny handle broke when I used it many years ago and the pieces were
thrown in the trash.
 
I threw those pieces away long before any "Recycling Program" was thought of
to recycle plastic to keep it out of the trash or getting in the waterways.
 
I will keep using that Bottle Brush (or a replacement for it) as a Back
Scratcher, until I learn of some other way to scratch what itches on my back
that works better.
 
It works for Me!, but I'm open to learn of any other instrument that anyone
uses or knows about that does the job as well or better.
Especially if the cost for that other thing is very cheap.
 
I don't know what a Bottle Brush replacement sells for today but if I need
to buy another one the Scotchman part of me won't keep me from buying one
if I need another one.
 
73 de Ed W9ODR . .
 
... Have you checked your smoke detector batteries & Fire Ext, LATELY?!