- Question on padding and alignment of `struct` and its members (olcott) - 10 Updates
- Ultimate Back Scratcher? - 1 Update
Pankaj Jangid <pankaj.jangid@gmail.com>: Sep 28 07:44AM +0530 On Sun, Sep 27 2020, olcott wrote: > internal processor operations. I am not sure of the speed cache fetch > compared to internal processor operations. Because of this the minimum > space approach may also be the maximum speed approach. I was thinking on exactly these lines after getting followup posts from other participants. Thanks for clarifying. -- Pankaj Jangid |
Juha Nieminen <nospam@thanks.invalid>: Sep 28 09:41AM > Certainly. But I have one more query in this regard. This clearly saves > memory. But is there an effect on speed as well? Ultimately, the > compiler aligns it; so I guess there is no effect on speed. As long as the elements don't cross word boundaries, there shouldn't be an effect on speed. (On x86 architectures reading a value that crosses word boundaries might incur a penalty, which may be several clock cycles long. On some other architectures it's not possible at all, as doing so would cause a CPU interrupt. If the compiler deliberately packed members in such a way that they cross word boundaries, it then needs to always add code that reads and writes these values in parts, in a manner that never crosses the word boundary, which of course incurs a speed penalty because of additional code being executed.) The struct taking less space might give a theoretical speed advantage in that being smaller means that it consumes less cache space (and thus more other data, including other instances of this same struct, will fit in the cache.) |
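To make the "crosses a word boundary" condition concrete, here is a minimal C++ sketch. The helper name `crosses_boundary` and the 4-byte word size are assumptions chosen for illustration; they do not come from the thread.

```cpp
#include <cstddef>

// Hypothetical helper: true if an object of `size` bytes starting at byte
// `offset` spans two different `boundary`-sized units (words, cache lines, pages).
constexpr bool crosses_boundary(std::size_t offset, std::size_t size,
                                std::size_t boundary) {
    return size != 0 && offset / boundary != (offset + size - 1) / boundary;
}

// A 4-byte value at offset 4 sits entirely inside one 4-byte word.
static_assert(!crosses_boundary(4, 4, 4), "naturally aligned value stays in one word");
// The same value at offset 3 occupies bytes 3..6 and therefore spans two words.
static_assert(crosses_boundary(3, 4, 4), "misaligned value spans two words");
```

The same check works for cache lines or pages by passing a larger `boundary`, for example 64 or 4096.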
olcott <NoOne@NoWhere.com>: Sep 28 10:45AM -0500 On 9/28/2020 4:41 AM, Juha Nieminen wrote: > in that being smaller means that it consumes less cache space (and thus > more other data, including other instances of this same struct, will fit > in the cache.) A cache hit with a smaller struct would be much faster than a cache miss, even if the smaller struct requires extra bitwise operations to separate out the fields that are smaller than a word. -- Copyright 2020 Pete Olcott |
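The "extra bitwise operations" mentioned above might look roughly like the following sketch. The layout (two 16-bit fields sharing one 32-bit word) and the function names `pack`, `low_field` and `high_field` are assumptions made for illustration only.

```cpp
#include <cstdint>

// Assumed layout: `low` lives in bits 0..15 and `high` in bits 16..31
// of a single 32-bit word.
constexpr std::uint32_t pack(std::uint16_t low, std::uint16_t high) {
    return static_cast<std::uint32_t>(low) |
           (static_cast<std::uint32_t>(high) << 16);
}

// Isolate the low field with a mask.
constexpr std::uint16_t low_field(std::uint32_t word) {
    return static_cast<std::uint16_t>(word & 0xFFFFu);
}

// Isolate the high field with a shift.
constexpr std::uint16_t high_field(std::uint32_t word) {
    return static_cast<std::uint16_t>(word >> 16);
}

static_assert(low_field(pack(0x1234, 0xABCD)) == 0x1234, "low field round-trips");
static_assert(high_field(pack(0x1234, 0xABCD)) == 0xABCD, "high field round-trips");
```

One aligned 32-bit load fetches both fields; the mask and shift are register-only operations, which is the trade-off being weighed against a cache miss.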
scott@slp53.sl.home (Scott Lurndal): Sep 28 04:44PM >> internal processor operations. I am not sure of the speed cache fetch >> compared to internal processor operations. Because of this the minimum >> space approach may also be the maximum speed approach. A typical modern cache has a load-to-use latency of between three and four cycles for a hit in the L1I or L1D cache. With a sufficiently long pipeline and out-of-order execution, that latency will be filled with other operations. Latency to second level cache and last level caches can be up to 20ns, and to dram 65-100+ns. |
scott@slp53.sl.home (Scott Lurndal): Sep 28 04:50PM >> compiler aligns it; so I guess there is no effect on speed. >As long as the elements don't cross word boundaries, there shouldn't be >an effect on speed. Assuming natural alignment. >incur a penalty, which may be several clock cycles long. On some other >architectures it's not possible at all, as doing so would cause a CPU >interrupt. Even on x86, it might cause a page fault if the word straddles a page boundary (permission or not-present fault), or an alignment-check fault if the Alignment Check (AC) flag is set in the processor flags. >and writes these values in parts, in a manner that never crosses the >word boundary, which of course incurs a speed penalty because of additional >code being executed.) x86 (and ARM64 for compatibility with ported x86 apps) processors are generally configured to support unaligned accesses (but both can be configured to fault on unaligned accesses). Depending on the processor, there may be a one-cycle penalty for the barrel shifter. Unaligned accesses to uncacheable memory or device memory on both x86 and ARM64 will fault. >in that being smaller means that it consumes less cache space (and thus >more other data, including other instances of this same struct, will fit >in the cache.) Indeed, and for large applications, that can make a noticeable difference. |
olcott <NoOne@NoWhere.com>: Sep 28 12:05PM -0500 On 9/28/2020 11:44 AM, Scott Lurndal wrote: > latency will be filled with other operations. > Latency to second level cache and last level caches can be up to > 20ns, and to dram 65-100+ns. Thanks for the actual clock cycle numbers; I could not find these on a Google search. This seems to confirm my estimate that manually arranging the fields of a struct to minimize the need for padding would be a really good idea. Although compilers could be smart enough to do this, they must refrain just in case the order of the fields must correspond to disk storage. -- Copyright 2020 Pete Olcott |
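As a rough illustration of manually arranging fields to reduce padding, the sketch below declares the same five members in two orders. The struct names and member choices are hypothetical, and the exact sizes are implementation-defined; on a typical x86-64 ABI the first struct occupies 32 bytes and the second 16.

```cpp
#include <cstddef>
#include <cstdint>

// Members in an arbitrary "as written" order.
struct AsWritten {
    std::uint8_t  a;
    std::uint64_t b;
    std::uint8_t  c;
    std::uint32_t d;
    std::uint8_t  e;
};

// The same members sorted from largest to smallest.
struct Sorted {
    std::uint64_t b;
    std::uint32_t d;
    std::uint8_t  a;
    std::uint8_t  c;
    std::uint8_t  e;
};

// Holds on common ABIs; the standard itself does not promise specific sizes.
static_assert(sizeof(Sorted) <= sizeof(AsWritten),
              "sorting by size should not add padding");

// Padding = total size minus the sum of the member sizes (8 + 4 + 1 + 1 + 1).
constexpr std::size_t payload = 8 + 4 + 1 + 1 + 1;
constexpr std::size_t padding_as_written = sizeof(AsWritten) - payload;
constexpr std::size_t padding_sorted     = sizeof(Sorted) - payload;
```

Printing `padding_as_written` and `padding_sorted` on a given platform shows how many bytes the reordering saves there.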
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Sep 28 11:07AM -0700 > and writes these values in parts, in a manner that never crosses the > word boundary, which of course incurs a speed penalty because of additional > code being executed.) It's not always possible for the compiler to know that it needs to do that. If you take the address of a misaligned member, code that deals with the resulting pointer doesn't know that it's misaligned. That's not much of an issue in x86, where misaligned accesses (usually?) just impose a speed penalty, but it could be a problem on architectures that impose strict alignment requirements. More information: https://stackoverflow.com/q/8568432/827263 https://stackoverflow.com/a/8568441/827263 -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
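One common way to sidestep the misaligned-pointer problem described above is to copy the bytes with `std::memcpy` instead of dereferencing a typed pointer. The sketch below assumes that approach; the helper names `load_u32` and `store_u32` are hypothetical and are not taken from the thread or the linked answers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 32-bit value from an address that may not be 4-byte aligned.
// std::memcpy has no alignment requirement, so this is well-defined even
// where a direct *reinterpret_cast<const std::uint32_t*>(p) would not be.
inline std::uint32_t load_u32(const void* p) {
    assert(p != nullptr);
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Write a 32-bit value to a possibly misaligned address.
inline void store_u32(void* p, std::uint32_t value) {
    assert(p != nullptr);
    std::memcpy(p, &value, sizeof value);
}
```

For example, with `unsigned char buf[8] = {};`, calling `store_u32(buf + 1, 0xDEADBEEFu)` followed by `load_u32(buf + 1)` returns the stored value without ever forming a misaligned `std::uint32_t*`. Compilers for targets that tolerate unaligned access commonly reduce these calls to a single load or store.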
olcott <NoOne@NoWhere.com>: Sep 28 01:31PM -0500 On 9/28/2020 1:07 PM, Keith Thompson wrote: >> code being executed.) > It's not always possible for the compiler to know that it needs to do > that. All that I had to do to optimize the need for padding of the reference struct is simply sort the members in size order from largest to smallest. Certainly every compiler could compare the need for padding of the specified version with the sorted version and then know whether or not sorting reduces padding requirements. > If you take the address of a misaligned member, code that deals > with the resulting pointer doesn't know that it's misaligned. That's If we sort every member by largest to smallest size order then to provide perfectly aligned access to data members might at most require three bytes of padding at the end. The compiler can certainly be smart enough to know that it needs to do some bitwise operations to isolate the required field when two or more fields are loaded by one aligned memory access. The problem with the compiler doing this is when the struct must conform to its disk storage positions. I ran into this issue recently when I needed both Windows and Linux to parse COFF object files correctly. -- Copyright 2020 Pete Olcott |
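When a struct has to match its on-disk positions, one defensive habit is to pin the layout down with `static_assert`, so that a different compiler or platform fails to build rather than silently misparsing. The sketch below models the 20-byte COFF file header; the struct and member names are this example's own, and the PE/COFF specification remains the authoritative source for the field layout.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative on-disk record modeled on the 20-byte COFF file header.
// Fixed-width types keep member sizes identical on Windows and Linux.
struct CoffFileHeader {
    std::uint16_t machine;
    std::uint16_t number_of_sections;
    std::uint32_t time_date_stamp;
    std::uint32_t pointer_to_symbol_table;
    std::uint32_t number_of_symbols;
    std::uint16_t size_of_optional_header;
    std::uint16_t characteristics;
};

// Compile-time checks that no padding was inserted and that each field
// sits at the byte offset the file format expects.
static_assert(sizeof(CoffFileHeader) == 20, "header must be exactly 20 bytes");
static_assert(offsetof(CoffFileHeader, time_date_stamp) == 4, "unexpected offset");
static_assert(offsetof(CoffFileHeader, number_of_symbols) == 12, "unexpected offset");
static_assert(offsetof(CoffFileHeader, characteristics) == 18, "unexpected offset");
```

These checks cover size and offsets only. Byte order is a separate concern that this sketch does not address.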
Vir Campestris <vir.campestris@invalid.invalid>: Sep 28 09:12PM +0100 On 28/09/2020 19:31, olcott wrote: > enough to know that it needs to do some bitwise operations to isolate > the required field when two or more fields are loaded by one aligned > memory access. There's no guarantee how much padding you'll get at the end if these objects are allocated individually on the heap, rather than in an array. There are all sorts of clever heap allocation routines. One common way is to round all requests up to a multiple of some smallish size, for example 16 bytes. Another common way is to have a series of pools for items of different sizes, and give you a block from the pool that has big enough blocks. These blocks can be quite large, so you might find your 1025-byte structure gets allocated from the 2048 byte pool. These pools help avoid fragmentation, but there's a cost. Finally although it's correct that ordering them largest-to-smallest might be most efficient in space terms the sanity of the programmer can also be important. Not all structures are used by performance critical code! Andy |
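The two allocation strategies described above can be sketched as simple size calculations. Both functions are hypothetical illustrations of the policies, not the behaviour of any particular allocator; the 16-byte granule and the power-of-two pool sizes are assumptions.

```cpp
#include <cstddef>

// Policy 1: round every request up to a multiple of some smallish size.
constexpr std::size_t round_up(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// Policy 2: serve the request from the smallest pool whose blocks are big
// enough, assuming pools with power-of-two block sizes starting at 16 bytes.
constexpr std::size_t pool_block_size(std::size_t n, std::size_t block = 16) {
    return block >= n ? block : pool_block_size(n, block * 2);
}

// A 20-byte request occupies 32 bytes under the 16-byte rounding policy.
static_assert(round_up(20, 16) == 32, "rounded to the next multiple of 16");
// A 1025-byte structure lands in the 2048-byte pool.
static_assert(pool_block_size(1025) == 2048, "next pool that fits");
```

Under either policy, a few bytes of tail padding saved inside a struct may simply be absorbed by the allocator's own rounding when objects are allocated individually.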
olcott <NoOne@NoWhere.com>: Sep 28 05:02PM -0500 On 9/28/2020 3:12 PM, Vir Campestris wrote: > also be important. Not all structures are used by performance critical > code! > Andy Yes, that last point is most important. If the struct has fewer than six members, order is not crucial for sufficiently readable code. If the struct has hundreds of members, it is best to put related members together for much more readable code. -- Copyright 2020 Pete Olcott |
Ed.Vance@f10.n1.z32882.fidonet.org (Ed Vance): Sep 27 09:28PM +1200 Note to Moderator - I wrote this in the Survivor Echo and later thought that my idea could be of help to readers in other echos. Thank You for allowing My OFF TOPIC Post. * Originally in: Survivors * Originally on: 09-04-20 20:02 @MSGID: <5F52D60D.354.survivor@capitolcityonline.net> Howdy!, I think that a "Baby Bottle Brush" is the ultimate Back Scratcher I have tried using. Just thought I'd tell about my discovery to those who read this echo. The "Bottle Brush" I have, has a long handle of twisted wires. I can use it all over my back. As Jerry Clower tells about the feller up in the tree top with a Lynx who had to holler out to his friends on the ground: "Just shoot up here amongst us, One of us just has to have some relief". The Bottle Brush is a lot better than the plastic Back Scratcher that has the shape of a very small hand on the end of it that I won while trying to win one of the better prizes at a Game of Chance in the Arcade at an Amusement Park years ago. The long tiny handle broke when I used it many years ago and the pieces were thrown in the trash. I threw those pieces away long before any "Recycling Program" was thought of to recycle plastic to keep it out of the trash or getting in the waterways. I will keep using that Bottle Brush (or a replacement for it) as a Back Scratcher, until I learn of some other way to scratch what itches on my back that works better. It works for Me!, but I'm open to learn of any other instrument that anyone uses or knows about that does the job as well or better. Especially if the cost for that other thing is very cheap. I don't know what a Bottle Brush replacement sells for today but if I need to buy another one the Scotchman part of me won't keep me from buying one if I need another one. 73 de Ed W9ODR . . ... Have you checked your smoke detector batteries & Fire Ext, LATELY?! |