Monday, June 7, 2021

Digest for comp.lang.c++@googlegroups.com - 19 updates in 2 topics

Richard Damon <Richard@Damon-Family.org>: Jun 06 09:30PM -0400

On 6/6/21 7:10 PM, Keith Thompson wrote:
>> have been fully separate in the physical memory. Good luck with
>> forming a difference of pointers in a 16-bit size_t variable when the
>> segments are more than 64kB separate in the physical memory!
 
Actually, one reason offsetof is a Standard Macro is so that it can be made to
use implementation-dependent tricks to make it work here.
 
The implementation has enough information to handle a structure bigger
than a segment if it wanted to. It might only be able to make that work
easily in 'real' mode, where segments offset from the current segment
can just be computed, or it might need special support from the OS to
make multiple overlapping segments that build a net address space bigger
than one segment.
 
If you reference a member of the structure whose offset is big enough
that it won't fit in the first segment of the structure, the
implementation just needs to make a segment offset from the start of the
object that does hold that full member.
 
I will say that I never heard of a compiler that did that for a
structure. There were implementations that did that for specified arrays,
but pointers to elements of such an array had a non-standard type to keep
track of the fact that segment updates might be needed for pointer
arithmetic. These were __huge__ arrays, with __huge__ pointers.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jun 06 06:49PM -0700

> but pointers to elements of that array had a non-standard type to keep
> track of the fact that segment updates might be needed for pointer
> arithmetic. These were __huge__ arrays, with __huge__ pointers.
 
Sure, and it can do the same thing for arrays.
 
Both array objects and struct objects have to *act like* they occupy a
contiguous range of memory addresses. If the implementation has to play
some tricks to make it act that way, that's fine. And if it restricts
objects to a single segment, that's fine too (as long as the segment size
is big enough -- 65535 bytes for hosted implementations, C99 and later).
 
And if an implementation plays tricks for arrays but not for structures,
that's probably fine too. The standard doesn't mention segments.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jun 06 06:51PM -0700

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
> some tricks to make it act that way, that's fine. And if it restricts
> objects to a single segment, that's fine too (as long as the segment size
> is big enough -- 65535 bytes for hosted implementations, C99 and later).
 
Or whatever the corresponding limits are for C++, of course. *sigh*
 
 
Richard Damon <Richard@Damon-Family.org>: Jun 06 10:57PM -0400

On 6/6/21 9:49 PM, Keith Thompson wrote:
> is big enough -- 65535 bytes for hosted implementations, C99 and later).
 
> And if an implementation plays tricks for arrays but not for structures,
> that's probably fine too. The standard doesn't mention segments.
 
The problem with doing it for arrays is that either you need to do it
for ALL pointers, or you need to make big arrays use a special syntax.
 
The key thing for structs is that the pointer to the big structure is
already a special type, so is easy to recognize without cost to other code.
 
In essence, you can treat a BigStruct* pointer special when you apply
the member selection operation to it.
 
Given an int* pointer into a big array, you don't know if it IS a big
array, unless you pessimistically do it for all pointers, or make big
arrays an extension that creates a special type of pointer that needs a
non-standard type definition (like __huge__).
"Öö Tiib" <ootiib@hot.ee>: Jun 06 11:17PM -0700

On Monday, 7 June 2021 at 02:10:30 UTC+3, Keith Thompson wrote:
> > such limitation for struct members.
 
> I believe there is. Without that limitation, the offsetof macro
> wouldn't work.
 
In C++ offsetof is required to work only on standard-layout types (UB
otherwise). So (unless there is some other constraint we haven't
thought about) only standard-layout classes and arrays have to
be limited to a single segment (when memory is segmented).
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 07 12:27AM -0700

> arithmetic on that integer to produce a different integer value, and
> converting back again, has undefined behavior due to the omission of
> any explicit definition of the behavior.
 
Converting an arbitrary integer to a pointer is implementation-defined
behavior, not undefined behavior.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 07 12:44AM -0700

> requirements do not hold for other types.
> which means that plain char and signed char have no trap representations
> and no padding bits.
 
The passage quoted is about _un_signed narrow character types.
It doesn't apply to signed char or to plain char of the signed
variety.
 
An earlier sentence (in the C++17 standard) says this
 
For narrow character types, all bits of the object
representation participate in the value representation.
 
which does rule out padding bits, but it still admits the
possibility of there being a trap representation.
scott@slp53.sl.home (Scott Lurndal): Jun 07 02:52PM


>I've gone through the whole thread, and nobody else has commented.
 
>That GCC disassembly can't possibly be a correct and complete rendition
>of the function.
 
The first input parameter is passed in %rdi, the second in %rsi and
the result is returned in %rax. That's the normal ABI on Linux.
scott@slp53.sl.home (Scott Lurndal): Jun 07 02:53PM


>One can tell gcc (or at least g++) to use the less noisy and more
>conventional, in short more reasonable, Intel syntax via option
>`-masm=intel`, unless I recall the details of that incorrectly.
 
Purely a matter of opinion. Probably depends on what world you
came from - Unix people prefer the AT&T syntax, DOS/Windows people
prefer the Intel syntax.
 
Personally, I detest the Intel syntax with a passion.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 07 06:02PM +0200

> Personally, I detest the Intel syntax with passion.
 
That's a matter of habituation.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 07 09:19AM -0700

> memory models, pointer arithmetic only works in a single segment, and
> accordingly the arrays are limited to a single segment. There is no
> such limitation for struct members.
 
In C there is, because any object can be treated as an array
of character type, and that includes structs.
 
In C++ the rules are so complicated that no one knows whether
that reasoning applies, so the safest course of action is not
to use C++ on machines that use segmentation.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 07 09:43AM -0700

> big array, unless you pessimistically do for all, or make big
> arrays an extension that creates a special type of pointer that
> needs a non-standard type definition (like __huge__)
 
In C a pointer to any object can be converted to unsigned char *
which then can be used as though it were pointing into a
character array whose extent covers the entire object.
 
Do you have some reason to believe that property does not apply
to C++?
"Öö Tiib" <ootiib@hot.ee>: Jun 07 09:51AM -0700

On Monday, 7 June 2021 at 19:44:12 UTC+3, Tim Rentsch wrote:
> character array whose extent covers the entire object.
 
> Do you have some reason to believe that property does not apply
> to C++?
 
Yes. In C++ that holds only for pointers to objects of trivially copyable
or standard-layout types, as only those are required to occupy
contiguous bytes of storage.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Jun 07 10:13AM -0700

On Monday, June 7, 2021 at 12:44:12 PM UTC-4, Tim Rentsch wrote:
...
> character array whose extent covers the entire object.
 
> Do you have some reason to believe that property does not apply
> to C++?
 
"An object of trivially copyable or standard-layout type (6.8) shall occupy contiguous bytes of storage." (6.7.2p8.4). Which implies that any type that is neither trivially copyable nor a standard-layout type need not occupy contiguous bytes of storage. C's requirement (6.2.6.1p2) has no such exceptions.
Paavo Helde <myfirstname@osa.pri.ee>: Jun 07 09:58PM +0300

07.06.2021 19:19 Tim Rentsch wrote:
 
> In C++ the rules are so complicated that no one knows whether
> that reasoning applies, so the safest course of action is not
> to use C++ on machines that use segmentation.
 
I'm pretty sure that any C++ implementation on a segmented architecture
would also require structs and classes to fit in a single segment; it's
much easier that way.
 
Anyway, this is not really important because Bonita talked about "any
pair of pointers", no mention of structs was made.
 
When you take one pointer from, say, the DS segment and the other from the ES
segment, calculating their difference might not have any meaning
for the program, not to speak about fitting this difference into a
size_t variable or copying this memory range somewhere. At best you
could copy a tail of the DS segment and the head of the ES segment
somewhere, but why not vice versa?
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jun 07 12:13PM -0700

> representation participate in the value representation.
 
> which does rule out padding bits, but it still admits the
> possibility of there being a trap representation.
 
True, but C++17 has this rather odd wording later in the same paragraph:
 
For each value i of type unsigned char in the range 0 to 255
inclusive, there exists a value j of type char such that the result
of an integral conversion (7.8) from i to char is j, and the result
of an integral conversion from j to unsigned char is i.
 
I believe this implies that there must be 256 distinct values of type
signed char (and plain char if it's signed), disallowing treating -0 or
-128 as a trap representation.
 
What's odd about it is that it uses the value 255, so it wouldn't apply
that same requirement if CHAR_BIT > 8.
 
C++20 (at least in the draft I have) requires 2**CHAR_BIT distinct
values for the narrow character types (char, unsigned char, signed char,
and char8_t).
 
Bo Persson <bo@bo-persson.se>: Jun 07 09:58PM +0200

On 2021-06-07 at 21:13, Keith Thompson wrote:
> -128 as a trap representation.
 
> What's odd about it is that it uses the value 255, so it wouldn't apply
> that same requirement if CHAR_BIT > 8.
 
Not so odd really. On the CHAR_BIT == 9 system that I once used (Univac,
and for C not C++), the char size was *not* chosen to get additional
character values, but because the hardware had part-word operations that
let you extract any quarter of the 36-bit word.
 
That made using four 9-bit characters per word a lot more efficient than
four-and-a-half 8-bit characters.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jun 06 04:33PM -0700

> structure is known to the compiler. This is just one of these odd
> things in compilers that seems simple from the outside, but has subtle
> complications in practice.
 
Sure, but the type div_t is still defined in the <stdlib.h> or <cstdlib>
header, and gcc has to work with arbitrary library implementations.
 
The C standard's requirements for the representation of complex types:
 
Each complex type has the same representation and alignment
requirements as an array type containing exactly two elements of the
corresponding real type; the first element is equal to the real
part, and the second element to the imaginary part, of the complex
number.
 
could easily be reworked to specify the representation of div_t
(and ldiv_t, and lldiv_t, and imaxdiv_t). I strongly suspect
that all existing implementations already define the *div_t types
consistently, with quot at offset 0 and rem following it. If the
layout were specified, gcc could optimize calls to the *div()
functions (assuming it knows that the library that will be used
is conforming).
 
[...]
 
David Brown <david.brown@hesbynett.no>: Jun 07 08:41AM +0200

On 07/06/2021 01:33, Keith Thompson wrote:
>> complications in practice.
 
> Sure, but the type div_t is still defined in the <stdlib.h> or <cstdlib>
> header, and gcc has to work with arbitrary library implementations.
 
I must admit I did not pay much attention to the details of the gcc
developers' comments there. It is not something I know a lot about, nor
something that affects me directly (I use the operators) - it is just
something that I think is interesting and that might be important to
others. I'm cc'ed on a bug in the gcc bugzilla for it, so if there is
more progress, I can report back.
 
> layout were specified, gcc could optimize calls to the *div()
> functions (assuming it knows that the library that will be used
> is conforming).
 
Certainly that would be the most convenient solution, and I can't see
how adding that to the standard could cause trouble for existing code.