Sunday, June 6, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

Richard Damon <Richard@Damon-Family.org>: Jun 05 08:03PM -0400

On 6/5/21 9:06 AM, Bonita Montero wrote:
>> indexes within a single array.
 
> That couldn't be true because you can cast any pointer-pair
> to char *, subtract them and use the difference for memcpy().
 
Actually it is true. Char (and its relatives) does have a few special
cases, as any pointer is allowed to be converted into a pointer to char
and back and still be used. Also, I believe that any object can be
treated as an array of char. So you can convert two pointers within the
same object to char and subtract them.
 
It is still not defined if the pointers are to two separate objects. For
machines with segmented memory, this can be an issue.
David Brown <david.brown@hesbynett.no>: Jun 06 11:05AM +0200

On 05/06/2021 23:42, James Kuyper wrote:
 
> I'm not sure that the above argument covers every possibility, but I do
> believe that every possible way of getting a misaligned pointer is
> covered, in some fashion, by both standards.
 
The only major possibility I can think of that is missing above is type
punning of various types (such as via a union in C, or by accessing the
pointer via a char pointer or memcpy). But I expect they too would have
to provide a valid pointer or the results would be undefined when the
pointer was used.
 
Thank you for the explanation.
David Brown <david.brown@hesbynett.no>: Jun 06 11:24AM +0200

On 06/06/2021 01:08, James Kuyper wrote:
> Apologies to David, who has already received two versions of this
> message as e-mail, because I keep hitting the Thunderbird "Reply" button
> instead of their new "Followup" button.
 
No problem. It makes me feel special :-)
 
> required to be contiguous. If two such objects are both sub-objects of
> the same larger object, the difference between those pointers satisfies
> that requirement, otherwise the subtraction is undefined.
 
When you do that, however, you are not subtracting the original pointers
- you are subtracting two char* pointers. And that subtraction works as
the difference of their indexes into an array (of char type).
 
So if you have:
 
struct A {
    int x;
    int y;
};

A a;
 
the expression "&a.x - &a.y" is not defined by the C or C++ standards.
You have to cast the pointers to character type, and then you are
subtracting two pointers that are part of the same array, rather than
subtracting pointers to two ints in a structure.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 06 10:33AM +0100

On Sat, 5 Jun 2021 19:08:49 -0400
> required to be contiguous. If two such objects are both sub-objects of
> the same larger object, the difference between those pointers satisfies
> that requirement, otherwise the subtraction is undefined.
 
Assuming C++20, on my (possibly faulty) reading you could _not_ access
any given object as an array of char, but you could access it as an
array of unsigned char or std::byte. This is because you can construct
any object in an array of unsigned char or of std::byte, so that even
where some other storage for a particular object has in fact been
provided, such an array will implicitly arise in consequence of the
carrying out of any pointer arithmetic which prods around in the
object's internals.
 
So although the strict aliasing rule is not offended by the use of
char* for this purpose, I suspect the rules on pointer arithmetic are.
Richard Damon <Richard@Damon-Family.org>: Jun 06 07:56AM -0400

On 6/6/21 5:33 AM, Chris Vine wrote:
> object's internals.
 
> So although the strict aliasing rule is not offended by the use of
> char* for this purpose, I suspect the rules on pointer arithmetic are.
 
I haven't pored over the standard recently, but my memory was that the
type family char / signed char / unsigned char all had this property,
but if you actually accessed the values, signed char (and char, where
char is signed) had the possibility of a trap value (-0).
 
unsigned char had the natural property that its values were precisely
defined by the standard, but any character type could be used, if only
because of ancient code that used char for this purpose.
MrSpook_ie@q67rq6_2ly9ut44j.org: Jun 06 01:55PM

On Sat, 5 Jun 2021 18:16:52 +0200
 
>The discussion is about behaviour that is not defined by the C and C++
>standards. If you can point to /documentation/ that says clang on x86
>MacOS defines the behaviour of unaligned access, that would be
 
Probably isn't any. I suspect most compilers just create the machine code and
the rest is up to the CPU.
MrSpook_zn7@4tzq7j92.gov: Jun 06 01:59PM

On Sat, 5 Jun 2021 19:50:23 +0200
 
>00007FF71990106D mov rcx,rsi
>00007FF719901070 mov edx,0FFFFFFFFh
>00007FF719901075 call printf (07FF719901090h)
 
Nope, guess the compiler didn't notice either:
 
movq -40(%rbp), %rax ## 8-byte Reload
movq %rax, -24(%rbp)
movq -24(%rbp), %rax
addq $1, %rax
movq %rax, -32(%rbp)
movq -24(%rbp), %rsi
movq -32(%rbp), %rdx
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
movq -32(%rbp), %rcx
movl $-1, (%rcx)
movq -32(%rbp), %rcx
movl (%rcx), %esi
leaq L_.str.1(%rip), %rdi
movl %eax, -44(%rbp) ## 4-byte Spill
movb $0, %al
callq _printf
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Jun 06 03:00PM +0100

On Sun, 6 Jun 2021 07:56:07 -0400
> On 6/6/21 5:33 AM, Chris Vine wrote:
> > On Sat, 5 Jun 2021 19:08:49 -0400
> > James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
[snip]
 
> unsigned char had the natural property that its values were precisely
> defined by the standard, but any character type could be used, if only
> because of ancient code that used char for this purpose.
 
I wasn't thinking of the point concerning the assigning of
indeterminate values into narrow character types (permitted for
unsigned but not for signed), although I imagine that may come into
play.
 
The issue I was referring to was that (i) new objects can be constructed
(either by placement new or as implicit-lifetime types) in an array of
unsigned char or array of std::byte if properly aligned
([intro.object]/3 and /4), (ii) iterating by pointer arithmetic can only
be carried out in respect of arrays ([expr.add]/4), (iii) in C++20
arrays (but not necessarily their elements) are implicit-lifetime
types, (iv) implicit-lifetime types arise spontaneously where necessary
to obtain defined behaviour, and (v) accordingly you can iterate over
an object as if it were stored in an array of unsigned char or
std::byte even if it isn't[1].
 
I imagine that in practice you can construct an object in an array of
char on any compiler in widespread use, but it is not formally supported
by the standard. I don't know why there is that restriction. It cannot
be entirely down to indeterminate values, because you can put an
indeterminate value in a char object if the char type is in fact
unsigned, but that is not true of constructing an object in an array of
char using placement new.
 
This is as I understand it. But my understanding may not be complete.
 
[1]: It is this that enables you to implement your own version of
std::memcpy() in standard C++20. That was impossible in C++17.
David Brown <david.brown@hesbynett.no>: Jun 06 05:34PM +0200

>> MacOS defines the behaviour of unaligned access, that would be
 
> Probably isn't any. I suspect most compilers just create the machine code and
> the rest is up to the CPU.
 
In the majority of cases, that is correct.
 
The fun comes when the compiler can see that if it assumes there is no
undefined behaviour, it can make significant efficiency improvements -
then it might well do so. (Optimisations vary by compiler, flags, etc.)
Remember, "undefined behaviour" means the compiler might generate code
that
does what you might expect as the "natural" behaviour on the cpu in
question - it also means it might assume it doesn't happen and then you
get weird effects if you break the rules.
 
That's why "it worked when I tested it" is not something you should rely
on for undefined behaviour - you might have got lucky, or maybe things
will change with the next compiler version.
Paavo Helde <myfirstname@osa.pri.ee>: Jun 06 06:48PM +0300

05.06.2021 16:06 Bonita Montero kirjutas:
>> indexes within a single array.
 
> That couldn't be true because you can cast any pointer-pair
> to char *, subtract them and use the difference for memcpy().
 
This holds only for linear memory model. While this is a dominant memory
model nowadays, the C++ language is old enough to take also other memory
models (like segmented ones) into account. In segmented memory models,
pointer arithmetic only works in a single segment, and accordingly the
arrays are limited to a single segment. There is no such limitation for
struct members.
 
As an example, with Intel 386 you could have a 16-bit program working
simultaneously with at least 4 different 64 kB segments, which might
have been fully separate in the physical memory. Good luck with forming
a difference of pointers in a 16-bit size_t variable when the segments
are more than 64kB separate in the physical memory!
Richard Damon <Richard@Damon-Family.org>: Jun 06 02:26PM -0400

On 6/6/21 11:48 AM, Paavo Helde wrote:
> have been fully separate in the physical memory. Good luck with forming
> a difference of pointers in a 16-bit size_t variable when the segments
> are more than 64kB separate in the physical memory!
 
Actually, unless the pointers were specially declared, taking the
difference ignored the segment part of the pointers and only subtracted
the offsets, so if the pointers were to things in different segments,
the difference was largely meaningless.
Vir Campestris <vir.campestris@invalid.invalid>: Jun 06 09:27PM +0100

On 05/06/2021 11:53, Bonita Montero wrote:
> I think that the language-standard assumes equally aligned data
> among same types for the above code and gcc is correct, but I'm
> not sure.
 
I've gone through the whole thread, and nobody else has commented.
 
That GCC disassembly can't possibly be a correct and complete rendition
of the function.
 
Maybe on entry begin is in eax, and end in rdi, which would explain the
second instruction. But what's it doing with rsi?
 
Andy
Bonita Montero <Bonita.Montero@gmail.com>: Jun 06 07:21PM +0200

Consider the following code:
 
#include <vector>
#include <memory>
 
using namespace std;
 
template<typename Alloc>
void f( Alloc const &alloc )
{
using IAlloc = typename allocator_traits<Alloc>::template
rebind_alloc<int>;
vector<int, IAlloc> viA( IAlloc( alloc ) ); // doesn't work
vector<int, IAlloc> viB( IAlloc( const_cast<Alloc const &>(alloc) ) );
};
 
 
int main()
{
f( allocator<char>() );
}
 
Why does the first vector-definition not work (at least with MSVC and
clang 11)?
Bonita Montero <Bonita.Montero@gmail.com>: Jun 06 07:24PM +0200

Am 06.06.2021 um 19:21 schrieb Bonita Montero:
>     }
 
> Why does the first vector-definition not work (at least with MSVC and
> clang 11).
 
g++, even an older version, does compile it.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 06 07:29PM +0200

> Why does the first vector-definition not work (at least with MSVC and
> clang 11).
 
Ok, found it myself: it seems to be parsed as a function declaration.
But why is it parsed as a function declaration although IAlloc( alloc )
isn't a valid parameter declaration?
Paavo Helde <myfirstname@osa.pri.ee>: Jun 06 08:59PM +0300

06.06.2021 20:29 Bonita Montero kirjutas:
 
> Ok, found it myself: it seems to be a function-definition.
> But why is it parsed as a function-definition although IAlloc( alloc )
> isn't a valid parameter-definition ?
 
IAlloc(alloc) is parsed as 'IAlloc alloc' and is a valid parameter
definition. Here is a simplified version of your line:
 
vector<int, IAlloc> viA(int(alloc));
 
which is also parsed as a function declaration:
 
vector<int, IAlloc> viA(int alloc);
 
To get it parsed as a variable definition add some parens:
 
vector<int, IAlloc> viA((IAlloc(alloc))); // does work
 
I agree these parsing rules are obnoxious and should be changed somehow.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jun 05 05:50PM -0700

> intmax_t doesn't let you print anything that you can't print with "long
> long" in any existing implementations, and it seems unlikely to do so in
> the future.
 
long long is the widest *standard* signed integer type (though I can
imagine wanting to future-proof my code by not assuming that a future
standard won't add long long long). The whole point of intmax_t is that
long long might not be the widest signed integer type. The most likely
scenario is that a compiler makes long long 64 bits and provides a
128-bit extended integer type.
 
That hasn't happened, and we've already discussed the reasons for that.
I don't want to assume that a solution won't be found that allows for
extended integer types wider than long long. Maybe some new popular
platform won't have to conform to existing ABIs.
 
Using intmax_t rather than long long costs me nothing.
 
> "long long" then "intmax_t" could have been useful in such cases. It
> could be seen as a good reason for introducing intmax_t in the first
> place. The reality, however, is different.
 
So far.
 
And I'm not predicting that that won't change. I'm just not assuming
that it won't.
 
> because it is a variadic function with types unknown at declaration and
> compile time (of the printf implementation). For C++, an implementation
> can easily add an overload for << with whatever types the compiler supports.
 
C++ has all of C's standard library (with a few tweaks here and there).
printf is a standard C++ function. So is imaxabs.
 
> you use the <stdint.h> types in general, and possibly compiler-specific
> extensions like __int128. You don't use intmax_t. (At least, /I/ can't
> see how it would be helpful here.)
 
Who says the code has to be compiler-specific?
 
You can use long long for compilers that don't have 128-bit integers, or
__int128 for compilers that do. Or you can use intmax_t.
 
And yes, the fact that gcc's __int128 is not an extended integer type is
a barrier, but you can still write portable future-proof C code that
will use the widest fully supported integer type. I can see that being
useful.
 
[...]
 
> I might have been unclear there - I meant "intmax_t" is a useless type.
> __int128 is useful as it is (albeit not often useful). Adding abs,
> div, etc., support to __int128 would not make it any more useful.
 
I disagree. intmax_t has not turned out to be as useful as it was
intended to be, but it's hardly useless.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jun 05 10:02PM -0700

> On 04/06/2021 02:36, Scott Lurndal wrote:
[...]
> determined by the library implementation and not the standards or the
> compiler).
 
> <https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html>
 
This could be done if a future C standard specified the representation
of type div_t (as it does for complex types).
 
>> there doesn't seem much need for div at all.
 
> Exactly my point - as far as I can tell, "div" is a hangover from weak
> or limited compilers from long ago.
 
But existing code that uses "div" could still run faster if compilers
were able to inline it.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
David Brown <david.brown@hesbynett.no>: Jun 06 11:10AM +0200

On 06/06/2021 02:50, Keith Thompson wrote:
 
<snipping for brevity - as usual, you make good points, but I don't
think I have much to add>
 
> I disagree. intmax_t has not turned out to be as useful as it was
> intended to be, but it's hardly useless.
 
Fair enough. You've convinced me that it has some future-proofing uses,
even if such future-proofing may never be needed in practice.
(Predictions about the future are always hard.) From my own viewpoint,
it looks like the unintended consequences outweigh the usefulness, but
my viewpoint is not the only one!
David Brown <david.brown@hesbynett.no>: Jun 06 11:17AM +0200

On 06/06/2021 07:02, Keith Thompson wrote:
 
>> <https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html>
 
> This could be done if a future C standard specified the representation
> of type div_t (as it does for complex types).
 
One of the suggested ways for handling this situation for gcc is, in
fact, to use a complex type as a container for the div_t results. It
would use the gcc extension of complex integer types (Gaussian integers,
I suppose), precisely because it is a suitable pair of numbers whose
structure is known to the compiler. This is just one of these odd
things in compilers that seems simple from the outside, but has subtle
complications in practice.
 
>> or limited compilers from long ago.
 
> But existing code that uses "div" could still run faster if compilers
> were able to inline it.
 
Indeed.
"Öö Tiib" <ootiib@hot.ee>: Jun 06 04:07AM -0700

On Sunday, 6 June 2021 at 12:10:59 UTC+3, David Brown wrote:
> (Predictions about the future are always hard.) From my own viewpoint,
> it looks like the unintended consequences outweigh the usefulness, but
> my viewpoint is not the only one!
 
But it is a very interesting topic. I think the future goes towards more
enforced safety, like Rust is doing, on one hand, and more flexibility
about properties (like the bit width of integers) on the other. So
adding or subtracting two int_t<8> will simply result in an int_t<9>,
and intmax_t stops making sense whatsoever, since int_t<512> takes one
cache line and int_t<1024> two, but is otherwise no different from
int_t<9>.
Richard Damon <Richard@Damon-Family.org>: Jun 06 07:49AM -0400

On 6/6/21 5:10 AM, David Brown wrote:
> (Predictions about the future are always hard.) From my own viewpoint,
> it looks like the unintended consequences outweigh the usefulness, but
> my viewpoint is not the only one!
 
I think the issue that wasn't anticipated was the longevity of an ABI
that used the type intmax_t.
 
intmax_t makes it easier to move a program from one ABI to another with
different sizes of integer types. It unfortunately locks in the biggest
type that a given ABI will handle.
MrSpook_Qfmtjdmi@nc6.com: Jun 06 02:11PM

On Sat, 05 Jun 2021 22:02:22 -0700
 
>> <https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html>
 
>This could be done if a future C standard specified the representation
>of type div_t (as it does for complex types).
 
I'd be surprised if there are any future C standards. After all, what's the
point? There can't be many C programmers left who can't also program in C++,
and if you need more than C can provide, just use C++ - unless it's a very
limited embedded system, in which case probably even the version of C it uses
is limited anyway. Also, the days of C having a performance advantage are
pretty much gone.
 
One thing I'd have liked in C in the past is some form of lambda syntax, and
while both clang and gcc created their own kinda, sorta versions - as code
blocks and nested functions respectively - they're unfortunately incompatible.
Shame the teams couldn't have got together and come up with a single solution;
they must have some cross-pollination and don't work in silos unaware of
what's going on elsewhere.
David Brown <david.brown@hesbynett.no>: Jun 06 05:46PM +0200


>> This could be done if a future C standard specified the representation
>> of type div_t (as it does for complex types).
 
> I'd be surprised if there are any future C standards.
 
The next version after C17 is likely to be C21 or C22. (It is currently
known as C2x.) You can get a draft here:
 
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf>
 
There aren't any big changes so far, but some nice tidying up of
obsolescent features (K&R function declarations are finally gone, as are
signed integers that are not two's complement). And there are plenty of
proposals being considered for future standards.
 
> and if you need more than C can provide, just use C++ unless its a very limited
> embedded system in which case probably even the version of C it uses is limited
> anyway. Also the days of C having a performance advantage are pretty much gone.
 
Certainly there is little point in adding a lot to the C language - the
language's stability and backwards compatibility are its main benefit.
But small improvements make sense, as do corrections to the standard,
and it could be useful to standardise some of the common extensions to C
or to import a few things from C++.
 
Manfred <noname@add.invalid>: Jun 06 06:39PM +0200

On 6/6/2021 11:17 AM, David Brown wrote:
> structure is known to the compiler. This is just one of these odd
> things in compilers that seems simple from the outside, but has subtle
> complications in practice.
 
This suggestion might be justified by the current status of the gcc
code; however, div_t results obviously don't have much in common with
complex numbers.
As Keith said, having the layout of div_t standardized could be an
option, modulo the politics required to get consensus in the committee -
in which, btw, the cc folks are fairly well represented, I believe.
However, I don't think this is strictly required - in fact the compiler
already /knows/ the layout of div_t from the headers it parses, so it
should be possible to place the appropriate offsets in the generated code.
This might not be straightforward though, or anyway enough of a burden
to handle compared to the demand for it - admittedly low.
 
 
>> But existing code that uses "div" could still run faster if compilers
>> were able to inline it.
 
> Indeed.
 
And not really a hangover from past limitations. I'd say more of a
follow up on the tradition of C to have popular ASM features bubble up
as language features (think e.g. shift operators).
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.