Saturday, June 5, 2021

Digest for comp.lang.c++@googlegroups.com - 20 updates in 2 topics

Bonita Montero <Bonita.Montero@gmail.com>: Jun 05 12:53PM +0200

Consider the following code:
 
 
#include <cstddef>

std::size_t s( int *begin, int *end )
{
    return (end - begin) * sizeof(int);
}
 
This is MSVC's disassembly:
 
sub rdx, rcx
and rdx, -4
mov rax, rdx
ret 0
 
This is gcc's disassembly:
 
movq %rsi, %rax
subq %rdi, %rax
ret
 
Is the masking really necessary, or is it just an optimizer weakness?
It seems to me that MSVC sees that two bits are shifted out and "in"
again, and replaces this with a mask.
I think that the language standard assumes equally aligned data
among objects of the same type for the above code, and that gcc is correct, but I'm
not sure.
David Brown <david.brown@hesbynett.no>: Jun 05 02:19PM +0200

On 05/06/2021 12:53, Bonita Montero wrote:
> I think that the language standard assumes equally aligned data
> among objects of the same type for the above code, and that gcc is correct, but I'm
> not sure.
 
I can't help thinking that putting a non-aligned value into an int
pointer is undefined behaviour - but I can't find a reference to that in
the C standards (and I don't know the C++ standards well enough to look
there). However, when you subtract two pointers, the behaviour is only
defined if they both point to elements inside the same array (or one
past the end). It doesn't matter how they are aligned, as the offset
from the standard "int" alignment will be the same for both pointers.
(This applies also to 8-bit systems where you might have 16-bit int but
8-bit alignment for int.) Thus gcc's code is fine.
 
MSVC has a tradition of being more conservative in its optimisations,
while gcc has a tradition of expecting people to write valid code and
optimise on the assumption that undefined behaviour does not occur.
This is, I think, because MSDOS and Windows programmers have a tradition
of assuming their code runs on one platform and one compiler, and "it
worked when I tested it" means "it is correct". (Your own misstatements
about undefined behaviour have demonstrated that.) gcc users are more
likely to understand that their code could be used on different
platforms and different processors, and pay a bit more attention to the
rules of the language.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 05 02:27PM +0200

> look there). However, when you subtract two pointers, the behaviour
> is only defined if they both point to elements inside the same array
> (or one past the end). ...
 
And in a struct with two ints ?
 
> MSVC has a tradition of being more conservative in its optimisations,
> ...
 
MSVC is not more conservative, but more stupid because it lacks many
safe optimizations.
David Brown <david.brown@hesbynett.no>: Jun 05 02:33PM +0200

On 05/06/2021 14:27, Bonita Montero wrote:
>> is only defined if they both point to elements inside the same array
>> (or one past the end). ...
 
> And in a struct with two ints ?
 
No. Subtraction of pointers is defined as the difference in their
indexes within a single array.
 
>> ...
 
> MSVC is not more conservative, but more stupid because it lacks many
> safe optimizations.
 
I was not being judgemental about what is a good or bad implementation.
I personally prefer gcc's philosophy, but I know other people have
preferences that are somewhere in between. (You have posted in the past
about your beliefs about how compilers handle some kinds of undefined
behaviour - and shown why there is a market for such conservative
compilers.)
"Öö Tiib" <ootiib@hot.ee>: Jun 05 05:38AM -0700

On Saturday, 5 June 2021 at 15:20:07 UTC+3, David Brown wrote:
> pointer is undefined behaviour - but I can't find a reference to that in
> the C standards (and I don't know the C++ standards well enough to look
> there).
 
C and C++ programs that use unaligned pointers have undefined behaviour
(in the sense of the standard) regardless of the target architecture
(which may allow unaligned accesses), but implementations can extend this.
 
> from the standard "int" alignment will be the same for both pointers.
> (This applies also to 8-bit systems where you might have 16-bit int but
> 8-bit alignment for int.) Thus gcc's code is fine.
 
MSVC's code is also fine, as that masking has no ill effects
on conforming code.
 
> likely to understand that their code could be used on different
> platforms and different processors, and pay a bit more attention to the
> rules of the language.
 
I think that the most important requirement for MSVC is that it should build
good binaries out of Microsoft's own code base, regardless of how tricky
that code base is. gcc as a whole does not have that sort of obligation.
Perhaps some people working on the gcc code base do, but they come from
a wide variety of companies.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 05 03:06PM +0200

> No. Subtraction of pointers is defined as the difference in their
> indexes within a single array.
 
That couldn't be true, because you can cast any pair of pointers
to char *, subtract them and use the difference for memcpy().
David Brown <david.brown@hesbynett.no>: Jun 05 03:29PM +0200

On 05/06/2021 15:06, Bonita Montero wrote:
>> indexes within a single array.
 
> That couldn't be true because you can cast any pointer-pair
> to char *, subtract them and use the difference for memcpy().
 
Look it up.
 
You can do lots of things in C and C++ that are syntactically correct,
but might have undefined behaviour.
"Öö Tiib" <ootiib@hot.ee>: Jun 05 06:34AM -0700

On Saturday, 5 June 2021 at 16:06:41 UTC+3, Bonita Montero wrote:
> > indexes within a single array.
> That couldn't be true because you can cast any pointer-pair
> to char *, subtract them and use the difference for memcpy().
 
So you're saying it couldn't be true that the standard specifies it like this:
| If the expressions P and Q point to, respectively, elements
| x[i] and x[j] of the same array object x, the expression P - Q has
| the value i − j; otherwise, the behavior is undefined.
 
What do you think stops it from being specified that way?
David Brown <david.brown@hesbynett.no>: Jun 05 03:41PM +0200

On 05/06/2021 14:38, Öö Tiib wrote:
 
> C and C++ programs that use unaligned pointers have undefined behaviour
> (in the sense of the standard) regardless of the target architecture
> (which may allow unaligned accesses), but implementations can extend this.
 
Of course implementations can add whatever definitions they want beyond
the requirements of the standard.
 
And while dereferencing unaligned pointers is undefined behaviour (by
the standards), I haven't found anything that says that merely assigning
an unaligned value to a pointer is undefined behaviour. But that could
easily be something I missed - hopefully someone can then give the
reference (in the C or C++ standards).
 
>> 8-bit alignment for int.) Thus gcc's code is fine.
 
> MSVC's code is also fine, as that masking has no ill effects
> on conforming code.
 
Sure. Suboptimal, but correct.
 
 
> I think that the most important requirement for MSVC is that it should build
> good binaries out of Microsoft's own code base, regardless of how tricky
> that code base is.
 
That is a reasonable requirement!
 
> gcc as a whole does not have that sort of obligation.
 
gcc needs to be able to compile gcc and all its dependencies, libraries,
etc. That in itself is a rather massive and complex code base, full of
all kinds of weird stuff for historic reasons (including garbage
collection, mixes of C and C++, and code that dates back 30+ years that
no one really understands).
 
They also work with the Linux kernel folk and distributions like Debian
to test on a huge variety of existing software. I'm not sure whether
you could call that an "obligation" or a "requirement" for gcc as a
whole or, as you say, just for some people working on gcc. But it is
certainly something they do in the process of testing and preparing
releases.
 
"Öö Tiib" <ootiib@hot.ee>: Jun 05 08:18AM -0700

On Saturday, 5 June 2021 at 16:41:35 UTC+3, David Brown wrote:
> an unaligned value to a pointer is undefined behaviour. But that could
> easily be something I missed - hopefully someone can then give the
> reference (in the C or C++ standards).
 
When you attempt to create a pointer that is unaligned, the C++ standard
says "the resulting pointer value is unspecified", or uses equivalent
wording, in a couple of places. What I meant is that using such an
unspecified pointer value, for example dereferencing it, is undefined
(unless the implementation gives stronger guarantees).
 
> all kinds of weird stuff for historic reasons (including garbage
> collection, mixes of C and C++, and code that dates back 30+ years that
> no one really understands).
 
I agree. Still, at Microsoft, if code that ran with compiler version A does
not run with compiler version B, the question is what is cheaper for the
business: (1) fixing the undefined behavior in that code, or (2) adjusting
compiler B. Option (2) is more common with MSVC than with gcc, although gcc
also compiles undefined behaviors in popular benchmarks "correctly" (an
example of (2) with gcc).
 
> whole or, as you say, just for some people working on gcc. But it is
> certainly something they do in the process of testing and preparing
> releases.
 
That code base can't be declared sacred by a business or by being a
popular benchmark. So there we see Linus being vulgar, but complying
and fixing such legacy code.
Bonita Montero <Bonita.Montero@gmail.com>: Jun 05 05:38PM +0200


> Look it up.
> You can do lots of things in C and C++ that are syntactically correct,
> but might have undefined behaviour.
 
I don't believe that memcpy()ing this way is UB.
David Brown <david.brown@hesbynett.no>: Jun 05 05:51PM +0200

On 05/06/2021 17:38, Bonita Montero wrote:
>> You can do lots of things in C and C++ that are syntactically correct,
>> but might have undefined behaviour.
 
> I don't believe that memcpy()ing this way is UB.
 
Can you give an example of what you are thinking about?
Richard Damon <Richard@Damon-Family.org>: Jun 05 11:56AM -0400

On 6/5/21 11:38 AM, Bonita Montero wrote:
>> You can do lots of things in C and C++ that are syntactically correct,
>> but might have undefined behaviour.
 
> I don't believe that memcpy()ing this way is UB.
 
The memcpy might not be, but subtracting two pointers that don't point
to elements of the same array is.
Richard Damon <Richard@Damon-Family.org>: Jun 05 11:58AM -0400

On 6/5/21 11:18 AM, Öö Tiib wrote:
> wording, in a couple of places. What I meant is that using such an
> unspecified pointer value, for example dereferencing it, is undefined
> (unless the implementation gives stronger guarantees).
 
My understanding is that the unaligned pointer has an unspecified value
that might be a trap value, so any operation that uses that value can
cause Undefined Behavior.
 
MrSpook_rs7x@4hhtozmpj299zx.tv: Jun 05 04:00PM

On Sat, 5 Jun 2021 14:19:52 +0200
>> not sure.
 
>I can't help thinking that putting a non-aligned value into an int
>pointer is undefined behaviour - but I can't find a reference to that in
 
Alignment only matters on certain architectures, and even then, not always.
E.g. this compiles and runs fine on x86 MacOS using clang, setting non-aligned
ints on both the stack and the heap:
 
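#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Deliberately writes and reads through misaligned uint32_t pointers. */
int main()
{
    uint32_t i;
    uint32_t j;
    uint32_t *p1;
    uint32_t *p2;

    /* Assumes i and j are adjacent on the stack, so a 4-byte write
       starting 1 byte past the lower one stays within the two. */
    p1 = (&j < &i ? &j : &i);
    p2 = (uint32_t *)((char *)p1 + 1);
    printf("p1 = %p, p2 = %p\n", (void *)p1, (void *)p2);
    *p2 = (uint32_t)-1;
    printf("*p2 = %u\n",*p2);

    p1 = (uint32_t *)malloc(sizeof(uint32_t) * 2);
    if (p1 == NULL)
        return 1;
    p2 = (uint32_t *)((char *)p1 + 1);
    printf("p1 = %p, p2 = %p\n", (void *)p1, (void *)p2);
    *p2 = (uint32_t)-1;
    printf("*p2 = %u\n",*p2);
    free(p1);
    return 0;
}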
David Brown <david.brown@hesbynett.no>: Jun 05 06:16PM +0200


>> I can't help thinking that putting a non-aligned value into an int
>> pointer is undefined behaviour - but I can't find a reference to that in
 
> Alignment only matters on certain architectures, and even then, not always.
 
The discussion is about behaviour that is not defined by the C and C++
standards. If you can point to /documentation/ that says clang on x86
MacOS defines the behaviour of unaligned access, that would be
interesting. But other than that, a sample of "this example happens to
work on this compiler with these flags on this target" is irrelevant.
 
We all know that on many - but not all - cpu targets, unaligned accesses
work as expected, albeit usually at a performance cost. But we are
talking about the standards' definition of C++ here (and perhaps C, in
that C++ inherits such things from C), not cpus.
Bo Persson <bo@bo-persson.se>: Jun 05 07:50PM +0200

> printf("*p2 = %u\n",*p2);
> return 0;
> }
 
You did notice (right?) that the optimiser transformed
 
printf("*p2 = %u\n",*p2);
 
into
 
printf("*p2 = %u\n", (uint32_t)-1);
 
resulting in the code
 
00007FF71990106D mov rcx,rsi
00007FF719901070 mov edx,0FFFFFFFFh
00007FF719901075 call printf (07FF719901090h)
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 05 05:42PM -0400

On 6/5/21 8:19 AM, David Brown wrote:
...
> pointer is undefined behaviour - but I can't find a reference to that in
> the C standards (and I don't know the C++ standards well enough to look
> there).
 
It's not something either standard says explicitly. Rather, it's
something that needs to be derived from what they say about other things.
If you start from a pointer that is correctly aligned for its type,
most pointer operations give a result that still points to the same
type, and is still correctly aligned for that type. This includes
conversion to an integer type and back to the original pointer type.
The only operations that could result in a misaligned pointer all have
undefined behavior for one reason or another. For instance, conversion
to a pointer to a more strictly aligned type has undefined behavior if
the original pointer doesn't meet the alignment requirements of the new
type. Converting a pointer to an integer, performing any kind of
arithmetic on that integer to produce a different integer value, and
converting back again, has undefined behavior due to the omission of any
explicit definition of the behavior.
So what about starting with a misaligned pointer? If you have a packed
struct, the members of that struct might not be correctly aligned for
their type. But packing a struct is not a core language feature - it's
only available as an extension. On platforms with strong alignment
requirements, implementations that allow struct packing will generally
provide warnings about ways you should not use normal pointers to access
objects that might be misaligned.
You could also take an integer that represents a memory location that
is not correctly aligned for a given type, and convert it into a pointer
to that type - but the behavior of such a conversion is undefined.
 
I'm not sure that the above argument covers every possibility, but I do
believe that every possible way of getting a misaligned pointer is
covered, in some fashion, by both standards.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 05 07:08PM -0400

Apologies to David, who has already received two versions of this
message as e-mail, because I keep hitting the Thunderbird "Reply" button
instead of their new "Followup" button.
 
On 6/5/21 9:29 AM, David Brown wrote:
 
>> That couldn't be true because you can cast any pointer-pair
>> to char *, subtract them and use the difference for memcpy().
 
> Look it up.
 
 
That works in C because every C object can be accessed as an array of
char. C++ allows more complicated possibilities, including objects that
are not contiguous. But it works in C++ if the relevant objects are
required to be contiguous. If two such objects are both sub-objects of
the same larger object, the difference between those pointers satisfies
that requirement, otherwise the subtraction is undefined.
MrSpook_6qpp@dggw8.org: Jun 05 09:28AM

On Fri, 4 Jun 2021 13:30:16 -0700
>would be fun. The teacher says something like: We are going to implement
>several std containers under the namespace std_course. Imvvho, it would
>be an interesting and worth while exercise.
 
Writing a basic implementation of a doubly linked list used to be a fairly
standard interview test for C programmers, back before C++ became
popular. You'd be amazed (or maybe not) how many of them didn't have a clue
what such a construct even was, never mind how to implement it.
