soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

recovering from std::bad_alloc in std::string reserve - 7 Updates
A way around _HAS_ITERATOR_DEBUGGING - 10 Updates
Solar System Assembly Line - 2 Updates
lamda-optimization - 2 Updates

recovering from std::bad_alloc in std::string reserve

David Brown <david.brown@hesbynett.no>: Jun 03 09:08AM +0200

On 03/06/2021 00:49, Manfred wrote:
>> other standard library functions). I guess the developers simply
>> haven't bothered - perhaps because the "div" functions are rarely used.

> Interesting, that's surprising.

Yes. While I don't expect the "div" functions to be much used, it seems
to me it should be a relatively easy optimisation.

(MSVC manages it, gcc, clang and icc do not.)

>> Certainly the operators give simpler and clearer source code, and
>> significantly smaller and faster object code in practice.

> 'Certainly' smaller and faster because you tested it.

The cleaner and simpler source code is the "certainly" part. I am sure
there are some older or more limited compilers for targets without
hardware division and for which calling the "div" function is more
efficient. (Testing gcc on targets like the AVR that don't have
division assembly instructions, basically the same code was generated
for "div" and /, % .)

> As per their
> definition, there is no reason for which div should perform worse than
> '/' and '%'.

The function call overhead here is going to dominate the cost -
shuffling around data into the right registers, calling the function in
a library (imagine if it is in a DLL/so), instruction cache misses,
stacking and restoring other data according to ABI volatile and
preserved register usage, reduced scope for optimisation with constant
propagation, inlining, pre-calculating results, etc. There are many
reasons why it should be worse.

> In fact, the only motivation for the *div functions to exist is that
> they perform better, or at the very least equal, to the pair '/' and '%'.
> To me it sounds like a matter of QoI.

I am sure that in the early days of C, the div functions would - on some
systems at least - have performed better than the division operators
together. But not now - and not for a long time, on most targets.

>> operators.

> Well, it's the /definition/ of "div" that it calculates the quotient and
> the remainder in one go, not an assumption.

The wording is "in a single operation". Since that concept is not
explicitly defined in the standard (AFAIK), and since there is no way
that the standard can insist that a particular implementation does the
operation as a single instruction (not all processors have division
instructions of any sort), that part of the description simply says you
get both results from one function call.
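
A minimal sketch of the two spellings being compared, for readers who have not used "div". The names QuotRem, via_div, via_operators and self_check are made up for illustration; only std::div and std::div_t come from the standard library. It shows that both forms give the same values, and says nothing about which one a given compiler turns into better code.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper type: a quotient/remainder pair.
struct QuotRem {
    int quot;
    int rem;
};

// Both results from one library call.
QuotRem via_div(int a, int b) {
    const std::div_t r = std::div(a, b);
    return QuotRem{r.quot, r.rem};
}

// Both results from the two operators.
QuotRem via_operators(int a, int b) {
    return QuotRem{a / b, a % b};
}

// The two spellings agree, including for negative operands
// (both truncate the quotient toward zero).
void self_check() {
    assert(via_div(17, 5).quot == via_operators(17, 5).quot);
    assert(via_div(17, 5).rem == via_operators(17, 5).rem);
    assert(via_div(-17, 5).quot == via_operators(-17, 5).quot);
    assert(via_div(-17, 5).rem == via_operators(-17, 5).rem);
}
```

Whether the call is replaced by inline code is a separate question from whether the results match, which is what the rest of the thread is about.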

> the only reason for it to exist.
> If the implementation is sloppy then it's good to know, but it's also a
> different matter.

On most modern processors, no implementation could possibly have a
library call here that is faster than doing the operations using / and %
with a single division instruction. It's not being sloppy - it is
impossible. You'd have to go out of your way to make an intentionally
poor quality compiler for a call to "div" to be faster than using the
operators. (Failing to replace the "div" call with inline code is a
missed optimisation opportunity, and therefore QoI.)

David Brown <david.brown@hesbynett.no>: Jun 03 10:53AM +0200

On 03/06/2021 09:08, David Brown wrote:

>> Interesting, that's surprising.

> Yes. While I don't expect the "div" functions to be much used, it seems
> to me it should be a relatively easy optimisation.

I reported the lack of "div" builtins in the gcc bugzilla, and it was
marked as a duplicate of an existing report that gave a good explanation
of why "div" is awkward to optimise. The problem is that the layout
of the "div_t" struct is not specified in the standard, and gcc can be
used with different standard libraries that might have "quot" and "rem"
in different orders. Thus an optimisation here would depend on the
source code having included <stdlib.h> (and the compiler knowing the
contents of it), or that the compiler can prove that the layout of the
struct doesn't matter. These would both require significant new
optimisation infrastructure in the compiler. So maybe it will happen
one day, but not yet - probably not until the gcc developers have a more
important use for similar infrastructure.

For MSVC, the same supplier makes both the compiler and the library, and
therefore the compiler knows the structure of div_t and can optimise
appropriately.
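
A small sketch of what the unspecified layout means for user code, assuming nothing beyond std::div, std::div_t and offsetof. The function names are hypothetical. Code that reads the members by name works with either member order; only something that builds the struct by position (such as a compiler builtin) has to know which member comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Access by member name: independent of the order the library chose.
int quotient_of(int a, int b) { return std::div(a, b).quot; }
int remainder_of(int a, int b) { return std::div(a, b).rem; }

// Reports the order used by the library this was compiled against.
// The answer may legitimately differ between standard libraries.
bool quot_comes_first() {
    return offsetof(std::div_t, quot) < offsetof(std::div_t, rem);
}

void self_check() {
    assert(quotient_of(7, 2) == 3);
    assert(remainder_of(7, 2) == 1);
    // The two members are distinct, whichever comes first.
    assert(offsetof(std::div_t, quot) != offsetof(std::div_t, rem));
}
```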

Manfred <noname@add.invalid>: Jun 03 04:20PM +0200

On 6/3/2021 9:08 AM, David Brown wrote:
> preserved register usage, reduced scope for optimisation with constant
> propagation, inlining, pre-calculating results, etc. There are many
> reasons why it should be worse.

Yes, but having "div" as an intrinsic is part of the picture here.

> poor quality compiler for a call to "div" to be faster than using the
> operators. (Failing to replace the "div" call with inline code is a
> missed optimisation opportunity, and therefore QoI.)

Yes, but still the only reason for div to exist is to provide /some/
benefit over '/' and '%', which in the end is only a matter of
efficiency, so an implementation that manages to deliver a 'div' family
of functions that performs worse than the pair of operators still
qualifies as sloppy, at least in my book - this includes missing it as
inline or intrinsics, because, as you say, it makes any chance of
efficiency hopeless.

> For MSVC, the same supplier makes both the compiler and the library, and
> therefore the compiler knows the structure of div_t and can optimise
> appropriately.

Your other post (quoted above) reports a reasonable explanation of the
complications involved for gcc - I'd say that from the user's
perspective an implementation consists of the combination compiler +
library, so it simply means that the implementation is suboptimal in
this specific case.

Again, most probably the gcc folks and friends simply didn't bother too
much because this topic is low priority.

Manfred <noname@add.invalid>: Jun 03 04:43PM +0200

On 6/2/2021 11:32 PM, David Brown wrote:
> types. ABI's generally do not specify how to pass larger integer types
> - it is not covered by the specification for a struct comprising of two
> smaller types.

As far as I know ABI's do specify how to pass structs (arrays in C are a
bit different in that they are passed by reference unlike all other types).

The point is that in C there is some provision to pass arguments of
arbitrary size, both at the level of the standard and of actual
implementations.
The idea of having a scalar type of arbitrary size is problematic, but
it is also an opportunity, in my opinion.
In fact one of the few architectural changes in the last couple of
decades is the increase of word size. So it's not unreasonable that
looking in perspective the committee wanted to introduce some support
for 'large' scalars - integers being the obvious choice.
We have seen the demand for such types grow from 8 to 128 bits,
processor words grow from 8 to 64 bits, and given the increasing demand
for e.g. cryptography, with the vital role of secure communications
already today, it makes some sense to envision that these numbers could
keep growing sooner or later.

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 03 11:20AM -0400

On 6/3/21 3:08 AM, David Brown wrote:
> On 03/06/2021 00:49, Manfred wrote:
...
> preserved register usage, reduced scope for optimisation with constant
> propagation, inlining, pre-calculating results, etc. There are many
> reasons why it should be worse.

I wouldn't expect function call overhead to be relevant unless div() is
used in a context (usually involving function pointers) that prevents
div() from being inlined. When it is inlined, I would expect a call to
div() to be optimized to essentially the same code as would be generated
for separate / and % expressions.
Note: reality often fails to live up (or, in some cases, down) to my
expectations.

David Brown <david.brown@hesbynett.no>: Jun 03 07:04PM +0200

On 03/06/2021 17:20, James Kuyper wrote:
> for separate / and % expressions.
> Note: reality often fails to live up (or, in some cases, down) to my
> expectations.

If it were inlined, I would expect it to be optimal in speed and size.
But standard library functions are often not inlined unless they are
completely replaced by built-ins that have the same semantics. I'm not
sure if the standard library functions can be declared as "inline" and
defined in headers like <stdlib.h> - it would be interesting to know.
But AFAIUI, glibc - for whatever reason - doesn't like to have inline
definitions in its standard C library headers.

scott@slp53.sl.home (Scott Lurndal): Jun 03 10:24PM

>If it were inlined, I would expect it to be optimal in speed and size.
>But standard library functions are often not inlined unless they are
>completely replaced by built-ins that have the same semantics. I'm not

As of GCC 4.8, 'div' wasn't ever inlined, even with -O3.

However, the compiler optimized this into a single idiv:

#include <stdlib.h>

int main(int argc, const char **argv)
{
    //div_t qr = div(1234235235, 10);
    long q, r;
    long a, b;
    a = strtol(argv[1], NULL, 0);
    b = strtol(argv[2], NULL, 0);

    q = a/b;
    r = a%b;

    return q*8 + r;
}

0000000000400440 <main>:
400440: 55 push %rbp
400441: 31 d2 xor %edx,%edx
400443: 48 89 f5 mov %rsi,%rbp
400446: 53 push %rbx
400447: 48 83 ec 08 sub $0x8,%rsp
40044b: 48 8b 7e 08 mov 0x8(%rsi),%rdi
40044f: 31 f6 xor %esi,%esi
400451: e8 da ff ff ff callq 400430 <strtoul@plt>
400456: 48 8b 7d 10 mov 0x10(%rbp),%rdi
40045a: 48 89 c3 mov %rax,%rbx
40045d: 31 d2 xor %edx,%edx
40045f: 31 f6 xor %esi,%esi
400461: e8 ca ff ff ff callq 400430 <strtoul@plt>
400466: 48 89 c1 mov %rax,%rcx
400469: 48 89 d8 mov %rbx,%rax
40046c: 48 83 c4 08 add $0x8,%rsp
400470: 48 99 cqto
400472: 48 f7 f9 idiv %rcx
400475: 5b pop %rbx
400476: 5d pop %rbp
400477: 8d 04 c2 lea (%rdx,%rax,8),%eax
40047a: c3 retq
40047b: 90 nop

A way around _HAS_ITERATOR_DEBUGGING

"Öö Tiib" <ootiib@hot.ee>: Jun 02 05:30PM -0700

On Wednesday, 2 June 2021 at 21:28:00 UTC+3, Alf P. Steinbach wrote:
> > _HAS_ITERATOR_DEBUGGING.
> But why use an expression with formally Undefined Behavior when there is
> a perfectly well-defined way to do it?

Bonita's actual name is Chuck Norris. He tested it on all machines and
with all compilers and also did count to infinity twice. So it is perfectly
valid to do it.

Juha Nieminen <nospam@thanks.invalid>: Jun 03 09:21AM

> In theory, but there's for sure no actual vector-implementation
> where this doesn't work as long as you don't read from or write
> to the reference.

The problem with relying on undefined behavior is that you have no
guarantees about what the compiler will do to your code.

Most people think that "undefined behavior" only pertains to how
your code will behave when run on a computer. Maybe you'll get a
segmentation fault, maybe you'll be reading garbage, maybe you'll
be overwriting values in the stack somewhere making the program
bug out...

However, not many people are aware that official UB frees the
*compiler* to do whatever it wants with that kind of code. The
standard does not require that the compiler does what you expect
it to do. The compiler might see that the code has UB and optimize
it away, for example. This is not illegal from the standard's
point of view.

So the problem is that you can never be sure what the compiler
will do with your UB code. It might do something you don't expect.

(This was the reason for that one infamous bug in the Linux kernel,
which dereferenced the null pointer, and the compiler performed an
optimization on it that made it behave in an unexpected manner.)
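
To make the pattern concrete, here is a hedged sketch of the general shape of such a bug. It is not the actual kernel code; Device and read_flags are hypothetical names. The hazardous ordering appears only in a comment, and the compiled function uses the well-defined ordering.

```cpp
#include <cassert>

// Hypothetical type standing in for whatever the pointer refers to.
struct Device {
    int flags;
};

// Hazardous ordering (comment only, not compiled):
//
//     int flags = dev->flags;    // dereference first: UB if dev is null
//     if (dev == nullptr)        // the compiler is allowed to assume this
//         return -1;             // is never true and remove the check
//
// Well-defined ordering: test the pointer before touching it.
int read_flags(const Device* dev) {
    if (dev == nullptr) {
        return -1;
    }
    return dev->flags;
}

void self_check() {
    Device d{5};
    assert(read_flags(&d) == 5);
    assert(read_flags(nullptr) == -1);
}
```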

Bonita Montero <Bonita.Montero@gmail.com>: Jun 03 11:33AM +0200

> The problem with relying on undefined behavior is that you have no
> guarantees about what the compiler will do to your code.

In theory - but in this case actually practically not.

Rest of your nonsense unread.

Bo Persson <bo@bo-persson.se>: Jun 03 12:17PM +0200

On 2021-06-03 at 11:33, Bonita Montero wrote:
>> The problem with relying on undefined behavior is that you have no
>> guarantees about what the compiler will do to your code.

> In theory - but in this case actually practically not.

So enlighten us - what are the guarantees?

Bonita Montero <Bonita.Montero@gmail.com>: Jun 03 12:25PM +0200

>> In theory - but in this case actually practically not.

> So enlighten us - what are the guarantees?

Dereferencing vec.end() is formally UB because there might be machines
on which loading such a reference traps. But such systems
aren't relevant today.
And I'm not even dereferencing vec.end(), just calling operator->(),
which is what to_address also does in one variant ([*]).

[*] https://en.cppreference.com/w/cpp/memory/to_address

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 03 11:06AM -0400

On 6/3/21 5:21 AM, Juha Nieminen wrote:
>> to the reference.

> The problem with relying on undefined behavior is that you have no
> guarantees about what the compiler will do to your code.

The C standard, while describing the unary & operator, says "If the
operand is the result of a unary * operator, neither that operator nor
the & operator is evaluated and the result is as if both were omitted,
except that the constraints on the operators still apply and the result
is not an lvalue." (C 6.5.3.2p3).

I could find no corresponding statement in the C++ standard, but
shouldn't there be one? The C++ version would have to be more
complicated, to deal with issues that can't come up in C, such as
operator overloads. However, is there any good reason why there
shouldn't be a similar clause in the C++ standard? That would make
Bonita's code perfectly safe.

David Brown <david.brown@hesbynett.no>: Jun 03 07:07PM +0200

On 03/06/2021 17:06, James Kuyper wrote:
> operator overloads. However, is there any good reason why there
> shouldn't be a similar clause in the C++ standard? That would make
> Bonita's code perfectly safe.

I looked up exactly the same thing myself. In C++, this kind of thing
is complicated by overloaded operators. It is perfectly allowable for
the "*" operator on a class to have side-effects, or do something that
is not normal dereferencing. The same applies to the "&" operator. So
it would not be good to say that "&*" is eliminated in the way it is in
C - you would have to say it only applied to standard types, and then
things get inconsistent.
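
A sketch of why "&*" cannot simply be cancelled for class types. Logged and Handle are hypothetical types written only for this illustration; both operators are overloaded and both have observable side effects, so dropping the pair would change what the program does.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type with an overloaded unary & that has a side effect
// and returns a pointer to a member rather than to the object itself.
struct Logged {
    int value = 0;
    int address_taken = 0;
    int* operator&() {
        ++address_taken;
        return &value;  // built-in & on an int
    }
};

// Hypothetical handle whose operator* also has a side effect.
struct Handle {
    explicit Handle(Logged* t) : target(t) {}
    Logged* target;
    int dereferenced = 0;
    Logged& operator*() {
        ++dereferenced;
        return *target;
    }
};

// Calls Handle::operator*, then Logged::operator&.
int* round_trip(Handle& h) { return &*h; }

void self_check() {
    Logged obj;
    Handle h(std::addressof(obj));  // addressof bypasses the overloaded &
    int* p = round_trip(h);
    assert(h.dereferenced == 1);
    assert(obj.address_taken == 1);
    assert(p == std::addressof(obj.value));
}
```

Note that the result is an int*, not a Handle, so the expression is not an identity in type either.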

Paavo Helde <myfirstname@osa.pri.ee>: Jun 03 08:18PM +0300

03.06.2021 18:06 James Kuyper kirjutas:
> operator overloads. However, is there any good reason why there
> shouldn't be a similar clause in the C++ standard? That would make
> Bonita's code perfectly safe.

C does not have user-defined operators. In C++ every iterator has its
own implementation of operator*() and there is no guarantee that the
operator & would exactly undo that operation. Indeed, typically it
cannot and does not. Typically the result of &*iter is a pointer, not an
iterator, and that's exactly why one might use such a construction in
the first place.

So in C++ one just cannot "cancel" &* as the result would be of a wrong
type. It might be possible to make it happen (maybe by adding a special
operator()&* to the standard which iterator classes would use for
converting an iterator into a pointer), but the needed legalese seems
daunting, and the benefits seem pretty meager (satisfying one angry
Bonita Montero somewhere on the internet; everybody else is happy with
vec.data()+vec.size()).
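
For reference, a minimal sketch of the well-defined spelling mentioned above. The helper names end_pointer and sum are made up for illustration; data() and size() are the standard std::vector members. No iterator is dereferenced, so nothing here depends on how end() is implemented.

```cpp
#include <cassert>
#include <vector>

// One-past-the-end pointer, obtained without touching end().
const int* end_pointer(const std::vector<int>& v) {
    return v.data() + v.size();
}

// Walks the raw pointer range [data(), data() + size()).
int sum(const std::vector<int>& v) {
    int total = 0;
    for (const int* p = v.data(); p != end_pointer(v); ++p) {
        total += *p;
    }
    return total;
}

void self_check() {
    const std::vector<int> v{1, 2, 3};
    assert(end_pointer(v) - v.data() == 3);
    assert(sum(v) == 6);

    const std::vector<int> empty;
    assert(end_pointer(empty) == empty.data());
    assert(sum(empty) == 0);
}
```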

Juha Nieminen <nospam@thanks.invalid>: Jun 03 05:49PM

>> guarantees about what the compiler will do to your code.

> In theory - but in this case actually practically not.

> Rest of your nonsense unread.

I honestly cannot understand why you feel the need to behave like such
an asshole. I merely made a remark about the language standard and what
it could allow the compiler to behave like (and gave a practical example
of when this caused a huge issue in the Linux kernel).

So why are you treating it like shit? I don't understand. What exactly
did I do or say that elicits such irrational behavior?

It's not even the first time. You keep doing this again and again,
most often completely unprompted out of the blue, in response to completely
innocuous posts, for completely unknown reasons. Why?

Juha Nieminen <nospam@thanks.invalid>: Jun 03 05:52PM

> operator overloads. However, is there any good reason why there
> shouldn't be a similar clause in the C++ standard? That would make
> Bonita's code perfectly safe.

It's different when dealing with iterators. I think some std::vector
implementations in fact use raw pointers as iterators, but that's
not guaranteed.

Solar System Assembly Line

Daniel65 <daniel47@eternal-september.org>: Jun 03 07:23PM +1000

Rick C. Hodgin wrote on 3/6/21 2:07 am:
> devout Christians, scientists, and nobody can debunk it with
> anything that's not related to some other theory which contradicts
> it.

WHAT?? You're complaining because people debunk your theory with "some
other theory which contradicts it"!!

How else would you expect people to debunk your theory??

> I announce today that I am rejecting this theory for one reason:

You are finally coming to your senses??

> to us on this Earth in this existence is unique and special in His
> universe.

> I reject the Solar System Assembly Line theory on that basis.

Owww!! Did-mums!!
--
Daniel

Juha Nieminen <nospam@thanks.invalid>: Jun 03 09:24AM

> I reject the Solar System Assembly Line theory on that basis.

And we needed to know this why, exactly?

lamda-optimization

MrSpook_9na7kf9y25@bv5tkwx.biz: Jun 03 08:10AM

On Wed, 2 Jun 2021 05:05:06 +0000 (UTC)
>erode the credibility of the person by accusing the person of having
>done or said something objectionable (usually completely unrelated
>to the discussion at hand).

Ok, whatever. You keep your own definition of ad hominem, the rest of us will
stick with the standard one.

MrSpook_2TF@zp_6z10zl3.info: Jun 03 08:12AM

On Wed, 02 Jun 2021 14:21:31 GMT

>>AFAIK Bonita is a woman and as such you all owe her respect.

>That is two assumptions, and I've strong doubts about the first,
>and the second is silly.

Can't you spot sarcasm?

Plus stick her name into google translate.



Thursday, June 3, 2021

Digest for comp.lang.c++@googlegroups.com - 21 updates in 4 topics
