soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

uninitialized built-in types - 4 Updates
Dependency Injection - 11 Updates
uninitialized built-in types - 2 Updates

flimflim1172@gmail.com: Dec 16 11:59AM -0800

I've long wished C++ worked a bit like this:

int x; // initialized to zero in all scopes (not just global)

int y = std::undefined; // it's left uninitialized, only when I say so

struct Foo {
Foo() { }
Foo(std::undefined_t) : x_(std::undefined) { }
int x_;
};

struct Bar {
Bar() { }
int x_;
};

Foo foo; // foo.x_ is zero
Foo foo = std::undefined; // foo.x_ is undefined

Bar bar; // bar.x_ is zero
Bar bar = std::undefined; // compiler error

I know there are at least a few of us out there who would welcome this sort of thing. I also know there's massive opposition to this from really smart people, and so I want to learn from you. If I could understand why it's better for built-ins to be undefined by default rather than by request, it would be a great help to me.

Please keep in mind I'm asking for guidance completely separate from the issues of legacy code or precedent set by C. I understand many of those issues. I instead want to learn why it shouldn't be done even if C++ were being designed for the first time today. If I could get that through my thick head, I'd be very happy.

Here's something that may help understand part of where I'm hung up:

int x; // undefined
float x; // undefined
void* x; // undefined
std::vector<int> x; // well defined
std::string x; // well defined
std::shared_ptr<int> x; // well defined (empty shared_ptr)

In a perfect world, why should they be different?
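As an aside, the asymmetry can already be avoided at the declaration itself in C++11 and later. Empty braces request value-initialization, which zeroes built-in types and runs the default constructor for class types. This is a minimal sketch. `Snapshot` and `valueInitializedLocals` are hypothetical names, used only so the locals can be returned and inspected:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical aggregate used only to carry the locals out of the function.
struct Snapshot {
    int i;
    float f;
    void* p;
    std::vector<int> v;
    std::string s;
    std::shared_ptr<int> sp;
};

Snapshot valueInitializedLocals() {
    int i{};                    // value-initialized: 0
    float f{};                  // value-initialized: 0.0f
    void* p{};                  // value-initialized: nullptr
    std::vector<int> v{};       // default constructor: empty vector
    std::string s{};            // default constructor: empty string
    std::shared_ptr<int> sp{};  // default constructor: empty shared_ptr
    // By contrast, a plain `int j;` here would leave j indeterminate.
    return Snapshot{i, f, p, v, s, sp};
}
```

This does not change the default that the post objects to. It only makes "initialized" the spelling that is cheap to write.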

I've read rationale that cites valgrind. I get why valgrind is great for catching uninitialized data bugs, but I'm not sure why it would matter to valgrind whether the lack of initialization was default or explicit.

Paavo Helde <myfirstname@osa.pri.ee>: Dec 16 02:48PM -0600

flimflim1172@gmail.com wrote in
> std::string x; // well defined
> std::shared_ptr<int> // well defined (empty shared_ptr)

> In a perfect world, why should they be different?

For comparison: in the D language all variables are initialized by
default. In particular, floating-point variables are default-initialized
to signalling NaN, that's a nice touch IMO!

To leave something uninitialized one has to specifically say it:

int x = void;
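For individual class members, the D behaviour can be approximated in C++ with a C++11 default member initializer. This is a sketch, and `Reading` is a hypothetical type. A signaling NaN only traps if floating-point exceptions have been unmasked, and on common platforms they are masked by default:

```cpp
#include <cmath>
#include <limits>

// Hypothetical type: the floating-point member defaults to a signaling NaN,
// D-style, instead of being left indeterminate.
struct Reading {
    double value = std::numeric_limits<double>::signaling_NaN();
};
```

Forgetting to assign `value` then yields a NaN that propagates through arithmetic, rather than an arbitrary leftover bit pattern.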

> for catching uninitialized data bugs, but I'm not sure why it would
> matter to valgrind whether the lack of initialization was default or
> explicit.

It would make it impossible to catch some errors. E.g.

void InitializeXY(int& x, int y) { // bug: y is taken by value
x = 1;
y = 2; // only sets the local copy; the caller's y stays uninitialized
}

int x, y;
InitializeXY(x, y);
// go on to use x,y

As of now, the compiler or a tool like valgrind will report the error,
but if the variable had been zero-initialized there would be no error to
detect.

My wishlist: zero overhead signalling NaN-s for integer types as well,
and use them as default initializers.

Cheers
Paavo

David Brown <david.brown@hesbynett.no>: Dec 16 11:19PM +0100

On 16/12/15 20:59, flimflim1172@gmail.com wrote:
> I've long wished C++ worked a bit like this:

> int x; // initialized to zero in all scopes (not just global)

As Stefan says, you mean "not just static". Everything with a fixed
address in memory is initialised to zero, or with the default
constructor, if you don't explicitly give an initialisation value.

Stack allocated objects, as well as anything with memory allocated by
"malloc" (as distinct from "new"), is uninitialised.

This is a /good/ thing. Normally, you would not declare your local
variable until you have something to put in it:

int foo() {
...
int x = bar();
...
}

You don't write:

int foo() {
int x;
...
x = bar();
...
}

That style is a left-over from C before C99, and is rarely what you want
in C++ code. The only exception is code like this:

int foo(int a) {
int x;
if (a >= 0) {
x = positive();
} else {
x = negative();
}
return x;
}

In such cases, having x zero'ed in advance would be inefficient.
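Even in this case the variable can usually be initialised at its declaration, so there is never an uninitialised window. This is a sketch in which `positive()` and `negative()` are hypothetical stand-ins for the functions above, and `fooLambda` is a hypothetical name:

```cpp
// Hypothetical stand-ins for the positive()/negative() calls above.
int positive() { return 1; }
int negative() { return -1; }

// The conditional operator lets x be initialised (even const) where it is declared.
int foo(int a) {
    const int x = (a >= 0) ? positive() : negative();
    return x;
}

// For branches too complex for ?:, an immediately invoked lambda does the same job.
int fooLambda(int a) {
    const int x = [&] {
        if (a >= 0) {
            return positive();
        }
        return negative();
    }();
    return x;
}
```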

And note that if you have a half-decent compiler and know how to use it,
you are in (almost) no danger of accidentally using an uninitialised
variable - the compiler will warn you if you make such a mistake. On
the other hand, if you use "int x = 0;" or had implicit initialisation
of local variables as part of the language (as you suggest), you would
lose that warning ability that helps spot mistakes in code.

So having local data uninitialised by default is a good thing.

> int y = std::undefined; // it's left uninitialized, only when I say so

You don't mean "undefined", you mean "unspecified" or "indeterminate".
In C++ (and C), when something is "undefined" the behaviour can do
/anything/, including launching nasal daemons. What you mean here is
that y should have a valid and legal value for its type, but you don't
care which particular value. This means it would be legal behaviour to
use it (but hopefully with a compiler warning) - if it were "undefined"
then the compiler could assume the code that uses it could never happen.

In particular, I would expect a variable initialised to an "unspecified"
or "indeterminate" value to retain its value, and not change it at
different times. So "x == x" would always be true. An "undefined"
value, on the other hand, /could/ change - "x == x" could sometimes be
true, and sometimes false.

Having a "std::indeterminate" or "std::unspecified" (or using D's syntax
of "void", which I think is more practical) would be a nice idea for
some code. For example, you might have code like this:

struct ValidValue {
bool valid;
double value;
ValidValue(bool valid_, double value_) : valid(valid_), value(value_) {}
};

ValidValue squareroot(double x) {
if (x >= 0) {
return ValidValue(true, sqrt(x));
} else {
return ValidValue(false, std::unspecified);
}
}
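For this particular pattern, C++17 later added `std::optional`. It keeps the flag and the value together and simply does not construct a value when there is none, so no unspecified `double` is needed. This sketch rewrites the same function with it; it is not what the post proposes and requires C++17:

```cpp
#include <cmath>
#include <optional>

// The optional is either empty or holds a constructed double;
// there is never an "unspecified" value to read by mistake.
std::optional<double> squareroot(double x) {
    if (x >= 0) {
        return std::sqrt(x);
    }
    return std::nullopt;
}
```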

Initialising to "void" or "std::unspecified" would also be useful for
static and global data at times (perhaps particularly for embedded systems).

> Bar bar = std::undefined; // compiler error

> I know there's at least a few of us out there who would welcome this
> sort of thing.

I think people who want locals to be zero-initialised by default haven't
thought things through well enough.

Having a way to explicitly declare something as uninitialised or
unspecified would be useful, however. It is a different concept.

> really smart people, and so I want to learn from you. If I could
> understand why it's better for built-ins to be undefined by default
> rather than by request, it would be a great help to me.

Hopefully what I wrote above will be helpful to you.

> std::string x; // well defined
> std::shared_ptr<int> // well defined (empty shared_ptr)

> In a perfect world, why should they be different?

Objects often have an invariant that defines part of their structure -
it should not be easy in the language to create an object that does not
have that invariant. Part of the constructor's job is to establish the
invariant (indeed, for a default constructor that is often its only job)
- thus constructors need to be called on all objects. POD types like
"int" have no invariant, so there is no initialisation needed to make
the type a consistent member of that type.

Note that a default constructor does not have to initialise all members
of the object - only those that are needed to establish the invariant.
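A small illustration of that point follows, with the hypothetical class `IntStack`. Its invariant is "elements [0, count_) are valid", so the default constructor only has to set the count. The array can stay uninitialised because nothing reads a slot before `push` writes it:

```cpp
#include <cstddef>

// Hypothetical fixed-capacity stack of ints.
class IntStack {
public:
    IntStack() : count_(0) {}  // establishes the invariant; data_ deliberately untouched

    bool push(int v) {
        if (count_ == Capacity) return false;
        data_[count_++] = v;
        return true;
    }

    bool pop(int& out) {
        if (count_ == 0) return false;
        out = data_[--count_];
        return true;
    }

    std::size_t size() const { return count_; }

private:
    static constexpr std::size_t Capacity = 16;
    std::size_t count_;
    int data_[Capacity];
};
```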

(I don't know whether "EnumClass e;" forces "e" to be an element of the
enum class, or if it is left indeterminate.)
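For what it is worth, a scoped enumeration is not a class type. A local `EnumClass e;` is therefore default-initialized and left indeterminate just like `int`, and nothing forces it to be one of the enumerators. Even value-initialization gives the value 0, whether or not that is an enumerator. This is a sketch with a hypothetical `Color` enumeration:

```cpp
// Hypothetical scoped enumeration with no enumerator equal to 0.
enum class Color { Red = 1, Green = 2 };

Color valueInitializedColor() {
    Color c{};  // value-initialized: holds 0, which is not a listed enumerator
    // A plain `Color d;` here would be left indeterminate, like an int.
    return c;
}
```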

flimflim1172@gmail.com: Dec 16 03:31PM -0800

> As Stefan says, you mean "not just static".

My mistake - you're right. I sometimes conflate "global" and "static".

> x = bar();
> ...
> }

Those cases have never bothered me. What makes me want different behavior in C++ is this:

In Gizmo.h:

class Gizmo {
public:
Gizmo();
... blah blah
private:
int x_; // I just added this member
};

In Gizmo.cpp:

Gizmo::Gizmo() :
x_(0) // and I have to go far away and do this too
{
}

and I still haven't done anything with x_ yet, which happens in other code that is similarly not next to declaration or initialization. C++11 allows me to do "int x_ = 0;" in the class declaration, but I can still forget, and it's still a different requirement on me than if x_ had been an std::vector<int>, or had static storage.
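The C++11 form mentioned above looks like the sketch below. The header is only a hypothetical reconstruction of `Gizmo`, since the original elides most of it. The zero sits next to the declaration, and every constructor that does not name `x_` in its mem-initializer list picks it up. It does not remove the need to remember it, which is the complaint here:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical Gizmo: the default member initializer replaces the
// far-away x_(0) in Gizmo.cpp.
class Gizmo {
public:
    Gizmo() {}                        // x_ == 0 via the default member initializer
    explicit Gizmo(int x) : x_(x) {}  // an explicit mem-initializer overrides it
    int x() const { return x_; }
    std::size_t count() const { return values_.size(); }

private:
    int x_ = 0;                // built-in type: needs the initializer written out
    std::vector<int> values_;  // class type: default constructor runs regardless
};
```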

> the other hand, if you use "int x = 0;" or had implicit initialisation
> of local variables as part of the language (as you suggest), you would
> lose that warning ability that helps spot mistakes in code.

I do see these warnings a lot when the declaration, lack of initialization, and usage are all together. I've never seen such a warning when the data I forgot to initialize was a class member variable. Is that even possible? I use msvc, clang, ghs, armcc, codewarrior, and sn on a regular basis and I can't remember any analysis picking up on this type of mistake with uninitialized members.

> You don't mean "undefined", you mean "unspecified" or "indeterminate".

Sorry - you're right again. I sometimes conflate undefined and unspecified. This is even worse than global/static. Very bad of me.

> Having a "std::indeterminate" or "std::unspecified" (or using D's syntax
> of "void", which I think is more practical) would be a nice idea for
> some code. For example, you might have code like this:

I would really like this, wow. I actually considered 'void' too, but thought it might have ambiguity problems (I'm not qualified to know I think, so I resorted to something like std::nullptr_t). As long as using 'void' doesn't restrict me from using it with user-defined types too. I feel I'd sometimes want an un-initializing constructor, distinct from the default constructor.
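Something close to the opt-in half of the original proposal can already be expressed with a tag type. The standard library later used the same technique for tags such as `std::in_place`. This is a sketch: `uninitialized_t` and `uninitialized` are hypothetical names, and the built-in member still has to be zeroed by hand in the default constructor:

```cpp
// Hypothetical tag type: passing it selects a constructor that
// deliberately skips initialising members.
struct uninitialized_t {};
constexpr uninitialized_t uninitialized{};

struct Foo {
    Foo() : x_(0) {}           // default: x_ is zero
    Foo(uninitialized_t) {}    // opt-in: x_ left indeterminate
    int x_;
};

// Usage mirroring the original post:
//   Foo a;                     // a.x_ is zero
//   Foo b = uninitialized;     // b.x_ is indeterminate until assigned
// A class with no uninitialized_t constructor rejects `= uninitialized`
// at compile time, matching the "compiler error" case for Bar.
```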

> the type a consistent member of that type.

> Note that a default constructor does not have to initialise all members
> of the object - only those that are needed to establish the invariant.

Hmmm, but with that rationale, std::vector<int>'s default constructor could just as well decide to put 5 ints into the vector. That would satisfy its invariants. But it doesn't. It puts zero ints in there, and we'd all be mad if it didn't. Still, you may not want zero - you may want it to have 10 ints each initialized to 77 - and you have to ask for that or accept a well-specified default state. I often find myself wishing the built-in types worked this way, almost always when adding new variables to classes.

Dependency Injection

"Öö Tiib" <ootiib@hot.ee>: Dec 15 04:18PM -0800

On Wednesday, 16 December 2015 01:33:52 UTC+2, jacobnavia wrote:

> The code generators offer the compiler a common set of actions and
> produce different outputs.

> Is this a case of "dependency injection" ?

Maybe. Only, in that story the "compiler" sounds like a huge module that
sort of injects the "code generator" modules (dependencies) into itself.
Typically the injector is some other module (like "configuration").
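The distinction can be sketched as follows; every class and function name here is hypothetical. The compiler depends only on an abstract code-generator interface and receives a concrete generator through its constructor. A separate configuration function, rather than the compiler itself, acts as the injector. The "instructions" emitted are toy strings, not real assembler output:

```cpp
#include <memory>
#include <string>
#include <utility>

// The "common set of actions" every code generator offers.
class CodeGenerator {
public:
    virtual ~CodeGenerator() = default;
    virtual std::string emitAdd(const std::string& a, const std::string& b) const = 0;
};

class X86Generator : public CodeGenerator {
public:
    std::string emitAdd(const std::string& a, const std::string& b) const override {
        return "add " + a + ", " + b;            // toy two-operand form
    }
};

class Arm64Generator : public CodeGenerator {
public:
    std::string emitAdd(const std::string& a, const std::string& b) const override {
        return "add " + a + ", " + a + ", " + b;  // toy three-operand form
    }
};

// The compiler never chooses a generator; it is handed one (constructor injection).
class Compiler {
public:
    explicit Compiler(std::unique_ptr<CodeGenerator> gen) : gen_(std::move(gen)) {}
    std::string compileAdd(const std::string& a, const std::string& b) const {
        return gen_->emitAdd(a, b);
    }
private:
    std::unique_ptr<CodeGenerator> gen_;
};

// The injector: a configuration step outside the compiler picks the implementation.
std::unique_ptr<CodeGenerator> makeGenerator(const std::string& target) {
    if (target == "arm64") return std::make_unique<Arm64Generator>();
    return std::make_unique<X86Generator>();
}
```

Because `Compiler` only sees `CodeGenerator`, a test can also inject a stub generator without touching the compiler.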

David Brown <david.brown@hesbynett.no>: Dec 16 09:53AM +0100

On 16/12/15 00:33, jacobnavia wrote:
> When a compiler starts up it determines the cpu type it is running in.
> If the program is for the current machine, it can choose from a set of
> different code generators which model corresponds best to the machine.

(It is questionable whether you would want to do exactly this on a
compiler, since you might be targeting something other than the exact
cpu you are running on, but that's another matter.)

> The code generators offer the compiler a common set of actions and
> produce different outputs.

> Is this a case of "dependency injection" ?

That sounds like one way to do this - a little like "plugins", except
that the plugins are built into the main binary.

Juha Nieminen <nospam@thanks.invalid>: Dec 16 09:33AM

> data section or stack with respect to cache-line occupancy or
> hit rates, particularly when the object exceeds the native
> cache line size.

One would think so, indeed. However, one test of mine showed otherwise.

Scenario: A class requires a (relatively small) array of data for its
calculations. The function that does these calculations is called
hundreds of thousands of times. What is faster:

1) Keep a std::vector as a member of the class which is created only
once, and which is reused every time the calculation function is called.
(In other words, no allocations are done each time the function is called.)

2) Allocate an array of the required size on the stack every time the
function is called. (This cannot be done in standard C++, but platform
specific functions can be used for this purpose.)

Personally I would have thought that either they are equally fast, or
#1 is slightly faster (because #2 requires extra operations every time
the function is called.)

To my surprise, #2 turned out to be faster. (Not by a whole lot, but
measurably so.)

Moreover, #1 is not thread-safe, while #2 is, which is an additional bonus.

Granted, I ran this test on one single (32-bit) platform so it can't be
taken as a general rule, but I think it's rather telling.
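The two approaches being compared might look roughly like the sketch below; the class and the calculation are hypothetical. Approach #2 uses `alloca`, which is not ISO C++. It is declared in `<alloca.h>` on glibc and macOS, and MSVC provides `_alloca` in `<malloc.h>`. The sketch makes no claim about which is faster:

```cpp
#include <cstddef>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>   // _alloca (MSVC)
#define STACK_ALLOC(bytes) _alloca(bytes)
#else
#include <alloca.h>   // alloca (glibc, macOS); not part of ISO C++
#define STACK_ALLOC(bytes) alloca(bytes)
#endif

// Hypothetical calculation: sum of squares of n inputs via a scratch array.
class Calculator {
public:
    // Approach #1: scratch array kept as a member and reused across calls.
    // Not thread-safe: concurrent calls on the same object share scratch_.
    double sumOfSquaresMember(const double* in, std::size_t n) {
        scratch_.resize(n);  // reallocates only if n exceeds the current capacity
        for (std::size_t i = 0; i < n; ++i) scratch_[i] = in[i] * in[i];
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += scratch_[i];
        return sum;
    }

    // Approach #2: scratch array allocated in this function's stack frame on
    // every call and released automatically on return; each call has its own.
    double sumOfSquaresStack(const double* in, std::size_t n) const {
        double* scratch = static_cast<double*>(STACK_ALLOC(n * sizeof(double)));
        for (std::size_t i = 0; i < n; ++i) scratch[i] = in[i] * in[i];
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += scratch[i];
        return sum;
    }

private:
    std::vector<double> scratch_;
};
```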

--- news://freenews.netfront.net/ - complaints: news@netfront.net ---

Wouter van Ooijen <wouter@voti.nl>: Dec 16 12:55PM +0100

On 16-Dec-15 at 10:33 AM, Juha Nieminen wrote:

> 2) Allocate an array of the required size on the stack every time the
> function is called. (This cannot be done in standard C++, but platform
> specific functions can be used for this purpose.)

You mean the size of this small array is not known at compile time?
Otherwise I don't see why this can't be a simple local variable.

> Personally I would have thought that either they are equally fast, or
> #1 is slightly faster (because #2 requires extra operations every time
> the function is called.)

Personally I know that what is faster depends on so many details (that
might vary from system to system) that I would not trust my gut feeling.
(But my gut feeling is that 2) would be faster because stack-based
addressing is often faster than this-based addressing, and it might
permit the data to reside in registers, and local (stack-relative) data
is more likely to be in the cache.)

Wouter van Ooijen

scott@slp53.sl.home (Scott Lurndal): Dec 16 01:49PM

>addressing is ofte faster than this-based addressing, and it might
>permit the data to reside in registers, and local (stack-relative) data
>is more likely to be in the cache.)

In both cases (stack-based and this-based addressing), the object code is
identical on all modern architectures. A load/store from an offset relative to
a base address in a register. With x86_64 and 64-bit RISC processors, it
is entirely likely that 'this' is always in a register and seldom if ever
spilled to stack.

Wouter van Ooijen <wouter@voti.nl>: Dec 16 03:39PM +0100

On 16-Dec-15 at 2:49 PM, Scott Lurndal wrote:
> a base address in a register. With x86_64 and 64-bit RISC processors, it
> is entirely likely that 'this' is always in a register and seldom if ever
> spilled to stack.

As I said, it depends. On an ARM the SP is a dedicated register, so it
will always be available for stack-offset based addressing. This would
occupy a register, which means that one register less is available for
other purposes, which could affect the code.

Wouter van Ooijen

Juha Nieminen <nospam@thanks.invalid>: Dec 16 02:56PM

> You mean the size of this small array is not known at compile time?

Yes, I mean that. The size required for the array depends on the input to the
class, and can change arbitrarily at runtime.


scott@slp53.sl.home (Scott Lurndal): Dec 16 04:27PM

>will always be available for stack-offset based addressing. This would
>occupy a register, which means that one register less is available for
>other purposes, which could affect the code.

I did refer to 64-bit RISC processors, where register pressure is much
less of a problem, not to ARMv5/6/7.

ARMv8 has 32 64-bit integer registers, with one architecturally reserved
for the stack pointer. 30 of the other 31 registers are available. As
parameters are passed in registers, 'this' will _always_ start a function
in a register, and in my (quite extensive) experience with arm64 code, it
is very uncommon for register pressure to be sufficient that the compiler
will choose to spill the register containing 'this'. Likewise, pointers
to objects will be kept in a register and the compiler will analyze the
flow of the code to determine if it makes sense to spill that pointer
to the stack. Doesn't happen often in reasonable sized functions.

Likewise with 64-bit intel systems.

Generally there is much lower-hanging fruit to pick in any case.

scott@slp53.sl.home (Scott Lurndal): Dec 16 04:30PM

>> You mean the size of this small array is not known at compile time?

>Yes, I mean that. The size required for the array depends on the input to the
>class, and can change arbitrarily at runtime.

Which, for me, would completely preclude dynamic allocation
on the stack (e.g. via alloca(3)), since the stack size is often
constrained in the environments I program in, and if it can
change arbitrarily (with unknown bounds), it doesn't belong on
the stack anyway.
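One common compromise, which was not proposed in this thread, keeps stack usage bounded whatever size arrives at runtime. It uses a fixed-size local array when the request fits and falls back to the heap otherwise. This is a sketch; `ScratchBuffer` and its capacity parameter are hypothetical, and the array only lives on the stack when the `ScratchBuffer` itself is a local variable:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical small-buffer helper: at most StackCap doubles live in the
// object itself; larger requests go to the heap.
template <std::size_t StackCap>
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t n) {
        if (n > StackCap) heap_ = std::make_unique<double[]>(n);
        data_ = heap_ ? heap_.get() : local_;
    }
    // Copying (or moving) would leave data_ pointing at the source's local_.
    ScratchBuffer(const ScratchBuffer&) = delete;
    ScratchBuffer& operator=(const ScratchBuffer&) = delete;

    double* data() { return data_; }
    bool onStack() const { return !heap_; }

private:
    double local_[StackCap];           // uninitialised until the caller writes it
    std::unique_ptr<double[]> heap_;   // used only when n > StackCap
    double* data_;
};
```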

Wouter van Ooijen <wouter@voti.nl>: Dec 16 07:17PM +0100

On 16-Dec-15 at 5:27 PM, Scott Lurndal wrote:
>> other purposes, which could affect the code.

> I did refer to 64-bit RISC processors, not ARMv5/6/7 where register
> pressure is much less a problem.

I referred to Cortex-M, which is IMO a modern architecture too, although
not for desktop use.

> Generally there is much lower-hanging fruit to pick in any case.

If you mean "there are other aspects that are likely to have much more
impact": I totally agree.

Wouter

scott@slp53.sl.home (Scott Lurndal): Dec 16 09:17PM

>> pressure is much less a problem.

>I referred to Cortex-M, which is IMO a modern architecture too, although
>not for desktop use.

Modern in the sense that it was designed in 1983-85 for the Acorn RISC Machine (ARM)?

uninitialized built-in types

ram@zedat.fu-berlin.de (Stefan Ram): Dec 16 08:24PM

>I've long wished C++ worked a bit like this:
>int x; // initialized to zero in all scopes
>(not just global)

It's »not just static«, IIRC.

>std::string x; // well defined
>std::shared_ptr<int> // well defined (empty shared_ptr)
>In a perfect world, why should they be different?

Maybe it has to do with C++ wanting to be an »object-oriented
language«? In OOP it is common to assume that a new object is
in a valid state. (This seems to be the literal meaning of
»resource acquisition is initialization«, by the way.) The
behaviour for primitives may have to do with the C legacy.

ram@zedat.fu-berlin.de (Stefan Ram): Dec 16 08:42PM

>>int x; // initialized to zero in all scopes
>>(not just global)
>It's, »not just static« IIRC.

(I was referring to a static storage duration.)

>in a valid state. (This seems to be the literal meaning of
>»resource acquisition is initialization«, by the way.) The
>behaviour for primitives may have to do with the C legacy.

The prototypical situation when one does not initialize
an int object is

#include <iostream>
#include <istream>
...
{ int n; ::std::cin >> n; }

which corresponds to

#include <iostream>
#include <istream>
#include <string>
...
{ ::std::string s; ::std::cin >> s; }

, and maybe, for this to work, »s« already needs to have a valid state.
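Below is a self-contained version of the two snippets. `std::istringstream` stands in for `std::cin` only so that it runs without console input, and `readInt`/`readWord` are hypothetical names. The int is left uninitialised until extraction, and it is only read after the stream reports success. The string is already a valid empty string before `>>` touches it:

```cpp
#include <sstream>
#include <string>

// Returns true and stores the value in out only when extraction succeeds.
bool readInt(const std::string& text, int& out) {
    std::istringstream in(text);
    int n;            // deliberately uninitialised: >> assigns it on success
    if (in >> n) {
        out = n;      // n is read only after a successful extraction
        return true;
    }
    return false;
}

// The string starts out valid (empty), so it is safe even if extraction fails.
std::string readWord(const std::string& text) {
    std::istringstream in(text);
    std::string s;
    in >> s;
    return s;
}
```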



Wednesday, December 16, 2015

Digest for comp.lang.c++@googlegroups.com - 17 updates in 3 topics
