soft and program: Digest for comp.lang.c++@googlegroups.com

Is this undefined behavior? - 2 Updates
who's at fault, me or compiler? - 7 Updates
Observable end padding in arrays - 3 Updates
Link with library of exact filename (i.e. exact version) - 4 Updates
Fixing some undefined behavior - 3 Updates

Is this undefined behavior?

Tim Rentsch <tr.17687@z991.linuxsc.com>: Jul 08 08:23AM -0700

> converting iterations into a recursive function, or is this
> "permission" merely based on the lack of prohibition under the
> umbrella of the "as if" rule?

To a certain extent it is both. There is no specific statement in
the C++ standard (or the C standard either) that an iterative
function may be implemented using recursive object code, or vice
versa. But the C++ standard does say (in intro.execution, p1)
"[Conforming implementations] need not copy or emulate the
structure of the abstract machine." Any such mapping does fall
under the "as if" rule, so in that sense the freedom is implicit
rather than explicit. At the same time there is an explicit
freedom granted to disregard what the abstract machine would do
(provided of course the "as if" requirements are met). A C++
implementation can be conforming even if the source code is
"compiled" by translating it to pure Lisp and then running the Lisp
code. Pure Lisp doesn't have any way of iterating; all it has is
recursion. As long as the Lisp code produces the same output that
the abstract machine would, the implementation is conforming.
(Note: there are some other aspects of what is called "observable
behavior" that I've left out, but that doesn't change the key point
that compiling to a recursive-only environment such as pure Lisp
can still be conforming.)
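
A minimal sketch of the equivalence described above, assuming C++14 or later. The function names are hypothetical and the recursive form is only one shape an implementation might choose; the point is that both produce the same result, which is all the "as if" rule looks at.

```cpp
#include <cassert>
#include <cstdint>

// Iterative form: what the source code says.
constexpr std::uint64_t sum_iterative(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Recursive form with an accumulator: one way the same loop could be
// expressed in a recursion-only target such as pure Lisp.
constexpr std::uint64_t sum_recursive(std::uint64_t n, std::uint64_t acc = 0) {
    return n == 0 ? acc : sum_recursive(n - 1, acc + n);
}

// Both forms yield the same value, checked here at compile time.
static_assert(sum_iterative(100) == sum_recursive(100),
              "iterative and recursive forms must agree");

// Run-time version of the same check (hypothetical helper).
inline void check_equivalence(std::uint64_t n) {
    assert(sum_iterative(n) == sum_recursive(n));
}
```

Whether the recursive form needs more stack at run time is exactly the execution-environment question discussed in the rest of this thread.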

> time) than iterative ones, so the source code may have well defined
> behavior in the iterative form, but run into UB in the recursive form
> due to resource exhaustion.

I know some people think that running out of stack space (as easily
might happen in a deeply recursive function) is necessarily
undefined behavior. That's not right. Running out of stack space,
whether because of recursion or otherwise, is a property of the
execution environment, and the implementation has no control over
that. Indeed the implementation might not even be able to discover
if it's about to happen. These things do not affect whether a
program has undefined behavior, which is determined solely by what
is specified (or not) to happen in the abstract machine.

I explained this stuff in more detail, sometime last year, in several
postings in this newsgroup. If you would like more explanation, I
can dig around and see if I can find some of that commentary, to
help answer further questions. Of course I reserve the right to
answer further questions directly, without making any reference to
my previous comments. ;)

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jul 09 12:59AM +0200

On 08.07.2020 17:23, Tim Rentsch wrote:
> These things [like stack usage] do not affect whether a
> program has undefined behavior, which is determined solely by what
> is specified (or not) to happen in the abstract machine.

No. UB is not solely a static property of a program. It's also a dynamic
property, such as exceeding a buffer size, or an implementation limit.
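
A small sketch of what "dynamic" means here, assuming a hypothetical four-element buffer and hypothetical function names. The same source line is well defined or undefined depending only on a run-time value, while `at()` gives the out-of-range case defined behavior.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical four-element buffer used only for illustration.
const std::array<int, 4> sample_buffer = {{10, 20, 30, 40}};

// Well defined only when index < buffer.size(); otherwise the behavior is
// undefined, and that depends purely on the run-time value of `index`.
inline int read_unchecked(const std::array<int, 4>& buffer, std::size_t index) {
    assert(index < buffer.size());  // documents the precondition in debug builds
    return buffer[index];
}

// at() gives the out-of-range case defined behavior: it throws
// std::out_of_range, which is translated into a fallback value here.
inline int read_or(const std::array<int, 4>& buffer, std::size_t index, int fallback) {
    try {
        return buffer.at(index);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```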

I seem to have conceded your POV that a C or C++ compiler /can/
translate to recursive function implementation. But only by then
supposing that a compiler /can/ introduce likely dynamic UB, or at the
very least remove a conditional guarantee of well defined operation. So
this is now the point where we differ.

In my view having standards that allow that is very ungood. But it's a
thorny problem. The "conditional" I mention is because there are a
zillion possible dynamic UB sources, such as a `bool` variable: the
compiler is free to willy-nilly decide that in this particular
compilation `sizeof(bool)` is, say, 2M, and furthermore to know that its
implementation limit of stack size is less, then prove to itself that
hence the `main`, which here happens to have a `bool` temporary, would
incur UB, hence that any behavior added to `main` would be fine.

It's the anything goes.

And again, that happens because an informal, practically oriented
standard is treated as a precise formal work, which it clearly isn't.

- Alf

who's at fault, me or compiler?

boltar@nowhere.co.uk: Jul 08 09:05AM

On Tue, 7 Jul 2020 15:15:41 +0000 (UTC)
>integers and do effectively a memset() on that memory block
>to zero it. Any code that accesses the members just use
>offsets from the beginning of that memory block.

I was using "stored" in a rather liberal sense. I didn't mean it had to be
some kind of lookup table but if you have

struct mystruct
{
    int i;
    char c;
    short s[5];
    int j;
};

then the binary needs to be aware that (assuming no padding, a 4-byte int and
a 2-byte short) s is 5 bytes away from i in memory and j is 15 bytes away,
however that awareness is stored internally.
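
A sketch of how those offsets can be inspected, using the standard `offsetof` macro. The helper `read_j_via_offset` and the `offset_*` names are hypothetical. The exact values are implementation-defined; on common ABIs with a 4-byte int and a 2-byte short, alignment padding typically puts s at offset 6 and j at offset 16, so the checks below assert only the ordering that must hold everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct mystruct {
    int i;
    char c;
    short s[5];
    int j;
};

// offsetof yields compile-time constants; constants like these are what
// end up embedded in the instructions that access each member.
constexpr std::size_t offset_c = offsetof(mystruct, c);
constexpr std::size_t offset_s = offsetof(mystruct, s);
constexpr std::size_t offset_j = offsetof(mystruct, j);

static_assert(offsetof(mystruct, i) == 0, "first member sits at offset 0");
static_assert(offset_c >= sizeof(int), "c comes after i");
static_assert(offset_s >= offset_c + sizeof(char), "s comes after c");
static_assert(offset_j >= offset_s + sizeof(short[5]), "j comes after s");

// Reads member j the way compiled code effectively does it:
// base address plus a constant offset.
inline int read_j_via_offset(const mystruct& object) {
    int value;
    std::memcpy(&value,
                reinterpret_cast<const char*>(&object) + offset_j,
                sizeof value);
    return value;
}
```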

boltar@nowhere.co.uk: Jul 08 09:05AM

On Tue, 7 Jul 2020 20:38:20 +0200
>> The class memory layout has to be stored in some form somewhere in the binary

>> otherwise on the fly objects could never be created.

>Yes, they can. It is called construction.

Whoooosh....

boltar@nowhere.co.uk: Jul 08 09:07AM

On Wed, 8 Jul 2020 00:16:21 +0300
>time and to create objects from them "on the fly". The compiled code
>accesses object members at fixed offsets which are hardcoded in the code
>and not looked up anywhere at run time.

And the difference between memory layout being stored and hardcoded offsets
is.... what exactly?

Juha Nieminen <nospam@thanks.invalid>: Jul 08 09:25AM

>>and not looked up anywhere at run time.

> And the difference between memory layout being stored and hardcoded offsets
> is.... what exactly?

"otherwise on the fly objects could never be created"

would imply that classes can only be instantiated if their exact internal
structure is known, else it's impossible.

As mentioned, that's not necessarily the case. For example, if the class
consists only of integer variables that are all zero-initialized or
default-initialized, it can be instantiated knowing only the size of
the class, with no knowledge of its internal structure. (If it's
completely zero-initialized, then what amounts to a single memset()
call can be used to initialize it.)
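
A minimal sketch of that idea, assuming a hypothetical all-integer class `Counters`. The function that does the initialization receives nothing but a byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical class consisting only of integer members.
struct Counters {
    int a;
    int b;
    int c;
    long d;
};

// memset-style initialization is only appropriate for trivially copyable types.
static_assert(std::is_trivially_copyable<Counters>::value,
              "Counters must be trivially copyable");

// Knows only a byte count, nothing about the members or their offsets.
inline void zero_bytes(void* storage, std::size_t size) {
    assert(storage != nullptr);
    std::memset(storage, 0, size);
}

// "Instantiates" a zeroed Counters using nothing but sizeof.
inline Counters make_zeroed() {
    Counters counters;
    zero_bytes(&counters, sizeof counters);
    return counters;
}
```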

No code in the program might access every single member variable of that
class, which means that only some of the offsets will be stored in code,
not all of them. This means that even by the loosest possible definition
of "storing the memory layout of the class", it would only have been
"stored" partially, not fully, yet that doesn't make it impossible to
instantiate.

For example, maybe the class has 20 integer member variables, but this
particular program only accesses the first one of them, ignoring the
rest. This means that, effectively, the program only accesses the
very first value in the class. None of the rest of the internal
structure of the class is stored anywhere in the code, in any way.
Yet that doesn't make it impossible to instantiate the class.

Paavo Helde <eesnimi@osa.pri.ee>: Jul 08 03:28PM +0300

>> and not looked up anywhere at run time.

> And the difference between memory layout being stored and hardcoded offsets
> is.... what exactly?

The difference is in how easy it is to access that information. For
starters, write a function which will take a pointer to any class object
and print out its class definition. This can be done quite easily in
some other languages.
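
To illustrate how little is available at run time, here is a sketch using `typeid`, assuming hypothetical polymorphic classes `Shape` and `Circle`. Standard C++ lets you compare dynamic types and obtain an implementation-defined name string, but `std::type_info` exposes no list of members or offsets.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic hierarchy.
struct Shape {
    virtual ~Shape() = default;
};

struct Circle : Shape {
    double radius = 1.0;
};

// The dynamic type can be compared at run time...
inline bool same_dynamic_type(const Shape& a, const Shape& b) {
    return typeid(a) == typeid(b);
}

// ...and an implementation-defined name can be obtained, but nothing in
// std::type_info describes the members of the class.
inline const char* type_label(const Shape& object) {
    return typeid(object).name();
}
```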

boltar@nowhere.co.uk: Jul 08 03:01PM

On Wed, 8 Jul 2020 09:25:19 +0000 (UTC)

>"otherwise on the fly objects could never be created"

>would imply that classes can only be instantiated if their exact internal
>structure is known, else it's impossible.

It doesn't imply anything of the sort. However, the binary needs some
representation of the class, and if not stripped it'll also contain the
function and attribute names, otherwise debuggers wouldn't work.

boltar@nowhere.co.uk: Jul 08 03:03PM

On Wed, 8 Jul 2020 15:28:46 +0300
>starters, write a function which will take a pointer to any class object
>and print out its class definition. This can be done quite easily in
>some other languages.

In interpreted languages like python anything is possible because their
object format isn't constrained by the OS executable format but if the
language is compiled down to a binary there are limitations on what can be
stored in the binary unless the OS supports it. Eg MacOS supports a large
amount of exe metadata.

Observable end padding in arrays

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jul 07 04:53PM -0700

>> language. I'm not aware of anything in the C++ core language that's
>> defined by the C standard.

> Minimum ranges of integer types.

Fair enough -- though that's also an example of inheriting the contents
of <limits.h> from the C library.

> For the C++ standard it's explicitly stated in normative text, but
> implicitly referring to some other unspecified text via "this
> implies".

Right, and I'm trying to figure out what the "this implies" refers to.

> fix this in C++11 but as I recall they didn't manage to clear it up
> completely, although that would not be hard to do -- which IMHO on its
> own brings the competence of the committee into question.

Hmm. C++17 8.3.1 [expr.unary.op]:
    The unary * operator performs indirection: the expression to which
    it is applied shall be a pointer to an object type, or a pointer to
    a function type and the result is an lvalue referring to the object
    or function to which the expression points.
C++11 has similar or identical wording. This defines the behavior when
there is such an object or function. By failing to define the behavior
when there is no such object or function, it leaves the behavior
undefined by omission. A note under the definition of undefined
behavior (3.27 [defns.undefined]):
    Undefined behavior may be expected when this document omits any
    explicit definition of behavior or when a program uses an erroneous
    construct or erroneous data.

I wouldn't mind if the standard were a bit more explicit about
dereferencing a null pointer (or any other pointer that doesn't
point to an object or function), but it seems unambiguous as it is.
My only complaint might be that the wording assumes that the object
or function exists rather than saying what happens *if* it exists.

[snip]

My guess is that the authors of the standard thought it was so obvious
that arrays can't have padding at the end that they didn't bother to
state it.
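
A short sketch of what "no padding at the end of an array" means in terms of sizeof. The struct `Padded` and the helper name are hypothetical. Any padding lives inside each element (and is counted in its sizeof), so the array itself is exactly N elements wide.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct that typically carries trailing padding of its own.
struct Padded {
    int i;
    char c;
};

// True when an array of N objects of type T occupies exactly N * sizeof(T)
// bytes, i.e. the array adds no padding beyond what each element contains.
template <typename T, std::size_t N>
constexpr bool array_is_tightly_packed() {
    return sizeof(T[N]) == N * sizeof(T);
}

static_assert(array_is_tightly_packed<Padded, 7>(),
              "array of structs adds no padding of its own");
static_assert(array_is_tightly_packed<char, 3>(),
              "array of char adds no padding of its own");
```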

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Manfred <noname@add.invalid>: Jul 08 01:43PM +0200

On 7/7/2020 9:29 PM, James Kuyper wrote:
> unnamed padding within a structure object, but not at its beginning."
> (6.7.2.1p15) and "There may be unnamed padding at the end of a structure
> or union." (6.7.2.1p17). There are no such statements for arrays.

Which is "some other statement in the C standard" like I mentioned in my
previous post.

> padding is permitted between the members of a struct and at the end
> implies that such padding is not permitted for arrays, for which no such
> exceptions have been specified.

I understand your point, however this would assume a very high level of
self-consistency of the standard, a level that I am not 100% confident I
can acknowledge. That said, I tend to consider the C standard somewhat
more solid than the C++ one.

Somehow related to your position (an exception implies a rule even if
the rule is missing), the approach of the C++ standard according to
which "Undefined behavior may be expected when this document omits any
explicit definition of behavior..." puts a lot of responsibility on the
standard committee, probably too much given how controversial the UB
topic has become.
Especially given the meaning of UB when it comes to compilers: it is a
very well-defined Bad Thing™.

James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 08 10:37AM -0400

On 7/8/20 7:43 AM, Manfred wrote:
>>> On 7/7/2020 3:57 PM, James Kuyper wrote:
>>>> On 7/6/20 11:14 PM, Alf P. Steinbach wrote:
>>>>> On 07.07.2020 04:21, Tim Rentsch wrote:
...
>> or union." (6.7.2.1p17). There are no such statements for arrays.

> Which is "some other statement in the C standard" like I mentioned in my
> previous post.

Yes, but your wording "this may well follow from" implied uncertainty
about the existence of that other statement, uncertainty which I hope
I've removed.

> more solid than the C++ one.

> Somehow related to your position (an exception implies a rule even if
> the rule is missing), ...

While that is true in general, the rule should not actually be missing
in this case - if it were, that would constitute a defect in the
standard. I argue that the rule is actually present as an implication of
the "An array type describes ..." clause, for precisely the reasons I
gave in my original post. It's merely inobvious and correspondingly
debatable. The existence of an exception provides evidence in support of
my side in that debate; but if I were wrong in my interpretation of the
"describes" clause, then the exception would not make me right.

Link with library of exact filename (i.e. exact version)

Frederick Gotham <cauldwell.thomas@gmail.com>: Jul 08 05:32AM -0700

On Tuesday, July 7, 2020 at 6:57:16 PM UTC+1, Manfred wrote:

> Usually the GNU linker puts the SONAME of the shared library in the
> executable, and at runtime the dynamic linker loads the shared library
> by SO_NAME rather than by filename (see ld -soname).

I haven't investigated this fully, but I think you're right, Manfred. I used the program "patchelf" to change the "soname" inside my ".so" file, and now when I link with it, the resultant executable file is dependent upon the full filename.

Alternatively, and in similar fashion to "patchelf", I could have just altered 5 - 10 bytes in the resultant binary to get it to look for a file with a longer filename.

By the way, does anyone know if these two lines do exactly the same thing on Linux?

g++ -o prog main.cpp -L./ -l:libmonkey.so

g++ -o prog main.cpp ./libmonkey.so

From what I can see so far, these two commands are identical in effect.

Manfred <noname@add.invalid>: Jul 08 03:02PM +0200

On 7/8/2020 2:32 PM, Frederick Gotham wrote:
>> executable, and at runtime the dynamic linker loads the shared library
>> by SO_NAME rather than by filename (see ld -soname).

> I haven't investigated into this fully but I think you're right, Manfred. I used the program "patchelf" to change the "soname" inside my ".so" file, and now when I link with it, the resultant executable file is dependent upon the full filename.

If you are compiling and linking the .so yourself, then you'd probably
better use the "-soname" ld option rather than patching the binary
afterwards - if it is a 3rd party project, it is probably a good idea to
suggest this to the maintainer.

> g++ -o prog main.cpp -L./ -l:libmonkey.so

> g++ -o prog main.cpp ./libmonkey.so

> From what I can see so far, these two commands are identical in effect.

I would guess so. In any case, the documentation of both gcc and ld is
usually pretty accurate, so if you want to be sure, better check
"info gcc" and "info ld" (or man gcc, man ld).
If you find some inconsistency, usually bug reports do get consideration
on these projects - note that ld is part of binutils.
(BTW I think there is one between the manpages of ld and ld.so on this
very topic)

Frederick Gotham <cauldwell.thomas@gmail.com>: Jul 08 06:37AM -0700

On Wednesday, July 8, 2020 at 2:03:11 PM UTC+1, Manfred wrote:

> better use the "-soname" ld option rather than patching the binary
> afterwards - if it is a 3rd party project, it is probably a good idea to
> suggest this to the maintainer.

We contact the maintainer at most once a week and only when absolutely necessary. Furthermore the maintainer didn't intend for us to use multiple versions of the same library, so they probably like the "soname" that they currently are using.

As "patchelf" seems to work fine for my needs, this is the best option.

James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 08 10:24AM -0400

On 7/8/20 9:02 AM, Manfred wrote:
> On 7/8/2020 2:32 PM, Frederick Gotham wrote:
...
> on these projects - note that ld is part of binutils.
> (BTW I think there is one between the manpages of ld and ld.so on this
> very topic)

-L./ adds ./ to the list of locations that are searched for libraries.
-l:libmonkey.so tells it to look in the current list of locations for a
library named libmonkey.so.

Using ./libmonkey.so searches only for ./libmonkey.so

These two commands can do different things (depending upon the context)
for a couple of reasons:
A. The -L option affects all subsequent library searches, not just this one.
B. The -l option causes other places to be searched for the library, not
just ./

Fixing some undefined behavior

woodbrian77@gmail.com: Jul 07 05:55PM -0700

On Tuesday, July 7, 2020 at 12:36:31 PM UTC-5, Mr Flibble wrote:
> On 07/07/2020 07:37, Keith Thompson wrote:

> > Any of them.
> There is only one homophobic misogynistic religious bigot in this thread. He is probably a racist Trump supporter too.

Not me. And I'd be a Trump supporter if I had to
choose between him and Biden. Thankfully there are
other options.

Brian

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jul 07 05:56PM -0700

> Not me. And I'd be a Trump supporter if I had to
> choose between him and Biden. Thankfully there are
> other options.

Holy Moly! One vs the Other. Well, that's fair. ;^)

David Brown <david.brown@hesbynett.no>: Jul 08 09:41AM +0200

On 08/07/2020 02:56, Chris M. Thomasson wrote:
>> choose between him and Biden. Thankfully there are
>> other options.

> Holy Moly! One vs the Other. Well, thats fair. ;^)

Again - /please/ stop feeding the trolls - both of them. (And you are
quickly joining that category.)

Both Brian and Mr. Flibble are capable of making sensible, on-topic
posts and contributing to the group. When they do that, by all means
join in.

But when they get together, they both write things that are clearly and
intentionally provocative, offensive, and non-productive. When they do
that, they are trolling. Do not encourage it.

I am, more than some regulars, quite happy with the occasional off-topic
thread in a technical group. But it must be /occasional/, interesting,
informative, enjoyed by many, and in its own thread that does not spoil
a technical thread. This thread does not fit on any count.

If Brian and Mr. Flibble want to fight, let them do it in private - both
their email addresses are accessible.

And if you want to respond to this (other than by simply not posting
more encouragement to such threads), you have my email address.


Wednesday, July 8, 2020

Digest for comp.lang.c++@googlegroups.com - 19 updates in 5 topics
