Monday, May 27, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 1 topic

Hans Bos <hans.bos@xelion.nl>: May 22 07:24PM +0200

Op 22-5-2019 om 14:23 schreef Paavo Helde:
>> *always* outputs the exact amount of digits to represent the value
>> accurately.
 
> Yes, hexfloat is defined as an exact representation.
 
But there is no guarantee that the radix is 2.
Suppose my system has doubles with radix 10.
What, in that case, is the exact hex representation of 0.1?
James Kuyper <jameskuyper@alumni.caltech.edu>: May 22 08:40AM -0400

On 5/22/19 2:40 AM, Juha Nieminen wrote:
...
> How would you know, using standard C/C++, how many digits do you need to
> output in order to ensure no loss of bits when reading the value back?
 
#include <float.h>, and look at the value of FLT_DECIMAL_DIG,
DBL_DECIMAL_DIG, or LDBL_DECIMAL_DIG, as appropriate. Paavo has already
given you the modern C++ equivalent, but this will also work with C and
with older versions of C++.
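 
For example (a minimal sketch; DBL_DECIMAL_DIG is in <cfloat> as of
C++17, or <float.h> as of C11):
 
#include <cfloat>
#include <iomanip>
#include <iostream>
#include <sstream>
 
int main() {
    double d = 0.1;
 
    std::ostringstream out;
    out << std::setprecision(DBL_DECIMAL_DIG) << d;  // 17 digits for IEEE double
 
    double d2;
    std::istringstream in(out.str());
    in >> d2;
 
    std::cout << out.str()
              << (d == d2 ? " round-trips\n" : " does not round-trip\n");
}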
 
 
...
> It is my understanding that hexadecimal floating point representation
> *always* outputs the exact amount of digits to represent the value
> accurately.
 
It would be more accurate to say that this is true by default. If you
specify a particular length, it will obey your specification, whether or
not you specify enough digits to meet that requirement.
Paavo Helde <myfirstname@osa.pri.ee>: May 22 04:42PM +0300

On 22.05.2019 16:10, Bart wrote:
 
> Here's one very basic example: if A is signed, and B unsigned, then my
> language says that A+B is performed as signed, with overflow
> well-defined, and at at least 64 bits.
 
I don't question your design, I'm just curious: what would be the use
case of 64-bit signed wrapover? I.e. in what situation is it useful to have
 
9223372036854775807 + 1 == -9223372036854775808
 
For unsigned wrapover in C and C++ there is at least a use case for
emulating hardware bit registers, or for having an automatic reset for
some generated ID numbers.
 
 
> extra ones.
 
> With + it doesn't matter, but what about * or /? And we've only just
> tried to translate A+B!
 
Looks like this should be translated to C++, not to C, with appropriate
C++ number-like classes and custom arithmetic operators.
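 
For example, a sketch of the sort of thing I mean (the name wrap64 and
the semantics are just for illustration, not Bart's actual design): a
64-bit signed type whose addition wraps, implemented on top of unsigned
arithmetic, which is defined to wrap:
 
#include <cstdint>
#include <iostream>
 
struct wrap64 {
    std::int64_t v;
 
    friend wrap64 operator+(wrap64 a, wrap64 b) {
        // Unsigned addition wraps modulo 2^64; converting back to
        // int64_t is implementation-defined before C++20 and modular
        // (two's complement) from C++20 on.
        return { static_cast<std::int64_t>(
            static_cast<std::uint64_t>(a.v) + static_cast<std::uint64_t>(b.v)) };
    }
};
 
int main() {
    wrap64 a{INT64_MAX}, b{1};
    std::cout << (a + b).v << "\n";  // -9223372036854775808 on a two's complement target
}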
 
 
> So it is easy to see that it can be a considerably bigger pain to
> generate perfectly correct C, than to generate ASM.
 
I personally have found almost *everything* a bigger pain in C than,
say, C++.
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 09:17PM +0100

> (7.1) -- the glvalue is used to access the object, [...]
 
> Taken together I think these passages make the case pretty
> airtight.
 
Ah, thank you so much. I hope this did not take too long. I thought it
would be the result of a number of passages, but it never occurred to me
to look at lifetime.
 
--
Ben.
scott@slp53.sl.home (Scott Lurndal): May 22 06:19PM


>> Actually at least one major processor vendor has been thinking about
>> changing this in the future.....
 
>Details, please?
 
Unfortunately they're not public.
 
However, there are public projects that propose new pointer
formats, for example:
 
https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-faq.html
Paavo Helde <myfirstname@osa.pri.ee>: May 22 08:52PM +0300

On 22.05.2019 20:35, Bart wrote:
 
> common sense compilers.
 
> Is /that/ is supposed to be preferable?
 
> How on earth could an clear 0 to 9 loop turn into an endless loop?
 
Just to be clear: I do not advocate having UB. I advocate producing an
error if the operation cannot be completed as intended. The ideal output
would be:
 
2147483644
2147483645
2147483646
2147483647
Program terminated because of uncaught exception: numeric overflow in
'a++;' with a=2147483647.
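 
For illustration only (a sketch with a made-up checked_inc helper, not a
proposal for how a compiler should implement it):
 
#include <climits>
#include <iostream>
#include <stdexcept>
#include <string>
 
int checked_inc(int a) {
    if (a == INT_MAX)
        throw std::overflow_error("numeric overflow in 'a++;' with a=" +
                                  std::to_string(a));
    return a + 1;  // cannot overflow now
}
 
int main() {
    for (int a = 2147483644; ; a = checked_inc(a))
        std::cout << a << "\n";  // prints four values, then terminates
}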
David Brown <david.brown@hesbynett.no>: May 22 12:18AM +0200

On 21/05/2019 21:53, Scott Lurndal wrote:
 
> Actually at least one major processor vendor has been thinking about
> changing this in the future.....
 
> And it certainly wasn't true in the past.
 
It is not true at the moment either. There are more processors around
than just x86 and ARM. (I know you, Scott, know this - I am expanding
on your post, not correcting it.)
 
And of course, the size of pointers has absolutely /nothing/ to do with
the undefined nature of trying to access an object through a pointer to
a different type.
Ben Bacarisse <ben.usenet@bsb.me.uk>: May 20 09:21PM +0100


>> I consider that to be allowed. There's no way that C can't specify the
>> result, so this is as "defined" a construct as it can.
 
> Did you mean "there's no way that C /can/ specify the result" ?
 
<sigh> Yes I did. Far too many of my typos negate my meaning.
 
> If so, then that is not quite true - C99 (and C11) specify it better,
> even though the final result is still implementation dependent.
 
Sure, they are clearer in a footnote (I take it you refer to the
"reinterpret the bits" footnote). Given the context (the late 80s) I
don't think anyone was ever in much doubt about what the implementation
defined result would be. Those were simpler times!
 
<cut>
--
Ben.
Keith Thompson <kst-u@mib.org>: May 22 10:41AM -0700

>>every type and always will be.
 
> Actually at least one major processor vendor has been thinking about
> changing this in the future.....
 
Details, please?
 
> And it certainly wasn't true in the past.
 
True.
 
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
scott@slp53.sl.home (Scott Lurndal): May 22 06:09PM


>>It is not true at the moment either. There are more processors around
 
>So which architectures have a variable number of memory addressing bits
>depending on what C type is stored at the address then?
 
More often than not, it is function pointers that are a different size than
pointers to other objects; but it depends entirely on the architecture.
 
Some (now basically extinct) Burroughs systems, for example, have 80-bit
function pointers but 32-bit data pointers. That is not uncommon for
segmented architectures (a la 80286).
 
I can't address future processor vendor work in this area without
violating non-disclosure agreements, unfortunately.
Juha Nieminen <nospam@thanks.invalid>: May 22 06:40AM

> floating-point values, but AFAIK these bugs got fixed in the C runtime
> libraries about 10-20 years ago or so. Plus there are libraries which
> ensure the minimum number of decimal digits for perfect round-trip.
 
How would you know, using standard C/C++, how many digits you need to
output in order to ensure no loss of bits when reading the value back?
(And this is assuming that the C or C++ standard library being used has
been implemented such that, given enough decimal digits, they will be
rounded in the correct direction so as to restore the original value
exactly.)
 
It is my understanding that hexadecimal floating point representation
*always* outputs the exact amount of digits to represent the value
accurately.
Bonita Montero <Bonita.Montero@gmail.com>: May 22 07:20AM +0200


> That depends on what you mean by "syntactic sugar". static_cast
> will carry out pointer adjustment so it can be used to navigate
> an inheritance graph correctly.
 
"syntactic sugar" was related to the case above.
Ian Collins <ian-news@hotmail.com>: May 22 04:33PM +1200

On 22/05/2019 12:07, Bart wrote:
>> bugs nonetheless.
 
> You can call a unwillingness to expend a huge, disproportionate effort
> in overcoming C's many shortcomings for this purpose a bug if you like.
 
That's never stopped you expending a huge, disproportionate effort in
whinging about C. That time would have easily been enough to fix your code.
 
 
> g++ (no options):
 
> t.c:66:5: error: invalid conversion from 'void (*)()' to 'void*'
> [-fpermissive]
 
The conversion is "conditionally-supported" in C++>=11 which makes it a
"program construct that an implementation is not required to support".
 
Thus:
$ clang++ -std=c++98 -Wall -Werror -Wextra -pedantic /tmp/x.cc
 
/tmp/x.cc:6:14: error: cast between pointer-to-function and
pointer-to-object is an extension [-Werror,-Wpedantic]
t_fnptr = (void*)(&puts);
^~~~~~~~~~~~~~
1 error generated.
 
$ clang++ -std=c++11 -Wall -Werror -Wextra -pedantic /tmp/x.cc
$
 
--
Ian.
Ian Collins <ian-news@hotmail.com>: May 23 07:40AM +1200

On 23/05/2019 01:10, Bart wrote:
 
> Here's one very basic example: if A is signed, and B unsigned, then my
> language says that A+B is performed as signed, with overflow
> well-defined, and at at least 64 bits.
 
So use specialised C++ number classes. There are many things that can't
easily be expressed in C that are simple to do in C++.
 
--
Ian.
David Brown <david.brown@hesbynett.no>: May 22 12:13AM +0200

On 21/05/2019 20:56, Bart wrote:
 
>> Seems like a bug in the auto-translator.
 
> Just a mismatch of languages, even though C in this case superficially
> works the same way.
 
In other words, bugs in the auto-translator. The flaws are in the
design and specification, rather than the implementation, but they are
bugs nonetheless.
 
In order to translate code from one language to another, you need to
understand both languages. And you need to generate correct and valid
code - not something that looks a bit like what you would have liked the
target language to be.
 
 
> C as an intermediate language, even though it is very frequently used
> for that purpose, leaves a lot to be desired.
 
It is fine as a target language, but you need to generate correct C
code. (Alternatively, you need to generate "C for this compiler, this
target and these options" code - and be honest about it. That is a
perfectly reasonable solution, and the one used by most code generators.)
 
Other people who write code generators or translators that produce C
manage it. And when their generated code has flaws, they blame their
generators - not the language.
 
 
> The ones I had in mind were x64 and ARM64. I think I decided it would be
> simpler to target those two than to try and generate C code which would
> always compile warning-free and UB-free.
 
You understand how assembly works. You are willing to use features of
assemblers. You don't understand how C works. You are unwilling to use
many features of the language. It is not surprising that you find
generating assembly easier than generating C code.
 
> Are there are any others I'm likely to be able to program in consumer
> equipment?
 
Since your languages and tools are for you alone, it is up to you to
answer that one.
 
> language designed to work with every conceivable architecture, past,
> present and future, and which therefore have to designate as UB,
> behaviour which cannot be guaranteed to work across all of them.
 
I agree that code rarely has to be very portable. Of course, I disagree
about your characterisation of UB - in particular, it does not make
sense to suggest that code with undefined behaviour could work at all.
By the meaning of the words, code with undefined behaviour does not have
any definition of what it is supposed to do, and therefore cannot be
considered to "work". At best, you mean the code should do what it
looks like you think it should do. That might be okay to a human
reader, but computers are fussy about definitions.
 
 
> As far as I'm concerned, any function pointer can be stored within the
> same space as a void* pointer on all targets I want this to run on. It
> should be a non-issue.)
 
C and C++ do not share your opinion - and you are asking the compiler to
treat your code as (approximately) standard C or C++. However, gcc (and
all other serious compilers) give you a lot of flexibility about
choosing warnings and other options, precisely to let you tune the
details of the language you want. If you want to generate code that
only works on platforms where you can store a function pointer in a
void* pointer (though I can't imagine why it would be useful), you can
tune your options to suit. Perhaps try with "-fpermissive" ?
Juha Nieminen <nospam@thanks.invalid>: May 21 01:52PM

> I didn't even know hexfloat existed. It seems a spectacularly useless
> manipulator. What on earth is the point of it?
 
If you save a floating point value in ASCII using the normal decimal
representation, in many (perhaps even most) cases there's a high
chance of losing accuracy when it's read back, for the simple
reason that a base-10 representation with a limited number of digits
cannot accurately represent every single base-2 floating point value.
 
Base-16 representation, however, can. It exactly represents the
original floating point value, to the last bit, and nothing is
lost in the conversion to either direction.
 
Its advantage is that it's agnostic to the actual binary representation
of the floating point value in the hardware (e.g. it doesn't assume
that it's an IEEE floating point value of a given size). Thus exact
floating point values can be transferred between computers that may
use different native floating point formats.
 
If you save the floating point bits as raw data, you'll at the
very least run into the problem of endianness, and of course you'll
be assuming that both the source and target architectures use the
exact same internal floating point bit representation.
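 
A quick sketch of the round trip, using the C library's "%a" conversion
(the same hexadecimal form that std::hexfloat produces) and strtod,
which accepts hexadecimal floating point since C99:
 
#include <cstdio>
#include <cstdlib>
 
int main() {
    double d = 0.1;
 
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", d);  // e.g. "0x1.999999999999ap-4"
 
    double d2 = std::strtod(buf, nullptr);    // parse it back
    std::printf("%s -> %s\n", buf, d == d2 ? "exact" : "not exact");
}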
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 21 08:47PM +0100

On Tue, 21 May 2019 19:28:17 +0000 (UTC)
> >-fno-strict-aliasing switch is applied or not.
 
> And someone else explained what happened which was nothing to do with type
> punning.
 
Nonsense - see below.
 
> every type and always will be.
 
> However, feel free to provide a proper example where it fails or you can just
> keep on farting out indignant hot air. You choice.
 
No, still wrong I am afraid. I repeat: "The 'it' which doesn't work is
type punning through casting pointers." It doesn't work because it
doesn't work, end of story. Your bluster that when the standard says
incorrect aliasing gives rise to undefined behaviour it really means
that it works fine in your toy programs, is crap.
 
And the example to which I referred was corrected (20 May 2019 21:49:30
+0200) and did demonstrate the strict aliasing issue.
 
If you want another example, there is one in the posting of another
person:
 
#include <iostream>
 
int foo( float *f, int *i ) {
    *i = 1;
    *f = 0.f;
 
    return *i;
}
 
int main() {
    int x = 0;
 
    std::cout << x << "\n"; // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x);
    std::cout << x << "\n"; // Expect 0?
}
 
You will find many similar examples in articles on the internet about
strict aliasing.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: May 21 08:01PM +0200

On 21.05.2019 18:20, Paavo Helde wrote:
> *fp = 1.0f;
 
> and one should use placement new instead:
 
> new (p) float {1.0f} ;
 
The initialization of `fp` wouldn't compile as C++, but let's assume a
`static_cast` or `reinterpret_cast` there. I favor the latter since it
communicates better to the reader, but I believe, for reasons having to
do with a shortcoming of C++03, Herb Sutter and Andrei Alexandrescu
recommended using `static_cast` in their old coding guidelines book.
 
 
> > Now that I find hard to crasp. If this is true, how is it even possible
> to write e.g. a custom memory allocator?
 
Magic is indeed performed in a `new` expression: it transforms a `void*`
produced by an allocator function, to a typed pointer.
 
Another place this magic occurs, is in the member functions of a
`std::allocator`. At least in C++03. I'm not as up-to-date as I should
be to participate in C++ discussions.
 
Anyway, even placement `new` doesn't save one from UB when there is an
object other than bytes in that memory chunk, and one obtains a pointer
to it of an unrelated pointee type. Wham bang, you're formally dead.
 
On the other hand, when there is no object of type other than bytes,
then `reinterpret_cast` is technically good and so is placement `new`.
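 
A minimal sketch of that last case, with a raw, suitably aligned byte
buffer and nothing else living in it:
 
#include <new>
 
int main() {
    alignas(float) unsigned char buf[sizeof(float)];
 
    float* fp = new (buf) float{1.0f};  // placement new creates a float object here
    *fp = 2.0f;                         // fine: fp points at a live float
    // No destructor call needed: float is trivially destructible.
}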
 
 
> std::vector<float>::data())? Do I really need to perform a dummy
> placement new in the beginning of the memory block, to obtain a valid
> float* pointer?
 
Nah, just FUD.
 
 
Cheers!,
 
- Alf
David Brown <david.brown@hesbynett.no>: May 20 09:57PM +0200

On 20/05/2019 20:20, Tim Rentsch wrote:
> changed in C90, or changed between C90 and C99, such a change
> surely would have been mentioned in the Rationale documents.
> AFAICT there isn't any.
 
I don't think that all the changes in C99 are covered in the rationale
documents (at least, not that I have seen). However, I am happy to
believe that the intended behaviour for unions has not changed between
C90 and C99, and it is merely the wording that has been made clearer.
Keith Thompson <kst-u@mib.org>: May 22 10:15AM -0700


> You can already do that with standard hex:
 
> float f = 1.234;
> cout << hex << *((long *)&f) << endl;
 
That assumes that long and float have the same size (which is not
guaranteed) and that &f is correctly aligned for a long* (which is also
not guaranteed). If it happens to work, it prints the *representation*
of f, not its value (if long has no padding bits or trap representations).
 
If you want a safe way to print its representation, you can reinterpret
it as an array of unsigned char -- but the result is still not directly
usable on a system with a different representation for float.
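 
For example (a sketch; the byte order and width are whatever the
platform uses, so the output is not portable either):
 
#include <cstddef>
#include <cstdio>
 
int main() {
    float f = 1.234f;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&f);
 
    for (std::size_t i = 0; i < sizeof f; ++i)
        std::printf("%02x", p[i]);  // bytes of the object representation
    std::printf("\n");
}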
 
std::hexfloat gives you an unambiguous character sequence representing
the *value* of f.
float f = 1.234;
std::cout << std::hexfloat << f << std::endl;
The output I get is
0x1.3be76cp+0
which also happens to be a valid literal. It's exact if the
floating-point radix is two (or a power of two).
 
An exact decimal representation of that value is
1.2339999675750732421875 (on a typical system). You could print "1.234"
and it would *probably* yield the same value if converted back from a
string to a float.
 
Hexadecimal floating-point isn't as human-readable as decimal, of course
(for most humans). But it's unambiguous, and it means you don't have to
think about representation or about loss of precision when converting
back and forth between binary and decimal.
 
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
David Brown <david.brown@hesbynett.no>: May 22 01:22PM +0200

On 22/05/2019 02:07, Bart wrote:
>> bugs nonetheless.
 
> You can call a unwillingness to expend a huge, disproportionate effort
> in overcoming C's many shortcomings for this purpose a bug if you like.
 
You are happy to classify your wilful and determined ignorance of C as a
bug in yourself? Okay, I suppose.
 
Certainly the idea that this is all a "huge, disproportionate effort" is
your own personal problem. Undefined behaviours in C are mostly quite
clear and obvious, you rarely meet them in practice, and they are mostly
straightforward to handle. For a language generator, they are peanuts
to deal with. These have been explained to you countless times.
 
Of course, dealing with them nicely and efficiently involves macros and
the C preprocessor. But it is apparently far better to whine and moan
about deficiencies in C than to use the features of C to get what you need.
 
> not-quite-so-unorthogonal type system, which is 32-bit-based even when
> the final target is 64-bit, with its million and one quirks, and which
> doesn't quite match that of the target language.
 
C is not based on any hardware model - it is more abstract. Yes,
putting that in between the two layers that have matching models will
cause complications, and you will have to be careful to get it right.
But as abstract models go, C's is not difficult to comprehend.
 
 
>> It is fine as a target language, but you need to generate correct C code.
 
> Which means what? So that there are 0 errors and 0 warnings no matter
> what options somebody will apply?
 
No. It means that there are no errors in the code, based on whatever
restrictions you might want to place on how it is used. If you want to
generate fully portable C code (matching a particular standard), then do
so. If you want to generate code that has limitations on the compiler
or flags needed, then do so - but make sure that you document the
restrictions. Far and away the best choice here is to use conditional
compilation and compiler detection. For example, if you want to allow
casting between different pointer types to work for punning, and you
want wrapping overflow behaviour to match your source language, then try
something like this:
 
#ifdef __GNUC__
/* Set options needed by gcc and clang for desired C variant */
#pragma GCC optimize "-fno-strict-aliasing"
#pragma GCC optimize "-fwrapv"
#pragma GCC diagnostic ignored "-Wformat"
#elif defined(_MSC_VER)
/* Set options needed by MSVC for desired C variant */
#elif defined(_BART_C)
/* Bart's C compiler already supports Bart C */
#else
#error Untested compiler - remove this and compile at your own risk
#endif
