Sunday, June 28, 2015

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

"Öö Tiib" <ootiib@hot.ee>: Jun 27 04:42PM -0700

On Sunday, 28 June 2015 00:19:52 UTC+3, Mr Flibble wrote:
 
> Because the size and value range of 'int' can vary from platform to
> platform 'int' is inherently unsafe and non-portable. Use <cstdint>
> typedefs instead.
 
Use for what? Stop repeating those pointless round silver bullet dogmas.
 
The value range and byte size of the types from <cstddef> can and do vary
from platform to platform, but not in sync with 'int'. They are
therefore safer and more portable for typical usage.

If someone used 'int' instead of 'ptrdiff_t' or 'size_t' then it
is indeed a mild defect, but it cannot be repaired by using anything
from <cstdint>.
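
A minimal sketch of the distinction drawn above, assuming two made-up helpers (`count_zeros` and `distance_between` are illustrative names, not from the thread): sizes and indices use `std::size_t`, pointer differences use `std::ptrdiff_t`, and a fixed-width type from <cstdint> only pins down the width of the stored values.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <cstdint>   // std::int32_t
#include <vector>

// Hypothetical helper: counts zero elements. The count and the index are
// std::size_t because that is the type the container uses for sizes.
std::size_t count_zeros(const std::vector<std::int32_t>& v)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == 0) ++n;
    return n;
}

// Hypothetical helper: the difference of two pointers into the same array
// has the signed type std::ptrdiff_t, so it can be negative.
std::ptrdiff_t distance_between(const std::int32_t* first,
                                const std::int32_t* last)
{
    assert(first != nullptr && last != nullptr);
    return last - first;
}
```

Here `std::int32_t` fixes the element width, while the index and distance types follow the platform, which is the point made about <cstddef>.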
"Öö Tiib" <ootiib@hot.ee>: Jun 27 04:47PM -0700

> > format.
 
> There are a lot of games that use binary formats. Scientific
> and medical applications often use binary formats.
 
All programs that use sounds or images technically use binary formats,
but those are hidden from programmers far beneath some low-level API.
That is not what I meant.
BGB <cr88192@hotmail.com>: Jun 27 08:46PM -0500

On 6/27/2015 1:01 PM, Öö Tiib wrote:
> possible compile-time and failing to generate competitive binaries.
 
> The libraries that may survive that trend longer are in-house libraries
> but those are not general purpose anyway.
 
I wasn't speaking either for or against templates.
 
 
> implement what it does. 32kB, however, can be quite a lot of C++. Templates
> (that is what we mean by "generic" in C++) are a powerful tool on that
> level.
 
ok.
 
but, yeah, it depends a lot on the MCU.
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Jun 28 03:12AM +0100

On 27/06/2015 23:05, Jens Thoms Toerring wrote:
> makes you look a bit "strange" to those that have used
> 'int' safely and in a portable manner for a quarter of
> a century.
 
My claims are not unsubstantiated: see the MISRA C++ coding standard for
safety critical systems: it bans the use of 'int' BECAUSE it is
inherently non-portable and therefore unsafe. Use the typedefs from
<cstdint> instead; DO NOT use 'int'.
 
/Flibble
BGB <cr88192@hotmail.com>: Jun 27 09:21PM -0500

On 6/27/2015 4:17 PM, Öö Tiib wrote:
> a value is in range 0..1000 then why not use only 10 bits for it?
> Well-made bit-packing can be faster than 'memcpy' on modern platforms
> thanks to fewer cache misses.
 
bit-streams and variable-length numbers can also be useful, but are more
painful to work with.
 
bit-packing and byte-oriented formats can also do pretty well if done
well. though, yes, the code will need to take care of things like endianness.
 
sometimes there are tradeoffs, like one format which could have
technically been a little more space-efficient with differentially-coded
variable-width numbers for a table, but then needing to decode the table
prior to being able to use it. I instead used a design where the field
was 16/24/32 bits wide (for the whole table) depending on what was
needed for the target-index, but the fixed-width entries allow for
random access to the table.
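
A rough sketch of the fixed-width table idea described above. The function name `read_entry` and the little-endian byte order inside each entry are assumptions made for illustration; the post does not specify either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads entry `index` from a table whose entries are all `width` bytes wide
// (2, 3 or 4 bytes for 16/24/32-bit fields). Byte order within an entry is
// assumed to be little-endian in this sketch.
std::uint32_t read_entry(const std::vector<std::uint8_t>& table,
                         std::size_t width, std::size_t index)
{
    assert(width >= 2 && width <= 4);
    assert((index + 1) * width <= table.size());
    // Random access: the offset is computed directly, no earlier entry
    // has to be decoded first.
    const std::size_t offset = index * width;
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < width; ++i)
        value |= static_cast<std::uint32_t>(table[offset + i]) << (8 * i);
    return value;
}
```

With differentially coded variable-width numbers the same lookup would need a scan from the start of the table, which is the tradeoff the post describes.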
 
this sort of thing has in a few cases pointed towards arguably
old/archaic solutions, like using codepages (particularly 1252 and 437),
because fixed-width characters are cheaper for random
access than UTF-8, less expensive than UTF-16, and sufficient
in the vast majority of cases (and are the cheapest option in cases when
they are an option). note that this would be done per-string.
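
To illustrate the random-access cost mentioned above, here is a small sketch (the function names are invented for this example, and well-formed UTF-8 input is assumed): with a one-byte-per-character codepage the n-th character is the n-th byte, while UTF-8 needs a scan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fixed-width encoding (one byte per character, as in CP1252 or CP437):
// the byte offset of the n-th character is simply n.
std::size_t offset_fixed_width(std::size_t n)
{
    return n;
}

// UTF-8: a code point starts at every byte that is not of the form
// 10xxxxxx, so locating the n-th code point means scanning from the start.
// Returns s.size() if the string holds fewer than n + 1 code points.
std::size_t offset_utf8(const std::string& s, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0u) != 0x80u) {   // ASCII or lead byte
            if (seen == n) return i;
            ++seen;
        }
    }
    return s.size();
}
```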
 
 
> type for a portable binary format. Where 'uint8_t' exists at all there
> it is precisely 'unsigned char' so 'uint8_t' can be considered just a
> shorthand for 'unsigned char'.
 
yeah.
 
pretty much every data format comes down to bytes.
Richard Damon <Richard@Damon-Family.org>: Jun 28 09:40AM -0400

On 6/27/15 10:12 PM, Mr Flibble wrote:
> inherently non-portable and therefore unsafe. Use the typedefs from
> <cstdint> instead; DO NOT use 'int'.
 
> /Flibble
 
They also totally banned the use of 'char', including for holding text
strings. They later realized the silliness of that and changed it, but
that shows they are not infallible.
"Öö Tiib" <ootiib@hot.ee>: Jun 28 06:43AM -0700

On Sunday, 28 June 2015 05:25:39 UTC+3, BGB wrote:
> painful to work with.
 
> bit-packing and byte-oriented formats can also do pretty well if done
> well. though, yes, the code will need to take care of things like endianness.
 
We have to take care of endianness in any portable format. It is
particularly funny since neither C nor C++ contains a standard way of
detecting endianness at compile time. There are some libraries that use
every known non-standard way of doing that and so produce minimal code.
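
One common way to sidestep the detection problem, sketched here under the assumption that the format stores 32-bit values least significant byte first (`store_le32` and `load_le32` are illustrative names): build the bytes with shifts, which operate on values rather than on memory.

```cpp
#include <cassert>
#include <cstdint>

// Stores a 32-bit value as four bytes, least significant byte first.
// Shifts work on the value, so the result is identical on little-endian
// and big-endian hosts.
void store_le32(std::uint32_t value, unsigned char out[4])
{
    assert(out != nullptr);
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
}

// Rebuilds the value from four bytes stored least significant byte first.
std::uint32_t load_le32(const unsigned char in[4])
{
    assert(in != nullptr);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}
```

This is not the minimal code that the endianness-detecting libraries aim for, but it needs no knowledge of the host byte order at all.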
 
> was 16/24/32 bits wide (for the whole table) depending on what was
> needed for the target-index, but the fixed-width entries allow for
> random access to the table.
 
Whether it is worth it does indeed depend on the anticipated amount of data,
but if we know that a table contains only, for example, values 0..11
then it feels reasonable at least to consider 4-bit-wide entries. The
processors crunch numbers at ungodly speeds, and such a table is 4 times
shorter than one with 16-bit-wide entries.
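
A minimal sketch of such a table with 4-bit entries. The class name `NibbleTable` and the choice of which nibble holds the even index are assumptions of this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Table of 4-bit entries (values 0..15), two entries per byte.
// Even indices live in the low nibble, odd indices in the high nibble.
class NibbleTable {
public:
    explicit NibbleTable(std::size_t count)
        : bytes_((count + 1) / 2, 0), count_(count) {}

    void set(std::size_t index, unsigned value)
    {
        assert(index < count_ && value < 16);
        std::uint8_t& b = bytes_[index / 2];
        if (index % 2 == 0)
            b = static_cast<std::uint8_t>((b & 0xF0u) | value);
        else
            b = static_cast<std::uint8_t>((b & 0x0Fu) | (value << 4));
    }

    unsigned get(std::size_t index) const
    {
        assert(index < count_);
        const std::uint8_t b = bytes_[index / 2];
        return (index % 2 == 0) ? (b & 0x0Fu) : (b >> 4);
    }

    // Bytes actually used for storage: half a byte per entry, rounded up.
    std::size_t storage_bytes() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t count_;
};
```

Random access is kept, since the byte and nibble for an index are computed directly.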

> access than UTF-8, but less overly expensive vs UTF-16, and sufficient
> in the vast majority of cases (and are the cheapest option in cases when
> they are an option). note that this would be done per-string.
 
How did we somehow move from Leigh's int32_t to binary representations
of text? :) But that is an interesting topic as well. Have you noticed that
with text the need for random access to the contents is actually a rather rare
case? OTOH storage for texts can be significant if there are a lot of texts
or a lot of translations. A number of PC programs let you download and install
translations separately or optionally.

The above can't be done so easily with an embedded system, since flashing
the very same product differently for each targeted market can affect the
price of a unit. When access is sequential, polished Huffman decoding
rarely affects performance. So I have seen embedded systems
keep the text dictionaries Huffman-encoded all the time. If the texts are kept
Huffman-encoded anyway, then UCS-2 or UTF-16 are perfectly fine and there
is no need for archaic tricks like Windows-1252 or Code Page 437.
Rosario19 <Ros@invalid.invalid>: Jun 28 03:53PM +0200

On Sun, 28 Jun 2015 06:43:29 -0700 (PDT), Öö Tiib wrote:
 
>We have to take care of endianness in any portable format.
 
I don't agree,
because whatever the endianness, for the same unsigned type
of the same size,
+ - * /
give the same results, regardless of endianness;
number & 8
would be the same too,
and the same goes for |, ~ (not), etc.

where is the problem with the endianness of the number?
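
A small sketch that separates the two levels in question (the function names are made up for this example): operators work on values and give the same result everywhere, while the bytes in memory, which are what ends up in a file or on the wire when an object is copied out raw, follow the host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Arithmetic and bitwise operators act on values, so this returns the
// same result on every host.
std::uint32_t value_level(std::uint32_t x)
{
    return (x + 1u) & 0xFFu;
}

// Copying the object representation exposes the host byte order:
// returns the first byte of x as it is laid out in memory.
unsigned char first_byte_in_memory(std::uint32_t x)
{
    unsigned char bytes[sizeof x];
    std::memcpy(bytes, &x, sizeof x);
    return bytes[0];
}
```

For 0x11223344 the second function yields 0x44 on a little-endian host and 0x11 on a big-endian one, so a format written by dumping raw memory is not portable between them.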
"Öö Tiib" <ootiib@hot.ee>: Jun 28 06:54AM -0700

On Sunday, 28 June 2015 16:40:45 UTC+3, Richard Damon wrote:
 
> They also totally banned the use of 'char', including for holding text
> strings. They later realized the silliness of that and changed it, but
> that shows they are not infallible.
 
An argument from authority is logically fallacious anyway, and MISRA isn't
on my list of authorities.

My feeling about the MISRA C++ coding standard was that it had some
interesting points in it and so was worth reading, but it also
contained several apparent controversies that were not thought through
properly, so it cannot be used in practice without fixing it.
legalize+jeeves@mail.xmission.com (Richard): Jun 28 02:40AM

[Please do not mail me a copy of your followup]
 
alf.p.steinbach@gmail.com spake the secret code
 
>Thanks for these evaluations. As you can see from the posted code,
>they're not universally agreed on.
 
Many people simply imitate what they see and are unaware that endl
performs buffer flushing.
 
>The first rule of optimization is: MEASURE.
 
True, but doing needless work is always a bad idea.
 
My point is not "performance", but asking the computer to do something
that isn't needed.
 
If we saw code that did:
 
++i;
--i;
 
We would call this out as stupid. It asks the computer to do needless
work for no purpose.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
legalize+jeeves@mail.xmission.com (Richard): Jun 28 02:45AM

no@notvalid.com spake the secret code
>the famous book "The C++ Standard Library" by Nicolai Josuttis seems to
>always use std::endl everywhere. So seems like there is no agreement what
>is best...

True, he has endl all over his examples. But when you see endl, you
should ask yourself: why are we flushing the buffer?

>Can anybody tell why would somebody want to flush the stream with endl?
>what is the benefit of that and in what situation? Can somebody show an
>example where it would be beneficial?
 
If you don't flush the buffer with endl, then it is possible that you
won't see the output as you execute the code in the debugger. It is
also possible that buffered output will not reach its intended
destination if the program crashes before the buffer has been flushed.
This is why cerr is unit-buffered: it flushes after every output
operation whether you use endl or not.
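
To make the cost of endl visible, here is a sketch that counts flush requests. `CountingBuf` and `flushes_for` are names invented for this illustration; the only library behaviour relied on is that `std::endl` writes '\n' and then calls `flush()`, which reaches the buffer's `sync()`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// A string buffer that counts how often the stream asks it to flush.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override
    {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` lines, ending each with std::endl or with '\n', and
// returns the number of flush requests the buffer received.
int flushes_for(int lines, bool use_endl)
{
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl) out << i << std::endl;
        else          out << i << '\n';
    }
    return buf.flushes;
}
```

With a file or pipe behind the stream, each of those flush requests would typically turn into a write to the operating system.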
legalize+jeeves@mail.xmission.com (Richard): Jun 28 02:56AM

(Richard) legalize+jeeves@mail.xmission.com spake the secret code
 
>True, he has endl all over his examples. But when you see endl, you
>should ask yourself: why are we flushing the buffer?

Also, I submit that *no one* writes C code like this:
 
printf("%d\n", i);
fflush(stdout);
 
printf("%s\n", s);
fflush(stdout);
 
...and so-on.
 
If we wouldn't flush the buffer on every line in C, or in any other
language that supported buffered I/O (C#, Java, etc.), why are we
chronically doing this in C++?
 
I submit it is simply because people are imitating what they see
around them without thinking about it.
Rosario19 <Ros@invalid.invalid>: Jun 28 08:31AM +0200

On Sat, 27 Jun 2015 04:37:27 +0000 (UTC),
>the buffer.
 
>I routinely see people using endl all over the place when they only
>intended to output '\n'.
 
the mistake is that '\n' is made to flush the buffer in C code too, if I
remember correctly.

so I say: '\n' should not flush, and endl should not exist.

all flushing should be done by fflush(stream), or at the end of the program, where
the compiler adds the code that flushes and closes all open streams, etc.
 
Rosario19 <Ros@invalid.invalid>: Jun 28 08:41AM +0200

On Sun, 28 Jun 2015 02:56:00 +0000 (UTC), (Richard) wrote:
 
>chronically doing this in C++?
 
>I submit it is simply because people are imitating what they see
>around them without thinking about it.
 
I thought about it, and my implementation goes against the C standard
library on that point, if I remember correctly.
JiiPee <no@notvalid.com>: Jun 28 09:27AM +0100

On 28/06/2015 03:45, Richard wrote:
> also possible that buffered output will not reach its intended
> destination if the program crashes before the buffer has been flushed.
> This is why cerr is unit-buffered: it flushes after every output
> operation whether you use endl or not.
 
I tried a couple of suggestions, like:

for (;;)
{
    doSomethingHeavy();
    cout << "hi";
}

but gcc (release) always worked the same with or without flush. So is
there any example I could test to see the difference?
Rosario19 <Ros@invalid.invalid>: Jun 28 11:50AM +0200

>}
 
>but gcc (release) always worked the same with or without flush. So is
>there any example I could test to see the difference?
 
cout<<"this is a question:"; cout.flush();
cin >> variable;
 
and
 
cout<<"this is a question:";
cin >> variable;
JiiPee <no@notvalid.com>: Jun 28 01:52PM +0100

On 28/06/2015 10:50, Rosario19 wrote:
 
> and
 
> cout<<"this is a question:";
> cin >> variable;
 
lets test...
JiiPee <no@notvalid.com>: Jun 28 01:54PM +0100

On 28/06/2015 10:50, Rosario19 wrote:
 
> and
 
> cout<<"this is a question:";
> cin >> variable;
 
hmmm, there is no difference ... both run the same with gcc
Rosario19 <Ros@invalid.invalid>: Jun 28 03:02PM +0200


>On 28/06/2015 10:50, Rosario19 wrote:
>> On Sun, 28 Jun 2015 09:27:06 +0100, JiiPee <no@notvalid.com> wrote:
>> test to see the difference?
1)
>> cout<<"this is a question:"; cout.flush();
>> cin >> variable;
 
>> and
2)
>> cout<<"this is a question:";
>> cin >> variable;
 
>hmmm, there is no difference ... both run the same with gcc
 
If I remember correctly, if cout *is buffered*:

in case 1
it would print
"this is a question:"
and wait for the answer

in case 2
it would not print anything
""
and wait for the answer;
only after "the answer\n" is entered
would it print
"this is a question:"
via the flush at the end of the program

but it is possible that I'm wrong about something
JiiPee <no@notvalid.com>: Jun 28 02:22PM +0100

On 28/06/2015 14:02, Rosario19 wrote:
 
> in case 2
> it would not print anything
> ""
 
this did not happen
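
A likely explanation for seeing no difference, offered here as background rather than as something stated in the thread: by default std::cin is tied to std::cout, so std::cout is flushed automatically before every read from std::cin. The sketch below (helper names are made up) only inspects and changes that tie.

```cpp
#include <cassert>
#include <iostream>

// True if std::cin is currently tied to std::cout, i.e. std::cout is
// flushed automatically before each input operation on std::cin.
bool cin_is_tied_to_cout()
{
    return std::cin.tie() == &std::cout;
}

// Removes the tie and returns the previously tied stream, so that it can
// be restored later with std::cin.tie(previous).
std::ostream* untie_cin()
{
    return std::cin.tie(nullptr);
}
```

After calling `untie_cin()`, case 2 above is no longer guaranteed to show the prompt before the program waits for input, although an implementation may still happen to display it.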
 
Richard Damon <Richard@Damon-Family.org>: Jun 27 08:07PM -0400


> In a formal view that's right, but in practice we do have vtables and per-instance vtable pointers (not the only possible implementation of runtime polymorphism), and in practice we do have a per-instance virtual base class sub-object offset (again, not the only possible implementation). I don't know of any extant compiler that isn't that way. Admittedly, nowadays I don't know about a great many compilers, but it would be truly remarkable if such a spirit-of-PHP-like C++ compiler had been /introduced/ since way back then.
 
> Cheers & hth.,
 
> - Alf
 
Actually, once you hit multiple inheritance, vtables tend to get a bit
more complicated. In my example (if we add some virtual functions), the
vtable for D would need to be one table made up of several 'distinct' tables:
one pointed to by the actual instance of A, one put in B, and one in C.
Any new functions added in D would tend to be added after the entries
pointed to by B, as would be done in single inheritance. For functions
defined in C that have been overridden in D, there needs to be something
to adjust the 'this' pointer (as D::this would point to the beginning of
the full object, while C::this would point to the beginning of the C
sub-object, since that might be all that is known at the call site).
 
Similar vtables could be used for the offsets of the virtual bases. You
need to use a different pointer, as the virtual function pointer needs
to evolve as the object is created: while constructing C, it points to
the 'C' vtable, but when we move to constructing 'D' it points to the
'D' vtable. The virtual base offset table, on the other hand, needs to
stay constant throughout the construction process (as the layout is
fixed), being set by the 'topmost' constructor and left alone by the rest (as
they also skip calling the A constructor). Whether it is better to use a
pointer to the sub-object or a pointer to an offset table depends on how
many virtual bases you expect in an object with virtual bases. If you
only have a single virtual base, the object pointer is simpler, and
saves the 'constant' cost of the offset table. If objects frequently have
multiple virtual bases, then the table can gain the advantage of a
smaller object, as you only need a single pointer to the description
object while the object pointer method needs a pointer to each virtual base.
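
The 'this' adjustment can be observed without looking at vtables directly. The sketch below reuses the names A, B, C and D from the discussion, but the members and the `who()` function are invented for illustration; how the adjustment is implemented remains up to the compiler.

```cpp
#include <cassert>

struct A { int a = 0; virtual ~A() {} };
struct B : virtual A { int b = 0; };
struct C : virtual A { int c = 0; virtual int who() const { return 'C'; } };
struct D : B, C { int d = 0; int who() const override { return 'D'; } };

// Address of the complete (most derived) object that `p` is part of.
// dynamic_cast to void* performs the adjustment back from the C
// sub-object to the start of the full object.
const void* complete_object(const C* p)
{
    assert(p != nullptr);
    return dynamic_cast<const void*>(p);
}
```

Converting a `D*` to `B*` and to `C*` yields two different addresses, since both are non-empty sub-objects of the same D, yet a call to `who()` through the `C*` still runs `D::who` with a correctly adjusted 'this'.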
Marcel Mueller <news.5.maazl@spamgourmet.org>: Jun 28 02:26AM +0200

> It would be easy to check such an assumption before posting, especially since I posted code that you could trivially amend for the purpose, e.g. just add
 
BTDT.
 
> virtual ~A() {}
 
> in class A, then compile and run.
 
> I'm a big fan of checking reality. ;-)
 
Your test is not valid, as it introduces the vtable into A rather than Bv.
Of course, the vtable of a virtual base cannot be joined.
 
> Explanation why the vtable is (usually) not used to store the offsets:
[...]
 
Obviously gcc on OS/2 didn't know that.
 
 
Marcel
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 06:11AM +0200

On 28-Jun-15 2:07 AM, Richard Damon wrote:
> Actually, once you hit multiple inheritance, vtables ...
 
Uhm, TLDR, but, in case you or other readers are not familiar with it,
the common object representations for C++ were reportedly discussed in a
good, clear way in
 
* Stanley B. Lippman's 1996 "Inside the C++ Object Model",
 
<url:
https://books.google.no/books/about/Inside_the_C++_Object_Model.html?id=hLdmQgAACAAJ&redir_esc=y>.
 
A simpler exposition of just single inheritance structures was given in
my old 2005 pointers tutorial, once referenced by Wikipedia but then
offline for a long time until I put it on Google Drive,
 
<url:
https://drive.google.com/file/d/0B2oiI2reHOh4M2MzNzYwYzQtMGZkNC00NTljLWJiM2UtOGI0MmRkMTMyZGY4/view?ddrp=1&pli=1>.
 
 
Cheers & hth.,
 
- Alf
[Sorry, I first hit "Reply" instead of "Follow up"]
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 06:15AM +0200

On 28-Jun-15 2:26 AM, Marcel Mueller wrote:
 
>> in class A, then compile and run.
 
>> I'm a big fan of checking reality. ;-)
 
> Your test is not valid, as it introduces the vtable into A rather than Bv.
 
Well, your argument was that
 
<quote>
as soon as you have at least one virtual function it makes no difference
in size anymore
</quote>
 
And that's now proved incorrect. The example is very valid for that.
I.e. for its purpose.
 
It's rather uncommon to have a non-polymorphic topmost base that one
derives virtually from. I can't think of any use case. But I imagine
that it can occur with template code.
 
On the other hand, when I do introduce polymorphism down in Bv and keep
A as non-polymorphic, as you now suggest, then MinGW g++ 5.1 produces 16
16, apparently using the vtable pointer to encode the offset, while
Visual C++ 2015 produces 12 16, apparently storing an offset value.
 
 
> Of course, the vtable of a virtual base cannot be joined.
 
Not sure what you mean here. Maybe that using the vtable pointer in A to
encode the sub-object offset would introduce a catch-22? One would need
it in order to find the offset that would allow one to find it.
 
But nothing prevents the compiler from introducing a vtable pointer in
each derived class. That would prevent overhead in member function
calls. And this vtable pointer could be used to encode the offset.
 
On the third hand, the reported sizes indicate that neither MinGW g++
nor Visual C++ does that.
 
 
>> Explanation why the vtable is (usually) not used to store the offsets:
> [...]
 
> Obviously gcc on OS/2 didn't know that.
 
Uhm, a single example that "usually" is not "all" isn't valuable in
itself. But it might just be that I'm wrong about the "usually". It's an
assessment based on what I remember about compilers, simple logic about
practicality and efficiency, the fact that having a topmost
non-polymorphic virtual base is unusual, and that Visual C++ does store
some offset information in each object in addition to vtable pointer, as
does MinGW g++ for the case of a polymorphic topmost base.
 
 
Cheers & hth.,
 
- Alf
 
[Sorry, I inadvertently first hit "Reply" instead of "Follow up"]
 
Marcel Mueller <news.5.maazl@spamgourmet.org>: Jun 28 11:46AM +0200

On 28.06.15 06.15, Alf P. Steinbach wrote:
> </quote>
 
> And that's now proved incorrect. The example is very valid for that.
> I.e. for its purpose.
 
Yes, there are constraints. The virtual method must belong to the same
block of memory, i.e. to the same class or to a non-virtual base. There
is an additional hack to combine the vtable pointers of a class with its
base class, but this only works for single inheritance.

So as long as A is non-polymorphic, it makes no difference whether B has a
virtual method or derives virtually or both. But as soon as A also has
virtual methods (or derives virtually) the size increases by one
machine word. Well, except for the case Bvnv, where the virtual
tables of A and Bvnv can be combined.
 
struct A
{
    int data;
    //virtual ~A() {}
};

struct Bnv: public A
{
    int data2;
};

struct Bv: virtual A
{
    int data2;
};

struct Bvnv: A
{
    int data2;
    virtual ~Bvnv() {}
};

struct Bvv: virtual A
{
    int data2;
    virtual ~Bvv() {}
};

#include <iostream>
int main()
{
    using namespace std;
    cout << sizeof( Bnv ) << " " << sizeof( Bv ) << " "
         << sizeof( Bvnv ) << " " << sizeof( Bvv ) << endl;
}
 
 
> But nothing prevents the compiler from introducing a vtable pointer in
> each derived class. That would prevent overhead in member function
> calls.
 
No, there is no overhead when joining vtables of base and derived. The
vtable of the derived class simply contains the vtable of the base at
the start. There is no additional indirection.
 
> And this vtable pointer could be used to encode the offset.
 
Of course, the offsets could be stored in the vtable. And well, this is
an additional indirection, since first the vtable address has to be
loaded and then the offset. However, the same applies to any virtual
function call as well. But inlining the vtable in every object also has
drawbacks. First, it increases the memory footprint; second, the same
values are now read from different memory locations, which decreases
memory cache efficiency.
In the case of virtual base classes there is another difference. Why should
one store the instance-independent offsets in the class instance?
Instead, the pointers to the virtual base could be stored directly. This
saves one integer addition, probably zero or one clock cycle on
today's CPUs.
 
 
> On the third hand, the reported sizes indicate that neither MinGW g++
> nor Visual C++ does that.
 
I would not be that sure until I had a look at the assembler output.
Maybe it is not that easy.
 
 
>> Obviously gcc on OS/2 didn't know that.
 
> Uhm, a single example that "usually" is not "all" isn't valuable in
> itself.
 
A Raspberry Pi (ARMhf) and LM17 (x86) and Debian Wheezy (x64) show the
same results. But you are right, all of them use some flavor of gcc. So
I guess all gcc versions do it the same way. Tested with different
versions between 3.2.2 and 4.9.2.
But you are right. Other compilers handle it differently. E.g. IBM VAC++
allocates 20 bytes rather than 12 (gcc, 32 bit) for the last test case.
But this one is more than 15 years old.
 
 
> [Sorry, I inadvertently first hit "Reply" instead of "Follow up"]
 
Since you are not in the white list for this address, I didn't notice.
 
 
Marcel