Sunday, June 28, 2015

Digest for comp.lang.c++@googlegroups.com - 22 updates in 6 topics

"Öö Tiib" <ootiib@hot.ee>: Jun 28 07:10AM -0700

On Sunday, 28 June 2015 16:53:35 UTC+3, Rosario19 wrote:
> so would be ok for | too and not etc
> etc
 
> where is the problem with endianess of the number?
 
Rosario, we did talk about keeping data in some portable binary
format. We have portable binary formats so that one computer can
save data to disk or send it over the internet and another computer
can read or receive it, and both understand it in the same way.
 
Different computers may keep numbers in their own memory with
different endianness. So when a computer reads or writes the bytes
of a binary format, it must take care that those bytes are ordered
correctly. That is what we call taking care of endianness in a
portable format.
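 
For example, here is a rough sketch (names are mine) of reading and
writing a 32-bit value in a fixed little-endian byte order, which gives
the same result on any host:
 
#include <cstdint>
 
// Read a 32-bit unsigned value stored little-endian in a byte buffer.
// Works the same on big- and little-endian hosts because it never
// reinterprets the bytes through the host's native integer layout.
std::uint32_t read_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
 
// Write the value back out in the same fixed byte order.
void write_u32_le(unsigned char* p, std::uint32_t v)
{
    p[0] = static_cast<unsigned char>(v);
    p[1] = static_cast<unsigned char>(v >> 8);
    p[2] = static_cast<unsigned char>(v >> 16);
    p[3] = static_cast<unsigned char>(v >> 24);
}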
BGB <cr88192@hotmail.com>: Jun 28 12:17PM -0500

On 6/28/2015 8:43 AM, Öö Tiib wrote:
> particularly funny since neither C nor C++ contain standard way for
> detecting endianness compile-time. There are some libraries that use
> every known non-standard way for that and so produce minimal code.
 
yeah.
 
I prefer fixed-endianness formats, personally.
 
granted, probably most everyone else does as well, as most formats are
this way. just a few formats exist where the endianness depends on
whichever computer saved the file, with magic numbers to detect when
swapping is needed. I find these annoying.
 
 
some of my tools (script language and C compiler, as an extension) have
the ability to specify the endianness for variables and pointer types
(so, you can be sure the value is stored as either big or little endian,
regardless of native endianness), which implicitly also makes them safe
for misaligned loads/stores.
 
namely, if you know a value needs to be a particular way, then chances
are the person is willing to pay whatever CPU cycles are needed to make
it that way.
 
 
typically, for compile-time stuff, there is a mess of #define's and
#ifdef's for figuring out the target architecture and other things,
picking appropriate type-sizes and setting values for things like
whether or not the target supports misaligned access, ...
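 
for example, a rough sketch of that kind of detection (the macro names
shown are the usual GCC/Clang and MSVC ones; MY_LITTLE_ENDIAN is made up):
 
// Rough sketch of the usual non-standard endianness detection dance.
// There is no standard C or C++ way to do this at compile time,
// so each compiler family gets its own branch.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#    define MY_LITTLE_ENDIAN 1
#  else
#    define MY_LITTLE_ENDIAN 0
#  endif
#elif defined(_MSC_VER)
   /* all targets MSVC currently supports are little-endian */
#  define MY_LITTLE_ENDIAN 1
#else
#  error "Don't know the byte order of this target"
#endif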
 
I guess it could be nicer if more of this were standardized.
 
 
> then it feels reasonable at least to consider 4 bit wide entries. The
> processors crunch numbers at ungodly speeds but it is 4 times shorter
> table than one with 16 bit wide entries.
 
could be, but the table entries in this case were fairly unlikely to be
that much below 16 bits (so 8 bit or smaller would not have been useful).
 
these were basically offsets within a TLV lump, where you would have one
TLV lump which contains lots of payload data (as an array of
variable-sized sub-lumps packed end-to-end), and a table to say where
each sub-lump is within that bigger lump (to better allow random access).
 
in most of the cases, the lumps were a few kB up to maybe a few MB,
so 16 or 24 bits are the most likely cases, and always using 32 bits "to
be safe" would be a waste.
 
it would be fairly easy to compress them further, but this would require
decoding them before they could be used, which was undesirable in this case.
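 
roughly, the layout looks something like this sketch (names made up,
16-bit offsets shown; 24- or 32-bit entries would be used for bigger lumps):
 
#include <cstddef>
#include <cstdint>
#include <vector>
 
// One big TLV payload lump plus a table of offsets into it, so that
// sub-lump i can be found in O(1) without walking the packed data.
struct PackedLump
{
    std::vector<std::uint8_t>  payload;   // variable-sized sub-lumps, end to end
    std::vector<std::uint16_t> offsets;   // start of each sub-lump within payload
 
    const std::uint8_t* sub_lump(std::size_t i) const
    {
        return payload.data() + offsets[i];
    }
};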
 
 
> case? OTOH storage for texts can be significant if there are lot of texts
> or lot of translations. Number of PC software let to download and install
> translations separately or optionally.
 
yeah, probably should have been clearer.
 
this was for string literals/values in a VM.
 
in the predecessor VM, M-UTF-8 had been used for all the string literals
(except the UTF-16 ones), which mostly worked (since direct
per-character access is fairly rare), but it meant doing something like
"str[idx]" would take 'O(n)' time (and looping over a string
per-character would be O(n^2)...).
 
in the use-case for the new VM, I wanted O(1) access here (mostly to
make things more predictable, *), but also didn't want the nearly pure
waste that is UTF-16 strings.
 
however, the language in question uses UTF-16 as its logical model (so,
from high-level code, it appears as if all strings are UTF-16). in the
language, strings are immutable, so there is no issue with the use of
ASCII or similar for the underlying storage.
 
in C, it isn't really an issue, mostly because C makes no attempt to
gloss over in-memory storage, so you can just return the raw byte values
or similar.
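 
a rough sketch of the idea (types and names are made up; it uses a plain
"fits in 8 bits" check rather than a real Windows-1252 mapping, for brevity):
 
#include <cstddef>
#include <cstdint>
#include <vector>
 
// Immutable string with a UTF-16 logical model but compact storage.
// Indexing is O(1) either way because both backings are fixed-width.
class VmString
{
    std::vector<std::uint8_t>  narrow;  // used when every char fits in 8 bits
    std::vector<std::uint16_t> wide;    // full UTF-16 code units otherwise
    bool is_narrow;
 
public:
    explicit VmString(const std::vector<std::uint16_t>& text)
    {
        is_narrow = true;
        for (std::uint16_t c : text)
            if (c > 0xFF) { is_narrow = false; break; }
 
        if (is_narrow)
            for (std::uint16_t c : text)
                narrow.push_back(static_cast<std::uint8_t>(c));
        else
            wide = text;
    }
 
    // O(1) access; the caller always sees UTF-16 code units.
    std::uint16_t operator[](std::size_t i) const
    {
        return is_narrow ? narrow[i] : wide[i];
    }
 
    std::size_t size() const { return is_narrow ? narrow.size() : wide.size(); }
};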
 
 
*: the VM needs to be able to keep timing latencies bounded, which
basically weighs against doing anything in the VM where the time-cost
can't be easily predicted in advance. wherever possible, all operations
need to be kept O(1), with the operation either being able to complete
in the available time-step (generally 1us per "trace"), or the VM will
need to halt and defer execution until later (blocking is not allowed,
and any operations which may result in unexpected behaviors, such as
halting, throwing an exception, ... effectively need to terminate the
current trace, which makes them more expensive).
 
for some related reasons, the VM is also using B-Trees rather than hash
tables in a few places (more predictable, if slower, than hashes, but
less memory waste than AVL or BST variants). likewise, because of their
structure, it is possible to predict in advance (based on the size of
the tree) approximately how long it will take to perform the operation.
 
 
> keeping the text dictionaries Huffman encoded all time. If to keep texts
> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
> are no need for archaic tricks like Windows-1252 or Code Page 437.
 
granted, but in this case, it is mostly for string literals, rather than
bulk text storage.
 
Windows-1252 covers most general use-cases for text (and is fairly easy
to convert to/from UTF-16, as for most of the range the characters map 1:1).
CP-437 is good mostly for things like ASCII art and text-based UIs.
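 
a sketch of that conversion (my names): everything outside 0x80..0x9F maps
1:1 to the same code point; a real converter would use the 32-entry
Windows-1252 table for the rest, replaced here with U+FFFD to keep it short:
 
#include <cstdint>
 
// Convert one Windows-1252 byte to a UTF-16 code unit.
// 0x00..0x7F and 0xA0..0xFF map 1:1 to the same code point;
// only the 0x80..0x9F block needs a lookup table in a real converter.
std::uint16_t cp1252_to_utf16(std::uint8_t b)
{
    if (b < 0x80 || b >= 0xA0)
        return b;                // 1:1 for most of the range
    // Placeholder: a full converter would index a 32-entry table here
    // (0x80 -> U+20AC euro sign, 0x99 -> U+2122 trade mark sign, ...).
    return 0xFFFD;               // replacement character in this sketch
}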
 
for literals, it will be the job of the compiler to sort out which
format to use.
 
 
bulk storage will tend to remain in compressed UTF-8.
though a more specialized format could be good.
 
I had good results before compressing short fragments (such as character
strings) with a combination of LZ77 and MTF+Rice Coding, which for small
pieces of data did significantly better than Deflate or LZMA. however,
the MTF makes it slower per-character than a Huffman-based option.
 
basically, options like Deflate or LZMA are largely ineffective for
payloads much under 200-500 bytes or so, but are much more effective as
payloads get bigger.
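 
for reference, a minimal sketch of just the MTF stage (the LZ77 and
Rice-coding stages are omitted, and the names are made up):
 
#include <cstdint>
#include <numeric>
#include <vector>
 
// Move-to-front transform: each byte is replaced by its current position
// in a recency list, so recently seen bytes become small numbers that a
// Rice/Golomb coder can store cheaply.
std::vector<std::uint8_t> mtf_encode(const std::vector<std::uint8_t>& in)
{
    std::vector<std::uint8_t> table(256);
    std::iota(table.begin(), table.end(), std::uint8_t{0});  // 0,1,...,255
 
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::uint8_t b : in)
    {
        std::uint8_t pos = 0;
        while (table[pos] != b) ++pos;          // find current rank of b
        out.push_back(pos);
        table.erase(table.begin() + pos);       // move b to the front
        table.insert(table.begin(), b);
    }
    return out;
}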
JiiPee <no@notvalid.com>: Jun 28 08:01PM +0100

On 26/06/2015 20:39, Mr Flibble wrote:
> ... #include <cstdint> instead!
 
> /Flibble
 
Just watching Scott Meyers' videos. He also seems to always use:
int a = 9;
 
not
fastint32 a = 9;
 
if int was wrong, surely they would not teach people to use int, right? :)
JiiPee <no@notvalid.com>: Jun 28 08:15PM +0100

On 28/06/2015 20:01, JiiPee wrote:
 
> not
> fastint32 a = 9;
 
> if int was wrong, surely they would not teach people to use int, right? :)
 
also note that Scott recommends using
auto a = 9;
 
so using auto, so letting the computer decide the type!!!
What will you say about this??? :)
woodbrian77@gmail.com: Jun 28 12:56PM -0700

On Saturday, June 27, 2015 at 6:47:38 PM UTC-5, Öö Tiib wrote:
 
> All programs that use sounds or images technically use binary formats
> but those are abstracted far under some low level API from programmers.
> I did not mean that.
 
I didn't mean that either.
 
 
Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 10:13PM +0200

On 28-Jun-15 9:15 PM, JiiPee wrote:
> auto a = 9;
 
> so using auto, so letting the computer decide the type!!!
> What will you say about this??? :)
 
Using `auto` to declare a variable without an explicit type communicates
well to the compiler but not to a human reader. Also it's longer to
write and read than just `int`. And one can't generally adopt this as a
convention, e.g. it doesn't work for a variable without initializer, so
it's not forced by a convention.
 
Therefore I consider it an abuse of the language.
 
As to why, I guess that Scott has to use all kinds of fancy features
just to grab and keep interest from the audience. And maybe so that
novices can ask "what's the `auto`, huh?", so that he can explain it.
Explaining things and giving advice is after all how he makes a living.
 
 
Cheers & hth.,
 
- Alf
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
JiiPee <no@notvalid.com>: Jun 28 09:22PM +0100

On 28/06/2015 21:13, Alf P. Steinbach wrote:
>> What will you say about this??? :)
 
> Using `auto` to declare a variable without an explicit type
> communicates well to the compiler but not to a human reader.
 
In Visual Studio you can hover the mouse and see the real type quite
easily; it also works on other compilers.
 
> Also it's longer to write and read than just `int`. And one can't
> generally adopt this as a convention, e.g. it doesn't work for a
> variable without initializer, so it's not forced by a convention.
 
think about a function returning the number of elements in a
container... you could wrongly put:
unsigned long getSize();
 
if it actually returns a 64 bit integer. auto would find the right type
straight away.
 
 
> Therefore I consider it an abuse of the language.
 
in many places it makes coding safer because auto always finds the
correct type. You can get bugs by putting in a wrong type... and people
have done that, as one can see when reading forums.
 
> just to grab and keep interest from the audience. And maybe so that
> novices can ask "what's the `auto`, huh?", so that he can explain it.
> Explaining things and giving advice is after all how he makes a living.
 
with complex types it can increase safety, because auto always gets it
right. We might get the type wrong, which might cause bugs.
 
But auto is definitely not always best, even if somebody likes it.
JiiPee <no@notvalid.com>: Jun 28 09:23PM +0100

On 28/06/2015 21:13, Alf P. Steinbach wrote:
 
> Using `auto` to declare a variable without an explicit type
> communicates well to the compiler but not to a human reader. Also it's
> longer to write and read than just `int`.
 
but if you take the average over all types, auto would win big
time. On average auto makes type names much shorter if all types are
considered.
jt@toerring.de (Jens Thoms Toerring): Jun 28 10:33PM

> > communicates well to the compiler but not to a human reader.
 
> In Visual Studio you can hover the mouse and see the real type quite
> easily; it also works on other compilers.
 
Please keep in mind that VS is an IDE with an attached compiler
(besides a lot of other things). So this won't work "on other
compilers", since a compiler is a program to compile code and
not something you can "hover over" with the mouse. You may
be surprised, but not everyone is using an IDE (for various
reasons) - or even a graphical user interface - all of the
time (and thus a mouse or something similar)...
 
> unsigned long getSize();
 
> if it actually returns a 64 bit integer. auto would find the right type
> straight away.
 
If you define a variable and assign to it the return value
of a function, then it's relatively clear what the type will
be - it can easily be found out by looking at the function
declaration. But something like
 
auto a = 0;
 
is a bit different: you have to look very carefully at that
'0' to figure out whether this will end up being an 'int' or
perhaps something else. And it can be prone to getting the
wrong type by forgetting (or mis-typing) some character after
the '0' that makes the variable have a different type. There's
definitely a readability issue.
 
> in many places it makes coding safer because the auto always finds the
> correct type. You can get bugs by putting a wrong type... and people
> have done that when reading forums.
 
Yes, but cases like
 
int a = 0.0f;
 
are places where this isn't the case. 'auto' is very useful
in cases like
 
for ( auto it = xyz.begin(); it != xyz.end(); ++it )
 
instead of maybe
 
for ( std::pair< std::vector< std::pair< int, char const * >, double >, std::vector< std::string > >::iterator it = xyz.begin( ); it != xyz.end( ); ++it )
 
since you will be aware of the type of 'xyz', but
 
auto a = 0ull;
 
is different since it makes the type of 'a' hard to recognize
at a glance. And you must not forget any part of the 'ull' bit
at the end, or you'll get something you never wanted and thus
don't expect. It actually creates a new class of possible bugs.
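 
For illustration, a small sketch showing how the deduced type depends
entirely on those easy-to-miss suffixes:
 
#include <type_traits>
 
int main()
{
    auto a = 0;     // int
    auto b = 0u;    // unsigned int
    auto c = 0L;    // long
    auto d = 0ull;  // unsigned long long
    auto e = 0.0f;  // float
    auto f = 0.0;   // double
 
    static_assert(std::is_same<decltype(a), int>::value, "");
    static_assert(std::is_same<decltype(d), unsigned long long>::value, "");
    static_assert(std::is_same<decltype(e), float>::value, "");
    (void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
    return 0;
}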
 
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
David Brown <david.brown@hesbynett.no>: Jun 28 09:53PM +0200

On 28/06/15 21:19, Stefan Ram wrote:
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
Use guide C - avoid macros unless they really are the clearest and best
way to solve the problem at hand. But don't use #undef except in
/really/ special code, as it leads to confusion - macros should normally
have exactly the same definition at all times in the program, or at
least within the file.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 10:24PM +0200

On 28-Jun-15 9:19 PM, Stefan Ram wrote:
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
Ordinary include guards are incompatible with guide A. Well, I could
agree with a preference for #pragma once instead of include guards (and
just not supporting e.g. an IBM compiler that doesn't support the
pragma), but /requiring/ that one doesn't use include guards is IMHO
going too far. It's more work and less clear, but if someone wants to, hey.
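 
For reference, an ordinary include guard looks like the sketch below
(names are placeholders); the guard macro is deliberately never
#undef'ed, which is exactly what guide A frowns upon:
 
// my_header.h
#ifndef MY_HEADER_H       // guard macro stays defined for the whole
#define MY_HEADER_H       // translation unit; no #undef on purpose
 
void some_declaration();
 
#endif // MY_HEADER_H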
 
As an example that's incompatible with guide B, in Windows desktop
programming one will normally, nowadays, define UNICODE before
including <windows.h>. The definition doesn't matter, just that it's
defined. But if it is defined in code and there is a previous definition,
one will get a silly warning with e.g. g++ or Visual C++. And a simple
solution is to #undef it, like this:
 
#undef UNICODE
#define UNICODE
#include <windows.h>
 
And this is very normal code.
 
Rules to be mechanically followed are generally not compatible with C++
programming, which requires Some Intelligence Applied™.
 
Therefore I think that neither guide referred to and quoted above, can
be of very high quality.
 
 
Cheers & hth.,
 
- Alf
[Sorry, yet again I inadvertently applied Google Groups experience and
hit "Reply". I'm currently searching for the "Unsend" button.]
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
"Öö Tiib" <ootiib@hot.ee>: Jun 28 01:28PM -0700

On Sunday, 28 June 2015 22:19:56 UTC+3, Stefan Ram wrote:
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
I use macros for things that are impossible without macros.
These are mostly things for better runtime debug diagnostics or
traces.
Examples:
I can't optionally use compiler-specific extensions without macros.
I can't get current source code file name, function name, line number
or compiling time without macros.
I can't both stringize and evaluate part of code without macros.
 
Otherwise I avoid macros. The ones that I use I define in a general
configuration header that is included everywhere, and I never #undef any
of those.
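 
As a sketch of the kind of macro meant here (my own example, names made
up): stringizing plus file/line capture is only possible with the
preprocessor.
 
#include <iostream>
 
// Trace macro: prints the source location and both the text and the
// value of an expression. Neither the stringizing (#expr) nor the
// automatic file/line capture is possible without a macro.
#define TRACE_EXPR(expr) \
    do { \
        std::cerr << __FILE__ << ':' << __LINE__ << ": " \
                  << #expr << " = " << (expr) << '\n'; \
    } while (0)
 
int main()
{
    int answer = 42;
    TRACE_EXPR(answer * 2);   // prints e.g. "file.cpp:19: answer * 2 = 84"
    return 0;
}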
"Öö Tiib" <ootiib@hot.ee>: Jun 28 01:43PM -0700

On Sunday, 28 June 2015 23:24:36 UTC+3, Alf P. Steinbach wrote:
> just don't support any e.g. IBM compiler that doesn't support the
> pragma), but /requiring/ that one doesn't use include guards is IMHO to
> go too far.
 
IBM XL C/C++ certainly supports pragma once. AFAIK, of the C++ compilers
still under active maintenance, only Oracle Solaris Studio does not
support it.
Forbidding include guards is still perhaps going too far for a style
guide.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 28 12:48AM

>Can anybody tell why somebody would want to flush the stream with endl?
 
Usually, ::std::cout is flushed before ::std::cin is used
for reading or before the program exits, so one would only want
to flush it explicitly when this does not suffice.
 
See also: »::std::ios_base::unitbuf«, »::std::cin.tie()«.
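 
A small sketch of those two mechanisms (variable names are made up):
 
#include <iostream>
 
int main()
{
    // By default ::std::cin is tied to ::std::cout, so ::std::cout is
    // flushed automatically before any read from ::std::cin:
    std::cout << "Enter a number: ";   // no explicit flush needed here
    int n = 0;
    std::cin >> n;
 
    // unitbuf makes a stream flush after every output operation:
    std::cout << std::unitbuf;
    std::cout << "got " << n << '\n';  // flushed immediately, no endl needed
    return 0;
}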
ram@zedat.fu-berlin.de (Stefan Ram): Jun 28 03:34PM

Just for fun I'd like to point out that the meaning of the
term »concrete class« might have changed in Lippman's
»C++ primer«.
 
The edition of 2005 still defines:
 
»A concrete class is a class that exposes,
rather than hides, its implementation.«.
 
This seems to comply with Stroustrup's notion of »concrete
types« (and »concrete class« in that context).
 
But a more recent 5th edition of the »C++ primer« now seems
to use »concrete class« in the other sense of »a class that
is not an abstract class«, although it possibly does not
give an explicit definition for this term anymore.
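 
A tiny sketch of the two meanings (names made up):
 
// "Concrete" in Stroustrup's sense: the representation is part of the
// interface, so the type can live on the stack and be copied freely.
struct Point
{
    int x;
    int y;
};
 
// "Concrete" merely as "not abstract": Circle can be instantiated,
// Shape cannot, regardless of how well either hides its representation.
struct Shape
{
    virtual double area() const = 0;
    virtual ~Shape() {}
};
 
struct Circle : Shape
{
    double radius;
    double area() const override { return 3.14159265358979 * radius * radius; }
};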
 
A class that is not concrete but owns resources is sometimes
called a »resource handle«. I would use this term for
::std::unique_ptr, but not for ::std::string, because in
the case of the former handling the resource is the primary
task, while in the case of the latter the resource is just
a means to being a variable-length (mutable) string.
 
What kind of classes are out there?
 
POD class
primitive class
regular class
trivially copyable type
trivial type
standard-layout type
canonical class
concrete class (a term with at least two different meanings)
abstract class
literal class
resource handle class
class with value semantics
class with reference semantics
constexpr class
 
Any other kind that comes to your mind?
ram@zedat.fu-berlin.de (Stefan Ram): Jun 28 07:19PM

guide A:
 
Don't use macros! OK, (...) treat them as a last resort.
(...) And #undef them after you've used them, if possible.
 
guide B:
 
#undef should not normally be needed. Its use can lead
to confusion with respect to the existence or meaning of
a macro when it is used in the code
"Öö Tiib" <ootiib@hot.ee>: Jun 28 08:21AM -0700

On Sunday, 28 June 2015 05:56:09 UTC+3, Richard wrote:
 
> printf("%s\n", s);
> fflush(stdout);
 
> ...and so-on.
 
Indeed because it adds clutter and we are lazy. If we had no 'endl' then
we would rarely write such code in C++ as well:
 
std::cout << i << '\n' << std::flush;
std::cout << s << '\n' << std::flush;
 
> If we wouldn't flush the buffer on every line in C, or in any other
> language that supported buffered I/O (C#, Java, etc.), why are we
> chronically doing this in C++?
 
We are not. We typically do not use <iostream> for massive (so that it
affects performance) text I/O, and in the cases when we do, we
avoid 'operator<<' altogether since that thing trashes performance
even more terribly than superfluous flushing.
 
We do use the streams primarily for slow human-readable I/O, and
even that is primarily for debugging. Now in a debugging context
I have actually been annoyed that the damn 'printf' did not flush
its output before the program crashed or broke into a breakpoint.
Therefore it makes sense, in code that demonstrates some feature or
crash to a novice, to use 'endl' liberally, because the novice may
want to step through it in a debugger.
 
> I submit it is simply because people are imitating what they see
> around them without thinking about it.
 
You are correct that people do a lot of things without thinking too
much about them. Otherwise it is hard to get things done in a timely
manner. This particular topic is a good example of something worth
bringing up, since I tend to use '\n' and 'std::endl' in a mix but have
long stopped thinking about why I do it exactly the way I do.
Rosario19 <Ros@invalid.invalid>: Jun 28 06:20PM +0200

On Sun, 28 Jun 2015 08:21:57 -0700 (PDT), Öö Tiib wrote:
 
>avoid 'operator<<' whatsoever since that thing trashes performance
>even more terribly than superfluous flushing.
 
>We do use the streams primarily for slow human-readable I/O and
 
I don't agree.
Standard input and output can be used through pipes
to connect programs.
 
"Öö Tiib" <ootiib@hot.ee>: Jun 28 09:48AM -0700

On Sunday, 28 June 2015 19:20:43 UTC+3, Rosario19 wrote:
 
> I don't agree.
> Standard input and output can be used through pipes
> to connect programs.
 
How does that contradict what I wrote above? I do not understand where
the difference is.
 
"Primarily" does not mean "always" but it means "for most part" and
"mainly". I wrote above even separately about the cases like the
one that you pointed out: "when we do then we avoid 'operator<<'
whatsoever since that thing trashes performance even more terribly
than superfluous flushing."
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 28 11:56AM -0400

On 6/28/2015 11:34 AM, Stefan Ram wrote:
> class with reference semantics
> constexpr class
 
> Any other kind that comes to your mind?
 
Empty class (sometimes used to denote a type that is different from any
other type in your program).
 
V
--
I do not respond to top-posted replies, please don't ask
Paul <pepstein5@gmail.com>: Jun 28 07:53AM -0700

On Thursday, June 25, 2015 at 2:28:35 PM UTC+1, Öö Tiib wrote:
> so you should keep. Iterators should not manage the object
> they navigate. Smart pointers (that automatically manage)
> are therefore very bad iterators.
 
This is my revised code which uses smart pointers. I also coded another direct way of testing for cycles by seeing if the pointers repeat. Does this seem ok? Thanks a lot for your feedback.
 
#include <cstdio>
#include <unordered_set>
#include <vector>
#include <algorithm>
#include <iostream>
#include <memory>
 
struct Node
{
    int data;
    std::shared_ptr<Node> next;
};
 
// A fast pointer and a slow pointer are both initiated at the head.
// Circular if the slow pointer is ever ahead of the fast pointer.
bool isCycle(std::shared_ptr<Node> head)
{
    auto slowPointer = head;
    auto fastPointer = head;
 
    while (fastPointer && fastPointer->next && fastPointer->next->next)
    {
        slowPointer = slowPointer->next;
        fastPointer = fastPointer->next->next;
 
        if (fastPointer == slowPointer || fastPointer->next == slowPointer)
            return true;
    }
 
    return false;
}
 
// A direct algorithm to tell if a cycle is present by seeing if a pointer address repeats.
bool isCycleDirect(std::shared_ptr<Node> head)
{
    std::unordered_set<std::shared_ptr<Node>> nodePointers;
 
    while (head)
    {
        // If trying to insert something already inserted, then must contain cycles.
        if (nodePointers.find(head) != nodePointers.end())
            return true;
 
        nodePointers.insert(head);
        head = head->next;
    }
 
    return false;
}
 
// Test against the expected results.
void testCycle(std::shared_ptr<Node> head, bool expected)
{
    printf(isCycle(head) == expected ? "Results as expected\n" : "This test case failed\n");
}
 
// Set up tests for small numbers of nodes
void smallTests()
{
    std::shared_ptr<Node> emptyList;
    testCycle(emptyList, false);
 
    std::shared_ptr<Node> List1(new Node);
    std::shared_ptr<Node> ListCircular2(new Node);
    std::shared_ptr<Node> ListNonCircular2(new Node);
    std::shared_ptr<Node> ListCircular3(new Node);
    std::shared_ptr<Node> ListNonCircular3(new Node);
 
    List1->next = nullptr;
    List1->data = 1;
    testCycle(List1, false);
 
    ListCircular2 = List1;
    ListCircular2->next = ListCircular2;
    testCycle(ListCircular2, true);
 
    ListNonCircular2 = ListCircular2;
    ListNonCircular2->next = std::shared_ptr<Node>(new Node);
    ListNonCircular2->next->data = 2;
    ListNonCircular2->next->next = nullptr;
    testCycle(ListNonCircular2, false);
 
    ListNonCircular3 = ListNonCircular2;
    ListNonCircular3->next->next = std::shared_ptr<Node>(new Node);
    ListNonCircular3->next->next->data = 3;
    ListNonCircular3->next->next->next = nullptr;
    testCycle(ListNonCircular3, false);
 
    ListCircular3 = ListNonCircular3;
    ListCircular3->next->next->next = ListCircular3;
    testCycle(ListCircular3, true);
}
 
int main()
{
    smallTests();
    return 0;
}
 
 
Paul
"Öö Tiib" <ootiib@hot.ee>: Jun 28 08:40AM -0700

On Sunday, 28 June 2015 17:53:42 UTC+3, Paul wrote:
> > they navigate. Smart pointers (that automatically manage)
> > are therefore very bad iterators.
 
> This is my revised code which uses smart pointers. I also coded another direct way of testing for cycles by seeing if the pointers repeat. Does this seem ok? Thanks a lot for your feedback.
 
You decided to use smart pointers as iterators in 'isCycle'. I already
tried to explain why smart pointers are terrible iterators. If you don't
make or adapt a class to serve as an iterator, then a raw pointer is
still better than a smart pointer.

Your 'isCycleDirect' seems quite expensive since it makes an
'unordered_set' of the whole list. You should perhaps try and compare
the two with a list of a million entries.
 
Your 'smallTests' is still broken in the sense that it leaks memory.
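 
For comparison, a sketch (my names) of the same fast/slow test iterating
with raw pointers that only navigate and never own:
 
struct RawNode
{
    int data;
    RawNode* next;
};
 
// Floyd's fast/slow test over raw pointers: the pointers only navigate,
// they do not own or free anything.
bool isCycleRaw(const RawNode* head)
{
    const RawNode* slow = head;
    const RawNode* fast = head;
 
    while (fast && fast->next)
    {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}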
