Tuesday, June 30, 2015

Digest for comp.lang.c++@googlegroups.com - 25 updates in 7 topics

Paul <pepstein5@gmail.com>: Jun 30 10:18AM -0700

I am trying to write two versions of Quicksort, one that uses randomised partition, and one that doesn't randomise the partition. However, I want them to have signatures std::vector<int> f(std::vector<int> vec) and so I want to bind the quicksorts to Boolean parameters.
 
I can't find the syntax to do this though. Just so that I haven't left out any context, I will copy-paste my entire code. However, I will mark out the offending line with asterisks.
Many thanks for your help.
 
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>
 
// Basic swapping useful for many sorts.
// Delegates to std::swap instead of hand-rolling the temp-variable dance.
void Swap(int& x, int& y)
{
    std::swap(x, y);
}
 
// Creates a vector of N pseudo-random integers drawn from rand(),
// i.e. values in [0, RAND_MAX]. Returns an empty vector for N <= 0.
// (The old comment mentioned a "randmax" parameter that does not exist.)
std::vector<int> CreateRandomArray(int N)
{
    std::vector<int> data;
    if (N > 0)
        data.reserve(N); // one allocation instead of repeated regrowth

    for (int i = 0; i < N; ++i)
        data.push_back(rand());

    return data;
}
 
// Testing our sorts by checking that a vector is in non-decreasing order.
// True if sorted; empty and single-element vectors count as sorted.
// Replaces the hand-rolled scan (which mixed signed int with the unsigned
// vec.size()) with the standard algorithm.
bool TestSort(const std::vector<int>& vec)
{
    return std::is_sorted(vec.begin(), vec.end());
}
 
// Runs one sorting function over a freshly generated random vector of the
// given length and reports whether its output came back sorted.
bool TestSortGeneral( std::vector<int> (*const & f)(std::vector<int>), int length)
{
    const std::vector<int> candidate = f(CreateRandomArray(length));
    return TestSort(candidate);
}
 
// Runs every sort in sortTypes against a random vector of the given length
// and prints one verdict line per sort.
// Takes the collection by const reference; the original copied the whole
// vector of function pointers by value for no benefit.
void TestSortCollection(const std::vector<std::vector<int>(*) (std::vector<int>) >& sortTypes, int length)
{
    for (auto alg : sortTypes) // function pointers are cheap to copy
        printf(TestSortGeneral(alg, length) ? "Results as expected\n" : "Problem with sort\n");
}
 
// Sorts a copy of the input with insertion sort and returns it.
// Stable; O(n^2) worst case, O(n) on already-sorted input.
// Indices use std::size_t throughout; the original compared a signed int
// against the unsigned vec.size().
std::vector<int> InsertionSort(std::vector<int> vec)
{
    for (std::size_t i = 1; i < vec.size(); ++i)
    {
        // Invariant: vec[0 .. i-1] is sorted. Slide vec[i] left into place.
        const int key = vec[i];
        std::size_t j = i;

        while (j > 0 && key < vec[j - 1])
        {
            vec[j] = vec[j - 1];
            --j;
        }

        vec[j] = key;
    }

    return vec;
}
 
// Sorts a copy of the input with bubble sort and returns it.
// Improvement over the original: the scan bound shrinks to the position of
// the last swap, since everything beyond it is already in final position;
// the original rescanned the full range on every pass.
std::vector<int> BubbleSort(std::vector<int> vec)
{
    if (vec.empty())
        return vec;

    std::size_t unsortedEnd = vec.size() - 1;
    while (unsortedEnd > 0)
    {
        std::size_t lastSwap = 0; // stays 0 when a pass makes no swaps
        for (std::size_t i = 0; i < unsortedEnd; ++i)
            if (vec[i] > vec[i + 1])
            {
                std::swap(vec[i], vec[i + 1]);
                lastSwap = i;
            }
        unsortedEnd = lastSwap;
    }

    return vec;
}
 
// Sorts a copy of the input with selection sort and returns it.
// Fixes the signed/unsigned index mix and skips the pointless self-swap the
// original performed whenever vec[i] was already the minimum.
std::vector<int> SelectionSort(std::vector<int> vec)
{
    for (std::size_t i = 0; i < vec.size(); ++i)
    {
        // Find the smallest remaining element in vec[i .. end).
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < vec.size(); ++j)
            if (vec[j] < vec[minIndex])
                minIndex = j;

        if (minIndex != i)
            std::swap(vec[i], vec[minIndex]);
    }

    return vec;
}
 
// Uniformly selects a random integer in [0, n).
// Uses rejection sampling to avoid the modulo bias of the naive rand() % n.
// NOTE(review): assumes n > 0 -- n == 0 would divide by zero in bucket_size
// below; every caller in this file passes n >= 1. Also assumes n is far from
// INT_MAX, or bucketIndex * RAND_MAX at the end could overflow -- confirm
// before using with very large n.
int nrand(int n)
{
if(n <= RAND_MAX)
{
// Split rand()'s range [0, RAND_MAX] into n equal buckets and retry when
// the draw lands past the last full bucket, so each outcome is equally likely.
const int bucket_size = RAND_MAX / n;
int r;
do r = rand() / bucket_size;
while (r >= n);
return r;
}
 
// n exceeds rand()'s range: compose a larger uniform draw recursively.
const int buckets = n / RAND_MAX;
const int rem = n % RAND_MAX;
// n has been divided into several buckets of RAND_MAX and also a smaller bucket.
// We simulate the condition that the random trial landed in the smaller bucket.
// By recursion we can assume nrand is defined for all smaller n.
// nrand(buckets + 1) == buckets indicates falling off the end.
// Once we've fallen off the end, we hit the small bucket if nrand is small enough.
 
const int positionWithinBucket = nrand(RAND_MAX); // Bucket can be either normal bucket or smaller bucket.
const int finalBucket = nrand(buckets + 1);
const bool smallerBucket = finalBucket == buckets && positionWithinBucket < rem;
 
// If not a small bucket, the process is straightforward. Randomly select the large bucket and use the position within the bucket.
const int bucketIndex = smallerBucket ? buckets : nrand(buckets);
return bucketIndex * RAND_MAX + positionWithinBucket;
 
}
 
// Index for such that left-of-index members are smaller than right-of-index members
// Call by pointer to preserve sort.
// bool random indicates whether partitioning is random
int QuickSortPartition(std::vector<int>* vec, bool random = false)
{
if(vec->empty())
throw std::runtime_error("Should not be called on an empty vector");
 
const int j = vec->size() - 1;
 
if(random)
Swap((*vec)[j], (*vec)[nrand(j)]);
 
int& key = (*vec)[j];
int lowIndex = 0;
 
for(int i = 0; i < j; ++i)
if((*vec)[i] < key)
Swap((*vec)[lowIndex++], (*vec)[i]);
 
Swap(key, (*vec)[lowIndex]);
return lowIndex;
 
}
 
std::vector<int> QuickSort(std::vector<int> vec, bool random = false)
{
if(vec.size() <= 1)
return vec;
 
int index = QuickSortPartition(&vec, random);
std::vector<int> left;
std::vector<int> right;
std::vector<int> results;
 
for(int i = 0; i < index; ++i)
left.push_back(vec[i]);
 
for(int i = index + 1; i < vec.size(); ++i)
right.push_back(vec[i]);
 
left = QuickSort(left, random);
left.push_back(vec[index]);
 
right = QuickSort(right, random);
results = left;
for(auto i : right)
results.push_back(i);
 
return results;
}
 
// The plain (non-randomised) Quicksort, wrapped as a real function so it has
// exactly the type std::vector<int>(*)(std::vector<int>) that
// TestSortCollection's vector of function pointers requires.
// The original line
//   std::vector<int>(*QuickSortBasic)(std::vector<int>) =
//       std::bind(QuickSort, std::placeholders::_2, false);
// cannot compile: std::bind returns an object of unspecified functor type,
// which does not convert to a function pointer, and the placeholder for a
// unary call should have been _1 in any case.
std::vector<int> QuickSortBasic(std::vector<int> vec)
{
    return QuickSort(std::move(vec), false);
}
 
// N is the length of each vector
void CombinedTest(int N)
{
std::vector<std::vector<int>(*) (std::vector<int>) > sortTypes;
sortTypes.push_back(InsertionSort);
sortTypes.push_back(BubbleSort);
sortTypes.push_back(SelectionSort);
sortTypes.push_back(QuickSortBasic);
TestSortCollection(sortTypes, N);
}
 
// Entry point: exercise every registered sort on vectors of 50,000 elements.
int main()
{
    const int vectorLength = 50000;
    CombinedTest(vectorLength);
    return 0;
}
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 30 08:56PM +0200

On 30-Jun-15 7:18 PM, Paul wrote:
> I am trying to write two versions of Quicksort, one that uses randomised partition, and one that doesn't randomise the partition. However, I want them to have signatures std::vector<int> f(std::vector<int> vec) and so I want to bind the quicksorts to Boolean parameters.
 
> I can't find the syntax to do this though. Just so that I haven't left out any context, I will copy-paste my entire code. However, I will mark out the offending line with asterisks.
> Many thanks for your help.
 
Instead of std::bind I'd just use lambdas. They're good at binding and
work as expected. In contrast, std::bind only works as expected by chance.
 
Cheers & hth.,
 
- Alf
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
JiiPee <no@notvalid.com>: Jun 30 09:01PM +0100

On 30/06/2015 19:56, Alf P. Steinbach wrote:
> chance.
 
> Cheers & hth.,
 
> - Alf
 
niin se Scottkin sanoi... etta "hyva jos et ole oppinut bindia" :)...
minahan en ole sita opetetllut, eli kaytan suoraan lambdoja
guinness.tony@gmail.com: Jun 30 02:05PM -0700

On Tuesday, 30 June 2015 18:19:11 UTC+1, Paul wrote:
> I am trying to write two versions of Quicksort, one that uses randomised partition, and one that doesn't randomise the partition. However, I want them to have signatures std::vector<int> f(std::vector<int> vec) and so I want to bind the quicksorts to Boolean parameters.
 
> I can't find the syntax to do this though. Just so that I haven't left out any context, I will copy-paste my entire code. However, I will mark out the offending line with asterisks.
> Many thanks for your help.
 
<snip>
 
std::vector<int> QuickSort(std::vector<int> vec, bool random = false)
> {
 
<snip>
 
> *************************************************************************************************************
> Below line gives compile error! */
> std::vector<int>(*QuickSortBasic)(std::vector<int>) = std::bind(QuickSort, std::placeholders::_2, false);
 
<snip>
 
std::bind() returns an object encapsulating enough information
to call your QuickSort function with the correct set of parameters
when it is "called" (i.e. it passes the parameter passed in the
call and the parameter given to std::bind()).
 
As such, it is a functional object (a "functor") and definitely
not a function pointer. Furthermore, the Standard tells us that
the type of that object is unspecified, so you cannot write its
type out in your declaration of QuickSortBasic. So just use
 
auto QuickSortBasic = std::bind(QuickSort, std::placeholders::_2, false);
 
instead, which is so much neater anyway.
 
If you *really* want to avoid using 'auto' and spell out the type
of the binder object, you'll need to resort to using the deprecated
bind2nd() binder, thus:
 
std::binder2nd<
std::pointer_to_binary_function<
std::vector<int>, bool, std::vector<int> > >
QuickSortBasic = std::bind2nd(std::ptr_fun(QuickSort), false);
 
However, apart from the issue of these binders being deprecated,
I'm sure you'll agree that this is a particularly ugly solution.
 
Like the rest of us, be thankful for 'auto' and std::bind().
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Jun 30 10:27PM +0100


> If you *really* want to avoid using 'auto' and spell out the type
> of the binder object, you'll need to resort to using the deprecated
> bind2nd() binder, thus:
 
Nonsense, just store the result of std::bind in a std::function object.
 
[snip]
 
/Flibble
guinness.tony@gmail.com: Jun 30 03:26PM -0700

On Tuesday, 30 June 2015 22:27:53 UTC+1, Mr Flibble wrote:
 
> Nonsense, just store the result of std::bind in a std::function object.
 
> [snip]
 
> /Flibble
 
Well, yes - there are conversions from the unspecified return type
from std::bind to an appropriate specialisation of std::function.
 
Whilst trying to get that specialisation right, I noticed that the
OP also passes the wrong placeholder into the std::bind() argument
list.
 
So, for the OP, you /could/ spell out a type to hold *a conversion
from* the result of std::bind():
 
std::function<std::vector<int>(std::vector<int>)> BasicQuickSort =
std::bind(QuickSort, std::placeholders::_1, false);
 
But I would still use 'auto'.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 08:16PM

There is a specification for C++ that contains this noun phrase:
 
braced-init-list form of a condition
 
in 8.5p16. What does this refer to?
 
Could it refer to
 
if( bool b{ true }) ...
 
?
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 08:31PM

5.5p16 says:
 
The initialization that occurs in the forms
T x(a);
T x{a};
(...) is called direct-initialization.
 
(End of quotation).
 
Is »a« a single expression above?
 
So,
 
::std::string s( 3, 'c' );
 
is not a direct-initialization because there
is not a single expression in the parentheses?
 
If this is true, what kind of initialization is it?
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 08:45PM

>I think the 'condition' requires an equal sign if it's a declaration of
>a variable.
 
The equals sign is not required. I quote 6.4p1:
 
selection-statement:
if ( condition ) statement
(...)
 
condition:
attribute-specifier-seqopt decl-specifier-seq declarator = initializer-clause
attribute-specifier-seqopt decl-specifier-seq declarator braced-init-list
 
.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 08:52PM

>>a variable.
>The equals sign is not required. I quote 6.4p1:
>condition:
 
Of course, the English meaning of »condition« is something else.
My course participants heavily protested when I
 
#include <cstdio>
 
int main() { if( ::std::puts( "hello, world" )){} }
 
, and I just wanted to show that you can program without the
semicolon »;«!
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 09:07PM

>>5.5p16 says:
>Really? The paragraph 5.5/16 doesn't exist in the document I use. Did
>you mean 8.5/16?
 
Yes. Sorry, that was a typo!
 
>It is direct. See just lower, "If the destination type is a .. class type:"
 
The paragraph 17 seems just intended to give the semantics
once the kind of initialization syntax is already determined.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 09:41PM

>>T x(a);
>>Is »a« a single expression above?
>Most likely.
 
I hope not. And I found some evidence in an old
standard for C++ from the year 1998.
 
There in it is written:
 
»is called direct-initialization and is equivalent
to the form T x(a);«, 1998:8.5p12.
 
»When objects of class type are direct-initialized ...
The argument list is the expression-list within the
parentheses of the initializer.«, 1998:13.3.1.3p1.
 
So 1998:13.3.1.3 says »expression-list«!
 
The recent 13.3.1.3 also uses »expression-list« but it is
less clear since it also refers to other kinds of
initializations, while 1998:13.3.1.3 only referred to
direct-initialization.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 30 10:16PM

>or
> T x{a};
>in which 'a' does *not* have to be a single expression (see p13).
 
That quotation from above did not explictly refer to
direct-initializations only AFAIK.
 
It is possible that the form
 
T x( a, b )
 
is also an initialization, but not a direct-initialization
(it is possible that it is neither a copy-initialization nor
a direct-initialization, but just »an initialization«). The
quotation then could refer to this.
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 30 04:52PM -0400

On 6/30/2015 4:31 PM, Stefan Ram wrote:
> 5.5p16 says:
 
Really? The paragraph 5.5/16 doesn't exist in the document I use. Did
you mean 8.5/16?
 
> (...) is called direct-initialization.
 
> (End of quotation).
 
> Is »a« a single expression above?
 
Most likely. It's written as an opposite form of the T x = a; described
in the preceding paragraph, and relates to the paragraph that starts
with "The form of initialization". They are talking of a single
expression there.
 
 
> is not a direct-initialization because there
> is not a single expression in the parentheses?
 
> If this is true, what kind of initialization is it?
 
It is direct. See just lower, "If the destination type is a .. class type:"
 
V
--
I do not respond to top-posted replies, please don't ask
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 30 05:57PM -0400

On 6/30/2015 5:07 PM, Stefan Ram wrote:
 
>> It is direct. See just lower, "If the destination type is a .. class type:"
 
> The parapraph 17 seems just intended to give the semantics
> once the kind of initialization syntax is already determined.
 
"If the entity being initialized does not have class type, the
expression-list in a parenthesized initializer shall
be a single expression."
 
From p16 I conclude that in a class type the direct initialization can
take the form
 
T x(a);
or
T x{a};
 
in which 'a' does *not* have to be a single expression (see p13).
 
V
--
I do not respond to top-posted replies, please don't ask
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 30 04:38PM -0400

On 6/30/2015 4:16 PM, Stefan Ram wrote:
 
> Could it refer to
 
> if( bool b{ true }) ...
 
> ?
 
I think the 'condition' requires an equal sign if it's a declaration of
a variable. IOW, it could be
 
if (bool b = {true})
 
Of course it makes more sense with a class that has a conversion to bool
defined for it and has a c-tor to initialize it from a list, so you can
write
 
class MySpecialClass
{
public:
MySpecialClass(std::initializer_list<int> list);
operator bool() const;
};
 
int main()
{
if (MySpecialClass obj = { 42, 666 })
return 0;
}
 
 
V
--
I do not respond to top-posted replies, please don't ask
Paul <pepstein5@gmail.com>: Jun 30 10:20AM -0700

On Sunday, June 28, 2015 at 4:40:19 PM UTC+1, Öö Tiib wrote:
> whole list. You should perhaps try and compare the two with list of
> million of entries.
 
> Your 'smallTests' is still broken in sense that it leaks memory.
 
Yes, I did understand your advice but some companies seem a bit dogmatic about always using smart pointers and I'm trying to learn to please them.
 
Thanks to all who have taken the time to help me.
 
Paul
Christopher Pisz <nospam@notanaddress.com>: Jun 30 01:40PM -0500

On 6/30/2015 12:20 PM, Paul wrote:
 
> Yes, I did understand your advice but some companies seem a bit dogmatic about always using smart pointers and I'm trying to learn to please them.
 
> Thanks to all who have taken the time to help me.
 
> Paul
 
"Always use smart pointers" is an ignorant rule and their bug trackers
will show it.
 
Use raw pointers when it makes sense to use raw pointers.
Use shared_ptr when it makes sense to use shared_ptr.
Use weak_ptr when it makes sense to use weak_ptr.
Use unique_ptr when it makes sense to use unique_ptr.
 
Those companies are going to spend just as much, if not more, man hours
debugging "why didn't X get released" and circular references, as they
will from people using raw pointers poorly.
 
It's always some guy that thinks, it is a good idea to use shared_ptrs
like a C# garbage collector.
 
 
--
I have chosen to troll filter/ignore all subthreads containing the
words: "Rick C. Hodgins", "Flibble", and "Islam"
So, I won't be able to see or respond to any such messages
---
Richard Damon <Richard@Damon-Family.org>: Jun 29 09:56PM -0400

On 6/29/15 12:39 PM, Öö Tiib wrote:
 
> Uhh? What these Base, 'Der1', 'Der2' and 'Join' are? I don't have those
> anywhere. However ... I have just learned it for case that maybe
> someday I meet a case where something somewhere needs that.
 
Here is an example. Assume we are designing a fantasy game, and one of
the objects we want to be able to represent are creatures, which hold a
number of attributes common to all creatures (like life level). There
also exist some special types of creatures that need extra attributes
and are represented as derived classes (For example, Dragons with flying
and breath weapons, and Warriors with weapon skills). We may then want a
creature that belongs to multiple of these special classes, so we want
to multiply inherit from each of the specialty classes so it collects
all of the special properties, in this case a Dragon Warrior (Yikes).
 
Without using virtual bases, the Dragon warrior is two creatures, each
with their own sets of properties. Thus if I damage it as a Dragon, that
damage doesn't affect the Warrior. This isn't right! We want all
creatures to only have creature class within them, so we make it a
virtual base.
 
The key is looking to see if you are going to get to the multiple
inheritance case with (a) common base(s). And if you do, should there be
just a single copy of that base shared by the whole object, or does each
path to it need their own distinct copy.
"Öö Tiib" <ootiib@hot.ee>: Jun 30 06:18AM -0700

On Tuesday, 30 June 2015 04:56:37 UTC+3, Richard Damon wrote:
> creature that belongs to multiple of these special classes, so we want
> to multiply inherit from each of the specialty classes so it collects
> all of the special properties, in this case a Dragon Warrior (Yikes).
 
Yikes? :) Feels that you think yourself that something is wrong with it?
Isn't it that "dragon" is "race" or "kind" and "warrior" is a "profession" or
"occupation" of "creature"? IOW these feel like properties or components
of every being.
 
The difference is that a base class is too rigid and too closely coupled
to serve as property or component. We can have type of component
dynamically changing but we can't have dynamically changing base
subobjects. For example it is imaginable how that "dragon warrior"
may want one day to change to something like "dragon gladiator" or
"dragon mercenary" during life-time of it. What to do then?
 
> inheritance case with (a) common base(s). And if you do, should there be
> just a single copy of that base shared by the whole object, or does each
> path to it need their own distinct copy.
 
I understand how the diamond works and what it does. I still maintain my
impression that concrete "diamond" is where relations between classes are
somehow made incorrectly.
 
It can perhaps be that it has happened in real application already made and
we have just to maintain it without big corrections in the data architecture.
So there the virtual inheritance is sort of fix. I would perhaps just push
through the correction because it is hard for me to imagine the benefits of
it.
Juha Nieminen <nospam@thanks.invalid>: Jun 30 01:24PM

>> diamond inheritance. C++ also offers another solution (which is not
>> only a bit more efficient, but also feasible in some situations).
 
> What is the solution?
 
If you don't use virtual inheritance, then the common base class will
be duplicated for each of the derived classes. In other words, the
most derived class will have inside it everything from Der1 and
everything from Der2 independently (which means that the things from
Base will be duplicated: The Der1 part will have its own version of it
and the Der2 part will have its own version.)
 
This is not always desired. You may want the members of Base to
appear in Join only once, and for the Der1 and Der2 code to refer
to that one single Base data. (This requires the compiler to generate
special code in the member functions of Der1 and Der2 to access the
base class in a special way. This is what you are telling when you
are inheriting virtually.)
 
One example that comes to mind:
 
You have an intrusive smart pointer, and the classes that it can handle
are derived from a special class that contains a reference count. In
other words, you can have something like:
 
class MyClass1: public ReferenceCountable { ... };
 
class MyClass2: public ReferenceCountable { ... };
 
Now for one reason or another you need to multiple-inherit from those
two classes:
 
class Derived: public MyClass1, public MyClass2 { ... };
 
Now how would your smart pointer be able to handle instance of this
Derived class? It has 'ReferenceCountable' as its base class... but
twice. Which one should it use, and how? (Obviously it ought to use
only one of them, else it won't work properly.)
 
In this case you want 'ReferenceCountable' to appear in 'Derived'
only once, and thus your smart pointer will be able to unambiguously
use that data.
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
"Öö Tiib" <ootiib@hot.ee>: Jun 30 06:46AM -0700

On Tuesday, 30 June 2015 16:24:48 UTC+3, Juha Nieminen wrote:
 
> In this case you want 'ReferenceCountable' to appear in 'Derived'
> only once, and thus your smart pointer will be able to unambiguously
> use that data.
 
Ok, now that makes sense.
 
It somewhat feels also a great explanation why to use 'std::make_shared'
instead of intrusive refcounting. I always suggested it but did not have
clear examples why. Here 'std::make_shared' seems both simpler and
more efficient.
 
Thanks.
Martijn van Buul <pino@dohd.org>: Jun 30 03:26PM

* Öö Tiib:
> instead of intrusive refcounting. I always suggested it but did not have
> clear examples why. Here 'std::make_shared' seems both simpler and
> more efficient.
 
Except that it isn't the same, and sometimes the differences matter,
and without further knowledge of the context it's hard to say whether it
is "more efficient" or not. I once rewrote a framework that used shared_ptr
to intrusive_ptr, because it allowed me to avoid using new. The resulting
gain in speed (and reduced memory fragmentation) was significant.
 
Note that I'm not saying that intrusive_ptr is generally better than
shared_ptr. It's not, but sometimes intrusive_ptr works where shared_ptr
doesn't.
 
The same can be said for most of the *intrusive part of Boost.
--
Martijn van Buul - pino@dohd.org
JiiPee <no@notvalid.com>: Jun 30 06:29AM +0100

"- Alf
[Sorry, I inadvertently first hit "Reply" instead of "Follow up"] "
 
Good one/spot Alf... :)
Just watched a good Scott Meyers video about software design. And that
was just one of his point: Button names are not understandable. In
thunderbird newsgroup interface you have just that problem as Alf
writes: replying to a message there are Follow up and Reply button. Me
also (and I bet many many others) have pressed many times that Reply
button. I guess Scott would also complain about it :). I guess better
would be:
Reply = reply to newgroup
Reply Sender = replying to sender
or something else...
 
Also buttons could be far away from each others.
 
Just came to my mind as watched that video so recently :). But Scott
opened my eyes , so this is sure what he would also say here.
 
But otherwise Thunderbird is a good program.
JiiPee <no@notvalid.com>: Jun 30 06:34AM +0100

On 30/06/2015 06:29, JiiPee wrote:
> writes: replying to a message there are Follow up and Reply button.
 
> Me also (and I bet many many others) have pressed many times that
> Reply button.
 
So, if me also have done that (and its rare I press wrong buttons in
programs) - and others as well - that is a good indication that the
design makes it easy for people to do it the wrong way.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

Monday, June 29, 2015

Digest for comp.lang.c++@googlegroups.com - 13 updates in 5 topics

Juha Nieminen <nospam@thanks.invalid>: Jun 29 08:18AM

> guide A:
 
> Don't use macros! OK, (...) treat them as a last resort.
> (...) And #undef them after you've used them, if possible.
 
There are things that can only be done with preprocessor macros.
assert() is the ubiquitous example. (And obviously you shouldn't
be #undeffing assert().)
 
(The reason why assert() can only be done with a macro is that it
prints the file and line where the assertion failure happened,
which is impossible in C++ proper.)
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 29 11:22AM +0200

On 29-Jun-15 10:18 AM, Juha Nieminen wrote:
 
> There are things that can only be done with preprocessor macros.
> assert() is the ubiquitous example. (And obviously you shouldn't
> be #undeffing assert().)
 
Also it would be unwise to #undef "errno" after use...
 
(C++14 §19.4/1 "errno shall be defined as a macro")
 
 
* * *
 
Regarding the associated issue of use of #undef in general for standard
macros, i.e. whether that's acceptable, there's the NDEBUG macro.
 
C++14 §17.6.2.2/2 "the effect of including either <cassert> or
<assert.h> depends each time on the lexically current definition of NDEBUG."
 
The dependency is each time on the existence of NDEBUG, not on any
particular definition.
 
And that does not make sense if NDEBUG is never #undef-ed.
 
So this is one case where the standard assumes and depends on client
code use of #undef for a standard macro, and I mention it as an example
that that's not always an abomination. In another posting I mentioned
another concrete example, that of using #undef to avoid warnings for a
following #define of UNICODE in Windows desktop programming. However,
UNICODE is only a de facto standard, not part of the C++ standard.
 
 
Cheers,
 
- Alf
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
Rosario19 <Ros@invalid.invalid>: Jun 29 01:48PM +0200

On 28 Jun 2015 19:19:46 GMT, (Stefan Ram) wrote:
 
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
local macro
 
int f(void)
{
#define a b
#define c d
#define g h
 
....
 
#undef g
#undef c
#undef a
 
return 0;
 
}
David Harmon <source@netcom.com>: Jun 29 06:51AM -0700

On Mon, 29 Jun 2015 08:18:03 +0000 (UTC) in comp.lang.c++, Juha Nieminen
<nospam@thanks.invalid> wrote,
>There are things that can only be done with preprocessor macros.
>assert() is the ubiquitous example. (And obviously you shouldn't
>be #undeffing assert().)
 
Of course not. You #undef NDEBUG and #include <assert.h> again.
legalize+jeeves@mail.xmission.com (Richard): Jun 29 08:18PM

[Please do not mail me a copy of your followup]
 
David Brown <david.brown@hesbynett.no> spake the secret code
>/really/ special code, as it leads to confusion - macros should normally
>have exactly the same definition at all times in the program, or at
>least within the file.
 
I have used macros to eliminate some boilerplate when constructing a
table of values. (This was pre-std::initializer_list and {}
construction, so if I were do that code again today I might not need a
macro at all.) This is the only time I've used #undef -- once that
table is defined, then I *don't* want anyone to use the macro anymore
and therefore I #undef it. Since the table is defining data, this was
in the context of a source file, not a header file. The macro had an
intention revealing name, so it is unlikely that anyone would be using
a macro of the same name, but different meaning, elsewhere in the code.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
"Öö Tiib" <ootiib@hot.ee>: Jun 28 04:50PM -0700

On Sunday, 28 June 2015 20:22:01 UTC+3, BGB wrote:
 
> namely, if you know a value needs to be a particular way, then chances
> are the person is willing to pay whatever CPU cycles are needed to make
> it that way.
 
After we take care of size and we take care of endianness there is
still alignment to care about. ;)
Note that alignment requirements may come as surprise to many.
Sometimes even on platforms with relaxed ones. For example one will
get crash on Win32 because of alignment of XMVECTOR was not 16.
 
In real application the code that encodes or decodes information for
some binary is usually therefore separate from rest of the code to
abstract such difficulties away.
 
> picking appropriate type-sizes and setting values for things like
> whether or not the target supports misaligned access, ...
 
> I guess it could be nicer if more of this were standardized.
 
Yes, preprocessor metaprogramming is not much better than template
metaprogramming if to read it. So these things are better to keep
away from code, in a separate library.
 
> be safe" would be a waste.
 
> it would be fairly easy to compress them further, but this would require
> decoding them before they could be used, which was undesirable in this case.
 
Interesting. Doesn't TLV mean tag or type, length, value? Most of those
formats encode some sort of tag/type/length in few first bits/bytes and
rest are data bytes, potentially also TLV. So why you used offsets?
Weren't the "sub-lumps" TLV themselves?
 
> less memory waste than AVL or BST variants). likewise, because of their
> structure, it is possible to predict in advance (based on the size of
> the tree) approximately how long it will take to perform the operation.
 
One interesting tree is splay that often works best in real application.
 
> > are no need for archaic tricks like Windows-1252 or Code Page 437.
 
> granted, but in this case, it is mostly for string literals, rather than
> bulk text storage.
 
The strings of application do not typically form some sort of bulk but a
sort of dictionary of short texts.
 
 
> basically, options like Deflate or LZMA are largely ineffective for
> payloads much under 200-500 bytes or so, but are much more effective as
> payloads get bigger.
 
Those packing algorithms are all for larger texts. With relatively short
one-liners one can perhaps make some special sub-string-packing but it
will be computationally expensive to pack.
BGB <cr88192@hotmail.com>: Jun 29 02:21AM -0500

On 6/28/2015 6:50 PM, Öö Tiib wrote:
 
> In real application the code that encodes or decodes information for
> some binary is usually therefore separate from rest of the code to
> abstract such difficulties away.
 
yeah.
 
this is why the explicit endian variables could also be implicitly
misaligned. it could also be possible to specify explicit alignment via
an attribute, but this isn't currently supported.
 
the ways it is implemented generally gloss over whatever the hardware
does (so, it can still do unaligned operations even on targets which
only support aligned loads/stores, by internally using byte operations
and shifts).
 
currently, my C compiler/VM doesn't do SIMD, but this might get
implemented eventually.
 
keywords were '__bigendian' and '__ltlendian'.
 
ex:
__bigendian int x;
or:
i=*(__bigendian int *)ptr;
 
possible explicit alignment value:
__declspec(align(2)) __bigendian int x;
or (if another keyword were added):
__align(2) __bigendian int x;
 
 
 
> Yes, preprocessor metaprogramming is not much better than template
> metaprogramming if to read it. So these things are better to keep
> away from code, in a separate library.
 
usually, this is a 'whatever_conf.h' file which is copied/pasted around,
sometimes with tweaks.
 
it also does explicit-size types, partly because I often use a compiler
which until very recently did not support 'stdint' stuff...
 
 
> formats encode some sort of tag/type/length in few first bits/bytes and
> rest are data bytes, potentially also TLV. So why you used offsets?
> Weren't the "sub-lumps" TLV themselves?
 
offsets are used because this saves needing to scan over the payload
lumps and rebuild the table. granted, the cost of this probably wouldn't
be huge, but the table will be needed either way, and if it is already
present in the data, it doesn't need to be built.
 
note that the file is not read-in/loaded sequentially, but is basically
loaded into the address space and used in-place.
 
 
it is Tag/Length/Value, yes.
 
the format is vaguely akin to if IFF and ASN.1 BER were fused together.
 
ASN.1 style tags are used mostly for compact structures (with typically
about 2 bytes of overhead), with TWOCC for intermediate structures (4 or
6 bytes overhead), and FOURCC for top-level structures (8 or 12 bytes of
overhead).
 
 
* 0x00-0x1F: Public Primitive (Class=0)
* 0x20-0x3F: Public Composite (Class=1)
* 0x40-0x5F: Private Primitive (Class=2)
* 0x60-0x7F: Private Composite (Class=3)
* 0x80-0x9F: Context Primitive (Class=4)
* 0xA0-0xBF: Context Composite (Class=5)
** ccct-tttt:
*** ccc=class, ttttt=tag
*** tag=0..30, tag encoded directly
*** tag=31, tag is escape coded.
* 0xC0-0xDF: Reserved
* 0xE0-0xFF: Special Markers
** 0xE0, End Of Data
** 0xE1, len:WORD24
** 0xE2, len:BYTE
*** Context Dependent Untagged Data
** 0xE3, len:WORD24, tag:TWOCC
** 0xE4, len:WORD24, tag:FOURCC
** 0xE5, len:BYTE, tag:TWOCC
** 0xE6, len:Word56, tag:FOURCC
** 0xE7, len:WORD24, tag:EIGHTCC
** 0xE8, len:WORD24, tag:SIXTEENCC
** 0xE9, len:Word56, tag:EIGHTCC
** 0xEA, len:WORD56, tag:SIXTEENCC
*** Tagged Markers
 
I have used variations on this design for a number of formats (it
actually started out mostly in some of my video codecs).
 
 
>> structure, it is possible to predict in advance (based on the size of
>> the tree) approximately how long it will take to perform the operation.
 
> One interesting tree is splay that often works best in real application.
 
yeah.
 
splay is fast, but not necessarily all that predictable nor memory
efficient (it is more like AVL in the memory-efficiency sense, and
leaves open the possibility of an O(n) worst case).
 
 
B-Tree is not necessarily the fastest option, but should be fairly
predictable in this case.
 
like, you really don't want a random edge-case throwing a wrench in the
timing, and fouling up external electronics by not updating the IO pins
at the correct times or something.
 
nevermind if it is questionable to do high-level logic and also deal
with hardware-level IO on the same processor core, but alas... (why have
a separate ARM chip and an MCU, when you can save some money by just
doing everything on the main ARM chip?...).
 
 
>> bulk text storage.
 
> The strings of application do not typically form some sort of bulk but a
> sort of dictionary of short texts.
 
not sure I follow.
 
in the case of the VM, they are basically a pair of string tables, one
for ASCII and UTF-8, and the other for UTF-16. each string is terminated
with a NUL character.
 
for the ASCII table, string references are in terms of byte offsets,
with it depending on context how the string is encoded.
 
if C is compiled to the VM, it really doesn't care, since it has good
old 'char *', so will use pointers into the string table.
 
the script-language does care, but will implicitly declare the type as
part of the process of loading the string into a register (and, in this
case, the VM will remember the type of string).
 
 
though, granted, in this VM, the string type will be handled by
implicitly using tag bits in the reference. this allows using a
reference directly to the string-table memory, without needing an
intermediate structure (so, it is sort of like a normal 'char *'
pointer, just with a hidden type-tag in the reference).
 
 
 
> Those packing algorithms are all for larger texts. With relatively short
> one-liners one can perhaps make some special sub-string-packing but it
> will be computationally expensive to pack.
 
well, as noted, I had some limited success with MTF+Rice.
though, there isn't much that can be done with short character strings.
"Öö Tiib" <ootiib@hot.ee>: Jun 29 07:20AM -0700

On Monday, 29 June 2015 10:26:01 UTC+3, BGB wrote:
> >>>>>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble wrote:
> >>>>>>>> On 26/06/2015 21:31, JiiPee wrote:
> >>>>>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
 
... Snip for focus
 
> like, you really don't want a random edge-case throwing a wrench in the
> timing, and fouling up external electronics by not updating the IO pins
> at the correct times or something.
 
On that case RB tree is maybe most stable ... on any case it seems more
stable and generally quicker than AVL.
 
> with hardware-level IO on the same processor core, but alas... (why have
> a separate ARM chip and an MCU, when you can save some money by just
> doing everything on the main ARM chip?...).
 
You certainly may need separate processor for more demanding I/O (WiFi
network for example) otherwise yes.
 
> reference directly to the string-table memory, without needing an
> intermediate structure (so, it is sort of like a normal 'char *'
> pointer, just with a hidden type-tag in the reference).
 
I meant that if we really want to have byte pointers into our string
literals then there is indeed no way to keep things compressed
behind the scenes. However, if we want to have a "text" available in
limited occasions, for example like the GNU gettext library provides,
then we can have the texts compressed most of the time.
 
> > will be computationally expensive to pack.
 
> well, as noted, I had some limited success with MTF+Rice.
> though, there isn't much that can be done with short character strings.
 
If all texts in application are UTF-16 then Huffman coding compresses about
2.6 times (minus the decoder and its data of course). So the only question is
if there is big enough amount of texts to be handled by application to bother
with it.
BGB <cr88192@hotmail.com>: Jun 29 12:18PM -0500

On 6/29/2015 9:20 AM, Öö Tiib wrote:
>> at the correct times or something.
 
> On that case RB tree is maybe most stable ... on any case it seems more
> stable and generally quicker than AVL.
 
could be.
 
more analysis and testing may be needed.
 
 
checking for memory density.
( note: Implicitly assuming a slab allocator or similar, so negligible
overhead apart from alignment ).
 
say, tree node for a binary tree (BST1):
struct BiNode1 {
BiNode1 *left, *right;
SlotInfo *key;
byte depth;
//pad:3 bytes
Value val;
};
memory cost: ~24 bytes, about 33% payload.
assume: node is either a node or a leaf.
for a node, both left and right are filled and key is a pivot, nodes do
not hold a value.
for a leaf, both left and right are NULL.
 
or (BST2):
struct BiNode2 {
SlotInfo *key;
byte depth;
//pad: 7 bytes
union {
struct {
BiNode2 *left, *right;
}b;
Value val;
}u;
};
memory cost: ~16 bytes, about 50% payload.
same behavior as BST1, but reducing memory by eliminating
mutually-exclusive data.
 
BST3:
case of BST1, where each node always contains a value, reducing node counts.
each node has a value, and 1 or 2 child nodes (whereas a leaf lacks
child nodes).
 
for an order-12 B-Tree (BT1):
#define MAX_BT_KEY 12
struct BtNode {
BtNode *next;
SlotInfo *key[MAX_BT_KEY];
byte nkey;
byte depth;
//pad: 2 bytes
union {
BtNode *child[MAX_BT_KEY];
Value val[MAX_BT_KEY];
}u;
};
 
Cost: 152 bytes, 63% payload.
 
for an order-6 B-Tree (BT2):
Cost: 96 bytes, 50% payload.
 
memory estimates (count, type, total bytes, memory efficiency):
1 item(1L): BST1: 24 bytes (33.3%)
1 item(1L): BST2: 16 bytes (50.0%)
1 item(1L): BST3: 24 bytes (33.3%)
1 item(1L): BT1: 152 bytes ( 5.3%)
1 item(1N): BT2: 96 bytes ( 8.3%)
 
2 items(1N,2L): BST1: 72 bytes (22.2%)
2 items(1N,2L): BST2: 48 bytes (33.3%)
2 items(1N,1L): BST3: 48 bytes (33.3%)
2 items( 1L): BT1: 152 bytes (10.1%)
2 items( 1L): BT2: 96 bytes (16.6%)
 
4 items(3N,4L): BST1: 168 bytes (19.0%)
4 items(3N,4L): BST2: 112 bytes (28.6%)
4 items(2N,2L): BST3: 96 bytes (33.3%)
4 items( 1L): BT1: 152 bytes (21.1%)
4 items( 1L): BT2: 96 bytes (33.3%)
 
8 items(7N,8L): BST1: 360 bytes (17.8%)
8 items(7N,8L): BST2: 240 bytes (26.6%)
8 items(4N,4L): BST3: 192 bytes (33.3%)
8 items( 1L): BT1: 152 bytes (42.1%)
8 items(1N,2L): BT2: 288 bytes (22.2%)
 
16 items(15N,16L): BST1: 744 bytes (17.2%)
16 items(15N,16L): BST2: 496 bytes (25.8%)
16 items( 8N, 8L): BST3: 384 bytes (33.3%)
16 items( 1N, 2L): BT1: 456 bytes (28.1%)
16 items( 1N, 3L): BT2: 384 bytes (33.3%)
 
32 items(31N,32L): BST1: 1512 bytes (16.9%)
32 items(31N,32L): BST2: 1008 bytes (25.4%)
32 items(16N,16L): BST3: 768 bytes (33.3%)
32 items( 1N, 3L): BT1: 608 bytes (42.1%)
32 items( 1N, 5L): BT2: 576 bytes (44.4%)
 
 
so, it is a lot closer than I was thinking...
 
BST has an advantage on the memory front for small item counts, but
loses them as item counts increases.
 
though, it is possible that a B-Tree could suffer from an unfavorable
pattern of inserts which could reduce its memory efficiency (causing it
to under-perform vs a BST variant).
 
 
could rig-up some benchmarks and compare them...
 
 
>> doing everything on the main ARM chip?...).
 
> You certainly may need separate processor for more demanding I/O (WiFi
> network for example) otherwise yes.
 
a lot of things in my case are things like running motors and dealing
with external sensors and similar (some of the sensors generate PWM
signals).
 
a lot of this stuff runs in the kHz range (say, 25-50 kHz or so), so it
is necessary to be able to respond quickly enough to keep everything
running smoothly.
 
internally, there might also be fun like interpreting G-Code or similar
(telling the machine the desired positions to move the motors to, what
RPM the tool spindle should turn, ...).
 
 
the same basic code has also been used in a small wheeled robot, and
should be applicable to other types of robots (such as ones with
articulated limbs).
 
networking is used to some extent to control these, but the network
hardware seems able to handle a lot on its own (doesn't need anywhere
near the levels of micro-management as motors or sensors)
 
 
> behind scenes. However If we want to have a "text" available in
> limited occasions, for example like GNU gettext library provides
> then we can have the texts compressed most of the time.
 
ok.
 
 
> 2.6 times (minus the decoder and its data of course). So the only question is
> if there is big enough amount of texts to be handled by application to bother
> with it.
 
dunno, it depends on the application.
 
in the VM, a lot of the string data tends to be things like imported
function names, type-signatures, ... some amount of this also tends to
be strings for any printed messages.
 
bytecode is also a big chunk of memory.
 
generally the VM's bytecode images are around 1/2 to 1/3 the size of
native x86 versions of the program (and also smaller than ARM code and
Deflate-compressed source code).
 
though, the bytecode was designed some with code density in mind. it
doesn't so much pull this off with the smallest possible opcodes, but
rather by trying to allow things to be done with a reasonably small
number of operations.
 
nevermind if the current VM goes and craps all over this by translating
the bytecode into relatively bulky threaded-code, but alas...
Juha Nieminen <nospam@thanks.invalid>: Jun 29 08:14AM

> Everybody knows that with diamond inheritance we need virtual
> inheritance.
 
You don't *need* it. It's *one possible solution* to how to handle
diamond inheritance. C++ also offers another solution (which is not
only a bit more efficient, but also feasible in some situations).
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
"Öö Tiib" <ootiib@hot.ee>: Jun 29 09:39AM -0700

On Monday, 29 June 2015 11:14:50 UTC+3, Juha Nieminen wrote:
 
> You don't *need* it. It's *one possible solution* to how to handle
> diamond inheritance. C++ also offers another solution (which is not
> only a bit more efficient, but also feasible in some situations).
 
What is the solution? Actually I would more like to see a motivating
problem. I have not met a problem to solve with it. Typical examples
from books are also awfully abstract:
 
Base
/ \
/ \
/ \
Der1 Der2
\ /
\ /
\ /
Join
 
Uhh? What these Base, 'Der1', 'Der2' and 'Join' are? I don't have those
anywhere. However ... I have just learned it for case that maybe
someday I meet a case where something somewhere needs that.
Christopher Pisz <nospam@notanaddress.com>: Jun 29 11:14AM -0500

On 6/28/2015 9:53 AM, Paul wrote:
 
> }
 
> Paul
 
I didn't examine the code too closely, but I don't really see any reason
to use shared_ptr. Just use a raw pointer.
 
Consider these things :
 
The pointers should be private to the list class and its nodes, no user
should ever see them.
 
shared pointers are for when ownership of the object is going to be
"shared", which should not be the case for objects in the list
 
ownership of the stored elements should not be shared. The list should
own them and maintain them.
 
When access to an element from the outside is requested, a copy should
be made. Thus the need to a constructor and copy constructor on any type
being contained.
 
raw pointer is faster (and simpler) than shared pointer.
 
It's easy enough to identify the operations that require a new or delete
when the list completely maintains itself.
 
 
 
--
I have chosen to troll filter/ignore all subthreads containing the
words: "Rick C. Hodgins", "Flibble", and "Islam"
So, I won't be able to see or respond to any such messages
---
"Öö Tiib" <ootiib@hot.ee>: Jun 29 06:25AM -0700

On Friday, 26 June 2015 12:12:41 UTC+3, Daniel wrote:
> On Thursday, June 25, 2015 at 9:20:20 PM UTC-4, Öö Tiib wrote:
 
Sorry didn't somehow notice it.
 
> >Math functions we fortunately expect
> > to be non-members.
 
> I gather you don't care for extension methods :-)
 
Oh not so. I just don't like "methods" that actually are "pure functions" in
my mind. For example 'std::string::substr' is fully referentially transparent
function. Substring is not property, component, aggregate (or other
associate) or product of string so it feels like a function not method.

Otherwise there is both attractive clarity and convenience in method
syntax and while there are also some questions and a controversy still
to be solved the extension methods feel good idea. Consider the two
subroutine calling syntaxes:
 
1) first_parameter.subroutine( second_parameter )
2) subroutine( first_parameter, second_parameter )
 
Clarity is that IMHO 1) says that the 'first_parameter' is the main,
non-optional IN-OUT argument of subroutine. 2) does not tell anything
like that. There are no reason why the subroutine must be declared
within class definition to achieve that clarity.
 
Convenience is that during typing 1) smarter code editors can likely offer
faster and better auto-complete choices than 2).
 
Most difficult question for me is perhaps how the (already complex)
overloading and name-hiding rules will work when extension methods
will enter the playground.
 
Controversy is like usual that every feature is possible to abuse. Some
people *will* use it for the convenience above even where 'first_parameter'
is for example optional or some SINK-IN parameter (if it is allowed) and
so the outcome looks confusing. But that can be left for coding standards
to resolve like always.
 
So I am generally thinking positively about extension methods.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

Fwd: '' 寧做老妖精,不做老太婆 ''

活一天美一天,永不放棄。

霜催雪壓增顏色!

以此互勉,相期百歲,如何?

>
>>>> 不錯的文章,不知是否適用於老頭子?
>>>>> 寧做老怪物, 不做糟老頭,

Sunday, June 28, 2015

Digest for comp.lang.c++@googlegroups.com - 22 updates in 6 topics

"Öö Tiib" <ootiib@hot.ee>: Jun 28 07:10AM -0700

On Sunday, 28 June 2015 16:53:35 UTC+3, Rosario19 wrote:
> so wuold be ok for | too and not etc
> etc
 
> where is the problem with endianess of the number?
 
Rosario, we did talk about keeping data in some portable binary
format. We have portable binary formats for to achieve that one
computer saves it to disk or sends over internet and other computer
reads or receives it and both understand it in same way.
 
Different computers may keep the numbers in their own memory with
different endianness. So when computer reads or writes the bytes
of binary format then it must take care that those are ordered
correctly. That is what we call taking care about endianness in
portable format.
BGB <cr88192@hotmail.com>: Jun 28 12:17PM -0500

On 6/28/2015 8:43 AM, Öö Tiib wrote:
> particularly funny since neither C nor C++ contain standard way for
> detecting endianness compile-time. There are some libraries that use
> every known non-standard way for that and so produce minimal code.
 
yeah.
 
I prefer fixed endianess formats, personally.
 
granted, probably most everyone else does as well, as most formats are
this way. just a few formats exist where the endianess depends on
whichever computer saved the file, with magic numbers to detect when
swapping is needed. I find these annoying.
 
 
some of my tools (script language and C compiler, as an extension) have
the ability to specify the endianess for variables and pointer types
(so, you can be sure the value is stored as either big or little endian,
regardless of native endianess), and implicitly also makes it safe for
misaligned loads/stores.
 
namely, if you know a value needs to be a particular way, then chances
are the person is willing to pay whatever CPU cycles are needed to make
it that way.
 
 
typically, for compile-time stuff, there is a mess of #define's and
#ifdef's for figuring out the target architecture and other things,
picking appropriate type-sizes and setting values for things like
whether or not the target supports misaligned access, ...
 
I guess it could be nicer if more of this were standardized.
 
 
> then it feels reasonable at least to consider 4 bit wide entries. The
> processors crunch numbers at ungodly speeds but it is 4 times shorter
> table than one with 16 bit wide entries.
 
could be, but the table entries in this case were fairly unlikely to be
that much below 16 bits (so 8 bit or smaller would not have been useful).
 
these were basically offsets within a TLV lump. where you would have one
TLV lump which contains lots of payload data (as an array of
variable-sized sub-lumps packed end-to-end), and a table to say where
each sub-lump is within that bigger lump (to better allow random access).
 
in most of the cases, the lumps were between kB or maybe up to a few MB,
so 16 or 24 bits are the most likely cases, and always using 32-bits "to
be safe" would be a waste.
 
it would be fairly easy to compress them further, but this would require
decoding them before they could be used, which was undesirable in this case.
 
 
> case? OTOH storage for texts can be significant if there are lot of texts
> or lot of translations. Number of PC software let to download and install
> translations separately or optionally.
 
yeah, probably should have been clearer.
 
this was for string literals/values in a VM.
 
in the predecessor VM, M-UTF-8 had been used for all the string literals
(except the UTF-16 ones), which mostly worked (since direct
per-character access is fairly rare), but it meant doing something like
"str[idx]" would take 'O(n)' time (and looping over a string
per-character would be O(n^2)...).
 
in the use-case for the new VM, I wanted O(1) access here (mostly to
make things more predictable, *), but also didn't want the nearly pure
waste that is UTF-16 strings.
 
however, the language in question uses UTF-16 as its logical model (so,
from high-level code, it appears as if all strings are UTF-16). in the
language, strings are immutable, so there is no issue with the use of
ASCII or similar for the underlying storage.
 
in C, it isn't really an issue mostly as C makes no attempt to gloss
over in-memory storage, so you can just return the raw byte values or
similar.
 
 
*: the VM needs to be able to keep timing latencies bounded, which
basically weighs against doing anything in the VM where the time-cost
can't be easily predicted in advance. wherever possible, all operations
need to be kept O(1), with the operation either being able to complete
in the available time-step (generally 1us per "trace"), or the VM will
need to halt and defer execution until later (blocking is not allowed,
and any operations which may result in unexpected behaviors, such as
halting, throwing an exception, ... effectively need to terminate the
current trace, which makes them more expensive).
 
for some related reasons, the VM is also using B-Trees rather than hash
tables in a few places (more predictable, if slower, than hashes, but
less memory waste than AVL or BST variants). likewise, because of their
structure, it is possible to predict in advance (based on the size of
the tree) approximately how long it will take to perform the operation.
 
 
> keeping the text dictionaries Huffman encoded all time. If to keep texts
> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
> are no need for archaic tricks like Windows-1252 or Code Page 437.
 
granted, but in this case, it is mostly for string literals, rather than
bulk text storage.
 
Windows-1252 covers most general use-cases for text (and is fairly easy
to convert to/from UTF-16, as for most of the range the characters map 1:1).
CP-437 is good mostly for things like ASCII art and text-based UIs.
 
for literals, it will be the job of the compiler to sort out which
format to use.
 
 
bulk storage will tend to remain in compressed UTF-8.
though a more specialized format could be good.
 
I had good results before compressing short fragments (such as character
strings) with a combination of LZ77 and MTF+Rice Coding, which for small
pieces of data did significantly better than Deflate or LZMA. however,
the MTF makes it slower per-character than a Huffman-based option.
 
basically, options like Deflate or LZMA are largely ineffective for
payloads much under 200-500 bytes or so, but are much more effective as
payloads get bigger.
JiiPee <no@notvalid.com>: Jun 28 08:01PM +0100

On 26/06/2015 20:39, Mr Flibble wrote:
> ... #include <cstdint> instead!
 
> /Flibble
 
Just watching Scott Meyers videos. He seems to also always use :
int a =9;
 
not
fastint32 a = 9;
 
if int was wrong, surely they would not teach people using int, right? :)
JiiPee <no@notvalid.com>: Jun 28 08:15PM +0100

On 28/06/2015 20:01, JiiPee wrote:
 
> not
> fastint32 a = 9;
 
> if int was wrong, surely they would not teach people using int, right? :)
 
also note that Scott recommends to use
auto a = 9;
 
so using auto. so letting the computer decide the type !!!
What will you say about this??? :)
woodbrian77@gmail.com: Jun 28 12:56PM -0700

On Saturday, June 27, 2015 at 6:47:38 PM UTC-5, Öö Tiib wrote:
 
> All programs that use sounds or images technically use binary formats
> but those are abstracted far under some low level API from programmers.
> I did not mean that.
 
I didn't mean that either.
 
 
Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 10:13PM +0200

On 28-Jun-15 9:15 PM, JiiPee wrote:
> auto a = 9;
 
> so using auto. so letting the ccomputer to deside the type !!!
> What will you say about this??? :)
 
Using `auto` to declare a variable without an explicit type communicates
well to the compiler but not to a human reader. Also it's longer to
write and read than just `int`. And one can't generally adopt this as a
convention, e.g. it doesn't work for a variable without initializer, so
it's not forced by a convention.
 
Therefore I consider it an abuse of the language.
 
As to why, I guess that Scott has to use all kinds of fancy features
just to grab and keep interest from the audience. And maybe so that
novices can ask "what's the `auto`, huh?", so that he can explain it.
Explaining things and giving advice is after all how he makes a living.
 
 
Cheers & hth.,
 
- Alf
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
JiiPee <no@notvalid.com>: Jun 28 09:22PM +0100

On 28/06/2015 21:13, Alf P. Steinbach wrote:
>> What will you say about this??? :)
 
> Using `auto` to declare a variable without an explicit type
> communicates well to the compiler but not to a human reader.
 
In Visual Studio you can hover the mouse and see the real type quite
easily, also works on other compilers.
 
> Also it's longer to write and read than just `int`. And one can't
> generally adopt this as a convention, e.g. it doesn't work for a
> variable without initializer, so it's not forced by a convention.
 
think about a funktion returning the size of elements in and
container... you could wrongly put:
unsigned long getSize();
 
if it actually returns a 64 bit integer. auto would find the right type
straight away.
 
 
> Therefore I consider it an abuse of the language.
 
in many places it makes coding safer because the auto always finds the
correct type. You can get bugs by putting a wrong type... and people
have done that when reading forums.
 
> just to grab and keep interest from the audience. And maybe so that
> novices can ask "what's the `auto`, huh?", so that he can explain it.
> Explaining things and giving advice is after all how he makes a living.
 
with complex types it can increase safety, because auto always gets it
right. We might get the type wrong which might cause bugs.
 
But auto definitely is not always best for sure even if somebody likes it.
JiiPee <no@notvalid.com>: Jun 28 09:23PM +0100

On 28/06/2015 21:13, Alf P. Steinbach wrote:
 
> Using `auto` to declare a variable without an explicit type
> communicates well to the compiler but not to a human reader. Also it's
> longer to write and read than just `int`.
 
but if you take an everage value of all types the auto would win big
time. on average auto makes typenames much shorter if all types are
considered.
jt@toerring.de (Jens Thoms Toerring): Jun 28 10:33PM

> > communicates well to the compiler but not to a human reader.
 
> In Visual Studio you can hoover the mouse and see the real type quite
> easily, also works on other compilers.
 
Please keep in mind that VS is an IDE with an attached compiler
(beside a lot of other things). So this won't work "on other
compilers", since a compiler is a program to comile code and
not something you can "hover over" with the mouse. You may
be surprised, but not everyone is using an IDE (for various
reasons) - or even a graphical user interface - all of the
time (and thus a mouse or something similar)...
 
> unsigend long getSize();
 
> if it actually returs a 64 bit integer. auto would find the right type
> straight away.
 
If you define a variable and assign to it the return value
of a function then it's relatively clear what the type will
be - it can be easily found out by looking at the function
declaration. But something like
 
auto a = 0;
 
is a bit different: you have to very carefully look at that
'0' to figure out if this will end up being an 'int' or per-
haps something else? And it can be prone to getting the wrong
type by forgetting (or mis-typing) some character after the
'0' that makes the variable have a different type. There's
definitely a readability issue.
 
> in many places it makes coding safer because the auto always finds the
> correct type. You can get bugs by putting a wrong type... and people
> have done that when reading forums.
 
Yes, but cases like
 
int a = 0f;
 
are places where this isn't the case. 'auto' is very useful
in cases like
 
for ( auto it = xyz.begin(); it != xyz.end(); ++it )
 
instead of maybe
 
for ( std::pair< std::vector< std::pair< int, char const * >, double >, std::vector< std::string > >::iterator it = xyz.begin( ); it != xyz.end( ); ++it )
 
since you will be aware of the type of 'xyz', but
 
auto a = 0ull;
 
is different since it makes the type of 'a' hard to recognize
at a glance. And you may not forget anything of the 'ull' bit
at the end or you'll get something you never wanted and thus
don't expect. It actually creates a new class of possible bugs.
 
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
David Brown <david.brown@hesbynett.no>: Jun 28 09:53PM +0200

On 28/06/15 21:19, Stefan Ram wrote:
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
Use guide C - avoid macros unless they really are the clearest and best
way to solve the problem at hand. But don't use #undef except in
/really/ special code, as it leads to confusion - macros should normally
have exactly the same definition at all times in the program, or at
least within the file.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 10:24PM +0200

On 28-Jun-15 9:19 PM, Stefan Ram wrote:
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
Ordinary include guards are incompatible with guide A. Well I could
agree with a preference for #pragma once instead of include guards (and
just don't support any e.g. IBM compiler that doesn't support the
pragma), but /requiring/ that one doesn't use include guards is IMHO to
go too far. It's more work and less clear, but if someone wants to, hey.
 
As an example that's incompatible with guide B, in Windows desktop
programming one will normally, nowadays, define UNICODE before
including <windows.h>. The definition doesn't matter, just that it's
defined. But if it is defined in code and there is a previous definition
one will get a silly warning with e.g. g++ or Visual C++. And a simple
solution is to #undef it, like this:
 
#undef UNICODE
#define UNICODE
#include <windows.h>
 
And this is very normal code.
 
Rules to be mechanically followed are generally not compatible with C++
programming, which requires Some Intelligence Applied™.
 
Therefore I think that neither guide referred to and quoted above, can
be of very high quality.
 
 
Cheers & hth.,
 
- Alf
[Sorry, yet again I inadvertently applied Google Groups experience and
hit "Reply". I'm currently searching for the "Unsend" button.]
 
--
Using Thunderbird as Usenet client, Eternal September as NNTP server.
"Öö Tiib" <ootiib@hot.ee>: Jun 28 01:28PM -0700

On Sunday, 28 June 2015 22:19:56 UTC+3, Stefan Ram wrote:
 
> #undef should not normally be needed. Its use can lead
> to confusion with respect to the existence or meaning of
> a macro when it is used in the code
 
I use macros for things that are impossible without macros.
These are mostly things for better runtime debug diagnostics or
traces.
Examples:
I can't optionally use compiler-specific extensions without macros.
I can't get current source code file name, function name, line number
or compiling time without macros.
I can't both stringize and evaluate part of code without macros.
 
Otherwise I avoid macros. The ones that I use I define in general
configuration header that is included everywhere and I never #undef any
of those.
"Öö Tiib" <ootiib@hot.ee>: Jun 28 01:43PM -0700

On Sunday, 28 June 2015 23:24:36 UTC+3, Alf P. Steinbach wrote:
> just don't support any e.g. IBM compiler that doesn't support the
> pragma), but /requiring/ that one doesn't use include guards is IMHO to
> go too far.
 
IBM XL C/C++ certainly supports pragma once. AFAIK only Oracle Solaris
Studio does not support it from C++ compilers still under active
maintenance.
Forbidding include guards is still perhaps going too far with style
guide.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 28 12:48AM

>Can anybody tell why would somebody want to flush the stream with end?
 
Usually, ::std::cout is flushed before ::std::cin is used
for reading or before the program exits, so one would want
to flush it, when this does not suffice.
 
See also: »::std::ios_base::unitbuf«, »::std::cin.tie()«.
ram@zedat.fu-berlin.de (Stefan Ram): Jun 28 03:34PM

Just for fun I'd like to point out that the term of
»concrete class« might have changed in Lippman's
»C++ primer«.
 
The edition of 2005 still defines:
 
»A concrete class is a class that exposes,
rather than hides, its implementation.«.
 
This seems to comply with Stroustrup's notion of »concrete
types« (and »concrete class« in that context).
 
But a more recent 5th edition of the »C++ primer« now seems
to use »concrete class« in the other sense of »a class that
is not an abstract class«, although it possibly does not
give an explicit definition for this term anymore.
 
A class that is not concrete but owns resources sometimes
is called a »resource handle«. I would use this term for
::std::unique_ptr, but not for ::std::string, because in
the case of the former handling the resource is the primary
task, but in the case of the latter the resource is just
a means to be a variable-length (mutable) string.
 
What kind of classes are out there?
 
POD class
primitive class
regular class
trivially copyable type
trivial type
standard-layout type
canonical class
concrete class (a term with at least two different meanings)
abstract class
literal class
resource handle class
class with value semantics
class with reference semantics
constexpr class
 
Any other kind that comes to your mind?
ram@zedat.fu-berlin.de (Stefan Ram): Jun 28 07:19PM

guide A:
 
Don't use macros! OK, (...) treat them as a last resort.
(...) And #undef them after you've used them, if possible.
 
guide B:
 
#undef should not normally be needed. Its use can lead
to confusion with respect to the existence or meaning of
a macro when it is used in the code
"Öö Tiib" <ootiib@hot.ee>: Jun 28 08:21AM -0700

On Sunday, 28 June 2015 05:56:09 UTC+3, Richard wrote:
 
> printf("%s\n", s);
> fflush(stdout);
 
> ...and so-on.
 
Indeed because it adds clutter and we are lazy. If we had no 'endl' then
we would rarely write such code in C++ as well:
 
std::cout << i << '\n' << std::flush;
std::cout << s << '\n' << std::flush;
 
> If we wouldn't flush the buffer on every line in C, or in any other
> language that supported buffered I/O (C#, Java, etc.), why are we
> chronically doing this in C++?
 
We are not. We typically do not use <iostream> for massive (so it
does affect performance) text I/O and on case when we do then we
avoid 'operator<<' whatsoever since that thing trashes performance
even more terribly than superfluous flushing.
 
We do use the streams primarily for slow human-readable I/O and
even that is primarily for debugging. Now in debugging context
I have been actually annoyed that the damn 'printf' did not flush
it before it crashed or broke into breakpoint. Therefore it makes
sense in code that demonstrates some feature or crash to novice
to use 'endl' liberally because novice may want to step it in
debugger.
 
> I submit it is simply because people are immitating what they see
> around them without thinking about it.
 
You are correct that people do a lot of things without thinking too
much about it. Otherwise it is hard to get things done timely.
The particular topic is example of something that is good that you
brought up since I tend to use '\n' and 'std::endl' in mix but have
long stopped thinking about why I do it exactly like I do.
Rosario19 <Ros@invalid.invalid>: Jun 28 06:20PM +0200

On Sun, 28 Jun 2015 08:21:57 -0700 (PDT), Öö Tiib wrote:
 
>avoid 'operator<<' whatsoever since that thing trashes performance
>even more terribly than superfluous flushing.
 
>We do use the streams primarily for slow human-readable I/O and
 
I don't agree:
standard input and output can be used through pipes
to connect programs
 
"Öö Tiib" <ootiib@hot.ee>: Jun 28 09:48AM -0700

On Sunday, 28 June 2015 19:20:43 UTC+3, Rosario19 wrote:
 
> i'm not agree
> standard input and output can be use with trhu pipes
> for connect programs
 
How does that contradict what I wrote above? I do not understand where
the difference is.
 
"Primarily" does not mean "always" but it means "for most part" and
"mainly". I wrote above even separately about the cases like the
one that you pointed out: "when we do then we avoid 'operator<<'
whatsoever since that thing trashes performance even more terribly
than superfluous flushing."
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 28 11:56AM -0400

On 6/28/2015 11:34 AM, Stefan Ram wrote:
> class with reference semantics
> constexpr class
 
> Any other kind that comes to your mind?
 
Empty class (sometimes used to denote a type that is different from any
other type in your program).
 
V
--
I do not respond to top-posted replies, please don't ask
Paul <pepstein5@gmail.com>: Jun 28 07:53AM -0700

On Thursday, June 25, 2015 at 2:28:35 PM UTC+1, Öö Tiib wrote:
> so you should keep. Iterators should not manage the object
> they navigate. Smart pointers (that automatically manage)
> are therefore very bad iterators.
 
This is my revised code which uses smart pointers. I also coded another direct way of testing for cycles by seeing if the pointers repeat. Does this seem ok? Thanks a lot for your feedback.
 
#include <cstdio>
#include <unordered_set>
#include <vector>
#include <algorithm>
#include <iostream>
#include <memory>
 
// Singly linked list node.
// NOTE(review): using shared_ptr for the 'next' link means a circular
// list forms a shared_ptr reference cycle -- the nodes keep each other
// alive and are never freed unless the cycle is broken by hand.
struct Node
{
int data;
std::shared_ptr<Node> next;
 
};
 
// A fast pointer and a slow pointer are both initiated at the head.
// Circular if the slow pointer is ever ahead of the fast pointer.
bool isCycle(std::shared_ptr<Node> head)
{
auto slowPointer = head;
auto fastPointer = head;
 
while(fastPointer && fastPointer->next && fastPointer->next->next)
{
slowPointer = slowPointer->next;
fastPointer = fastPointer->next->next;
 
if(fastPointer == slowPointer || fastPointer->next == slowPointer)
return true;
}
 
return false;
}
 
// A direct algorithm to tell if a cycle is present by seeing if a pointer address repeats.
bool isCycleDirect(std::shared_ptr<Node> head)
{
std::unordered_set<std::shared_ptr<Node>> nodePointers;
 
while(head)
{
// If trying to insert something already inserted, then must contain cycles.
if(nodePointers.find(head) != nodePointers.end())
return true;
 
nodePointers.insert(head);
head = head->next;
}
 
return false;
}
 
// Test against the expected results.
void testCycle(std::shared_ptr<Node> head, bool expected)
{
printf(isCycle(head) == expected ? "Results as expected\n" : "This test case failed\n");
}
 
// Set up tests for small numbers of nodes
void smallTests()
{
std::shared_ptr<Node> emptyList;
testCycle(emptyList, false);
 
std::shared_ptr<Node> List1(new Node);
std::shared_ptr<Node>ListCircular2(new Node);
std::shared_ptr<Node>ListNonCircular2(new Node);
std::shared_ptr<Node>ListCircular3(new Node);
std::shared_ptr<Node>ListNonCircular3(new Node);
 
List1->next = nullptr;
List1->data = 1;
testCycle(List1, false);
 
ListCircular2 = List1;
ListCircular2 -> next = ListCircular2;
testCycle(ListCircular2, true);
 
ListNonCircular2 = ListCircular2;
ListNonCircular2->next = std::shared_ptr<Node>(new Node);
ListNonCircular2->next->data = 2;
ListNonCircular2->next->next = nullptr;
testCycle(ListNonCircular2, false);
 
ListNonCircular3 = ListNonCircular2;
ListNonCircular3->next->next = std::shared_ptr<Node>(new Node);
ListNonCircular3->next->next->data = 3;
ListNonCircular3->next->next->next = nullptr;
testCycle(ListNonCircular3, false);
 
ListCircular3 = ListNonCircular3;
ListCircular3->next->next->next = ListCircular3;
testCycle(ListCircular3, true);
 
}
 
int main()
{
    // Exercise both cycle detectors on small hand-built lists.
    smallTests();
    return 0;
}
 
 
Paul
"Öö Tiib" <ootiib@hot.ee>: Jun 28 08:40AM -0700

On Sunday, 28 June 2015 17:53:42 UTC+3, Paul wrote:
> > they navigate. Smart pointers (that automatically manage)
> > are therefore very bad iterators.
 
> This is my revised code which uses smart pointers. I also coded another direct way of testing for cycles by seeing if the pointers repeat. Does this seem ok? Thanks a lot for your feedback.
 
You decided to use smart pointers as iterators in 'isCycle'. I already did
try to explain why smart pointers are terrible iterators. If you don't
write or adapt a dedicated iterator class, then a raw pointer is still a
better iterator than a smart pointer.

Your 'isCycleDirect' seems quite expensive to make 'unordered_set' of
whole list. You should perhaps try and compare the two with list of
million of entries.
 
Your 'smallTests' is still broken in sense that it leaks memory.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.