soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

Improving runtime by preventing a hashtable from growing - 17 Updates
Almost Always Auto? (was Re: Substitution failure can be an error) - 5 Updates
[QT creator on "nix"] - getting a strict 8 bit (1 byte) array with no padding - 3 Updates

Improving runtime by preventing a hashtable from growing

"Öö Tiib" <ootiib@hot.ee>: Nov 24 01:57AM -0800

On Sunday, 24 November 2019 01:02:33 UTC+2, Paul wrote:

> Thanks for your interest.
> My experimental results don't agree with your post.
> ***However, there is definitely the possibility that my testing is flawed.***

Note that 100 millions and 10 millions are different counts.
Also the randomness can be biased, IIRC then RAND_MAX was 32767 in
some Visual Studio. What it is in yours? With such tiny RAND_MAX
the winner algorithm would count into
std::array<int,RAND_MAX+1> allCounts; hash table is wasteful there.

Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 11:12AM +0100

> The std::sort of vector of 10 millions of really random integers
> tends to be about twice faster than counting equal elements of it
> into hash table. ...

LOL.

"Öö Tiib" <ootiib@hot.ee>: Nov 24 03:26AM -0800

On Sunday, 24 November 2019 01:02:33 UTC+2, Paul wrote:

> Thanks for your interest.

Can you report results of that on your platform:

#include <algorithm> // generate and sort
#include <chrono> // chrono
#include <iostream> // cout
#include <random> // random_device and mt19937
#include <unordered_map> // unordered_map
#include <vector> // vector

// note that std::random_device does nothing good on lot of platforms
// here it is just used for toy example that it is
static std::random_device rnd_device;
static std::mt19937 engine {rnd_device()};
static std::uniform_int_distribution<int> dist {0, INT_MAX};

int gen()
{
return dist(engine);
};

size_t hashcount(const std::vector<int>& vec)
{
// For each int in the vector, count how many times it occurs.
std::unordered_map<int, int> allCounts;

for (int v : vec)
++allCounts[v];

return allCounts.size();
}

int main()
{
std::vector<int> vec(10000000);
std::generate(begin(vec), end(vec), gen);

auto t1 = std::chrono::high_resolution_clock::now();
hashcount(vec);
auto t2 = std::chrono::high_resolution_clock::now();
std::cout << "hash " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " ";

std::vector<int> vec2(10000000);
std::generate(begin(vec2), end(vec2), gen);

t1 = std::chrono::high_resolution_clock::now();
std::sort(begin(vec2),end(vec2));
t2 = std::chrono::high_resolution_clock::now();
std::cout << "sort " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " ";

return 0;
}

Paul <pepstein5@gmail.com>: Nov 24 03:48AM -0800

On Sunday, November 24, 2019 at 10:12:28 AM UTC, Bonita Montero wrote:
> > tends to be about twice faster than counting equal elements of it
> > into hash table. ...

> LOL.

Yes, I too was very surprised by this claim, and my initial surprise
is backed up by my experiment. I didn't feel confident enough to
challenge the claim so directly, though.

Paul

Paul <pepstein5@gmail.com>: Nov 24 03:56AM -0800

On Sunday, November 24, 2019 at 9:57:20 AM UTC, Öö Tiib wrote:
> some Visual Studio. What it is in yours? With such tiny RAND_MAX
> the winner algorithm would count into
> std::array<int,RAND_MAX+1> allCounts; hash table is wasteful there.

I started testing with a length of 10 million and tested with
a length of 100 million at a later stage.
Carelessly, I didn't update the comments and all defaults to reflect
this change but this is basically a documentation error.

RAND_MAX is indeed 32767 in my version of Visual Studio.
My intent is to treat the processes of testing the algos and designing the
algos as separate.
The vector generation with std::rand() is part of the testing, not
part of the algorithm.
Changing allCounts as you suggest would tie the algorithms to the testing.
In test driven development, people do indeed tie the algorithms to the
testing.
But I don't want to do it that way.
The algorithms should be independent of the methods used to test them.

Paul

"Öö Tiib" <ootiib@hot.ee>: Nov 24 04:13AM -0800

On Sunday, 24 November 2019 13:56:39 UTC+2, Paul wrote:
> testing.
> But I don't want to do it that way.
> The algorithms should be independent of the methods used to test them.

Then use random engines like std::mt19937 for generating test data.
These behave more similarly between platforms than std::rand and
so avoid confusion. Usage of std:rand with so low max and cycle size
does cause odd artifacts and effects so people do avoid it
even for dice rolling in entertainment.

"Öö Tiib" <ootiib@hot.ee>: Nov 24 04:27AM -0800

On Sunday, 24 November 2019 13:49:05 UTC+2, Paul wrote:

> Yes, I too was very surprised by this claim, and my initial surprise
> is backed up by my experiment. I didn't feel confident enough to
> challenge the claim so directly, though.

Note also that the claim keeps being true and your experiments used
narrow range of test values like I predicted.

Paul <pepstein5@gmail.com>: Nov 24 04:43AM -0800

On Sunday, November 24, 2019 at 12:14:01 PM UTC, Öö Tiib wrote:
> so avoid confusion. Usage of std:rand with so low max and cycle size
> does cause odd artifacts and effects so people do avoid it
> even for dice rolling in entertainment.

Good advice, here.
Thanks.

Paul

Paul <pepstein5@gmail.com>: Nov 24 04:44AM -0800

On Sunday, November 24, 2019 at 12:27:24 PM UTC, Öö Tiib wrote:
> > challenge the claim so directly, though.

> Note also that the claim keeps being true and your experiments used
> narrow range of test values like I predicted.

Yes, your claim is vindicated by your code that I tested on my platform
as requested.
I will report the results in my reply to the post of yours that contains
your code.

Paul

Paul <pepstein5@gmail.com>: Nov 24 04:48AM -0800

On Sunday, November 24, 2019 at 11:26:13 AM UTC, Öö Tiib wrote:
> std::cout << "sort " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " ";

> return 0;
> }

The console output was hash 5559216 sort 1263252
which vindicates your initial response.

It's an interesting example of big O analysis not being the whole story.
Although sorting is O(n * log (n) ), sorting seems best in this context
even though linear methods are available.

Paul

Paul <pepstein5@gmail.com>: Nov 24 05:40AM -0800

On Sunday, November 24, 2019 at 11:26:13 AM UTC, Öö Tiib wrote:
> std::cout << "sort " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " ";

> return 0;
> }

The line allCounts.reserve(vec.size()); gives a significantly improved
performance.
But this basically means that hashing becomes 3 times slower than sorting
instead of 4 times slower.

Paul

Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 03:04PM +0100

>> tends to be about twice faster than counting equal elements of it
>> into hash table. ...

> LOL.

On my old Ryzen 7 1800X, sorting of 10'000'000 integters takes more
than twice the time than counting with VC++2019 and nearly twice the
time than counting with g++.

#include <iostream>
#include <vector>
#include <unordered_set>
#include <limits>
#include <algorithm>
#include <random>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
size_t const N_ELEMENTS = 10'000'000;
random_device rd;
uniform_int_distribution<int> uid( numeric_limits<int>::min(),
numeric_limits<int>::max() );
vector<int> v;
unordered_set<int> s( N_ELEMENTS );
time_point<high_resolution_clock> start;
double seconds;
v.resize( N_ELEMENTS );
for( int &e : v )
e = uid( rd ),
s.insert( e );
start = high_resolution_clock::now();
size_t uniq = 0;
for( unordered_set<int>::iterator it = s.begin(); it != s.end(); )
{
++uniq;
int e = *it++;
for( ; it != s.end() && *it == e; ++it );
}
seconds = duration_cast<nanoseconds>( high_resolution_clock::now()
- start ).count() / 1.0E9;
cout << "unique elements: " << uniq << endl;
cout << "time to count: " << seconds << "s" << endl;
start = high_resolution_clock::now();
sort( v.begin(), v.end() );
seconds = duration_cast<nanoseconds>( high_resolution_clock::now()
- start ).count() / 1.0E9;
cout << "time to sort: " << seconds << "s" << endl;
}

And I reserved as equal buckets a there are values, which should speed
up counting anyway.

Paul <pepstein5@gmail.com>: Nov 24 06:34AM -0800

On Sunday, November 24, 2019 at 2:04:38 PM UTC, Bonita Montero wrote:
> }

> And I reserved as equal buckets a there are values, which should speed
> up counting anyway.

Thanks for this.
But your code seems to lose the context of the original task.
Fix r and fix a vector.
The task is to find the size of the largest subsequence of the vector
so that the range of that subsequence <= r.
As far as I know, non-sorting methods need to maintain a correspondence
in the form of key-value pairs {v, number of occurrences of v}
This is why Oo (sorry but I don't know how to create the diacritical marks)
and I both used unordered_map instead of unordered_set.
I am not surprised that unordered_set operations are quicker.

Oo's comment about relative speeds of hashing/sorting should be understood
in the context of the entire thread -- I don't think that he or she
intended the comment to be read as a standalone.

Paul

Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 03:42PM +0100

> But your code seems to lose the context of the original task.

I only wanted to falsify Öö Tiib statement.

Paul <pepstein5@gmail.com>: Nov 24 07:51AM -0800

On Sunday, November 24, 2019 at 2:42:43 PM UTC, Bonita Montero wrote:
> > But your code seems to lose the context of the original task.

> I only wanted to falsify Öö Tiib statement.

It's very easy (and IMO silly) to falsify statements by ignoring the context, as you did.
Example: A. "I prefer white wine to red wine."
B. "Not me. I only ever drink red wine."
C, after observing B taking ibuprofen pills with water:
"B said 'I only ever drink red wine, but I've just seen B drinking water.
B is a liar."

Paul

Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 04:53PM +0100

>> I only wanted to falsify Öö Tiib statement.

> It's very easy (and IMO silly) to falsify statements by ignoring the context, as you did.

I didn't respond to the OP but to Öö Tiib.
And at this place my response was in the correct context.

Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 24 04:08PM

On Sat, 2019-11-23, Paul wrote:
>> mean? What does "stats" mean, for that matter?

>> Since max(subseq) <= max(seq) and min(subseq) >= min(seq), it seems to
>> me the answer is always "the full sequence" or "there isn't one".

I see now that I misread a part of your problem definition that /was/
clear. I read "... longest subsequence which has range >= r" which
led to the uninteresting problem.

Sorry! I must have been half asleep, or something.

> However, that might be confusing because there's a set-theoretic definition
> of "range" and a statistical definition of "range" and these definitions
> are very different. I mean the statistical definition of "range".

Ah, ok. I was unaware of "stats" as an abbreviation of "statistics"
(not that I remember how "range" is used in statistics anyway). It
would have been easier to explain it as "range(seq) = max(seq) -
min(seq)".

Anyway: I know understand the problem and see why it's fun/interesting to
solve.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Almost Always Auto? (was Re: Substitution failure can be an error)

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 24 12:26AM +0100

On 22.11.2019 22:46, Vir Campestris wrote:
> who owns our coding standards :(

> The argument is that given "x=4.2" you _know_ that x is a double,
> there's no point in saying so.

It requires /reading/ with analysis and comprehension as opposed to just
looking at.

And it fails for

auto hm = 0x1'0000'0000;

This type depends on the compiler.

> look at the declaration. Which if I'm unlucky is one of several
> overloads with template type deduction from the parameters.

> But IMHO reading matters more. You'll only write it once.

I prefer that the type is generally specified explicitly in a
declaration, either on the left side or explicitly in the initializer.

Exception: types that are effectively or formally unspecified, such as
iterator type or the type of an iostream manipulator.

Example, display the ASCII code:

#include <ctype.h> // ::isprint
#include <iomanip> // std::setw
#include <iostream>
#include <limits.h> // CHAR_MIN, UCHAR_MAX

using Byte = unsigned char;

auto is_noncontrol_ascii_char( const int code )
-> bool
{ return (code > CHAR_MIN and code < UCHAR_MAX and ::isprint( Byte( code
) ); }

auto main()
-> int
{
using std::cout, std::endl, std::left, std::setw;
constexpr auto& hex_digits = "0123456789abcdef";

const auto field = setw( 4 );
cout << left;

// Column headers.
cout << field << "";
for( int x = 0; x < 16; ++x ) { cout << field << hex_digits[x]; }
cout << endl;
cout << endl; // Spacer line to make the header row
stand out as such.

// Main table with a row header column to the left.
for( int y = 0; y < 8; ++y ) {
cout << field << hex_digits[y];
for( int x = 0; x < 16; ++x ) {
const int code = 16*y + x;
const auto ch = char( is_noncontrol_ascii_char( code )?
code: ascii::del );
cout << field << ch;
}
cout << endl;
}
}

- Alf

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 24 12:28AM +0100

On 24.11.2019 00:26, Alf P. Steinbach wrote:
> cout << endl;
> }
> }

Oops. ">=", "<=". Oh well.

- Alf

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 24 12:45AM +0100

On 24.11.2019 00:26, Alf P. Steinbach wrote:
> code that wouldn't even compile

Good idea to compile before posting, yes!

So here's the example, with a few instances of `auto` declarations of
things where either the type is irrelevant, or already explicit:

#include <ctype.h> // ::isprint
#include <iomanip> // std::setw
#include <iostream>
#include <limits.h> // CHAR_MIN, UCHAR_MAX

using Byte = unsigned char;

auto is_noncontrol_ascii_char( const int code )
-> bool
{ return (code >= CHAR_MIN and code <= UCHAR_MAX and ::isprint( Byte(
code ) ) ); }

auto main()
-> int
{
using std::cout, std::endl, std::left, std::setw;
constexpr auto& hex_digits = "0123456789abcdef";

const auto field = setw( 4 );
const auto ascii_del = char( 127 );

cout << left;

// Column headers.
cout << field << "";
for( int x = 0; x < 16; ++x ) { cout << field << hex_digits[x]; }
cout << endl;
cout << endl; // Spacer line to make the header row
stand out as such.

// Main table with a row header column to the left.
for( int y = 0; y < 8; ++y ) {
cout << field << hex_digits[y];
for( int x = 0; x < 16; ++x ) {
const int code = 16*y + x;
const auto ch = char( is_noncontrol_ascii_char( code )?
code: ascii_del );
cout << field << ch;
}
cout << endl;
}
}

- Alf

boltar@nowhere.co.uk: Nov 24 10:47AM

On Sun, 24 Nov 2019 00:26:14 +0100
>auto main()
> -> int

And the point of writing that instead of

int main()

is what exactly? Other than showing off of course.

Melzzzzz <Melzzzzz@zzzzz.com>: Nov 24 01:37PM

> And the point of writing that instead of

> int main()

> is what exactly? Other than showing off of course.

He doesn't like C++ ;)

--
press any key to continue or any other to quit...
U ničemu ja ne uživam kao u svom statusu INVALIDA -- Zli Zec
Svi smo svedoci - oko 3 godine intenzivne propagande je dovoljno da jedan narod poludi -- Zli Zec
Na divljem zapadu i nije bilo tako puno nasilja, upravo zato jer su svi
bili naoruzani. -- Mladen Gogala

[QT creator on "nix"] - getting a strict 8 bit (1 byte) array with no padding

Soviet_Mario <SovietMario@CCCP.MIR>: Nov 24 12:54AM +0100

On 23/11/2019 21:27, Keith Thompson wrote:
> type names u_int{8,32}_t. It doesn't introduce u_int_{8,32}, and I
> don't see that name in the code you quoted (other than in your union
> definition) or anywhere under /usr/include on my system.

yes I miss-spelled forgetting the final _t (but the pieces
of headers pasted mentioned it)

> meet your requirements, and there is no good reason to use non-standard
> types with similar names.

> Do you have some reason to prefer to use u_int_8 rather than uint8_t?

no no reason at all. I just noticed this subtle difference
after having rewritten always :)
But It will take short to do a mass-replace and change the
included header

>> I guess the variations stands for the same concept

> Yes. Names other than uint8_t are non-standard and will cause your code
> not to compile in many environments.

well I must have mis-expressed again when I said I was
looking for a portable way, I actually meant I needed a way
that allowed proper data exchange from front-end gambas side
and C++ engine.

I'll never deploy the program anywhere else. So as long as
it compiles here, fine enough

>>> by default.)

>> it gives me warnings indeed, but seems to compile nevertheless

> What warnings?

that anonymous structs (or unions) are non standard and
supported only with GCC later than some version.

I'm using the "native" (= preinstalled) GCC here on MX (QT
itself did not ship with any own compiler in the free
version). I have installed the later build-essentials from
"stable" repository. I'm not very willing to update from
testing or backport.
Maybe if I found some flatpak or appimage version, more
isolated.

> If you compile with "gcc -std=c11" or with a
> sufficiently new version of gcc, there shouldn't be any.

I'm looking to assess the GCC version here (on Debian its a
bit older still).

back
here I have GCC 4:6.3.0-4
(and supporting packages seem 6.3.0-18)
anche il CPP riporta la stessa signature.

I found a 9.4-2 MX version and I'm adding it too.
I don't know if QT ide will automatically detect it (and I'm
sure I won't be able to configure it, otherwise)

> Always pay attention to warnings.

It was reading the "padding" warning that all this odessy
began :)

>> I thought : I give you an hint : TRY to pack. If you really
>> cannot, there is nothing else I could do

> There's no guarantee that all compilers will understand your "hint",

I just use GCC, not others

>> access to separate channels I use named members.

> What operations do you need on AllChannels that you can't perforom on
> the struct? You can assign structs.

uhm ... memberwise or all-at-once ?
I thought struct assignment (normally I wrote operator =)
should pass through memberwise copying.
So the aim was to copy everything in one assignment only
instead of four of them.
But if struct can be assigned smartly enough then the union
is not needed.

Also I duplicated pixel addressing. One slower way through
operator () (int row, int column) and a massive one
dimensional linear addressing for mass processing of all pixels.

>> 1 byte sizes, and the only big member starts at 0 offset)

> The drawback of #pragma pack is that it's non-standard. Given the
> concern you've expressed about portability,

sorry I realize I had been unwittingly misleading. I was
just thinking of interoperate gambas code and library code.
As long as it works on my two couple of machines that would
be enough. I have Linux MX on both, BTW, among other things

> I would think that would
> be important to you. And given that it will almost certainly do
> nothing, I don't know why you're so insistent on using it.

oh, sorry at last for my often unclear English. I tend to
sow misunderstandings everywhere due to this :\

--
1) Resistere, resistere, resistere.
2) Se tutti pagano le tasse, le tasse le pagano tutti
Soviet_Mario - (aka Gatto_Vizzato)

Keith Thompson <kst-u@mib.org>: Nov 23 05:06PM -0800

>> definition) or anywhere under /usr/include on my system.

> yes I miss-spelled forgetting the final _t (but the pieces
> of headers pasted mentioned it)

To be very clear, if you need an unsigned integer type of at least 8
bits, there is almost no good reason not to use uint8_t. There is no
good reason to use some system-specific (or even undocumented)
equivalent.

"Almost" means you might have to consider older implementations that
don't support <stdint.h> / <cstdint>. Any version of g++ you're likely
to encounter should support C++11 or later -- older versions might
require an explicit option. Even older versions are likely to support
<stdint.h> or <cstdint> as an extension.

> and C++ engine.

> I'll never deploy the program anywhere else. So as long as
> it compiles here, fine enough

If you use uint8_t it will compile.

>> What warnings?

> that anonymous structs (or unions) are non standard and
> supported only with GCC later than some version.

It's always better to copy-and-paste diagnostic messages rather than
trying to summarize them.

BTW, I made a serious mistake here. C supports anonymous structs as
of C11. C++ does not support them at all. g++ does support them as
an extension. With "g++ -std=c17 -pedantic", I get "warning: ISO C++
prohibits anonymous structs [-Wpedantic]". My apologies for that.

If you're only using gcc, you can use anonymous structs in C++,
but your code will be non-portable.

[...]

> instead of four of them.
> But if struct can be assigned smartly enough then the union
> is not needed.

The semantics of struct assignment (assuming you haven't overloaded
operator=) are equivalent to the semantics of copying memberwise,
but in the case of a struct containing 4 uint8_t members, no sane
compiler will do anything other than a 32-bit copy. That's assuming
the struct has an alignment that permits a 32-bit copy.

If copying the entire structure is the only operation you care about,
keep it simple: just define the structure and use ordinary structure
assignment.

[...]

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Soviet_Mario <SovietMario@CCCP.MIR>: Nov 24 12:10PM +0100

On 24/11/2019 02:06, Keith Thompson wrote:

>> yes I miss-spelled forgetting the final _t (but the pieces
>> of headers pasted mentioned it)

> To be very clear, if you need an unsigned integer type of at least 8

well strictly speaking I need a type of EXACTLY 8 bits, not more

>> supported only with GCC later than some version.

> It's always better to copy-and-paste diagnostic messages rather than
> trying to summarize them.

well, I did not copy the warning when the compiler produced it.
I would have to re-edit the source (or restore the older
version) to just replicate the output ... maybe messing sth :\

> of C11. C++ does not support them at all. g++ does support them as
> an extension. With "g++ -std=c17 -pedantic", I get "warning: ISO C++
> prohibits anonymous structs [-Wpedantic]". My apologies for that.

no problem !!! this is not an helpdesk, I get the good
offered for free with good intention. That's more than enough !

> If you're only using gcc, you can use anonymous structs in C++,
> but your code will be non-portable.

yes only GCC.
BTW : as I feared, QT (even after a system reboot ...
actually power went off tonight, here it poured dogs and
cats ! 160 mm of rain in 24 hours, a disaster) did not
automatically detect the new version from MX repo. it just
still see the original version.
I'll see to this later If i find how to configure it (and
locate it first)

> but in the case of a struct containing 4 uint8_t members, no sane
> compiler will do anything other than a 32-bit copy. That's assuming
> the struct has an alignment that permits a 32-bit copy.

I would think that the existence of such monolythic member
should ensure the alignment.
I should look the asm generated, but I can't be bothered :) :)

--
1) Resistere, resistere, resistere.
2) Se tutti pagano le tasse, le tasse le pagano tutti
Soviet_Mario - (aka Gatto_Vizzato)

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Sunday, November 24, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

No comments:

Blog Archive

About Me