soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

Improving runtime by preventing a hashtable from growing - 5 Updates
[QT creator on "nix"] - "outer" padding within arrays of structures - was getting a strict 8 bit (1 byte) array with no padding - 2 Updates
[QT creator on "nix"] - getting a strict 8 bit (1 byte) array with no padding - 9 Updates
Almost Always Auto? (was Re: Substitution failure can be an error) - 2 Updates
Where are the exception-objects allocated? - 1 Update

Improving runtime by preventing a hashtable from growing

Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 23 09:38AM

On Fri, 2019-11-22, Paul wrote:
> Given a sequence in the form of a vector<int> and a non-negative integer r,
> find the length of the longest subsequence which has range <= r.
> By "range", I mean the stats definition of max - min.

Surely that problem must be possible to express in a more
understandable way. What does "the stats definition of max - min"
mean? What does "stats" mean, for that matter?

Since max(subseq) <= max(seq) and min(subseq) >= min(seq), it seems to
me the answer is always "the full sequence" or "there isn't one".

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paul <pepstein5@gmail.com>: Nov 23 04:40AM -0800

On Saturday, November 23, 2019 at 9:38:42 AM UTC, Jorgen Grahn wrote:
> mean? What does "stats" mean, for that matter?

> Since max(subseq) <= max(seq) and min(subseq) >= min(seq), it seems to
> me the answer is always "the full sequence" or "there isn't one".

Thanks for your interest.
The answer depends on both the vector and r.
The vector can be considered as a sequence with vec[0], vec[1] being the
first two terms for example.
As a sequence it has lots of subsequences.
Consider all the subsequences S which have the property that max(S) - min(S) <= r.
The problem is to find the maximum length of S for all subsequences S which
have that property.

Another way to say it is that we want the maximum length of all subsequences
with range <= r.
However, that might be confusing because there's a set-theoretic definition
of "range" and a statistical definition of "range" and these definitions
are very different. I mean the statistical definition of "range".

Paul Epstein

Paul <pepstein5@gmail.com>: Nov 23 04:48AM -0800

On Saturday, November 23, 2019 at 12:41:08 PM UTC, Paul wrote:
> However, that might be confusing because there's a set-theoretic definition
> of "range" and a statistical definition of "range" and these definitions
> are very different. I mean the statistical definition of "range".

As an example, remember that my function is called hashLongest
hashLongest({2,1,3,1}, 1) = 3. [2, 1 ,1] is a subsequence which has
range 1. That subsequence has length 3. If any subsequence has range <=1
then its length will always be <= 3. That is why 3 is the answer for this example.

Paul

"Öö Tiib" <ootiib@hot.ee>: Nov 23 02:20PM -0800

On Friday, 22 November 2019 19:31:22 UTC+2, Paul wrote:

> The code is below.
> I solved it by hashing as this is much faster than sorting for long vectors.
> (I tested for vectors of size >= 10 million).

Perhaps your vectors were biased to have only narrow
range of integers in those and searched for ranges were very low (like
1 or 2) otherwise your solution is clearly sub-optimal solution of the
given problem.

The std::sort of vector of 10 millions of really random integers
tends to be about twice faster than counting equal elements of it into
hash table. Also searching for longest subsequence in sorted vector is
O(length) linear while your search from hashtable is O(length*range)
rectangular.

Paul <pepstein5@gmail.com>: Nov 23 03:02PM -0800

On Saturday, November 23, 2019 at 10:20:57 PM UTC, Öö Tiib wrote:
> hash table. Also searching for longest subsequence in sorted vector is
> O(length) linear while your search from hashtable is O(length*range)
> rectangular.

Thanks for your interest.
My experimental results don't agree with your post.
***However, there is definitely the possibility that my testing is flawed.***
Accordingly, I will post my entire code, including testing.
I will report the results for r = 100 although many other values of r
give similar results.

The vector has a size of 100 million.
It is generated by 100 million calls to std::rand().
The sorting algorithm code and hashtable code will be posted below.
The two algorithms were tested against the same vector.
The longest subsequence with a range of <= 100 had length 310012
The hashing algorithm took 2.4 seconds and the sorting algorithm
took 8.3 seconds.
The IDE was a free version of Visual Studio

// ************************ ALGOS AND TEST ******************************
// Given a sequence expressed as a vector, and given a non-negative integer r:
// Find the maximum length of all subsequences which have a range <= r.
// range is defined to be the maximum - minimum as in statistics.
// This solution should have O(rN)
// Originally, this was in a coding test with r = 1.

#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <utility>
#include <climits>

int hashLongest(const std::vector<int>& vec, int r)
{
// For each int in the vector, count how many times it occurs.
std::unordered_map<int, int> allCounts;

for (int v : vec)
++allCounts[v];

// For each int k in the vector, count how many values of x occur
// such that x >= k and x <= k + r
std::unordered_map <int, int> leftCounts;
// The maximum value of leftCounts is the result

int Max = 0;
for (auto a : allCounts)
{
for (int i = 0; i <= r; ++i)
if ((long long)a.first + i > INT_MAX)
break;
else // if (allCounts.find(a.first + i) != allCounts.end())
leftCounts[a.first] += allCounts[a.first + i];

Max = std::max(Max, leftCounts[a.first]);
}

return Max;
}

// Previous sorting approach to solve the same problem.
// length is continually incremented.
int sortLongest(std::vector<int>& vec, int r)
{
std::sort(vec.begin(), vec.end());
int length = 0;
for (int behind = 0; behind < vec.size(); ++behind)
{
int ahead = behind + length; // Maximal length uncovered so far.
// After sorting, see if lower point of the maximal length subinterval is within r of the uppermost point
// If so we can extend the length and move to the next point for testing.
while (ahead < vec.size() && (long long)vec[behind] + r >= vec[ahead])
++ahead, ++length;
}
return length;
}

// Randomly generate a testing vector
// Default to a length of ten million.
std::vector<int> generate(int length = 100'000'000)
{
std::vector<int> vec(length);
for (int& i : vec)
i = std::rand();

return vec;
}

// A given vector is tested using either the sorting method
// or the hashing method.
// A bool is true for the hashing method and false otherwise.
// Returned is a pair whose first value is the maximum length
// according to the problem constraints, and whose second value
// is the number of microseconds taken.
std::pair<int, int> longest(std::vector<int>& vec, int r, bool hash)
{
auto start = std::chrono::high_resolution_clock::now();
int maxLength = hash ? hashLongest(vec, r) : sortLongest(vec, r);
auto stop = std::chrono::high_resolution_clock::now();
int time = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
return { maxLength, time };
}

// A display function to display the information from a run
void display(bool hash, std::pair<int, int> info, int length = 10'000'000, int r = 1)
{
std::cout << "Experiment done on a randomly generated vector of length " << length;
std::cout << "\nAn attempt was made to find the maximum length of a subsequence of range " << r;
std::cout << "\nThe method used was " << (hash ? "hashing" : "sorting");
std::cout << "\nThe result is " << info.first << " and the time taken in microseconds is " << info.second << std::endl;
}

// Test both methods.
int main()
{
std::vector<int> vec = generate();

std::pair<int, int> hashResult = longest(vec, 100, true);
display(true, hashResult, vec.size(), 100);

std::pair<int, int> sortResult = longest(vec, 100, false);
display(false, sortResult, vec.size(), 100);

}

[QT creator on "nix"] - "outer" padding within arrays of structures - was getting a strict 8 bit (1 byte) array with no padding

Soviet_Mario <SovietMario@CCCP.MIR>: Nov 23 01:17PM +0100

I have realized I don't know an aspect of padding vs packing
management.

In the other thread I have been explained of the aspects of
padding/packing INSIDE a given structure/class/union and
similar.

But now I realize I know nothing for sure about packing in
between different element within arrays of such
structure/class/union and similar.

for a start I haven't found a documentation about the
#pragma pack directive applied to a whole array

is it possible (and how) to suggest the compiler to pack
items ? (*)
Are there differences in the outcome of such hints depending
on actual size of the packed structures ?
I am figuring out a packed struct of ODD total size, when
allocated in an array, it could lead to either non packed
array or to addresses in ODD/EVEN pattern (is it possible ?
Non asking if efficient, just possible).

secondly, how much is it, reasonably, the no-padding
threshold in an array of class/struct/union objects ?

There is some criterion, preferably less restrictive than
"powers of two" size of objects" ?

I feel quite sure that 64 bit multiple are packed without
any suggestion.
But, for example, 32 bit ? 16 bit ?

The hardware per se has no problem : in fact it can manage
COMPACT arrays of native types (int, short int) without
memory waste. I would hope that the same could hold for
structured types of the same size, but not so sure.
What is the truth ?

Could packing (assuming it is possible to force the compiler
to do it) lead to addressing problems (of the contained data
member) ?
I am not speaking about SLOWING problems, but strictly
crashes or silent wrong addressing or ... dunno !

TY

--
1) Resistere, resistere, resistere.
2) Se tutti pagano le tasse, le tasse le pagano tutti
Soviet_Mario - (aka Gatto_Vizzato)

Keith Thompson <kst-u@mib.org>: Nov 23 12:35PM -0800

> But now I realize I know nothing for sure about packing in
> between different element within arrays of such
> structure/class/union and similar.

Packing for arrays is not a thing.

Every object type has a constant size, which you can query using sizeof.
Each element of an array is allocated the size of the element's type.

For example, given:

some_type arr[N];

it's guaranteed that sizeof (arr) == N * sizeof (some_type). (We can
use that to determine the number of elements in an array: sizeof arr /
sizeof arr[0]).

If a non-standard #pragma pack has been applied to some_type itself,
that might change the value of sizeof (some_type), but it does not
change the relationship between sizeof (some_type) and sizeof
(some_type[N]).

The size of any type is a multiple of its alignment, which is what
makes arrays possible.

If some_type is

struct some_type {
long big;
char small;
};

then there is likely to be padding after small, precisely so that in an
array of some_type elements each big sub-element is correctly aligned.
The padding is within the structure. There is no padding between array
elements.

[...]

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

[QT creator on "nix"] - getting a strict 8 bit (1 byte) array with no padding

Soviet_Mario <SovietMario@CCCP.MIR>: Nov 23 12:25AM +0100

Il 22/11/19 20:11, James Kuyper ha scritto:
>> I have abandoned the unsigned Bit_Field : 8 solution as soon
>> as you (a lot of people) stated that u_int_8 MUST have

> uint8_t

here the macro or typedef is

u_int8_t actually

> any C99+ implementation that fails to support uint8_t, and on any C
> platform where unsigned char doesn't have 8 bits, you can't do what you
> want to do anyway.

ok, that's a fact

> the fact that you are trying to pack something that cannot possibly be
> bigger than a byte, and therefore cannot have an alignment requirement
> bigger than 1, that renders the pragma useless and unnecessary.

when I say "in that case" I am not extrapolating, I am
contextualizing to the specific case

> ...
>> u_int_32 AllChannels;

> uint_32

maybe QT headers decorate differently

here it is u_int32_t

>> the files are packed.

> OK - which fields are positioned differently from the way they would be
> positioned if not packed?

as I said, I do not have a working complete program yet, and
I hope to get to a scenario where this wrong situation is
avoided at all

--
1) Resistere, resistere, resistere.
2) Se tutti pagano le tasse, le tasse le pagano tutti
Soviet_Mario - (aka Gatto_Vizzato)

James Kuyper <jameskuyper@alumni.caltech.edu>: Nov 22 11:56PM -0500

On 11/22/19 6:25 PM, Soviet_Mario wrote:

>> uint8_t

> here the macro or typedef is

> u_int8_t actually

Everyone who's been talking with you about this subject has been talking
about std::uint8_t from <cstdint>. If you're talking about something
else, you need to identify what it is and where is it is declared.

In gcc, and in many other compilers as well, there's a command line
option (-E in gcc) that produces output after pre-processing, but before
any other of the translation phases, but with #line directives inserted
to make sure that if you compile the output from -E, any resulting error
messages will refer to the correct file.
Compile one of the modules that uses u_int8_t using -E. Search the
output for the first occurrence of u_int8_t, which should be a typedef
declaration. What file is referred to by the last #line directive
preceding that declaration?

...
>>> u_int_32 AllChannels;

>> uint_32

> maybe QT headers decorate differently

A search of the QT online reference manual for u_int8_t turns up nothing:
<https://doc.qt.io/qt-5/search-results.html?q=%22u_int8_t%22>

Nor do I get anything from u_int_32:
<https://doc.qt.io/qt-5/search-results.html?q=%22u_int_32%22>

Whatever is the source of this typedef, I don't think it's QT.

> as I said, I do not have a working complete program yet, and
> I hope to get to a scenario where this wrong situation is
> avoided at all

I think you can avoid it perfectly without doing anything at all about
it, because it can't occur. I recommend not wasting any more time
worrying about it until you can come up with a plausible reason for
doing so.

Keith Thompson <kst-u@mib.org>: Nov 22 10:04PM -0800

Soviet_Mario <SovietMario@CCCP.MIR> writes:
[...]

> without PRAGMA, do I have any explicit or implicit warranty
> that my union be 32 bit long ? If so, PRAGMA is useless. If
> not, it seems it might be useful.

Fixing the spelling of uintN_t (n=8,32) and adding a tag to the union so
you can use it to define objects:

union u {
struct {
uint8_t Red, Green, Blue, Alpha;
};
uint32_t AllChannels;
};

(You have an anonymous struct, which is standard but wasn't added until
C11, so older compilers might not support it, or might not support it
by default.)

You're guaranteed that the Red and AllChannels members start at the same
location, the very beginning of the union. You're guaranteed that Red,
Green, Blue, and Alpha are at increasing offsets in the order you
defined them.

It's very very likely that the struct and the union will both be
exactly 32 bits. A compiler is not prohibited from adding padding
between members or after the last one, but there's unlikely to be
any good reason to do so in this case. The most likely way this
could fail is if a compiler insisted on making the structure 64 bits.

You can easily add a compile-time or run-time assertion that
sizeof (struct u) == 4 && CHAR_BIT == 8
so your program won't compile, or at least won't run, in the
unlikely event that it's used with an implementation that doesn't
meet your requirements.

There is *no* guarantee about which of the four uint8_t members
correspond to which part of AllChannels. Google "endianness" if
you're not familiar with this. It's very common for Red to overlap
either the high-order byte or the low-order byte of AllChannels.
The other 22 permutations are rare or nonexistent, but Google
"PDP-11 middle-endian" for an odd case.

If you're doing bitwise operations on the entire 32-bit word without
caring about the order of the bytes, then that might not be an issue.

Attempting to use #pragma pack is very likely to do nothing *at best*.

[...]

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Soviet_Mario <SovietMario@CCCP.MIR>: Nov 23 11:12AM +0100

Il 23/11/19 05:56, James Kuyper ha scritto:

> Everyone who's been talking with you about this subject has been talking
> about std::uint8_t from <cstdint>. If you're talking about something
> else, you need to identify what it is and where is it is declared.

I'm using QT who seems to like to replicate anything its own
way even when avoiding specific Q-prefixes.

so, HERE, including
#include <sys/types.h>

I find this type (everyone might guess it is equivalent to
the other), u_int8_t, which finally stems from a typedef to
UNSIGNED CHAR.

> any other of the translation phases, but with #line directives inserted
> to make sure that if you compile the output from -E, any resulting error
> messages will refer to the correct file.

I'm not so confident with QT environment to manually set
options to the compiler as yet, I just use the ide-forms to
set or unset some warning levels (I keep enabled almost all
even if the outcome is very verbose)

> Compile one of the modules that uses u_int8_t using -E. Search the
> output for the first occurrence of u_int8_t, which should be a typedef
> declaration.

it is an unsigned char, I saw it DIRECTLY opening the
included header file (sys/types.h)

this type is also replicated in another header
<stdint.h> actually, ending up always in unsigned char

> <https://doc.qt.io/qt-5/search-results.html?q=%22u_int8_t%22>

> Nor do I get anything from u_int_32:
> <https://doc.qt.io/qt-5/search-results.html?q=%22u_int_32%22>

sorry, but not an issue.

> Whatever is the source of this typedef, I don't think it's QT.

omg ... are you saying I am just lieing for the fun of it ???
what the heck am I supposed to know about which kind of
headers has included this install ?
I am using the FREE version from a .DEB package (which in
turn launched an updated and installed tons of "kits" by
itself").
I just opened the headers and never dreamed about editing
them. Just did a search.

the header file reports
/* Copyright (C) 1991-2016 Free Software Foundation, Inc.
This file is part of the GNU C Library.

The GNU C Library is free software; you can redistribute
it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any
later version.

The GNU C Library is distributed in the hope that it
will be useful,
but WITHOUT ANY WARRANTY; without even the implied
warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */

/*
* POSIX Standard: 2.6 Primitive System Data Types
<sys/types.h>
*/

and the macros and typedefs for such sized ints seem to
apply for GCC >= 2.7 (but I might misunderstand)

#if !__GNUC_PREREQ (2, 7)

--
1) Resistere, resistere, resistere.
2) Se tutti pagano le tasse, le tasse le pagano tutti
Soviet_Mario - (aka Gatto_Vizzato)

Soviet_Mario <SovietMario@CCCP.MIR>: Nov 23 11:26AM +0100

Il 23/11/19 07:04, Keith Thompson ha scritto:
>> not, it seems it might be useful.

> Fixing the spelling of uintN_t (n=8,32) and adding a tag to the union so
> you can use it to define objects:

again ...
Keith, I did not edit the header manually, and sized ints
HERE are spelled as I wrote.

...omissis...

#ifdef __USE_MISC
/* Old compatibility names for C types. */
typedef unsigned long int ulong;
typedef unsigned short int ushort;
typedef unsigned int uint;

soft and program

Saturday, November 23, 2019

Digest for comp.lang.c++@googlegroups.com - 19 updates in 5 topics

No comments:

Blog Archive

About Me