Monday, January 6, 2020

Digest for comp.lang.c++@googlegroups.com - 22 updates in 6 topics

Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 05:39AM +0100

> all your test sizes for some data sets -- specifically when the data are
> already sorted. The factor is now smaller (no longer always greater
> than 2) but still between 1.5 and 2.
 
The discussion wasn't about presorted datasets, and that case isn't the usual one.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 05:50AM +0100

> I've found the stable_sort in the header algorithm.
> I did not fully understand its internals ...
 
stable_sort is usually mergesort because mergesort is usually
the fastest stable sort algorithm.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 06:55AM +0100

> I've found the stable_sort in the header algorithm.
> I did not fully understand its internals ...
 
As I said, stable_sort is usually mergesort.
I developed a parallel mergesort some months ago.
I stripped it down to single-threaded; here's the code:
 
#include <algorithm>    // std::copy
#include <cassert>
#include <functional>
#include <iterator>
#include <limits>       // std::numeric_limits
#include <memory>
#include <random>
#include <vector>
 
template<typename It,
         typename Cmp = std::less<typename std::iterator_traits<It>::value_type>,
         typename Allocator = std::allocator<typename std::iterator_traits<It>::value_type>>
class merge_sort
{
public:
    merge_sort( It start, It end, Cmp const &cmp = Cmp(),
                Allocator const &alloc = Allocator() );
 
private:
    Cmp m_cmp;
    template<typename UpIt, typename BufIt>
    void recursion( UpIt start, UpIt end, BufIt buf );
    template<typename UpIt, typename BufIt>
    void merge( UpIt start, BufIt leftBuf, BufIt rightBuf, BufIt bufEnd );
};
 
template<typename It, typename Cmp, typename Allocator>
merge_sort<It, Cmp, Allocator>::merge_sort( It start, It end, Cmp const &cmp,
                                            Allocator const &alloc ) :
    m_cmp( cmp )
{
    using namespace std;
    using T = typename iterator_traits<It>::value_type;
    size_t bs = 0;
    // calculate buffer-/stack-size: n for the top level plus the size
    // of the larger half at each deeper recursion level
    for( size_t split = end - start; split > 1; bs += split, split -= split / 2 );
    vector<T, Allocator> buf( bs, T(), alloc );
    recursion( start, end, buf.begin() );
}
 
template<typename It, typename Cmp, typename Allocator>
template<typename UpIt, typename BufIt>
void merge_sort<It, Cmp, Allocator>::recursion( UpIt start, UpIt end, BufIt buf )
{
    using namespace std;
    size_t n = end - start;
    if( n < 2 )
        return;
    copy( start, end, buf );
    size_t right = n / 2,
           left = n - right;
    BufIt leftBuf = buf,
          leftBufEnd = buf + left,
          rightBuf = leftBufEnd,
          bufEnd = buf + n;
    recursion( leftBuf, leftBufEnd, bufEnd );  // the halves run sequentially,
    recursion( rightBuf, bufEnd, bufEnd );     // so they can share the buffer
    merge( start, leftBuf, rightBuf, bufEnd );
}
 
template<typename It, typename Cmp, typename Allocator>
template<typename UpIt, typename BufIt>
void merge_sort<It, Cmp, Allocator>::merge( UpIt start, BufIt leftBuf,
                                            BufIt rightBuf, BufIt bufEnd )
{
    assert( leftBuf < rightBuf && rightBuf < bufEnd );
    BufIt leftBufEnd = rightBuf;
    for( UpIt wrtBack = start; ; )
        // take from the left half on ties so the sort stays stable
        if( !m_cmp( *rightBuf, *leftBuf ) )
        {
            *wrtBack++ = *leftBuf;
            if( ++leftBuf == leftBufEnd )
            {
                // faster for a small number of elements than std::copy
                do
                    *wrtBack++ = *rightBuf;
                while( ++rightBuf != bufEnd );
                return;
            }
        }
        else
        {
            *wrtBack++ = *rightBuf;
            if( ++rightBuf == bufEnd )
            {
                do
                    *wrtBack++ = *leftBuf;
                while( ++leftBuf != leftBufEnd );
                return;
            }
        }
}
 
int main()
{
    using namespace std;
    size_t const SIZE = 1'000'000;
    vector<int> vSort( SIZE, 0 );
    mt19937_64 rg;
    uniform_int_distribution<int> uid( numeric_limits<int>::min(),
                                       numeric_limits<int>::max() );
    for( int &v : vSort )
        v = uid( rg );
    merge_sort( vSort.begin(), vSort.end() );
}
 
Maybe you would be able to understand what's going on.
"Öö Tiib" <ootiib@hot.ee>: Jan 05 11:54PM -0800

On Monday, 6 January 2020 06:40:02 UTC+2, Bonita Montero wrote:
> > already sorted. The factor is now smaller (no longer always greater
> > than 2) but still between 1.5 and 2.
 
> The discussion wasn't about presorted datasets and that's not usual.
 
Already sorted or close-to-sorted data tends to be more likely than
other permutations. But the main issue is why you constantly write such
blatant falsehoods. Ben specifically wrote "with sorted data":
 
Bonita Montero wrote:
>> sort as faster (by a factor of at least 2) for all sizes tested (you
>> go up to 20). ...
 
> That's not true. You're a liar.
 
You are clearly a liar yourself; Ben did not lie.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 08:59AM +0100

>> The discussion wasn't about presorted datasets and that's not usual.
 
> Already sorted or close to sorted data tends to be more likely than
> other permutations.
 
Give me a URL to a paper that proves this.
"Öö Tiib" <ootiib@hot.ee>: Jan 06 01:02AM -0800

On Monday, 6 January 2020 09:59:18 UTC+2, Bonita Montero wrote:
 
> > Already sorted or close to sorted data tends to be more likely than
> > other permutations.
 
> Give me a URL to a paper that proves this.
 
Miserable attempt at hiding that you were caught lying noted.
 
It is quite common with real-world data that it was entered in sorted order
or that some other part of the system already sorted it.
<https://en.wikipedia.org/wiki/Sorting_algorithm#Efficient_sorts>
"Second, the algorithms often perform poorly on already sorted data or almost
sorted data – these are common in real-world data, and can be sorted in O(n)
time by appropriate algorithms."
Jorgen Grahn <grahn+nntp@snipabacken.se>: Jan 06 09:34AM

On Mon, 2020-01-06, Öö Tiib wrote:
 
>> > Already sorted or close to sorted data tends to be more likely than
>> > other permutations.
 
>> Give me a URL to a paper that proves this.
...
 
> It is just quite common with real world data that it was entered in sorted order
> or that some other part of system already sorted it.
> <https://en.wikipedia.org/wiki/Sorting_algorithm#Efficient_sorts>
 
It's rather obvious if you think about it. If a sequence of records
doesn't appear at random, it's probably going to be sorted according
to the record's primary key, because it makes sense for the data
source to keep the records in some order.
 
This has other consequences, too. A system I once worked with
accepted tens of thousands of network connections/sessions, and stored
them as a std::map. This worked fine if clients connected one by one,
but if the next gateway (there was such a thing) restarted, it
reestablished the sessions in key order, and performance suffered
because the std::map got rebalanced repeatedly. We had to add a
workaround for that (I forget what the workaround was).
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 10:36AM +0100

> "Second, the algorithms often perform poorly on already sorted data or almost
> sorted data – these are common in real-world data, and can be sorted in O(n)
> time by appropriate algorithms."
 
That's not a proof about the heuristics of the chosen sorting algorithms.
"Öö Tiib" <ootiib@hot.ee>: Jan 06 03:33AM -0800

On Monday, 6 January 2020 11:36:12 UTC+2, Bonita Montero wrote:
> > sorted data – these are common in real-world data, and can be sorted in O(n)
> > time by appropriate algorithms."
 
> That's not a proof on the heuristics of chosen sorting algorithms.
 
So what? As a system designer you should have a good idea what kind of
data, from what kinds of sources, your system processes. So asking for
formal statistical proofs indicates cluelessness.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 12:59PM +0100


> So what? As system designer you should have good idea what kind of
> data from what kinds of sources it processes. So asking for formal
> statistical proofs indicates cluenesness.
 
Imagining the purposes for which one or the other algorithm might be
suitable doesn't tell you how often each is actually needed in real
projects. So you're the clueless one here, since you suggest it is
possible to find this out the way you describe.
Ben Bacarisse <ben.usenet@bsb.me.uk>: Jan 06 10:54PM

>> already sorted. The factor is now smaller (no longer always greater
>> than 2) but still between 1.5 and 2.
 
> The discussion wasn't about presorted datasets and that's not usual.
 
You made the discussion about (a) a sweeping statement of yours that was
untrue, and (b) your calling me a liar for making a true statement.
 
--
Ben.
Frederick Gotham <cauldwell.thomas@gmail.com>: Jan 06 07:09AM -0800

Let's say my library is one source file and one header file. It contains one global variable with external linkage, and one function with external linkage, as follows:
 
// Contents of gotham.hpp
 
extern int Func(int);
 
extern int g_index;
 
// Contents of gotham.cpp
 
int g_index = 2;
 
int Func(int const i)
{
return i / 2;
}
 
I can combine this library into one header file like this:
 
// Contents of gotham.hpp
 
inline int &Func_Containing_gIndex(void)
{
static int s_index = 2;
 
return g_index;
}
 
static int &g_index = Func_Containing_gIndex();
 
inline int Func(int const i)
{
return i / 2;
}
 
To what extent can any library be reduced to one header file? One thing I see so far: This won't work if either the value of "g_index" or the address of "g_index" is required as a constexpr.
Frederick Gotham <cauldwell.thomas@gmail.com>: Jan 06 07:47AM -0800

On Monday, January 6, 2020 at 3:09:45 PM UTC, Frederick Gotham wrote:
 
> static int s_index = 2;
 
> return g_index;
> }
 
 
That should be:
 
return s_index;
"Öö Tiib" <ootiib@hot.ee>: Jan 06 08:26AM -0800

On Monday, 6 January 2020 17:09:45 UTC+2, Frederick Gotham wrote:
 
> To what extent can any library be reduced to one header file?
 
Fully. As of C++17, "the inline specifier can be applied to variables as
well as to functions."
And inline only means that it may be defined in a header without ODR
(one definition rule) violations.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jan 06 08:13PM +0100

On 06.01.2020 16:09, Frederick Gotham wrote:
 
> static int &g_index = Func_Containing_gIndex();
 
> inline int Func(int const i) { return i / 2; }
 
> To what extent can any library be reduced to one header file?
 
To the extent that its implementation doesn't have to use dirty headers
(e.g. `<windows.h>` comes to mind) that one doesn't want to expose, that
one doesn't need the possibility to distribute updates to object code
without client code having to be recompiled, to the extent that company
or project rules permit header-only modules, and to the extent that one
doesn't need to keep the source unavailable in order to protect
intellectual property rights.
 
I grabbed some of that wording from <url:
https://accu.org/index.php/journals/482>, an old article that is
interesting now precisely because some of the issues discussed in
earnest then are now so irrelevant, i.e. it shows how we have progressed.
 
Anyway, in some cases one can work around the header dependency issue.
E.g. for the mentioned header, with 64-bit programming it's practically
doable, but inconvenient, to just declare in one's own headers the API
functions that one uses. Since there is a single common machine code
level calling convention, and linkers are smarter now than 20 years ago,
there is little or no chance of one's tools fouling it up, so I've done
it a number of times -- but only in exploratory, experimental code.
 
 
> One thing I see so far: This won't work if either the value of
> "g_index" or the address of "g_index" is required as a constexpr.
 
But then neither would the original.
 
 
- Alf
Robert Wessel <robertwessel2@yahoo.com>: Jan 05 09:07PM -0600

On Sun, 5 Jan 2020 20:11:17 +0100, David Brown
>less used, and any attempts will be very biased based on the area of
>experience of the guesser. With that proviso, I'd be surprised if
>complex types really were used more than VLAs.
 
 
I'm not going to take much of a position on the usage of the native C
complex type; given the slow pickup of C99, it never really got a huge
amount of traction, but it is there. On the other hand, plenty of
folks have done complex arithmetic in C by rolling their own
functions.
 
 
>correct that you need to be careful about stack use before using them,
>you don't need more care than you should be taking for any other kind of
>non-variable local array, or for any other kind of memory allocation.
 
 
For very small systems, especially ones where recursive calls are also
difficult to support, VLAs may well present some challenges.
 
In the more ordinary case, I think VLAs do present an additional
challenge to stack management, since the frame size required by a
function is now at the mercy of its callers. That at least
theoretically creates some new paths for storage overlays and the
like.
 
I'm not sure it's as big a deal as some people made it out to be, but
there are a few new pitfalls there.
 
 
>going to make a compiler that is C11 compliant but not C99 compliant?
>Who is going to be capable of making a C11 compiler but find supporting
>VLA's or complex numbers a major issue?
 
 
My point exactly. I don't even understand why complex was optional in
C99 for freestanding implementations.
 
 
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jan 05 08:40PM -0800

> On Sun, 5 Jan 2020 20:11:17 +0100, David Brown
> <david.brown@hesbynett.no> wrote:
[...]
>>VLA's or complex numbers a major issue?
 
> My point exactly. I don't even understand why complex was optional in
> C99 for freestanding implementations.
 
Freestanding implementations don't have to provide library functions,
and complex numbers aren't very useful without the functions declared in
<complex.h> such as creal() and cimag().
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
[Note updated email address]
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
Jorgen Grahn <grahn+nntp@snipabacken.se>: Jan 06 06:39PM

On Sun, 2020-01-05, Richard Damon wrote:
> On 12/31/19 12:46 PM, Bonita Montero wrote:
...
> #if 0
>

No comments: