Monday, November 29, 2021

Digest for comp.lang.c++@googlegroups.com - 5 updates in 2 topics

Bonita Montero <Bonita.Montero@gmail.com>: Nov 29 09:26PM +0100

Why is sorting a pointer-array to strings slower than sorting the
strings themselves? I've checked that with this code:
 
#include <iostream>
#include <vector>
#include <fstream>
#include <string>
#include <algorithm>
#include <random>
#include <concepts>
#include <chrono>
#include <execution>
#include <cstdlib>
 
using namespace std;
using namespace chrono;
 
int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    using vc_t = vector<char>;
    using vc_it = vc_t::iterator;
    vector<char> block;
    {
        // slurp the whole input file into block
        ifstream ifs;
        ifs.exceptions( ios_base::badbit );
        ifs.open( argv[1], ios_base::binary );
        ifs.seekg( 0, ios_base::end );
        streampos size = ifs.tellg();
        if( size > (size_t)-1 || !size )
            return EXIT_FAILURE;
        ifs.seekg( 0, ios_base::beg );
        block.resize( (size_t)size );
        ifs.read( &*block.begin(), (streamsize)size );
    }
    cout << "read" << endl;
    // hand every whitespace-delimited token to callback, return the count
    auto parse = [&]<typename Callback>( Callback callback ) -> size_t
        requires requires( Callback callback, char const *str, size_t len )
        {
            { callback( str, len ) };
        }
    {
        size_t n = 0;
        for( vc_it cit = block.begin(); cit != block.end(); )
        {
            vc_it lineBegin = cit;
            for( ; cit != block.end() && (unsigned char)*cit > ' '; ++cit );
            if( cit != lineBegin )
                ++n, callback( &*lineBegin, cit - lineBegin );
            for( ; cit != block.end() && (unsigned char)*cit <= ' '; ++cit );
        }
        return n;
    };
    vector<string> strings;
    size_t sumLen = 0;
    // first pass: count the tokens so we can reserve exactly
    strings.reserve( parse( [&]( char const *, size_t len ) { sumLen += len; } ) );
    if( !strings.capacity() )
        return EXIT_FAILURE;
    cout << "counted, avg length: "
         << (double)(ptrdiff_t)sumLen / (ptrdiff_t)strings.capacity() << endl;
    // second pass: materialize the strings
    parse( [&]( char const *str, size_t len ) { strings.emplace_back( str, len ); } );
    cout << "parsed" << endl;
    sort( strings.begin(), strings.end() );
    strings.erase( unique( strings.begin(), strings.end() ), strings.end() );
    cout << "uniqued" << endl;
    strings.shrink_to_fit();
    // release the file buffer; reserve(0) would never shrink,
    // so request deallocation explicitly
    block.clear(), block.shrink_to_fit();
    // shuffle with a fixed seed so both runs see the same permutation
    auto randomize = []<typename RandomIt>( RandomIt begin, size_t n )
        requires random_access_iterator<RandomIt>
    {
        mt19937_64 mt;
        uniform_int_distribution<size_t> uidIdx( 0, n - 1 );
        for( size_t i = 0, j; i != n; ++i )
        {
            while( (j = uidIdx( mt )) == i ); // pick a different index (needs n > 1)
            swap( begin[i], begin[j] );
        }
    };
    randomize( strings.begin(), strings.size() );
    auto start = high_resolution_clock::now();
    sort( strings.begin(), strings.end() );
    double ns = (double)(int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (ptrdiff_t)strings.size();
    cout << "string-sort: " << ns << endl;
    vector<string *> pStrings;
    pStrings.reserve( strings.size() );
    for( string &str : strings )
        pStrings.emplace_back( &str );
    randomize( pStrings.begin(), pStrings.size() );
    start = high_resolution_clock::now();
    sort( pStrings.begin(), pStrings.end(),
          []( string *left, string *right ) { return *left < *right; } );
    ns = (double)(int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (ptrdiff_t)strings.size();
    cout << "pString-sort: " << ns << endl;
}
 
I've read 1.4 million strings with an average length of 9 characters,
so most strings should fall under the short string optimization,
and swapping the string objects should therefore have a higher cost
than swapping two pointers or a pointer and a length. And yet sorting
string pointers is nearly twice as slow as sorting the string objects.
Why is that?
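
A quick, implementation-specific way to check the SSO premise
(illustrative sketch, not from the post): on the usual implementations
a short string is stored inline exactly when its data() pointer points
into the string object's own footprint.

    #include <iostream>
    #include <string>

    // Heuristic: with SSO the character buffer lives inside the string
    // object itself, so data() points into the object's own bytes.
    static bool storedInline( std::string const &s )
    {
        char const *p = s.data();
        char const *obj = reinterpret_cast<char const *>( &s );
        return p >= obj && p < obj + sizeof(std::string);
    }

    int main()
    {
        std::string s( "123456789" ); // 9 characters, the average length here
        std::cout << "sizeof(std::string): " << sizeof(std::string) << "\n";
        std::cout << "9-char string inline: " << storedInline( s ) << "\n";
    }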
Marcel Mueller <news.5.maazl@spamgourmet.org>: Nov 29 10:54PM +0100

Am 29.11.21 um 21:26 schrieb Bonita Montero:
> Why is sorting a pointer-array to strings slower than sorting the
> strings themselves? I've checked that with this code:
[...]
> so most strings should fall under the short string optimization,
> and swapping the string objects should therefore have a higher cost
> than swapping two pointers or a pointer and a length.
 
Swapping a few bytes is not the performance killer; cache misses are.
Every time you touch a memory address that is not in the cache, you pay
the full RAM latency, and that latency has not improved significantly
in the last two decades.
 
> And yet sorting string pointers is nearly twice as slow as sorting
> the string objects. Why is that?
 
Using pointers touches more distinct memory locations: the pointer
array and the string storage it points to. With an array that large,
almost every (random) access is a cache miss.
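
Schematically, the two comparators touch memory like this (a sketch of
the access patterns, not code from the thread):

    #include <algorithm>
    #include <string>
    #include <vector>

    // Direct sort: with SSO the compared characters live inside the
    // string objects, which sit contiguously in the vector's single
    // allocation.
    void sortDirect( std::vector<std::string> &v )
    {
        std::sort( v.begin(), v.end() );
    }

    // Indirect sort: every comparison first loads two pointers from the
    // array, then dereferences both into unrelated locations, adding two
    // potential cache misses per comparison.
    void sortIndirect( std::vector<std::string *> &p )
    {
        std::sort( p.begin(), p.end(),
            []( std::string const *l, std::string const *r )
            { return *l < *r; } );
    }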
 
 
Marcel
Juha Nieminen <nospam@thanks.invalid>: Nov 29 05:57AM

> Assuming that a three-way comparison is often a bit more expensive, it
> might even be less efficient for some collections.
 
And not only does it require a slower three-way comparison, it also uses
two conditionals instead of the one that std::binary_search/std::lower_bound
uses. Depending on the situation, the additional conditional may actually
make the code slightly slower, especially since the CPU will likely
mispredict it in about half the iterations, on average. Unpredictable
conditionals are performance killers.
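
For contrast, the two loop shapes look roughly like this; a schematic
sketch assuming a sorted array (the names are illustrative, not from
any library):

    #include <cstddef>

    // Classic lower_bound shape: besides the loop test there is a single
    // data-dependent branch per iteration.
    template<typename T>
    std::size_t lowerBoundIdx( T const *a, std::size_t n, T const &key )
    {
        std::size_t lo = 0;
        while( n > 0 )
        {
            std::size_t half = n / 2;
            if( a[lo + half] < key )   // the one conditional
                lo += half + 1, n -= half + 1;
            else
                n = half;
        }
        return lo;
    }

    // Three-way shape: an extra == test per iteration, the additional
    // conditional referred to above.
    template<typename T>
    std::size_t threeWayIdx( T const *a, std::size_t n, T const &key,
                             bool &found )
    {
        std::size_t lo = 0, hi = n;
        while( lo < hi )
        {
            std::size_t mid = lo + (hi - lo) / 2;
            if( a[mid] == key ) { found = true; return mid; }
            if( a[mid] < key )
                lo = mid + 1;
            else
                hi = mid;
        }
        found = false;
        return lo;
    }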
Bonita Montero <Bonita.Montero@gmail.com>: Nov 29 08:26AM +0100

Am 29.11.2021 um 06:57 schrieb Juha Nieminen:
 
> And not only does it require a slower three-way comparison, ...
 
Check the generated code: depending on the type, the result of the
three-way comparison is already in the flags.
 
> ..., it also uses two conditionals instead of the one that
> std::binary_search/std::lower_bound uses.
 
 
The ==-conditional is predictable almost every time, since it matches
at most once per search. But you're right: in most cases this code is
slower than with lower_bound.
However, I've developed a lower_bound alternative which is slightly
faster than std::lower_bound itself:
 
#include <cassert>
#include <compare>
#include <concepts>
#include <cstddef>
#include <iterator>

template<typename RandomIt, typename T, typename Pred>
inline RandomIt xlower_bound( RandomIt begin, RandomIt end, T const &key, Pred pred )
    requires std::random_access_iterator<RandomIt>
        && requires( Pred pred, decltype(*RandomIt()) &elem, T const &key )
        {
            { pred( elem, key ) } -> std::convertible_to<std::strong_ordering>;
        }
{
    using namespace std;
    assert( end >= begin );
    // hit tracks the leftmost element that compared >= key; it starts at
    // end - begin, so "no such element" falls out as end automatically
    size_t hit = end - begin, lower = 0, upper = hit, mid;
    while( lower != upper )
    {
        mid = lower + (upper - lower) / 2;
        if( pred( begin[mid], key ) >= 0 )
            hit = mid,
            upper = mid;
        else
            lower = mid + 1;
    }
    return begin + hit;
}
 
On a vector of 1.4 million string_view objects I get about 3% more
performance on my computer with clang 13.
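
A minimal usage sketch for the xlower_bound above, assuming a sorted
vector<string_view>; the helper contains() and its comparator are
illustrative, not from the thread:

    #include <compare>
    #include <string_view>
    #include <vector>

    // Hypothetical helper: membership test on a sorted
    // vector<string_view> via the xlower_bound template above.
    bool contains( std::vector<std::string_view> const &words,
                   std::string_view key )
    {
        auto it = xlower_bound( words.begin(), words.end(), key,
            []( std::string_view const &elem, std::string_view const &k )
                -> std::strong_ordering { return elem <=> k; } );
        return it != words.end() && *it == key;
    }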
Bonita Montero <Bonita.Montero@gmail.com>: Nov 29 02:02PM +0100

This is a small benchmark:
 
auto bench = [&]<typename SearchFn>( SearchFn searchFn ) -> double
    requires requires( SearchFn searchFn, vsv_it iterator, string_view const &key )
    {
        { searchFn( iterator, iterator, key ) } -> convertible_to<vsv_it>;
    }
{
    size_t sumExist = 0;
    auto start = high_resolution_clock::now();
    for( size_t it = ITERATIONS; it; --it )
        sumExist += searchFn( strings.begin(), strings.end(),
                              strings[uidIndex( mt )] ) != strings.end();
    double ns = (double)(int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / ITERATIONS;
    aSumExist += sumExist;
    return ns;
};
double lb = bench( []( vsv_it begin, vsv_it end, string_view const &key ) -> vsv_it
{
    vsv_it it = lower_bound( begin, end, key );
    if( it == end || *it != key )
        return end;
    return it;
} );
cout << "lb: " << lb << endl;
double xlb = bench( []( vsv_it begin, vsv_it end, string_view const &key ) -> vsv_it
{
    vsv_it it = xlower_bound( begin, end, key,
        []( string_view &elem, string_view const &key ) -> strong_ordering
        { return elem <=> key; } );
    if( it == end || *it != key )
        return end;
    return it;
} );
cout << "xlb: " << xlb << endl;
double xbs = bench( []( vsv_it begin, vsv_it end, string_view const &key )
{
    return xbinary_search( begin, end, key,
        []( string_view &elem, string_view const &key ) -> strong_ordering
        { return elem <=> key; } );
} );
cout << "xbs: " << xbs << endl;
 
vsv_it is vector<string_view>::iterator
 
14,442,073 sorted string_view items, average length: 8.68825 characters
lower_bound: 1835.47ns
xlower_bound: 1741.33ns
xbinary_search: 1845.06ns
 
I think with such a large dataset the running time is determined not
by the algorithm but by the random memory accesses.
