soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

what is the fastest way to store a list of unique pointers ? - 13 Updates
Anyway to uuencode and uudecode a string in C++ - 3 Updates
what is the fastest way to store a list of unique pointers ? - 3 Updates
Visual C++ 2008 - Using a pure virtual destructor. Any side effects I should be aware of ? - 4 Updates

what is the fastest way to store a list of unique pointers ?

Manfred <invalid@invalid.add>: Aug 12 01:38AM +0200

On 08/12/2017 01:27 AM, Alf P. Steinbach wrote:

>> I think that std::set is the closest to the current behaviour
>> (std::map using only keys).

> Can't say that I understand what you're thinking of here.
keys in a std::map are ordered, aren't they?
Since the OP is not using the map values, but only the keys, then I
think that such usage of the map is practically equivalent to using a
std::set.

>> And I think it is faster than unordered_set too, although I have to
>> say I did not check.

> Not this either. Since when did hashing become slower than a tree search?
You may be right here, as I said, I did not check.

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 12 03:37AM +0200

On 12.08.2017 01:38, Manfred wrote:
>>> (std::map using only keys).

>> Can't say that I understand what you're thinking of here.
> keys in a std::map are ordered, aren't they?

Oh. Well consider these points:

• The order of raw pointers is pretty arbitrary.

• `std::map` was not chosen for its good fit to the problem: they're not
using the mapping. I.e. there is AFAICS no reason to believe the order
is significant even if the keys were of a kind with a natural order.

> Since the OP is not using the map values, but only the keys, then I
> think that such usage of the map is practically equivalent to using a
> std::set.

Yes.

>>> say I did not check.

>> Not this either. Since when did hashing become slower than a tree search?
> You may be right here, as I said, I did not check.

An implementation of `unordered_set` can be slower than `set` for a
given use case, but it's unlikely.

The reason is that a search in a tree structure, while logarithmic, is
on average much slower than an on average constant time lookup.

Cheers & hth.,

- Alf

Marcel Mueller <news.5.maazl@spamgourmet.org>: Aug 12 08:44AM +0200

On 12.08.17 00.11, Lynn McGuire wrote:
> What is the fastest way to store a list of unique pointers ?

Fastest for scalable sets are in general hash sets. In C++ they are
implemented by unordered_set<>.

set<> itself uses Red-Black-Tree or something similar. Although the
complexity is O(log n) this is in practice amazingly inefficient because
of the large memory overhead (in case of small content) and the bad
locality (degrading cache efficiency). I almost never use this
containers (including map<>).

A reasonable compromise to the ordered standard containers are B-Trees.
Almost every database uses them. Unfortunately there is still no B-Tree
implementation in the standard.

> We were storing a list of unique pointers in a std::vector list. But
> searching it is slow before deciding to add a new pointer.

This is the worst choice. You ended up with complexity O(n²).

> always "add" the pointer to the list but never have to look it up. But,
> the resulting value is always just set to 1. The resulting std::map
> uses about 10% of the time that the std::vector did.

Use unordered_set<> as there is no need to keep the pointers in some
particular order as well as there is no need for an associative container.

Marcel

Melzzzzz <Melzzzzz@zzzzz.com>: Aug 12 06:57AM

> A reasonable compromise to the ordered standard containers are B-Trees.
> Almost every database uses them. Unfortunately there is still no B-Tree
> implementation in the standard.

There is nothing that prevents implementing std::map/set as btree.

--
press any key to continue or any other to quit...

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Aug 12 10:29AM +0100

On 12/08/2017 07:44, Marcel Mueller wrote:
> of the large memory overhead (in case of small content) and the bad
> locality (degrading cache efficiency). I almost never use this
> containers (including map<>).

One can use a pool allocator with set/map to improve allocation speed
and cache locality; boost has fast pool allocator for this purpose.

/Flibble

"Öö Tiib" <ootiib@hot.ee>: Aug 12 03:23AM -0700

On Saturday, 12 August 2017 12:29:40 UTC+3, Mr Flibble wrote:
> > containers (including map<>).

> One can use a pool allocator with set/map to improve allocation speed
> and cache locality; boost has fast pool allocator for this purpose.

Yes, but this tells us that for example when we have 42 000 'Growl*' some
of what share value then you suggest that set with pool allocator like
'std::set<Growl*, std::less<Growl*>, boost::fast_pool_allocator<Growl*>>'
is faster to fill with those than just 'std::set<Growl*>'.
The real question (that OP seemingly had) is if it is also faster
than 'std::unordered_set<Growl*>'.

Manfred <noname@invalid.add>: Aug 12 01:52PM +0200

On 08/12/2017 03:37 AM, Alf P. Steinbach wrote:

> • `std::map` was not chosen for its good fit to the problem: they're not
> using the mapping. I.e. there is AFAICS no reason to believe the order
> is significant even if the keys were of a kind with a natural order.

Whatever the ordering for raw pointers, still std::map keys are ordered
- their intent has been since the beginning of time that of fast search.
I agree with you that the OP is misusing it, but, with respect to fast
lookup, the choice makes sense.

> given use case, but it's unlikely.

> The reason is that a search in a tree structure, while logarithmic, is
> on average much slower than an on average constant time lookup.

That is typically true for large sets. Thanks for the clarification.

Manfred <noname@invalid.add>: Aug 12 04:04PM +0200

On 08/12/2017 03:40 PM, Stefan Ram wrote:
> vector, because the implementations of the sets are so slow.
> But my measurements seem to confirm the naïve assumption:
> That unordered_set is fastest.

This obviously depends on the usage of the container: for the general
case of random or sequential access, then vector wins.
When it comes to more complex operations like searching, then it is
understandable that other types of container work better.

What Bjarne often shows is that in unexpected cases, like insertion,
that list would naively be better equipped for, actually vector wins -
so the recommendation is to use vector as a default, but this does not
mean always.

Marcel Mueller <news.5.maazl@spamgourmet.org>: Aug 12 04:26PM +0200

On 12.08.17 08.57, Melzzzzz wrote:
>> Almost every database uses them. Unfortunately there is still no B-Tree
>> implementation in the standard.

> There is nothing that prevents implementing std::map/set as btree.

NACK!

The guarantee that iterators to untouched elements remain valid on
insert/delete makes a reasonable B-Tree implementation impossible.

Marcel

Marcel Mueller <news.5.maazl@spamgourmet.org>: Aug 12 04:26PM +0200

On 12.08.17 11.29, Mr Flibble wrote:
>> containers (including map<>).

> One can use a pool allocator with set/map to improve allocation speed
> and cache locality; boost has fast pool allocator for this purpose.

In this case only the memory overhead of roughly one order of magnitude
(for small objects) remains.
And well, if I always have to use a pool allocator to keep the
performance reasonable the question arises whether it is the right
container.
Since I usually do not place large objects into a container std::set and
std::map is of no particular use for me.

Marcel

Melzzzzz <Melzzzzz@zzzzz.com>: Aug 12 03:21PM

> NACK!

> The guarantee that iterators to untouched elements remain valid on
> insert/delete makes a reasonable B-Tree implementation impossible.

Well, that makes restrictions to implementations, for sure.
I tested BTree implementation in Rust against my AVL tree .
Iteration through btree is order of magnitude faster then binary tree.
Insertion and deletion is about twice as fast in btree.
That is just because btree is cache friendly.

--
press any key to continue or any other to quit...

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Aug 12 06:33PM +0100

On 12/08/2017 15:26, Marcel Mueller wrote:
> container.
> Since I usually do not place large objects into a container std::set and
> std::map is of no particular use for me.

Anyone who claims that std::set and std::map is no use to them is
obviously Doing It Wrong (TM). They are essential containers and should
be used when it is appropriate to use them. If you don't know when it
is appropriate to use them or you think that it is never appropriate to
use them then you are newb.

/Flibble

Paavo Helde <myfirstname@osa.pri.ee>: Aug 13 12:10AM +0300

On 12.08.2017 19:56, Stefan Ram wrote:
> retrieving ::std::set 713,790.944

> . This means that when we actually find some numbers
> again, now ::std::unordered_set is fastest.

It appears that your definition of "sorted_vector" is useless (a poor
implementation of std::set). A proper "sorted_vector" is what you fill
by e.g. push_back() first, then sort once and keep (at least mostly)
constant afterwards. This should get rid of the apparent 100x overhead
in filling it and make the numbers more comparable.

This is not to say that it might beat std::unordered_set, the latter may
still outwin because it does not have to keep the data in a given order.

Cheers
Paavo

Anyway to uuencode and uudecode a string in C++

kushal bhattacharya <bhattacharya.kushal4@gmail.com>: Aug 11 09:54PM -0700

On Saturday, August 12, 2017 at 12:46:54 AM UTC+5:30, Paavo Helde wrote:
> UTF-8 charset declaration.

> hth
> Paavo

as in the base 64 spec it states that the encoding format is used for transport preservation purposes used mainly in mail systems

Jorgen Grahn <grahn+nntp@snipabacken.se>: Aug 12 07:14AM

On Fri, 2017-08-11, Ian Collins wrote:
> On 08/11/17 05:43 PM, kushal bhattacharya wrote:
...
>> socket too.So thats whyi would prefer some good encoding mechanism
>> here

> Sockets are typically used to transfer binary data,

That could be misinterpreted; lots of common protocols use text.
But yes, e.g. a TCP socket has always been two streams of bytes.

> why do you see special characters as a problem?

I think his problem is really about protocol design. If he wants to
do more than send a single string over a single TCP socket, with no
verification, then he has to do something, e.g. send a content-length
like HTTP does, or define an end marker (and escape it in the data)
like SMTP does.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paavo Helde <myfirstname@osa.pri.ee>: Aug 12 11:46PM +0300

On 12.08.2017 7:54, kushal bhattacharya wrote:

>> hth
>> Paavo

> as in the base 64 spec it states that the encoding format is used for transport preservation purposes used mainly in mail systems

Yes, for transport via e-mail systems ca 30 years ago. So tell me again
why your C++ data is sent to PHP via 30-year old e-mail systems?

Cheers
Paavo

what is the fastest way to store a list of unique pointers ?

ram@zedat.fu-berlin.de (Stefan Ram): Aug 12 01:18AM

>The first idea that would spring to my mind would be a
>sorted vector with a binary search, because it should

I measured the runtime of this loop:

for( unsigned long l = 0; l < 10'000'000; ++l )
container.insert( rand() );

. unordered_set was the fastest and set the slowest
with my custom "sorted_vector" in between:

std::unordered_set<unsigned long> 337,000.000
sorted_vector<unsigned long> 1'294,000.000
std::set<unsigned long> 2'024,000.000

. Did not measure retrieval because OP asked for "store".

(It is possible that some noise have been added to
the times and/or all times were multiplied with the
same factor to protect my privacy.)

ram@zedat.fu-berlin.de (Stefan Ram): Aug 12 01:40PM

>std::unordered_set<unsigned long> 337,000.000
>sorted_vector<unsigned long> 1'294,000.000
>std::set<unsigned long> 2'024,000.000

Above, one can read the times for inserts.

Today, I measured the times for retrievals after
the container was filled with 10'000'000 random
numbers using:

::timer t; decltype(s)::const_iterator const e = s.end(); int i;

for( unsigned long l = 0; l < 10'000'000; ++l )
i = i +( e == s.find( rand() ));

t.print(); escape( &i );

. And the results were very similar to the results
for inserts.

Since parts of the vector have to be shifted during
inserts, I would have expected lookups on the
sorted_vector to be much faster than inserts, but
this was not observable (they were only slightly
faster).

Both Herb and Chandler (and to some extend also Bjarne)
suggest, as far as I have understood them, to always use a
vector, because the implementations of the sets are so slow.
But my measurements seem to confirm the naīve assumption:
That unordered_set is fastest.

ram@zedat.fu-berlin.de (Stefan Ram): Aug 12 04:56PM

>case of random or sequential access, then vector wins.
>When it comes to more complex operations like searching, then it is
>understandable that other types of container work better.

You are right. In my case, I used »random()« but I was not
aware of RAND_MAX still being 32767 or so and therefore,
the final section of the sorted_vector rarely got shifted.

I now use a random-number generator that generates random
numbers up to ::std::numerics_limits<unsigned long>::max()
or so and get these results:

filling ::sorted_vector 79'893,747.427
retrieving ::sorted_vector 238,574.119
filling ::std::unordered_set 360,994.757
retrieving ::std::unordered_set 248,835.371
filling ::std::set 741,265.933
retrieving ::std::set 722,563.192

. Here are the results with »::randomize( 0 );« added
to find the same numbers again:

filling ::sorted_vector 83'803,941.903
retrieving ::sorted_vector 266,760.153
filling ::std::unordered_set 362,649.427
retrieving ::std::unordered_set 161,849.880
filling ::std::set 754,459.535
retrieving ::std::set 713,790.944

. This means that when we actually find some numbers
again, now ::std::unordered_set is fastest.

There also might have been another error in my code
yesterday, because I reported different results yesterday.

My source code is not cleaned-up, but just so that you can
check my results (estimated runtime between 1 and 20 minutes):

#include <initializer_list>
#include <chrono>
#include <functional>
#include <iostream>
#include <vector>
#include <unordered_set>
#include <set>
#include <algorithm>
#include <climits>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <ios>
#include <random>

using namespace ::std::literals;

static void escape( volatile void * p )
{ asm volatile( "" : : "g"(p) : "memory" ); }

static void escape( void * p ) /* by Chandler Carruth */
{ asm volatile( "" : : "g"(p) : "memory" ); }

template< typename T >
::std::string format( T value )
{ double privacy_factor = 1.0; /* default: 1.0 */
double privacy_noise = 0.0; /* default: 0.0 */
::std::string s; int i = 0; int n = 0;
long double v = value;
v = v * privacy_factor;
v = v *( 1.0 + privacy_noise * ( rand() /( 1.0 + RAND_MAX )- 0.5 ));
value = v;
if( value )
while( value )
{ if( i &&( i % 3 == 0 ))
{ s = " .,';:-=%#$"[ i/3 ] + s; ++n; }
/* above there is a bug: i/3 might be larger than the string */
++i; ++n;
s = "0123456789"[ value % 10 ] + s;
value /= 10; }
else s = "0";
while( n++ < 20 )s = " " + s;
return s; }

struct timer
{ using timepoint = ::std::chrono::high_resolution_clock::time_point;
::timer::timepoint t1;
::timer::timepoint t2;
decltype( ::timer::t2 - ::timer::t1 )sum{};
timer(){ this->reset(); }
void reset()
{ this->t1 = ::std::chrono::high_resolution_clock::now();
this->t2 = t1;
sum = t2 - t1; }
void print( ::std::string const s, int const text_width, int const number_width )
{ this->t2 = ::std::chrono::high_resolution_clock::now();
this->sum += this->t2 - this->t1;
::std::cout << ::std::setw( text_width )<< ::std::left << s << ::std::setw( number_width )<< ::std::right << format(( sum ).count()) << '\n'; }};

template <class T>
struct sorted_vector
{ ::std::vector< T >vector;
using iterator = typename ::std::vector< T >::iterator ;
using const_iterator = typename ::std::vector< T >::const_iterator ;
iterator begin() { return vector.begin(); }
iterator end() { return vector.end(); }
const_iterator begin() const { return vector.begin(); }
const_iterator end() const { return vector.end(); }
explicit sorted_vector(): vector{} { }
iterator insert( const T & t )
{ iterator i = lower_bound( begin(), end(), t );
if( i == end() || t < *i )
{ vector.insert( i, t ); }
return i; }
const_iterator find( const T& t ) const
{ const_iterator i = lower_bound( begin(), end(), t );
return i == end() || t < *i ? end() : i; }};

auto const seed
{ ::std::chrono::system_clock::now().time_since_epoch().count() };

auto linear_congruential_engine
{ ::std::linear_congruential_engine< unsigned long, 40014uL,0uL,2147483563uL >( seed ) };

auto dice_distribution
{ ::std::uniform_int_distribution< unsigned long >{ 0, ULONG_MAX }};

void randomize( unsigned long const value )
{ linear_congruential_engine.seed( value ); }

unsigned long dice()
{ return dice_distribution( linear_congruential_engine ); }

template< typename T >
static inline void fill( T & container )
{ ::randomize( 0 );
for( unsigned long l = 0; l < 1'000'000; ++l )container.insert( dice() );
escape( &container ); }

template< typename T >
static inline void retrieve( T const & container )
{ typename T::const_iterator const e = container.end();
int i = 0;
::randomize( 0 );
for( unsigned long l = 0; l < 1'000'000; ++l )i = i+( e == container.find( dice() ));
escape( &i ); }

template< typename T >
static inline void measure( ::std::string const name, int const text_width, int const number_width )
{ ::timer timer;
T container;
timer.reset();
fill( container );
timer.print( "filling "s + name, text_width, number_width );
timer.reset(); retrieve( container ); timer.print( "retrieving "s + name, text_width, number_width ); }

void measurement()
{ int const text_width = 32; int const number_width = 14;
measure< ::sorted_vector< unsigned long >>( "::sorted_vector", text_width, number_width );
measure< ::std::unordered_set< unsigned long >>( "::std::unordered_set", text_width, number_width );
measure< ::std::set< unsigned long >>( "::std::set", text_width, number_width );
::std::cout << '\n'; }

int main()
{ for( int i = 0; i < 3; ++i )measurement(); }

Visual C++ 2008 - Using a pure virtual destructor. Any side effects I should be aware of ?

Manfred <invalid@invalid.add>: Aug 12 01:30AM +0200

On 08/11/2017 05:42 PM, R.Wieser wrote:
> I'm calling an external (to the VC++ program) object (stored in a
> dynamically loaded DLL) which methods are based on the COM model, meaning
> that the first argument is supposed to be a 'this' reference.

I suspect that you are a bit misled about what you are trying to do -
just a suspect as you did not post any code sample.
I suggest you post some sample of both ends of the interface you are
trying to work with.

The COM model you mention has a number of layers, most of which are
Microsoft-specific, and your reference about the 'this' pointer (not
reference) looks like there is some misunderstanding going on here.

"R.Wieser" <address@not.available>: Aug 12 11:52AM +0200

Kalle,

> I meant, the C++ standard does not define the format or
> existence of vtables,

Ah, yes. I found that rather odd: the mechanism by which virtual methods
interface with the actual class do not seem to be described/defined.

But even as that is so, the mechanism is proven to work on both windows as
well as Linux (where the programs external object is placed in an .so file),
and has been doing so for at least the last decade.

So the VTable mechanism (even if not named as such) is not quite as unstable
(prone to changes for whatever reason at whatever time) as it might appear
at first glance ... :-)

But yes, that concern is still lingering in the back of my mind. Thanks
for reminding me. :-( :-)

Regards,
Rudy Wieser

-- Origional message:
Kalle Olavi Niemitalo <kon@iki.fi> schreef in berichtnieuws
87zib5q3x0.fsf@Niukka.kon.iki.fi...

"R.Wieser" <address@not.available>: Aug 12 12:32PM +0200

Manfred,

> I suspect that you are a bit misled about what you are trying
> to do.

I'm listening, please continue.

But I don't think you quite understood the situation here: The mechanism
*already works*. I'm just trying to "neat it up" a bit.

My current thread is about if I can use destructor declared as pure virtual
to call the destructor method in the external object. As it turns out, I
can't use "__stdcall" when declaring the classes virtual destructor *, as it
does not cause the "this" reference/pointer to be transferred as the first
argument.
No problem though, I'm just keep using what I already got. :-)

*actualy I can, it just doesn't do anything there. :-(

> I suggest you post some sample of both ends of the interface
> you are trying to work with.

You already got the vc++ end. Anything wrong with it ? How ?

> and your reference about the 'this' pointer (not reference) looks
> like there is some misunderstanding going on here.

Poteto, potato. I think you will be able to understand from the example I
posted in my initial message in the previous thread how I'm using it, and
thereby what I ment. If not ...

To be honest, I have no idea when you guys call it a reference, and when a
pointer. To me the two are the same.

https://stackoverflow.com/questions/57483/what-are-the-differences-between-a
-pointer-variable-and-a-reference-variable-in

Oh, nice. So in VC++ a reference is just a special, limited kind of
pointer. I guess I ment reference than.

Regards,
Rudy Wieser

-- Origional message:
Manfred <invalid@invalid.add> schreef in berichtnieuws
omlelm$g80$1@gioia.aioe.org...
> On 08/11/2017 05:42 PM, R.Wieser wrote:
> > I'm calling an external (to the VC++ program) object (stored in a
> > dynamically loaded DLL) which methods are based on the COM model,
meaning

Manfred <noname@invalid.add>: Aug 12 02:23PM +0200

On 08/12/2017 12:32 PM, R.Wieser wrote:

>> I suspect that you are a bit misled about what you are trying
>> to do.

> I'm listening, please continue.

It looks as if you are trying to coerce two different C++ runtime
implementations in the same process space - which would be trouble (it
is undefined behaviour at best).

> But I don't think you quite understood the situation here: The mechanism
> *already works*. I'm just trying to "neat it up" a bit.

The question is if it works by coincidence or because it is correct.
Based on the information provided, both can be the case.

> argument.
> No problem though, I'm just keep using what I already got. :-)

> *actualy I can, it just doesn't do anything there. :-(

It is not clear if you are trying to destroy an object by explicitly
calling its destructor, which could also be trouble.

>> I suggest you post some sample of both ends of the interface
>> you are trying to work with.

> You already got the vc++ end. Anything wrong with it ? How ?

It depends on what is on the other end, how the objects are allocated on
both ends, what is the C++ runtime implementation on both ends, and what
more about the ABI.

> Poteto, potato. I think you will be able to understand from the example I
> posted in my initial message in the previous thread how I'm using it, and
> thereby what I ment. If not ...

It's not poteto potato. The 'this' pointer (not reference) is passed by
the C++ implementation, according to the rules dictated by the ABI.
Instead of talking about ECX register and the like, you should make
clear what is the ABI specified by your implementation (which must be
the same on both ends, by the way)

> -pointer-variable-and-a-reference-variable-in

> Oh, nice. So in VC++ a reference is just a special, limited kind of
> pointer. I guess I ment reference than.
I guess you guessed wrong...

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Saturday, August 12, 2017

Digest for comp.lang.c++@googlegroups.com - 23 updates in 4 topics

No comments:

Blog Archive

About Me