Wednesday, December 21, 2016

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

Marcel Mueller <news.5.maazl@spamgourmet.org>: Dec 21 12:31AM +0100

On 20.12.16 23.37, Stefan Ram wrote:
> explains how they made facebook much faster by replacing
> ::std::string with their fbstring, which uses a very
> efficient small-string representation.
 
As always it depends. For general purpose use the standard containers
are just fine. For spacial cases individual implementations often
perform better.
 
In fact I am missing a BTree in most container libraries, including C++.
So I implemented my own one in different languages.
 
Furthermore, I missed an efficient, strongly thread-safe, immutable and
reference counted string implementation. So again I implemented my own one.
 
And the intrusive_ptr does not provide strong thread safety too. My own
int_ptr does not share this restriction without a larger memory
footprint (just one slot for the intrinsic reference counter by simply
deriving from a base class and one slot for each pointer instance).
 
And even more C++ standard containers do not benefit from type erasure.
So the code for containers of smart pointers to different types cannot
share the implementation although their memory layout is binary
compatible. So I ended up by writing some container with a non template
base class and a type safe template wrapper for intrusive reference
counting as well.
 
It is almost always possible to improve things. But there are also
drawbacks in some use cases. The standard containers prefer to be
general purpose with as less as possible pitfalls. Performance is not
the only criterion.
 
 
Marcel
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Dec 20 11:43PM

On 20/12/2016 23:16, Alf P. Steinbach wrote:
> a buffer unsharing, it's unshared precisely because the possibility of a
> reference looms on the horizon – but in spite of being obvious bollocks,
> just a self-contradiction, it's upvoted 86 times.
 
Invalidating references is a problem, consider:
 
std::string s = "foo";
const char& r = s[0];
{
std::string s2 = s; // s now shared
s[0] = "c"; // COW happens
} // s2 destroyed
assert(r == 'c'); // bad UB: r will now refer to garbage
 
Even if we don't consider the UB case above one may also consider 'r' to
be "invalidated" if it stops referring to s[0] and instead refers to
s2[0] because COW happened.
 
/Flibble
Marcel Mueller <news.5.maazl@spamgourmet.org>: Dec 21 12:51AM +0100

On 21.12.16 00.43, Mr Flibble wrote:
 
> Even if we don't consider the UB case above one may also consider 'r' to
> be "invalidated" if it stops referring to s[0] and instead refers to
> s2[0] because COW happened.
 
You just pointed out why the Java concept of an immutable String class
and a mutable StringBuilder is superior. I prefer this concept wherever
possible.
 
 
Marcel
Jerry Stuckle <jstucklex@attglobal.net>: Dec 20 07:17PM -0500

On 12/20/2016 6:43 PM, Mr Flibble wrote:
> be "invalidated" if it stops referring to s[0] and instead refers to
> s2[0] because COW happened.
 
> /Flibble
 
But in a correct implementation, r would still refer to s[0] after the
COW. Why would it point to s2[0]? s[0] has not moved - just gotten a
new value (after the copy).
 
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Dec 21 01:55AM +0100

On 21.12.2016 00:43, Mr Flibble wrote:
> s[0] = "c"; // COW happens
> } // s2 destroyed
> assert(r == 'c'); // bad UB: r will now refer to garbage
 
Yes, that's a problem in two ways:
 
• It was (is?) a bug in the g++ C++03 implementation of COW strings, and
I suspect that's the main reason for the political aspect of this
particular ungoodness.
 
Or at least I think that's the reason why people get so wrought up about
it: public unthinking herd behavior happens in general when a group of
fanboys perceive negativity about their sacred COW.
 
• It prevents a COW implementation that is /both/ really efficient and
correct.
 
So it's an important case, and IMHO a good argument for immutable strings.
 
With a correct COW implementation "COW happens" (i.e. unsharing of the
in this case static buffer happens) at the first indexing, i.e. this is
not a problem for a correct COW implementation. It's only a problem for
g++'s implementation. But first of all the O(1) requirement for indexing
prevents a correct implementation in C++11 and later, and secondly, even
if the/a correct implementation were permitted by the C++ standard, the
case that you exemplify means it would generally do copying from string
literal that would not occur in an ideal COW.
 
 
> Even if we don't consider the UB case above one may also consider 'r' to
> be "invalidated" if it stops referring to s[0] and instead refers to
> s2[0] because COW happened.
 
Yep, but with a correct COW implementation that cannot happen.
 
 
Cheers!,
 
- Alf
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Dec 21 01:32AM

On 21/12/2016 00:55, Alf P. Steinbach wrote:
> if the/a correct implementation were permitted by the C++ standard, the
> case that you exemplify means it would generally do copying from string
> literal that would not occur in an ideal COW.
 
It seems to me that if you stop sharing on the first indexing you would
have to also stop sharing if any iterators are acquired or if c_str() is
called or if a basic_string_view is obtained.. it is starting to sound a
bit unusable so I am not sure I agree with your analysis that that is
doing COW "correctly".
 
/Flibble
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Dec 21 03:56AM +0100

On 21.12.2016 02:32, Mr Flibble wrote:
 
> It seems to me that if you stop sharing on the first indexing you would
> have to also stop sharing if any iterators are acquired or if c_str() is
> called or if a basic_string_view is obtained..
 
Yes, almost.
 
The basic_string_view design is particularly unfortunate: it's made
after C++11 and still it returns inefficient and functionality-limiting
references to characters, rather than just the values.
 
 
> it is starting to sound a bit unusable
 
It's not an optimal improvement compared to a different more
COW-supporting string class design. But that doesn't mean unusable. It
means the /improvement/ from a non-COW implementation is just less than
optimal, and that depending on the operations mix it can be some percent
slower or some percent faster, instead of ~always faster.
 
However, even within the standard's limited support there are still ways
to access the contents of a COW-implemented std::basic_string without
incurring unsharing of the buffer, including (what pops up from my
association circuits right now):
 
• Output to a stream.
 
• Indexing where the result is just used as an rvalue `Char`.
 
• The range based `for` loop.
 
I think supporting indexing with use as rvalue might possible with a
pure library implementation, by detecting the return type via `operator
auto()` (or in C++11 and earlier `operator T` with `T` a template
parameter), but if that turns out to have some problem, e.g. overload
resolution?, then it's certainly possible via the special status of the
standard library: that it can rely on compiler support and compiler
intrinsics and knowledge of the particular concrete compiler behavior,
e.g. as the standard library's `offsetof` macro necessarily does.
 
The range based `for` loop apparently requires compiler support, but
again, the standard library has that support.
 
If basic_string_view had been reasonably designed then that would also
be a way to access the contents of a COW-implemented string without
incurring a possible unsharing.
 
This is already pretty rich functionality for accessing the contents.
 
But a main thing lacking is a way to obtain a /temporary pointer/ to the
buffer, in order to pass to a function requiring such pointer. Sort of
like `c_str()` except the temporary pointer would also be invalidated by
any operation that obtained a possible reference to the string, except
for the `char` value indexing case noted above. A new version of the
standard could just introduce such an operation, so it's possible for
the standard to turn 180 degrees again and support COW.
 
 
> so I am not sure I agree with your analysis that that is
> doing COW "correctly".
 
As the C++03 standard put it, in its §21.3/6 (but this was non-normative
text in an inline note): "A reference counted implementation must have
the same semantics as a non-reference counted implementation."
 
That effectively means /as if/ a separate buffer is used for each instance.
 
I.e. "correct" is defined by the external behavior always conforming to
the standard's requirements, modulo requirements that are implicitly
ignored in a given context (such as discussing whether this is possible
without the C++11 O(1) requirement on indexing). And the standard
doesn't require an implementation to be optimal! This notion of
correctness, the one specified by the standard, is the one that's
relevant in a question and answer about what the standard permits.
 
 
Cheers!,
 
- Alf
Daniel <danielaparker@gmail.com>: Dec 20 08:50PM -0800

On Tuesday, December 20, 2016 at 9:56:48 PM UTC-5, Alf P. Steinbach wrote:
 
> The basic_string_view design is particularly unfortunate: it's made
> after C++11 and still it returns inefficient and functionality-limiting
> references to characters, rather than just the values.
 
I'm happy to have basic_string_view if only to allow an object that owns text
to expose it without having an allocator embedded in it. That offers some
efficiency over a forced copy just because a container's and a client's
allocators differ.
 
But it seems strange to me that a new string related class would continue to
have those find functions that search over bytes or words, depending on
template parameter. That's not helpful unless you're still in ansii world. I
want to know the length of a string in codepoints, and I want to be able to
access the ith codepoint. We need a new char traits that captures string
semantics. But at least compare mostly works.
 
Daniel
woodbrian77@gmail.com: Dec 20 08:59PM -0800

On Tuesday, December 20, 2016 at 5:31:15 PM UTC-6, Marcel Mueller wrote:
> compatible. So I ended up by writing some container with a non template
> base class and a type safe template wrapper for intrusive reference
> counting as well.
 
If you included links to your projects it would be
more interesting.
 
> drawbacks in some use cases. The standard containers prefer to be
> general purpose with as less as possible pitfalls. Performance is not
> the only criterion.
 
The containers are the weakest part of the STL in my opinion.
 
 
Brian
Ebenezer Enterprises - So far G-d has helped us.
http://webEbenezer.net
 
Paavo Helde <myfirstname@osa.pri.ee>: Dec 21 09:05AM +0200

On 21.12.2016 0:37, Stefan Ram wrote:
> explains how they made facebook much faster by replacing
> ::std::string with their fbstring, which uses a very
> efficient small-string representation.
 
Standard containers are general purpose. Of course, it might well happen
that in some special circumstances a special handcrafted solution would
work significantly more efficiently. The standard cannot foresee or
provide for all kind of special circumstances.
 
One is of course entitled to craft one's own container if the app speed
is not sufficient and the profiler shows the bottleneck is in a standard
container. Has not happen to me yet, but YMMV.
 
While on that topic, don't forget that nowadays a large part of
efficiency is related to the memory allocators. Unlike the prev item,
profiling has shown me bottlenecks in the memory allocator several
times. Using a better or more suitable memory allocator with a standard
container might easily have much larger impact to the efficiency than
anything what can be done at the container level itself.
Juha Nieminen <nospam@thanks.invalid>: Dec 21 07:18AM

> 2016 he says that because of the impossibility of the
> small-size optimization even ::std::vector is unusable when
> one wants efficient code.
 
"Efficient" in what way?
 
Sure, if you are adding and removing elements from containers,
or even creating and destroying containers, in a tight inner
loop millions of times per second, you have to be aware of how
costly it is to allocate and deallocate memory in C-based
languages (and thus you should be aware when such containers
do such allocations and deallocations).
 
But that's not the only situation where the word "efficient"
could be applied. It could also be applied to efficiency of
searching, or memory usage, or clarity and safety of your code.
Even how much time you have to spend to write and refactor
your code.
 
If you have the need to copy std::vector instances around millions
of times, then sure, implement your own more efficient variant.
I find myself in need to do that extremely rarely, and instead
I just require vectors to be accessible very efficiently, which
they are. And the convenience of std::vector over a custom
implementation is vastly greater.
Bo Persson <bop@gmb.dk>: Dec 21 05:26PM +0100

On 2016-12-21 08:18, Juha Nieminen wrote:
> I just require vectors to be accessible very efficiently, which
> they are. And the convenience of std::vector over a custom
> implementation is vastly greater.
 
Right!
 
Chandler Carruth can spend time making the program run 1% faster,
because that saves the company millions of dollars on their server farms.
 
For the rest of us "efficient" most often means "getting the app out
*this* year". Not running 1% faster next year.
 
 
Bo Persson
Bonita Montero <Bonita.Montero@gmail.com>: Dec 21 06:02PM +0100

Am 21.12.2016 um 17:26 schrieb Bo Persson:
 
> For the rest of us "efficient" most often means "getting
> the app out *this* year". Not running 1% faster next year.
 
It should be very rare that a company tunes for the last 1%.
Even Twitter and eBay use Java on its servers which hasn't
the repute to be the most efficient language (i.e. the VM).
 
--
http://facebook.com/bonita.montero/
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Dec 21 07:01PM +0100

On 21.12.2016 03:56, Alf P. Steinbach wrote:
> pure library implementation, by detecting the return type via `operator
> auto()` (or in C++11 and earlier `operator T` with `T` a template
> parameter),
 
Sorry I was not thinking there, or in other words, I was momentarily
stupid: one needs a name for the return type in order to discriminate
the cases. So `auto` is out. An old-fashioned template is the only pure
library way as I see, if it works to implement the observable behavior
exactly (or anyway in a proof of concept implementation):
 
> intrinsics and knowledge of the particular concrete compiler behavior,
> e.g. as the standard library's `offsetof` macro necessarily does.
> [snip]
 
 
Cheers!,
 
- Alf
Bo Persson <bop@gmb.dk>: Dec 21 07:22PM +0100

On 2016-12-21 18:02, Bonita Montero wrote:
 
> It should be very rare that a company tunes for the last 1%.
> Even Twitter and eBay use Java on its servers which hasn't
> the repute to be the most efficient language (i.e. the VM).
 
Yes, but Facebook and Google might. When a 1% saving on the electricity
bill alone means millions of dollars, it is worth it.
 
Chandler Carruth and lots of benchmarks could surely trim Goggle's code
by that much.
 
An average Joe Coder would likely not benefit from writing his own
linked list or a std::vector replacement. It will just take a lot of
work to get something that will likely run slower.
 
The guys implementing the standard library for the major compilers are
not exactly newbies.
 
So "not using the standard containers" is not the best advice to achive
"efficiency". Except in *very* special cases.
 
 
Bo Persson
legalize+jeeves@mail.xmission.com (Richard): Dec 21 06:43PM

[Please do not mail me a copy of your followup]
 
Marcel Mueller <news.5.maazl@spamgourmet.org> spake the secret code
 
>As always it depends. For general purpose use the standard containers
>are just fine. For spacial cases individual implementations often
>perform better.
 
...and this isn't unique to C++.
 
C# and Java have standard general containers. Performance gains can
be made by switching to custom containers that are specific to your
use case.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Bonita Montero <Bonita.Montero@gmail.com>: Dec 21 07:43PM +0100

Am 21.12.2016 um 19:22 schrieb Bo Persson:
 
> Yes, but Facebook and Google might. When a 1% saving on the electricity
> bill alone means millions of dollars, it is worth it.
 
You don't know this. It's just your assumption.
Homo oeconomicus is just an illusion.
 
--
http://facebook.com/bonita.montero/
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Dec 21 07:56PM +0100

On 21.12.2016 19:22, Bo Persson wrote:
 
> [snip]
> So "not using the standard containers" is not the best advice to achive
> "efficiency". Except in *very* special cases.
 
One such case is returning a referencing string-view. Now that's soon
officially part the C++ standard library (I believe, I haven't checked
the details) but a year or two or three ago I answered an SO question a
CPython versus C++ string performance: the OP was amazed how much faster
CPython was. But CPython strings are simple reference counted beasts, so
all that code involved was references. After really fighting to have the
question /undeleted/ (!) [fanboys at work again, jeez we don't like
these facts the OP mentions, let's delete], I posted C++ code that did
the same as the OP's example, only efficiently by returning string
views, i.e. references instead of copying strings, and happily that was
faster than Python -- it needn't be, but it was. :)
 
Later the code I posted was "corrected" by an associative someone, a
someone who recognized patterns but had no understanding, who introduced
a bug, and it was only by chance that I became aware of it.
 
That last part is relevant because it shows that by using custom data
structures one may gain e.g. efficiency, or readability, but one may
thereby become an open target for maintenance bug introductions.
 
 
Cheers!,
 
- Alf
legalize+jeeves@mail.xmission.com (Richard): Dec 21 07:00PM

[Please do not mail me a copy of your followup]
 
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com> spake the secret code
 
>That last part is relevant because it shows that by using custom data
>structures one may gain e.g. efficiency, or readability, but one may
>thereby become an open target for maintenance bug introductions.
 
...it's also a reason for developing such code with a comprehensive
set of unit tests. Maintenance developers will then have confidence
that they haven't broken anything by rerunning the tests.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Popping mad <rainbow@colition.gov>: Dec 21 11:16AM

Why is canvas_index not being assigned properly in the initiation list
unless I do an assignment inside the backets
 
 
/*
*
=====================================================================================
*
* Filename: test.cpp
*
* Description: Threading Experiment
*
* Version: 1.0
* Created: 12/18/2016 12:46:51 PM
* Revision: none
* Compiler: gcc
*
* Author: Ruben Safir (mn), ruben@mrbrklyn.com
* Company: NYLXS Inc
*
*
=====================================================================================
*/
 
#include <iostream>
#include <thread>
 
namespace testing{
std::thread t[10];
class PIC
{
public:
PIC():beg{&source[0]}, end{&source[99]} , canvas_index
{canvas}
{

canvas_index = canvas;
for(int i = 0; i < 10;i++)
{
t[i] = std::thread([this]{ readin(beg); }
);
std::cout << i << ": Making a thread" <<
std::endl;
beg += 10;
}
};

~PIC()
{
std::cout << "In the destructor" << std::endl;
for(int i=0; i<10; i++)
{
t[i].join();
std::cout << i << ": Joining a thread" <<
std::endl;
}
 
};
void readin(char * start)
{
for( int i = 0; i<10; i++ )
{
*canvas_index = *start;
std::cout << i << ": Copy " <<
reinterpret_cast<char>(*start) << std::endl;
std::cout << i << ": Copied to
canvas_index " << reinterpret_cast<char>(*canvas_index) << std::endl;
canvas_index++;
start++;
}
 
};
 
char * get_canvas()
{
return canvas;
}
 
private:
char * beg;
char * end;
char * canvas_index;
char * canvas = new char[100];
char source[100] = {
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'
};
//int index;
};
}//end namespace
 
int main(int argc, char** argv)
{
testing::PIC fido;
for(int i = 0; i<100;i++)
{
std::cout << i << " Canvas Position " << fido.get_canvas()
[i] << std::endl;
}
 
return 0;
}
Paavo Helde <myfirstname@osa.pri.ee>: Dec 21 01:50PM +0200

On 21.12.2016 13:16, Popping mad wrote:
> PIC():beg{&source[0]}, end{&source[99]} , canvas_index
> {canvas}
> {
[...]
> };
[...]
> private:
[...]
> char * canvas_index;
> char * canvas = new char[100];
[...]
 
canvas_index is initialized before canvas, because it is declared before
canvas in the class definition. Thus, it is initialized with garbage.
"Fred.Zwarts" <F.Zwarts@KVI.nl>: Dec 21 01:47PM +0100

"Paavo Helde" schreef in bericht
news:O5mdnY-xfJ0S8cfFnZ2dnUU78IvNnZ2d@giganews.com...
>[...]
 
>canvas_index is initialized before canvas, because it is declared before
>canvas in the class definition. Thus, it is initialized with garbage.
 
The compilers I use, issue a warning if the order of the initializations is
different from the order of the declarations. I wonder which compiler was
used, that did not warn.
"Fred.Zwarts" <F.Zwarts@KVI.nl>: Dec 21 01:51PM +0100

"Fred.Zwarts" schreef in bericht news:o3dtkv$1goc$1@gioia.aioe.org...
 
>The compilers I use, issue a warning if the order of the initializations is
>different from the order of the declarations. I wonder which compiler was
>used, that did not warn.
 
Probably, my reaction was too fast. I now see that canvas is not initialized
in the initialization list, so that the compiler cannot detect the
non-matching order easily.
ruben safir <ruben@mrbrklyn.com>: Dec 21 04:09AM -0500

if you make a lamda with a method from a class, can that method see the
valuable values encapsulated with the object of that class, or can it
only see global variables?
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Dec 21 01:27PM +0100

On 21.12.2016 10:09, ruben safir wrote:
> if you make a lamda with a method from a class, can that method see the
> valuable values encapsulated with the object of that class, or can it
> only see global variables?
 
To use non-static data members in a lambda in a class' code, you have to
capture `this`.
 
Cheers & hth.,
 
- Alf
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

No comments: