Friday, January 23, 2015

Digest for comp.lang.c++@googlegroups.com - 10 updates in 3 topics

jononanon@googlemail.com: Jan 23 02:24PM -0800

Hi there,
 
how would you implement an efficient search for the first bit conforming to a particular bit_state (true or false) in a std::bitset<N> ?
 
Here's my desired prototype:
 
template<size_t N>
size_t find(bitset<N> bs, size_t idx, bool state);
 
// Search for index i such that i >= idx and bs[i] == state
 
/* if no such index is found, return i = Bitset::npos
 
namespace Bitset {
size_t npos = numeric_limits<size_t>::max();
};
*/
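For reference, a naive bit-by-bit version of those semantics (handy for checking a faster implementation against) might look like:

```cpp
#include <bitset>
#include <cstddef>
#include <limits>

// Reference implementation of the requested semantics: scan bit by bit.
// Slow, but obviously correct -- useful for testing faster versions.
template<std::size_t N>
std::size_t find_ref(const std::bitset<N>& bs, std::size_t idx, bool state)
{
    for (std::size_t i = idx; i < N; ++i)
        if (bs[i] == state)
            return i;
    return std::numeric_limits<std::size_t>::max(); // npos
}
```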
jononanon@googlemail.com: Jan 23 02:32PM -0800

> size_t npos = numeric_limits<size_t>::max();
> };
> */
 
 
 
Here's my attempt:
 
 
//////////////////////////////////////
#include <bitset>
#include <stdexcept>
#include <limits>
#include <climits>
 
using namespace std;
 
namespace Bitset {
constexpr size_t npos = numeric_limits<size_t>::max();
}
 
unsigned find_in_ulong(unsigned long lon, bool state)
/* only call this function if lon definitely has a bit set to state */
{
    constexpr unsigned char mask_byte_u = -1; /* 0xFF */
    constexpr unsigned num_bits = CHAR_BIT * sizeof(unsigned long);

    const unsigned long fail_byte(state
                                  ? 0U /* none set */
                                  : mask_byte_u /* all set */);

    unsigned shift = 0;
    do {
        if ((lon & mask_byte_u) != fail_byte) {
            for (unsigned i = 0; i < CHAR_BIT; ++i) {
                if ((lon & 1) == state)
                    return shift + i;
                lon >>= 1U;
            }
            // never reached: the byte check above guarantees a hit
        }
        lon >>= CHAR_BIT;
        shift += CHAR_BIT;
    } while (shift < num_bits); // note: "while (lon)" would miss clear bits
                                // above the highest set bit when state == false
    throw runtime_error("state not found in find_in_ulong");
}
 
template<size_t N>
size_t find(bitset<N> bs, size_t idx, bool state)
{
    if (idx >= N)
        return Bitset::npos; // not found

    bs >>= idx;

    constexpr unsigned num_mask_bits = CHAR_BIT * sizeof(unsigned long);
    constexpr unsigned long mask_long_u = -1UL;

    const bitset<N> mask_long(mask_long_u);

    const unsigned long fail_long(state
                                  ? 0UL /* none set */
                                  : mask_long_u /* all set */);
    while (idx < N) {
        unsigned long lon = (bs & mask_long).to_ulong();
        if (lon != fail_long) {
            size_t i = idx + find_in_ulong(lon, state);
            // when state == false, the zero padding above bit N-1 can
            // produce a hit past the end; treat that as "not found"
            return i < N ? i : Bitset::npos;
        }
        bs >>= num_mask_bits;
        idx += num_mask_bits;
    }
    return Bitset::npos; // not found
}
 
//////////////////////////////////////
 
 
I reckon that the performance is fairly decent in terms of speed.
 
But I depend on a library implementation of std::bitset with a clever and lightning-fast to_ulong() function. That is only possible if std::bitset keeps track (say, with a bool flag) of whether any bits above the unsigned long range are set, throwing overflow_error when they are, and otherwise returning that ulong as fast as possible. In other words: to_ulong() should not waste any processor cycles checking bits!
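As an aside: on GCC/Clang-style compilers (an assumption, this is not standard C++), the per-byte scan inside a word collapses to a single count-trailing-zeros instruction, and searching for a clear bit is just the same search on the complemented word. A sketch:

```cpp
#include <stdexcept>

// Find the lowest bit equal to `state` in `lon` using the GCC/Clang
// builtin __builtin_ctzl (undefined for 0, hence the guard).
// Searching for a 0-bit is just a search for a 1-bit in ~lon.
unsigned find_in_ulong_ctz(unsigned long lon, bool state)
{
    if (!state)
        lon = ~lon;               // now we always look for a set bit
    if (lon == 0)
        throw std::runtime_error("state not found in find_in_ulong_ctz");
    return __builtin_ctzl(lon);   // index of the lowest set bit
}
```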
jononanon@googlemail.com: Jan 23 02:39PM -0800


> //////////////////////////////////////
 
> I reckon that the performance is fairly decent in terms of speed.
 
> But I depend on a library implementation of std::bitset with a clever and lightning-fast to_ulong() function. That is only possible if std::bitset keeps track (say, with a bool flag) of whether any bits above the unsigned long range are set, throwing overflow_error when they are, and otherwise returning that ulong as fast as possible. In other words: to_ulong() should not waste any processor cycles checking bits!
 
A possible speed bottleneck is the masking here: (bs & mask_long)
 
Why does std::bitset not have a way of chopping off the unneeded higher bits and just returning an unsigned long of the lower bits (without considering any higher bits that are set)? A function like: chop_ulong()
 
That would allow one to leverage more performance, wouldn't it?
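For what it's worth, the masking is needed precisely because to_ulong() throws when any higher bit is set; a small demonstration (the chop_ulong() name above is hypothetical, nothing like it exists in the standard):

```cpp
#include <bitset>
#include <stdexcept>

// Returns true if to_ulong() throws std::overflow_error for this bitset.
bool to_ulong_throws(const std::bitset<128>& bs)
{
    try { (void)bs.to_ulong(); return false; }
    catch (const std::overflow_error&) { return true; }
}

// The workaround from the posted code: mask off the high bits first.
// A hypothetical chop_ulong() member would skip this extra masking pass.
unsigned long chopped_low_word(const std::bitset<128>& bs)
{
    const std::bitset<128> mask(~0UL);
    return (bs & mask).to_ulong();
}
```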
Martijn Lievaart <m@rtij.nl.invlalid>: Jan 23 09:23AM +0100

On Wed, 21 Jan 2015 13:10:35 -0800, Öö Tiib wrote:
 
> that dynamic memory management itself, not the virtual functions (or
> destructors). Memory managers are improved over time but their work is
> still way slower than additional indirection from virtual call.
 
Even though specific allocators can be very efficient, this is true. I
was thinking about stack allocation though.
 
> some that I have seen were originally written instead of virtuals (or
> had grown over time into replacement of virtuals) and virtuals did
> improve performance on such cases in my tests.
 
Oh yes, in general they do. However:
 
1) In general is not always. It may even change over time; see, for
instance, the cost of exceptions.
 
2) Good design is much more important than micro optimization, at least
at first. That should guide your decision whether to use virtuals, not
cargo cult about performance.
 
3) "virtuals did improve performance on such cases in my tests." That's
exactly my point. Profile.
 
M4
"Öö Tiib" <ootiib@hot.ee>: Jan 23 04:50AM -0800

On Friday, 23 January 2015 10:25:15 UTC+2, Martijn Lievaart wrote:
 
> Oh yes, in general they do. However:
 
> 1) In general is not always. It may even change over time, see f.i. the
> costs of exceptions.
 
If you read what I wrote, it was that alternatives to virtuals have
commonly been slower. Later I wrote that the switch-cases I have measured
were always slower.
 
> 2) Good design is much more important than micro optimization, at least
> at first. That should guide your decision whether to use virtuals, not
> cargo cult about performance.
 
Usage of virtual functions can make a design more complex and harder to follow
or maintain, so one should not use them without need. Objective-C,
for example, is an inherently difficult programming language in that sense,
because there all member functions are technically virtual.
 
However ... I have not seen an example of a large dinosaur switch-case that
was easier to maintain. I did not measure the speed because I did not care;
the major performance bottleneck is most probably elsewhere. I have measured
performance to demonstrate to a cargo-cult "no virtuals gang" defending
such a dinosaur that they are wrong: the virtuals run a tiny bit faster.
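For anyone following along, the two dispatch styles under discussion look roughly like this (a toy sketch with made-up Shape/ShapeRec types, not the code that was measured):

```cpp
// Virtual dispatch: one indirect call per object through the vtable.
struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;
};
struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};
struct Rect : Shape {
    int w, h;
    Rect(int w_, int h_) : w(w_), h(h_) {}
    int area() const override { return w * h; }
};

// The "dinosaur switch-case" alternative: a type tag plus one branch per case.
enum class Kind { square, rect };
struct ShapeRec {
    Kind kind;
    int a, b;
};
int area_switch(const ShapeRec& s)
{
    switch (s.kind) {
    case Kind::square: return s.a * s.a;
    case Kind::rect:   return s.a * s.b;
    }
    return 0; // unreachable
}
```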
 
> 3) "virtuals did improve performance on such cases in my tests." That's
> exactly my point. Profile.
 
Lately I optimise only for readability, robustness and maintainability during
a project. I profile at the end, on the whole program (not on sliced-out
algorithms), together with stress testing on real field data. It has always
been about 10% of the code base that accounts for 90% of the runtime. So
optimising during the project for anything but readability, robustness and
maintainability is premature in 90% of cases.
Martijn Lievaart <m@rtij.nl.invlalid>: Jan 23 10:11PM +0100

On Fri, 23 Jan 2015 04:50:51 -0800, Öö Tiib wrote:
 
> If you read what I wrote, it was that alternatives to virtuals have
> commonly been slower. Later I wrote that the switch-cases I have measured
> were always slower.
 
I was not referring to what YOU wrote, but to what was written earlier in
this thread by others. In fact I think we're in vehement agreement. :-)
 
> of the code base that accounts for 90% of the runtime. So optimising during
> the project for anything but readability, robustness and maintainability is
> premature in 90% of cases.
 
This is so important, I left it in just to repeat it once more :-)
 
M4
Ian Collins <ian-news@hotmail.com>: Jan 24 11:14AM +1300

Öö Tiib wrote:
> of the code base that accounts for 90% of the runtime. So optimising during
> the project for anything but readability, robustness and maintainability is
> premature in 90% of cases.
 
That is the best approach. If you have real field data and your tools
support it, profile feedback optimisation can often gain you a
performance boost simply by changing your build options.
 
--
Ian Collins
legalize+jeeves@mail.xmission.com (Richard): Jan 23 10:22PM

[Please do not mail me a copy of your followup]
 
Martijn Lievaart <m@rtij.nl.invlalid> spake the secret code
>> for anything but readability, robustness and maintainability is
>> premature in 90% of cases.
 
>This is so important, I left it in just to repeat it once more :-)
 
+100
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Jan 22 07:31PM -0500

Should the code below print 1 or -1? I think -1, but the standard
library implementations I tried printed 1. Am I misreading the standard?
 
Thanks in advance,
-Pavel
 
#include <iostream>
#include <sstream>
 
using namespace std;
 
int main(int, char*[]) {
const string s(1, 'A');
istringstream iss(s);
char c;
iss >> c;
cout << iss.rdbuf()->pubseekoff(0, ios_base::cur, ios_base::in) << endl;
return 0;
}
"Öö Tiib" <ootiib@hot.ee>: Jan 23 03:13AM -0800

On Friday, 23 January 2015 02:31:40 UTC+2, Pavel wrote:
> cout << iss.rdbuf()->pubseekoff(0, ios_base::cur, ios_base::in) << endl;
> return 0;
> }
 

Why should 'std::stringbuf::seekoff' fail when it is asked to move 'gptr'
(which is at the end of the input sequence) by an offset of 0?
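To illustrate: after `iss >> c` consumes the single character, the get pointer sits at offset 1, and a seek of 0 from the current position is a valid position, so seekoff reports 1 rather than failing (at least on the implementations mentioned above). Note that tellg() is specified in terms of this very pubseekoff call:

```cpp
#include <sstream>
#include <string>

// After reading the only character, the get pointer is at offset 1.
// A seek of 0 from cur is valid, so seekoff reports 1 instead of -1;
// istream::tellg() is defined in terms of this same seekoff call.
std::streampos pos_after_one_read()
{
    std::istringstream iss(std::string(1, 'A'));
    char c;
    iss >> c; // consumes 'A'
    return iss.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::in);
}
```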
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
