Monday, January 30, 2017

Digest for comp.lang.c++@googlegroups.com - 20 updates in 7 topics

Paul <pepstein5@gmail.com>: Jan 30 02:11AM -0800

On Sunday, January 29, 2017 at 7:15:20 PM UTC, Paavo Helde wrote:
...
> > }
 
> These multiple map find and lookup operations could be replaced by a
> single insert().
...
 
Thanks a lot for your feedback. I'm still not sure how to replace the
find and lookup operations by a single insert().
 
Could you (or someone else) possibly say a bit more about how to do this?
 
Many thanks,
Paul
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jan 30 12:00PM +0100

On 29.01.2017 15:31, Paul wrote:
> }
> return letters.front();
> }
 
I would do this:
 
[code]
#include <iostream>
#include <string> // std::basic_string
#include <iterator> // std::(begin, end)
#include <locale.h> // setlocale
#include <unordered_set> // std::unordered_multiset
 
#define TXT( s ) L ## s
 
using Char = decltype( TXT( ' ' ) );
using String = std::basic_string<Char>;
 
auto first_unique_char_in( String const& s )
-> Char
{
std::unordered_multiset<Char> const chars( begin( s ), end( s ) );
for( Char const ch : s )
{
if( chars.count( ch ) == 1 ) { return ch; }
}
return Char{ 0 };
}
 
auto main()
-> int
{
using namespace std;
setlocale( LC_ALL, "" );
wcout << first_unique_char_in( TXT( "Blah dah Bladihdah!" ) ) << endl;
}
[/code]
 
Cheers & hth.,
 
- Alf
Paul <pepstein5@gmail.com>: Jan 30 05:22AM -0800

On Monday, January 30, 2017 at 11:01:52 AM UTC, Alf P. Steinbach wrote:
> [/code]
 
> Cheers & hth.,
 
> - Alf
 
Thanks, Alf.
I'm a bit concerned that this appears to O(N * log(N)) where N is the
length of the string. I'm not saying that it's slower than my code,
by any means. However, solutions are always frowned upon if their
theoretical time complexities are suboptimal.
 
I have a more novice-friendly version of your idea below:
 
Paul
 
char findFirstUnique(const std::string& str)
{
std::unordered_multiset<char>word(str.begin(), str.end());
for(char c : str)
if(word.count(c) == 1)
return c;
 
std::cout << "No unique characters\n";
return 0;
}
Paavo Helde <myfirstname@osa.pri.ee>: Jan 30 04:24PM +0200

On 30.01.2017 12:11, Paul wrote:
> ...
 
> Thanks a lot for your feedback. I'm still not sure how to replace the
> find and lookup operations by a single insert().
 
E.g.
 
typedef std::unordered_map<char, std::list<char>::iterator> Map_t;
Map_t Map;
std::list<char> letters;
 
for (char letter : str) {
Map_t::iterator iter;
bool inserted;
 
std::tie(iter, inserted) =
Map.insert(std::make_pair(letter, letters.end()));
 
if (inserted) {
letters.push_back(letter);
iter->second = --letters.end();
} else if (iter->second != letters.end()) {
letters.erase(iter->second);
iter->second = letters.end();
}
}
Paavo Helde <myfirstname@osa.pri.ee>: Jan 30 04:38PM +0200

On 30.01.2017 15:30, Stefan Ram wrote:
> { if( rand() < RAND_MAX / 100 )break;
> s.push_back( set[ rand() %( sizeof( set )- 1 )] ); }
> v.push_back( s ); }}
 
If I am able to interpret this line noise correctly, then you are
claiming that a simple O(N*N) algorithm is performing better than a
O(N*log N) algorithm which is using more complex data structures.
 
This is the expected result, *for small N*. Increase your data size to
e.g. 10 times the L3 cache size, e.g. 80 MB, and test again.
scott@slp53.sl.home (Scott Lurndal): Jan 30 04:27PM


> I wrote another implementation which I called »first_unique_char_of«
> and a microbenchmark to show that under some circumstances it might
> seem to be faster.
 
It's completely unreadable.
Gareth Owen <gwowen@gmail.com>: Jan 30 07:04PM

> length of the string. I'm not saying that it's slower than my code,
> by any means. However, solutions are always frowned upon if their
> theoretical time complexities are suboptimal.
 
 
// Doesn't use a complicated data structure, but guaranteed O(n)
// Assumes CHAR_BIT is 8
char findFirstUnique(const std::string&str){
std::array<int,256> arr;
if(str.length() == 0) return 0;
for(int i=0;i < str.length(); ++i){
unsigned char ch = str[ch];
if(arr[ch] == 0) arr[ch] = i;
else arr[ch] = INT_MAX;
}
int minval=INT_MAX;
int minch = 0;
for(int i=0;i < 256; ++i){
if(arr[i] > 0 && arr[i] < INT_MAX) {
minval = arr[i];
minch = (char)i;
}
}
return minch;
}
Ian Collins <ian-news@hotmail.com>: Jan 31 08:14AM +1300

On 01/31/17 05:27 AM, Scott Lurndal wrote:
>> and a microbenchmark to show that under some circumstances it might
>> seem to be faster.
 
> It's completely unreadable.
 
I thought at first glance it was a post from that Relf bloke...
 
--
Ian
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jan 30 10:58PM +0100

On 30.01.2017 14:22, Paul wrote:
 
> Thanks, Alf.
> I'm a bit concerned that this appears to O(N * log(N)) where N is the
> length of the string.
 
Where on Earth did you get that idea?
 
Cheers!,
 
- Alf
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jan 30 11:10PM +0100

On 30.01.2017 20:04, Gareth Owen wrote:
>> theoretical time complexities are suboptimal.
 
> // Doesn't use a complicated data structure, but guaranteed O(n)
> // Assumes CHAR_BIT is 8
 
Note that the original question was about the case where the character
type is biggish, not `char` but perhaps `wchar_t` or `char32_t`.
 
 
> char findFirstUnique(const std::string&str){
> std::array<int,256> arr;
 
I don't think std::array has a defined default constructor. The original
Boost implementation was based on `array` being a POD so that it could
be initialized with C++03 curly braces. I think that's so still after
C++11, and if so then due to the assumption in the code below of zero
array values, you have Undefined Behavior.
 
 
> }
> return minch;
> }
 
 
Cheers!,
 
- Alf
Brett Dong <brett.browning.dong@gmail.com>: Jan 30 10:12PM +0800

I have a project that takes about five minutes to build with clang on
Linux (make -j4), optimization enabled (-Os). But it takes me ten
minutes and even more to build with Visual Studio 2015. I tried all the
ways on the Internet that can help reduce build time with Visual Studio:
turned off "Whole Program Optimization", turned off Link Time Code
Generation in the linker (/LTCG:OFF), even turned off any optimizations
in the linker (two options /OPT:REF and /OPT:ICF are turned off), and
turned off PDB debug information generation, turned on incremental
linking (/INCREMENTAL), turned on parallel compiling (/MP). After all
those efforts, it still takes me ten minutes to build the project. Twice
as long as the time clang uses! Why could there be so much difference?
 
I'm using a SSD, so IO shouldn't be a bottleneck. I'm using a 2-core
4-thread Broadwell-U processor FYI.
Paavo Helde <myfirstname@osa.pri.ee>: Jan 30 04:29PM +0200

On 30.01.2017 16:12, Brett Dong wrote:
> as long as the time clang uses! Why could there be so much difference?
 
> I'm using a SSD, so IO shouldn't be a bottleneck. I'm using a 2-core
> 4-thread Broadwell-U processor FYI.
 
Make sure you are using precompiled headers.
Ian Collins <ian-news@hotmail.com>: Jan 31 08:12AM +1300

On 01/31/17 03:12 AM, Brett Dong wrote:
> as long as the time clang uses! Why could there be so much difference?
 
> I'm using a SSD, so IO shouldn't be a bottleneck. I'm using a 2-core
> 4-thread Broadwell-U processor FYI.
 
Visual studio is rubbish at parallel builds: even their own people
recommend using a third party tool!
 
https://blogs.msdn.microsoft.com/visualstudio/2015/11/30/improving-your-build-times-with-incredibuild-and-visual-studio-2015/
 
I've used Incredibuild on single systems and distributed servers and it
works well.
 
--
Ian
legalize+jeeves@mail.xmission.com (Richard): Jan 30 06:23PM

[Please do not mail me a copy of your followup]
 
alexo <alessandro.volturno@libero.it> spake the secret code
 
>Too many "magic" numbers.
 
He is using ANSI escape sequences, which are well documented.
 
<https://en.wikipedia.org/wiki/ANSI_escape_code>
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
ram@zedat.fu-berlin.de (Stefan Ram): Jan 30 01:30PM

>auto first_unique_char_in( String const& s )
 
I wrote another implementation which I called »first_unique_char_of«
and a microbenchmark to show that under some circumstances it might
seem to be faster.
 
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <iterator>
#include <ostream>
#include <string>
#include <vector>
#include <unordered_set>
 
using namespace ::std::literals;
 
static void escape( void * p )
{ asm volatile( "" : : "g"(p) : "memory" ); }
 
static char first_unique_char_in( ::std::string const& s )
{ ::std::unordered_multiset< char >const chars( begin( s ), end( s ));
for( char const ch : s )
if( chars.count( ch ) == 1 )return ch;
return char{ 0 }; }
 
static char first_unique_char_of( ::std::string const& s )
{ auto const z = s.size();
auto i = z; for( i = 0; i < z; ++i )
{ char const ch = s[ i ]; bool found = false;
auto j = z; for( j = 0; j < z; ++j )
if( i != j && s[ j ]== ch )found = true;
if( !found )return ch; }
return char{ 0 }; }
 
static void fill( ::std::vector< ::std::string >& v )
{ static const char set[] =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz !";
for( int i = 0; i < 1000; ++i )
{ ::std::string s{};
while( true )
{ if( rand() < RAND_MAX / 100 )break;
s.push_back( set[ rand() %( sizeof( set )- 1 )] ); }
v.push_back( s ); }}
 
int main ()
{ ::std::srand( static_cast< unsigned int >( ::std::time( nullptr )));
::std::vector< ::std::string >v; fill( v );
{ ::std::chrono::high_resolution_clock::time_point t1;
::std::chrono::high_resolution_clock::time_point t2;
decltype( t2 - t1 )sum{};
t1 = ::std::chrono::high_resolution_clock::now();
escape( &t1 );
for( auto const & s: v )
{ auto result = first_unique_char_in( s );
escape( &result ); }
t2 = ::std::chrono::high_resolution_clock::now();
escape( &t2 );
sum += t2 - t1;
::std::cout << "dt = " <<( sum ).count() << '\n'; }
{ ::std::chrono::high_resolution_clock::time_point t1;
::std::chrono::high_resolution_clock::time_point t2;
decltype( t2 - t1 )sum{};
t1 = ::std::chrono::high_resolution_clock::now();
escape( &t1 );
for( auto const & s: v )
{ auto result = first_unique_char_of( s );
escape( &result ); }
t2 = ::std::chrono::high_resolution_clock::now();
escape( &t2 );
sum += t2 - t1;
::std::cout << "dt = " <<( sum ).count() << '\n'; }}
ram@zedat.fu-berlin.de (Stefan Ram): Jan 30 01:39PM

>if( i != j && s[ j ]== ch )found = true;
 
I now microoptimized the above line to
 
if( s[ j ]== ch && i != j ){ found = true; break; }
 
. And the result is that now »first_unique_char_of«
here is significanly faster, by a factor better than 5,
which in some cases is better than 10
ram@zedat.fu-berlin.de (Stefan Ram): Jan 30 02:16PM

>>if( i != j && s[ j ]== ch )found = true;
>if( s[ j ]== ch && i != j ){ found = true; break; }
 
I now added
 
for( auto const & s: v )
assert( first_unique_char_in( s )== first_unique_char_of( s ));
 
to have at least some verification.
 
I then added statistics for the lenght of the strings:
 
{ double l = 0; double c = 0;
for( auto const & s: v ){ l += s.length(); ++c; }
::std::cout << "average = " <<( l / c )<< '\n'; }
 
and it turned out that the average length is indeed
that number which happens to be 100 below:
 
if( rand() < RAND_MAX / 100 )break;
 
. I changed that to 1000 and 10000, and »first_unique_char_of«
still was faster by a factor better than 5, when I changed
it to 10 (and increased the size of the vector) the factor
became better than about 50.
Jeff-Relf.Me <@.>: Jan 30 03:04AM -0800

woodbrian77@gmail.com: Jan 29 07:23PM -0800

On Thursday, January 19, 2017 at 1:28:36 AM UTC-6, Öö Tiib wrote:
> constructors overloads some of what are templates themselves. So deducing
> two levels of template arguments from overloaded constructor call may
> mean some quite fragile and confusing heuristics.
 
I asked professor Spertus about this and he came up with this:
 
https://github.com/mspertus/p0433/commit/9e4643943eeee39b2cb4873026dad381d58cd50c
 
Using that, my program compiles and runs fine.
"Chris M. Thomasson" <invalid@invalid.invalid>: Jan 29 04:48PM -0800

>> A lot of the original crew hasn't shown up as often or at all anymore
>> though.
 
> I miss James Kanze.
 
Big time. Thank you for sparking the memories. :^)
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

No comments: