Wednesday, February 1, 2017

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 01:32AM +0100

On 31.01.2017 15:46, Stefan Ram wrote:
 
> That means, in this case, »first_unique_char_of« is more
> than 1000 times faster. (Unless I made an error, so feel
> free to reproduce this.)
 
You've apparently made some errors of methodology, I /think/, but still
this is a fascinating result and a very good question. :)
 
First, the methodology:
 
• To measure performance, always turn on optimization, e.g. `g++ -O3`.
 
Otherwise there is an abstraction cost for function calls, to make
sure that one can delve into them in a debugger. This favors non-
abstracted code. The cost is usually a fixed factor, but a huge one.
 
• Always use reasonable data for what you test.
 
For example, testing with a string of 200 000 '#' would be
unreasonable, because one function would use just a couple of
machine code instructions to find that single value '#', while
the other would delve into a complex data structure first.
 
Still with this methodology in place there's a completely weird
"breaking point" at about string size 200 000 on the machine I'm writing
this on, a really old Asus laptop running Windows 7. Before this
breaking point the O(n^2) brute force algorithm is unreasonably fast.
With larger strings it exhibits the expected O(n^2) sluggishness.
 
I discuss that below.
 
The code below is the code that I used to test. It bears witness to the
way I homed in on the truth, at first suspecting some really ungood
implementation of `std::unordered_multiset` in both g++ and Visual
C++. That turned out not to be the case, but I let the workaround stand:
 
 
[code]
#include <iostream>
#include <iterator> // std::(begin, end)
#include <locale.h> // setlocale
#include <stddef.h> // ptrdiff_t
#include <stdlib.h> // rand, srand
#include <string> // std::basic_string
#include <set> // std::multiset
#include <time.h> // clock
#include <unordered_map> // std::unordered_map
#include <map>
#include <unordered_set> // std::unordered_multiset
 
namespace alf {
using Size = ptrdiff_t;
 
template< class Type >
class Unordered_multiset_
{
private:
std::unordered_map< Type, int > values_;
 
public:
auto count( Type const& value ) const
-> Size
{
auto const it = values_.find( value );
return (it == values_.end()? 0 : it->second);
}
 
template< class Iterator >
Unordered_multiset_( Iterator const first, Iterator const beyond )
{
//values_.reserve( beyond - first ); // Ungood, more slow.
for( Iterator it = first; it != beyond; ++it )
{
++values_[*it];
}
}
};
 
} // namespace alf
 
namespace stefan {
using tchar = wchar_t;
using tstring = ::std::basic_string< tchar >;
 
template< class Value >
//using Multiset_ = std::unordered_multiset< Value >;
using Multiset_ = alf::Unordered_multiset_<Value>;
 
static tchar first_unique_char_in( tstring const & s )
{
Multiset_< tchar > const chars( begin( s ), end( s ));
for( tchar const ch : s )
{
if( chars.count( ch ) == 1 ) { return ch; }
}
return tchar{ 0 };
}
 
static tchar first_unique_char_of( tstring const & s )
{
auto const z = s.size();
auto i = z;
for( i = 0; i < z; ++i )
{
tchar const ch = s[ i ];
bool found = false;
auto j = z;
for( j = 0; j < z; ++j )
{
if( s[ j ]== ch && i != j )
{
found = true; break;
}
}
if( !found )return ch;
}
return tchar{ 0 };
}
}
 
namespace alf {
using namespace std;
 
using Func = auto( stefan::tstring const& ) -> stefan::tchar;
 
void test(
char const funcname[],
Func* const func,
stefan::tstring const& s
)
{
clock_t const start = clock();
auto const result = func( s );
(void) result;
clock_t const end = clock();
double const duration = 1.0*(end - start)/CLOCKS_PER_SEC;
wcout << funcname << ": " << duration << endl;
}
}
 
#define TEST( f, s ) alf::test( #f, f, s )
 
auto data( int const n )
-> stefan::tstring
{
stefan::tstring s( n, '\0' );
srand( 42 );
for( int i = 0; i < n; ++i )
{
s[i] = static_cast<stefan::tchar>( rand() );
}
return s;
}
 
auto main( int, char** args )
-> int
{
int const n = [=]{ try{ return (args[1]? std::stoi( args[1] ) : 200'000); }
catch(...) { return 200'000; } }();
setlocale( LC_ALL, "" );
auto const s = data( n );
 
using namespace std;
wcout << "Testing string with " << n << " units." << endl;
TEST( stefan::first_unique_char_in, s );
TEST( stefan::first_unique_char_of, s );
}
[/code]
 
 
As you can see I've reformatted your code, in order to be able to relate
it more easily to generated assembly.
 
Also note that a string of completely random values, as in this code,
would be very unkind to a set implementation optimized for ¹Zipf's law,
which says that the frequency of any value is inversely proportional to
its rank in the frequency table, e.g. that ASCII characters should be
very much more
frequent than obscure old Chinese glyphs, say. So with such optimization
involved the completely random data would skew the results. But I think
it's good enough here.
 
 
[results]
[C:\my\forums\clc++\054]
> a
Testing string with 200000 units.
stefan::first_unique_char_in: 0.015
stefan::first_unique_char_of: 0
 
[C:\my\forums\clc++\054]
> a 400000
Testing string with 400000 units.
stefan::first_unique_char_in: 0.027
stefan::first_unique_char_of: 4.861
 
[C:\my\forums\clc++\054]
> a 800000
Testing string with 800000 units.
stefan::first_unique_char_in: 0.05
stefan::first_unique_char_of: 25.453
 
[C:\my\forums\clc++\054]
> _
[/results]
 
 
With 200 000 chars, the brute force algorithm uses ~0 time.
 
Above that, doubling from 400K to 800K yields roughly a fourfold
increase in the now sluggish running time, as expected for O(n^2).
 
To investigate the odd extreme performance of the double loop for a
string below 200 000 characters, I generated annotated assembly code
with these commands:
 
 
[commands]
[C:\my\forums\clc++\054]
> g++ -c compare.cpp -O3 -g
 
[C:\my\forums\clc++\054]
> objdump -d -M intel -S compare.o >compare.s
 
[C:\my\forums\clc++\054]
> _
[/commands]
 
 
The hypothesis was that maybe g++ was smart enough to recognize that
with a short enough string it could replace the inner loop with
something based on a table lookup. This was before I made the string
size a command line argument. I just used a fixed size string at the
start of the investigation, so the size was known to the compiler.
 
However, apparently g++ generates just very straightforward assembly
(I've manually removed some very long symbolic label comments):
 
 
[assembly]
static tchar first_unique_char_of( tstring const & s )
{
0: 4c 8b 41 08 mov r8,QWORD PTR [rcx+0x8]
auto const z = s.size();
auto i = z;
for( i = 0; i < z; ++i )
4: 4d 85 c0 test r8,r8
7: 74 39 je 42
9: 48 8b 09 mov rcx,QWORD PTR [rcx]
c: 45 31 c9 xor r9d,r9d
f: 44 0f b7 11 movzx r10d,WORD PTR [rcx]
tchar const ch = s[ i ];
bool found = false;
auto j = z;
for( j = 0; j < z; ++j )
{
if( s[ j ]== ch && i != j )
13: 4d 85 c9 test r9,r9
tchar const ch = s[ i ];
16: 42 0f b7 04 49 movzx eax,WORD PTR [rcx+r9*2]
if( s[ j ]== ch && i != j )
1b: 74 06 je 23
1d: 66 44 39 d0 cmp ax,r10w
21: 74 16 je 39
23: 31 d2 xor edx,edx
for( j = 0; j < z; ++j )
25: 48 83 c2 01 add rdx,0x1
29: 4c 39 c2 cmp rdx,r8
2c: 74 17 je 45
if( s[ j ]== ch && i != j )
2e: 66 39 04 51 cmp WORD PTR [rcx+rdx*2],ax
32: 75 f1 jne 25
34: 4c 39 ca cmp rdx,r9
37: 74 ec je 25
for( i = 0; i < z; ++i )
39: 49 83 c1 01 add r9,0x1
3d: 4d 39 c1 cmp r9,r8
40: 75 d1 jne 13
found = true; break;
}
}
if( !found )return ch;
}
return tchar{ 0 };
42: 31 c0 xor eax,eax
44: c3 ret
}
45: c3 ret
[/assembly]
 
 
There's no special optimization here.
 
My main hypothesis is therefore that the buffer of the string of 200 000
characters, which with 2-byte wide chars would be around 400 KB, fits in
a processor cache, while the buffer of just double that size doesn't.
 
But the effect seems to be too drastic for just that, at least a
millionfold improvement?
 
 
Cheers!, still baffled,
 
- Alf
 
References:
¹ https://simple.wikipedia.org/wiki/Zipf's_law
Ralf Goertz <me@myprovider.invalid>: Feb 01 08:22AM +0100

Am Wed, 1 Feb 2017 01:32:03 +0100
 
> int const n = [=]{ try{ return std::stoi( args[1] ); }
> catch(...) { return 200'000; } }();
 
I get a syntax error:
error: missing terminating ' character
return 200'000; } }();
^
 
 
Why don't you? Is it a locale issue? Or c++17?
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 08:37AM +0100

On 01.02.2017 08:22, Ralf Goertz wrote:
> return 200'000; } }();
> ^
 
> Why don't you? Is it a locale issue? Or c++17?
 
C++14 introduced digit group separators (you can ¹add an apostrophe
between any pair of digits), and binary literals.
 
Unfortunately my compilers choke on `200'000` as user input. I think
that the standard library wasn't updated to follow the core language
syntax. But I'm not sure, I haven't checked the standard.
 
 
Cheers!,
 
- Alf
 
References:
¹ <url: http://en.cppreference.com/w/cpp/language/integer_literal>
David Brown <david.brown@hesbynett.no>: Feb 01 09:16AM +0100

On 01/02/17 02:45, Stefan Ram wrote:
 
> -msse2 -march=native -Ofast -O3 -g -fgcse -fgcse-las
 
> (maybe I should still remove »-g« here, these are just my
> one-size-fits-all default settings.)
 
That's not a great set of flags - it shows a certain confusion about
what they actually do.
 
If you have "-march=native", the compiler will use all the instructions
it can based on the processor doing the compiling. "-msse2" is redundant.
 
"-Ofast" enables a wide set of optimisations, including everything in
"-O3" - so "-O3" is redundant. Note that it also turns off certain
strict compliance support, such as turning on "-ffast-math", which can
make floating point significantly faster at the expense of IEEE standard
handling of rounding, ordering, NaNs, etc.
 
"-g" is fine - it does not limit code generation.
 
"-fgcse" is redundant as it is included in -O2
 
"-fgcse-las" is an odd choice. It is an optimisation pass that is not
enabled by any "-O" flag. The two common reasons for not including a
pass in a "-O" level are that the pass sometimes makes code less
efficient, or that it can generate code that does not work the way the
programmer might expect (since not all programmers fully understand the
language). In this case, code that writes and then reads data through
incompatible pointers might fail. Typically these are passes that have
relatively little benefit for most code.
 
In general, these extra flags are usually unnecessary. But if you are
trying to get the /very/ best code, you might want to run tests with a
combination of them to see what works best for the code in question.
There are quite a lot of such flags - I don't know why you single out
this one.
 
 
That all leaves you with:
 
-march=native -Ofast -g
 
as a better choice of default flags.
David Brown <david.brown@hesbynett.no>: Feb 01 09:16AM +0100

On 01/02/17 09:16, David Brown wrote:
 
> That all leaves you with:
 
> -march=native -Ofast -g
 
> as a better choice of default flags.
 
I forgot to add warnings:
 
-march=native -Ofast -g -Wall -Wextra
Robert Wessel <robertwessel2@yahoo.com>: Feb 01 03:25AM -0600

On Wed, 1 Feb 2017 08:37:57 +0100, "Alf P. Steinbach"
 
>Unfortunately my compilers choke on `200'000` as user input. I think
>that the standard library wasn't updated to follow the core language
>syntax. But I'm not sure, I haven't checked the standard.
 
 
The input functions are basically unchanged, and don't accept digit
separators (at least not as digit separators).
Tim Rentsch <txr@alumni.caltech.edu>: Feb 01 06:42AM -0800

> }
> return Char{ 0 };
> }
 
This is a nice solution. Maybe not the fastest, but very
likely fast enough, and very nicely simple and obviously
right.
Tim Rentsch <txr@alumni.caltech.edu>: Feb 01 09:33AM -0800

> this breaking point the O(n^2) brute force algorithm is unreasonably
> fast. With larger strings it exhibits the expected O(n^2)
> sluggishness.
 
Nice analysis. My compliments on an interesting posting.
 
> much more frequent than obscure old Chinese glyphs, say. So with such
> optimization involved the completely random data would skew the
> results. But I think it's good enough here.
 
The string being initialized to random values is important!
Continued below...
 
> cache, while the buffer of just double that size doesn't.
 
> But the effect seems to be too drastic for just that, at least a
> millionfold improvement?
 
What you may be seeing is a consequence of using rand() and the
particular value of RAND_MAX on that system. If RAND_MAX is large
(eg, 2**31-1), using rand() to initialize the string is very kind
to the first_unique_char_of() algorithm. The reason is, in most
cases the first unique character will be the first or second
character in the string. Scanning 200,000 characters, whether once
or twice, takes almost no time.
 
If you investigate different values of RAND_MAX (eg, by using
'rand() & 0xffff'), I think you'll find that there is a knee in the
curve in different places (for where the first unique value occurs,
and hence also the running time), depending on how many distinct
random values there are.
 
I suggest time trials using a different initialization method.
For example, a string of length 2n+1 might be initialized to (if
you see what I mean):
 
1..n, 1..n, n+1
 
whereupon you should be able to see evidence of quadratic behavior
with much smaller string sizes (eg, n = 100 and n = 200), and say
one million iterations of calling first_unique_char_of().
Tim Rentsch <txr@alumni.caltech.edu>: Feb 01 11:00AM -0800

> algorithm for. I assume that in many programs many strings are
> sized well below 100'000 and use less than 1'000 of all the code
> points of Unicode.
 
Still more than enough for quadratic behavior to be slower
than linear behavior, even if the linear behavior has a
constant factor of 100.
 
 
> I expected the brute force algorithm to be quite fast because its
> inner loop has high cache locality and perfect predictability of
> its access pattern for the prefetcher.
 
That doesn't matter nearly as much as how many times that loop
is executed. In cases where the double loop algorithm is faster,
it is faster because the inner loop isn't executed very many
times.
 
 
> Sometimes, RAND_MAX is as small as 32767, so this might not fill
> the whole range of a tchar, which might have 32 bits, but then,
> realistic data might not fill it either.
 
Based on my tests, a smaller value of RAND_MAX will tend to be
worse for the double loop algorithm, because duplicate values
usually make it slow down.
 
>> millionfold improvement?
 
> When the string is small, the probability that some character is
> unique is high and the outer loop can exit early.
 
If the string is small (say, less than 25 characters), it won't
matter if an O( n**2 ) algorithm is used. But even relatively
short strings (a few hundred characters) can get bogged down,
depending mostly on the characters. The relevant question is
_where_ does the first unique character occur? If it's the
first character in the string, the double loop algorithm is
very fast. Somewhere around the 20th or 30th character,
the double loop algorithm starts to lose to the algorithms
that go through the string a (small) fixed number of times.
 
> usually small. Fancy algorithms have big constants. Until
> you know that n is frequently going to be big, don't get fancy.
> - Rob Pike
 
This quote isn't really apt, because that isn't what's going on
here. The double loop algorithm is faster _on some inputs_, but
O( n**2 ) for other inputs, and even for shortish strings this
matters. If it were known that all the inputs were 25 characters
or less, the double loop algorithm would be fine. Above that, it
depends on the strings in question. Here we may need a fancier
(but still not all that fancy) algorithm not because of how large
n is but because of what the input values might be. Insertion
sort can be very fast on some inputs, but it still is a poor
choice for more than a hundred elements or so. The same kind
of reasoning applies to the find-first-unique question.
woodbrian77@gmail.com: Jan 31 06:53PM -0800

On Tuesday, January 31, 2017 at 8:06:42 AM UTC-6, Scott Lurndal wrote:
 
> >When string_view is 16 bytes it's probably a good idea
> >to use (lvalue) references with it.
 
> Not necessarily,
 
I did a little test on an i3 laptop and wasn't able to detect
a pattern. Sometimes the test that used references was slightly
faster and other times the test that used values was slightly
faster. So it may not make as much of a difference as I thought.
I'll probably pass string_view by value for the time being.
 
 
 
Brian
Ebenezer Enterprises
http://webEbenezer.net
Juha Nieminen <nospam@thanks.invalid>: Feb 01 07:56AM

> It seems though that putting the pointer first might help
> in terms of preventing some padding in the type if the pointer
> is 8 bytes and the length member is 4.
 
Can you name a system where pointers are 8 bytes long but size_t is 4 bytes?
Robert Wessel <robertwessel2@yahoo.com>: Feb 01 03:21AM -0600

On Wed, 1 Feb 2017 07:56:39 +0000 (UTC), Juha Nieminen
>> in terms of preventing some padding in the type if the pointer
>> is 8 bytes and the length member is 4.
 
>Can you name a system where pointers are 8 bytes long but size_t is 4 bytes?
 
 
The old AS/400 C compiler had 16 byte pointers and 4 byte size_t's. In
the early days of that, you couldn't have any individual object bigger
than 64KB, so a 2 byte size_t would have been possible, but unless
memory is failing me, size_t was an int and 4 bytes.
woodbrian77@gmail.com: Feb 01 08:58AM -0800

On Wednesday, February 1, 2017 at 1:56:47 AM UTC-6, Juha Nieminen wrote:
> > in terms of preventing some padding in the type if the pointer
> > is 8 bytes and the length member is 4.
 
> Can you name a system where pointers are 8 bytes long but size_t is 4 bytes?
 
No. I'm reconsidering 32 bit operating systems though in order
to get a more reasonable size_t.
 
 
 
Brian
Ebenezer Enterprises - "I was a hopeless fool, now I'm hopelessly
devoted to You."
 
https://duckduckgo.com/?q=love+broke+thru+tobymac&t=ffsb&ia=videos&iax=1&iai=44l9PRI4c2M
scott@slp53.sl.home (Scott Lurndal): Feb 01 05:32PM


>> Can you name a system where pointers are 8 bytes long but size_t is 4 bytes?
 
>No. I'm reconsidering 32 bit operating systems though in order
>to get a more reasonable size_t.
 
What's wrong with an 8-byte size_t?
 
Note that even iOS is deprecating 32-bit apps. Good luck with that.
woodbrian77@gmail.com: Feb 01 10:55AM -0800

On Wednesday, February 1, 2017 at 11:33:02 AM UTC-6, Scott Lurndal wrote:
 
> >No. I'm reconsidering 32 bit operating systems though in order
> >to get a more reasonable size_t.
 
> What's wrong with an 8-byte size_t?
 
I don't need support for such mammoth string lengths.
 
 
 
> Note that even IOS is deprecating 32-bit apps. Good luck with that.
 
https://www.gotquestions.org/God-is-in-control.html
 
The advent of the C++ Middleware Writer is further evidence
of G-d's sovereignty.
 
I didn't say that I plan to start using 32 bit operating systems
to support the C++ Middleware Writer.
 
 
Brian
Ebenezer Enterprises - "And one called out to another and said, "Holy,
Holy, Holy, is the L-RD of hosts, The whole earth is full of His glory."
Isaiah 6:3
 
http://webEbenezer.net
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 08:41AM +0100

Example of using macros to introduce more expressive (but weird)
pseudo-keywords in the Niklaus Wirth-languages direction:
 
[code]
// Encoding: UTF-8 with BOM.
// Copyright © 2017 Alf P. Steinbach
 
#include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*)
#include <nlohmann/json.hpp> // nlohmann::*
 
#include <iostream> // std::(cin, cout, <<, endl)
 
#include <p/cppx/expressive.hpp> // progrock::cppx::expressive::*
using_nested_namespacE( progrock, cppx );
 
namespace windowsx_h {
namespace nlm = nlohmann;
using_all_from_namespacE( cppx::expressive );
 
procedurE to_json( ref_<nlm::json> j, ref_<Function const> func )
{
namE o = func;
j = nlm::json
{
{ "args", o.args },
{ "base_name", o.base_name },
{ "result_type", o.result_type }
};
}
 
procedurE to_json( ref_<nlm::json> j, ref_<Argument const> arg )
{
namE o = arg;
j = nlm::json
{
{ "name", o.name },
{ "type", o.type }
};
}
 
procedurE to_json( ref_<nlm::json> j, ref_<Call_spec const> call_spec )
{
namE o = call_spec;
j = nlm::json
{
{ "arg_expressions", o.arg_expressions },
{ "result_cast", o.result_cast }
};
}
 
procedurE to_json( ref_<nlm::json> j, ref_<Message_cracker const>
cracker )
{
namE o = cracker;
j = nlm::json
{
{ "handler", o.handler },
{ "call_spec", o.call_spec }
};
}
 
procedurE to_json( ref_<nlm::json> j, ref_<Crackers const> crackers )
{
namE o = crackers;
j = nlm::json
{
{ "by_message", o.by_message }
};
}
} // namespace windowsx_h
 
using namespace std;
namespace nlm = nlohmann;
using_all_from_namespacE( cppx::expressive );
namespace app {
procedurE generate_json_for(
ref_<windowsx_h::Crackers const> crackers,
ref_<ostream> stream
)
{
nlm::json const j{ crackers };
stream << j.dump( 4 ) << endl;
}
 
procedurE cpp_main()
{
leT crackers = windowsx_h::parse( cin );
app::generate_json_for( crackers, cout );
}
} // namespace app
 
functioN main()
-> int
{
using cppx::Exit_code;
try
{
app::cpp_main();
return Exit_code::success;
}
catch( exception const& x )
{
cerr << "!" << x.what() << endl;
}
return Exit_code::failure;
}
[/code]
 
 
where
 
 
[code]
#pragma once
// #include <p/cppx/expressive/pseudo_keywords.hpp>
// Copyright © 2017 Alf P. Steinbach
 
#define functioN auto
#define procedurE void
 
#define leT auto const
#define namE auto&
#define readonly_namE auto const&
 
#define repeaT do{
#define untiL }while
 
#define for_eacH for
 
// TODO: make variadic
#define using_from_namespaceE( ns, thingy ) \
using ns::thingy
 
#define using_all_from_namespacE( ns ) \
using namespace ns;
 
#define using_nested_namespacE( ns, nested ) \
namespace nested = ns::nested
 
#define lambdA( ... ) [__VA_ARGS__]
[/code]
 
 
Cheers!, :-p
 
- Alf
Juha Nieminen <nospam@thanks.invalid>: Feb 01 07:58AM

> #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*)
> #include <nlohmann/json.hpp> // nlohmann::*
 
And this should be of interest to us why, exactly?
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 09:54AM +0100

On 01.02.2017 08:58, Juha Nieminen wrote:
>> #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*)
>> #include <nlohmann/json.hpp> // nlohmann::*
 
> And this should be of interest to us why, exactly?
 
Apparently you're speaking for your family. I guess they're not into C++
much, and I didn't mean my posting to be read by them. Regarding your
quote, it's pretty irrelevant to anything, except if you don't know a
nice JSON parser and generator: then, Niels Lohmann's is ¹pretty OK.
 
Cheers & hth.,
 
- Alf
 
Notes:
¹ I had to fix his source just a tiny teeny little bit to make it
compile with Visual C++ 2015 update 2.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 05:30PM +0100

On 01.02.2017 08:41, Alf P. Steinbach wrote:
> leT crackers = windowsx_h::parse( cin );
> app::generate_json_for( crackers, cout );
> }
 
Instead of avoiding name clashes via that silly trailing uppercase
convention, I figured one could use a special leading character.
 
After all C++ supports a wide range of Unicode in identifiers.
 
However, the g++ compiler doesn't support that, even with option
`-fextended-identifiers`.
 
But both g++, Visual C++ and clang, and I suspect also most other C++
compilers, support `$` in identifiers. The `$` sign is not formally part
of the basic C++ source character set, I think because of politics: at
one time the `$` was changed to the über-silly completely and utterly
useless ¹"universal currency sign" `¤` in ASCII. And the name of ASCII
was changed too, I don't recall the details.
 
So, as a concrete example, instead of the above one can write,
 
$procedure cpp_main()
{
$let crackers = windowsx_h::parse( cin );
app::generate_json_for( crackers, cout );
}
 
as a Pascal/Modula/Oberon-like more-readable-for-novices C++, with
pretty clear meaning, which after macro replacement is
 
void cpp_main()
{
auto const crackers = windowsx_h::parse( cin );
app::generate_json_for( crackers, cout );
}
 
using keywords `void` and `auto` with multiple context-dependent
meanings which must be figured out.
 
The use of `$` here is non-standard, and consequently clang emits a
warning in pedantic mode unless the option
`-Wdollar-in-identifier-extension` is used. Formally, after emitting a
diagnostic for use of a language extension, a compiler can just go on
compiling and accepting the source code. And the standard doesn't define
what a diagnostic is, it doesn't impose any requirements on diagnostics,
so the requisite diagnostic can be e.g. that the compiler blinks a
single pixel on the screen for about 1/70th of a second, say. Anyway the
nice thing about non-standard is that it's extremely unlikely that
anyone has used these identifiers. Yet. They're up for grabs, with the 3
main compilers. :)
 
 
Cheers!,
 
- Alf
 
Links:
¹ <url: https://en.wikipedia.org/wiki/Currency_sign_(typography)>
red floyd <dont.bother@its.invalid>: Feb 01 08:42AM -0800

On 1/31/2017 11:58 PM, Juha Nieminen wrote:
>> #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*)
>> #include <nlohmann/json.hpp> // nlohmann::*
 
> And this should be of interest to us why, exactly?
 
Not to mention that the semantics of his untiL macro are wrong.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 05:45PM +0100

On 01.02.2017 17:42, red floyd wrote:
>>> #include <nlohmann/json.hpp> // nlohmann::*
 
>> And this should be of interest to us why, exactly?
 
> Not to mention that the semantics of his untiL macro are wrong.
 
Bah, who cares about exclamation marks!!!!!
 
Cheers?,
 
- Alf
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 01 06:08PM

On 01/02/2017 07:41, Alf P. Steinbach wrote:
> namespace nested = ns::nested
 
> #define lambdA( ... ) [__VA_ARGS__]
> [/code]
 
Doing such things makes C++ code harder to read not easier. Avoid
macros, period.
 
/Flibble
Daniel <danielaparker@gmail.com>: Feb 01 10:26AM -0800

On Wednesday, February 1, 2017 at 1:09:10 PM UTC-5, Mr Flibble wrote:
 
> Doing such things makes C++ code harder to read not easier. Avoid
> macros, period.
 
Except when you can't, of course.
ram@zedat.fu-berlin.de (Stefan Ram): Feb 01 01:45AM

>• To measure performance, always turn on optimization, e.g. `g++ -O3`.
 
I did use
 
-msse2 -march=native -Ofast -O3 -g -fgcse -fgcse-las
 
(maybe I should still remove »-g« here, these are just my
one-size-fits-all default settings.)
 
>• Always use reasonable data for what you test.
 
We would need to know from the OP what he has intended
this algorithm for. I assume that in many programs many
strings are sized well below 100'000 and use less than
1'000 of all the code points of Unicode.
 
>this on, a really old Asus laptop running Windows 7. Before this
>breaking point the O(n^2) brute force algorithm is unreasonably fast.
>With larger strings it exhibits the expected O(n^2) sluggishness.
 
I expected the brute force algorithm to be quite fast because
its inner loop has high cache locality and perfect predictability
of its access pattern for the prefetcher.
 
>s[i] = static_cast<stefan::tchar>( rand() );
 
Sometimes, RAND_MAX is as small as 32767, so this might not
fill the whole range of a tchar, which might have 32 bits,
but then, realistic data might not fill it either.
 
>But the effect seems to be too drastic for just that, at least a
>millionfold improvement?
 
It is possible that the resolution of the clock used is too
coarse to measure some small but nonzero durations.
 
When the string is small, the probability that some
character is unique is high and the outer loop can exit
early. The algorithm with the unordered_multiset would still
have to fill the unordered_multiset with /all/ characters
from the string in this case, and an unordered_multiset
might be implemented with memory from the heap that might
not have the locality that helps to exploit a cache nor
the predictability that helps to exploit a prefetcher.
 
An L1 cache reference might take 0.5 ns, an L2 cache reference
7 ns, while a main memory reference might take 100 ns.
 
The standard essentially requires ::std::unordered_map to be
implemented with buckets of key-value pairs for each hash
entry. (The standard requires bucket iterators!) These
buckets are linked lists, so one essentially always has some
pointer chasing. This might result in cache misses sometimes.
 
It might be faster sometimes to store the values in a vector
or use a custom unordered map with the table stored as a
/contiguous/ range of memory. Such an implementation is not
possible with the standard unordered_map interface which
requires bucket iterators.
 
(The preceding four, especially the preceding three,
paragraphs used information published by Chandler Carruth.)
 
Rule 3. Fancy algorithms are slow when n is small, and n is
usually small. Fancy algorithms have big constants. Until
you know that n is frequently going to be big, don't get fancy.
- Rob Pike
ram@zedat.fu-berlin.de (Stefan Ram): Feb 01 01:42PM

>(The preceding four, especially the preceding three,
>paragraphs used information published by Chandler Carruth.)
 
I do not claim that I fully understand the reasons for the
execution time of certain parts of the program, and I will not
have time to delve deeper into it; but still, I tried to
optimize my code.
 
I observed that I have used »-D_GLIBCXX_DEBUG«, which might
have introduced some bounds checking, so I removed this option
for benchmarking and production releases.
 
Then, I added two optimizations:
 
I added a sentinel value to the end of the string, so that
I do not have to check for the end by other means anymore.
 
I replaced some unsigned loop counters with signed loop
counters, so that the implementation is relieved from the
burden of providing defined behavior on overflow.
 
So now, the C++ code is
 
static tchar first_unique_char_of( tstring & s )
{ long long const z = s.size();
s.push_back( 0 ); /* add sentinel */
long long i = 0; /* no-defined-overflow optimization */
for( i = 0; i != z; ++i )
{ tchar const ch = s[ i ];
s[ z ]= ch; /* set sentinel */
bool found = false;
long long j = 0; /* no-defined-overflow optimization */
for( j = 0; ; ++j )
 
/* the next line contains the "inner loop" */
{ if( s[ j ]== ch && j != i ){ found = true; break; }}
 
if( j == z )found = false; /* sentinel was found */
if( !found )return ch; }
s.pop_back(); /* remove sentinel */
return tchar{ 0 }; }
 
and now its inner loop is compiled to just (manually edited):
 
LOOP: add rax,0x1 # ++j
cmp WORD PTR [rcx+rax*2],dx # s[ j ]== ch?
jne LOOP # if not so, go back to LOOP
 
But beware, I have /not measured/ that this is actually faster!
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
