- The first non-duplicated letter in a string - 9 Updates
- C++ 2017 -- win, lose and draw - 6 Updates
- Fun with macros :) - 8 Updates
- The first non-duplicated letter in a string - 2 Updates
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 01:32AM +0100 On 31.01.2017 15:46, Stefan Ram wrote: > That means, in this case, »first_unique_char_of« is more > than 1000 times faster. (Unless I made an error, so feel > free to reproduce this.) You've apparently made some errors of methodology, I /think/, but still this is a fascinating result and a very good question. :) First, the methodology: • To measure performance, always turn on optimization, e.g. `g++ -O3`. Otherwise there is an abstraction cost for function calls, to make sure that one can delve into them in a debugger. This favors non- abstracted code. The cost is usually a fixed factor, but a huge one. • Always use reasonable data for what you test. For example, testing with a string of 200 000 '#' would be unreasonable, because one function would use just a couple of machine code instructions to find that single value '#', while the other would delve into a complex data structure first. Still with this methodology in place there's a completely weird "breaking point" at about string size 200 000 on the machine I'm writing this on, a really old Asus laptop running Windows 7. Before this breaking point the O(n^2) brute force algorithm is unreasonably fast. With larger strings it exhibits the expected O(n^2) sluggishness. I discuss that below. The code below is the code that I used to test. It bears witness of the way I honed in on the truth, at first suspecting some really ungood implementation of `std::unordered_multiset` with both g++ and Visual C++. 
That turned out to not be the case but I let the workaround stand:

[code]
#include <iostream>
#include <iterator>         // std::(begin, end)
#include <locale.h>         // setlocale
#include <stddef.h>         // ptrdiff_t
#include <stdlib.h>         // rand, srand
#include <string>           // std::basic_string
#include <set>              // std::multiset
#include <time.h>           // clock
#include <unordered_map>    // std::unordered_map
#include <map>
#include <unordered_set>    // std::unordered_multiset

namespace alf {
    using Size = ptrdiff_t;

    template< class Type >
    class Unordered_multiset_
    {
    private:
        std::unordered_map< Type, int > values_;

    public:
        auto count( Type const& value ) const
            -> Size
        {
            auto const it = values_.find( value );
            return (it == values_.end()? 0 : it->second);
        }

        template< class Iterator >
        Unordered_multiset_( Iterator const first, Iterator const beyond )
        {
            //values_.reserve( beyond - first );    // Ungood, more slow.
            for( Iterator it = first; it != beyond; ++it )
            {
                ++values_[*it];
            }
        }
    };
}  // namespace alf

namespace stefan {
    using tchar   = wchar_t;
    using tstring = ::std::basic_string< tchar >;

    template< class Value >
    //using Multiset_ = std::unordered_multiset< Value >;
    using Multiset_ = alf::Unordered_multiset_<Value>;

    static tchar first_unique_char_in( tstring const & s )
    {
        Multiset_< tchar > const chars( begin( s ), end( s ));
        for( tchar const ch : s )
        {
            if( chars.count( ch ) == 1 )
            {
                return ch;
            }
        }
        return tchar{ 0 };
    }

    static tchar first_unique_char_of( tstring const & s )
    {
        auto const z = s.size();
        auto i = z;
        for( i = 0; i < z; ++i )
        {
            tchar const ch = s[ i ];
            bool found = false;
            auto j = z;
            for( j = 0; j < z; ++j )
            {
                if( s[ j ]== ch && i != j )
                {
                    found = true;
                    break;
                }
            }
            if( !found )return ch;
        }
        return tchar{ 0 };
    }
}

namespace alf {
    using namespace std;
    using Func = auto( stefan::tstring const& ) -> stefan::tchar;

    void test( char const funcname[], Func* const func, stefan::tstring const& s )
    {
        time_t const start = clock();
        auto const result = func( s );
        (void) result;
        time_t const end = clock();
        double const duration = 1.0*(end - start)/CLOCKS_PER_SEC;
        wcout << funcname << ": " << duration << endl;
    }
}

#define TEST( f, s ) alf::test( #f, f, s )

auto data( int const n )
    -> stefan::tstring
{
    stefan::tstring s( n, '\0' );
    srand( 42 );
    for( int i = 0; i < n; ++i )
    {
        s[i] = static_cast<stefan::tchar>( rand() );
    }
    return s;
}

auto main( int, char** args )
    -> int
{
    int const n = [=]{ try{ return std::stoi( args[1] ); } catch(...) { return 200'000; } }();
    setlocale( LC_ALL, "" );
    auto const s = data( n );
    using namespace std;
    wcout << "Testing string with " << n << " units." << endl;
    TEST( stefan::first_unique_char_in, s );
    TEST( stefan::first_unique_char_of, s );
}
[/code]

As you can see I've reformatted your code, in order to be able to relate it more easily to generated assembly.

Also note that a string of completely random values, as in this code, would be very unkind to a set implementation optimized for ¹Zipf's law, that the frequency of any value is inversely proportional to its rank in the frequency table, e.g. that ASCII characters should be very much more frequent than obscure old Chinese glyphs, say. So with such optimization involved the completely random data would skew the results. But I think it's good enough here.

[results]
[C:\my\forums\clc++\054]
> a
Testing string with 200000 units.
stefan::first_unique_char_in: 0.015
stefan::first_unique_char_of: 0

[C:\my\forums\clc++\054]
> a 400000
Testing string with 400000 units.
stefan::first_unique_char_in: 0.027
stefan::first_unique_char_of: 4.861

[C:\my\forums\clc++\054]
> a 800000
Testing string with 800000 units.
stefan::first_unique_char_in: 0.05
stefan::first_unique_char_of: 25.453

[C:\my\forums\clc++\054]
> _
[/results]

With 200 000 chars, the brute force algorithm uses ~0 time. Above that, doubling from 400K to 800K yields roughly a fourfold increase in the now sluggish running time, as expected from O(n^2).
To investigate the odd extreme performance of the double loop for a string below 200 000 characters, I generated annotated assembly code with these commands:

[commands]
[C:\my\forums\clc++\054]
> g++ -c compare.cpp -O3 -g

[C:\my\forums\clc++\054]
> objdump -d -M intel -S compare.o >compare.s

[C:\my\forums\clc++\054]
> _
[/commands]

The hypothesis was that maybe g++ was smart enough to recognize that with a short enough string it could replace the inner loop with something based on a table lookup. This was before I made the string size a command line argument. I just used a fixed size string at the start of the investigation, so the size was known to the compiler.

However, apparently g++ generates just very straightforward assembly (I've manually removed some very long symbolic label comments):

[assembly]
static tchar first_unique_char_of( tstring const & s )
{
   0:   4c 8b 41 08         mov    r8,QWORD PTR [rcx+0x8]
    auto const z = s.size();
    auto i = z;
    for( i = 0; i < z; ++i )
   4:   4d 85 c0            test   r8,r8
   7:   74 39               je     42
   9:   48 8b 09            mov    rcx,QWORD PTR [rcx]
   c:   45 31 c9            xor    r9d,r9d
   f:   44 0f b7 11         movzx  r10d,WORD PTR [rcx]
        tchar const ch = s[ i ];
        bool found = false;
        auto j = z;
        for( j = 0; j < z; ++j )
        {
            if( s[ j ]== ch && i != j )
  13:   4d 85 c9            test   r9,r9
        tchar const ch = s[ i ];
  16:   42 0f b7 04 49      movzx  eax,WORD PTR [rcx+r9*2]
            if( s[ j ]== ch && i != j )
  1b:   74 06               je     23
  1d:   66 44 39 d0         cmp    ax,r10w
  21:   74 16               je     39
  23:   31 d2               xor    edx,edx
        for( j = 0; j < z; ++j )
  25:   48 83 c2 01         add    rdx,0x1
  29:   4c 39 c2            cmp    rdx,r8
  2c:   74 17               je     45
            if( s[ j ]== ch && i != j )
  2e:   66 39 04 51         cmp    WORD PTR [rcx+rdx*2],ax
  32:   75 f1               jne    25
  34:   4c 39 ca            cmp    rdx,r9
  37:   74 ec               je     25
        for( i = 0; i < z; ++i )
  39:   49 83 c1 01         add    r9,0x1
  3d:   4d 39 c1            cmp    r9,r8
  40:   75 d1               jne    13
            {
                found = true;
                break;
            }
        }
        if( !found )return ch;
    }
    return tchar{ 0 };
  42:   31 c0               xor    eax,eax
  44:   c3                  ret
}
  45:   c3                  ret
[/assembly]

There's no special optimization here.
My main hypothesis is therefore that the buffer of the string of 200 000 characters, which would be around 800 KB, fits in a processor cache, while the buffer of just double that size doesn't. But the effect seems to be too drastic for just that, at least a millionfold improvement?

Cheers!, still baffled,

- Alf

References:
¹ https://simple.wikipedia.org/wiki/Zipf's_law |
Ralf Goertz <me@myprovider.invalid>: Feb 01 08:22AM +0100

Am Wed, 1 Feb 2017 01:32:03 +0100
> int const n = [=]{ try{ return std::stoi( args[1] ); }
> catch(...) { return 200'000; } }();

I get a syntax error:

error: missing terminating ' character
return 200'000; } }();
^

Why don't you? Is it a locale issue? Or c++17? |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 08:37AM +0100 On 01.02.2017 08:22, Ralf Goertz wrote: > return 200'000; } }(); > ^ > Why don't you? Is it a locale issue? Or c++17? C++14 introduced digit group separators (you can ¹add an apostrophe between any pair of digits), and binary literals. Unfortunately my compilers choke on `200'000` as user input. I think that the standard library wasn't updated to follow the core language syntax. But I'm not sure, I haven't checked the standard. Cheers!, - Alf References: ¹ <url: http://en.cppreference.com/w/cpp/language/integer_literal> |
David Brown <david.brown@hesbynett.no>: Feb 01 09:16AM +0100

On 01/02/17 02:45, Stefan Ram wrote:
> -msse2 -march=native -Ofast -O3 -g -fgcse -fgcse-las
> (maybe I should still remove »-g« here, these are just my
> one-size-fits-all default settings.)

That's not a great set of flags - it shows a certain confusion about what they actually do.

If you have "-march=native", the compiler will use all the instructions it can based on the processor doing the compiling. "-msse2" is redundant.

"-Ofast" enables a wide set of optimisations, including everything in "-O3" - so "-O3" is redundant. Note that it also turns off certain strict compliance support, such as turning on "-ffast-math" that can make floating point significantly faster at the expense of IEEE standard handling of rounding, ordering, NaNs, etc.

"-g" is fine - it does not limit code generation.

"-fgcse" is redundant as it is included in -O2.

"-fgcse-las" is an odd choice. It is an optimisation pass that is not enabled by any "-O" flag. The two common reasons for not including a pass in a "-O" level are that the pass sometimes makes code less efficient, or that it can generate code that does not work the way the programmer might expect (since not all programmers fully understand the language). In this case, code that writes and then reads data through incompatible pointers might fail. Typically these are passes that have relatively little benefit for most code. In general, these extra flags are usually unnecessary. But if you are trying to get the /very/ best code, you might want to run tests with a combination of them to see what works best for the code in question. There are quite a lot of such flags - I don't know why you single out this one.

That all leaves you with:

-march=native -Ofast -g

as a better choice of default flags. |
David Brown <david.brown@hesbynett.no>: Feb 01 09:16AM +0100 On 01/02/17 09:16, David Brown wrote: > That all leaves you with: > -march=native -Ofast -g > as a better choice of default flags. I forgot to add warnings: -march=native -Ofast -g -Wall -Wextra |
Robert Wessel <robertwessel2@yahoo.com>: Feb 01 03:25AM -0600 On Wed, 1 Feb 2017 08:37:57 +0100, "Alf P. Steinbach" >Unfortunately my compilers choke on `200'000` as user input. I think >that the standard library wasn't updated to follow the core language >syntax. But I'm not sure, I haven't checked the standard. The input functions are basically unchanged, and don't accept digit separators (at least not as digit separators). |
Tim Rentsch <txr@alumni.caltech.edu>: Feb 01 06:42AM -0800 > } > return Char{ 0 }; > } This is a nice solution. Maybe not the fastest, but very likely fast enough, and very nicely simple and obviously right. |
Tim Rentsch <txr@alumni.caltech.edu>: Feb 01 09:33AM -0800 > this breaking point the O(n^2) brute force algorithm is unreasonably > fast. With larger strings it exhibits the expected O(n^2) > sluggishness. Nice analysis. My compliments on an interesting posting. > much more frequent than obscure old Chinese glyphs, say. So with such > optimization involved the completely random data would skew the > results. But I think it's good enough here. The string being initialized to random values is important! Continued below... > cache, while the buffer of just double that size doesn't. > But the effect seems to be too drastic for just that, at least a > millionfold improvement? What you may be seeing is a consequence of using rand() and the particular value of RAND_MAX on that system. If RAND_MAX is large (eg, 2**31-1), using rand() to initialize the string is very kind to the first_unique_char_of() algorithm. The reason is, in most cases the first unique character will be the first or second character in the string. Scanning 200,000 characters, whether once or twice, takes almost no time. If you investigate different values of RAND_MAX (eg, by using 'rand() & 0xffff'), I think you find that there is a knee in the curve in different places (for where the first unique value occurs, and hence also the running time), depending on how many distinct random values there are. I suggest time trials using a different initialization method. For example, a string of length 2n+1 might be initialized to (if you see what I mean): 1..n, 1..n, n+1 whereupon you should be able to see evidence of quadratic behavior with much smaller string sizes (eg, n = 100 and n = 200), and say one million iterations of calling first_unique_char_of(). |
Tim Rentsch <txr@alumni.caltech.edu>: Feb 01 11:00AM -0800 > algorithm for. I assume that in many programs many strings are > sized well below 100'000 and use less than 1'000 of all the code > points of Unicode. Still more than enough for quadratic behavior to be slower than linear behavior, even if the linear behavior has a constant factor of 100. > I expected the brute force algorithm to be quite fast because it's > inner loop has high cache locality and perfect predictability of > its access pattern for the prefetcher. That doesn't matter nearly as much as how many times that loop is executed. In cases where the double loop algorithm is faster, it is faster because the inner loop isn't executed very many times. > Sometimes, RAND_MAX is as small as 32767, so this might not fill > the whole range of a tchar, which might have 32 bits, but then, > realistic data might not fill it either. Based on my tests, a smaller value of RAND_MAX will tend to be worse for the double loop algorithm, because duplicate values usually make it slow down. >> millionfold improvement? > When the string is small, the probability that some character is > unique is high and the outer loop can exit early. If the string is small (say, less than 25 characters), it won't matter if an O( n**2 ) algorithm is used. But even relatively short strings (a few hundred characters) can get bogged down, depending mostly on the characters. The relevant question is _where_ does the first unique character occur? If it's the first character in the string, the double loop algorithm is very fast. Somewhere up around the 20'th or 30'th character, the double loop algorithm starts to lose to the algorithms that go through the string a (small) fixed number of times. > usually small. Fancy algorithms have big constants. Until > you know that n is frequently going to be big, don't get fancy. > - Rob Pike This quote isn't really apt, because that isn't what's going on here. 
The double loop algorithm is faster _on some inputs_, but O( n**2 ) for other inputs, and even for shortish strings this matters. If it were known that all the inputs were 25 characters or less, the double loop algorithm would be fine. Above that, it depends on the strings in question. Here we may need a fancier (but still not all that fancy) algorithm not because of how large n is but because of what the input values might be. Insertion sort can be very fast on some inputs, but it still is a poor choice for more than a hundred elements or so. The same kind of reasoning applies to the find-first-unique question. |
woodbrian77@gmail.com: Jan 31 06:53PM -0800 On Tuesday, January 31, 2017 at 8:06:42 AM UTC-6, Scott Lurndal wrote: > >When string_view is 16 bytes it's probably a good idea > >to use (lvalue) references with it. > Not necessarily, I did a little test on an i3 laptop and wasn't able to detect a pattern. Sometimes the test that used references was slightly faster and other times the test that used values was slightly faster. So it may not make as much of difference as I thought. I'll probably pass string_view by value for the time being. Brian Ebenezer Enterprises http://webEbenezer.net |
Juha Nieminen <nospam@thanks.invalid>: Feb 01 07:56AM > It seems though that putting the pointer first might help > in terms of preventing some padding in the type if the pointer > is 8 bytes and the length member is 4. Can you name a system where pointers are 8 bytes long but size_t is 4 bytes? |
Robert Wessel <robertwessel2@yahoo.com>: Feb 01 03:21AM -0600 On Wed, 1 Feb 2017 07:56:39 +0000 (UTC), Juha Nieminen >> in terms of preventing some padding in the type if the pointer >> is 8 bytes and the length member is 4. >Can you name a system where pointers are 8 bytes long but size_t is 4 bytes? The old AS/400 C compiler had 16 byte pointers and 4 byte size_t's. In the early days of that, you couldn't have any individual object bigger than 64KB, so a 2 byte size_t would have been possible, but unless memory is failing me, size_t was was an int and 4 bytes. |
woodbrian77@gmail.com: Feb 01 08:58AM -0800 On Wednesday, February 1, 2017 at 1:56:47 AM UTC-6, Juha Nieminen wrote: > > in terms of preventing some padding in the type if the pointer > > is 8 bytes and the length member is 4. > Can you name a system where pointers are 8 bytes long but size_t is 4 bytes? No. I'm reconsidering 32 bit operating systems though in order to get a more reasonable size_t. Brian Ebenezer Enterprises - "I was a hopeless fool, now I'm hopelessly devoted to You." https://duckduckgo.com/?q=love+broke+thru+tobymac&t=ffsb&ia=videos&iax=1&iai=44l9PRI4c2M |
scott@slp53.sl.home (Scott Lurndal): Feb 01 05:32PM >> Can you name a system where pointers are 8 bytes long but size_t is 4 bytes? >No. I'm reconsidering 32 bit operating systems though in order >to get a more reasonable size_t. What's wrong with an 8-byte size_t? Note that even IOS is deprecating 32-bit apps. Good luck with that. |
woodbrian77@gmail.com: Feb 01 10:55AM -0800 On Wednesday, February 1, 2017 at 11:33:02 AM UTC-6, Scott Lurndal wrote: > >No. I'm reconsidering 32 bit operating systems though in order > >to get a more reasonable size_t. > What's wrong with an 8-byte size_t? I don't need support for such mammoth string lengths. > Note that even IOS is deprecating 32-bit apps. Good luck with that. https://www.gotquestions.org/God-is-in-control.html The advent of the C++ Middleware Writer is further evidence of G-d's sovereignty. I didn't say that I plan to start using 32 bit operating systems to support the C++ Middleware Writer. Brian Ebenezer Enterprises - "And one called out to another and said, "Holy, Holy, Holy, is the L-RD of hosts, The whole earth is full of His glory." Isaiah 6:3 http://webEbenezer.net |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 08:41AM +0100 Example of using macros to introduce more expressive (but weird) pseudo-keywords in the Niklaus Wirth-languages direction: [code] // Encoding: UTF-8 with BOM. // Copyright © 2017 Alf P. Steinbach #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*) #include <nlohmann/json.hpp> // nlohmann::* #include <iostream> // std::(cin, cout, <<, endl) #include <p/cppx/expressive.hpp> // progrock::cppx::expressive::* using_nested_namespacE( progrock, cppx ); namespace windowsx_h { namespace nlm = nlohmann; using_all_from_namespacE( cppx::expressive ); procedurE to_json( ref_<nlm::json> j, ref_<Function const> func ) { namE o = func; j = nlm::json { { "args", o.args }, { "base_name", o.base_name }, { "result_type", o.result_type } }; } procedurE to_json( ref_<nlm::json> j, ref_<Argument const> arg ) { namE o = arg; j = nlm::json { { "name", o.name }, { "type", o.type } }; } procedurE to_json( ref_<nlm::json> j, ref_<Call_spec const> call_spec ) { namE o = call_spec; j = nlm::json { { "arg_expressions", o.arg_expressions }, { "result_cast", o.result_cast } }; } procedurE to_json( ref_<nlm::json> j, ref_<Message_cracker const> cracker ) { namE o = cracker; j = nlm::json { { "handler", o.handler }, { "call_spec", o.call_spec } }; } procedurE to_json( ref_<nlm::json> j, ref_<Crackers const> crackers ) { namE o = crackers; j = nlm::json { { "by_message", o.by_message } }; } } // namespace windowsx_h using namespace std; namespace nlm = nlohmann; using_all_from_namespacE( cppx::expressive ); namespace app { procedurE generate_json_for( ref_<windowsx_h::Crackers const> crackers, ref_<ostream> stream ) { nlm::json const j{ crackers }; stream << j.dump( 4 ) << endl; } procedurE cpp_main() { leT crackers = windowsx_h::parse( cin ); app::generate_json_for( crackers, cout ); } } // namespace app functioN main() -> int { using cppx::Exit_code; try { app::cpp_main(); return Exit_code::success; } catch( 
exception const& x ) { cerr << "!" << x.what() << endl; } return Exit_code::failure; } [/code] where [code] #pragma once // #include <p/cppx/expressive/pseudo_keywords.hpp> // Copyright © 2017 Alf P. Steinbach #define functioN auto #define procedurE void #define leT auto const #define namE auto& #define readonly_namE auto const& #define repeaT do{ #define untiL }while #define for_eacH for // TODO: make variadic #define using_from_namespaceE( ns, thingy ) \ using ns::thingy #define using_all_from_namespacE( ns ) \ using namespace ns; #define using_nested_namespacE( ns, nested ) \ namespace nested = ns::nested #define lambdA( ... ) [__VA_ARGS__] [/code] Cheers!, :-p - Alf |
Juha Nieminen <nospam@thanks.invalid>: Feb 01 07:58AM > #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*) > #include <nlohmann/json.hpp> // nlohmann::* And this should be of interest to us why, exactly? |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 09:54AM +0100 On 01.02.2017 08:58, Juha Nieminen wrote: >> #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*) >> #include <nlohmann/json.hpp> // nlohmann::* > And this should be of interest to us why, exactly? Apparently you're speaking for you family. I guess they're not into C++ much, and I didn't mean my posting to be read by them. Regarding your quote it's pretty irrelevant about anything, except if you don't know a nice JSON parser and generator: then, Nils Lohmann's is ¹pretty OK. Cheers & hth., - Alf Notes: ¹ I had to fix his source just a tiny teeny little bit to make it compile with Visual C++ 2015 update 2. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 05:30PM +0100 On 01.02.2017 08:41, Alf P. Steinbach wrote: > leT crackers = windowsx_h::parse( cin ); > app::generate_json_for( crackers, cout ); > } Instead of avoiding name clashes via that silly trailing uppercase convention, I figured one could use a special leading character. After all C++ supports a wide range of Unicode in identifiers. However, the g++ compiler doesn't support that, even with option `-fextended-identifiers`. But both g++, Visual C++ and clang, and I suspect also most other C++ compilers, support `$` in identifiers. The `$` sign is not formally part of the basic C++ source character set, I think because of politics: at one time the `$` was changed to the über-silly completely and utterly useless ¹"universal currency sign" `¤` in ASCII. And the name of ASCII was changed too, I don't recall the details. So, as a concrete example, instead of the above one can write, $procedure cpp_main() { $let crackers = windowsx_h::parse( cin ); app::generate_json_for( crackers, cout ); } as a Pascal/Modula/Oberon-like more-readable-for-novices C++, with pretty clear meaning, which after macro replacement is void cpp_main() { auto const crackers = windowsx_h::parse( cin ); app::generate_json_for( crackers, cout ); } using keywords `void` and `auto` with multiple context-dependent meanings which must be figured out. The use of `$` here is non-standard, and consequently clang emits a warning in pedantic mode unless the option `-Wdollar-in-identifier-extension` is used. Formally, after emitting a diagnostic for use of a language extension, a compiler can just go on compiling and accepting the source code. And the standard doesn't define what a diagnostic is, it doesn't impose any requirements on diagnostics, so the requisite diagnostic can be e.g. that the compiler blinks a single pixel on the screen for about 1/70th of a second, say. 
Anyway the nice thing about non-standard is that it's extremely unlikely that anyone has used these identifiers. Yet. They're up for grabs, with the 3 main compilers. :) Cheers!, - Alf Links: ¹ <url: https://en.wikipedia.org/wiki/Currency_sign_(typography)> |
red floyd <dont.bother@its.invalid>: Feb 01 08:42AM -0800 On 1/31/2017 11:58 PM, Juha Nieminen wrote: >> #include "windowsx_h/parsing.hpp" // windowsx_h::(Crackers, parse*) >> #include <nlohmann/json.hpp> // nlohmann::* > And this should be of interest to us why, exactly? Not to mention that the semantics of his untiL macro are wrong. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 01 05:45PM +0100 On 01.02.2017 17:42, red floyd wrote: >>> #include <nlohmann/json.hpp> // nlohmann::* >> And this should be of interest to us why, exactly? > Not to mention that the semantics of his untiL macro are wrong. Bah, who cares about exclamation marks!!!!! Cheers?, - Alf |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 01 06:08PM On 01/02/2017 07:41, Alf P. Steinbach wrote: > namespace nested = ns::nested > #define lambdA( ... ) [__VA_ARGS__] > [/code] Doing such things makes C++ code harder to read not easier. Avoid macros, period. /Flibble |
Daniel <danielaparker@gmail.com>: Feb 01 10:26AM -0800 On Wednesday, February 1, 2017 at 1:09:10 PM UTC-5, Mr Flibble wrote: > Doing such things makes C++ code harder to read not easier. Avoid > macros, period. Except when you can't, of course. |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 01 01:45AM

>- To measure performance, always turn on optimization, e.g. -g++ -O3-.

I did use

-msse2 -march=native -Ofast -O3 -g -fgcse -fgcse-las

(maybe I should still remove »-g« here, these are just my one-size-fits-all default settings.)

>- Always use reasonable data for what you test.

We would need to know from the OP what he has intended this algorithm for. I assume that in many programs many strings are sized well below 100'000 and use less than 1'000 of all the code points of Unicode.

>this on, a really old Asus laptop running Windows 7. Before this
>breaking point the O(n^2) brute force algorithm is unreasonably fast.
>With larger strings it exhibits the expected O(n^2) sluggishness.

I expected the brute force algorithm to be quite fast because its inner loop has high cache locality and perfect predictability of its access pattern for the prefetcher.

>s[i] = static_cast<stefan::tchar>( rand() );

Sometimes, RAND_MAX is as small as 32767, so this might not fill the whole range of a tchar, which might have 32 bits, but then, realistic data might not fill it either.

>But the effect seems to be too drastic for just that, at least a
>millionfold improvement?

It is possible that the precision of the clock used is not large enough to measure some small but nonzero durations.

When the string is small, the probability that some character is unique is high and the outer loop can exit early. The algorithm with the unordered_multiset would still have to fill the unordered_multiset with /all/ characters from the string in this case, and an unordered_multiset might be implemented with memory from the heap that might not have the locality that helps to exploit a cache nor the predictability that helps to exploit a prefetcher. An L1 cache reference might take 0.5 ns, an L2 cache reference 7 ns, while a main memory reference might take 100 ns.
The standard essentially requires ::std::unordered_map to be implemented with buckets of key-value pairs for each hash entry. (The standard requires bucket iterators!) These buckets are linked lists, so one essentially always has some pointer chasing. This might result in cache misses sometimes.

It might be faster sometimes to store the values in a vector or use a custom unordered map with the table stored as a /contiguous/ range of memory. Such an implementation is not possible with the standard unordered_map interface which requires bucket iterators.

(The preceding four, especially the preceding three, paragraphs used information published by Chandler Carruth.)

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy.
- Rob Pike |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 01 01:42PM

>(The preceding four, especially the preceding three,
>paragraphs used information published by Chandler Carruth.)

I do not claim that I fully understand the reasons for the execution time of certain parts of the program, I will not have time to delve deeper into it; but still, I tried to optimize my code.

I observed that I have used »-D_GLIBCXX_DEBUG«, which might have introduced some bounds checking, so I removed this option for benchmarking and production releases.

Then, I added two optimizations: I added a sentinel value to the end of the string, so that I do not have to check for the end by other means anymore. I replaced some unsigned loop counters with signed loop counters, so that the implementation is relieved from the burden of providing defined behavior on overflow.

So now, the C++ code is

static tchar first_unique_char_of( tstring & s )
{
    long long const z = s.size();
    s.push_back( 0 ); /* add sentinel */
    long long i = 0; /* no-defined-overflow optimization */
    auto const zs = s.size();
    for( i = 0; i != z; ++i )
    {
        tchar const ch = s[ i ];
        s[ z ]= ch; /* set sentinel */
        bool found = false;
        long long j = 0; /* no-defined-overflow optimization */
        for( j = 0; ; ++j ) /* the next line contains the "inner loop" */
        {
            if( s[ j ]== ch && j != i ){ found = true; break; }
        }
        if( j == z )found = false; /* sentinel was found */
        if( !found )return ch;
    }
    s.pop_back(); /* remove sentinel */
    return tchar{ 0 };
}

and now its inner loop is compiled to just (manually edited):

LOOP:   add  rax,0x1                    # ++j
        cmp  WORD PTR [rcx+rax*2],dx    # s[ j ]== ch?
        jne  LOOP                       # if not so, go back to LOOP

But beware, I have /not measured/ that this is actually faster! |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |