| "Öö Tiib" <ootiib@hot.ee>: Nov 24 01:57AM -0800 On Sunday, 24 November 2019 01:02:33 UTC+2, Paul wrote: > Thanks for your interest. > My experimental results don't agree with your post. > ***However, there is definitely the possibility that my testing is flawed.*** Note that 100 millions and 10 millions are different counts. Also the randomness can be biased, IIRC then RAND_MAX was 32767 in some Visual Studio. What it is in yours? With such tiny RAND_MAX the winner algorithm would count into std::array<int,RAND_MAX+1> allCounts; hash table is wasteful there. |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 11:12AM +0100 > The std::sort of vector of 10 millions of really random integers > tends to be about twice faster than counting equal elements of it > into hash table. ... LOL. |
| "Öö Tiib" <ootiib@hot.ee>: Nov 24 03:26AM -0800 On Sunday, 24 November 2019 01:02:33 UTC+2, Paul wrote: > Thanks for your interest. Can you report results of that on your platform: #include <algorithm> // generate and sort #include <chrono> // chrono #include <iostream> // cout #include <random> // random_device and mt19937 #include <unordered_map> // unordered_map #include <vector> // vector // note that std::random_device does nothing good on lot of platforms // here it is just used for toy example that it is static std::random_device rnd_device; static std::mt19937 engine {rnd_device()}; static std::uniform_int_distribution<int> dist {0, INT_MAX}; int gen() { return dist(engine); }; size_t hashcount(const std::vector<int>& vec) { // For each int in the vector, count how many times it occurs. std::unordered_map<int, int> allCounts; for (int v : vec) ++allCounts[v]; return allCounts.size(); } int main() { std::vector<int> vec(10000000); std::generate(begin(vec), end(vec), gen); auto t1 = std::chrono::high_resolution_clock::now(); hashcount(vec); auto t2 = std::chrono::high_resolution_clock::now(); std::cout << "hash " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " "; std::vector<int> vec2(10000000); std::generate(begin(vec2), end(vec2), gen); t1 = std::chrono::high_resolution_clock::now(); std::sort(begin(vec2),end(vec2)); t2 = std::chrono::high_resolution_clock::now(); std::cout << "sort " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " "; return 0; } |
| Paul <pepstein5@gmail.com>: Nov 24 03:48AM -0800 On Sunday, November 24, 2019 at 10:12:28 AM UTC, Bonita Montero wrote: > > tends to be about twice faster than counting equal elements of it > > into hash table. ... > LOL. Yes, I too was very surprised by this claim, and my initial surprise is backed up by my experiment. I didn't feel confident enough to challenge the claim so directly, though. Paul |
| Paul <pepstein5@gmail.com>: Nov 24 03:56AM -0800 On Sunday, November 24, 2019 at 9:57:20 AM UTC, Öö Tiib wrote: > some Visual Studio. What it is in yours? With such tiny RAND_MAX > the winner algorithm would count into > std::array<int,RAND_MAX+1> allCounts; hash table is wasteful there. I started testing with a length of 10 million and tested with a length of 100 million at a later stage. Carelessly, I didn't update the comments and all defaults to reflect this change but this is basically a documentation error. RAND_MAX is indeed 32767 in my version of Visual Studio. My intent is to treat the processes of testing the algos and designing the algos as separate. The vector generation with std::rand() is part of the testing, not part of the algorithm. Changing allCounts as you suggest would tie the algorithms to the testing. In test driven development, people do indeed tie the algorithms to the testing. But I don't want to do it that way. The algorithms should be independent of the methods used to test them. Paul |
| "Öö Tiib" <ootiib@hot.ee>: Nov 24 04:13AM -0800 On Sunday, 24 November 2019 13:56:39 UTC+2, Paul wrote: > testing. > But I don't want to do it that way. > The algorithms should be independent of the methods used to test them. Then use random engines like std::mt19937 for generating test data. These behave more similarly between platforms than std::rand and so avoid confusion. Usage of std:rand with so low max and cycle size does cause odd artifacts and effects so people do avoid it even for dice rolling in entertainment. |
| "Öö Tiib" <ootiib@hot.ee>: Nov 24 04:27AM -0800 On Sunday, 24 November 2019 13:49:05 UTC+2, Paul wrote: > Yes, I too was very surprised by this claim, and my initial surprise > is backed up by my experiment. I didn't feel confident enough to > challenge the claim so directly, though. Note also that the claim keeps being true and your experiments used narrow range of test values like I predicted. |
| Paul <pepstein5@gmail.com>: Nov 24 04:43AM -0800 On Sunday, November 24, 2019 at 12:14:01 PM UTC, Öö Tiib wrote: > so avoid confusion. Usage of std:rand with so low max and cycle size > does cause odd artifacts and effects so people do avoid it > even for dice rolling in entertainment. Good advice, here. Thanks. Paul |
| Paul <pepstein5@gmail.com>: Nov 24 04:44AM -0800 On Sunday, November 24, 2019 at 12:27:24 PM UTC, Öö Tiib wrote: > > challenge the claim so directly, though. > Note also that the claim keeps being true and your experiments used > narrow range of test values like I predicted. Yes, your claim is vindicated by your code that I tested on my platform as requested. I will report the results in my reply to the post of yours that contains your code. Paul |
| Paul <pepstein5@gmail.com>: Nov 24 04:48AM -0800 On Sunday, November 24, 2019 at 11:26:13 AM UTC, Öö Tiib wrote: > std::cout << "sort " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " "; > return 0; > } The console output was hash 5559216 sort 1263252 which vindicates your initial response. It's an interesting example of big O analysis not being the whole story. Although sorting is O(n * log (n) ), sorting seems best in this context even though linear methods are available. Paul |
| Paul <pepstein5@gmail.com>: Nov 24 05:40AM -0800 On Sunday, November 24, 2019 at 11:26:13 AM UTC, Öö Tiib wrote: > std::cout << "sort " << std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count() << " "; > return 0; > } The line allCounts.reserve(vec.size()); gives a significantly improved performance. But this basically means that hashing becomes 3 times slower than sorting instead of 4 times slower. Paul |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 03:04PM +0100 >> tends to be about twice faster than counting equal elements of it >> into hash table. ... > LOL. On my old Ryzen 7 1800X, sorting of 10'000'000 integters takes more than twice the time than counting with VC++2019 and nearly twice the time than counting with g++. #include <iostream> #include <vector> #include <unordered_set> #include <limits> #include <algorithm> #include <random> #include <chrono> using namespace std; using namespace chrono; int main() { size_t const N_ELEMENTS = 10'000'000; random_device rd; uniform_int_distribution<int> uid( numeric_limits<int>::min(), numeric_limits<int>::max() ); vector<int> v; unordered_set<int> s( N_ELEMENTS ); time_point<high_resolution_clock> start; double seconds; v.resize( N_ELEMENTS ); for( int &e : v ) e = uid( rd ), s.insert( e ); start = high_resolution_clock::now(); size_t uniq = 0; for( unordered_set<int>::iterator it = s.begin(); it != s.end(); ) { ++uniq; int e = *it++; for( ; it != s.end() && *it == e; ++it ); } seconds = duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0E9; cout << "unique elements: " << uniq << endl; cout << "time to count: " << seconds << "s" << endl; start = high_resolution_clock::now(); sort( v.begin(), v.end() ); seconds = duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0E9; cout << "time to sort: " << seconds << "s" << endl; } And I reserved as equal buckets a there are values, which should speed up counting anyway. |
| Paul <pepstein5@gmail.com>: Nov 24 06:34AM -0800 On Sunday, November 24, 2019 at 2:04:38 PM UTC, Bonita Montero wrote: > } > And I reserved as equal buckets a there are values, which should speed > up counting anyway. Thanks for this. But your code seems to lose the context of the original task. Fix r and fix a vector. The task is to find the size of the largest subsequence of the vector so that the range of that subsequence <= r. As far as I know, non-sorting methods need to maintain a correspondence in the form of key-value pairs {v, number of occurrences of v} This is why Oo (sorry but I don't know how to create the diacritical marks) and I both used unordered_map instead of unordered_set. I am not surprised that unordered_set operations are quicker. Oo's comment about relative speeds of hashing/sorting should be understood in the context of the entire thread -- I don't think that he or she intended the comment to be read as a standalone. Paul |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 03:42PM +0100 > But your code seems to lose the context of the original task. I only wanted to falsify Öö Tiib statement. |
| Paul <pepstein5@gmail.com>: Nov 24 07:51AM -0800 On Sunday, November 24, 2019 at 2:42:43 PM UTC, Bonita Montero wrote: > > But your code seems to lose the context of the original task. > I only wanted to falsify Öö Tiib statement. It's very easy (and IMO silly) to falsify statements by ignoring the context, as you did. Example: A. "I prefer white wine to red wine." B. "Not me. I only ever drink red wine." C, after observing B taking ibuprofen pills with water: "B said 'I only ever drink red wine, but I've just seen B drinking water. B is a liar." Paul |
| Bonita Montero <Bonita.Montero@gmail.com>: Nov 24 04:53PM +0100 >> I only wanted to falsify Öö Tiib statement. > It's very easy (and IMO silly) to falsify statements by ignoring the context, as you did. I didn't respond to the OP but to Öö Tiib. And at this place my response was in the correct context. |
| Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 24 04:08PM On Sat, 2019-11-23, Paul wrote: >> mean? What does "stats" mean, for that matter? >> Since max(subseq) <= max(seq) and min(subseq) >= min(seq), it seems to >> me the answer is always "the full sequence" or "there isn't one". I see now that I misread a part of your problem definition that /was/ clear. I read "... longest subsequence which has range >= r" which led to the uninteresting problem. Sorry! I must have been half asleep, or something. > However, that might be confusing because there's a set-theoretic definition > of "range" and a statistical definition of "range" and these definitions > are very different. I mean the statistical definition of "range". Ah, ok. I was unaware of "stats" as an abbreviation of "statistics" (not that I remember how "range" is used in statistics anyway). It would have been easier to explain it as "range(seq) = max(seq) - min(seq)". Anyway: I know understand the problem and see why it's fun/interesting to solve. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
| "Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 24 12:26AM +0100 On 22.11.2019 22:46, Vir Campestris wrote: > who owns our coding standards :( > The argument is that given "x=4.2" you _know_ that x is a double, > there's no point in saying so. It requires /reading/ with analysis and comprehension as opposed to just looking at. And it fails for auto hm = 0x1'0000'0000; This type depends on the compiler. > look at the declaration. Which if I'm unlucky is one of several > overloads with template type deduction from the parameters. > But IMHO reading matters more. You'll only write it once. I prefer that the type is generally specified explicitly in a declaration, either on the left side or explicitly in the initializer. Exception: types that are effectively or formally unspecified, such as iterator type or the type of an iostream manipulator. Example, display the ASCII code: #include <ctype.h> // ::isprint #include <iomanip> // std::setw #include <iostream> #include <limits.h> // CHAR_MIN, UCHAR_MAX using Byte = unsigned char; auto is_noncontrol_ascii_char( const int code ) -> bool { return (code > CHAR_MIN and code < UCHAR_MAX and ::isprint( Byte( code ) ); } auto main() -> int { using std::cout, std::endl, std::left, std::setw; constexpr auto& hex_digits = "0123456789abcdef"; const auto field = setw( 4 ); cout << left; // Column headers. cout << field << ""; for( int x = 0; x < 16; ++x ) { cout << field << hex_digits[x]; } cout << endl; cout << endl; // Spacer line to make the header row stand out as such. // Main table with a row header column to the left. for( int y = 0; y < 8; ++y ) { cout << field << hex_digits[y]; for( int x = 0; x < 16; ++x ) { const int code = 16*y + x; const auto ch = char( is_noncontrol_ascii_char( code )? code: ascii::del ); cout << field << ch; } cout << endl; } } - Alf |
| "Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 24 12:28AM +0100 On 24.11.2019 00:26, Alf P. Steinbach wrote: > cout << endl; > } > } Oops. ">=", "<=". Oh well. - Alf |
| "Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 24 12:45AM +0100 On 24.11.2019 00:26, Alf P. Steinbach wrote: > code that wouldn't even compile Good idea to compile before posting, yes! So here's the example, with a few instances of `auto` declarations of things where either the type is irrelevant, or already explicit: #include <ctype.h> // ::isprint #include <iomanip> // std::setw #include <iostream> #include <limits.h> // CHAR_MIN, UCHAR_MAX using Byte = unsigned char; auto is_noncontrol_ascii_char( const int code ) -> bool { return (code >= CHAR_MIN and code <= UCHAR_MAX and ::isprint( Byte( code ) ) ); } auto main() -> int { using std::cout, std::endl, std::left, std::setw; constexpr auto& hex_digits = "0123456789abcdef"; const auto field = setw( 4 ); const auto ascii_del = char( 127 ); cout << left; // Column headers. cout << field << ""; for( int x = 0; x < 16; ++x ) { cout << field << hex_digits[x]; } cout << endl; cout << endl; // Spacer line to make the header row stand out as such. // Main table with a row header column to the left. for( int y = 0; y < 8; ++y ) { cout << field << hex_digits[y]; for( int x = 0; x < 16; ++x ) { const int code = 16*y + x; const auto ch = char( is_noncontrol_ascii_char( code )? code: ascii_del ); cout << field << ch; } cout << endl; } } - Alf |
| boltar@nowhere.co.uk: Nov 24 10:47AM On Sun, 24 Nov 2019 00:26:14 +0100 >auto main() > -> int And the point of writing that instead of int main() is what exactly? Other than showing off of course. |
| Melzzzzz <Melzzzzz@zzzzz.com>: Nov 24 01:37PM > And the point of writing that instead of > int main() > is what exactly? Other than showing off of course. He doesn't like C++ ;) -- press any key to continue or any other to quit... U ničemu ja ne uživam kao u svom statusu INVALIDA -- Zli Zec Svi smo svedoci - oko 3 godine intenzivne propagande je dovoljno da jedan narod poludi -- Zli Zec Na divljem zapadu i nije bilo tako puno nasilja, upravo zato jer su svi bili naoruzani. -- Mladen Gogala |
| Soviet_Mario <SovietMario@CCCP.MIR>: Nov 24 12:54AM +0100 On 23/11/2019 21:27, Keith Thompson wrote: > type names u_int{8,32}_t. It doesn't introduce u_int_{8,32}, and I > don't see that name in the code you quoted (other than in your union > definition) or anywhere under /usr/include on my system. yes I miss-spelled forgetting the final _t (but the pieces of headers pasted mentioned it) > meet your requirements, and there is no good reason to use non-standard > types with similar names. > Do you have some reason to prefer to use u_int_8 rather than uint8_t? no no reason at all. I just noticed this subtle difference after having rewritten always :) But It will take short to do a mass-replace and change the included header >> I guess the variations stands for the same concept > Yes. Names other than uint8_t are non-standard and will cause your code > not to compile in many environments. well I must have mis-expressed again when I said I was looking for a portable way, I actually meant I needed a way that allowed proper data exchange from front-end gambas side and C++ engine. I'll never deploy the program anywhere else. So as long as it compiles here, fine enough >>> by default.) >> it gives me warnings indeed, but seems to compile nevertheless > What warnings? that anonymous structs (or unions) are non standard and supported only with GCC later than some version. I'm using the "native" (= preinstalled) GCC here on MX (QT itself did not ship with any own compiler in the free version). I have installed the later build-essentials from "stable" repository. I'm not very willing to update from testing or backport. Maybe if I found some flatpak or appimage version, more isolated. > If you compile with "gcc -std=c11" or with a > sufficiently new version of gcc, there shouldn't be any. I'm looking to assess the GCC version here (on Debian its a bit older still). back here I have GCC 4:6.3.0-4 (and supporting packages seem 6.3.0-18) anche il CPP riporta la stessa signature. I found a 9.4-2 MX version and I'm adding it too. I don't know if QT ide will automatically detect it (and I'm sure I won't be able to configure it, otherwise) > Always pay attention to warnings. It was reading the "padding" warning that all this odessy began :) >> I thought : I give you an hint : TRY to pack. If you really >> cannot, there is nothing else I could do > There's no guarantee that all compilers will understand your "hint", I just use GCC, not others >> access to separate channels I use named members. > What operations do you need on AllChannels that you can't perforom on > the struct? You can assign structs. uhm ... memberwise or all-at-once ? I thought struct assignment (normally I wrote operator =) should pass through memberwise copying. So the aim was to copy everything in one assignment only instead of four of them. But if struct can be assigned smartly enough then the union is not needed. Also I duplicated pixel addressing. One slower way through operator () (int row, int column) and a massive one dimensional linear addressing for mass processing of all pixels. >> 1 byte sizes, and the only big member starts at 0 offset) > The drawback of #pragma pack is that it's non-standard. Given the > concern you've expressed about portability, sorry I realize I had been unwittingly misleading. I was just thinking of interoperate gambas code and library code. As long as it works on my two couple of machines that would be enough. I have Linux MX on both, BTW, among other things > I would think that would > be important to you. And given that it will almost certainly do > nothing, I don't know why you're so insistent on using it. oh, sorry at last for my often unclear English. I tend to sow misunderstandings everywhere due to this :\ -- 1) Resistere, resistere, resistere. 2) Se tutti pagano le tasse, le tasse le pagano tutti Soviet_Mario - (aka Gatto_Vizzato) |
| Keith Thompson <kst-u@mib.org>: Nov 23 05:06PM -0800 >> definition) or anywhere under /usr/include on my system. > yes I miss-spelled forgetting the final _t (but the pieces > of headers pasted mentioned it) To be very clear, if you need an unsigned integer type of at least 8 bits, there is almost no good reason not to use uint8_t. There is no good reason to use some system-specific (or even undocumented) equivalent. "Almost" means you might have to consider older implementations that don't support <stdint.h> / <cstdint>. Any version of g++ you're likely to encounter should support C++11 or later -- older versions might require an explicit option. Even older versions are likely to support <stdint.h> or <cstdint> as an extension. > and C++ engine. > I'll never deploy the program anywhere else. So as long as > it compiles here, fine enough If you use uint8_t it will compile. >> What warnings? > that anonymous structs (or unions) are non standard and > supported only with GCC later than some version. It's always better to copy-and-paste diagnostic messages rather than trying to summarize them. BTW, I made a serious mistake here. C supports anonymous structs as of C11. C++ does not support them at all. g++ does support them as an extension. With "g++ -std=c17 -pedantic", I get "warning: ISO C++ prohibits anonymous structs [-Wpedantic]". My apologies for that. If you're only using gcc, you can use anonymous structs in C++, but your code will be non-portable. [...] > instead of four of them. > But if struct can be assigned smartly enough then the union > is not needed. The semantics of struct assignment (assuming you haven't overloaded operator=) are equivalent to the semantics of copying memberwise, but in the case of a struct containing 4 uint8_t members, no sane compiler will do anything other than a 32-bit copy. That's assuming the struct has an alignment that permits a 32-bit copy. If copying the entire structure is the only operation you care about, keep it simple: just define the structure and use ordinary structure assignment. [...] -- Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst> Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
| Soviet_Mario <SovietMario@CCCP.MIR>: Nov 24 12:10PM +0100 On 24/11/2019 02:06, Keith Thompson wrote: >> yes I miss-spelled forgetting the final _t (but the pieces >> of headers pasted mentioned it) > To be very clear, if you need an unsigned integer type of at least 8 well strictly speaking I need a type of EXACTLY 8 bits, not more >> supported only with GCC later than some version. > It's always better to copy-and-paste diagnostic messages rather than > trying to summarize them. well, I did not copy the warning when the compiler produced it. I would have to re-edit the source (or restore the older version) to just replicate the output ... maybe messing sth :\ > of C11. C++ does not support them at all. g++ does support them as > an extension. With "g++ -std=c17 -pedantic", I get "warning: ISO C++ > prohibits anonymous structs [-Wpedantic]". My apologies for that. no problem !!! this is not an helpdesk, I get the good offered for free with good intention. That's more than enough ! > If you're only using gcc, you can use anonymous structs in C++, > but your code will be non-portable. yes only GCC. BTW : as I feared, QT (even after a system reboot ... actually power went off tonight, here it poured dogs and cats ! 160 mm of rain in 24 hours, a disaster) did not automatically detect the new version from MX repo. it just still see the original version. I'll see to this later If i find how to configure it (and locate it first) > but in the case of a struct containing 4 uint8_t members, no sane > compiler will do anything other than a 32-bit copy. That's assuming > the struct has an alignment that permits a 32-bit copy. I would think that the existence of such monolythic member should ensure the alignment. I should look the asm generated, but I can't be bothered :) :) -- 1) Resistere, resistere, resistere. 2) Se tutti pagano le tasse, le tasse le pagano tutti Soviet_Mario - (aka Gatto_Vizzato) |
| You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment