- C++ threshold for "stupid" sorting algorithms (O(n^2)) - 11 Updates
- Can any library be reduced to a header file? - 4 Updates
- Don't be fooled by cpp.sh - 3 Updates
- Wading through template instantiation errors for allocator - 1 Update
- uint32_t is not the same as long unsigned int ? - 1 Update
- std::bless - 2 Updates
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 05:39AM +0100

> all your test sizes for some data sets -- specifically when the data are
> already sorted. The factor is now smaller (no longer always greater
> than 2) but still between 1.5 and 2.

The discussion wasn't about presorted datasets, and that's not usual. |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 05:50AM +0100

> I've found the stable_sort in the header algorithm.
> I did not fully understand its internals ...

stable_sort is usually mergesort because mergesort is usually the
fastest stable sort algorithm. |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 06:55AM +0100

> I've found the stable_sort in the header algorithm.
> I did not fully understand its internals ...

As I said, stable_sort is usually mergesort. I developed a parallel
mergesort some months ago. I stripped it down to single-threaded, and
here's the code:

#include <iostream>
#include <iterator>
#include <vector>
#include <random>
#include <functional>
#include <memory>
#include <limits>
#include <cassert>

template<typename It,
         typename Cmp = std::less<typename std::iterator_traits<It>::value_type>,
         typename Allocator = std::allocator<typename std::iterator_traits<It>::value_type>>
class merge_sort
{
public:
    merge_sort( It start, It end, Cmp const &cmp = Cmp(),
                Allocator const &alloc = Allocator() );
private:
    Cmp m_cmp;
    template<typename UpIt, typename BufIt>
    void recursion( UpIt start, UpIt end, BufIt buf );
    template<typename UpIt, typename BufIt>
    void merge( UpIt start, BufIt leftBuf, BufIt rightBuf, BufIt bufEnd );
};

template<typename It, typename Cmp, typename Allocator>
merge_sort<It, Cmp, Allocator>::merge_sort( It start, It end,
    Cmp const &cmp, Allocator const &alloc ) :
    m_cmp( cmp )
{
    using namespace std;
    using T = typename iterator_traits<It>::value_type;
    // calculate buffer-/stack-size: every recursion level needs room
    // for its larger half
    size_t bs = 0;
    for( size_t split = end - start; split > 1; bs += split, split -= split / 2 );
    vector<T, Allocator> buf( bs, T(), alloc );
    recursion( start, end, buf.begin() );
}

template<typename It, typename Cmp, typename Allocator>
template<typename UpIt, typename BufIt>
void merge_sort<It, Cmp, Allocator>::recursion( UpIt start, UpIt end, BufIt buf )
{
    using namespace std;
    size_t n = end - start;
    if( n < 2 )
        return;
    copy( start, end, buf );
    size_t right = n / 2, left = n - right;
    BufIt leftBuf = buf, leftBufEnd = buf + left,
          rightBuf = leftBufEnd, bufEnd = buf + n;
    // sort both halves inside the buffer; the space beyond bufEnd
    // serves as scratch for the next level
    recursion( leftBuf, leftBufEnd, bufEnd );
    recursion( rightBuf, bufEnd, bufEnd );
    // merge the sorted halves back into the original range
    merge( start, leftBuf, rightBuf, bufEnd );
}

template<typename It, typename Cmp, typename Allocator>
template<typename UpIt, typename BufIt>
void merge_sort<It, Cmp, Allocator>::merge( UpIt start, BufIt leftBuf,
    BufIt rightBuf, BufIt bufEnd )
{
    assert(leftBuf < rightBuf && rightBuf < bufEnd);
    BufIt leftBufEnd = rightBuf;
    for( UpIt wrtBack = start; ; )
        if( m_cmp( *leftBuf, *rightBuf ) )
        {
            *wrtBack++ = *leftBuf;
            if( ++leftBuf == leftBufEnd )
            {
                // faster for a small number of elements than std::copy
                do
                    *wrtBack++ = *rightBuf;
                while( ++rightBuf != bufEnd );
                return;
            }
        }
        else
        {
            *wrtBack++ = *rightBuf;
            if( ++rightBuf == bufEnd )
            {
                do
                    *wrtBack++ = *leftBuf;
                while( ++leftBuf != leftBufEnd );
                return;
            }
        }
}

int main()
{
    using namespace std;
    size_t const SIZE = 1'000'000;
    vector<int> vSort( SIZE, 0 );
    mt19937_64 rg;
    uniform_int_distribution<int> uid( numeric_limits<int>::min(),
                                       numeric_limits<int>::max() );
    for( int &v : vSort )
        v = uid( rg );
    merge_sort( vSort.begin(), vSort.end() );  // C++17 CTAD
}

Maybe you would be able to understand what's going on. |
"Öö Tiib" <ootiib@hot.ee>: Jan 05 11:54PM -0800 On Monday, 6 January 2020 06:40:02 UTC+2, Bonita Montero wrote: > > already sorted. The factor is now smaller (no longer always greater > > than 2) but still between 1.5 and 2. > The discussion wasn't about presorted datasets and that's not usual. Already sorted or close to sorted data tends to be more likely than other permutations. But main issue is why you constantly write such blatant falsehoods? Ben specifically wrote "with sorted data": Bonita Montero wrote: >> sort as faster (by a factor of at least 2) for all sizes tested (you >> go up to 20). ... > That's not true. You're a liar. You are clearly liar yourself, Ben did not lie. |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 08:59AM +0100

>> The discussion wasn't about presorted datasets, and that's not usual.
> Already sorted or close to sorted data tends to be more likely than
> other permutations.

Give me a URL to a paper that proves this. |
"Öö Tiib" <ootiib@hot.ee>: Jan 06 01:02AM -0800 On Monday, 6 January 2020 09:59:18 UTC+2, Bonita Montero wrote: > > Already sorted or close to sorted data tends to be more likely than > > other permutations. > Give me a URL to a paper that proves this. Miserable attempt of hiding being caught being liar noted. It is just quite common with real world data that it was entered in sorted order or that some other part of system already sorted it. <https://en.wikipedia.org/wiki/Sorting_algorithm#Efficient_sorts> "Second, the algorithms often perform poorly on already sorted data or almost sorted data – these are common in real-world data, and can be sorted in O(n) time by appropriate algorithms." |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Jan 06 09:34AM

On Mon, 2020-01-06, Öö Tiib wrote:
>> > Already sorted or close to sorted data tends to be more likely than
>> > other permutations.
>> Give me a URL to a paper that proves this.
...
> It is simply quite common with real-world data that it was entered in
> sorted order, or that some other part of the system already sorted it.
> <https://en.wikipedia.org/wiki/Sorting_algorithm#Efficient_sorts>

It's rather obvious if you think about it. If a sequence of records
doesn't appear at random, it's probably going to be sorted according
to the record's primary key, because it makes sense for the data
source to keep the records in some order.

This has other consequences, too. A system I once worked with accepted
tens of thousands of network connections/sessions, and stored them as
a std::map. This worked fine if clients connected one by one, but if
the next gateway (there was such a thing) restarted, it reestablished
the sessions in key order, and performance suffered because the
std::map got rebalanced repeatedly. We had to add a workaround for
that (I forget what the workaround was).

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .        . |
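[The rebalancing cost Jorgen describes has a common mitigation --
whether it was the actual workaround, the post doesn't say: the hint
overload of std::map insertion, which is amortized O(1) per element
when keys arrive in ascending order. A minimal sketch:]

#include <map>
#include <string>

int main()
{
    std::map<int, std::string> sessions;
    // keys arriving in ascending order, as after a gateway restart;
    // passing end() as the hint skips the O(log n) search from the
    // root, making each insertion amortized O(1)
    for( int key = 0; key < 100000; ++key )
        sessions.emplace_hint( sessions.end(), key, "session" );
} |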
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 10:36AM +0100

> "Second, the algorithms often perform poorly on already sorted data or
> almost sorted data – these are common in real-world data, and can be
> sorted in O(n) time by appropriate algorithms."

That's not a proof about the heuristics of choosing sorting
algorithms. |
"Öö Tiib" <ootiib@hot.ee>: Jan 06 03:33AM -0800 On Monday, 6 January 2020 11:36:12 UTC+2, Bonita Montero wrote: > > sorted data – these are common in real-world data, and can be sorted in O(n) > > time by appropriate algorithms." > That's not a proof on the heuristics of chosen sorting algorithms. So what? As system designer you should have good idea what kind of data from what kinds of sources it processes. So asking for formal statistical proofs indicates cluenesness. |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 12:59PM +0100

> So what? As a system designer you should have a good idea what kind of
> data from what kinds of sources your system processes. So asking for
> formal statistical proofs indicates cluelessness.

Imagining the purposes for which one or the other algorithm might be
suitable doesn't tell you how often one or the other is actually
needed in real projects. So you're the clueless one here, since you
suggest it is possible to find this out in the way you describe. |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Jan 06 10:54PM

>> already sorted. The factor is now smaller (no longer always greater
>> than 2) but still between 1.5 and 2.
> The discussion wasn't about presorted datasets, and that's not usual.

You made the discussion about (a) a sweeping statement of yours that
was untrue, and (b) your calling me a liar for making a true statement.

-- 
Ben. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Jan 06 07:09AM -0800

Let's say my library is one source file and one header file. It
contains one global variable with external linkage, and one function
with external linkage, as follows:

// Contents of gotham.hpp
extern int Func(int);
extern int g_index;

// Contents of gotham.cpp
int g_index = 2;
int Func(int const i) { return i / 2; }

I can combine this library into one header file like this:

// Contents of gotham.hpp
inline int &Func_Containing_gIndex(void)
{
    static int s_index = 2;
    return g_index;
}

static int &g_index = Func_Containing_gIndex();

inline int Func(int const i) { return i / 2; }

To what extent can any library be reduced to one header file?

One thing I see so far: This won't work if either the value of
"g_index" or the address of "g_index" is required as a constexpr. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Jan 06 07:47AM -0800

On Monday, January 6, 2020 at 3:09:45 PM UTC, Frederick Gotham wrote:
> static int s_index = 2;
> return g_index;
> }

That should be:

    return s_index; |
"Öö Tiib" <ootiib@hot.ee>: Jan 06 08:26AM -0800 On Monday, 6 January 2020 17:09:45 UTC+2, Frederick Gotham wrote: > To what extent can any library be reduced to one header file? Fully. As of C++17 "The inline specifier can be applied to variables as well as to functions." And inline only means that it may be defined in header without ODR (one definition rule) violations. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jan 06 08:13PM +0100 On 06.01.2020 16:09, Frederick Gotham wrote: > static int &g_index = Func_Containing_gIndex(); > inline int Func(int const i) { return i / 2; } > To what extent can any library be reduced to one header file? To the extent that its implementation doesn't have to use dirty headers (e.g. `<windows.h>` comes to mind) that one doesn't want to expose, that one needs the possibility to distribute updates to object code without client code having to be recompiled, to the extent that company or project rules permit header only modules, and to the extent that one doesn't need to keep the source unavailable in order to protect of intellectual property rights. I grabbed some of that wording from <url: https://accu.org/index.php/journals/482>, an old article that can be interesting now just because some of the issues discussed in earnest then are now so irrelevant, i.e. how we have progressed. Anyway, in some cases one can work around the header dependency issue. E.g. for the mentioned header, with 64-bit programming it's practically doable, but inconvenient, to just declare in ones own headers the API functions that one uses. Since there is a single common machine code level calling convention, and smarter linkers now than 20 years ago, there is little or no chance of one's tools fouling it up, so I've done it a number of times -- but only in exploratory, experimental code. > One thing I see so far: This won't work if either the value of > "g_index" or the address of "g_index" is required as a constexpr. But then neither would the original. - Alf |
Robert Wessel <robertwessel2@yahoo.com>: Jan 05 09:07PM -0600

On Sun, 5 Jan 2020 20:11:17 +0100, David Brown wrote:

>less used, and any attempts will be very biased based on the area of
>experience of the guesser. With that proviso, I'd be surprised if
>complex types really were used more than VLAs.

I'm not going to take much of a position on the usage of the native C
complex type; given the slow pickup of C99, it really never got a huge
amount of traction, but it is there. On the other hand, plenty of folks
have done complex arithmetic in C by rolling their own functions.

>correct that you need to be careful about stack use before using them,
>you don't need more care than you should be taking for any other kind of
>non-variable local array, or for any other kind of memory allocation.

For very small systems, especially ones where recursive calls are also
difficult to support, VLAs may well present some challenges. In the
more ordinary case, I think VLAs do present an additional challenge to
stack management, since the frame size required by a function is now at
the mercy of its callers. That at least theoretically creates some new
paths for storage overlays and the like. I'm not sure it's as big a
deal as some people made it out to be, but there are a few new pitfalls
there.

>going to make a compiler that is C11 compliant but not C99 compliant?
>Who is going to be capable of making a C11 compiler but find supporting
>VLA's or complex numbers a major issue?

My point exactly. I don't even understand why complex was optional in
C99 for freestanding implementations. |
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jan 05 08:40PM -0800

> On Sun, 5 Jan 2020 20:11:17 +0100, David Brown
> <david.brown@hesbynett.no> wrote:
[...]
>>VLA's or complex numbers a major issue?
> My point exactly. I don't even understand why complex was optional in
> C99 for freestanding implementations.

Freestanding implementations don't have to provide library functions,
and complex numbers aren't very useful without the functions declared
in <complex.h> such as creal() and cimag().

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
[Note updated email address]
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */ |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Jan 06 06:39PM

On Sun, 2020-01-05, Richard Damon wrote:
> On 12/31/19 12:46 PM, Bonita Montero wrote:
...
> #if 0
>