- A different algorithm - 22 Updates
- Finding CPU utilization of NUMA nodes - 2 Updates
- collection of pointers vs collection of objects - 1 Update
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 07:14PM -0600 On 2/10/2017 1:43 AM, Christopher J. Pisz wrote: > Eurika! > I am implementing this now. I suspect it will greatly speed things up. Pushed algorithm 2 to github. It is significantly worse. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 07:21PM -0600 On 2/10/2017 6:17 AM, Paavo Helde wrote: SNIP > Generate a dictionary of 1e6 random words and test your solutions vs > Alf's solution and then it becomes immediately clear which solutions are > faster, no need for brilliant dream ideas. Alf is using a lot fancy syntax I am unfamiliar with. However, his algorithm looks the same. Beyond using the vector, I don't see any differences and the performance increase I am getting is not enough to match what they are claiming. > C++ job where one often needs to profile and optimize code. Plus, you > get some insight about what things are slow and what are fast, on your > current hardware at least. Yea, I been doing just that all day. Profiling alternative idea, different data structures, etc. Just haven't found anything near the performance they are claiming. I found this is a problem on leetcode.com under "Word Search II" I am looking at some of the top solutions there now too. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 07:24PM -0600 On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: > inline auto operator<<( ostream& stream, Position const& pos ) > -> ostream& > { return stream << "{" << pos.x << ", " << pos.y << "}"; } Can you explain what -> ostream& means? I don't think I've ever seen this syntax you are using with -> following returntype functionname(arg1, arg1) |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 11 09:15AM +0100 On 11.02.2017 02:24, Christopher J. Pisz wrote: > Can you explain what -> ostream& means? > I don't think I've ever seen this syntax you are using with -> following > returntype functionname(arg1, arg1) The `->` is called trailing return type syntax, and was introduced in C++11. The `auto` as pseudo return type at the front of the declaration, says that either a `->` trailing return type will follow, or, in C++14 and later, a return type specification is omitted and the type will be inferred from the function body (obviously the latter is only possible on a full function definition, not on a pure declaration). Trailing return type is the only syntax available for lambdas, so it's the only syntax that can be used uniformly for all function declarations. It has some direct advantages including that one can avoid qualification of the return type when it's a type defined in the function's class (just as with argument types); that it allows the return type to be specified in terms of the argument types; and that it generally makes it easier to visually parse complicated declarations. I adopted it early on as a single, uniform syntax, with no exceptions for new code, not even for `main`. Using it for `main` on e.g. Stack Overflow tends to generate questions about it. Which allows teaching. :) Still I prefer to use the old syntax for `void` routines in general, even though a `void` lambda can't be written that way. This corresponds to Pascal's differentiation of `function` versus `procedure`: whether a routine can meaningfully be used in an expression, contributing to the expression result. More that way: I wish C++ had support for pure functions, purely computational functions without side effects, beyond the very limited C++11 compile time `constexpr` function. Cheers!, - Alf |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 11:25AM On Fri, 10 Feb 2017 19:21:07 -0600 > Just haven't found anything near the performance they are claiming. > I found this is a problem on leetcode.com under "Word Search II". > I am looking at some of the top solutions there now too. I am quite surprised at this (assuming you have tested Alf's code) because on looking at his implementation it seems reasonable (and follows the recursive dictionary slicing approach I mentioned in another post of mine). I thought it likely that your interviewers were mainly interested in whether candidates understood how to build up results using recursion, but it seems not. The only thing that struck me about Alf's implementation is that the starts_with() function is in the recursion hot path and, by calling std::string::substr(), might allocate memory (although with small string optimization one would hope not), which std::string::compare() wouldn't do. You could try substituting std::string::compare() in that function and see if that makes any difference (it probably won't). You would also need to omit the call to clog in the hot path (presumably inserted for debugging) to get proper timings of that implementation. Chris |
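For concreteness, the suggested substitution might look like this, keeping the same prefix-first argument order as Alf's starts_with() uses (a sketch only, not code from the thread):

#include <string>

// Allocation-free prefix test: compare(pos, len, str) examines characters in
// place instead of building a temporary string the way substr() does.
inline auto starts_with( std::string const& prefix, std::string const& s )
    -> bool
{
    return s.size() >= prefix.size()
        && s.compare( 0, prefix.size(), prefix ) == 0;
}

int main()
{
    return starts_with( "oat", "oath" ) ? 0 : 1;   // 0: "oath" starts with "oat"
}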
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 11 12:53PM +0100 On 11.02.2017 12:25, Chris Vine wrote: > The only thing that struck me about Alf's implementation is that the > starts_with() function is in the recursion hot path and by calling > std::string::substr() might allocate memory I also compared strings for equality when known that one is leading substring of other, instead of just comparing lengths then. Even for a first working version that was, well, dumb. :) Cheers!, - Alf |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 12:52PM On Sat, 11 Feb 2017 12:53:01 +0100 > I also compared strings for equality when known that one is leading > substring of other, instead of just comparing lengths then. > Even for a first working version that was, well, dumb. :) I think your code was OK on the equality test. The dictionary lower bound has been reset on re-entering recursively_append_words_from(), and end_dict_range is always one past the end of the dictionary, so 'word' might no longer be a leading substring of *lower. (It occurs to me that since 'lower' could be at 'end_dict_range', this should probably be checked for before 'lower' is dereferenced on calling starts_with().) Chris |
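The hazard described here is easy to demonstrate in isolation: std::lower_bound() can return the end iterator, which must not be dereferenced. A tiny self-contained example (not taken from Alf's code):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> dict{ "ant", "bee", "cat" };   // sorted dictionary
    std::string const word = "zoo";

    auto const lower = std::lower_bound( dict.begin(), dict.end(), word );
    if( lower == dict.end() )
    {
        // No entry is >= "zoo"; dereferencing `lower` here would be undefined behaviour.
        std::cout << "no candidate for \"" << word << "\"\n";
    }
    else
    {
        std::cout << "lower bound: " << *lower << "\n";
    }
}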
"Öö Tiib" <ootiib@hot.ee>: Feb 11 05:39AM -0800 On Saturday, 11 February 2017 10:16:14 UTC+2, Alf P. Steinbach wrote: > > I don't think I've ever seen this syntax you are using with -> following > > returntype functionname(arg1, arg1) > The `->` is called trailing return type syntax, and was introduced in C++11. For to get used to that syntax one may write few apps in Rust and/or Swift. It is possible even to get paid for doing uneducated attempts of usage of Swift since there seems to be quite serious shortage of experienced iOS developers at the moment. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 02:10PM On Fri, 10 Feb 2017 19:14:26 -0600 > > up. > Pushed algorithm 2 to github. > It is significantly worse. I am not surprised. From what I could make out of your code, you are walking through the entire dictionary and testing iteratively whether each word in the dictionary is on the board. On a large dictionary that is about the worst possible implementation. Instead, on a large dictionary you need to step through the board and work from there. What are the respective timings, on the same input data, of: 1. Your first implementation. 2. Your second implementation 3. Alf's implementation? Chris |
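To make that concrete, here is a rough self-contained sketch of the board-driven approach (my own condensed illustration of the idea discussed in this thread, not Alf's or Christopher's code, with hypothetical helper names): start a depth-first walk from every cell, extend the current prefix one letter at a time, and at each step narrow the sorted dictionary to the range that can still match, abandoning the branch as soon as nothing can.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Sketch only: 4-direction word search over a square board, pruning with a
// sorted dictionary and std::lower_bound.
using Board = std::vector<std::string>;

static bool is_prefix_of( std::string const& prefix, std::string const& s )
{
    return s.compare( 0, prefix.size(), prefix ) == 0;
}

static void dfs( Board const& board, std::vector<std::vector<bool>>& used,
                 int x, int y,
                 std::vector<std::string>::const_iterator first,
                 std::vector<std::string>::const_iterator last,
                 std::string& word, std::vector<std::string>& found )
{
    int const n = static_cast<int>( board.size() );
    if( x < 0 || y < 0 || x >= n || y >= n || used[y][x] ) return;

    word.push_back( board[y][x] );
    first = std::lower_bound( first, last, word );        // slice the dictionary
    if( first != last && is_prefix_of( word, *first ) )   // can anything still match?
    {
        if( *first == word ) found.push_back( word );
        used[y][x] = true;
        int const dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
        for( int d = 0; d < 4; ++d )
            dfs( board, used, x + dx[d], y + dy[d], first, last, word, found );
        used[y][x] = false;
    }
    word.pop_back();
}

int main()
{
    Board board = { "oaan", "etae", "ihkr", "iflv" };
    std::vector<std::string> words = { "eat", "oath", "pea", "rain" };   // already sorted
    std::vector<std::vector<bool>> used( board.size(),
                                         std::vector<bool>( board.size(), false ) );

    std::vector<std::string> found;
    std::string word;
    for( int y = 0; y < static_cast<int>( board.size() ); ++y )
        for( int x = 0; x < static_cast<int>( board.size() ); ++x )
            dfs( board, used, x, y, words.begin(), words.end(), word, found );

    std::sort( found.begin(), found.end() );
    found.erase( std::unique( found.begin(), found.end() ), found.end() );
    for( auto const& w : found ) std::cout << w << "\n";   // expected: eat, oath
}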
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 10:27AM -0600 On 2/11/2017 8:10 AM, Chris Vine wrote: > dictionary you need to step through the board and work from there. > What are the respective timings, on the same input data, of: > 1. Your first implementation. 412 ms > 2. Your second implementation Didn't bother. It is visibly slower, so why measure > 3. Alf's implementation? Working on trying to get it to conform to test, so no idea yet. I am using https://leetcode.com/problems/word-search-ii/?tab=Description to test now. It also has 1000s of submissions and is close enough to the original problem that the interviewers asked. Just adjusted for 4 directions vs 8 and no q->qu. The acceptable range is 40ms-60ms. If I look at the top performing answer, they used a preallocated trie and didn't even bother with any OO or separation of concerns, which I guess is a requirement when we start trying to squeeze out performance and drop all concern for reusability or maintenance. I'll post updated times when I have them. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 11:30AM -0600 On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: > - Alf I am trying to get this to confor to the test on https://leetcode.com/problems/word-search-ii/?tab=Description That way I can easily compare results of my various attempts, your, and others. I am having trouble with your indexing. you init things with +1 and then get index_from, and I am lost. The tester's format is a wee bit different then first discussed. )They only go verticle and horizontal )They don't care about q->qu )They give the input data in Solution::findWords and we are to fill out that method and class. I tried to get your solution into the tester's format below and I am getting vector out of range problems: #include <algorithm> #include <assert.h> #include <ctype.h> #include <fstream> #include <iostream> #include <vector> #include <set> #include <stddef.h> #include <stdexcept> #include <stdlib.h> #include <string> #include <utility> using Size = ptrdiff_t; using Index = Size; template< class Container > auto n_items_of(Container const& c) -> Size { return c.size(); } namespace cppx { using std::ostream; using std::vector; using std::runtime_error; using std::string; using Byte = unsigned char; inline auto hopefully(bool const e) -> bool { return e; } inline auto fail(string const& s) -> bool { throw runtime_error(s); } struct Position { Index x; Index y; }; inline auto operator<<(ostream& stream, Position const& pos) -> ostream& { return stream << "{" << pos.x << ", " << pos.y << "}"; } template< class Item > class Matrix { private: struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>. Size row_length_; vector<Wrapped_item> items_; auto index_for(Index x, Index y) const -> Index { return y*row_length_ + x; } public: auto id_for_position(Index x, Index y) const -> Index { return index_for(x, y); } auto size() const -> Size { return row_length_; } auto item(Index const x, Index const y) const -> Item { return items_[index_for(x, y)].object; } auto item(Index const x, Index const y) -> Item& { return items_[index_for(x, y)].object; } Matrix() : Matrix{ 0 } {} explicit Matrix(Size const size) : row_length_(size) , items_(size*size) {} }; auto inline to_lowercase(char const ch) -> char { return tolower(static_cast<Byte>(ch)); } auto inline starts_with(string const& prefix, string const& s) -> bool { return s.substr(0, prefix.length()) == prefix; } } // namespace cppx namespace app { using namespace std; using namespace cppx; auto load_dictionary(string const& file_path) -> vector<string> { ifstream data{ file_path }; hopefully(!data.fail()) || fail("Failed to open dictionary `" + file_path + "`."); vector<string> result; string word; while( getline(data, word) ) { result.push_back(move(word)); } return result; } auto chars_from(string s) -> string { string chars{ s.begin(), remove(s.begin(), s.end(), ' ') }; transform(chars.begin(), chars.end(), chars.begin(), to_lowercase); return chars; } using Dictionary_iter = vector<string>::const_iterator; void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item(position.x, position.y); word += ch; auto const lower = lower_bound(start_dict_range, end_dict_range, word); clog << position << " -> " << *lower << "\n"; if( starts_with(word, *lower) ) { if( word == 
*lower ) { result.push_back(word); } visited.item(position.x, position.y) = true; for( int dx = -1; dx <= +1; ++dx ) { for( int dy = -1; dy <= +1; ++dy ) { if( position.x + dx < 0 || position.x + dx > chars.size() - 1 || position.y + dy < 0 || position.y + dy > chars.size() - 1 ) { continue; } Position const new_position = { position.x + dx, position.y + dy }; if( !visited.item(new_position.x, new_position.y) ) { recursively_append_words_from( lower, end_dict_range, chars, new_position, visited, word, result ); } } } visited.item(position.x, position.y) = false; } word.resize(original_word_length); } void append_words_from( vector<string> const& dictionary, Matrix<char> const& chars, Position const& start, vector<string>& result ) { string word; Size const n = chars.size(); Matrix<bool> visited{ n }; for( Index i = 0; i < n; ++i ) { visited.item(i, 0) = true; visited.item(0, i) = true; visited.item(i, n - 1) = true; visited.item(n - 1, i) = true; } recursively_append_words_from( dictionary.begin(), dictionary.end(), chars, start, visited, word, result ); } } // namespace app class Solution { public: Solution() {} std::vector<std::string> findWords(std::vector<std::vector<char>>& board, std::vector<std::string>& words) { // Sort the dictionary std::sort(words.begin(), words.end()); // Load the matrix cppx::Matrix<char> matrix(board.size()); for( Index y = 0; y < board.size(); ++y ) { for( Index x = 0; x < board[0].size(); ++x ) { matrix.item(x, y) = board[y][x]; } } // Solve Size const n = board.size(); std::vector<std::string> results; for( Index y = 0; y < n; ++y ) { for( Index x = 0; x < n; ++x ) { app::append_words_from(words, matrix, {x, y}, results); } } sort(results.begin(), results.end()); results.erase(unique(results.begin(), results.end()), results.end()); return results; } private: }; int main() { std::vector<std::vector<char>> board = { { 'o','a','a','n' },{ 'e','t','a','e', },{ 'i','h','k','r' },{ 'i','f','l','v' } }; std::vector<std::string> words = { "oath", "pea", "eat", "rain" }; Solution solution; std::vector<std::string> result = solution.findWords(board, words); return 0; } |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 11:49AM -0600 On 2/11/2017 11:30 AM, Christopher J. Pisz wrote: > On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: >> On 10.02.2017 01:15, Christopher J. Pisz wrote: I think he is also using a fence. If I put the indices back as normal, I still get a vector range assertion here: void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item(position.x, position.y); word += ch; auto const lower = lower_bound(start_dict_range, end_dict_range, word); clog << position << " -> " << *lower << "\n"; I think that is an actual bug. What if lower_bound returned end()? You cannot deref end? |
"Öö Tiib" <ootiib@hot.ee>: Feb 11 10:10AM -0800 On Saturday, 11 February 2017 18:27:33 UTC+2, Christopher J. Pisz wrote: > and didn't even bother with any OO or separation of concerns, which I > guess is a requirement when we start trying to squeeze out performance > and drop all concern for reusability or maintenance. No way. OO and separation of concerns usually improves performance. However there is a "but". In task under hand we have only two objects. We have "boggle field" B and "dictionary" D. We have only one concern that involves both objects. What texts from D are present on B? So what deep OO you anticipate here? Here can be done unneeded overengineering. That usually indeed degrades performance in addition to making code more fragile and harder to understand. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 12:20PM -0600 On 2/11/2017 12:10 PM, Öö Tiib wrote: > Here can be done unneeded overengineering. That usually indeed > degrades performance in addition to making code more fragile and > harder to understand. I see a objects: a board a dictionary a solver and in the real world possibly: a board::path a coordinate Now on the website mentioned, if we look at the best solutions, they did indeed use a trei for their dictionary, and had to create it. Rather than made a class Trei, they just implemented it as part of the solver. More than that, but a lot of them implemented it for the specific problem at hand, rather than something that was reusable at all. Consider this top ranked C++ solution and notice the board, the solver, the dictionary, the trei, are all built into the one class and only designed to solve the very specific problem: class Solution { vector<string> results; int x_dim, y_dim; struct TrieNode { TrieNode* exits[26]; bool is_end; }; void recurse( vector<vector<char>>& board, string& s, int x, int y, TrieNode* trie_curr ) { if ( unsigned(x) >= x_dim || unsigned(y) >= y_dim ) return; const auto c = board[y][x]; if ( !c ) return; auto trie_next = trie_curr->exits[ c-'a' ]; if ( !trie_next ) return; s.push_back( c ); board[y][x] = 0; if ( trie_next->is_end ) { results.push_back( s ); trie_next->is_end = false; } recurse( board, s, x-1, y, trie_next ); recurse( board, s, x+1, y, trie_next ); recurse( board, s, x, y-1, trie_next ); recurse( board, s, x, y+1, trie_next ); board[y][x] = c; s.pop_back(); } public: vector<string> findWords(vector<vector<char>>& board, vector<string>& words) { if ( !words.size() || !board.size() || !board[0].size() ) return results; int tot_chars = 0; for ( const auto& word : words ) tot_chars += word.length(); TrieNode* trie_pool = (TrieNode*)calloc( sizeof(TrieNode), tot_chars + 2 ); int num_trie_nodes = 0; TrieNode* trie_root = &trie_pool[ num_trie_nodes++ ]; for ( const auto& word : words ) { TrieNode* trie_curr = trie_root; for ( const auto c : word ) { auto& trie_next = trie_curr->exits[ c-'a' ]; if (!trie_next) trie_next = &trie_pool[ num_trie_nodes++ ]; trie_curr = trie_next; } trie_curr->is_end = true; } x_dim = board[0].size(); y_dim = board.size(); string s; for ( auto y = 0 ; y < y_dim ; ++y ) for ( auto x = 0; x < x_dim; ++x ) recurse( board, s, x, y, trie_root ); free( trie_pool ); // from C to shining C. return results; } }; |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 01:24PM -0600 On 2/11/2017 11:49 AM, Christopher J. Pisz wrote: > On 2/11/2017 11:30 AM, Christopher J. Pisz wrote: >> On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: >>> On 10.02.2017 01:15, Christopher J. Pisz wrote: SNIP I got Alf's solution to run on the test site. If I changed everything correctly, he has scored 66ms. Mine ran in 412 ms. The acceptable range is 40-60ms. So yea, My RadixDictionary is craptastic. So, now I am going to go back and redo my code without the trei, without set, and just vectors, and use STL sort and lower_bound, when I need to, then see if that mathches up with Alf's score. After that I will try to reduce it further with a trei implementation that allocates up front. Here is the code I ran: #include <algorithm> #include <assert.h> #include <ctype.h> #include <fstream> #include <iostream> #include <vector> #include <set> #include <stddef.h> #include <stdexcept> #include <stdlib.h> #include <string> #include <utility> /* * Solution from Alf Steinbach on comp.lang.C++ for me to compare against */ using Size = ptrdiff_t; using Index = Size; template< class Container > auto n_items_of(Container const& c) -> Size { return c.size(); } namespace cppx { using std::ostream; using std::vector; using std::runtime_error; using std::string; using Byte = unsigned char; inline auto hopefully(bool const e) -> bool { return e; } inline auto fail(string const& s) -> bool { throw runtime_error(s); } struct Position { Index x; Index y; }; /* inline auto operator<<(ostream& stream, Position const& pos) -> ostream& { return stream << "{" << pos.x << ", " << pos.y << "}"; } */ template< class Item > class Matrix { private: struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>. 
Size num_rows_; Size num_cols_; vector<Wrapped_item> items_; auto index_for(Index x, Index y) const -> Index { return y*num_cols_ + x; } public: auto id_for_position(Index x, Index y) const -> Index { return index_for(x, y); } auto num_rows() const -> Size { return num_rows_; } auto num_cols() const -> Size { return num_cols_; } auto item(Index const x, Index const y) const -> Item { return items_[index_for(x, y)].object; } auto item(Index const x, Index const y) -> Item& { return items_[index_for(x, y)].object; } Matrix() : Matrix{ 0 } {} explicit Matrix(Size const num_rows, Size const num_cols) : num_rows_(num_rows) , num_cols_(num_cols) , items_(num_rows * num_cols) {} }; auto inline to_lowercase(char const ch) -> char { return tolower(static_cast<Byte>(ch)); } auto inline starts_with(string const& prefix, string const& s) -> bool { return s.substr(0, prefix.length()) == prefix; } } // namespace cppx namespace app { using namespace std; using namespace cppx; auto chars_from(string s) -> string { string chars{ s.begin(), remove(s.begin(), s.end(), ' ') }; transform(chars.begin(), chars.end(), chars.begin(), to_lowercase); return chars; } using Dictionary_iter = vector<string>::const_iterator; void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item(position.x, position.y); word += ch; auto const lower = lower_bound(start_dict_range, end_dict_range, word); if( lower == end_dict_range ) { word.resize(original_word_length); return; } /* clog << position << " -> " << *lower << "\n"; */ if( starts_with(word, *lower) ) { if( word == *lower ) { result.push_back(word); } visited.item(position.x, position.y) = true; const int offsetsX[] = { 0, 1, 0, -1}; const int offsetsY[] = {-1, 0, 1, 0}; for( int offsetIndex = 0; offsetIndex < 4; ++offsetIndex ) { Position const new_position = { position.x + offsetsX[offsetIndex], position.y + offsetsY[offsetIndex] }; if( !visited.item(new_position.x, new_position.y) ) { recursively_append_words_from( lower, end_dict_range, chars, new_position, visited, word, result ); } } visited.item(position.x, position.y) = false; } word.resize(original_word_length); } void append_words_from( vector<string> const& dictionary, Matrix<char> const& chars, Position const& start, vector<string>& result ) { string word; Size const num_rows = chars.num_rows(); Size const num_cols = chars.num_cols(); Matrix<bool> visited{ num_rows, num_cols }; // Flag the fence visited for( Index index = 0; index < num_cols; ++index ) { visited.item(index, 0) = true; visited.item(index, num_rows - 1) = true; } for( Index index = 0; index < num_rows; ++index ) { visited.item(0 , index) = true; visited.item(num_cols - 1, index) = true; } recursively_append_words_from( dictionary.begin(), dictionary.end(), chars, start, visited, word, result ); } } // namespace app class Solution { public: Solution() {} std::vector<std::string> findWords(std::vector<std::vector<char>>& board, std::vector<std::string>& words) { // Sort the dictionary std::sort(words.begin(), words.end()); // Load the matrix with a fence cppx::Matrix<char> matrix(board.size() + 2, board[0].size() + 2); for( Index y = 0; y < board.size(); ++y ) { for( Index x = 0; x < board[0].size(); ++x ) { matrix.item(x + 1, y + 1) = board[y][x]; } } // Solve Size const num_rows = board.size(); Size const 
num_cols = board[0].size(); std::vector<std::string> results; for( Index y = 0; y < num_rows; ++y ) { for( Index x = 0; x < num_cols; ++x ) { app::append_words_from(words, matrix, {x + 1, y + 1}, results); } } sort(results.begin(), results.end()); results.erase(unique(results.begin(), results.end()), results.end()); return results; } private: }; |
"Öö Tiib" <ootiib@hot.ee>: Feb 11 11:36AM -0800 On Saturday, 11 February 2017 20:20:34 UTC+2, Christopher J. Pisz wrote: > a board > a dictionary > a solver Yes, I think that those "something doers" (like that "solver") are not objects but functions or methods. So if there is a board whose concern is to know what board is and board_solver whose concern is to solve that board then it is "knower and doer" overengineering antipattern for me. > and in the real world possibly: > a board::path > a coordinate These do not seem to help during task and are not requested by task. What I would imagine that make sense to put forward might be "trie" (as just generic kind of dictionary) and "solution". > indeed use a trei for their dictionary, and had to create it. > Rather than made a class Trei, they just implemented it as part of the > solver. Yes, but that design does not give any performance advantages nor disadvantages. What gives performance advantages is limiting that trie to 26 letters of English alphabet and things like that. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 11 10:21PM +0200 On 11.02.2017 21:24, Christopher J. Pisz wrote: > and redo my code without the trei, without set, and just vectors, and > use STL sort and lower_bound, when I need to, then see if that mathches > up with Alf's score. Good! Meanwhile, Alf has posted some enhancement suggestions for his code elsethread, like replacing substr() with compare() and replacing word == *lower check with string size comparison. One thing that it is not clear to me - is the initial sorting of the dictionary included in the timings? > After that I will try to reduce it further with a > trei implementation that allocates up front. I suspect this might make things worse, but YMMV. Cheers Paavo |
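For what it's worth, the second of those suggestions amounts to the following (a tiny sketch, not code from the thread): once the entry is known to start with the current word, equal lengths are enough to establish equality, so the full string comparison can be skipped.

#include <iostream>
#include <string>

int main()
{
    // In the real code, `word` is already known to be a prefix of `entry`
    // by the time the equality test runs, so comparing sizes is sufficient.
    std::string const word  = "oath";
    std::string const entry = "oath";

    bool const is_prefix = ( entry.compare( 0, word.size(), word ) == 0 );
    bool const exact     = is_prefix && ( word.size() == entry.size() );

    std::cout << std::boolalpha << is_prefix << " " << exact << "\n";   // true true
}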
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 08:25PM On Sat, 11 Feb 2017 13:24:14 -0600 > if that mathches up with Alf's score. After that I will try to reduce > it further with a trei implementation that allocates up front. > Here is the code I ran: OK, can you deal with the two outstanding points I raised, namely: 1. Try using std::string::compare() instead of std::string::substr() in starts_with(). 2. You need to test for lower == end_dict_range in recursively_append_words_from() before calling starts_with(), as it could be equal to the end of the dictionary. (You also spotted this, but haven't dealt with it.) You have dealt with the clog point. Chris |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 02:28PM -0600 On 2/11/2017 2:21 PM, Paavo Helde wrote: > Good! Meanwhile, Alf has posted some enhancement suggestions for his > code elsethread, like replacing substr() with compare() and replacing > word == *lower check with string size comparison. Oh, yea I forgot to throw those in too. > One thing that it is not clear to me - is the initial sorting of the > dictionary included in the timings? Yes, the initialization is part of the problem. The site I am now testing on actually gives their data unsorted, whereas the in the interview it was sorted. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 08:38PM On Sat, 11 Feb 2017 13:24:14 -0600 > vectors, and use STL sort and lower_bound, when I need to, then see > if that mathches up with Alf's score. After that I will try to reduce > it further with a trei implementation that allocates up front. Posting separately (as a programmer I don't want to mix concerns!), I hope not to sound like a cracked record but as I have said the thing you must attend to in your own implementation is the dictionary slicing, for which a recursive implementation is the most natural fit, or at any rate certainly the easiest. If you can code that up with treis, excellent, if not it won't do the business. Chris |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 11 11:06PM +0200 On 11.02.2017 22:28, Christopher J. Pisz wrote: >> One thing that it is not clear to me - is the initial sorting of the >> dictionary included in the timings? > Yes, the initialization is part of the problem. OK, then I see why the winning trie solution is the best, it probably initializes its dictionary faster than std::sort() of a vector. > The site I am now testing on actually gives their data unsorted, whereas > the in the interview it was sorted. If the initial data is already a sorted random access array, then I suspect the std::lower_bound() based solution should have an edge. Cheers Paavo |
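To make that comparison concrete: finding the sub-range of a sorted vector that shares a given prefix costs two binary searches. A small sketch (my own, not from the thread; it assumes the prefix's last character isn't the maximum char value):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> dict = { "oak", "oat", "oath", "oaths", "ocean", "owl" };  // sorted

    std::string const prefix = "oat";
    auto first = std::lower_bound( dict.begin(), dict.end(), prefix );

    // Smallest string that compares greater than every "oat..." word:
    std::string upper_key = prefix;
    ++upper_key.back();                                    // "oau"
    auto last = std::lower_bound( first, dict.end(), upper_key );

    for( auto it = first; it != last; ++it )
        std::cout << *it << "\n";                          // oat, oath, oaths
}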
"Öö Tiib" <ootiib@hot.ee>: Feb 11 01:29PM -0800 On Saturday, 11 February 2017 23:07:02 UTC+2, Paavo Helde wrote: > If the initial data is already a sorted random access array, then I > suspect the std::lower_bound() based solution should have an edge. Trie is doomed to win any lower_bound in anything because finding all words that start with F from trie is O(1) and after that finding all the words that start with FR is again O(1). |
"Chris M. Thomasson" <invalid@invalid.invalid>: Feb 10 09:18PM -0800 On 2/9/2017 1:43 PM, Paavo Helde wrote: > Yes I am attempting to bind a parallel thread pool in my program to a > single NUMA node (because profiling shows there is no point to let it > spread across nodes, it would be just wasting computer resources). Indeed. Sharing across NUMA nodes should be avoided like the damn plague! > What I need is to figure out which NUMA node to bind to. That can be a rather nasty problem; what is your optimal criteria for choosing? Think one step further and bind worker threads from the pool to the local processors on the NUMA node. Well, should the OS handle that set of details? Okay, iirc this contrived kind of thing helped my code figure out a mapping for a system, this is from a while ago (decade+) and the idea went something like: take the number of NUMA nodes, and create a thread pool for each one. Then make all of the pools just lock their individual mutexs, increment a local numa counter, and unlock, no mutex contention between the nodes. They do this in a loop until a signal is hit, from a single controlling thread, that tells them to stop. The end resulting NUMA counters with the highest numbers are more active, and better "performing" in the system. But, keep in mind this data can be totally arbitrary, or can be out-of-date rather quickly, depending on load in the system. I have no idea how this system is going to effect the counters. Quite honestly, and for some reason, your question kind of reminds me of: https://groups.google.com/d/topic/comp.lang.c++/7u_rLgQe86k/discussion ;^) |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 11 07:54PM +0200 On 11.02.2017 7:18, Chris M. Thomasson wrote: >> What I need is to figure out which NUMA node to bind to. > That can be a rather nasty problem; what is your optimal criteria for > choosing? The optimal criterion is that there should be no NUMA node sitting idle while some other NUMA node is overloaded. But it looks like this is not so easy to achieve in general, as the jobs I am binding can be in different processes and their number and duration are not known in advance. > Think one step further and bind worker threads from the pool > to the local processors on the NUMA node. Well, should the OS handle > that set of details? I believe the OS would handle thread scheduling within a single NUMA node better than me. Actually I have tried to bind threads to single PUs on non-NUMA systems and the performance stayed the same or got worse, depending on the OS/hardware. |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 11:35PM
>before summing, with similar results (the vector pointer
>is slower, but only by a factor between something like
>2 and 8).

The following program now prints

  runtime = 92,345.917
  runtime = 2'690,778.190

. The first runtime pertains to the vector containing employees, the second to the vector containing pointers to employees. So pointers are much slower!

(I am /not/ serious here, because there is a major bias: I modified the algorithm until I finally got this result!)

/* Warning: When z in main is large, some operating systems
   might swap a lot and might not respond anymore. */

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <ostream>
#include <random>
#include <stdexcept>
#include <vector>

static void escape( void * p ) /* by Chandler Carruth */
{ asm volatile( "" : : "g"(p) : "memory" ); }

template< typename T >
::std::string format( T value )
{ double privacy_bias = 1.0;
  double privacy_noise = 0.00;
  ::std::string s;
  int i = 0;
  int n = 0;
  long double v = value;
  v = v * privacy_bias;
  v = v *( 1.0 + privacy_noise * ( rand() /( 1.0 + RAND_MAX )- 0.5 ));
  value = v;
  if( value )
    while( value )
    { if( i &&( i % 3 == 0 ))
      { s = " .,';:-=%#$"[ i/3 ] + s; ++n; }
      /* above there is a bug: i/3 might be larger than the string */
      ++i; ++n;
      s = "0123456789"[ value % 10 ] + s;
      value /= 10; }
  else s = "0";
  while( n++ < 20 )s = " " + s;
  return s; }

struct timer
{ using timepoint = ::std::chrono::high_resolution_clock::time_point;
  ::timer::timepoint t1;
  ::timer::timepoint t2;
  decltype( ::timer::t2 - ::timer::t1 )sum{};
  timer(){ this->reset(); }
  void reset()
  { this->t1 = ::std::chrono::high_resolution_clock::now();
    this->t2 = t1;
    sum = t2 - t1; }
  void print()
  { this->t2 = ::std::chrono::high_resolution_clock::now();
    this->sum += this->t2 - this->t1;
    ::std::cout << "runtime = " << format(( sum ).count()) << '\n'; }};

auto const seed
{ ::std::chrono::system_clock::now().time_since_epoch().count() };

auto engine { ::std::default_random_engine( seed )};

/* application */

struct emp
{ double salary = rand();
  /* char data[ 200 ] = { 0, }; */ };

int main()
{ size_t z = 80'000'000; /* number of iterations */
  ::std::vector<emp> vector_of_employees( z );
  ::std::vector<emp*> vector_of_pointers( z );
  for( size_t i = 0; i < z; ++i )
  { /* malloc used for size manipulations (not shown here) */
    vector_of_pointers[ i ]=( emp * )malloc( sizeof( emp ));
    new( vector_of_pointers[ i ]) emp{}; }
  shuffle( begin( vector_of_employees ), end( vector_of_employees ), engine );
  shuffle( begin( vector_of_pointers ), end( vector_of_pointers ), engine );
  { ::timer t;
    double s = 0;
    for( size_t i = 0; i < z; ++i )s += vector_of_employees[ i ].salary;
    escape( &s );
    t.print(); }
  { ::timer t;
    double s = 0;
    for( size_t i = 0; i < z; ++i )s += vector_of_pointers[ i ]->salary;
    escape( &s );
    t.print(); }} |