- A different algorithm - 22 Updates
- Finding CPU utilization of NUMA nodes - 2 Updates
- collection of pointers vs collection of objects - 1 Update
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 07:14PM -0600 On 2/10/2017 1:43 AM, Christopher J. Pisz wrote: > Eurika! > I am implementing this now. I suspect it will greatly speed things up. Pushed algorithm 2 to github. It is significantly worse. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 07:21PM -0600 On 2/10/2017 6:17 AM, Paavo Helde wrote: SNIP > Generate a dictionary of 1e6 random words and test your solutions vs > Alf's solution and then it becomes immediately clear which solutions are > faster, no need for brilliant dream ideas. Alf is using a lot fancy syntax I am unfamiliar with. However, his algorithm looks the same. Beyond using the vector, I don't see any differences and the performance increase I am getting is not enough to match what they are claiming. > C++ job where one often needs to profile and optimize code. Plus, you > get some insight about what things are slow and what are fast, on your > current hardware at least. Yea, I been doing just that all day. Profiling alternative idea, different data structures, etc. Just haven't found anything near the performance they are claiming. I found this is a problem on leetcode.com under "Word Search II" I am looking at some of the top solutions there now too. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 07:24PM -0600 On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: > inline auto operator<<( ostream& stream, Position const& pos ) > -> ostream& > { return stream << "{" << pos.x << ", " << pos.y << "}"; } Can you explain what -> ostream& means? I don't think I've ever seen this syntax you are using with -> following returntype functionname(arg1, arg1) |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 11 09:15AM +0100 On 11.02.2017 02:24, Christopher J. Pisz wrote: > Can you explain what -> ostream& means? > I don't think I've ever seen this syntax you are using with -> following > returntype functionname(arg1, arg1) The `->` is called trailing return type syntax, and was introduced in C++11. The `auto` as pseudo return type at the front of the declaration, says that either a `->` trailing return type will follow, or, in C++14 and later, a return type specification is omitted and the type will be inferred from the function body (obviously the latter is only possible on a full function definition, not on a pure declaration). Trailing return type is the only syntax available for lambdas, so it's the only syntax that can be used uniformly for all function declarations. It has some direct advantages including that one can avoid qualification of the return type when it's a type defined in the function's class (just as with argument types); that it allows the return type to be specified in terms of the argument types; and that it generally makes it easier to visually parse complicated declarations. I adopted it early on as a single, uniform syntax, with no exceptions for new code, not even for `main`. Using it for `main` on e.g. Stack Overflow tends to generate questions about it. Which allows teaching. :) Still I prefer to use the old syntax for `void` routines in general, even though a `void` lambda can't be written that way. This corresponds to Pascal's differentiation of `function` versus `procedure`: whether a routine can meaningfully be used in an expression, contributing to the expression result. More that way: I wish C++ had support for pure functions, purely computational functions without side effects, beyond the very limited C++11 compile time `constexpr` function. Cheers!, - Alf |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 11:25AM On Fri, 10 Feb 2017 19:21:07 -0600 > Just haven't found anything near the performance they are claiming. > I found this is a problem on leetcode.com under "Word Search II". > I am looking at some of the top solutions there now too. I am quite surprised at this (assuming you have tested Alf's code) because on looking at his implementation it seems reasonable (and follows the recursive dictionary slicing approach I mentioned in another post of mine). I thought it likely that your interviewers were mainly interested in whether candidates understood how to build up results using recursion, but it seems not. The only thing that struck me about Alf's implementation is that the starts_with() function is in the recursion hot path and, by calling std::string::substr(), might allocate memory (although with small string optimization one would hope not), which std::string::compare() wouldn't do. You could try substituting std::string::compare() in that function and see if that makes any difference (it probably won't). You would also need to omit the call to clog in the hot path (presumably inserted for debugging) to get proper timings of that implementation. Chris |
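For concreteness, the suggested substitution might look like this, keeping the same prefix-first argument order as Alf's starts_with() uses (a sketch only, not code from the thread):

#include <string>

// Allocation-free prefix test: compare(pos, len, str) examines characters in
// place instead of building a temporary string the way substr() does.
inline auto starts_with( std::string const& prefix, std::string const& s )
    -> bool
{
    return s.size() >= prefix.size()
        && s.compare( 0, prefix.size(), prefix ) == 0;
}

int main()
{
    return starts_with( "oat", "oath" ) ? 0 : 1;   // 0: "oath" starts with "oat"
}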
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 11 12:53PM +0100 On 11.02.2017 12:25, Chris Vine wrote: > The only thing that struck me about Alf's implementation is that the > starts_with() function is in the recursion hot path and by calling > std::string::substr() might allocate memory I also compared strings for equality when known that one is leading substring of other, instead of just comparing lengths then. Even for a first working version that was, well, dumb. :) Cheers!, - Alf |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 12:52PM On Sat, 11 Feb 2017 12:53:01 +0100 > I also compared strings for equality when known that one is leading > substring of other, instead of just comparing lengths then. > Even for a first working version that was, well, dumb. :) I think your code was OK on the equality test. The dictionary lower bound has been reset on re-entering recursively_append_words_from(), and end_dict_range is always one past the end of the dictionary, so 'word' might no longer be a leading substring of *lower. (It occurs to me that since 'lower' could be at 'end_dict_range', this should probably be checked for before 'lower' is dereferenced on calling starts_with().) Chris |
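The hazard described here is easy to demonstrate in isolation: std::lower_bound() can return the end iterator, which must not be dereferenced. A tiny self-contained example (not taken from Alf's code):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> dict{ "ant", "bee", "cat" };   // sorted dictionary
    std::string const word = "zoo";

    auto const lower = std::lower_bound( dict.begin(), dict.end(), word );
    if( lower == dict.end() )
    {
        // No entry is >= "zoo"; dereferencing `lower` here would be undefined behaviour.
        std::cout << "no candidate for \"" << word << "\"\n";
    }
    else
    {
        std::cout << "lower bound: " << *lower << "\n";
    }
}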
"Öö Tiib" <ootiib@hot.ee>: Feb 11 05:39AM -0800 On Saturday, 11 February 2017 10:16:14 UTC+2, Alf P. Steinbach wrote: > > I don't think I've ever seen this syntax you are using with -> following > > returntype functionname(arg1, arg1) > The `->` is called trailing return type syntax, and was introduced in C++11. For to get used to that syntax one may write few apps in Rust and/or Swift. It is possible even to get paid for doing uneducated attempts of usage of Swift since there seems to be quite serious shortage of experienced iOS developers at the moment. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 02:10PM On Fri, 10 Feb 2017 19:14:26 -0600 > > up. > Pushed algorithm 2 to github. > It is significantly worse. I am not surprised. From what I could make out of your code, you are walking through the entire dictionary and testing iteratively whether each word in the dictionary is on the board. On a large dictionary that is about the worst possible implementation. Instead, on a large dictionary you need to step through the board and work from there. What are the respective timings, on the same input data, of: 1. Your first implementation. 2. Your second implementation 3. Alf's implementation? Chris |
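To make that concrete, here is a rough self-contained sketch of the board-driven approach (my own condensed illustration of the idea discussed in this thread, not Alf's or Christopher's code, with hypothetical helper names): start a depth-first walk from every cell, extend the current prefix one letter at a time, and at each step narrow the sorted dictionary to the range that can still match, abandoning the branch as soon as nothing can.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Sketch only: 4-direction word search over a square board, pruning with a
// sorted dictionary and std::lower_bound.
using Board = std::vector<std::string>;

static bool is_prefix_of( std::string const& prefix, std::string const& s )
{
    return s.compare( 0, prefix.size(), prefix ) == 0;
}

static void dfs( Board const& board, std::vector<std::vector<bool>>& used,
                 int x, int y,
                 std::vector<std::string>::const_iterator first,
                 std::vector<std::string>::const_iterator last,
                 std::string& word, std::vector<std::string>& found )
{
    int const n = static_cast<int>( board.size() );
    if( x < 0 || y < 0 || x >= n || y >= n || used[y][x] ) return;

    word.push_back( board[y][x] );
    first = std::lower_bound( first, last, word );        // slice the dictionary
    if( first != last && is_prefix_of( word, *first ) )   // can anything still match?
    {
        if( *first == word ) found.push_back( word );
        used[y][x] = true;
        int const dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
        for( int d = 0; d < 4; ++d )
            dfs( board, used, x + dx[d], y + dy[d], first, last, word, found );
        used[y][x] = false;
    }
    word.pop_back();
}

int main()
{
    Board board = { "oaan", "etae", "ihkr", "iflv" };
    std::vector<std::string> words = { "eat", "oath", "pea", "rain" };   // already sorted
    std::vector<std::vector<bool>> used( board.size(),
                                         std::vector<bool>( board.size(), false ) );

    std::vector<std::string> found;
    std::string word;
    for( int y = 0; y < static_cast<int>( board.size() ); ++y )
        for( int x = 0; x < static_cast<int>( board.size() ); ++x )
            dfs( board, used, x, y, words.begin(), words.end(), word, found );

    std::sort( found.begin(), found.end() );
    found.erase( std::unique( found.begin(), found.end() ), found.end() );
    for( auto const& w : found ) std::cout << w << "\n";   // expected: eat, oath
}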
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 10:27AM -0600 On 2/11/2017 8:10 AM, Chris Vine wrote: > dictionary you need to step through the board and work from there. > What are the respective timings, on the same input data, of: > 1. Your first implementation. 412 ms > 2. Your second implementation Didn't bother. It is visibly slower, so why measure > 3. Alf's implementation? Working on trying to get it to conform to test, so no idea yet. I am using https://leetcode.com/problems/word-search-ii/?tab=Description to test now. It also has 1000s of submissions and is close enough to the original problem that the interviewers asked. Just adjusted for 4 directions vs 8 and no q->qu. The acceptable range is 40ms-60ms. If I look at the top performing answer, they used a preallocated trie and didn't even bother with any OO or separation of concerns, which I guess is a requirement when we start trying to squeeze out performance and drop all concern for reusability or maintenance. I'll post updated times when I have them. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 11:30AM -0600 On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: > - Alf I am trying to get this to confor to the test on https://leetcode.com/problems/word-search-ii/?tab=Description That way I can easily compare results of my various attempts, your, and others. I am having trouble with your indexing. you init things with +1 and then get index_from, and I am lost. The tester's format is a wee bit different then first discussed. )They only go verticle and horizontal )They don't care about q->qu )They give the input data in Solution::findWords and we are to fill out that method and class. I tried to get your solution into the tester's format below and I am getting vector out of range problems: #include <algorithm> #include <assert.h> #include <ctype.h> #include <fstream> #include <iostream> #include <vector> #include <set> #include <stddef.h> #include <stdexcept> #include <stdlib.h> #include <string> #include <utility> using Size = ptrdiff_t; using Index = Size; template< class Container > auto n_items_of(Container const& c) -> Size { return c.size(); } namespace cppx { using std::ostream; using std::vector; using std::runtime_error; using std::string; using Byte = unsigned char; inline auto hopefully(bool const e) -> bool { return e; } inline auto fail(string const& s) -> bool { throw runtime_error(s); } struct Position { Index x; Index y; }; inline auto operator<<(ostream& stream, Position const& pos) -> ostream& { return stream << "{" << pos.x << ", " << pos.y << "}"; } template< class Item > class Matrix { private: struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>. Size row_length_; vector<Wrapped_item> items_; auto index_for(Index x, Index y) const -> Index { return y*row_length_ + x; } public: auto id_for_position(Index x, Index y) const -> Index { return index_for(x, y); } auto size() const -> Size { return row_length_; } auto item(Index const x, Index const y) const -> Item { return items_[index_for(x, y)].object; } auto item(Index const x, Index const y) -> Item& { return items_[index_for(x, y)].object; } Matrix() : Matrix{ 0 } {} explicit Matrix(Size const size) : row_length_(size) , items_(size*size) {} }; auto inline to_lowercase(char const ch) -> char { return tolower(static_cast<Byte>(ch)); } auto inline starts_with(string const& prefix, string const& s) -> bool { return s.substr(0, prefix.length()) == prefix; } } // namespace cppx namespace app { using namespace std; using namespace cppx; auto load_dictionary(string const& file_path) -> vector<string> { ifstream data{ file_path }; hopefully(!data.fail()) || fail("Failed to open dictionary `" + file_path + "`."); vector<string> result; string word; while( getline(data, word) ) { result.push_back(move(word)); } return result; } auto chars_from(string s) -> string { string chars{ s.begin(), remove(s.begin(), s.end(), ' ') }; transform(chars.begin(), chars.end(), chars.begin(), to_lowercase); return chars; } using Dictionary_iter = vector<string>::const_iterator; void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item(position.x, position.y); word += ch; auto const lower = lower_bound(start_dict_range, end_dict_range, word); clog << position << " -> " << *lower << "\n"; if( starts_with(word, *lower) ) { if( word == 
*lower ) { result.push_back(word); } visited.item(position.x, position.y) = true; for( int dx = -1; dx <= +1; ++dx ) { for( int dy = -1; dy <= +1; ++dy ) { if( position.x + dx < 0 || position.x + dx > chars.size() - 1 || position.y + dy < 0 || position.y + dy > chars.size() - 1 ) { continue; } Position const new_position = { position.x + dx, position.y + dy }; if( !visited.item(new_position.x, new_position.y) ) { recursively_append_words_from( lower, end_dict_range, chars, new_position, visited, word, result ); } } } visited.item(position.x, position.y) = false; } word.resize(original_word_length); } void append_words_from( vector<string> const& dictionary, Matrix<char> const& chars, Position const& start, vector<string>& result ) { string word; Size const n = chars.size(); Matrix<bool> visited{ n }; for( Index i = 0; i < n; ++i ) { visited.item(i, 0) = true; visited.item(0, i) = true; visited.item(i, n - 1) = true; visited.item(n - 1, i) = true; } recursively_append_words_from( dictionary.begin(), dictionary.end(), chars, start, visited, word, result ); } } // namespace app class Solution { public: Solution() {} std::vector<std::string> findWords(std::vector<std::vector<char>>& board, std::vector<std::string>& words) { // Sort the dictionary std::sort(words.begin(), words.end()); // Load the matrix cppx::Matrix<char> matrix(board.size()); for( Index y = 0; y < board.size(); ++y ) { for( Index x = 0; x < board[0].size(); ++x ) { matrix.item(x, y) = board[y][x]; } } // Solve Size const n = board.size(); std::vector<std::string> results; for( Index y = 0; y < n; ++y ) { for( Index x = 0; x < n; ++x ) { app::append_words_from(words, matrix, {x, y}, results); } } sort(results.begin(), results.end()); results.erase(unique(results.begin(), results.end()), results.end()); return results; } private: }; int main() { std::vector<std::vector<char>> board = { { 'o','a','a','n' },{ 'e','t','a','e', },{ 'i','h','k','r' },{ 'i','f','l','v' } }; std::vector<std::string> words = { "oath", "pea", "eat", "rain" }; Solution solution; std::vector<std::string> result = solution.findWords(board, words); return 0; } |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 11:49AM -0600 On 2/11/2017 11:30 AM, Christopher J. Pisz wrote: > On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: >> On 10.02.2017 01:15, Christopher J. Pisz wrote: I think he is also using a fence. If I put the indices back as normal, I still get a vector range assertion here: void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item(position.x, position.y); word += ch; auto const lower = lower_bound(start_dict_range, end_dict_range, word); clog << position << " -> " << *lower << "\n"; I think that is an actual bug. What if lower_bound returned end()? You cannot deref end? |
"Öö Tiib" <ootiib@hot.ee>: Feb 11 10:10AM -0800 On Saturday, 11 February 2017 18:27:33 UTC+2, Christopher J. Pisz wrote: > and didn't even bother with any OO or separation of concerns, which I > guess is a requirement when we start trying to squeeze out performance > and drop all concern for reusability or maintenance. No way. OO and separation of concerns usually improves performance. However there is a "but". In task under hand we have only two objects. We have "boggle field" B and "dictionary" D. We have only one concern that involves both objects. What texts from D are present on B? So what deep OO you anticipate here? Here can be done unneeded overengineering. That usually indeed degrades performance in addition to making code more fragile and harder to understand. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 12:20PM -0600 On 2/11/2017 12:10 PM, Öö Tiib wrote: > Here can be done unneeded overengineering. That usually indeed > degrades performance in addition to making code more fragile and > harder to understand. I see a objects: a board a dictionary a solver and in the real world possibly: a board::path a coordinate Now on the website mentioned, if we look at the best solutions, they did indeed use a trei for their dictionary, and had to create it. Rather than made a class Trei, they just implemented it as part of the solver. More than that, but a lot of them implemented it for the specific problem at hand, rather than something that was reusable at all. Consider this top ranked C++ solution and notice the board, the solver, the dictionary, the trei, are all built into the one class and only designed to solve the very specific problem: class Solution { vector<string> results; int x_dim, y_dim; struct TrieNode { TrieNode* exits[26]; bool is_end; }; void recurse( vector<vector<char>>& board, string& s, int x, int y, TrieNode* trie_curr ) { if ( unsigned(x) >= x_dim || unsigned(y) >= y_dim ) return; const auto c = board[y][x]; if ( !c ) return; auto trie_next = trie_curr->exits[ c-'a' ]; if ( !trie_next ) return; s.push_back( c ); board[y][x] = 0; if ( trie_next->is_end ) { results.push_back( s ); trie_next->is_end = false; } recurse( board, s, x-1, y, trie_next ); recurse( board, s, x+1, y, trie_next ); recurse( board, s, x, y-1, trie_next ); recurse( board, s, x, y+1, trie_next ); board[y][x] = c; s.pop_back(); } public: vector<string> findWords(vector<vector<char>>& board, vector<string>& words) { if ( !words.size() || !board.size() || !board[0].size() ) return results; int tot_chars = 0; for ( const auto& word : words ) tot_chars += word.length(); TrieNode* trie_pool = (TrieNode*)calloc( sizeof(TrieNode), tot_chars + 2 ); int num_trie_nodes = 0; TrieNode* trie_root = &trie_pool[ num_trie_nodes++ ]; for ( const auto& word : words ) { TrieNode* trie_curr = trie_root; for ( const auto c : word ) { auto& trie_next = trie_curr->exits[ c-'a' ]; if (!trie_next) trie_next = &trie_pool[ num_trie_nodes++ ]; trie_curr = trie_next; } trie_curr->is_end = true; } x_dim = board[0].size(); y_dim = board.size(); string s; for ( auto y = 0 ; y < y_dim ; ++y ) for ( auto x = 0; x < x_dim; ++x ) recurse( board, s, x, y, trie_root ); free( trie_pool ); // from C to shining C. return results; } }; |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 01:24PM -0600 On 2/11/2017 11:49 AM, Christopher J. Pisz wrote: > On 2/11/2017 11:30 AM, Christopher J. Pisz wrote: >> On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: >>> On 10.02.2017 01:15, Christopher J. Pisz wrote: SNIP I got Alf's solution to run on the test site. If I changed everything correctly, he has scored 66ms. Mine ran in 412 ms. The acceptable range is 40-60ms. So yea, My RadixDictionary is craptastic. So, now I am going to go back and redo my code without the trei, without set, and just vectors, and use STL sort and lower_bound, when I need to, then see if that mathches up with Alf's score. After that I will try to reduce it further with a trei implementation that allocates up front. Here is the code I ran: #include <algorithm> #include <assert.h> #include <ctype.h> #include <fstream> #include <iostream> #include <vector> #include <set> #include <stddef.h> #include <stdexcept> #include <stdlib.h> #include <string> #include <utility> /* * Solution from Alf Steinbach on comp.lang.C++ for me to compare against */ using Size = ptrdiff_t; using Index = Size; template< class Container > auto n_items_of(Container const& c) -> Size { return c.size(); } namespace cppx { using std::ostream; using std::vector; using std::runtime_error; using std::string; using Byte = unsigned char; inline auto hopefully(bool const e) -> bool { return e; } inline auto fail(string const& s) -> bool { throw runtime_error(s); } struct Position { Index x; Index y; }; /* inline auto operator<<(ostream& stream, Position const& pos) -> ostream& { return stream << "{" << pos.x << ", " << pos.y << "}"; } */ template< class Item > class Matrix { private: struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>. 
Size num_rows_; Size num_cols_; vector<Wrapped_item> items_; auto index_for(Index x, Index y) const -> Index { return y*num_cols_ + x; } public: auto id_for_position(Index x, Index y) const -> Index { return index_for(x, y); } auto num_rows() const -> Size { return num_rows_; } auto num_cols() const -> Size { return num_cols_; } auto item(Index const x, Index const y) const -> Item { return items_[index_for(x, y)].object; } auto item(Index const x, Index const y) -> Item& { return items_[index_for(x, y)].object; } Matrix() : Matrix{ 0 } {} explicit Matrix(Size const num_rows, Size const num_cols) : num_rows_(num_rows) , num_cols_(num_cols) , items_(num_rows * num_cols) {} }; auto inline to_lowercase(char const ch) -> char { return tolower(static_cast<Byte>(ch)); } auto inline starts_with(string const& prefix, string const& s) -> bool { return s.substr(0, prefix.length()) == prefix; } } // namespace cppx namespace app { using namespace std; using namespace cppx; auto chars_from(string s) -> string { string chars{ s.begin(), remove(s.begin(), s.end(), ' ') }; transform(chars.begin(), chars.end(), chars.begin(), to_lowercase); return chars; } using Dictionary_iter = vector<string>::const_iterator; void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item(position.x, position.y); word += ch; auto const lower = lower_bound(start_dict_range, end_dict_range, word); if( lower == end_dict_range ) { word.resize(original_word_length); return; } /* clog << position << " -> " << *lower << "\n"; */ if( starts_with(word, *lower) ) { if( word == *lower ) { result.push_back(word); } visited.item(position.x, position.y) = true; const int offsetsX[] = { 0, 1, 0, -1}; const int offsetsY[] = {-1, 0, 1, 0}; for( int offsetIndex = 0; offsetIndex < 4; ++offsetIndex ) { Position const new_position = { position.x + offsetsX[offsetIndex], position.y + offsetsY[offsetIndex] }; if( !visited.item(new_position.x, new_position.y) ) { recursively_append_words_from( lower, end_dict_range, chars, new_position, visited, word, result ); } } visited.item(position.x, position.y) = false; } word.resize(original_word_length); } void append_words_from( vector<string> const& dictionary, Matrix<char> const& chars, Position const& start, vector<string>& result ) { string word; Size const num_rows = chars.num_rows(); Size const num_cols = chars.num_cols(); Matrix<bool> visited{ num_rows, num_cols }; // Flag the fence visited for( Index index = 0; index < num_cols; ++index ) { visited.item(index, 0) = true; visited.item(index, num_rows - 1) = true; } for( Index index = 0; index < num_rows; ++index ) { visited.item(0 , index) = true; visited.item(num_cols - 1, index) = true; } recursively_append_words_from( dictionary.begin(), dictionary.end(), chars, start, visited, word, result ); } } // namespace app class Solution { public: Solution() {} std::vector<std::string> findWords(std::vector<std::vector<char>>& board, std::vector<std::string>& words) { // Sort the dictionary std::sort(words.begin(), words.end()); // Load the matrix with a fence cppx::Matrix<char> matrix(board.size() + 2, board[0].size() + 2); for( Index y = 0; y < board.size(); ++y ) { for( Index x = 0; x < board[0].size(); ++x ) { matrix.item(x + 1, y + 1) = board[y][x]; } } // Solve Size const num_rows = board.size(); Size const 
num_cols = board[0].size(); std::vector<std::string> results; for( Index y = 0; y < num_rows; ++y ) { for( Index x = 0; x < num_cols; ++x ) { app::append_words_from(words, matrix, {x + 1, y + 1}, results); } } sort(results.begin(), results.end()); results.erase(unique(results.begin(), results.end()), results.end()); return results; } private: }; |
"Öö Tiib" <ootiib@hot.ee>: Feb 11 11:36AM -0800 On Saturday, 11 February 2017 20:20:34 UTC+2, Christopher J. Pisz wrote: > a board > a dictionary > a solver Yes, I think that those "something doers" (like that "solver") are not objects but functions or methods. So if there is a board whose concern is to know what board is and board_solver whose concern is to solve that board then it is "knower and doer" overengineering antipattern for me. > and in the real world possibly: > a board::path > a coordinate These do not seem to help during task and are not requested by task. What I would imagine that make sense to put forward might be "trie" (as just generic kind of dictionary) and "solution". > indeed use a trei for their dictionary, and had to create it. > Rather than made a class Trei, they just implemented it as part of the > solver. Yes, but that design does not give any performance advantages nor disadvantages. What gives performance advantages is limiting that trie to 26 letters of English alphabet and things like that. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 11 10:21PM +0200 On 11.02.2017 21:24, Christopher J. Pisz wrote: > and redo my code without the trei, without set, and just vectors, and > use STL sort and lower_bound, when I need to, then see if that mathches > up with Alf's score. Good! Meanwhile, Alf has posted some enhancement suggestions for his code elsethread, like replacing substr() with compare() and replacing word == *lower check with string size comparison. One thing that it is not clear to me - is the initial sorting of the dictionary included in the timings? > After that I will try to reduce it further with a > trei implementation that allocates up front. I suspect this might make things worse, but YMMV. Cheers Paavo |
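For what it's worth, the second of those suggestions amounts to the following (a tiny sketch, not code from the thread): once the entry is known to start with the current word, equal lengths are enough to establish equality, so the full string comparison can be skipped.

#include <iostream>
#include <string>

int main()
{
    // In the real code, `word` is already known to be a prefix of `entry`
    // by the time the equality test runs, so comparing sizes is sufficient.
    std::string const word  = "oath";
    std::string const entry = "oath";

    bool const is_prefix = ( entry.compare( 0, word.size(), word ) == 0 );
    bool const exact     = is_prefix && ( word.size() == entry.size() );

    std::cout << std::boolalpha << is_prefix << " " << exact << "\n";   // true true
}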
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 08:25PM On Sat, 11 Feb 2017 13:24:14 -0600 > if that mathches up with Alf's score. After that I will try to reduce > it further with a trei implementation that allocates up front. > Here is the code I ran: OK, can you deal with the two outstanding points I raised, namely: 1. Try using std::string::compare() instead of std::string::substr() in starts_with(). 2. You need to test for lower == end_dict_range in recursively_append_words_from() before calling starts_with(), as it could be equal to the end of the dictionary. (You also spotted this, but haven't dealt with it.) You have dealt with the clog point. Chris |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 11 02:28PM -0600 On 2/11/2017 2:21 PM, Paavo Helde wrote: > Good! Meanwhile, Alf has posted some enhancement suggestions for his > code elsethread, like replacing substr() with compare() and replacing > word == *lower check with string size comparison. Oh, yea I forgot to throw those in too. > One thing that it is not clear to me - is the initial sorting of the > dictionary included in the timings? Yes, the initialization is part of the problem. The site I am now testing on actually gives their data unsorted, whereas the in the interview it was sorted. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 11 08:38PM On Sat, 11 Feb 2017 13:24:14 -0600 > vectors, and use STL sort and lower_bound, when I need to, then see > if that mathches up with Alf's score. After that I will try to reduce > it further with a trei implementation that allocates up front. Posting separately (as a programmer I don't want to mix concerns!), I hope not to sound like a cracked record but as I have said the thing you must attend to in your own implementation is the dictionary slicing, for which a recursive implementation is the most natural fit, or at any rate certainly the easiest. If you can code that up with treis, excellent, if not it won't do the business. Chris |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 11 11:06PM +0200 On 11.02.2017 22:28, Christopher J. Pisz wrote: >> One thing that it is not clear to me - is the initial sorting of the >> dictionary included in the timings? > Yes, the initialization is part of the problem. OK, then I see why the winning trie solution is the best, it probably initializes its dictionary faster than std::sort() of a vector. > The site I am now testing on actually gives their data unsorted, whereas > the in the interview it was sorted. If the initial data is already a sorted random access array, then I suspect the std::lower_bound() based solution should have an edge. Cheers Paavo |
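To make that comparison concrete: finding the sub-range of a sorted vector that shares a given prefix costs two binary searches. A small sketch (my own, not from the thread; it assumes the prefix's last character isn't the maximum char value):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> dict = { "oak", "oat", "oath", "oaths", "ocean", "owl" };  // sorted

    std::string const prefix = "oat";
    auto first = std::lower_bound( dict.begin(), dict.end(), prefix );

    // Smallest string that compares greater than every "oat..." word:
    std::string upper_key = prefix;
    ++upper_key.back();                                    // "oau"
    auto last = std::lower_bound( first, dict.end(), upper_key );

    for( auto it = first; it != last; ++it )
        std::cout << *it << "\n";                          // oat, oath, oaths
}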
"Öö Tiib" <ootiib@hot.ee>: Feb 11 01:29PM -0800 On Saturday, 11 February 2017 23:07:02 UTC+2, Paavo Helde wrote: > If the initial data is already a sorted random access array, then I > suspect the std::lower_bound() based solution should have an edge. Trie is doomed to win any lower_bound in anything because finding all words that start with F from trie is O(1) and after that finding all the words that start with FR is again O(1). |
"Chris M. Thomasson" <invalid@invalid.invalid>: Feb 10 09:18PM -0800 On 2/9/2017 1:43 PM, Paavo Helde wrote: > Yes I am attempting to bind a parallel thread pool in my program to a > single NUMA node (because profiling shows there is no point to let it > spread across nodes, it would be just wasting computer resources). Indeed. Sharing across NUMA nodes should be avoided like the damn plague! > What I need is to figure out which NUMA node to bind to. That can be a rather nasty problem; what is your optimal criteria for choosing? Think one step further and bind worker threads from the pool to the local processors on the NUMA node. Well, should the OS handle that set of details? Okay, iirc this contrived kind of thing helped my code figure out a mapping for a system, this is from a while ago (decade+) and the idea went something like: take the number of NUMA nodes, and create a thread pool for each one. Then make all of the pools just lock their individual mutexs, increment a local numa counter, and unlock, no mutex contention between the nodes. They do this in a loop until a signal is hit, from a single controlling thread, that tells them to stop. The end resulting NUMA counters with the highest numbers are more active, and better "performing" in the system. But, keep in mind this data can be totally arbitrary, or can be out-of-date rather quickly, depending on load in the system. I have no idea how this system is going to effect the counters. Quite honestly, and for some reason, your question kind of reminds me of: https://groups.google.com/d/topic/comp.lang.c++/7u_rLgQe86k/discussion ;^) |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 11 07:54PM +0200 On 11.02.2017 7:18, Chris M. Thomasson wrote: >> What I need is to figure out which NUMA node to bind to. > That can be a rather nasty problem; what is your optimal criteria for > choosing? The optimal criterion is that there should be no NUMA node sitting idle while some other NUMA node is overloaded. But it looks like this is not so easy to achieve in general, as the jobs I am binding can be in different processes and their number and duration are not known in advance. > Think one step further and bind worker threads from the pool > to the local processors on the NUMA node. Well, should the OS handle > that set of details? I believe the OS would handle thread scheduling within a single NUMA node better than me. Actually I have tried to bind threads to single PUs on non-NUMA systems and the performance stayed the same or got worse, depending on the OS/hardware. |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 11:35PM
>before summing, with similar results (the vector pointer
>is slower, but only by a factor between something like
>2 and 8).

The following program now prints

  runtime = 92,345.917
  runtime = 2'690,778.190

. The first runtime pertains to the vector containing employees, the second to the vector containing pointers to employees. So pointers are much slower!

(I am /not/ serious here, because there is a major bias: I modified the algorithm until I finally got this result!)

/* Warning: When z in main is large, some operating systems
   might swap a lot and might not respond anymore. */

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <ostream>
#include <random>
#include <stdexcept>
#include <vector>

static void escape( void * p ) /* by Chandler Carruth */
{ asm volatile( "" : : "g"(p) : "memory" ); }

template< typename T >
::std::string format( T value )
{ double privacy_bias = 1.0;
  double privacy_noise = 0.00;
  ::std::string s;
  int i = 0;
  int n = 0;
  long double v = value;
  v = v * privacy_bias;
  v = v *( 1.0 + privacy_noise * ( rand() /( 1.0 + RAND_MAX )- 0.5 ));
  value = v;
  if( value )
    while( value )
    { if( i &&( i % 3 == 0 ))
      { s = " .,';:-=%#$"[ i/3 ] + s; ++n; }
      /* above there is a bug: i/3 might be larger than the string */
      ++i; ++n;
      s = "0123456789"[ value % 10 ] + s;
      value /= 10; }
  else s = "0";
  while( n++ < 20 )s = " " + s;
  return s; }

struct timer
{ using timepoint = ::std::chrono::high_resolution_clock::time_point;
  ::timer::timepoint t1;
  ::timer::timepoint t2;
  decltype( ::timer::t2 - ::timer::t1 )sum{};
  timer(){ this->reset(); }
  void reset()
  { this->t1 = ::std::chrono::high_resolution_clock::now();
    this->t2 = t1;
    sum = t2 - t1; }
  void print()
  { this->t2 = ::std::chrono::high_resolution_clock::now();
    this->sum += this->t2 - this->t1;
    ::std::cout << "runtime = " << format(( sum ).count()) << '\n'; }};

auto const seed
{ ::std::chrono::system_clock::now().time_since_epoch().count() };

auto engine { ::std::default_random_engine( seed )};

/* application */

struct emp
{ double salary = rand();
  /* char data[ 200 ] = { 0, }; */ };

int main()
{ size_t z = 80'000'000; /* number of iterations */
  ::std::vector<emp> vector_of_employees( z );
  ::std::vector<emp*> vector_of_pointers( z );
  for( size_t i = 0; i < z; ++i )
  { /* malloc used for size manipulations (not shown here) */
    vector_of_pointers[ i ]=( emp * )malloc( sizeof( emp ));
    new( vector_of_pointers[ i ]) emp{}; }
  shuffle( begin( vector_of_employees ), end( vector_of_employees ), engine );
  shuffle( begin( vector_of_pointers ), end( vector_of_pointers ), engine );
  { ::timer t;
    double s = 0;
    for( size_t i = 0; i < z; ++i )s += vector_of_employees[ i ].salary;
    escape( &s );
    t.print(); }
  { ::timer t;
    double s = 0;
    for( size_t i = 0; i < z; ++i )s += vector_of_pointers[ i ]->salary;
    escape( &s );
    t.print(); }} |