Friday, February 10, 2017

Digest for comp.lang.c++@googlegroups.com - 25 updates in 5 topics

"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 06:03PM -0600

During one interview I was asked about things I would do to optimize
some code snippet.
 
I mentioned that
 
std::vector<Employee> was probably not as good as
std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
 
Because of the copy construction on insert.
The fellow did not seem to like that answer at all.
 
I mentioned that depending on the size of the object a collection of
pointers might be better.
 
He asked about cache misses.
 
I really don't know much about how this would interact with the cache. I
explained that, as I understand it, the employee would have to be a
certain size to fit in the cache anyway, and if it was a really
large object, you probably aren't going to get the whole collection in
there anyway.
 
Now that I think about it, we have emplace and move, so maybe my point
was silly.
 
What differences do you guys know about the two ways of containing
objects and why might you choose one over the other?
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 08:25PM -0600

On 2/9/2017 7:25 PM, Stefan Ram wrote:
 
> Which results do you get when you do a microbenchmark
> for both cases (sum the salaries of all employees) for
> some large number of employees.
 
It's nearly the same. But it might not be an accurate test, because
everything is nearly contiguous since nothing else in the process is
being allocated.
 
 
> cache and the main memory have on your system?
 
> What does the processor have to do in both cases (you might
> compile the benchmark and look at the assembler output).
 
I have no idea. I don't know assembly.
 
I am not sure how byte width comes into play. I know that obviously
the entire vector of Employees does not fit, I don't know if even one
Employee fits. But assuming two fit, then I would say that iteration
over the employees will be faster in the collection that does not use
pointers, while insertion will be slower. Because a register somewhere
has to load the pointer, the address has to be read, and the actual
value has to be obtained for each entry.
 
I know the processor has to load from a hierarchy of fast to slow
locations: something like L1 cache, then L2 cache, L3, physical memory,
page file.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 08:19AM +0200

On 10.02.2017 2:03, Christopher J. Pisz wrote:
 
> I mentioned that
 
> std::vector<Employee> was probably not as good as
> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
 
This might or might not be faster, depending on how the data is
constructed and accessed. Anyway, I guess the interviewer expected an
answer which would change the big-O complexity instead.
Gareth Owen <gwowen@gmail.com>: Feb 10 08:47AM


> He asked about cache misses.
 
vector<Employee> guarantees contiguous memory so, with sane access
patterns, hardware prefetch will help you out.
With vector<Employee*> you're jumping all over the heap, so it won't.
 
> to be a certain size to fit them on the cache anyway and if it was a
> really large object, you probably aren't going to get the whole
> collection in there anyway.
 
A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably
not storing that much info per Employee.
Gareth Owen <gwowen@gmail.com>: Feb 10 08:48AM

>> collection in there anyway.
 
> A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably
> not storing that much info per Employee.
 
And, of course, prefetch isn't about storing the whole thing in cache.
It's about having the next-one ready when you need it.
Bo Persson <bop@gmb.dk>: Feb 10 10:03AM +0100

On 2017-02-10 01:03, Christopher J. Pisz wrote:
> was silly.
 
> What differences do you guys know about the two ways of containing
> objects and why might you choose one over the other?
 
You seem to focus on the cost of populating the vector, while the other
guy probably thinks about how to use the vector.
 
Having all data in contiguous memory is one advantage a vector has over
a linked list, for example. Traversing the list, every access will be a
random access that the processor can't easily predict. Very high risk
for a cache miss or a cache collision.
 
On the other hand, when traversing a vector it is very easy to predict
your next memory access and the cache might even be able to prefetch the
data. And if it isn't, the risk of a cache collision with the previous
element is about zero.
 
A vector of pointers is about the same as a linked list, just that the
links are collected in one place. The access pattern for the data will
be similar.
 
 
Bo Persson
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 10 10:52AM

On Thu, 9 Feb 2017 18:03:15 -0600
> collection in there anyway.
 
> Now that I think about it, we have emplace and move, so maybe my
> point was silly.
 
If the Employee type is mainly composed of fundamental types (integers,
floating points and so forth, say comprising an employee id number, the
rate of pay, a location code and the like) or moderately sized arrays of
these, then holding Employee objects in the vector by value is going to
be considerably faster because of the excellent locality. You will have
both data in cache and pre-fetchability, and also copying will be fast.
 
If however the Employee type has member data which is allocated on the
heap (which will degrade locality as between both different members of
an Employee object and different Employee objects in the vector) and is
expensive to copy, and it does not have a move constructor and move
assignment operator, then you will probably find yourself better off
holding objects of that type in the vector by pointer, unique_ptr or
shared_ptr. If the Employee type does have a move constructor and move
assignment operator then much of its implementation is probably
allocated on the heap anyway to support movability, and if the use case
mainly involves moving Employee objects into the vector rather than
copying them there, it would normally be better to hold them by value.
 
In the second case (the Employee type is not mainly composed of
fundamental types or moderately sized arrays of these) you are in a
grey area. You need to know more about the details of the
implementation of the type and how objects of the type are used. Mainly
though, you would need to measure.
 
Chris
Gareth Owen <gwowen@gmail.com>: Feb 10 10:59AM


> You need to know more about the details of the implementation of the
> type and how objects of the type are used. Mainly though, you would
> need to measure.
 
These two sentences can be used at the end to improve your answer for
almost any programming interview question. :)
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 01:35AM +0200

On 10.02.2017 0:36, Christopher J. Pisz wrote:
> in the dictionary, right?
 
> They hinted at the dictionary being some 20MB of words, so I figure the
> lookup part is very significant.
 
The average length of a word is probably less than 20 chars, so we are
talking about 1M of words here.
 
 
> In my tests, the gains were really minimal though and the code far more
> complex, like you say, but I don't have a 20MB dictionary to test with.
 
You can easily generate a 20MB dictionary of random character sequences.
 
> myset.lower_bound(value);
 
> ?
> I thought they are equivalent, is that incorrect?
 
Yep, at least with std::set. You see, std::set is not random access.
When you need to find, e.g. the middle point in 1M words, you need to
advance the begin() iterator 500,000 times. You can see where it goes. A
20 MB dictionary might not fit in L3 caches (depending on hardware), so
traversing it in such a fashion over and over might be extremely slow.
 
http://www.cplusplus.com/reference/algorithm/lower_bound/ says: "On
non-random-access iterators, the iterator advances produce themselves an
additional linear complexity in N on average."
 
In contrast, std::set::lower_bound is guaranteed to be O(log N), with no
caveats.
 
Note that nowadays in general the memory access is the slowest thing and
cpu-intensive calculations (like short string comparison) are fast, so
if std::lower_bound() is O(log N) in comparisons and O(N) in iterator
traversal to find the things to compare, then O(N) it is, no wiggle room
left.
 
log2(1M)==20, so with 1M elements std::set::lower_bound() should be
about 50,000 times faster than std::lower_bound(), plus-minus a couple
of magnitudes for constant factors. It looks like with your complicated
radix trie structures you have managed to offset some of the losses, but
not all.
 
 
> So the equivalent would be
 
> if ( (*itChild)->m_value.compare(0, wordPart.size(), wordPart) == 0) ??
> and that would prevent one temporary from the substr return?
 
Yep. Maybe it should be possible to optimize the temporary creation
away, but I am not sure if the current compilers are smart enough for that.
 
Potentially the most expensive part in temporary string construction is
dynamic allocation. Luckily, nowadays most implementations use
short-string optimization, meaning no dynamic allocation for strings
less than 16 bytes, for example. Alas, for example gcc<5.0 does not have
SSO.
 
>> Paavo
 
> How do I prepare myself for the transition of doing 10 years of mostly
> back-end business services to doing more "high performance" code?
 
Practice?
 
> I've looked up custom memory management
 
This is important indeed.
 
> I've come to terms with people insisting on using c-style code in some
> performance intensive areas, like sprintf vs stream.
 
Streams are indeed awful in performance. Luckily I have never had a need
for streams aka "formatted text" in my professional career, it has
always been binary files or XML. I should note that sprintf() is also a
bit slow and nondeterministic because of locale dependencies.
 
 
> I've reviewed the complexity of operations on the common data structures
> every day this month.
 
That's one good point! But next time read beyond the first sentence, at
least for the complexity of std::lower_bound()!
 
 
> I am not sure what else I can do.
> I am a 40 year old man about to cry if I don't get a job soon.
 
I bet you are pretty young compared to the average of this newsgroup ;-)
 
Cheers
Paavo
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 01:02AM +0100

On 09.02.2017 21:08, Christopher J. Pisz wrote:
 
> C++ is getting REALLY performance orientated. I used to get a job within
> 2 days just by knowing what virtual inheritance is. Now, I don't really
> know how to "Get better"
 
I didn't look at your code. I just implemented it straightforwardly,
using `lower_bound` to look for dictionary words in a vector. It's sort
of instant. BUT I generally get more words than the solution file:
 
[example]
[C:\my\forums\clc++\057 boggle]
> fc my_solution.txt data\board_1_solution.txt
Comparing files my_solution.txt and DATA\BOARD_1_SOLUTION.TXT
***** my_solution.txt
a
as
entry
go
hint
***** DATA\BOARD_1_SOLUTION.TXT
entry
hint
*****
 
***** my_solution.txt
hits
in
is
it
its
***** DATA\BOARD_1_SOLUTION.TXT
hits
its
*****
 
***** my_solution.txt
pen
saint
***** DATA\BOARD_1_SOLUTION.TXT
pen
quit
quits
saint
*****
 
***** my_solution.txt
sits
so
thin
***** DATA\BOARD_1_SOLUTION.TXT
sits
thin
*****
 
***** my_solution.txt
try
we
went
***** DATA\BOARD_1_SOLUTION.TXT
try
went
*****
 
 
[C:\my\forums\clc++\057 boggle]
> _
[example]
 
The weasel word "generally" indicates a bug, that my implementation
didn't find "quit" and "quits" as it should.
 
Still, have I misunderstood the rules?
 
What are they, exactly?
 
 
Cheers!,
 
- Alf
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 06:15PM -0600

On 2/9/2017 6:02 PM, Alf P. Steinbach wrote:
 
> Cheers!,
 
> - Alf
 
Don't forget to turn a board Q into QU when comparing to the dictionary.
And diagonal
 
Rules are, and I am retyping them from memory...:
 
Given a board that is a grid of characters and an input file
Given a dictionary of valid words as an input file
Output all the words in the dictionary that can be found on the board,
where "finding" means
 
Start at any coordinate on the board and be able to trace the word
through adjacent nodes without traveling to any node more than once.
 
You may travel in all 8 directions. (diagonals included)
 
The dictionary input and the output is ordered alphabetically.
 
Some examples:
 
G D L W
O O Z E
D E O S
I Q S D
 
Doe
Go
God
Godless
Good
Quid
 
Assuming those are all in the dictionary file.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 01:22AM +0100

On 10.02.2017 01:15, Christopher J. Pisz wrote:
 
> Don't forget to turn a board Q into QU when comparing to the dictionary.
 
Oh!
 
 
> And diagonal
 
Huh?
 
 
> Good
> Quid
 
> Assuming those are all in the dictionary file.
 
Okay, I'm still getting more words than the solution file has.
 
Those are short words, like "go".
 
Here:
 
 
Comparing files my_solution.txt and DATA\BOARD_1_SOLUTION.TXT
***** my_solution.txt
a
as
entry
go
hint
***** DATA\BOARD_1_SOLUTION.TXT
entry
hint
*****
 
***** my_solution.txt
hits
in
is
it
its
***** DATA\BOARD_1_SOLUTION.TXT
hits
its
*****
 
***** my_solution.txt
sits
so
thin
***** DATA\BOARD_1_SOLUTION.TXT
sits
thin
*****
 
***** my_solution.txt
try
we
went
***** DATA\BOARD_1_SOLUTION.TXT
try
went
*****
 
 
I haven't tested more boards but it seems I get all the words in the
solution file, plus some?
 
Here's the code:
 
 
#include <algorithm> // std::(is_sorted, remove, sort, transform)
#include <assert.h> // assert
#include <ctype.h> // tolower
#include <fstream> // std::ifstream
#include <iostream>
#include <vector> // std::vector
#include <stddef.h> // ptrdiff_t
#include <stdexcept> // std::(exception, runtime_error)
#include <stdlib.h> // EXIT_FAILURE, EXIT_SUCCESS
#include <string> // std::string
#include <utility> // std::move
 
using Size = ptrdiff_t;
using Index = Size;
 
template< class Container >
auto n_items_of( Container const& c )
-> Size
{ return c.size(); }
 
namespace cppx {
using std::ostream;
using std::vector;
using std::runtime_error;
using std::string;
 
using Byte = unsigned char;
inline auto hopefully( bool const e ) -> bool { return e; }
inline auto fail( string const& s ) -> bool { throw runtime_error( s ); }
 
struct Position
{
Index x;
Index y;
};
 
inline auto operator<<( ostream& stream, Position const& pos )
-> ostream&
{ return stream << "{" << pos.x << ", " << pos.y << "}"; }
 
template< class Item >
class Matrix
{
private:
struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>.
 
Size row_length_;
vector<Wrapped_item> items_;
 
auto index_for( Index x, Index y ) const
-> Index
{ return y*row_length_ + x; }
 
public:
auto id_for_position( Index x, Index y ) const
-> Index
{ return index_for( x, y ); }
 
auto size() const
-> Size
{ return row_length_; }
 
auto item( Index const x, Index const y ) const
-> Item
{ return items_[index_for( x, y )].object; }
 
auto item( Index const x, Index const y )
-> Item&
{ return items_[index_for( x, y )].object; }
 
Matrix(): Matrix{ 0 } {}
 
explicit Matrix( Size const size )
: row_length_( size )
, items_( size*size )
{}
 
};
 
auto inline to_lowercase( char const ch )
-> char
{ return tolower( static_cast<Byte>( ch ) ); }
 
auto inline starts_with( string const& prefix, string const& s )
-> bool
{ return s.substr( 0, prefix.length() ) == prefix; }
 
} // namespace cppx
 
namespace app {
using namespace std;
using namespace cppx;
 
auto load_dictionary( string const& file_path )
-> vector<string>
{
ifstream data{ file_path };
hopefully( not data.fail() )
|| fail( "Failed to open dictionary `" + file_path + "`." );
 
vector<string> result;
string word;
while( getline( data, word ) )
{
result.push_back( move( word ) );
}
return result;
}
 
auto chars_from( string s )
-> string
{
string chars{ s.begin(), remove( s.begin(), s.end(), ' ' ) };
transform( chars.begin(), chars.end(), chars.begin(), to_lowercase );
return chars;
}
 
auto load_bubble_board( string const& file_path )
-> Matrix<char>
{
ifstream data{ file_path };
hopefully( not data.fail() )
|| fail( "Failed to open board `" + file_path + "`." );
 
string line;
string chars; // Whitespace removed.
getline( data, line )
|| fail( "Unable to read first line of `" + file_path + "`." );
chars = chars_from( line );
Size const n = n_items_of( chars );
Matrix<char> result{ 2 + n };
for( Index x = 0; x < n; ++x ) { result.item( x + 1, 1 ) = chars[x]; }
for( Index y = 1; y < n; ++y )
{
getline( data, line )
|| fail( "Unable to read line " + to_string( y + 1 ) + " of `" + file_path + "`." );
chars = chars_from( line );
hopefully( n_items_of( chars ) == n )
|| fail( "Inconsistent row lengths in board data `" + file_path + "`" );
for( Index x = 0; x < n; ++x ) { result.item( x + 1, y + 1 ) = chars[x]; }
}
return result;
}
 
using Dictionary_iter = vector<string>::const_iterator;
 
void recursively_append_words_from(
Dictionary_iter const start_dict_range,
Dictionary_iter const end_dict_range,
Matrix<char> const& chars,
Position const position,
Matrix<bool>& visited,
string& word,
vector<string>& result
)
{
Size const original_word_length = word.length();
char const ch = chars.item( position.x, position.y );
if( ch == 'q' ) { word += "qu"; } else { word += ch; }
auto const lower = lower_bound( start_dict_range, end_dict_range, word );
 
if( lower != end_dict_range ) { clog << position << " -> " << *lower << "\n"; }
if( lower != end_dict_range and starts_with( word, *lower ) )
{
if( word == *lower )
{
result.push_back( word );
}
visited.item( position.x, position.y ) = true;
for( int dx = -1; dx <= +1; ++dx )
{
for( int dy = -1; dy <= +1; ++dy )
{
Position const new_position = {position.x + dx, position.y + dy};
if( not visited.item( new_position.x, new_position.y ) )
{
recursively_append_words_from(
lower, end_dict_range, chars, new_position,
visited, word, result
);
}
}
}
visited.item( position.x, position.y ) = false;
}
word.resize( original_word_length );
}
 
void append_words_from(
vector<string> const& dictionary,
Matrix<char> const& chars,
Position const& start,
vector<string>& result
)
{
string word;
Size const n = chars.size();
Matrix<bool> visited{ n };
 
for( Index i = 0; i < n; ++ i )
{
visited.item( i, 0 ) = true;
visited.item( 0, i ) = true;
visited.item( i, n - 1 ) = true;
visited.item( n - 1, i ) = true;
}
 
recursively_append_words_from(
dictionary.begin(), dictionary.end(), chars, start,
visited, word, result
);
}
 
void cpp_main( vector<string> const& args )
{
hopefully( n_items_of( args ) == 3 )
|| fail( "Usage: " + args[0] + " DICTIONARY BUBBLEBOARD" );
 
vector<string> const dictionary = load_dictionary( args[1] );
Matrix<char> const board = load_bubble_board( args[2] );
 
assert( is_sorted( dictionary.begin(), dictionary.end() ) );
Size const n = board.size();
vector<string> words;
for( Index y = 0; y < n; ++y ) for( Index x = 0; x < n; ++x )
{
append_words_from( dictionary, board, {x + 1, y + 1}, words );
}
 
sort( words.begin(), words.end() );
words.erase( unique( words.begin(), words.end() ), words.end() );
for( auto const& word : words )
{
cout << word << "\n";
}
}
} // namespace app
 
auto main( int const n_args, char** args )
-> int
{
using namespace std;
try
{
app::cpp_main( {args, args + n_args} );
return EXIT_SUCCESS;
}
catch( exception const& x )
{
cerr << "! " << x.what() << "\n";
}
return EXIT_FAILURE;
}
 
 
Cheers & hth.,
 
- Alf
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 06:29PM -0600

On 2/9/2017 6:22 PM, Alf P. Steinbach wrote:
 
> Cheers & hth.,
 
> - Alf
 
Oh crap. I forgot a rule.
The word has to be at least 3 letters long. Sorry, retyping rules from
memory sucks with old man memory.
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 07:00PM -0600

On 2/9/2017 5:35 PM, Paavo Helde wrote:
 
>>>> I did a depth first search and compared the path thus far with words in
>>>> the dictionary using a Radix Trie. I don't know of any other possible
>>>> algorithm to solve the problem, or how I would implement it "better."
 
SNIP
 
 
> additional linear complexity in N on average."
 
> In contrast, std::set::lower_bound is guaranteed to be O(log N), with no
> caveats.
 
SNIP
 
>> and that would prevent one temporary from the substr return?
 
> Yep. Maybe it should be possible to optimize the temporary creation
> away, but I am not sure if the current compilers are smart enough for that.
 
I got about 30% better performance out of those changes.
 
Significant, but they were claiming others implemented it 25X as fast as
mine. There must be something else fundamentally wrong, like the
algorithm itself.
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 07:13PM -0600

On 2/9/2017 6:22 PM, Alf P. Steinbach wrote:
 
> Cheers & hth.,
 
> - Alf
 
As far as I can tell you are doing pretty much exactly what I did when I
used lower_bound. Although, your code is fancier with new C++11 and
C++14 stuff.
 
My dictionary was a set<string> at that time though, compared to your
vector<string>
 
I wonder if they would have kicked you out too :/
Or maybe I am just missing some glaring error in my code or its logic.
 
I wish I could find a bigger board, dictionary, and solutions to test
with somewhere. Mine solves in a matter of microseconds in release with
the given data, but whatever they used, they claim it took 25 seconds.
 
I need to ponder how I would generate a bigger set of data, but still
have the solutions to test against.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 08:06AM +0200

On 10.02.2017 3:13, Christopher J. Pisz wrote:
 
>> void recursively_append_words_from(
>> Dictionary_iter const start_dict_range,
>> Dictionary_iter const end_dict_range,
[...]
>> )
>> {
[...]
> C++14 stuff.
 
> My dictionary was a set<string> at that time though, compared to your
> vector<string>
 
Alf is using a sorted vector, and this means the iterators are random
access. This makes the big difference of O(log N) instead of O(N) in
std::lower_bound().
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 01:43AM -0600

On 2/9/2017 2:08 PM, Christopher J. Pisz wrote:
 
> C++ is getting REALLY performance orientated. I used to get a job within
> 2 days just by knowing what virtual inheritance is. Now, I don't really
> know how to "Get better"
 
 
 
So I just woke up and I can't sleep, because this is bothering me so much.
 
The thought occurred to me:
 
What if instead of traversing the board and checking if what we find on
every node along our path is some word in our huge dictionary, we
checked if words in the dictionary were on the board instead.
 
I saw so much work on prefixes, that I envision the pattern:
to
top
topless
topper
topping
 
Well, if we don't find the letters "to" then we can eliminate ALL those
words!
 
Since the dictionary size is obviously really large, this will greatly
reduce it.
 
Also, we can make a lookup by word that gives us a structure containing
the last letter found, its location on the board, and a grid denoting
which nodes were traversed. That way after searching for the word "to"
we can pick up where we left off for any future searches for words
beginning with "to"
 
Eureka!
I am implementing this now. I suspect it will greatly speed things up.
"Öö Tiib" <ootiib@hot.ee>: Feb 10 01:39AM -0800

On Friday, 10 February 2017 09:43:54 UTC+2, Christopher J. Pisz wrote:
> topping
 
> Well, if we don't find the letters "to" then we can eliminate ALL those
> words!
 
Sure, yes. But beware! You may draw oversimplified conclusion
from that.
 
The dictionary structure that is used when that matters is the prefix
tree, also named "trie". You can likely find insightful discussions
in the net about usage of a trie for speeding up such search algorithms.
Public domain trie implementations in C++ that I have seen are
not too great (or maybe did not fit my problems too well).
 
 
> Since the dictionary size is obviously really large, this will greatly
> reduce it.
 
Why? No. That's what I meant with "beware" above. It does not pack
the data in the general case.
 
> beginning with "to"
 
> Eureka!
> I am implementing this now. I suspect it will greatly speed things up.
 
:) Yes, usage of correct data layout can give order of magnitude or better
performance improvements. But ... read about it a bit.
Popping mad <rainbow@colition.gov>: Feb 10 04:30AM

Has anyone ever seen any standard code for an Euler Path?
"Öö Tiib" <ootiib@hot.ee>: Feb 09 11:33PM -0800

On Friday, 10 February 2017 06:30:09 UTC+2, Popping mad wrote:
> Has anyone ever seen any standard code for an Euler Path?
 
I have copy-pasted me such pseudo-code from somewhere:
 
compute_euler(G, path, is_circuit)
    s := V[G][0]                            // initialize the start vertex
    if (NOT(is_circuit))
        for each vertex u in V[G]
            if (is_odd(u))
                s := u                      // choose an odd vertex if computing a path
                break
            end if
        end for
    end if
    for each edge e in E[G]                 // initialize edge colors
        color[e] := WHITE
    end for
    tmp_vertex := null
    PUSH(s, stack)                          // push start vertex to stack
    while (SIZE(stack) != 0)
        u := TOP(stack)
        edge_available := false
        for each edge e in out_edges[u]     // search for an unused incident edge
            if (color[e] == WHITE)
                edge_available := true
                color[e] := BLACK
                neighbor := target(e)
                PUSH(neighbor, stack)       // push the edge's target vertex
                break
            end if
        end for
        if (NOT(edge_available))            // no unused edge left: backtrack
            if (tmp_vertex != null)         // and add the edge to the final path
                edge := edge(tmp_vertex, u, G)
                ADD(edge, path)
            end if
            tmp_vertex := u
            POP(stack)
        end if
    end while
end
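For comparison, a compact C++ sketch of the same backtracking idea (Hierholzer's algorithm) for an Euler circuit; the adjacency-list representation, with (neighbor, edge_id) pairs where each edge id appears on both endpoints, is invented for illustration:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hierholzer's algorithm: Euler circuit on an undirected multigraph.
// adj[u] holds (neighbor, edge_id) pairs; each edge id appears in the
// lists of both endpoints. Returns the circuit in reverse visiting order.
std::vector<int> euler_circuit(
    std::vector<std::vector<std::pair<int, int>>> const& adj,
    int n_edges, int start)
{
    std::vector<bool> used(n_edges, false);
    std::vector<std::size_t> next(adj.size(), 0); // per-vertex scan position
    std::vector<int> stack{start};
    std::vector<int> path;
    while (!stack.empty()) {
        int u = stack.back();
        while (next[u] < adj[u].size() && used[adj[u][next[u]].second])
            ++next[u]; // skip edges already traversed
        if (next[u] == adj[u].size()) {
            path.push_back(u); // dead end: backtrack and record the vertex
            stack.pop_back();
        } else {
            std::pair<int, int> const e = adj[u][next[u]];
            used[e.second] = true; // mark the edge, then follow it
            stack.push_back(e.first);
        }
    }
    return path;
}
```

For a graph with an Euler circuit, the returned path visits n_edges + 1 vertices, starting and ending at the start vertex.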
Ben Bacarisse <ben.usenet@bsb.me.uk>: Feb 10 01:15AM


> Some implementations have this property:
 
> A double has at least 14 significant digits (about 14-16) and
 
> 9007199254740991.
 
I.e. 2^53 - 1 or, in most C implementations these days
 
(1llu << DBL_MANT_DIG) - 1
 
This works when FLT_RADIX is 2, but it need not be.
 
 
> I wonder whether one can somehow find the smallest
> omega for the type double that is required by the
> standard.
 
I don't think you can get more than an approximation to it and probably
quite a crude one at that. The standard does not give minimum values
for DBL_MANT_DIG as you have no doubt found out already.
 
<snip>
--
Ben.
Robert Wessel <robertwessel2@yahoo.com>: Feb 09 11:58PM -0600

On Fri, 10 Feb 2017 01:15:50 +0000, Ben Bacarisse
 
>I.e. 2^53 - 1 or, in most C implementations these days
 
> (1llu << DBL_MANT_DIG) - 1
 
>This works when FLT_RADIX is 2, but it need not be.
 
 
(pow(FLT_RADIX, DBL_MANT_DIG) - 1)
 
Should get close. That does have the disadvantages that it can't be
computed at compile time, and that the result is a FP number.
 
 
 
>I don't think you can get more than an approximation to it and probably
>quite a crude one at that. The standard does not give minimum values
>for DBL_MANT_DIG as you have no doubt found out already.
 
 
I'm not sure what the OP is asking, but if it's about the minimum
possible "omega" on a conforming implementation (rather than the
actual omega of a particular implementation), then the answer would
appear to be approximately:
 
(pow(10,DBL_DIG) - 1)
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 12:52AM

Newsgroups: comp.lang.c,comp.lang.c++
 
#include <float.h>
 
Some implementations have this property:
 
A double has at least 14 significant digits (about 14-16) and
 
9007199254740991.
 
is the largest double value that can be added to 1.0
giving the "next" double value
 
9007199254740992.
 
(when one adds 1.0 again, the value will not change
anymore).
 
I don't know whether there is a common name for such a
value. I will call it (just for this post) "omega".
(In the above example, the omega is 9007199254740991.)
 
Now, the standard does not require 14 significant
digits, but 10 (DBL_DIG).
 
I wonder whether one can somehow find the smallest
omega for the type double that is required by the
standard.
 
Clearly, omega >= 0 and omega <= DBL_MAX.
 
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 01:25AM

>collection of pointers vs collection of objects
 
Collections of pointers /are/ collections of objects,
because pointers /are/ objects!
 
>std::vector<Employee> was probably not as good as
>std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
 
Which results do you get when you do a microbenchmark
for both cases (sum the salaries of all employees) for
some large number of employees.
 
Which byte width and access times do the L1 cache, the L2
cache and the main memory have on your system?
 
What does the processor have to do in both cases (you might
compile the benchmark and look at the assembler output).
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 03:30AM

>It's nearly the same. But it might not be an accurate test, because
>everything is nearly contiguous since nothing else in the process is
>being allocated.
 
Yes, I just confirmed your results. The difference when one just
fills the two vectors and then loops through each one is negligible.
 
The common wisdom is that a vector with all data in place is much
faster than a vector with pointers to heap instances.
 
But in this special benchmark, it cannot be observed.
 
Your interpretation about locality for the allocated
blocks might be correct.
 
When I vary the size of the objects on the heap (so that
they do not all have the same size), then the runtime for
the vector with pointers gets worse by a factor between
2 and 8. The contrast is not as large as expected by me.
 
I then also used ::std::shuffle on the pointer vector
before summing, with similar results (the vector pointer
is slower, but only by a factor between something like
2 and 8).