- collection of pointers vs collection of objects - 8 Updates
- Request Feedback on Test - 10 Updates
- Euler Path - 2 Updates
- The required minimum omega for double - 2 Updates
- The required minimum omega for double - 3 Updates
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 06:03PM -0600 During one interview I was asked about things I would do to optimize some code snippet. I mentioned that std::vector<Employee> was probably not as good as std::vector<Employee *> or std::vector<std::shared_ptr<Employee>> Because of the copy construction on insert. The fellow did not seem to like that answer at all. I mentioned that depending on the size of the object a collection of pointers might be better. He asked about cache misses. I really don't know much about how this would interact with the cache. I explained that, as I understand it, the employee would have to be a certain size to fit them on the cache anyway and if it was a really large object, you probably aren't going to get the whole collection in there anyway. Now that I think about it, we have emplace and move, so maybe my point was silly. What differences do you guys know about the two ways of containing objects and why might you choose one over the other? |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 08:25PM -0600 On 2/9/2017 7:25 PM, Stefan Ram wrote: > Which results do you get when you do a microbenchmark > for both cases (sum the salaries of all employees) for > some large number of employees. It's nearly the same. But it might not be an accurate test, because everything is nearly contiguous since nothing else in the process is being allocated. > cache and the main memory have on your system? > What does the processor have to do in both cases (you might > compile the benchmark and look at the assembler output). I have no idea. I don't know assembly. I am not sure how byte width comes into play. I know that obviously the entire vector of Employees does not fit, I don't know if even one Employee fits. But assuming two fit, then I would say that iteration over the employees will be faster in the collection that does not use pointers, while insertion will be slower. Because a register somewhere has to load the pointer, the address has to be read, and the actual value has to be obtained for each entry. I know the processor has to load from a hierarhcy of fast to slow locations something like L1 cache, then L2 cache, L3, physical memory, page file. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 08:19AM +0200 On 10.02.2017 2:03, Christopher J. Pisz wrote: > I mentioned that > std::vector<Employee> was probably not as good as > std::vector<Employee *> or std::vector<std::shared_ptr<Employee>> This might or might not be faster, depending on how the data is constructed and accessed. Anyway, I guess the interviewer expected an answer which would change the big-O complexity instead. |
Gareth Owen <gwowen@gmail.com>: Feb 10 08:47AM > He asked about cache misses. vector<Employee> guarantees contiguous memory so, with sane access patterns, hardware prefetch will help you out. With vector<Employee*> you're jumping all over the heap, so it won't. > to be a certain size to fit them on the cache anyway and if it was a > really large object, you probably aren't going to get the whole > collection in there anyway. A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably not storing that much info per Employee. |
Gareth Owen <gwowen@gmail.com>: Feb 10 08:48AM >> collection in there anyway. > A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably > not storing that much info per Employee. And, of course, prefetch isn't about storing the whole thing in cache. It's about having the next-one ready when you need it. |
Bo Persson <bop@gmb.dk>: Feb 10 10:03AM +0100 On 2017-02-10 01:03, Christopher J. Pisz wrote: > was silly. > What differences do you guys know about the two ways of containing > objects and why might you choose one over the other? You seem to focus on the cost of populating the vector, while the other guy probably thinks about how to use the vector. Having all data in contiguous memory is one advantage a vector has over a linked list, for example. Traversing the list, every access will be a random access that the processor can't easily predict. Very high risk for a cache miss or a cache collision. On the other hand, when traversing a vector it is very easy to predict your next memory access and the cache might even be able to prefetch the data. And if it isn't, the risk of a cache collision with the previous element is about zero. A vector of pointers is about the same as a linked list, just that the links are collected in one place. The access pattern for the data will be similar. Bo Persson |
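Stefan Ram's suggested microbenchmark (sum the salaries of all employees, both ways) can be sketched as below. The Employee layout here is invented for illustration, and real numbers depend heavily on object size, allocation order, and optimizer settings, so treat it as a harness rather than a verdict:

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical Employee record; the fields are assumptions for this sketch.
struct Employee {
    int id = 0;
    double salary = 1.0;
    char name[48] = {};   // padding so the record has a realistic size
};

// By value: one contiguous, prefetch-friendly sweep.
double sum_by_value(const std::vector<Employee>& v) {
    double total = 0.0;
    for (const Employee& e : v) total += e.salary;
    return total;
}

// Through pointers: every element costs an extra dereference, possibly
// to a distant heap address, so hardware prefetch helps much less.
double sum_by_pointer(const std::vector<std::unique_ptr<Employee>>& v) {
    double total = 0.0;
    for (const auto& p : v) total += p->salary;
    return total;
}

// Returns {milliseconds by value, milliseconds by pointer} for n employees.
std::pair<double, double> benchmark(std::size_t n) {
    std::vector<Employee> by_value(n);
    std::vector<std::unique_ptr<Employee>> by_pointer;
    by_pointer.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        by_pointer.push_back(std::make_unique<Employee>());

    auto time_ms = [](auto&& f) {
        auto t0 = std::chrono::steady_clock::now();
        volatile double sink = f();   // keep the optimizer from deleting the loop
        (void)sink;
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };
    return { time_ms([&] { return sum_by_value(by_value); }),
             time_ms([&] { return sum_by_pointer(by_pointer); }) };
}
```

Note that, as observed later in this thread, a benchmark which allocates all the pointed-to objects back to back will understate the pointer version's cost; shuffling the pointers or interleaving other allocations makes the gap reappear.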
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 10 10:52AM On Thu, 9 Feb 2017 18:03:15 -0600 > collection in there anyway. > Now that I think about it, we have emplace and move, so maybe my > point was silly. If the Employee type is mainly composed of fundamental types (integers, floating points and so forth, say comprising an employee id number, the rate of pay, a location code and the like) or moderately sized arrays of these, then holding Employee objects in the vector by value is going to be considerably faster because of the excellent locality. You will have both data in cache and pre-fetchability, and also copying will be fast. If however the Employee type has member data which is allocated on the heap (which will degrade locality as between both different members of an Employee object and different Employee objects in the vector) and is expensive to copy, and it does not have a move constructor and move assignment operator, then you will probably find yourself better off holding objects of that type in the vector by pointer, unique_ptr or shared_ptr. If the Employee type does have a move constructor and move assignment operator then much of its implementation is probably allocated on the heap anyway to support movability, and if the use case mainly involves moving Employee objects into the vector rather than copying them there, it would normally be better to hold them by value. In the second case (the Employee type is not mainly composed of fundamental types or moderately sized arrays of these) you are in a grey area. You need to know more about the details of the implementation of the type and how objects of the type are used. Mainly though, you would need to measure. Chris |
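Chris Vine's point about movability can be made concrete. Assuming a hypothetical Employee whose only heap-allocated member is a std::string, moving into the vector avoids the deep copy that motivated the pointer idea in the first place:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical Employee with one heap-allocated member (the name).
struct Employee {
    int id;
    std::string name;
    Employee(int i, std::string n) : id(i), name(std::move(n)) {}
};

std::vector<Employee> build() {
    std::vector<Employee> v;
    v.reserve(3);                        // avoid reallocation during growth
    Employee e{1, std::string(100, 'x')};
    v.push_back(std::move(e));           // move: steals the string's buffer
    v.emplace_back(2, "built in place"); // no temporary Employee at all
    v.emplace_back(3, "also in place");
    return v;                            // moved (or elided) on return
}
```

With a move constructor available and a use case that mostly moves rather than copies, holding by value keeps the locality advantage without paying for deep copies on insertion.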
Gareth Owen <gwowen@gmail.com>: Feb 10 10:59AM > You need to know more about the details of the implementation of the > type and how objects of the type are used. Mainly though, you would > need to measure. These two sentences can be used at the end to improve your answer for almost any programming interview question. :) |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 01:35AM +0200 On 10.02.2017 0:36, Christopher J. Pisz wrote: > in the dictionary, right? > They hinted at the dictionary being some 20MB of words, so I figure the > lookup part is very significant. The average length of a word is probably less than 20 chars, so we are talking about 1M of words here. > In my tests, the gains were really minimal though and the code far more > complex, like you say, but I don't have a 20MB dictionary to test with. You can easily generate a 20MB dictionary of random character sequences. > myset.lower_bound(value); > ? > I thought they are equivalent, is that incorrect? Yep, at least with std::set. You see, std::set is not random access. When you need to find, e.g. the middle point in 1M words, you need to advance the begin() iterator 500,000 times. You can see where it goes. A 20 MB dictionary might not fit in L3 caches (depending on hardware), so traversing it in such a fashion over and over might be extremely slow. http://www.cplusplus.com/reference/algorithm/lower_bound/ says: "On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average." In contrast, std::set::lower_bound is guaranteed to be O(log N), with no caveats. Note that nowadays in general the memory access is the slowest thing and cpu-intensive calculations (like short string comparison) are fast, so if std::lower_bound() is O(log N) in comparisons and O(N) in iterator traversal to find the things to compare, then O(N) it is, no wiggle room left. log2(1M)==20, so with 1M elements std::set::lower_bound() should be about 50,000 times faster than std::lower_bound(), plus-minus a couple of magnitudes for constant factors. It looks like with your complicated radix trie structures you have managed to offset some of the losses, but not all. > So the equivalent would be > if ( (*itChild)->m_value.compare(0, wordPart.size(), wordPart) == 0) ?? 
> and that would prevent one temporary from the substr return? Yep. Maybe it should be possible to optimize the temporary creation away, but I am not sure if the current compilers are smart enough for that. Potentially the most expensive part in temporary string construction is dynamic allocation. Luckily, nowadays most implementations use short-string optimization, meaning no dynamic allocation for strings less than 16 bytes, for example. Alas, for example gcc<5.0 does not have SSO. >> Paavo > How do I prepare myself for the transition of doing 10 years of mostly > back-end business services to doing more "high performance" code? Practice? > I've looked up custom memory management This is important indeed. > I've come to terms with people insisting on using c-style code in some > performance intensive areas, like sprintf vs stream. Streams are indeed awful in performance. Luckily I have never had a need for streams aka "formatted text" in my professional career, it has always been binary files or XML. I should note that sprintf() is also a bit slow and undeterministic because of locale dependencies. > I've reviewed the complexity of operations on the common data structures > every day this month. That's one good point! But next time read beyond the first sentence, at least for the complexity of std::lower_bound()! > I am not sure what else I can do. > I am a 40 year old man about to cry if I don't get a job soon. I bet you are pretty young compared to the average of this newsgroup ;-) Cheers Paavo |
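Paavo's two suggestions can be put side by side as a sketch: the member lower_bound exploits the set's internal tree and stays O(log N), while the free std::lower_bound on std::set iterators pays O(N) in iterator advances on top of its comparisons; and compare(0, n, prefix) checks "starts with" without the temporary that substr() would create:

```cpp
#include <algorithm>
#include <set>
#include <string>

// O(log N): uses the set's internal tree.
bool has_prefix_member(const std::set<std::string>& dict,
                       const std::string& prefix) {
    auto it = dict.lower_bound(prefix);
    // compare(0, n, prefix) avoids a substr() temporary.
    return it != dict.end() && it->compare(0, prefix.size(), prefix) == 0;
}

// Same result, but the free algorithm must advance the set's bidirectional
// iterators one step at a time: O(N) traversal on a non-random-access range.
bool has_prefix_free(const std::set<std::string>& dict,
                     const std::string& prefix) {
    auto it = std::lower_bound(dict.begin(), dict.end(), prefix);
    return it != dict.end() && it->compare(0, prefix.size(), prefix) == 0;
}
```

On a sorted std::vector<std::string> the free algorithm is fine, because its iterators are random access; the trap is specific to node-based containers like std::set.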
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 01:02AM +0100 On 09.02.2017 21:08, Christopher J. Pisz wrote: > C++ is getting REALLY performance orientated. I used to get a job within > 2 days just by knowing what virtual inheritance is. Now, I don't really > know how to "Get better" I didn't look at your code. I just implemented it straightforward using `lower_bound` to look for dictionary words in a vector. It's sort of instant. BUT I generally get more words than the solution file: [example] [C:\my\forums\clc++\057 boggle] > fc my_solution.txt data\board_1_solution.txt Comparing files my_solution.txt and DATA\BOARD_1_SOLUTION.TXT ***** my_solution.txt a as entry go hint ***** DATA\BOARD_1_SOLUTION.TXT entry hint ***** ***** my_solution.txt hits in is it its ***** DATA\BOARD_1_SOLUTION.TXT hits its ***** ***** my_solution.txt pen saint ***** DATA\BOARD_1_SOLUTION.TXT pen quit quits saint ***** ***** my_solution.txt sits so thin ***** DATA\BOARD_1_SOLUTION.TXT sits thin ***** ***** my_solution.txt try we went ***** DATA\BOARD_1_SOLUTION.TXT try went ***** [C:\my\forums\clc++\057 boggle] > _ [example] The weasel word "generally" indicates a bug, that my implementation didn't find "quit" and "quits" as it should. Still, have I misunderstood the rules? What are they, exactly? Cheers!, - Alf |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 06:15PM -0600 On 2/9/2017 6:02 PM, Alf P. Steinbach wrote: > Cheers!, > - Alf Don't forget to turn a board Q into QU when comparing to the dictionary. And diagonal Rules are, and I am retyping them from memory...: Given a board that is a grid of characters and an input file Given a dictionary of valid words as an input file Output all the words in the dictionary that can be found on the board, where "finding" means Start at any coordinate on the board and be able to trace the word through adjacent nodes without traveling to any node more than once. You may travel in all 8 directions. (diagonals included) The dictionary input and the output is ordered alphabetically. Some examples: G D L W O O Z E D E O S I Q S D Doe Go God Godless Good Quid Assuming those are all in the dictionary file. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 01:22AM +0100 On 10.02.2017 01:15, Christopher J. Pisz wrote: > Don't forget to turn a board Q into QU when comparing to the dictionary. Oh! > And diagonal Huh? > Good > Quid > Assuming those are all in the dictionary file. Okay I'm stilling getting more words than the solution file has. Those are short words, like "go". Here: Comparing files my_solution.txt and DATA\BOARD_1_SOLUTION.TXT ***** my_solution.txt a as entry go hint ***** DATA\BOARD_1_SOLUTION.TXT entry hint ***** ***** my_solution.txt hits in is it its ***** DATA\BOARD_1_SOLUTION.TXT hits its ***** ***** my_solution.txt sits so thin ***** DATA\BOARD_1_SOLUTION.TXT sits thin ***** ***** my_solution.txt try we went ***** DATA\BOARD_1_SOLUTION.TXT try went ***** I haven't tested more boards but it seems I get all the words in the solution file, plus some? Here's the code: #include <algorithm> // std::(is_sorted, remove, sort, transform) #include <assert.h> // assert #include <ctype.h> // tolower #include <fstream> // std::ifstream #include <iostream> #include <vector> // std::vector #include <stddef.h> // ptrdiff_t #include <stdexcept> // std::(exception, runtime_error) #include <stdlib.h> // EXIT_FAILURE, EXIT_SUCCESS #include <string > // std::string #include <utility> // std::move using Size = ptrdiff_t; using Index = Size; template< class Container > auto n_items_of( Container const& c ) -> Size { return c.size(); } namespace cppx { using std::ostream; using std::vector; using std::runtime_error; using std::string; using Byte = unsigned char; inline auto hopefully( bool const e ) -> bool { return e; } inline auto fail( string const& s ) -> bool { throw runtime_error( s ); } struct Position { Index x; Index y; }; inline auto operator<<( ostream& stream, Position const& pos ) -> ostream& { return stream << "{" << pos.x << ", " << pos.y << "}"; } template< class Item > class Matrix { private: struct Wrapped_item { Item object; }; // Avoid 
problems with vector<bool>. Size row_length_; vector<Wrapped_item> items_; auto index_for( Index x, Index y ) const -> Index { return y*row_length_ + x; } public: auto id_for_position( Index x, Index y ) const -> Index { return index_for( x, y ); } auto size() const -> Size { return row_length_; } auto item( Index const x, Index const y ) const -> Item { return items_[index_for( x, y )].object; } auto item( Index const x, Index const y ) -> Item& { return items_[index_for( x, y )].object; } Matrix(): Matrix{ 0 } {} explicit Matrix( Size const size ) : row_length_( size ) , items_( size*size ) {} }; auto inline to_lowercase( char const ch ) -> char { return tolower( static_cast<Byte>( ch ) ); } auto inline starts_with( string const& prefix, string const& s ) -> bool { return s.substr( 0, prefix.length() ) == prefix; } } // namespace cppx namespace app { using namespace std; using namespace cppx; auto load_dictionary( string const& file_path ) -> vector<string> { ifstream data{ file_path }; hopefully( not data.fail() ) || fail( "Failed to open dictionary `" + file_path + "`." ); vector<string> result; string word; while( getline( data, word ) ) { result.push_back( move( word ) ); } return result; } auto chars_from( string s ) -> string { string chars{ s.begin(), remove( s.begin(), s.end(), ' ' ) }; transform( chars.begin(), chars.end(), chars.begin(), to_lowercase ); return chars; } auto load_bubble_board( string const& file_path ) -> Matrix<char> { ifstream data{ file_path }; hopefully( not data.fail() ) || fail( "Failed to open dictionary `" + file_path + "`." ); string line; string chars; // Whitespace removed. getline( data, line ) || fail( "Unable to read first line of `" + file_path + "`." 
); chars = chars_from( line ); Size const n = n_items_of( chars ); Matrix<char> result{ 2 + n }; for( Index x = 0; x < n; ++x ) { result.item( x + 1, 1 ) = chars[x]; } for( Index y = 1; y < n; ++y ) { getline( data, line ) || fail( "Unable to read line " + to_string( y + 1 ) + " of `" + file_path + "`." ); chars = chars_from( line ); hopefully( n_items_of( chars ) == n ) || fail( "Inconsistent row lengths in board data `" + file_path + "`" ); for( Index x = 0; x < n; ++x ) { result.item( x + 1, y + 1 ) = chars[x]; } } return result; } using Dictionary_iter = vector<string>::const_iterator; void recursively_append_words_from( Dictionary_iter const start_dict_range, Dictionary_iter const end_dict_range, Matrix<char> const& chars, Position const position, Matrix<bool>& visited, string& word, vector<string>& result ) { Size const original_word_length = word.length(); char const ch = chars.item( position.x, position.y ); if( ch == 'q' ) { word += "qu"; } else { word += ch; } auto const lower = lower_bound( start_dict_range, end_dict_range, word ); clog << position << " -> " << *lower << "\n"; if( starts_with( word, *lower ) ) { if( word == *lower ) { result.push_back( word ); } visited.item( position.x, position.y ) = true; for( int dx = -1; dx <= +1; ++dx ) { for( int dy = -1; dy <= +1; ++dy ) { Position const new_position = {position.x + dx, position.y + dy}; if( not visited.item( new_position.x, new_position.y ) ) { recursively_append_words_from( lower, end_dict_range, chars, new_position, visited, word, result ); } } } visited.item( position.x, position.y ) = false; } word.resize( original_word_length ); } void append_words_from( vector<string> const& dictionary, Matrix<char> const& chars, Position const& start, vector<string>& result ) { string word; Size const n = chars.size(); Matrix<bool> visited{ n }; for( Index i = 0; i < n; ++ i ) { visited.item( i, 0 ) = true; visited.item( 0, i ) = true; visited.item( i, n - 1 ) = true; visited.item( n - 1, i ) = true; } 
recursively_append_words_from( dictionary.begin(), dictionary.end(), chars, start, visited, word, result ); } void cpp_main( vector<string> const& args ) { hopefully( n_items_of( args ) == 3 ) || fail( "Usage: " + args[0] + " DICTIONARY BUBBLEBOARD" ); vector<string> const dictionary = load_dictionary( args[1] ); Matrix<char> const board = load_bubble_board( args[2] ); assert( is_sorted( dictionary.begin(), dictionary.end() ) ); Size const n = board.size(); vector<string> words; for( Index y = 0; y < n; ++y ) for( Index x = 0; x < n; ++x ) { append_words_from( dictionary, board, {x + 1, y + 1}, words ); } sort( words.begin(), words.end() ); words.erase( unique( words.begin(), words.end() ), words.end() ); for( auto const& word : words ) { cout << word << "\n"; } } } // namespace app auto main( int const n_args, char** args ) -> int { using namespace std; try { app::cpp_main( {args, args + n_args} ); return EXIT_SUCCESS; } catch( exception const& x ) { cerr << "! " << x.what() << "\n"; } return EXIT_FAILURE; } Cheers & hth., - Alf |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 06:29PM -0600 On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: > Cheers & hth., > - Alf Oh crap. I forgot a rule. The word has to be at least 3 letters long. Sorry. From memory sucks with old man memory. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 07:00PM -0600 On 2/9/2017 5:35 PM, Paavo Helde wrote: >>>> I did a depth first search and compared the path thus far with words in >>>> the dictionary using a Radix Trei. I don't know of any other possible >>>> algorithm to solve the problem, or how I would implement it "better." SNIP > additional linear complexity in N on average." > In contrast, std::set::lower_bound is guaranteed to be O(log N), with no > caveats. SNIP >> and that would prevent one temporary from the substr return? > Yep. Maybe it should be possible to optimize the temporary creation > away, but I am not sure if the current compilers are smart enough for that. I got about 30% better performance out of those changes. Significant, but they were claiming other implemented it 25X as fast as mine. There must be something else fundamentally wrong, like the algorithm itself. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 09 07:13PM -0600 On 2/9/2017 6:22 PM, Alf P. Steinbach wrote: > Cheers & hth., > - Alf As far as I can tell you are doing pretty much exactly what I did when I used lower_bound. Although, you're code is fancier with new C++11 and C++14 stuff. My dictionary was a set<string> at that time though, compared to your vector<string> I wonder if they would have kicked you out too :/ Or maybe I am just missing some glaring error in my code or its logic. I wish I could find a bigger board, dictionary, and solutions to test with somewhere. Mine solves in a matter of microseconds in release with the given data, but whatever they used, they claim it took 25 seconds. I need to ponder how I would generate a bigger set of data, but still have the solutions to test against. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 08:06AM +0200 On 10.02.2017 3:13, Christopher J. Pisz wrote: >> void recursively_append_words_from( >> Dictionary_iter const start_dict_range, >> Dictionary_iter const end_dict_range, [...] >> ) >> { [...] > C++14 stuff. > My dictionary was a set<string> at that time though, compared to your > vector<string> Alf is using a sorted vector, and this means the iterators are random access. This makes the big difference of O(log N) instead of O(N) in std::lower_bound(). |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 01:43AM -0600 On 2/9/2017 2:08 PM, Christopher J. Pisz wrote: > C++ is getting REALLY performance orientated. I used to get a job within > 2 days just by knowing what virtual inheritance is. Now, I don't really > know how to "Get better" So I just woke up and I can't sleep, because this is bothering me so much. The thought occurred to me: What if instead of traversing the board and checking if what we find on every node along our path is some word in our huge dictionary, we checked if words in the dictionary were on the board instead. I saw so much work on prefixes, that I envision the pattern: to top toppless topper topping Well, if we don't find the letters "to" then we can eliminate ALL those words! Since the dictionary size is obviously really large, this will greatly reduce it. Also, we can make a lookup by word that gives us a structure containing the last letter found, its location on the board, and a grid denoting which nodes were traversed. That way after searching for the word "to" we can pick up where we left off for any future searches for words beginning with "to" Eurika! I am implementing this now. I suspect it will greatly speed things up. |
"Öö Tiib" <ootiib@hot.ee>: Feb 10 01:39AM -0800 On Friday, 10 February 2017 09:43:54 UTC+2, Christopher J. Pisz wrote: > topping > Well, if we don't find the letters "to" then we can eliminate ALL those > words! Sure, yes. But beware! You may draw oversimplified conclusion from that. The dictionary structure that is used when that matters is suffix tree also named "trie". You can likely find insightful discussions in net about usage of trie for speeding such search algorithms. Public domain trie implementations in C++ that I have seen are not too great (or may be did not fit to my problems too well). > Since the dictionary size is obviously really large, this will greatly > reduce it. Why? No. That's what I meant with "beware" above. It does not pack data on general case. > beginning with "to" > Eurika! > I am implementing this now. I suspect it will greatly speed things up. :) Yes, usage of correct data layout can give order of magnitude or better performance improvements. But ... read about it a bit. |
Popping mad <rainbow@colition.gov>: Feb 10 04:30AM Has anyone ever seen any standard code for a Euler Path? |
"Öö Tiib" <ootiib@hot.ee>: Feb 09 11:33PM -0800 On Friday, 10 February 2017 06:30:09 UTC+2, Popping mad wrote: > Has anyone ever seen any standard code for a Euler Path? I have copy-pasted me such pseudo-code from somewhere: compute_euler(G, path, is_circuit) s := V[g][0] initialize the start vertex if (is_circuit) for each vertex u in V[G] if (is_odd(u)) s := u choose an odd vertex if computing a path break end if end for end if for each edge e in E[G] initialize edge colors color[e] := WHITE end for tmp_vertex := null PUSH(s, stack) push start vertex to stack while (SIZE(stack) != 0) u := TOP(stack) edge_available := false for each edge e in out_edges[u] search for an unused incident edge if (color[e] == WHITE) edge_available := true color[e] := BLACK neighbor := target(e) PUSH(neighbor, stack) push the edge's target vertex to the stack break end if end for if (NOT(edge_available) if edge was unavailable, backtrack and add if (tmp_vertex != null) to final path edge := edge(tmp_vertex, u, G) ADD(edge, path) end if tmp_vertex := u POP(stack) end if end while end |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Feb 10 01:15AM > Some implementations have this property: > A double has at least 14 significant digits (about 14-16) and > 9007199254740991. I.e. 2^53 - 1 or, in most C implementations these days (1llu << DBL_MANT_DIG) - 1 This works when FLT_RADIX is 2, but it need not be. > I wonder whether one can somehow find the smallest > omega for the type double that is required by the > standard. I don't think you can get more than an approximation to it and probably quite a crude one at that. The standard does not give minimum values for DBL_MANT_DIG as you have no doubt found out already. <snip> -- Ben. |
Robert Wessel <robertwessel2@yahoo.com>: Feb 09 11:58PM -0600 On Fri, 10 Feb 2017 01:15:50 +0000, Ben Bacarisse >I.e. 2^53 - 1 or, in most C implementations these days > (1llu << DBL_MANT_DIG) - 1 >This works when FLT_RADIX is 2, but it need not be. (pow(FLT_RADIX, DBL_MANT_DIG) - 1) Should get close. That does have the disadvantage that it can't be computed at compile time. And resulting in a FP number. >I don't think you can get more than an approximation to it and probably >quite a crude one at that. The standard does not give minimum values >for DBL_MANT_DIG as you have no doubt found out already. I'm not sure what the OP is asking, but if it's about the minimum possible "omega" on a conforming implementation (rather than the actual omega of a particular implementation), then the answer would appear to be approximately: (pow(10,DBL_DIG) - 1) |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 12:52AM Newsgroups: comp.lang.c,comp.lang.c++ #include <float.h> Some implementations have this property: A double has at least 14 significant digits (about 14-16) and 9007199254740991. is the largest double value that can be added to 1.0 giving the "next" double value 9007199254740992. (when one adds 1.0 again, the value will not change anymore). I don't know whether there is a common name for such a value. I will call it (just for this post) "omega". (In the above example, the omega is 9007199254740991.) Now, the standard does not require 14 significant digits, but 10 (DBL_DIG). I wonder whether one can somehow find the smallest omega for the type double that is required by the standard. Clearly, omega >= 0 and omega <= DBL_MAX. Newsgroups: comp.lang.c,comp.lang.c++ |
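Stefan's "omega" can be probed directly with the <cfloat> constants. The sketch below assumes the usual IEEE-754 binary64 double (FLT_RADIX == 2, DBL_MANT_DIG == 53), where omega comes out as 2^53 - 1, exactly the 9007199254740991 in the post:

```cpp
#include <cfloat>
#include <cmath>

// Robert Wessel's formula: the largest integer w such that w and w+1 are
// both exactly representable, but (w+1)+1 rounds back to w+1.
double omega() {
    return std::pow(FLT_RADIX, DBL_MANT_DIG) - 1.0;
}

// Verifies the defining property from the post.
bool is_omega_like(double w) {
    return w + 1.0 != w                 // adding 1.0 still produces a new value
        && (w + 1.0) + 1.0 == w + 1.0;  // but adding 1.0 again changes nothing
}
```

As Ben Bacarisse notes, this gives the omega of a particular implementation; the standard's minimum guarantee is pinned down only indirectly, via DBL_DIG.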
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 01:25AM >collection of pointers vs collection of objects Collections of pointers /are/ collections of objects, because pointers /are/ objects! >std::vector<Employee> was probably not as good as >std::vector<Employee *> or std::vector<std::shared_ptr<Employee>> Which results do you get when you do a microbenchmark for both cases (sum the salaries of all employees) for some large number of employees. Which byte width and access times do the L1 cache, the L2 cache and the main memory have on your system? What does the processor have to do in both cases (you might compile the benchmark and look at the assembler output). |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 10 03:30AM >It's nearly the same. But it might not be an accurate test, because >everything is nearly contiguous since nothing else in the process is >being allocated. Yes, I just confirmed your results. The difference when one just fills the two vectors and then loops through each one is negligible. The common wisdom is that a vector with all data in place is much faster than a vector with pointers to heap instances. But in this special benchmark, it cannot be observed. Your interpretation about locality for the allocated blocks might be correct. When I vary the size of the objects on the heap (so that they do not all have the same size), then the runtime for the vector with pointers gets worse by a factor between 2 and 8. The contrast is not as large as expected by me. I then also used ::std::shuffle on the pointer vector before summing, with similar results (the pointer vector is slower, but only by a factor between something like 2 and 8). |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |