- A different algorithm - 9 Updates
- collection of pointers vs collection of objects - 6 Updates
- The required minimum omega for double - 2 Updates
- My solution to Christopher Pisz' bobbleboard, in new dialect Expressive C++ - 2 Updates
Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 02:17PM +0200 On 10.02.2017 9:43, Christopher J. Pisz wrote: > What if instead of traversing the board and checking if what we find on > every node along our path is some word in our huge dictionary, we > checked if words in the dictionary were on the board instead. Checking all words in the dictionary is O(N) again - i.e. not better. Why do I have a feeling that you are not actually reading our e-mails? Generate a dictionary of 1e6 random words and test your solutions vs Alf's solution and then it becomes immediately clear which solutions are faster, no need for brilliant dream ideas. If something is slow, profile it, find the bottlenecks and fix them. BTW, this kind of work is a good exercise for a real performance-related C++ job where one often needs to profile and optimize code. Plus, you get some insight about what things are slow and what are fast, on your current hardware at least. |
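A minimal sketch of the benchmark setup Paavo suggests, assuming only the standard library: it writes one million random lowercase words to a file against which the competing solutions can then be timed. The file name, seed and word lengths are arbitrary choices, not anything from the thread; sort the output first (e.g. with `sort -u`), since some of the posted solutions assume a sorted word list.

// Sketch: generate a large random dictionary for benchmarking.
#include <fstream>
#include <random>
#include <string>

int main()
{
    std::mt19937 gen{12345};                        // fixed seed, so runs are reproducible
    std::uniform_int_distribution<int> len{3, 10};  // word length
    std::uniform_int_distribution<int> letter{0, 25};

    std::ofstream out{"random_words.txt"};
    for (int i = 0; i < 1000000; ++i) {
        std::string word;
        const int n = len(gen);
        for (int j = 0; j < n; ++j)
            word += static_cast<char>('a' + letter(gen));
        out << word << '\n';
    }
}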
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 10 01:09PM On Fri, 10 Feb 2017 01:43:45 -0600 > topping > Well, if we don't find the letters "to" then we can eliminate ALL > those words! I haven't looked at your implementation and I may have misunderstood what you mean, nor have I ever played boggle, but as I understand the rules you have given, wouldn't either approach necessarily work that way? If so, I don't see how this could be an improvement. Take what you say was your original approach of "checking if what we find on every node along our path is some word in our huge dictionary", and take as an initial test point the simplest possible dictionary implementation of a sorted std::vector and the most naive and uncomplicated algorithm: to search for a word by picking a starting square on the board, take the letter given (say 't') and then restrict all further consideration of the dictionary for that starting square by reference to the (already sorted) bounded sub-range of those words in the dictionary beginning with 't'. You would then test all the possible second and subsequent letters of the word against the dictionary recursively by reference to ever more refined slices of the dictionary, until you have found all the words in the dictionary beginning on the chosen starting square; and then move on to the next starting square. That is not to say that there aren't better algorithms: there almost certainly are. Your idea of caching looks promising, so that already checked sub-ranges can be reused for different starting squares. But as I say, I may also have misunderstood the rules of the game that you posted or the implementation that you developed. Chris |
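A sketch of the dictionary-slicing approach described above, assuming only a sorted std::vector<std::string> of lowercase words; the helper name and the trick of bumping the last character of the prefix are illustrative, not from the thread.

// Sketch: narrow a sorted dictionary to the sub-range of words sharing a given prefix.
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Iter = std::vector<std::string>::const_iterator;

// Half-open range of words in [first, last) that start with `prefix`.
// Assumes a sorted dictionary, a non-empty prefix, and lowercase letters only.
std::pair<Iter, Iter> prefix_range(Iter first, Iter last, const std::string& prefix)
{
    auto lo = std::lower_bound(first, last, prefix);
    std::string upper = prefix;
    ++upper.back();   // smallest string that sorts after every "prefix..." word
    auto hi = std::lower_bound(lo, last, upper);
    return {lo, hi};
}

During the recursive board walk each appended letter narrows the previous result by calling prefix_range again with the returned iterators as first/last; an empty range means the branch can be abandoned, and a range whose first element equals the current prefix means a complete dictionary word has been found.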
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:34PM >C++ is getting REALLY performance orientated. I used to get a job within >2 days just by knowing what virtual inheritance is. Now, I don't really >know how to "Get better" Your solution is way overkill. As someone who has interviewed hundreds of candidates for jobs that require C++ coding skills, I'd also be leery of such a solution. There is no need for trees or fancy data structures for a simple problem like this. I'm looking for programmers that think out-of-the-box and craft efficient solutions. This uses a rather clever technique to simplify the algorithm. 50 lines (more would be needed to read the character matrix dynamically):

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

const char matrix[7][6] = {
    { '@', '@', '@', '@', '@', '@' },
    { '@', 'F', 'G', 'G', 'Y', '@' },
    { '@', 'R', 'O', '6', '7', '@' },
    { '@', 'P', 'O', 'P', 'E', '@' },
    { '@', 'H', 'A', 'M', 'Z', '@' },
    { '@', '@', '@', '@', '@', '@' },
    { '\0', '\0', '\0', '\0', '\0', '\0' },
};

bool checkchar(const char *cp, size_t row, size_t col)
{
    if (*++cp == '\0') return true;

    for (size_t r = row - 1; r <= (row + 1); r++) {
        for (size_t c = col - 1; c <= (col + 1); c++) {
            if ((r == row) && (c == col)) continue;
            if (matrix[r][c] == *cp) {
                if (checkchar(cp, r, c)) return true;
            }
        }
    }
    return false;
}

int main(int argc, const char **argv, const char **envp)
{
    for (size_t arg = 1; arg < argc; arg++) {
        const char *cp = argv[arg];
        const char *mp = &matrix[0][0];

        while ((*mp != '\0') && (*mp != *cp)) mp++;
        if (*mp != '\0') {
            if (checkchar(cp, ((ptrdiff_t)mp - (ptrdiff_t)matrix)/6,
                              ((ptrdiff_t)mp - (ptrdiff_t)matrix)%6)) {
                printf("match found for '%s'\n", argv[arg]);
            }
        }
    }
    return 0;
}

$ /tmp/abc GO FACE FROG FROGGY HAM HOPE POO POOP ZOO
match found for 'GO'
match found for 'FROG'
match found for 'FROGGY'
match found for 'HAM'
match found for 'HOPE'
match found for 'POO'
match found for 'POOP' |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 04:30PM +0100 On 10.02.2017 15:34, Scott Lurndal wrote: > match found for 'HOPE' > match found for 'POO' > match found for 'POOP' I like the way you think here, of just demonstrating that one understands recursive search, and thus only producing a special case answer. After all it was an interview question. And assuming a competent evaluation this would give high score on "result-oriented". Still, this code produces a false positive for "GOGO", and it fails to find (i.e. it produces a false negative for) "G7". And I think that would lower one's score on "attention to detail". In the same direction, the unused and in C++ non-standard `envp`. So, it's not entirely clear to me that this short clever code would improve one's chances of being hired. Still it's impressive. I wish I would have thunk of that :D . Cheers!, - Alf |
scott@slp53.sl.home (Scott Lurndal): Feb 10 03:45PM >evaluation this would give high score on "result-oriented". >Still, this code produces a false positive for "GOGO", and it fails to >find (i.e. it produces a false negative for) "G7". Ah, missed the "not-yet-visited" clause for GOGO. I'm not the interview candidate, and I whipped that out in about 15 minutes. If I were doing production code, there would be much more testing. (I also missed the "Qu" -> "Q" transformation, but that's just an additional line: if (*mp != '\0') { if ((*cp == 'q') && (*(cp+1) == 'u')) cp++; >And I think that would lower one's score on "attention to detail". >In the same direction, the unused and in C++ non-standard `envp`. Forty years of habit are hard to break. >So, it's not entirely clear to me that this short clever code would >improve one's chances of being hired. Still it's impressive. I wish I >would have thunk of that :D . Like I said, if I were submitting it as an interview answer, it would have been much more rigorously checked. |
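For readers following along, one hedged guess at where the "Qu" collapse fits: fold "QU" (or "Qu") pairs in each query word into a single 'Q' before searching, so the word lines up with a board that stores only 'Q'. This is only an illustration, not what Scott actually ran.

// Sketch: collapse "QU" in a query word to "Q" to match a board cell that holds 'Q' alone.
#include <cstddef>
#include <string>

std::string fold_qu(const std::string& word)
{
    std::string out;
    for (std::size_t i = 0; i < word.size(); ++i) {
        out += word[i];
        if ((word[i] == 'Q' || word[i] == 'q') && i + 1 < word.size()
                && (word[i + 1] == 'U' || word[i + 1] == 'u'))
            ++i;   // skip the 'u' that rides along with 'q' in Boggle-style rules
    }
    return out;
}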
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 01:20PM -0600 On 2/10/2017 8:34 AM, Scott Lurndal wrote: > match found for 'HOPE' > match found for 'POO' > match found for 'POOP' I am not really able to follow the idea here. If it solves it in as little lines as written, then it is wonderful, but I am finding it hard to follow with that the bird's eye view of the "plan" in text. I am looking at the math without knowing the answer it is trying to find. |
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 02:58PM -0600 On 2/10/2017 1:20 PM, Christopher J. Pisz wrote: > little lines as written, then it is wonderful, but I am finding it hard > to follow with that the bird's eye view of the "plan" in text. > I am looking at the math without knowing the answer it is trying to find. I can see we are iterating through the matrix as a one dimensional array, but we're checking every single letter of every single word in the dictionary still, at least until a letter is rejected along the path. So, while compact, this is not necessarily going to perform any better, right? |
scott@slp53.sl.home (Scott Lurndal): Feb 10 09:13PM >the dictionary still, at least until a letter is rejected along the >path. So, while compact, this is not necessarily going to perform any >better, right? The algorithm first scans the matrix for the first character of the test string[*]. If not found, go to the next string. So worst case, we look at each element in the matrix once. Best case, we look at one single matrix entry to start the word. If the first character of the test string is in the matrix, then we check each of the 8 surrounding characters for a match against the second character, and bail if not found. If found, recurse. We never allocate or deallocate memory (which your algorithms do). The algorithm needs a small update to not visit the same matrix location twice for a given word while recursing. With a matrix this small, the entire thing fits into the processor cache; with a larger matrix, the prefetcher would kick in. It would require a bit more space to handle multibyte UTF-8, UTF-16 or UTF-32 glyph data. [*] A really good compiler would generate a SCASB instruction for this on Intel systems. |
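A sketch of that "small update", reusing the fenced matrix and the overall shape of checkchar from upthread: a parallel visited mask marks the cell consumed for the current letter so the recursion cannot reuse it, which removes false positives like "GOGO". The first-letter scan here also tries every occurrence of the starting letter rather than only the first, which is what finding "G7" needs. Illustrative only, not Scott's actual revision.

// Sketch: Scott's search with a visited mask and an exhaustive first-letter scan.
#include <cstddef>
#include <cstdio>

const char matrix[7][6] = {
    { '@', '@', '@', '@', '@', '@' },
    { '@', 'F', 'G', 'G', 'Y', '@' },
    { '@', 'R', 'O', '6', '7', '@' },
    { '@', 'P', 'O', 'P', 'E', '@' },
    { '@', 'H', 'A', 'M', 'Z', '@' },
    { '@', '@', '@', '@', '@', '@' },
    { '\0', '\0', '\0', '\0', '\0', '\0' },
};

bool visited[7][6];   // zero-initialized; self-cleaning because every path unmarks below

bool checkchar(const char *cp, std::size_t row, std::size_t col)
{
    if (*++cp == '\0') return true;

    visited[row][col] = true;                  // this cell is spent for the current word
    for (std::size_t r = row - 1; r <= row + 1; r++) {
        for (std::size_t c = col - 1; c <= col + 1; c++) {
            if (r == row && c == col) continue;
            if (!visited[r][c] && matrix[r][c] == *cp && checkchar(cp, r, c)) {
                visited[row][col] = false;
                return true;
            }
        }
    }
    visited[row][col] = false;                 // backtrack
    return false;
}

int main(int argc, const char **argv)
{
    for (int arg = 1; arg < argc; arg++) {
        bool found = false;
        for (std::size_t r = 1; r < 5 && !found; r++)       // interior (non-fence) cells only
            for (std::size_t c = 1; c < 5 && !found; c++)
                if (matrix[r][c] == argv[arg][0] && checkchar(argv[arg], r, c))
                    found = true;
        if (found) std::printf("match found for '%s'\n", argv[arg]);
    }
    return 0;
}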
scott@slp53.sl.home (Scott Lurndal): Feb 10 09:15PM >the dictionary still, at least until a letter is rejected along the >path. So, while compact, this is not necessarily going to perform any >better, right? The "trick" here (which I learned back in 1979) is to put a fence around the matrix so the algorithm doesn't have to special case cells at the borders. |
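For anyone who hasn't met the trick before, a sketch of the fence in isolation: copy the real N x N board into an (N+2) x (N+2) buffer whose border is filled with a sentinel that can never match a letter, and the neighbour loops can then run r-1..r+1 with no bounds checks. The helper below is illustrative only.

// Sketch: wrap an N x N board in a one-cell sentinel fence ('@' here; any non-letter works).
#include <cstddef>
#include <vector>

std::vector<std::vector<char>> add_fence(const std::vector<std::vector<char>>& board)
{
    const std::size_t n = board.size();
    std::vector<std::vector<char>> fenced(n + 2, std::vector<char>(n + 2, '@'));
    for (std::size_t y = 0; y < n; ++y)
        for (std::size_t x = 0; x < n; ++x)
            fenced[y + 1][x + 1] = board[y][x];
    return fenced;
}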
"Öö Tiib" <ootiib@hot.ee>: Feb 10 03:26AM -0800 On Friday, 10 February 2017 02:03:23 UTC+2, Christopher J. Pisz wrote: > std::vector<Employee *> or std::vector<std::shared_ptr<Employee>> > Because of the copy construction on insert. > The fellow did not seem to like that answer at all. Yes, since case where large sequence gets performance-affectingly frequent inserts and erases in middle of it is quite rare. When it is That Case then it is very likely that (may be intrusive) list or set will outperform all of given contenders by large margin. When it is not That Case then 'std::vector<Employee>' usually wins. > was silly. > What differences do you guys know about the two ways of containing > objects and why might you choose one over the other? :) What *two* ways? I will list few ways of how i imagine a "vector of employees". Note that it is limited to vector but we have *lot* of other containers. Assumption is that vector in general is what we need and now there are the constraints: 1) A 'std::vector<std::unique_ptr<Employee>>' I would pick when there are meant to be dynamic polymorphic classes derived from Employee and the vector owns the Employees. 2) A 'std::vector<std::shared_ptr<Employee>>' I would pick when ownership of Employee is shared, for example when same Employee may be simultaneously in several of those vectors and none of those is then owner of it. 3) A 'std::vector<std::reference_wrapper<Employee>>' I would pick when the Employees are owned by something else than that vector and life-time of those is known to last longer than that of vector. 4) A 'std::vector<std::weak_ptr<Employee>>' I would pick when the Employees are owned by something else than that vector and the life-times may last less than life-time of vector. 5) A 'std::vector<Employee*>' I can't imagine situation where I would be picking it (, but never say never). 6) A 'std::vector<Employee>' I would pick when the vector owns those Employees and there are no dynamic polymorphism of Employees. Like you see *none* are about performance. ;) |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 10 12:05PM On Fri, 10 Feb 2017 03:26:05 -0800 (PST) > > std::vector<Employee *> or std::vector<std::shared_ptr<Employee>> > > Because of the copy construction on insert. > > The fellow did not seem to like that answer at all. [snip] > 6) A 'std::vector<Employee>' I would pick when the vector owns those > Employees and there is no dynamic polymorphism of Employees. > Like you see, *none* are about performance. ;) All very reasonable, but if the interviewer was unimpressed with Christopher's answer about how to "optimize some code snippet" and raised the issue of "cache misses", he wanted to know Christopher's views on designing for locality, and that is about performance. You get the job by dealing with the issues put to you. The "two" ways referred to by Christopher I read as meaning (i) hold by value, and (ii) hold by reference. Once you have decided you are going to hold by reference then all the issues you mention also become relevant of course. I agree that if the Employee type is polymorphic that forces your hand, but given that the code snippet Christopher was given to optimize seemed (if I read his posting correctly) to involve a std::vector<Employee> object in the context of optimization, I think one can take it that it wasn't. Chris |
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:48PM >I mentioned that depending on the size of the object a collection of >pointers might be better. >He asked about cache misses. The effectiveness of caches (and TLBs) is dependent upon the degree of locality of references. A cache line on Intel is 64 bytes (on some ARM64 processors it is 128 bytes). Most processors have some internal HW prefetching algorithm in the cache subsystem to anticipate future requests, and most of those algorithms are stride-based. So, random pointer chasing is likely to defeat the prefetching algorithms (software can alleviate this somewhat by using prefetching instructions, which are available on both Intel and ARM64 processors, to hint to the cache that some line will be needed in the near future, so the cache can load it in anticipation, asynchronously to the execution of the thread). Walking through a vector or array of values will have a high cache hit rate (close to 100% if the prefetcher is working well), while chasing a pointer through memory will generally have a very low hit rate because the processor cannot anticipate the next memory reference. If an instance of Employee required 256 bytes in memory, and it had a 64-bit employee id field, for example, to walk through an array of Employee objects looking for a specific employee id, you'd load one cache line for the 64-byte region containing the employee id. A stride-based prefetcher can anticipate this and have the next cache line ready by the time you're looking at the employee id field in the next sequential element. It can't do this if you're walking a vector of pointers. A cache hit (L1) is about 3 cycles (1ns) on Skylake, a hit to L2 is 11 cycles (4ns at 3GHz) and a hit to L3 is about 40 cycles. A miss at all three levels costs about 60 to 100ns to go to DRAM. So a miss is about 33x slower than a hit at L1. |
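A hedged micro-benchmark sketch of the effect described above: the same scan over employee ids, once through a contiguous std::vector<Employee> and once through shuffled heap-allocated objects. The 256-byte padding mimics the object size in the example, the shuffle mimics a fragmented heap, and the numbers will of course vary by machine.

// Sketch: contiguous traversal vs pointer chasing over the same logical data.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <memory>
#include <random>
#include <vector>

struct Employee {
    std::uint64_t id;
    char payload[248];   // pad the object to roughly 256 bytes
};

template <class F>
double ms(F f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    const std::size_t n = 1000000;

    std::vector<Employee> by_value(n);
    std::vector<std::unique_ptr<Employee>> by_pointer;
    for (std::size_t i = 0; i < n; ++i) {
        by_value[i].id = i;
        auto p = std::make_unique<Employee>();
        p->id = i;
        by_pointer.push_back(std::move(p));
    }
    // Shuffle the pointers so the walk really does jump around the heap.
    std::shuffle(by_pointer.begin(), by_pointer.end(), std::mt19937{42});

    std::uint64_t sum1 = 0, sum2 = 0;
    const double t1 = ms([&]{ for (const auto& e : by_value)   sum1 += e.id; });
    const double t2 = ms([&]{ for (const auto& p : by_pointer) sum2 += p->id; });

    std::cout << "contiguous: " << t1 << " ms, pointer chasing: " << t2
              << " ms (checksums " << sum1 << ", " << sum2 << ")\n";
}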
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:51PM >> collection in there anyway. >A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably >not storing that much info per Employee. That's L3 cache. L1 is about 32 KB, L2 is generally 256 KB per core and L3 is shared by all cores. Doesn't matter unless you only have one Employee. If you have 10,000 employees, and each employee object is 256 bytes, that won't fit in L3 (even leaving aside code footprint and the other core(s)). And the vast majority of real-world datasets don't fit in cache, not even close. ThunderX has 24 MB of cache shared by 48 cores, for example, and 72 KB split I/D at level 1. Any interesting dataset will be much larger. |
"Öö Tiib" <ootiib@hot.ee>: Feb 10 08:19AM -0800 On Friday, 10 February 2017 14:05:27 UTC+2, Chris Vine wrote: > raised the issue of "cache misses", he wanted to know Christopher's > views on designing for locality, and that is about performance. You > get the job by dealing with the issues put to you. I have been sort of self-employed past quarter of century so my advice is likely questionable there. What one has to do in interview is perhaps to find out what the interviewer wants and to deliver that. I generally ignore all such performance considerations until I have working thing that really does the actual work that was needed wholly AND also I have sets of actual data with what it does that. It is typically about 5% of product code that matters at all to performance and so grinding 95% of it preliminarily is waste of effort. Was he really offered such circumstances? I doubt it. Therefore answer what interviewer wants to hear however nonsensical it sounds. Later sort out if you want to work for that company or not. > given to optimize seemed (if I read his posting correctly) to involve a > std::vector<Employee> object in the context of optimization, I think one > can take it that is wasn't. These were not issues. There is just the nature of data about what I think when I pick one or other style of "vector of employees". It can be that I don't know the nature so clearly yet. Then I take 'std::vector<Employee>' until I find out. Code can be edited and will be edited and if I don't yet know nature of data clearly then there is no way how I can ensure that it performs optimally. |
Manfred <noname@invalid.add>: Feb 10 07:07PM +0100 On 02/10/2017 05:19 PM, Öö Tiib wrote: > On Friday, 10 February 2017 14:05:27 UTC+2, Chris Vine wrote: >> On Fri, 10 Feb 2017 03:26:05 -0800 (PST) >> Öö Tiib <ootiib@hot.ee> wrote: <snip> >> raised the issue of "cache misses", he wanted to know Christopher's >> views on designing for locality, and that is about performance. You >> get the job by dealing with the issues put to you. I think you also get the job by giving exhaustive answers. The points above seem pretty much on topic to me. <snip> > until I find out. Code can be edited and will be edited, and if I don't yet > know the nature of the data clearly then there is no way I can ensure that it > performs optimally. I agree that the first rationale for the choice is the nature of the data, as you correctly indicated. I may add that it follows from your list that you would pick case 6) unless there is polymorphism or the vector does not own the Employees, which happens to be the best-performing solution (not surprisingly, since C++ usually likes the simplest solution to be the best-performing as well). |
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:57PM > (pow(FLT_RADIX, DBL_MANT_DIG) - 1) >Should get close. That does have the disadvantage that it can't be >computed at compile time. And resulting in a FP number. With GCC you can do this as a compile time constexpr. e.g., something similar to:

constexpr unsigned int log2(uint64_t n)
{
    return (n <= 1) ? 0 : (64 - __builtin_clzll(n - 1));
}
 |
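For what it's worth, GCC and clang accept __builtin_clzll in constant expressions, so the helper really can be evaluated at compile time; the rename to log2c below is only to avoid colliding with log2 from <cmath>, and the asserted values are just illustrations (the function computes the ceiling of log2).

// Sketch: compile-time checks of the helper above (GCC/clang only, because of the builtin).
#include <cstdint>

constexpr unsigned int log2c(std::uint64_t n)
{
    return (n <= 1) ? 0 : (64 - __builtin_clzll(n - 1));
}

static_assert(log2c(1) == 0, "");
static_assert(log2c(2) == 1, "");
static_assert(log2c(9007199254740992ull) == 53,
              "2^53, i.e. pow(FLT_RADIX, DBL_MANT_DIG) for an IEEE double");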
James Kuyper <jameskuyper@verizon.net>: Feb 10 01:06PM -0500 On 02/09/2017 07:52 PM, Stefan Ram wrote: > 9007199254740991. > is the largest double value that can be added to 1.0 > giving the "next" double value Do you really mean "next double value" or "next integer"? The next integer is 1 higher than Omega. The "next double value" will, in general, be Omega+FLT_RADIX. I'm going to explain below why "next double value" is problematic. Then I'll go ahead on the assumption that you meant "next integer". > omega for the type double that is required by the > standard. > Clearly, omega >= 0 and omega <= DBL_MAX. You've defined Omega in terms of what happens when you add 1.0 to the number. However, that will depend upon the rounding mode. Consider the smallest integer for which the next integer has no representation as a double; let's call it Omega1, to keep it distinct from your Omega. Omega1 and Omega1+FLT_RADIX will both be representable. If FLT_RADIX==2, those values will both be equally far from the mathematical value of Omega1 + 1. The C standard imposes NO requirements of its own on the accuracy of such floating point expressions (5.2.4.2.2p6). None - whatsoever. In particular, this means that a fully conforming implementation of C is allowed to implement floating point math so inaccurately that DBL_MAX-DBL_MIN < DBL_MIN - DBL_MAX. However, if __STDC_IEC_559__ is pre#defined by the implementation, then the requirements of annex F apply (6.10.8.3p1), which are essentially equivalent to IEC 60559:1989, which is equivalent to ANSI/IEEE 754-1985 (F1p1). In that case, if FLT_RADIX == 2, and you add 1.0 to Omega1, the result will be rounded to a value of either Omega1 or Omega1+2, depending upon the current rounding mode. If the rounding mode is FE_TONEAREST, FE_DOWNWARD, or FE_TOWARDZERO, then it will round to Omega1, and that will be true of every value up to 2*Omega1, so 2*Omega1 will be the quantity you define as Omega. On the other hand, if the rounding mode is FE_UPWARD, then Omega is the same as Omega1. If you meant "next integer", that ambiguity disappears. Regardless of rounding mode, Omega1 is the smallest integer to which you can add 1.0 without getting the value of the next integer - because, by definition, that next integer cannot be represented. Section 5.2.4.2.2 specifies a model for how floating point types are represented. That model need not actually be followed, but the required behavior of floating point operations, and the ranges of representable values, are specified in terms of that model; a floating point representation for which that model is a sufficiently bad fit might not have any way of correctly implementing the standard's requirements. So I'll restrict my discussion to implementations for which that model is perfectly accurate. The standard uses subscripts, superscripts, and Greek letters in its description of the model, which don't necessarily display meaningfully in a Usenet message. I'll indicate subscripts and superscripts by preceding them with _ and ^, respectively, and I'll replace the summation with Sum(variable, lower limit, upper limit, expression).

> f_k nonnegative integers less than b (the significand digits)
> A floating-point number (x) is defined by the following model:
> x = s b^e Sum(k, 1, p, f_k b^-k), e_min <= e <= e_max

The term in that sum with the smallest value is the one for k==p. f_p gets multiplied by both b^e and b^-p, and therefore by b^(e-p). The lowest integer that cannot be represented exactly is one which would require a precision of p+1 to be represented, because b^(e-p) is greater than 1. Therefore, Omega1 must have e == p+1, f_1 == 1, and f_k == 0 for all other values of k. Therefore,

Omega1 = (+1) * b^(p+1) * 1*b^(-1) == b^p == b/b^(1-p) == FLT_RADIX/DBL_EPSILON

Note: even on implementations that pre#define __STDC_IEC_559__, a certain amount of inaccuracy is still permitted in both the interpretation of floating point constants and in floating point expressions. Just barely enough, in fact, to allow FLT_RADIX/DBL_EPSILON to come out as Omega1-1. However, that shouldn't happen if those macros are defined as hexadecimal floating point constants, which must be converted exactly if they can be represented exactly. The smallest permitted value for FLT_RADIX is 2, and the maximum permitted value for DBL_EPSILON is 1e-9, so Omega1-1 cannot be less than 2/1e-9. A value for DBL_EPSILON of exactly 1e-9 is not possible if FLT_RADIX is 2; the closest you can get is with p=31, so DBL_EPSILON = 2^(-30) == 1.0/(1,073,741,824), so the smallest permitted value for Omega1-1 is 2/(2^-30) == 2,147,483,648. |
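A small check of that closed form for the common case, assuming an IEEE-754 binary64 double (the usual __STDC_IEC_559__ situation) and the default round-to-nearest mode; the exact constants asserted below hold for that format only.

// Sketch: Omega1 = FLT_RADIX / DBL_EPSILON for IEEE-754 binary64 (b = 2, p = 53).
#include <cfloat>
#include <cstdio>

int main()
{
    static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53,
                  "this check assumes IEEE-754 binary64");

    constexpr double omega1 = FLT_RADIX / DBL_EPSILON;    // b/b^(1-p) == b^p == 2^53
    static_assert(omega1 == 9007199254740992.0, "2^53");

    // With round-to-nearest, omega1 - 1 is the largest integer for which adding 1.0
    // still lands exactly on the next integer; at omega1 the sum rounds back down.
    volatile double x = omega1 - 1.0;   // volatile: keep the compiler from folding the prints
    std::printf("%.1f + 1.0 = %.1f\n", x, x + 1.0);               // ...991 + 1 -> ...992
    std::printf("%.1f + 1.0 = %.1f\n", x + 1.0, (x + 1.0) + 1.0); // ...992 + 1 -> ...992
    return 0;
}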
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 03:59PM +0100 I took the fun-with-macros idea all the way to /a new dialect of/ C++, with much of the operator notation replaced with readable words. ¹Tentative name: "Expressive C++". The macros, which serve as pseudo keywords, all start with `$`, which is not valid in standard C++. I therefore believe that this logical namespace is entirely unused, as of yet, but strangely it is supported by the major compilers such as g++, Visual C++ and clang. That is, they all support using `$` in identifiers. I couldn't resist trying to use the feature. This code requires compiler support also for some other language extensions that I believe are present in both g++, Visual C++ and clang: • `$` in names. • `#pragma once`. • `#pragma push_macro` and `#pragma pop_macro`. The latter two pragmas are for use of the extended language in headers without imposing those macros on client code. • • • The below code for Christopher's bobble-board word generation problem is a rewrite of the pure C++11 solution that I posted earlier and does not include any of the Expressive C++ machinery. The machinery is quite a bit, so best discussed in small pieces. But for now: `$fail`, which throws an exception, adds the qualified function name at the start of the exception message (nice touch, if I may say so myself), and $start_with_arguments at the bottom of the code, invokes the specified function with the `main` arguments, catches any exception, and reports the exception message on the narrow standard error stream. Also worth noting up front, `$items_of` is so dirty that I haven't had the courage to include it in the Expressive set of macros. It's defined at the top of the source file. However, while dirty, it's delicious. [code] #include <p/expressive/all.hpp> $using_all_from_namespace( progrock::expressive ); #include <algorithm> // std::(is_sorted, remove, sort, transform) #include <assert.h> // assert #include <ctype.h> // tolower #include <fstream> // std::ifstream #include <iostream> #include <iterator> // std::(begin, end) #include <vector> // std::vector #include <string > // std::string #define $items_of( collection ) std::begin( collection ), std::end( collection ) namespace cppx { $using_from_namespace( std, ostream, vector, runtime_error, string ); struct Position { Index x; Index y; }; inline $function operator<<( ref_<ostream> stream, ref_<const Position> pos ) -> ref_<ostream> { return stream << "{" << pos.x << ", " << pos.y << "}"; } template< class Item > class Matrix { private: struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>. 
Size row_length_; vector<Wrapped_item> items_; $function index_for( Index x, Index y ) const -> Index { return y*row_length_ + x; } public: $function size() const -> Size { return row_length_; } $function item( const Index x, const Index y ) const -> Item { return items_[index_for( x, y )].object; } $function item( const Index x, const Index y ) -> Item& { return items_[index_for( x, y )].object; } Matrix(): Matrix{ 0 } {} explicit Matrix( const Size size ) : row_length_( size ) , items_( size*size ) {} }; inline $function to_lowercase( const char ch ) -> char { return $as<char>( tolower( $as<Byte>( ch ) ) ); } inline $function starts_with( ref_<const string> prefix, ref_<const string> s ) -> bool { return s.substr( 0, prefix.length() ) == prefix; } } // namespace cppx namespace app { $using_all_from_namespace( std ); $using_all_from_namespace( cppx ); $function load_dictionary( ref_<const string> file_path ) -> vector<string> { ifstream data{ file_path }; $hopefully( not data.fail() ) or $fail( "Failed to open dictionary `" + file_path + "`." ); vector<string> result; string word; while( getline( data, word ) ) { result.push_back( move( word ) ); } return result; } $function chars_from( string s ) -> string { $let end_of_nonspaces = remove( $items_of( s ), ' ' ); string chars{ begin( s ), end_of_nonspaces }; transform( $items_of( chars ), begin( chars ), to_lowercase ); return chars; } $function load_bubble_board( ref_<const string> file_path ) -> Matrix<char> { ifstream data{ file_path }; $hopefully( not data.fail() ) or $fail( "Failed to open dictionary `" + file_path + "`." ); string line; string chars; // Whitespace removed. getline( data, line ) or fail( "Unable to read first line of `" + file_path + "`." ); chars = chars_from( line ); $let n = n_items_in( chars ); Matrix<char> result{ 2 + n }; for( Index x = 0; x < n; ++x ) { result.item( x + 1, 1 ) = chars[x]; } for( Index y = 1; y < n; ++y ) { getline( data, line ) or $fail( "Unable to read line " + to_string( y + 1 ) + " of `" + file_path + "`." 
); chars = chars_from( line ); hopefully( n_items_in( chars ) == n ) or $fail( "Inconsistent row lengths in board data `" + file_path + "`" ); for( Index x = 0; x < n; ++x ) { result.item( x + 1, y + 1 ) = chars[x]; } } return result; } using Dictionary_iter = vector<string>::const_iterator; $procedure recursively_append_words_from( const Dictionary_iter start_dict_range, const Dictionary_iter end_dict_range, ref_<const Matrix<char>> chars, const Position position, ref_<Matrix<bool>> visited, ref_<string> word, ref_<vector<string>> result ) { $let original_word_length = word.length(); $let ch = chars.item( position.x, position.y ); if( ch == 'q' ) { word += "qu"; } else { word += ch; } $let it_dict_entry = lower_bound( start_dict_range, end_dict_range, word ); if( starts_with( word, *it_dict_entry ) ) { $let length = word.length(); if( length >= 3 and length == it_dict_entry->length() ) { result.push_back( word ); } visited.item( position.x, position.y ) = true; for( $each dy $in range( -1, +1 ) ) for( $each dx $in range( -1, +1 ) ) { Position const new_position = {position.x + dx, position.y + dy}; if( not visited.item( new_position.x, new_position.y ) ) { recursively_append_words_from( it_dict_entry, end_dict_range, chars, new_position, visited, word, result ); } } visited.item( position.x, position.y ) = false; } word.resize( original_word_length ); } $procedure append_words_from( ref_<const vector<string>> dictionary, ref_<const Matrix<char>> chars, ref_<const Position> start, ref_<vector<string>> result ) { string word; $let n = chars.size(); Matrix<bool> visited{ n }; for( $each i $in i_up_to( n ) ) { visited.item( i, 0 ) = true; visited.item( 0, i ) = true; visited.item( i, n - 1 ) = true; visited.item( n - 1, i ) = true; } recursively_append_words_from( $items_of( dictionary ), chars, start, visited, word, result ); } $procedure cpp_main( ref_<const vector<string>> args ) { $hopefully( n_items_in( args ) == 3 ) or $fail( "Usage: " + args[0] + " DICTIONARY BUBBLEBOARD" ); vector<string> const dictionary = load_dictionary( args[1] ); Matrix<char> const board = load_bubble_board( args[2] ); assert( is_sorted( $items_of( dictionary ) ) ); $let n = board.size(); vector<string> words; for( $each y $in i_up_to( n ) ) for( $each x $in i_up_to( n ) ) { append_words_from( dictionary, board, {x + 1, y + 1}, words ); } sort( $items_of( words ) ); $let start_of_duplicates = unique( $items_of( words ) ); words.erase( start_of_duplicates, end( words ) ); for( $each word $in words ) { cout << word << "\n"; } } } // namespace app $start_with_arguments( app::cpp_main ) [/code] • • • The above code, plus the supporting machinery not shown here, was compiled with MinGW g++ 6.3.0 and Visual C++ 2015 update 2. I haven't tried clang, but as I think clang requires an option in order to not warn about use of `$` in names. Cheers!, - Alf Notes: ¹ I am aware that the name "Expressive C++" and its natural abbreviations have been used for various things. E.g. "EC++" was/is (?) a standard for limited C++ on embedded devices. "XC++" is some mysterious Japanese C++ library project on SourceForge. And Eric Niebler, author of the C++20 ranges library, has written something he's called the Expressive C++ Series, but I don't know what that was. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 04:51PM +0100 On 10.02.2017 15:59, Alf P. Steinbach wrote: > } > return result; > } Oh, I forgot to fix the loops at the end there, to Expressive syntax: $function load_bubble_board( ref_<const string> file_path ) -> Matrix<char> { ifstream data{ file_path }; $hopefully( not data.fail() ) or $fail( "Failed to open dictionary `" + file_path + "`." ); string line; string chars; // Whitespace removed. getline( data, line ) or fail( "Unable to read first line of `" + file_path + "`." ); chars = chars_from( line ); $let n = n_items_in( chars ); Matrix<char> result{ 2 + n }; for( $each x $in i_up_to( n ) ) { result.item( x + 1, 1 ) = chars[x]; } for( $each y $in range( 1, n - 1 ) ) { getline( data, line ) or $fail( "Unable to read line " + to_string( y + 1 ) + " of `" + file_path + "`." ); chars = chars_from( line ); hopefully( n_items_in( chars ) == n ) or $fail( "Inconsistent row lengths in board data `" + file_path + "`" ); for( $each x $in i_up_to( n ) ) { result.item( x + 1, y + 1 ) = chars[x]; } } return result; } As I recall from some years ago the generated machine code is the same. So, this way of writing simple counting loops can be freely adopted even without the Expressive `$`-keywords. Cheers!, - Alf |