Friday, February 10, 2017

Digest for comp.lang.c++@googlegroups.com - 21 updates in 5 topics

Paavo Helde <myfirstname@osa.pri.ee>: Feb 10 02:17PM +0200

On 10.02.2017 9:43, Christopher J. Pisz wrote:
 
> What if instead of traversing the board and checking if what we find on
> every node along our path is some word in our huge dictionary, we
> checked if words in the dictionary were on the board instead.
 
Checking all words in the dictionary is O(N) again - i.e. not better.
 
Why do I have a feeling that you are not actually reading our e-mails?
Generate a dictionary of 1e6 random words and test your solutions
against Alf's solution; it then becomes immediately clear which solutions
are faster, with no need for brilliant dream ideas.
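Something along those lines could look like this; a rough sketch only (the
fixed seed, the 3..8 letter word lengths and the solve_board() hook are
mine, purely illustrative):
 
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <random>
#include <string>
#include <vector>
 
// Generate n_words random lowercase "words" and sort them, so the result
// can stand in for a real dictionary when timing a solver.
std::vector<std::string> random_dictionary(std::size_t n_words)
{
    std::mt19937 rng(12345);                          // fixed seed: repeatable runs
    std::uniform_int_distribution<int> len(3, 8);     // word lengths
    std::uniform_int_distribution<int> ch('a', 'z');  // letters
    std::vector<std::string> words(n_words);
    for (auto& w : words) {
        w.resize(len(rng));
        for (auto& c : w) c = static_cast<char>(ch(rng));
    }
    std::sort(words.begin(), words.end());
    return words;
}
 
int main()
{
    const auto dict = random_dictionary(1000000);
    std::printf("%zu words\n", dict.size());
 
    const auto t0 = std::chrono::steady_clock::now();
    // solve_board(dict, board);   // hypothetical hook: plug in the solver under test
    const auto t1 = std::chrono::steady_clock::now();
    std::printf("%lld ms\n", (long long)
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
}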
 
If something is slow, profile it, find the bottlenecks and fix them.
BTW, this kind of work is a good exercise for a real performance-related
C++ job where one often needs to profile and optimize code. Plus, you
get some insight about what things are slow and what are fast, on your
current hardware at least.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 10 01:09PM

On Fri, 10 Feb 2017 01:43:45 -0600
> topping
 
> Well, if we don't find the letters "to" then we can eliminate ALL
> those words!
 
I haven't looked at your implementation and I may have misunderstood
what you mean (nor have I ever played boggle), but as I understand the
rules you have given, wouldn't either approach necessarily work that
way? So I don't see how this could be an improvement.
 
Take what you say was your original approach of "checking if what we
find on every node along our path is some word in our huge dictionary",
and take as an initial test point the simplest possible dictionary
implementation of a sorted std::vector and the most naive and
uncomplicated algorithm: to search for a word by picking a starting
square on the board, take the letter given (say 't') and then restrict
all further consideration of the dictionary for that starting square by
reference to the (already sorted) bounded sub-range of those words in
the dictionary beginning with 't'. You would then test all the possible
second and subsequent letters of the word against the dictionary
recursively by reference to ever more refined slices of the dictionary,
until you have found all the words in the dictionary beginning on the
chosen starting square; and then move on to the next starting square.
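As a sketch of that narrowing step (the names are mine, not code from this
thread), assuming a sorted std::vector<std::string> of lower-case words:
 
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
 
using Iter = std::vector<std::string>::const_iterator;
 
// Restrict [first, last) to the entries that start with the non-empty
// prefix built so far. An empty result means no dictionary word can begin
// this way, so that branch of the board search can be abandoned.
std::pair<Iter, Iter> narrow(Iter first, Iter last, const std::string& prefix)
{
    first = std::lower_bound(first, last, prefix);
    std::string upper = prefix;
    upper.back()++;               // smallest string greater than every "prefix..."
    last = std::lower_bound(first, last, upper);
    return {first, last};
}
 
Each recursive step calls narrow() on the sub-range returned by the
previous step, and a word is reported when the current string itself is
present in that sub-range.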
 
That is not to say that there aren't better algorithms: there almost
certainly are. Your idea of caching looks promising, so that already
checked sub-ranges can be reused for different starting squares. But as
I say, I may also have misunderstood the rules of the game that you
posted or the implementation that you developed.
 
Chris
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:34PM


>C++ is getting REALLY performance orientated. I used to get a job within
>2 days just by knowing what virtual inheritance is. Now, I don't really
>know how to "Get better"
 
Your solution is way overkill. As someone who has interviewed hundreds
of candidates for jobs that require C++ coding skills, I'd also be leery
of such a solution. There is no need for trees or fancy data structures
for a simple problem like this. I'm looking for programmers that think
out-of-the-box and craft efficient solutions.
 
This uses a rather clever technique to simplify the algorithm.
 
50 lines (more would be needed to read the character matrix dynamically):
 
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
 
const char matrix[7][6] = {
{ '@', '@', '@', '@', '@', '@' },
{ '@', 'F', 'G', 'G', 'Y', '@' },
{ '@', 'R', 'O', '6', '7', '@' },
{ '@', 'P', 'O', 'P', 'E', '@' },
{ '@', 'H', 'A', 'M', 'Z', '@' },
{ '@', '@', '@', '@', '@', '@' },
{ '\0', '\0', '\0', '\0', '\0', '\0' },
};
 
 
bool
checkchar(const char *cp, size_t row, size_t col)
{
    if (*++cp == '\0') return true;
 
    for (size_t r = row - 1; r <= (row + 1); r++) {
        for (size_t c = col - 1; c <= (col + 1); c++) {
            if ((r == row) && (c == col)) continue;
            if (matrix[r][c] == *cp) {
                if (checkchar(cp, r, c)) return true;
            }
        }
    }
    return false;
}
 
int
main(int argc, const char **argv, const char **envp)
{
    for (size_t arg = 1; arg < argc; arg++) {
        const char *cp = argv[arg];
        const char *mp = &matrix[0][0];
 
        while ((*mp != '\0') && (*mp != *cp)) mp++;
        if (*mp != '\0') {
            if (checkchar(cp, ((ptrdiff_t)mp - (ptrdiff_t)matrix)/6, ((ptrdiff_t)mp - (ptrdiff_t)matrix)%6)) {
                printf("match found for '%s'\n", argv[arg]);
            }
        }
    }
 
    return 0;
}
 
$ /tmp/abc GO FACE FROG FROGGY HAM HOPE POO POOP ZOO
match found for 'GO'
match found for 'FROG'
match found for 'FROGGY'
match found for 'HAM'
match found for 'HOPE'
match found for 'POO'
match found for 'POOP'
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 04:30PM +0100

On 10.02.2017 15:34, Scott Lurndal wrote:
> match found for 'HOPE'
> match found for 'POO'
> match found for 'POOP'
 
I like the way you think here, of just demonstrating that one
understands recursive search, and thus only producing a special case
answer. After all it was an interview question. And assuming a competent
evaluation this would give high score on "result-oriented".
 
Still, this code produces a false positive for "GOGO", and it fails to
find (i.e. it produces a false negative for) "G7".
 
And I think that would lower one's score on "attention to detail".
 
In the same direction, the unused and in C++ non-standard `envp`.
 
So, it's not entirely clear to me that this short clever code would
improve one's chances of being hired. Still it's impressive. I wish I
would have thunk of that :D .
 
 
Cheers!,
 
- Alf
scott@slp53.sl.home (Scott Lurndal): Feb 10 03:45PM

>evaluation this would give high score on "result-oriented".
 
>Still, this code produces a false positive for "GOGO", and it fails to
>find (i.e. it produces a false negative for) "G7".
 
Ah, missed the "not-yet-visited" clause for GOGO.
 
I'm not the interview candidate, and I whipped that out in
about 15 minutes. If I were doing production code, there
would be much more testing.
 
(I also missed the "Qu" -> "Q" transformation, but that's just
an additional line:
 
if (*mp != '\0') {
if ((*cp == 'q') && (*(cp+1) == 'u')) cp++;
 
 
>And I think that would lower one's score on "attention to detail".
 
>In the same direction, the unused and in C++ non-standard `envp`.
 
Forty years of habit are hard to break.
 
 
>So, it's not entirely clear to me that this short clever code would
>improve one's chances of being hired. Still it's impressive. I wish I
>would have thunk of that :D .
 
Like I said, if I were submitting it as an interview answer, it would
have been much more rigorously checked.
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 01:20PM -0600

On 2/10/2017 8:34 AM, Scott Lurndal wrote:
> match found for 'HOPE'
> match found for 'POO'
> match found for 'POOP'
 
I am not really able to follow the idea here. If it solves the problem in
as few lines as written, then it is wonderful, but I am finding it hard
to follow without a bird's-eye view of the "plan" in text.
 
I am looking at the math without knowing the answer it is trying to find.
"Christopher J. Pisz" <cpisz@austin.rr.com>: Feb 10 02:58PM -0600

On 2/10/2017 1:20 PM, Christopher J. Pisz wrote:
> little lines as written, then it is wonderful, but I am finding it hard
> to follow with that the bird's eye view of the "plan" in text.
 
> I am looking at the math without knowing the answer it is trying to find.
 
I can see we are iterating through the matrix as a one-dimensional
array, but we're checking every single letter of every single word in
the dictionary still, at least until a letter is rejected along the
path. So, while compact, this is not necessarily going to perform any
better, right?
scott@slp53.sl.home (Scott Lurndal): Feb 10 09:13PM

>the dictionary still, at least until a letter is rejected along the
>path. So, while compact, this is not necessarily going to perform any
>better, right?
 
The algorithm first scans the matrix for the first character of the
test string[*]. If not found, go to the next string. So worst case,
we look at each element in the matrix once. Best case, we look at
one single matrix entry to start the word.
 
If the first character of the test string is in the matrix, then
we check each of the 8 surrounding characters for a match against
the second character, and bail if not found. If found, recurse.
 
We never allocate or deallocate memory (which your algorithms do).
 
The algorithm needs a small update to not visit the same matrix
location twice for a given word while recursing.
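Something like this, as a sketch (not the posted code; it assumes the
matrix and headers from the program above, and the visited array is mine):
 
// Track which cells the current word has already used, so e.g. "GOGO"
// cannot reuse the same 'O'. Cleared on backtrack, so it needs no reset
// between words.
static bool visited[7][6];
 
bool
checkchar(const char *cp, size_t row, size_t col)
{
    if (*++cp == '\0') return true;
 
    visited[row][col] = true;                 // claim the current cell
    for (size_t r = row - 1; r <= (row + 1); r++) {
        for (size_t c = col - 1; c <= (col + 1); c++) {
            if (visited[r][c]) continue;      // also skips (row, col) itself
            if (matrix[r][c] == *cp && checkchar(cp, r, c)) {
                visited[row][col] = false;
                return true;
            }
        }
    }
    visited[row][col] = false;                // release on backtrack
    return false;
}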
 
With a matrix this small, the entire thing fits into the processor cache,
with a larger matrix, the prefetcher would kick in.
 
It would require a bit more space to handle multibyte UTF-8, UTF-16
or UTF-32 glyph data.
 
[*] A really good compiler would generate a SCASB instruction for this
on Intel systems.
scott@slp53.sl.home (Scott Lurndal): Feb 10 09:15PM

>the dictionary still, at least until a letter is rejected along the
>path. So, while compact, this is not necessarily going to perform any
>better, right?
 
The "trick" here (which I learned back in 1979) is to put a fence
around the matrix so the algorithm doesn't have to special case
cells at the borders.
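For a board read at run time, the same trick is only a few lines; a sketch
(my code, not from the post) using std::string rows:
 
#include <cstddef>
#include <string>
#include <vector>
 
// Copy an R x C board into an (R+2) x (C+2) grid pre-filled with '@', so
// neighbour checks at the edges never index out of range.
std::vector<std::string> add_fence(const std::vector<std::string>& board)
{
    const std::size_t rows = board.size();
    const std::size_t cols = rows ? board[0].size() : 0;
    std::vector<std::string> fenced(rows + 2, std::string(cols + 2, '@'));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            fenced[r + 1][c + 1] = board[r][c];
    return fenced;
}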
"Öö Tiib" <ootiib@hot.ee>: Feb 10 03:26AM -0800

On Friday, 10 February 2017 02:03:23 UTC+2, Christopher J. Pisz wrote:
> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
 
> Because of the copy construction on insert.
> The fellow did not seem to like that answer at all.
 
Yes, since the case where a large sequence gets inserts and erases in
the middle of it frequently enough to affect performance is quite rare.
When it is That Case, then it is very likely that a (maybe intrusive)
list or set will outperform all of the given contenders by a large
margin. When it is not That Case, then 'std::vector<Employee>' usually
wins.

> was silly.
 
> What differences do you guys know about the two ways of containing
> objects and why might you choose one over the other?
 
:) What *two* ways? I will list a few ways in which I imagine a "vector of
employees". Note that this is limited to vector, but we have a *lot* of other
containers. The assumption is that a vector in general is what we need, and
now there are the constraints:
 
1) A 'std::vector<std::unique_ptr<Employee>>' I would pick when there
are meant to be dynamically polymorphic classes derived from Employee
and the vector owns the Employees.
2) A 'std::vector<std::shared_ptr<Employee>>' I would pick when
ownership of an Employee is shared, for example when the same Employee
may be simultaneously in several of those vectors and none of them
is then the owner of it.
3) A 'std::vector<std::reference_wrapper<Employee>>' I would pick
when the Employees are owned by something other than that vector
and their lifetimes are known to last longer than that of the vector.
4) A 'std::vector<std::weak_ptr<Employee>>' I would pick when the
Employees are owned by something other than that vector and their
lifetimes may be shorter than the lifetime of the vector.
5) A 'std::vector<Employee*>' I can't imagine a situation where I would
pick it (but never say never).
6) A 'std::vector<Employee>' I would pick when the vector owns those
Employees and there is no dynamic polymorphism of Employees.
 
As you see, *none* of these are about performance. ;)
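For illustration only (the Employee/Manager shapes here are mine), cases 1)
and 6) side by side:
 
#include <memory>
#include <string>
#include <vector>
 
struct Employee {
    std::string name;
    virtual ~Employee() = default;   // virtual only because case 1) below deletes
                                     // through unique_ptr<Employee>; a plain
                                     // case-6) value type would not need it
};
struct Manager : Employee { int reports = 0; };
 
int main()
{
    std::vector<Employee> staff;                   // case 6): vector owns plain values
    staff.push_back(Employee{});
 
    std::vector<std::unique_ptr<Employee>> mixed;  // case 1): vector owns derived objects
    mixed.push_back(std::make_unique<Manager>());
}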
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 10 12:05PM

On Fri, 10 Feb 2017 03:26:05 -0800 (PST)
> > std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
 
> > Because of the copy construction on insert.
> > The fellow did not seem to like that answer at all.
[snip]
> 6) A 'std::vector<Employee>' I would pick when the vector owns those
> Employees and there are no dynamic polymorphism of Employees.
 
> Like you see *none* are about performance. ;)
 
All very reasonable, but if the interviewer was unimpressed with
Christopher's answer about how to "optimize some code snippet" and
raised the issue of "cache misses", he wanted to know Christopher's
views on designing for locality, and that is about performance. You
get the job by dealing with the issues put to you.
 
The "two" ways referred to by Christopher I read as meaning (i) hold by
value, and (ii) hold by reference. Once you have decided you are going
to hold by reference then all the issues you mention also become
relevant of course. I agree that if the Employee type is polymorphic
that forces your hand, but given that the code snippet Christopher was
given to optimize seemed (if I read his posting correctly) to involve a
std::vector<Employee> object in the context of optimization, I think one
can take it that it wasn't.
 
Chris
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:48PM


>I mentioned that depending on the size of the object a collection of
>pointers might be better.
 
>He asked about cache misses.
 
The effectiveness of caches (and TLBs) is dependent upon
the degree of locality of reference. A cache line on Intel
is 64 bytes (on some ARM64 processors it is 128 bytes).
 
Most processors have some internal HW prefetching algorithm in
the cache subsystem to anticipate future requests, and most of
those algorithms are stride-based. So, random pointer chasing is
likely to defeat the prefetching algorithms (software can alleviate
this somewhat by using prefetching instructions which are available on
both intel and arm64 processors to hint to the cache that some
line will be needed in the near future and the cache can load it
in anticipation asynchronously to the execution of the thread).
 
Walking through a vector or array of values will have a high cache
hit rate (close to 100% if the prefetcher is working well), while
chasing a pointer through memory will generally have a very low
hit rate because the processor cannot anticipate the next memory
reference.
 
If an instance of Employee required 256 bytes in memory, and it had
a 64-bit employee id field, for example, to walk through an array
of Employee objects looking for a specific employee id,
you'd load one cache line for the 64-byte region
containing the employee id. A stride-based prefetcher can anticipate
this and have the next cache line ready by the time you're looking at
the employee id field in the next sequential element. It can't
do this if you're walking a vector of pointers.
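As a sketch of the two layouts being contrasted (the field sizes are
illustrative, not from the post):
 
#include <cstdint>
#include <memory>
#include <vector>
 
struct Employee {
    std::uint64_t id;
    char payload[248];   // pads the object to roughly 256 bytes
};
 
// Contiguous values: sequential access, so the stride prefetcher can stay ahead.
bool contains_by_value(const std::vector<Employee>& v, std::uint64_t id)
{
    for (const Employee& e : v)
        if (e.id == id) return true;
    return false;
}
 
// Pointer chasing: each p->id may be a miss to an unpredictable cache line.
bool contains_by_pointer(const std::vector<std::unique_ptr<Employee>>& v, std::uint64_t id)
{
    for (const auto& p : v)
        if (p->id == id) return true;
    return false;
}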
 
A cache hit (L1) is about 3 cycles (1ns) on Skylake, a hit to L2 is
11 cycles (4ns at 3GHz) and a hit to L3 is about 40 cycles. A
miss at all three levels costs about 60 to 100ns to go to DRAM.
 
So a miss is about 33x slower than a hit at L1.
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:51PM

>> collection in there anyway.
 
>A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably
>not storing that much info per Employee.
 
That's L3 cache. L1 is about 32k, L2 is generally 256k per core, and L3
is shared by all cores.
 
It doesn't matter unless you only have one Employee. If you have 10,000
employees, and each employee object is 256 bytes, that is about 2.5 MB and
won't fit in L3 (even leaving aside code footprint and the other core(s)).
 
And the vast majority of real-world datasets don't fit in cache, not even
close. ThunderX has 24 MB of cache shared by 48 cores, for example, and
72 KB split I/D at level 1. Any interesting dataset will be much larger.
"Öö Tiib" <ootiib@hot.ee>: Feb 10 08:19AM -0800

On Friday, 10 February 2017 14:05:27 UTC+2, Chris Vine wrote:
> raised the issue of "cache misses", he wanted to know Christopher's
> views on designing for locality, and that is about performance. You
> get the job by dealing with the issues put to you.
 
I have been sort of self-employed for the past quarter of a century, so my
advice is likely questionable there. What one has to do in an interview is
perhaps to find out what the interviewer wants and to deliver that.
 
I generally ignore all such performance considerations until I have a
working thing that really does the whole of the actual work that was
needed AND I also have sets of actual data with which it does that.
It is typically only about 5% of product code that matters at all to
performance, so grinding the other 95% of it prematurely is a waste of
effort. Was he really offered such circumstances? I doubt it. Therefore
answer what the interviewer wants to hear, however nonsensical it sounds.
Later, sort out whether you want to work for that company or not.
 
 
> given to optimize seemed (if I read his posting correctly) to involve a
> std::vector<Employee> object in the context of optimization, I think one
> can take it that is wasn't.
 
These were not issues. It is just the nature of the data that I think
about when I pick one or another style of "vector of employees". It can
be that I don't know the nature so clearly yet; then I take
'std::vector<Employee>' until I find out. Code can be edited and will be
edited, and if I don't yet know the nature of the data clearly, then
there is no way I can ensure that it performs optimally.
Manfred <noname@invalid.add>: Feb 10 07:07PM +0100

On 02/10/2017 05:19 PM, Öö Tiib wrote:
> On Friday, 10 February 2017 14:05:27 UTC+2, Chris Vine wrote:
>> On Fri, 10 Feb 2017 03:26:05 -0800 (PST)
>> Öö Tiib <ootiib@hot.ee> wrote:
<snip>
>> raised the issue of "cache misses", he wanted to know Christopher's
>> views on designing for locality, and that is about performance. You
>> get the job by dealing with the issues put to you.
I think you also get the job by giving exhaustive answers. The points
above seem pretty much on topic to me.
 
<snip>
> until I find out. Code can be edited and will be edited and if I don't yet
> know nature of data clearly then there is no way how I can ensure that it
> performs optimally.
 
I agree that the first rationale for the choice is the nature of the data,
as you correctly indicated.
I may add that it follows from your list that you would pick case 6)
unless there is polymorphism or the vector does not own the Employees,
and that happens to be the best-performing solution (not surprisingly,
since in C++ the simplest solution usually turns out to be the
best-performing one as well).
scott@slp53.sl.home (Scott Lurndal): Feb 10 02:57PM


> (pow(FLT_RADIX, DBL_MANT_DIG) - 1)
 
>Should get close. That does have the disadvantage that it can't be
>computed at compile time. And resulting in a FP number.
 
With GCC you can do this as a compile-time constexpr.
 
e.g., something similar to:
 
#include <stdint.h> // uint64_t
 
constexpr
unsigned int log2(uint64_t n) { return (n<=1) ? 0 : (64-__builtin_clzll(n-1)); }
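For instance, what it computes (the ceiling of log2) can be checked at
compile time; a quick sketch assuming the definition above:
 
static_assert(log2(1) == 0, "");
static_assert(log2(8) == 3, "");
static_assert(log2(9) == 4, "");               // ceiling, not floor
static_assert(log2(1ull << 53) == 53, "");     // the DBL_MANT_DIG of IEEE double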
James Kuyper <jameskuyper@verizon.net>: Feb 10 01:06PM -0500

On 02/09/2017 07:52 PM, Stefan Ram wrote:
 
> 9007199254740991.
 
> is the largest double value that can be added to 1.0
> giving the "next" double value
 
Do you really mean "next double value" or "next integer"? The next
integer is 1 higher than Omega. The "next double value" will, in
general, be Omega+FLT_RADIX. I'm going to explain below why "next double
value" is problematic. Then I'll go ahead on the assumption that you
meant "next integer".
 
> omega for the type double that is required by the
> standard.
 
> Clearly, omega >= 0 and omega <= DBL_MAX.
 
You've defined Omega in terms of what happens when you add 1.0 to the
number. However, that will depend upon the rounding mode.
 
Consider the smallest integer for which the next integer has no
representation as a double; let's call it Omega1, to keep it distinct
from your Omega.
 
Omega1 and Omega1+FLT_RADIX will both be representable. If FLT_RADIX==2,
those values will both be equally far from the mathematical value of
Omega1 + 1. The C standard imposes NO requirements of its own on the
accuracy of such floating point expressions (5.2.4.2.2p6). None -
whatsoever. In particular, this means that a fully conforming
implementation of C is allowed to implement floating point math so
inaccurately that DBL_MAX-DBL_MIN < DBL_MIN - DBL_MAX.
However, if __STDC_IEC_559__ is pre#defined by the implementation, then
the requirements of annex F apply (6.10.8.3p1), which are essentially
equivalent to IEC 60559:1989, which is equivalent to ANSI/IEEE 754-1985
(F1p1). In that case, if FLT_RADIX == 2, and you add 1.0 to Omega1, the
result will be rounded to a value of either Omega1 or Omega1+2,
depending upon the current rounding mode. If the rounding mode is
FE_TONEAREST, FE_DOWNWARD, or FE_TOWARDZERO, then it will round to
Omega1, and that will be true of every value up to 2*Omega1, so 2*Omega1
will be the quantity you define as Omega. On the other hand, if the
rounding mode is FE_UPWARD, then Omega is the same as Omega1.
 
If you meant "next integer", that ambiguity disappears. Regardless of
rounding mode, Omega1 is the smallest integer to which you can add
1.0 without getting the value of the next integer - because, by
definition, that next integer cannot be represented.
 
Section 5.2.4.2.2 specifies a model for how floating point types are
represented. That model need not actually be followed, but the required
behavior of floating point operations, and the ranges of representable
values are specified in terms of that model; a floating point
representation for which that model is sufficiently bad might not have
any way of correctly implementing the standard's requirements. So I'll
restrict my discussion to implementations for which that model is
perfectly accurate.
The standard uses subscripts, superscripts, and Greek letters in its
description of the model, which don't necessarily display meaningfully
in a usenet message. I'll indicate subscripts and superscripts by
preceding them with _ and ^, respectively, and I'll replace the
summation with Sum(variable, lower limit, upper limit, expression).
 
> f_k nonnegative integers less than b (the significand digits)
 
> A floating-point number (x) is defined by the following model:
> x = s b^e Sum(k, 1, p, f_k b^-k) e_min <= e <= e_max
 
The term in that sum with the smallest value is the one for k==p. f_p
gets multiplied by both b^e and b^-p, and therefore by b^(e-p). The
lowest integer that cannot be represented exactly is one which would
require a precision of p+1 to be represented, because b^(e-p) is greater than
1. Therefore, Omega1 must have e == p+1, f_1 == 1, and f_k == 0 for
all other values of k. Therefore,
 
Omega1 = (+1) * b^(p+1) * 1*b^(-1) == b^p
 
== b/b^(1-p) == FLT_RADIX/DBL_EPSILON
 
Note: even on implementations that pre#define __STDC_IEC_559__, a
certain amount of inaccuracy is still permitted in both the
interpretation of floating point constants and in floating point
expressions. Just barely enough, in fact, to allow FLT_RADIX/DBL_EPSILON
to come out as Omega1-1. However, that shouldn't happen if those macros
are defined as hexadecimal floating point constants, which must be
converted exactly if they can be represented exactly.
 
The smallest permitted value for FLT_RADIX is 2, and the maximum
permitted value for DBL_EPSILON is 1e-9, so omega1 cannot be less than
2/1e-9. A value for DBL_EPSILON of exactly 1e-9 is not possible if
FLT_RADIX is 2; the closest you can get is with p=31, so DBL_EPSILON =
2^(-30) == 1.0/(1,073,741,824), so the smallest permitted value for
omega1 is 2/(2^-30) == 2,147,483,648.
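A quick numeric check of the IEEE-double case, as a sketch (it assumes the
default FE_TONEAREST rounding discussed above):
 
#include <cfloat>
#include <cstdio>
 
int main()
{
    const double omega1 = FLT_RADIX / DBL_EPSILON;   // b^p; 9007199254740992 == 2^53
    std::printf("omega1              = %.0f\n", omega1);
    std::printf("(omega1 - 1) + 1.0  = %.0f\n", (omega1 - 1) + 1.0);   // exact: omega1
    std::printf("omega1 + 1.0 == omega1 ? %d\n",
                (int)(omega1 + 1.0 == omega1));      // 1: 2^53 + 1 has no double
}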
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 03:59PM +0100

I took the fun-with-macros idea all the way to /a new dialect of/ C++,
with much of the operator notation replaced with readable words.
 
¹Tentative name: "Expressive C++".
 
The macros, which serve as pseudo keywords, all start with `$`, which is
not valid in standard C++.
 
I therefore believe that this logical namespace is entirely unused, as
of yet, but strangely it is supported by the major compilers such as
g++, Visual C++ and clang. That is, they all support using `$` in
identifiers. I couldn't resist trying to use the feature.
 
This code requires compiler support also for some other language
extensions that I believe are present in both g++, Visual C++ and clang:
 
• `$` in names.
• `#pragma once`.
• `#pragma push_macro` and `#pragma pop_macro`.
 
The latter two pragmas are for use of the extended language in headers
without imposing those macros on client code.
 
• • •
 
The below code for Christopher's bubble-board word generation problem is
a rewrite of the pure C++11 solution that I posted earlier and does not
include any of the Expressive C++ machinery. The machinery is quite a
bit, so best discussed in small pieces. But for now: `$fail`, which
throws an exception, adds the qualified function name at the start of
the exception message (nice touch, if I may say so myself), and
$start_with_arguments at the bottom of the code, invokes the specified
function with the `main` arguments, catches any exception, and reports
the exception message on the narrow standard error stream.
 
Also worth noting up front, `$items_of` is so dirty that I haven't had
the courage to include it in the Expressive set of macros. It's defined
at the top of the source file. However, while dirty, it's delicious.
 
[code]
#include <p/expressive/all.hpp>
$using_all_from_namespace( progrock::expressive );
 
#include <algorithm> // std::(is_sorted, remove, sort, transform)
#include <assert.h> // assert
#include <ctype.h> // tolower
#include <fstream> // std::ifstream
#include <iostream>
#include <iterator> // std::(begin, end)
#include <vector> // std::vector
#include <string> // std::string
 
#define $items_of( collection ) std::begin( collection ), std::end( collection )
 
namespace cppx {
$using_from_namespace( std, ostream, vector, runtime_error, string );
 
struct Position
{
Index x;
Index y;
};
 
inline $function operator<<( ref_<ostream> stream, ref_<const Position> pos )
-> ref_<ostream>
{ return stream << "{" << pos.x << ", " << pos.y << "}"; }
 
template< class Item >
class Matrix
{
private:
struct Wrapped_item { Item object; }; // Avoid problems with vector<bool>.
 
Size row_length_;
vector<Wrapped_item> items_;
 
$function index_for( Index x, Index y ) const
-> Index
{ return y*row_length_ + x; }
 
public:
$function size() const
-> Size
{ return row_length_; }
 
$function item( const Index x, const Index y ) const
-> Item
{ return items_[index_for( x, y )].object; }
 
$function item( const Index x, const Index y )
-> Item&
{ return items_[index_for( x, y )].object; }
 
Matrix(): Matrix{ 0 } {}
 
explicit Matrix( const Size size )
: row_length_( size )
, items_( size*size )
{}
 
};
 
inline $function to_lowercase( const char ch )
-> char
{ return $as<char>( tolower( $as<Byte>( ch ) ) ); }
 
inline $function starts_with( ref_<const string> prefix, ref_<const string> s )
-> bool
{ return s.substr( 0, prefix.length() ) == prefix; }
 
} // namespace cppx
 
namespace app {
$using_all_from_namespace( std );
$using_all_from_namespace( cppx );
 
$function load_dictionary( ref_<const string> file_path )
-> vector<string>
{
ifstream data{ file_path };
$hopefully( not data.fail() )
or $fail( "Failed to open dictionary `" + file_path + "`." );
 
vector<string> result;
string word;
while( getline( data, word ) )
{
result.push_back( move( word ) );
}
return result;
}
 
$function chars_from( string s )
-> string
{
$let end_of_nonspaces = remove( $items_of( s ), ' ' );
string chars{ begin( s ), end_of_nonspaces };
transform( $items_of( chars ), begin( chars ), to_lowercase );
return chars;
}
 
$function load_bubble_board( ref_<const string> file_path )
-> Matrix<char>
{
ifstream data{ file_path };
$hopefully( not data.fail() )
or $fail( "Failed to open dictionary `" + file_path + "`." );
 
string line;
string chars; // Whitespace removed.
getline( data, line )
or fail( "Unable to read first line of `" + file_path + "`." );
chars = chars_from( line );
$let n = n_items_in( chars );
Matrix<char> result{ 2 + n };
for( Index x = 0; x < n; ++x ) { result.item( x + 1, 1 ) = chars[x]; }
for( Index y = 1; y < n; ++y )
{
getline( data, line )
or $fail( "Unable to read line " + to_string( y + 1 ) +
" of `" + file_path + "`." );
chars = chars_from( line );
hopefully( n_items_in( chars ) == n )
or $fail( "Inconsistent row lengths in board data `" +
file_path + "`" );
for( Index x = 0; x < n; ++x ) { result.item( x + 1, y + 1 ) = chars[x]; }
}
return result;
}
 
using Dictionary_iter = vector<string>::const_iterator;
 
$procedure recursively_append_words_from(
const Dictionary_iter start_dict_range,
const Dictionary_iter end_dict_range,
ref_<const Matrix<char>> chars,
const Position position,
ref_<Matrix<bool>> visited,
ref_<string> word,
ref_<vector<string>> result
)
{
$let original_word_length = word.length();
$let ch = chars.item( position.x, position.y );
if( ch == 'q' ) { word += "qu"; } else { word += ch; }
$let it_dict_entry = lower_bound( start_dict_range, end_dict_range, word );
 
if( starts_with( word, *it_dict_entry ) )
{
$let length = word.length();
if( length >= 3 and length == it_dict_entry->length() )
{
result.push_back( word );
}
visited.item( position.x, position.y ) = true;
for( $each dy $in range( -1, +1 ) ) for( $each dx $in range( -1, +1 ) )
{
Position const new_position = {position.x + dx, position.y + dy};
if( not visited.item( new_position.x, new_position.y ) )
{
recursively_append_words_from(
it_dict_entry, end_dict_range, chars, new_position,
visited, word, result
);
}
}
visited.item( position.x, position.y ) = false;
}
word.resize( original_word_length );
}
 
$procedure append_words_from(
ref_<const vector<string>> dictionary,
ref_<const Matrix<char>> chars,
ref_<const Position> start,
ref_<vector<string>> result
)
{
string word;
$let n = chars.size();
Matrix<bool> visited{ n };
 
for( $each i $in i_up_to( n ) )
{
visited.item( i, 0 ) = true;
visited.item( 0, i ) = true;
visited.item( i, n - 1 ) = true;
visited.item( n - 1, i ) = true;
}
 
recursively_append_words_from(
$items_of( dictionary ), chars, start,
visited, word, result
);
}
 
$procedure cpp_main( ref_<const vector<string>> args )
{
$hopefully( n_items_in( args ) == 3 )
or $fail( "Usage: " + args[0] + " DICTIONARY BUBBLEBOARD" );
 
vector<string> const dictionary = load_dictionary( args[1] );
Matrix<char> const board = load_bubble_board( args[2] );
 
assert( is_sorted( $items_of( dictionary ) ) );
$let n = board.size();
vector<string> words;
for( $each y $in i_up_to( n ) ) for( $each x $in i_up_to( n ) )
{
append_words_from( dictionary, board, {x + 1, y + 1}, words );
}
 
sort( $items_of( words ) );
$let start_of_duplicates = unique( $items_of( words ) );
words.erase( start_of_duplicates, end( words ) );
for( $each word $in words )
{
cout << word << "\n";
}
}
} // namespace app
 
$start_with_arguments( app::cpp_main )
[/code]
 
 
• • •
 
The above code, plus the supporting machinery not shown here, was
compiled with MinGW g++ 6.3.0 and Visual C++ 2015 update 2.
 
I haven't tried clang, but I think clang requires an option in order
not to warn about the use of `$` in names.
 
Cheers!,
 
- Alf
 
Notes:
¹ I am aware that the name "Expressive C++" and its natural
abbreviations have been used for various things. E.g. "EC++" was/is (?)
a standard for limited C++ on embedded devices. "XC++" is some
mysterious Japanese C++ library project on SourceForge. And Eric
Niebler, author of the C++20 ranges library, has written something he's
called the Expressive C++ Series, but I don't know what that was.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 10 04:51PM +0100

On 10.02.2017 15:59, Alf P. Steinbach wrote:
> }
> return result;
> }
 
Oh, I forgot to fix the loops at the end there, to Expressive syntax:
 
 
$function load_bubble_board( ref_<const string> file_path )
-> Matrix<char>
{
ifstream data{ file_path };
$hopefully( not data.fail() )
or $fail( "Failed to open dictionary `" + file_path + "`." );
 
string line;
string chars; // Whitespace removed.
getline( data, line )
or fail( "Unable to read first line of `" + file_path + "`." );
chars = chars_from( line );
$let n = n_items_in( chars );
Matrix<char> result{ 2 + n };
for( $each x $in i_up_to( n ) ) { result.item( x + 1, 1 ) = chars[x]; }
for( $each y $in range( 1, n - 1 ) )
{
getline( data, line )
or $fail( "Unable to read line " + to_string( y + 1 ) +
" of `" + file_path + "`." );
chars = chars_from( line );
hopefully( n_items_in( chars ) == n )
or $fail( "Inconsistent row lengths in board data `" +
file_path + "`" );
for( $each x $in i_up_to( n ) ) { result.item( x + 1, y + 1 ) = chars[x]; }
}
return result;
}
 
As I recall from some years ago the generated machine code is the same.
 
So, this way of writing simple counting loops can be freely adopted even
without the Expressive `$`-keywords.
 
 
Cheers!,
 
- Alf
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
