Monday, November 19, 2018

Digest for comp.lang.c++@googlegroups.com - 24 updates in 5 topics

Paul <pepstein5@gmail.com>: Nov 19 06:47AM -0800

I've begun a previous thread about a dictionary problem.
Basically we want the longest word that can be obtained by
choosing among a given set of characters.
So one idea is to sort the dictionary and we want to try the
longest words first. However, if a word is so long that it
contains too many characters then I want to ignore it.
So I came up with this sorting mechanism which does work when I tested
it but it seems problematic because of being potentially non-associative.
I know that that's not complete code but I'm interested in whether my
sorting predicate is bug-vulnerable.
Many thanks.
 
Paul
 
 
std::vector<int> characterCounts = charCounts(arrayOfLetters);
std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
// Test the longest words first but not the words that are too long.
const int length = arrayOfLetters.size();
auto cmp = [length](const std::string& word1, const std::string& word2)->bool{return word1.size() <= length && word2.size() > length || word1.size() > word2.size();};
std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);
Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 19 11:00PM

On Mon, 2018-11-19, Paul wrote:
> it but it seems problematic because of being potentially non-associative.
> I know that that's not complete code but I'm interested in whether my
> sorting predicate is bug-vulnerable.
...
 
(Formatted for readability)
 
> return word1.size() <= length && word2.size() > length
> || word1.size() > word2.size();
> };
 
Side note: my compiler gives me a warning about the precedence
between || and &&. I happily admit I had to look it up; I've never
needed to memorize that particular precedence rule.
 
Enable and pay attention to compiler warnings.
 
I tried to understand it, but failed. I don't know if it's safe; my
gut feeling is it's not. As you seem to know, it's easy to get a
sorting predicate subtly wrong and have the code break in subtle but
catastrophic way.
 
I'd try to solve the problem in a way which doesn't need an unusual
predicate. Where "unusual" means roughly anything but
std::lexicographical_compare(f(a), f(b)), or whatever the syntax is.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 18 10:47PM -0500

Chris M. Thomasson wrote:
 
> Hardcore nests of pain. Fwiw, sometimes I use a very simplistic method of
> interfaces in C:
 
> https://pastebin.com/raw/f52a443b1
That is actually similar to how kernel drivers are programmed (although
indirecttion via vtable is not used often). What's sometimes useful (and often
abused) is that with this approach, the programmer can alter vtable or function
pointers in the struct at runtime at will as opposed to when using C++
polymorphism with virtual functions -- where these pointers are changed by
implementation at some fixed points for no useful reason, but just to annoy
people and hurt performance :-( .
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 19 01:06AM -0800

On Friday, November 16, 2018 at 11:00:51 PM UTC-5, Chris M. Thomasson wrote:
> Hardcore nests of pain. Fwiw, sometimes I use a very simplistic method
> of interfaces in C:
 
> https://pastebin.com/raw/f52a443b1
 
I see this in there:
 
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};
 
In fp_write(), what's the difference between:
 
void* const
void const*
 
?
 
--
Rick C. Hodgin
David Brown <david.brown@hesbynett.no>: Nov 19 11:19AM +0100

On 19/11/18 10:06, Rick C. Hodgin wrote:
> };
 
> In fp_write(), what's the difference between:
 
> void* const
 
"void * const" says that the /parameter/ (unnamed here, but it will have
a name in the definition of such a function) is a "const" and cannot be
modified by the function. The parameter type is a pointer-to-void
(pointer to anything, if you like).
 
So we have:
 
void test1(void * const p) {
p++;
}
 
Error - increment of read-only pointer. (I think incrementing a void*
pointer is a gcc extension anyway, but that is beside the point here.)
 
But this is okay:
 
void test2(void * const p) {
int * q = (int *) p;
*q = 42;
}
 
 
> void const*
 
"void const *" says that the parameter is a pointer to a "void const" -
we don't know the type we are pointing to, but we are not supposed to
change it via the pointer.
 
Thus:
 
void test3(void const * p ) {
int * q = (int *) p;
*q = 42;
}
 
is highly questionable, and gcc gives "warning: initialisation discards
'const' qualifier from pointer target type". It is not an error - it is
allowed behaviour if the original thing pointed to was not declared as
"const". But it is likely to be wrong, and likely to be contrary to the
expectations of the code using test3, so a warning is good.
 
 
Declaring a parameter itself to be "const", as in the first "void*
const", is very strange for normal pass-by-value parameters, since it is
basically useless. (It is useful and common for reference parameters).
It is also inconsistent between the parameters in the code you showed.
So I'd assume there is a mistake here somewhere.
 
 
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 19 05:15AM -0800

On Monday, November 19, 2018 at 5:19:27 AM UTC-5, David Brown wrote:
> }
 
> Error - increment of read-only pointer. (I think incrementing a void*
> pointer is a gcc extension anyway, but that is beside the point here.)
 
Why wouldn't it be written as:
 
const void*
 
To indicate it's a pointer to void, and it's a constant? Is
that different than void* const?
 
--
Rick C. Hodgin
Juha Nieminen <nospam@thanks.invalid>: Nov 19 01:26PM


> const void*
 
> To indicate it's a pointer to void, and it's a constant? Is
> that different than void* const?
 
"const void*" (or "void const*") is a pointer to a value that can't
be modified. The pointer itself can be modified to point somewhere else.
 
"void *const" is a pointer to a value that *can* be modified, but
the pointer itself can't be modified to point somewhere else.
 
It can get more complicated than that. Consider all of these, which
are different types:
 
int **p1;
int **const p2;
int *const *p3;
int const **p4;
int *const *const p5;
int const **const p6;
int const *const *p7;
int const *const *const p8;
 
It helps when you read the declaration from the right to the left.
For example:
 
int *const p1;
 
can be read as: "p1 is a const pointer to an int"
 
int const *p2;
 
can be read as: "p2 is a pointer to a const int"
 
(You can also read "const int *p2;", which means the same thing, as
"p2 is a pointer to an int that's const".)
 
Thus, for example:
 
int **const p2;
 
can be read as: "p2 is a const pointer to a pointer to an int".
 
int const *const *const p8;
 
can be read as: "p8 is a const pointer to a const pointer to a const int".
David Brown <david.brown@hesbynett.no>: Nov 19 02:55PM +0100

On 19/11/18 14:15, Rick C. Hodgin wrote:
 
> const void*
 
> To indicate it's a pointer to void, and it's a constant? Is
> that different than void* const?
 
They are different.
 
"const void *" is a pointer to a const void - a pointer to something
with unspecified type, but which you can't use to modified the thing
pointed to. (It's easier to describe "const int *", for example, which
is a pointer to a const int - i.e., you can't use the pointer to modify
the pointed-to int.)
 
"const void *" is the same as "void const *". Some people like to write
"const" after the thing it affects (which can be done consistently, even
for pointers), others like to write it before the thing it affects
(which makes more sense to read, but can't be done for pointers).
 
It takes a little practice to see what this all means, but you can get
the hang of it. If types get really complicated, I always prefer to use
typedef's to break them up. And adding a const qualifier to the
parameter itself (when not using references) is rarely helpful - it does
not affect the calling function at all, but makes the types harder to
read. So I would never write "void * const" as a parameter type.
woodbrian77@gmail.com: Nov 19 09:46AM -0800

On Monday, November 19, 2018 at 7:56:05 AM UTC-6, David Brown wrote:
 
> "const void *" is the same as "void const *". Some people like to write
> "const" after the thing it affects (which can be done consistently, even
> for pointers),
 
See:
http://slashslash.info/2018/02/a-foolish-consistency/
http://slashslash.info/petition
 
for more info.
 
 
> It takes a little practice to see what this all means, but you can get
> the hang of it. If types get really complicated, I always prefer to use
> typedef's to break them up.
 
Consider also a using statement:
https://stackoverflow.com/questions/10747810/what-is-the-difference-between-typedef-and-using-in-c11
 
 
 
Brian
Ebenezer Enterprises - In G-d we trust.
https://github.com/Ebenezer-group/onwards
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 19 12:36PM -0800

On Monday, November 19, 2018 at 8:26:54 AM UTC-5, Juha Nieminen wrote:
 
> can be read as: "p2 is a const pointer to a pointer to an int".
 
> int const *const *const p8;
 
> can be read as: "p8 is a const pointer to a const pointer to a const int".
 
Thank you, Juha! Right-to-left is helpful.
 
--
Rick C. Hodgin
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 19 12:37PM -0800

On Monday, November 19, 2018 at 8:56:05 AM UTC-5, David Brown wrote:
> parameter itself (when not using references) is rarely helpful - it does
> not affect the calling function at all, but makes the types harder to
> read. So I would never write "void * const" as a parameter type.
 
I appreciate the explanation, David. Makes sense.
 
--
Rick C. Hodgin
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 19 02:56PM -0800

On 11/19/2018 5:55 AM, David Brown wrote:
>>>> void* const
 
>>> "void * const" says that the /parameter/ ... is a "const" and cannot be
>>> modified by the function. The parameter type is a pointer-to-void...
[...]
> parameter itself (when not using references) is rarely helpful - it does
> not affect the calling function at all, but makes the types harder to
> read. So I would never write "void * const" as a parameter type.
 
Fwiw, I put the const in the interface to attempt to get the point
across that this parameter shall never be mutated by a function. Sort of
a contract, documented within the API itself. The self parameters in my
code are basically akin to the this parameter in C++. The value of the
this pointer shall never be changed, just like the value of the self
pointer shall never be changed.
 
;^)
"Öö Tiib" <ootiib@hot.ee>: Nov 19 12:18AM -0800

On Saturday, 17 November 2018 23:20:47 UTC+2, Alf P. Steinbach wrote:
 
> That's very wrong, pure disinformation.
 
> A function defined inside a class is implicitly `inline`, yes.
 
> However, a freestanding function template is not, and a `constexpr` is not.
 
The constexpr has been implicitly inline all the time. [dcl.constexpr]
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 19 01:15PM +0100

On 19.11.2018 09:18, Öö Tiib wrote:
 
>> A function defined inside a class is implicitly `inline`, yes.
 
>> However, a freestanding function template is not, and a `constexpr` is not.
 
> The constexpr has been implicitly inline all the time. [dcl.constexpr]
 
No.
 
The paragraph you refer to does not support your claim.
 
Since you fail to provide a quote I must /guess/, and I think this is
the part you focused on, from C++17 §10.15.1/1:
 
"A function or static data member declared with the constexpr
specifier is implicitly an inline function or variable"
 
That does not cover ordinary `constexpr` variables.
 
 
Cheers & hth.,
 
- Alf
Juha Nieminen <nospam@thanks.invalid>: Nov 19 01:09PM

> That does not cover ordinary `constexpr` variables.
 
For that you would need an "extern constexpr" variable, which I don't
think would make any sense.
 
On that note, I have for long puzzled about the behavior and distinction
between a const integral and, for example, a const floating point value.
A "const int" that's initialized with a compile-time constant is (and AFAIK
has always been) pretty much equivalent to the modern "constexpr int".
In other words, it can be used wherever a compile-time constant is required.
However, "const double" is not (nowadays you would have to use
"constexpr double" for that).
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 19 05:07PM +0100

On 19.11.2018 14:09, Juha Nieminen wrote:
>> That does not cover ordinary `constexpr` variables.
 
> For that you would need an "extern constexpr" variable, which I don't
> think would make any sense.
 
No, you don't need an `extern` variable to see the difference. You can
take the address of a `constexpr` variable. That's all you need to have
an observable difference between inline and not.
 
 
> In other words, it can be used wherever a compile-time constant is required.
> However, "const double" is not (nowadays you would have to use
> "constexpr double" for that).
 
I don't understand it either.
 
But the arguments I've seen have had to do with cross-compilation and
how the compiler, in such a case, would be unreasonably required to
emulate every target system in order to produce 100% perfect values.
Though no explanation of why such perfect values would be required. Or
why, then, one could use e.g. the templated constant trick in C++03.
 
 
Cheers!,
 
- Alf
woodbrian77@gmail.com: Nov 19 08:53AM -0800

On Monday, November 19, 2018 at 6:15:39 AM UTC-6, Alf P. Steinbach wrote:
 
> "A function or static data member declared with the constexpr
> specifier is implicitly an inline function or variable"
 
> That does not cover ordinary `constexpr` variables.
 
In that case, it seems like it would be better if the
standard said:
 
A function or static data member declared with the constexpr
specifier is implicitly an inline function or static data member.
 
 
Brian
Ebenezer Enterprises
http://webEbenezer.net
Paul <pepstein5@gmail.com>: Nov 19 02:19AM -0800

Below is my solution to finding the longest word in a dictionary given
a collection of letters. For more (but perhaps still not total) clarity,
please see my code below. It's an interview question and interviewers
don't usually give total clarity either.
I would love to see what people think of my solution. In particular
is it better to use a vector or a set for the sorted container
(or doesn't it matter?)
[BTW, this solution wasn't actually presented at the interview itself,
so even if it's good it doesn't mean I got the job].
 
Many thanks.
 
// Practising the common interview question of finding a longest word in a given dictionary
// with all characters in a given array of letters. The array may contain duplicate characters.
// In that case, the number of times that a char, c, can occur = the number of times
// c is duplicated in the array
#include <string>
#include <unordered_set>
#include <vector>
#include <iostream>
#include <algorithm>
#include <set>
 
// First step is to acquire the character counts in the array of letters.
std::vector<int> charCounts(const std::string& arrayOfLetters)
{
std::vector<int>counts(256);
for(char c : arrayOfLetters)
++counts[c];
 
return counts;
}
 
std::string LongestWord(const std::string& arrayOfLetters, const std::unordered_set<std::string>& Dictionary)
{
// Preprocessing of array of letters
const std::vector<int>& characterCounts = charCounts(arrayOfLetters);
// Test the longest words first
auto cmp = [](const std::string& word1, const std::string& word2)->bool{return word1.size() > word2.size();};
 
const std::set<std::string, decltype(cmp)> sortedDictionary(Dictionary.begin(), Dictionary.end(), cmp);
// A vector solution would have been as below. Should I have done this?
/*std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);*/
 
 
// Sorted by length so we only need to find the first word that works.
for(const std::string& w : sortedDictionary)
{
std::vector<int> charPositions(256);
bool wordWorks = true;
for(char c : w)
if(++charPositions[c] > characterCounts[c])
{
wordWorks = false;
break;
}
if(wordWorks)
return w;
}
 
std::cout << "\No words in the dictionary satisfied the conditions";
return "";
}
 
// Testing code -- "acce" is returned which is correct.
// Needs more tests, though.
int main()
{
std::unordered_set<std::string> words = {"acceca", "ace", "acce", "accce" };
std::cout << LongestWord("acecad", words);
}
Paul <pepstein5@gmail.com>: Nov 19 02:26AM -0800

On Monday, November 19, 2018 at 10:19:51 AM UTC, Paul wrote:
> std::unordered_set<std::string> words = {"acceca", "ace", "acce", "accce" };
> std::cout << LongestWord("acecad", words);
> }
 
Funny how you always see instant problems as soon as you post something.
Big stylistic issue in the above in that I don't use continue; which
is the tailor-made tool. Will change this.
Please comment if you see anything else.
 
Paul
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 19 11:21AM

> a collection of letters. For more (but perhaps still not total) clarity,
> please see my code below. It's an interview question and interviewers
> don't usually give total clarity either.
 
I would have assumed you were to read a dictionary from a file. Having
a dictionary explicit in the code does not strike me as a likely
interview question.
 
> I would love to see what people think of my solution. In particular
> is it better to use a vector or a set for the sorted container
> (or doesn't it matter?)
 
I would not use a container at all, since I would be reading a file of
words (you can find many such lists on the Web). My guess is that the
interviewer would like to see that you do this without storing the
dictionary in memory.
 
 
> // Practising the common interview question of finding a longest word in a given dictionary
> // with all characters in a given array of letters. The array may
> contain duplicate characters.
 
"with all" sounds odd. The longest word with them all is a word that is
the length of the array, so the only test in the program will be if
there is such a word, but...
 
> // In that case, the number of times that a char, c, can occur = the number of times
> // c is duplicated in the array
 
... "can occur" clearly indicates that not all occurrences must be
matched.
 
I know this is a quibble about words, but interviewers are likely to
care about comments too.
 
<snip code>
--
Ben.
Juha Nieminen <nospam@thanks.invalid>: Nov 19 01:16PM

> Below is my solution to finding the longest word in a dictionary given
> a collection of letters.
 
The easiest solution to this, which at first glance might look like a rather
naive beginner solution, but which is, in fact, relatively efficient (at
least for this kind of task), is to simply go through each word in the
dictionary and see if you can form that word with those given letters.
(The most efficient way to do *that* is to sort the letters of the word
and compare it to the given letters (which have been themselves sorted
as a preprocessing step). I think you can figure out how to make this
comparison so that what's being checked is "can this word be formed
by using letters from this group of letters?")
 
This might not be the absolutely fastest way of doing it that's
possible, but in most cases it's efficient enough.
David Brown <david.brown@hesbynett.no>: Nov 19 08:56AM +0100

On 18/11/18 20:48, Ben Bacarisse wrote:
> version. Thanks for that tip.
 
> (Yet another way in which benchmarking code like this is increasingly
> complicated.)
 
It is not just benchmarking code like this that is complicated - it is
writing maximally efficient code that is complicated. For this kind of
thing, there is no universal "best" method. My preference for such
cases is to use compiler-specific features like __builtin_popcount()
where possible (with fall-backs to generic code if you want more
portability). And then tell the compiler as much as you can about the
target - using -march flags with gcc, and similar flags on other
compilers.
 
The important point is not to try to figure out clever tricks in your
own code, but to work /with/ the compiler as best you can. "Clever"
tricks have a tendency to work against the compiler, perhaps relying on
undefined behaviour (though not in this case as far as I have noticed)
or limit its ability to do wider optimisations such as constant
propagation. The place for clever tricks these days is in the compiler,
rather than in the user code.
 
(Of course, sometimes you play with clever tricks for fun.)
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 19 11:32AM

> portability). And then tell the compiler as much as you can about the
> target - using -march flags with gcc, and similar flags on other
> compilers.
 
That probably matches your typical usage. For software that gets built
using a generic compile (because a moderately generic binary package is
being built) you'll end up with a slowish popcount function (at least
with the examples I tried).
 
> or limit its ability to do wider optimisations such as constant
> propagation. The place for clever tricks these days is in the compiler,
> rather than in the user code.
 
That is generally true, and probably true in this case as well because a
factor of two (as here) is not much of a price to pay in most software
that will use a popcount function. It's fast even when it's not the
fastest.
 
There's also the problem that you might be writing code that can't
assume a particular compiler. It's then not going to be 100% clear what
working with the compiler really means.
 
> (Of course, sometimes you play with clever tricks for fun.)
 
Ack.
 
--
Ben.
David Brown <david.brown@hesbynett.no>: Nov 19 12:57PM +0100

On 19/11/18 12:32, Ben Bacarisse wrote:
> using a generic compile (because a moderately generic binary package is
> being built) you'll end up with a slowish popcount function (at least
> with the examples I tried).
 
Yes, on my typical usage I know exactly what cpu the code will run on,
and can tune precisely.
 
For PC software that is delivered in source form, then this will also
work - it is up to the user to have flags suitable for their system.
(You could put -march=native in makefiles, which is the ideal choice for
people using software on the same system as they use to build it, but
makes it less portable to other systems.)
 
For software delivered as binaries, it's a different matter. The best
you can usually do is guess a minimal requirement for typical users.
 
And if you have code that is particularly performance sensitive, and
particularly affected by the exact processor capabilities (such as
POPCNT instruction or SIMD versions), then you can use gcc "function
multiversioning" and the "target_clones" function attribute to
automatically generate several versions of a function, tuned to
different processors. This adds a layer of indirection to the call,
which spoils some of the benefits of having a fast popcnt instruction -
it is best used on a function that does a bit more work, such as one
that calls popcount() in a loop. But it is simple to use:
 
__attribute__((target_clones("default,popcnt")))
int popcount(unsigned int x) {
return __builtin_popcount(x);
}
 
Then you just use "popcount()" as a normal function in your code, and
you'll get either a generic version (using a lookup table, I think) or a
popcnt instruction if your cpu supports it.
 
 
> There's also the problem that you might be writing code that can't
> assume a particular compiler. It's then not going to be 100% clear what
> working with the compiler really means.
 
I would say "working with the compiler" means something like :
 
#if defined(__GNUC__)
// Optimised version using gcc extensions
#elif defined(_MSC_VER)
// Optimised version using MSVC extensions
#elif ...
#else
// Fallback generic version

No comments: