soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

string_view problem - 12 Updates

Ralf Goertz <me@myprovider.invalid>: Jun 27 08:43AM +0200

Am Wed, 26 Jun 2019 12:11:57 -0000 (UTC)
> > tester=permuter+permuter.substr(0,permuter.size()-1); //*

> If you are aiming for the utmost extreme speed and optimization, lines
> like this should *immediately* ring alarm bells in your head.

I thought a little about that. While I can see your point, I remember
having heard (I am no professional programer, so I never properly learnt
C++) that every variable should be declared where it is first used.
Which somewhat conflicts with your statement:

> heavy algorithms is going to kill the efficiency. (There are
> situations where it simply can't be avoided, but if you can avoid it,
> you should.)

I think I also heard that compilers usually generate code that optimises
away all but the first allocation. In my case the size of permuter
doesn't change within the loop so the memory allocated in the first run
could simply be reused in the subsequent runs, no?

Juha Nieminen <nospam@thanks.invalid>: Jun 27 07:57AM

> having heard (I am no professional programer, so I never properly learnt
> C++) that every variable should be declared where it is first used.
> Which somewhat conflicts with your statement:

That's a principle relating to syntax, not relating to where objects
should be created. In old C variables needed to be declared at the
beginning of the function, even if they were only needed later in
the code. In other words, you had to write like:

void foo()
{
int i, j;

// lots of code here...

i = 5; j = 10;
// etc...
}

In C++, and in modern C, it's recommended for clarity to do it like:

void foo()
{
// lots of code here...

int i = 5, j = 10;
// etc...
}

Obviously the advise is not saying that if you have some buffer of fixed
size for calculations, and this buffer is used inside the innermost loop
of a heavy algorithm, you should create the buffer again and again inside
that inner loop.

> away all but the first allocation. In my case the size of permuter
> doesn't change within the loop so the memory allocated in the first run
> could simply be reused in the subsequent runs, no?

You are not only creating the 'tester' string in the inner loop, but you
are also creating temporary strings. These strings get destroyed at the
end of the scope where they are created. The compiler isn't going
to optimize that away.

Ralf Goertz <me@myprovider.invalid>: Jun 27 10:37AM +0200

Am Thu, 27 Jun 2019 07:57:22 -0000 (UTC)
> you are also creating temporary strings. These strings get destroyed
> at the end of the scope where they are created. The compiler isn't
> going to optimize that away.

So the following would be okay and the allocation is done just once?
Also, will it be be initialised as the constructor says although this is
not needed at all?

size_t const size=permuter.size();
do {
bool found(false);
boyer_moore_searcher b(permuter.begin(),permuter.end());
for (int i=0;i<result.size();++i) {
string tester(size*2-2,' '); //or vector<char>
copy(result[i].begin()+1,result[i].end(),tester.begin());
copy(result[i].begin() ,result[i].end()-1,tester.begin()+size-1);
if (b(tester.begin(),tester.end()).first!=tester.end()) {
found=true;
break;
}
}
…
} while (next_permutation(permuter.begin()+1,permuter.end()));

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 27 07:26AM -0400

On 6/27/19 3:57 AM, Juha Nieminen wrote:
> Ralf Goertz <me@myprovider.invalid> wrote:
...
>> I thought a little about that. While I can see your point, I remember
>> having heard (I am no professional programer, so I never properly learnt
>> C++) that every variable should be declared where it is first used.

That's not always possible, if the scope for the variable needs to be
larger than the scope in which the first use occurs.
The key requirement is that it be declared no later than first use, and
preferably as late as possible, consistent with the way it will be used.

> should be created. In old C variables needed to be declared at the
> beginning of the function, even if they were only needed later in
> the code. ...

That may have been true in the very earliest days of C, but it hasn't
been true since at least K&R C, which allowed you to declare objects at
the start of any compound statement, not just at the start of the
outermost compound statement of a function:

int func(in)
int in;
{
if(in)
{
int total;
...

C++ added the ability to intermingle declarations and statements, which
was later adopted by C99 as well. However, the freedom to place
declarations in positions other than the start of the function dates
back at least to K&R C.

...
> size for calculations, and this buffer is used inside the innermost loop
> of a heavy algorithm, you should create the buffer again and again inside
> that inner loop.

Actually, I would not hesitate to define a buffer as a C style array
inside a loop, if there's no need for the contents of that buffer to
carry over from one pass of the loop to another. I would normally expect
the compiler to simply reuse the same piece of memory during each pass
through the loop, at no extra cost.

This is particularly true if there's a need to zero initialize the
buffer between passes, which can be achieved by simply providing a {0}
initializer for the buffer, rather than having to call memset().

In C++, I would hesitate to define a standard container in that
location, because I couldn't rely on the compiler to figure out how to
optimize away the constructor and destructor calls that nominally must
occur at the start and end of each pass through the loop.

Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 05:49AM -0700

> My idea for the program was to go over all linear arrangements (with the
> first element fixed) and store those which were no circular permutations
> of any of the previously stored arrangements. [...]

This algorithm does a lot more work than is needed. The program
can be much faster (I ran a 14 14 input in about 0.65 seconds).

Hint: it isn't necessary to store any previous results to check
if next_permutation() gives a string none of whose rotations have
been seen before.

Manfred <noname@add.invalid>: Jun 27 04:15PM +0200

On 6/27/2019 1:26 PM, James Kuyper wrote:
> larger than the scope in which the first use occurs.
> The key requirement is that it be declared no later than first use, and
> preferably as late as possible, consistent with the way it will be used.

The "preferably" part is subject to debate, and I don't think it should
be stated as a general recommendation.
That is probably what you meant with "consistent with the way it will be
used", but it doesn't sound that clear.
Some experienced programmers consider the habit of intermingling
declarations and statements as sloppy practice, and this is not just a
matter of being old fashioned.

As an example, when I have to implement an algorithm of some sort, I
take care to identify the data elements that compose the state of the
logic involved.
Managing a component's state requires proper discipline, so I consider
such state data best declared at the top of the context where the
algorithm is defined (which may well be a nested scope), so as to make
clear that it belongs to a whole which is key to the functioning of the
following procedure.

Ralf Goertz <me@myprovider.invalid>: Jun 27 04:52PM +0200

Am Thu, 27 Jun 2019 05:49:37 -0700
> > of the previously stored arrangements. [...]

> This algorithm does a lot more work than is needed. The program
> can be much faster (I ran a 14 14 input in about 0.65 seconds).

What number of permutations did you get?

> Hint: it isn't necessary to store any previous results to check
> if next_permutation() gives a string none of whose rotations have
> been seen before.

I am surprised to hear that. How else do I know that when I get to the
permutation ABBA in the example above, it is the same as AABB so that I
must discard it? The example also shows that the equivalence classes are
in general not of the same size which is the very reason the formula is
rather complicated. (The reference I gave corrected an erroneous
previous attempt by another author.) 15 years ago I thought a lot about
circular permutations and I still don't think I have an intuitive
understanding of them. At that time I lacked the ability (and the
computer power) to enumerate them. 14 14 wouldn't have sufficed anyway.
The goal was to prove that for certain inputs there where permutations
without neighbouring elements of the same category. The smallest
relevant example (which is [probably] the only such example where there
is no such permutation) is 3 1 1. I ended up proving the theorem for
infinitely many (but not all) inputs with a completely different
approach. But the idea of coming to grips with the problem by looking at
the actual permutations somehow stuck.

So I am not only surprised that there is a much simpler algorithm but
also very impressed that you found it with apparent ease. Regrettably,
your hint doesn't help me enough. So could you please elaborate?

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 27 08:59AM -0700

On Thursday, June 27, 2019 at 10:15:55 AM UTC-4, Manfred wrote:
> > preferably as late as possible, consistent with the way it will be used.

> The "preferably" part is subject to debate, and I don't think it should
> be stated as a general recommendation.

I do.

> That is probably what you meant with "consistent with the way it will be
> used",

I don't think so.
The main thing I meant by that comment is that each variable must be
declared in a location that gives it sufficient scope to be referred to
by name in all of the parts of the code that need to refer to it by
name. That requirement sometimes constrains how close a declaration can
be to the point of first use. That isn't the issue you're talking about.

> Some experienced programmers consider the habit of intermingling
> declarations and statements as sloppy practice, and this is not just a
> matter of being old fashioned.

I know that they do, and I disagree. I think that declaring an
identifier as close as possible to the point of first use makes it
easier to find and understand, and minimizes the opportunities for
conflict with other identifiers. Some people feel that a different
location, such as the top of the function, would make it easier to find,
but I haven't seen a reasonable explanation for that feeling.

Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 09:57AM -0700

> where the algorithm is defined (which may well be a nested scope),
> so as to make clear that it belongs to a whole which is key to the
> functioning of the following procedure.

I agree with your sentiments. The policy of declaring variables
"preferably as late as possible" is overly simplistic, and over
sold.

Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 10:16AM -0700

> to the permutation ABBA in the example above, it is the same as
> AABB so that I must discard it? [...] Regrettably, your hint
> doesn't help me enough. So could you please elaborate?

Consider looking at ABBA. It has four rotations, which is to say
the original plus three non-zero rotations:

ABBA
BBAA
BAAB
AABB

The last of these comes lexicographically before ABBA. Therefore
we must have seen AABB earlier, and ABBA is a duplicate.

(The "earlier" in the above statement relies on next_permutation()
giving its results in lexicographic order, but if what we're doing
is counting them the order doesn't matter as long as we eventually
get every permutation - we will count the "smallest" rotations,
and skip the rest, in whatever order they might appear.)

Ralf Goertz <me@myprovider.invalid>: Jun 27 08:10PM +0200

Am Thu, 27 Jun 2019 10:16:17 -0700
> is counting them the order doesn't matter as long as we eventually
> get every permutation - we will count the "smallest" rotations,
> and skip the rest, in whatever order they might appear.)

Great, thanks! I still need 2.7 seconds for 14 14, but that was not
doable before. Maybe you've got some insight as to why I still seem to
be much slower than you (assuming our hardware is comparable).

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <string>

using namespace std;

template<typename T> bool has_smaller_permutation(T const &v) {
static T rotator(v.size(),' ');
for (auto i=1;i<v.size();++i) {
copy(v.begin(),v.begin()+i,copy(v.begin()+i,v.end(),rotator.begin()));
if (rotator<v) return true;
}
return false;
}

int main(int argc, char *argv[]) {
vector<char> permuter;
//string permuter;
char c='A';
for (auto i=1;i<argc;++i) {
auto n=stoi(argv[i]);
permuter.insert(permuter.end(),n,c++);
}
if (!permuter.size()) {
cout<<0<<endl;
return 0;
}
vector<string> result;
size_t sum=0;
do {
if (!has_smaller_permutation(permuter)) {
//copy(permuter.begin(),permuter.end(),ostream_iterator<char>(cout,"")); cout<<'\n';
++sum;
}
} while (next_permutation(permuter.begin()+1,permuter.end()));
cout<<sum<<endl;
}

Keith Thompson <kst-u@mib.org>: Jun 27 12:06PM -0700

> On 6/27/19 3:57 AM, Juha Nieminen wrote:
[...]
> was later adopted by C99 as well. However, the freedom to place
> declarations in positions other than the start of the function dates
> back at least to K&R C.
[...]

Yes. Historical note: In the 1975 C Reference Manual, declarations
are allowed only at the top of a function definition. Compound
statements cannot contain declarations. K&R1, 1978, redefined
compound statements to allow zero or more declarations followed by
zero or more statements.

https://www.bell-labs.com/usr/dmr/www/cman.pdf

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Thursday, June 27, 2019

Digest for comp.lang.c++@googlegroups.com - 12 updates in 1 topic

No comments:

Blog Archive

About Me