soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

Elementary string handling -- trying to optimise big O - 8 Updates
size_t - 5 Updates
Need to #include <string>? - 5 Updates
Need to #include <string>? - 7 Updates

Elementary string handling -- trying to optimise big O

Paul <pepstein5@gmail.com>: Jun 05 09:10AM -0700

I am writing code to compress string representations by representing (for example) "aaa" by a3. The details of the problem, together with my solution, are provided below. I'm a bit concerned about my Append functions. The reason I wrote them this way, rather than simply using the normal library function, append, is that push_back guarantees amortized constant time (I think), meaning that my version of append is linear in the length of the appended word. In contrast to this, I couldn't find any guarantee that standard append is any better than being linear in the length of the concatenated word. Is this reasoning sound? Many thanks for your help.

Paul

#include <string>
// Implementing a method to perform basic string compression using the counts
// of repeated characters. For example, the string aabcccccaaa would become
// a2b1c5a3. If the compressed string would not become smaller than the original
// string, return the original string. You can assume the string has only upper
// and lower case letters (a - z).

void Append(string& str1, const string& str2)
{
for(auto c: str2)
str1.push_back(c);
}

void Append(string& str1, int x)
{
Append(str1, to_string(x));
}

string compression(string input)
{
const int inputSize = input.length();

if(inputSize <= 2)
return input;

string results;
int repeated = 1; // Counting occurrences of each character

for(int i = 1; i < inputSize; ++i)
if(input[i] == input[i-1])
++repeated;
else
{
results.push_back(input[i-1]);
Append(results, repeated);
repeated = 1;
}

results.push_back(input[inputSize - 1]);
Append(results, repeated);

return results.length() < inputSize ? results : input;

}

Paavo Helde <myfirstname@osa.pri.ee>: Jun 05 11:54AM -0500

Paul <pepstein5@gmail.com> wrote in

> I am writing code to compress string representations by representing
> (for example) "aaa" by a3.

So, how can you tell later if the original string was "aaa" or "a3"? Oh,
I see, it's always a char and a count, so "abc" gets "compressed" to
"a1b1c1". OK, fine, but how do you uncompress "a111b5"? It might be
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbb" or "a1bbbbb".

> couldn't find any guarantee that standard append is any better than
> being linear in the length of the concatenated word. Is this
> reasoning sound? Many thanks for your help.

Regardless of standardise, your Append() is most probably slower than
library string append. Your Append() is pushing each char separately,
meaning that for each char std::string needs to check for free buffer
space and update the string length field. Standard append does these
things only once for the appended string.

But as always with optimizations, there is no other way to find out the
real truth than to measure it. Please post back your speed testing
results here, it would be interesting to know such things.

> Append(str1, to_string(x));
> }

> string compression(string input)

If you are concerned about speed, this should be const string& input.

> results.push_back(input[inputSize - 1]);
> Append(results, repeated);

> return results.length() < inputSize ? results : input;

Oh, I see you still cannot tell if "a3" should be uncompressed to "aaa"
or "a3" ;-)

> }

Cheers
Paavo

Paavo Helde <myfirstname@osa.pri.ee>: Jun 05 12:12PM -0500

Paavo Helde <myfirstname@osa.pri.ee> wrote in
>> {
>> Append(str1, to_string(x));
>> }

Another note: It appears the string append is called only once per each
to_string() function call. If this undisclosed to_string() indeed does
conversion to the decimal string representation (equiv to printf("%d")),
then you can probably forget any concerns about the speed of your Append()
- the int-to-string conversion is a bit heavyweight operation and will
probably dominate in the timings. Your profiler will tell you more exactly.

hth
Paavo

Luca Risolia <luca.risolia@linux-projects.org>: Jun 05 07:21PM +0200

Il 05/06/2015 18:10, Paul ha scritto:
> to this, I couldn't find any guarantee that standard append is any
> better than being linear in the length of the concatenated word. Is
> this reasoning sound?

Theory is one thing, reality is (often) another thing. For example,
std::vector is known to be preferable than std::list in many practical
cases where complexity theory would otherwise suggest using the latter.
As a general rule, I suggest that you at least measure the performance
of your standard library implementation against concrete use cases
before considering reinventing it.

Paavo Helde <myfirstname@osa.pri.ee>: Jun 05 12:50PM -0500

Paul <pepstein5@gmail.com> wrote in
> (for example) "aaa" by a3. The details of the problem, together with

> string compression(string input)
> {
[...]

> return results.length() < inputSize ? results : input;
> }

It is interesting to note that this approach attempts to defeat the
pigeonhole principle and compress all inputs to at least the same size than
the original. However, this is mathematically not possible (assuming
lossless compression). See:

http://en.wikipedia.org/wiki/Pigeonhole_principle#Uses_and_applications

So, it seems some more work is needed with the design of this compression
algorithm ;-)

Cheers
Paavo

Mel <mel@zzzzz.com>: Jun 05 09:49PM +0300

On Fri, 05 Jun 2015 12:50:03 -0500, Paavo Helde
> Paul <pepstein5@gmail.com> wrote in
> news:11bd72d2-fdbf-4447-9bf0-a32078836aa9@googlegroups.com:

> > I am writing code to compress string representations by
representing
> > (for example) "aaa" by a3. The details of the problem, together
with

> > }

> It is interesting to note that this approach attempts to defeat the
> pigeonhole principle and compress all inputs to at least the same
size than
> the original. However, this is mathematically not possible
(assuming
> lossless compression). See:

Question is: is this compession or hashing :)))

--
Press any key to continue or any other to quit

Paul <pepstein5@gmail.com>: Jun 05 11:51AM -0700

On Friday, June 5, 2015 at 6:50:18 PM UTC+1, Paavo Helde wrote:
> algorithm ;-)

> Cheers
> Paavo

Thanks, Paavo

You make a number of interesting points. I'll consider some of them in turn. Uncompression is uniquely determined by the constraint, given in the comments, that inputs only contain chars that are letters of the English alphabet. Furthermore, supposing that we ignore this constraint and thereby conclude that some strings can be uncompressed in different ways. I don't understand why this would be any type of problem -- we would simply have a non-injective function. Nobody said that we wanted to uncompress strings.

Excellent point about the function signature -- I was careless here.
I don't understand why you object to this code:

return results.length() < inputSize ? results : input;

If you believe that it sometimes results in a runtime error, please can you give inputs that cause incorrect output? If you believe that this is inefficient, please can you explain why?

Thank You,

Paul

Paul <pepstein5@gmail.com>: Jun 05 11:55AM -0700

On Friday, June 5, 2015 at 7:49:25 PM UTC+1, Mel wrote:
> (assuming
> > lossless compression). See:

> Question is: is this compession or hashing :)))

I am trying to implement the above function and "compression" is merely the name I am choosing to give the function. If I called the function "hashing" or "elephant" that wouldn't have changed my question. I don't understand the computer science theory of this at all, but I'm reading CLR on algorithms so they might explain this.

Paul

size_t

Doug Mika <dougmmika@gmail.com>: Jun 05 11:25AM -0700

I found this on a discussion group, and it said that the following will cause an infinite loop, but why?

for (size_t i = strlen (str) - 1; i >= 0; i--){
...
}

Wouter van Ooijen <wouter@voti.nl>: Jun 05 08:33PM +0200

Doug Mika schreef op 05-Jun-15 om 8:25 PM:

> for (size_t i = strlen (str) - 1; i >= 0; i--){
> ...
> }

siez_t is likely to be (I dunno, it might even be required to be) an
unsigned type. An unsigned variable will always be >= 0.

Wouter

scott@slp53.sl.home (Scott Lurndal): Jun 05 06:39PM

>for (size_t i = strlen (str) - 1; i >= 0; i--){
> ...
>}

size_t is an unsigned type, it can never be negative.

Doug Mika <dougmmika@gmail.com>: Jun 05 11:39AM -0700

On Friday, June 5, 2015 at 1:33:30 PM UTC-5, Wouter van Ooijen wrote:

> siez_t is likely to be (I dunno, it might even be required to be) an
> unsigned type. An unsigned variable will always be >= 0.

> Wouter

But of course, for the loop to stop we need to fall below 0, which is not going to be possible with size_t as our data type...I should have looked at it closer.

Mel <mel@zzzzz.com>: Jun 05 09:50PM +0300

On Fri, 5 Jun 2015 11:25:45 -0700 (PDT), Doug Mika
> I found this on a discussion group, and it said that the following
will cause an infinite loop, but why?

> for (size_t i = strlen (str) - 1; i >= 0; i--){
> ...
> }

size_t is unsigned...

--
Press any key to continue or any other to quit

Need to #include <string>?

ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 10:50AM

The program

#include <iostream>
#include <ostream>
using namespace ::std::literals;
int main()
{ ::std::cout <<( "a"s + "b"s )<< "\n"s;
::std::cout <<( "a"s < "b"s )<< "\n"s; }

prints

ab
1

here. But I am correct when I suspect that it is
actually wrong, because it actually needs to

#include <string>

?

ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 11:00AM

>{ ::std::cout <<( "a"s + "b"s )<< "\n"s;
> ::std::cout <<( "a"s < "b"s )<< "\n"s; }

And without the »+« and »<«, is »#include <string>«
already required for »::std:cout << "\n"s«?

ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 03:59PM

>>#include <string>
>>?
>Yes, you should include all the headers you need.

And /do/ I need the header »<string>« for

::std::cout << ""s;

? After all, AFAIK, I don't need »<cstring>« for

::std::cout << "";

. (And what about »""s+""s«?)

ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 05:36PM

>- the int-to-string conversion is a bit heavyweight operation and will
>probably dominate in the timings. Your profiler will tell you more exactly.

#include <iostream> // ::std::cout
#include <ostream> // <<
#include <string>

static constexpr char const * string[] =
{ "0", "1", "2", "3", "4", "5", "6" };

static inline char const * to_string
( int const i )
{ return i < 7 ? string[ i ] :
::std::to_string( i ).c_str(); }

static void test( ::std::ostream & out )
{ out << to_string( 0 )<< '\n';
out << to_string( 1 )<< '\n';
out << to_string( 2 )<< '\n';
out << to_string( 20 )<< '\n';
out << to_string( 40 )<< '\n'; }

int main(){ test( ::std::cout ); }

ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 06:38PM

>for (size_t i = strlen (str) - 1; i >= 0; i--){
> ...
>}

Works fine here and prints »alpha«.

#include <stdio.h>
#include <string.h>

int main()
{ char * str = "ahpla";
for (size_t i = strlen (str) - 1; i >= 0; i--){
if( i > 9999 )break;
printf( "%c", i[ str ] ); } puts( "" ); }

Need to #include <string>?

Victor Bazarov <v.bazarov@comcast.invalid>: Jun 05 07:43AM -0400

On 6/5/2015 7:00 AM, Stefan Ram wrote:
>> ::std::cout <<( "a"s < "b"s )<< "\n"s; }

> And without the »+« and »<«, is »#include <string>«
> already required for »::std:cout << "\n"s«?

The 's' suffix is user-defined, isn't it? It's some kind of language
(library) extension or did I miss it in the Standard somewhere? If your
code that has the 's' suffix compiles OK without explicit inclusion of
<string>, why are you concerned with '+' needing it? Seems evident to
me that in order to use the 's' suffix you either rely on the indirect
inclusion of it, or do it explicitly...

V
--
I do not respond to top-posted replies, please don't ask

Louis Krupp <lkrupp@nospam.pssw.com.invalid>: Jun 05 06:03AM -0600

On Fri, 05 Jun 2015 07:43:54 -0400, Victor Bazarov
><string>, why are you concerned with '+' needing it? Seems evident to
>me that in order to use the 's' suffix you either rely on the indirect
>inclusion of it, or do it explicitly...

I found the 's' suffix here:

http://en.cppreference.com/w/cpp/language/user_literal

<string> is required, it seems, but if it happens to be #included by
<iostream> or <ostream>, then the program compiles without explicit
inclusion. It's not a good idea, though; it would be too easy for
another programmer to move the 'cout' somewhere else, remove the
<ostream> and <stream> #includes, and then be surprised when the
remaining code doesn't compile.

Louis

Daniel <danielaparker@gmail.com>: Jun 05 05:12AM -0700

On Friday, June 5, 2015 at 7:44:07 AM UTC-4, Victor Bazarov wrote:

> The 's' suffix is user-defined, isn't it? It's some kind of language
> (library) extension or did I miss it in the Standard somewhere?

It's in standard C++14, converts a character array literal to basic_string.

Daniel

Victor Bazarov <v.bazarov@comcast.invalid>: Jun 05 08:57AM -0400

On 6/5/2015 8:12 AM, Daniel wrote:

>> The 's' suffix is user-defined, isn't it? It's some kind of language
>> (library) extension or did I miss it in the Standard somewhere?

> It's in standard C++14, converts a character array literal to basic_string.

OK, thanks. So, since 'basic_string' is a template declared in
<string>, it must be included to provide the conversion, so there is no
surprise that operator+ is also resolved, right?

V
--
I do not respond to top-posted replies, please don't ask

"Öö Tiib" <ootiib@hot.ee>: Jun 05 07:23AM -0700

On Friday, 5 June 2015 15:57:31 UTC+3, Victor Bazarov wrote:

> OK, thanks. So, since 'basic_string' is a template declared in
> <string>, it must be included to provide the conversion, so there is no
> surprise that operator+ is also resolved, right?

It is all about

Bo Persson <bop@gmb.dk>: Jun 05 05:40PM +0200

On 2015-06-05 12:50, Stefan Ram wrote:
> actually wrong, because it actually needs to

> #include <string>

> ?

Yes, you should include all the headers you need.

On the other hand, any standard C++ header is allowed to include any
other standard C++ header, so sometimes the header you need happens to
be included anyway.

Bo Persson

Victor Bazarov <v.bazarov@comcast.invalid>: Jun 05 12:31PM -0400

On 6/5/2015 11:59 AM, Stefan Ram wrote:
>> Yes, you should include all the headers you need.

> And /do/ I need the header »<string>« for

> ::std::cout << ""s;

You need it for

""s

AFAICT, regardless of how you use it.

> ? After all, AFAIK, I don't need »<cstring>« for

> ::std::cout << "";

No, this is part of <ostream>, included by <iostream>.

> . (And what about »""s+""s«?)

Since to use

""s

requires <string>, anything else is moot, isn't it?

V
--
I do not respond to top-posted replies, please don't ask

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Friday, June 5, 2015

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

No comments:

Blog Archive

About Me