soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

Concatenating strings efficiently - 13 Updates
My C++ synchronization objects library was updated - 1 Update
Concatenating strings efficiently - 1 Update
Value of s[s.size()] when s is a string - 3 Updates

Paul <pepstein5@gmail.com>: Oct 17 02:43AM -0700

A popular coding-interview question is copy-pasted below.

String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).

At the end of the post, I will put my ultra-naive C++ solution. The standard
way of approaching this problem (which will be tackled by users of other
languages, too) assumes that operations like += are inefficient because
s+="12" involves moving everything in the string.
The conventional wisdom is therefore that you need to do something more
sophisticated like creating a stringBuilder class.
But aren't basic string operations like "+=" optimised for efficiency in C++?

Is there anything wrong with my solution below?
Thanks,

Paul

#include <iostream>
#include <string>
// Run length encoding as in CTCI
// Return a string which represents
// the number of times each letter occurred in a run.

std::string encode(const std::string& str)
{
std::string result;
if(str.empty())
return result;

char prev = str[0];
int counter = 1;
for(int i = 1; i < str.size(); ++i)
if(str[i] == prev)
++counter;
else
{
result += prev;
result += std::to_string(counter);
prev = str[i];
counter = 1;
}
result += prev;
result += std::to_string(counter);

return result;
}

int main() // Some basic tests which all pass.
{
std::cout << encode("a") << "\n" << encode ("aaa") << "\n" << encode ("aabbc");

}

Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 17 10:39AM

On Wed, 2018-10-17, Paul wrote:
> inefficient because s+="12" involves moving everything in the
> string. The conventional wisdom is therefore that you need to do
> something more sophisticated like creating a stringBuilder class.

There's one already: std::ostringstream.

> But aren't basic string operations like "+=" optimised for
> efficiency in C++?

Yes, in the sense that it's almost never something to worry about.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paul <pepstein5@gmail.com>: Oct 17 04:37AM -0700

On Wednesday, October 17, 2018 at 11:39:41 AM UTC+1, Jorgen Grahn wrote:

> > But aren't basic string operations like "+=" optimised for
> > efficiency in C++?

> Yes, in the sense that it's almost never something to worry about.

Hi Jorgen,

Thanks for your reply.
But should ostringstream be used or is my solution fine?
Is the text wrong for suggesting that this is a problem.
Admittedly, the main audience is java coders but it should
at least say something like "If you're using C++, the naive
solution works perfectly fine, as is."
Here's what the text says about a java solution that basically
translates my solution to java.

The runtime is O(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in O( n^2) time (see StringBuilder on pg 89).
We can fix this by using a StringBuilder. ...

Paul

"Öö Tiib" <ootiib@hot.ee>: Oct 17 05:19AM -0700

On Wednesday, 17 October 2018 14:37:20 UTC+3, Paul wrote:
> translates my solution to java.

> The runtime is O(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in O( n^2) time (see StringBuilder on pg 89).
> We can fix this by using a StringBuilder. ...

The string::operator+= is implemented using
string::append not using operator+(string,string)
like the naive document seems to claim. If you want to
optimize away any mid-way allocations then do
result.reserve(str.size()*2) first since increasing
by factor of 2 is what that "packing" usually
achieves.

"Öö Tiib" <ootiib@hot.ee>: Oct 17 05:47AM -0700

On Wednesday, 17 October 2018 15:19:26 UTC+3, Öö Tiib wrote:
> result.reserve(str.size()*2) first since increasing
> by factor of 2 is what that "packing" usually
> achieves.

Forgot to mention that this approach (reserve +
appends) usually beats all those ostringstream
solutions by some margin.

Paul <pepstein5@gmail.com>: Oct 17 06:29AM -0700

On Wednesday, October 17, 2018 at 1:19:26 PM UTC+1, Öö Tiib wrote:
> result.reserve(str.size()*2) first since increasing
> by factor of 2 is what that "packing" usually
> achieves.

In case we attack the "naive document" too strongly,
the context is that this is a programming problem to be solved
in any language. The majority of people would choose java.
So the claim about needing a StringBuilder is aimed at java
coders. Is it still wrong? Why would java and C++ have different
+= functions?

Paul

Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 17 01:39PM

On Wed, 2018-10-17, Paul wrote:

>> Yes, in the sense that it's almost never something to worry about.

> Thanks for your reply.
> But should ostringstream be used or is my solution fine?

I like ostringstreams because they're streams, and I like Unix
pipelines. If your function read from an istream and wrote to an
ostream, it would be useful not only for short strings but for
gigabytes of data.

I also prefer ostream formatting to foo + std::to_string(bar) which
smells like Java and doesn't let you control the formatting of bar.

On the other hand, if you just need to concatenate three strings into
a fourth string, it's clearer to write a + b + c than to create an
ostringstream, stream into it, and pull out the resulting string.

> sequences. It's slow because string concatenation operates in O(
> n^2) time (see StringBuilder on pg 89). We can fix this by using a
> StringBuilder. ...

I think Tiib answered this. AFAICT, that text is about Java, and you
cannot apply it to programming in general.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Bo Persson <bop@gmb.dk>: Oct 17 04:58PM +0200

On 2018-10-17 15:29, Paul wrote:
> So the claim about needing a StringBuilder is aimed at java
> coders. Is it still wrong? Why would java and C++ have different
> += functions?

Why would they not be different?

A C++ std::string is mutable, so can be modified in place. If it has
enough extra room at the end, appending a (small) string can be
extremely fast.

And replacing "aaa" with the shorter "a3" can always be done in place,
without any reallocation.

A Java string cannot be modified once created, so we need a separate
StringBuilder to build strings.

Bo Persson

Manfred <noname@add.invalid>: Oct 17 05:19PM +0200

On 10/17/2018 2:19 PM, Öö Tiib wrote:
> result.reserve(str.size()*2) first since increasing
> by factor of 2 is what that "packing" usually
> achieves.

In fact that is the worst case length of the encoded string.

boltar@cylonhq.com: Oct 17 03:25PM

On Wed, 17 Oct 2018 16:58:00 +0200
>A Java string cannot be modified once created, so we need a separate
>StringBuilder to build strings.

The more I find out about Java the more I wonder what Gosling was smoking
when he "designed" it.

legalize+jeeves@mail.xmission.com (Richard): Oct 17 05:19PM

[Please do not mail me a copy of your followup]

Paul <pepstein5@gmail.com> spake the secret code

>Is there anything wrong with my solution below?

No. It's fine because it passes your tests and it's not worth
optimizing further unless your profiling shows this to be a
bottleneck.

As an exercise, I took your problem and wrote an alternative solution
just for the fun of it. I'm not saying mine is "better", just an
alternative. I had fun writing it, thanks for the problem idea!

#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>

std::string encode(const std::string & str)
{
std::string result{str};
size_t dest = 0;
int counter = 0;
for (size_t source = 0; source < str.size(); ++source)
{
if (str[source] == str[dest])
{
if (counter == 0)
{
result[dest++] = str[source];
}
++counter;
}
else
{
if (counter > 1)
{
std::string count = std::to_string(counter);
std::copy(count.begin(), count.end(), &result[dest]);
dest += count.size();
counter = 1;
}
result[dest++] = str[source];
}
}
if (counter > 1)
{
std::string count = std::to_string(counter);
std::copy(count.begin(), count.end(), &result[dest]);
dest += count.size();
}
if (dest < str.size())
{
result.erase(dest);
}
return result;
}

int main()
{
assert(std::string{"a"} == encode("a"));
assert(std::string{"a3"} == encode("aaa"));
assert(std::string{"a2b2c"} == encode("aabbc"));
assert(std::string{"a3bc3d"} == encode("aaabcccd"));
}
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

"Öö Tiib" <ootiib@hot.ee>: Oct 17 11:15AM -0700

On Wednesday, 17 October 2018 16:29:23 UTC+3, Paul wrote:
> So the claim about needing a StringBuilder is aimed at java
> coders. Is it still wrong? Why would java and C++ have different
> += functions?

Because Java does not have keywords to denote immutability. Therefore
in Java StringBuilder is a mutable sequence of characters and String
is immutable sequence of characters. You assume that since Java needs
other class but String for efficiency concatenating strings then C++
should have same problem.

That is not the case, std::string does it alone and already very
efficiently. In C++ std::string is mutable sequence of characters
and const std::string is immutable sequence of characters.

Additionally C++17 added std::string_view that can be used to
refer to any string, character array or sub-part of such. When
using it with string literals it can be compile-time constant.
So from C++ job candidate who claims senior level I might ask to
implement a program that does such "compression" compile-time.
Such request would not make even sense in Java. ;)

"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Oct 17 02:34PM -0700

On 10/17/2018 2:43 AM, Paul wrote:
> But aren't basic string operations like "+=" optimised for efficiency in C++?

> Is there anything wrong with my solution below?
> Thanks,
[snip code]

That works fine. Since we can assume the string only has upper and lower
case (a - z), we can store abcdxxxxxefgh as abcdx5efgh instead of:
a1b1c1d1x5e1f1g1h1.

The decoder can check for bytes that are out-of-range, perhaps
isdigit(). If a byte is outside the range, then it is reading a
character count.

My C++ synchronization objects library was updated

Horizon68 <horizon@horizon.com>: Oct 17 09:39AM -0700

Hello..

My C++ synchronization objects library was updated,
now it is much more stable and fast.

You can download the new updated version from:

https://sites.google.com/site/scalable68/c-synchronization-objects-library

Thank you,
Amine Moulay Ramdane.

Concatenating strings efficiently

ram@zedat.fu-berlin.de (Stefan Ram): Oct 17 04:37PM

>A Java string cannot be modified once created, so we need a separate
>StringBuilder to build strings.

Instances of the class »java.lang.String« can be
concatenated with not StringBuilder in sight.

Main.java

public final class Main
{ public static void main( final java.lang.String[] args )
{ final java.lang.String a = "a";
final java.lang.String b = "b";
final java.lang.String ab = a + b;
java.lang.System.out.println( ab ); }}

transcript

ab

Value of s[s.size()] when s is a string

Juha Nieminen <nospam@thanks.invalid>: Oct 17 07:33AM

> AFAIK C++-strings are not guaranteed to be zero-terminated without
> calling c_str(). I.e. when you do s[s.size()] you may be access the
> string out of bounds.

If you take a raw pointer to the contents with s.data() or &s[0] then
you are correct: It's not guaranteed to be a null terminated raw string.

However, the question was about s[s.size()], which is different.

boltar@cylonhq.com: Oct 17 08:55AM

On Tue, 16 Oct 2018 09:40:30 -0700 (PDT)

>Can you post how your preprocessor implementation of
>std::initializer_list looks like and how it behaves in similar
>situation?

I've never found a use for constexpr in real world code so that fact that
it can't be emulated in C is irrelevant to me and auto is dynamic typing and
if you want that use a scripting language, I hate it. It saves the coder a few
seconds but makes debugging a pain. As for complex type or function parameter
initialisation, C99 introduced compound literals.

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 17 10:50AM +0100

On Wed, 17 Oct 2018 07:33:54 -0000 (UTC)
> > string out of bounds.

> If you take a raw pointer to the contents with s.data() ... then
> you are correct: It's not guaranteed to be a null terminated raw string.

std::string::data() does terminate the string with null from C++11
onwards, so there seems to be no difference from std::string::c_str().
This change probably arises because, from C++11 onwards,
std::string::size() must be of constant complexity so null termination
won't have any significant effect on efficiency.

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Wednesday, October 17, 2018

Digest for comp.lang.c++@googlegroups.com - 18 updates in 4 topics

No comments:

Blog Archive

About Me