- Concatenating strings efficiently - 13 Updates
- My C++ synchronization objects library was updated - 1 Update
- Concatenating strings efficiently - 1 Update
- Value of s[s.size()] when s is a string - 3 Updates
Paul <pepstein5@gmail.com>: Oct 17 02:43AM -0700 A popular coding-interview question is copy-pasted below. String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z). At the end of the post, I will put my ultra-naive C++ solution. The standard way of approaching this problem (which will be tackled by users of other languages, too) assumes that operations like += are inefficient because s+="12" involves moving everything in the string. The conventional wisdom is therefore that you need to do something more sophisticated like creating a stringBuilder class. But aren't basic string operations like "+=" optimised for efficiency in C++? Is there anything wrong with my solution below? Thanks, Paul #include <iostream> #include <string> // Run length encoding as in CTCI // Return a string which represents // the number of times each letter occurred in a run. std::string encode(const std::string& str) { std::string result; if(str.empty()) return result; char prev = str[0]; int counter = 1; for(int i = 1; i < str.size(); ++i) if(str[i] == prev) ++counter; else { result += prev; result += std::to_string(counter); prev = str[i]; counter = 1; } result += prev; result += std::to_string(counter); return result; } int main() // Some basic tests which all pass. { std::cout << encode("a") << "\n" << encode ("aaa") << "\n" << encode ("aabbc"); } |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 17 10:39AM On Wed, 2018-10-17, Paul wrote: > inefficient because s+="12" involves moving everything in the > string. The conventional wisdom is therefore that you need to do > something more sophisticated like creating a stringBuilder class. There's one already: std::ostringstream. > But aren't basic string operations like "+=" optimised for > efficiency in C++? Yes, in the sense that it's almost never something to worry about. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
Paul <pepstein5@gmail.com>: Oct 17 04:37AM -0700 On Wednesday, October 17, 2018 at 11:39:41 AM UTC+1, Jorgen Grahn wrote: > > But aren't basic string operations like "+=" optimised for > > efficiency in C++? > Yes, in the sense that it's almost never something to worry about. Hi Jorgen, Thanks for your reply. But should ostringstream be used or is my solution fine? Is the text wrong for suggesting that this is a problem. Admittedly, the main audience is java coders but it should at least say something like "If you're using C++, the naive solution works perfectly fine, as is." Here's what the text says about a java solution that basically translates my solution to java. The runtime is O(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in O( n^2) time (see StringBuilder on pg 89). We can fix this by using a StringBuilder. ... Paul |
"Öö Tiib" <ootiib@hot.ee>: Oct 17 05:19AM -0700 On Wednesday, 17 October 2018 14:37:20 UTC+3, Paul wrote: > translates my solution to java. > The runtime is O(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in O( n^2) time (see StringBuilder on pg 89). > We can fix this by using a StringBuilder. ... The string::operator+= is implemented using string::append not using operator+(string,string) like the naive document seems to claim. If you want to optimize away any mid-way allocations then do result.reserve(str.size()*2) first since increasing by factor of 2 is what that "packing" usually achieves. |
"Öö Tiib" <ootiib@hot.ee>: Oct 17 05:47AM -0700 On Wednesday, 17 October 2018 15:19:26 UTC+3, Öö Tiib wrote: > result.reserve(str.size()*2) first since increasing > by factor of 2 is what that "packing" usually > achieves. Forgot to mention that this approach (reserve + appends) usually beats all those ostringstream solutions by some margin. |
Paul <pepstein5@gmail.com>: Oct 17 06:29AM -0700 On Wednesday, October 17, 2018 at 1:19:26 PM UTC+1, Öö Tiib wrote: > result.reserve(str.size()*2) first since increasing > by factor of 2 is what that "packing" usually > achieves. In case we attack the "naive document" too strongly, the context is that this is a programming problem to be solved in any language. The majority of people would choose java. So the claim about needing a StringBuilder is aimed at java coders. Is it still wrong? Why would java and C++ have different += functions? Paul |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 17 01:39PM On Wed, 2018-10-17, Paul wrote: >> Yes, in the sense that it's almost never something to worry about. > Thanks for your reply. > But should ostringstream be used or is my solution fine? I like ostringstreams because they're streams, and I like Unix pipelines. If your function read from an istream and wrote to an ostream, it would be useful not only for short strings but for gigabytes of data. I also prefer ostream formatting to foo + std::to_string(bar) which smells like Java and doesn't let you control the formatting of bar. On the other hand, if you just need to concatenate three strings into a fourth string, it's clearer to write a + b + c than to create an ostringstream, stream into it, and pull out the resulting string. > sequences. It's slow because string concatenation operates in O( > n^2) time (see StringBuilder on pg 89). We can fix this by using a > StringBuilder. ... I think Tiib answered this. AFAICT, that text is about Java, and you cannot apply it to programming in general. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
Bo Persson <bop@gmb.dk>: Oct 17 04:58PM +0200 On 2018-10-17 15:29, Paul wrote: > So the claim about needing a StringBuilder is aimed at java > coders. Is it still wrong? Why would java and C++ have different > += functions? Why would they not be different? A C++ std::string is mutable, so can be modified in place. If it has enough extra room at the end, appending a (small) string can be extremely fast. And replacing "aaa" with the shorter "a3" can always be done in place, without any reallocation. A Java string cannot be modified once created, so we need a separate StringBuilder to build strings. Bo Persson |
Manfred <noname@add.invalid>: Oct 17 05:19PM +0200 On 10/17/2018 2:19 PM, Öö Tiib wrote: > result.reserve(str.size()*2) first since increasing > by factor of 2 is what that "packing" usually > achieves. In fact that is the worst case length of the encoded string. |
boltar@cylonhq.com: Oct 17 03:25PM On Wed, 17 Oct 2018 16:58:00 +0200 >A Java string cannot be modified once created, so we need a separate >StringBuilder to build strings. The more I find out about Java the more I wonder what Gosling was smoking when he "designed" it. |
legalize+jeeves@mail.xmission.com (Richard): Oct 17 05:19PM [Please do not mail me a copy of your followup] Paul <pepstein5@gmail.com> spake the secret code >Is there anything wrong with my solution below? No. It's fine because it passes your tests and it's not worth optimizing further unless your profiling shows this to be a bottleneck. As an exercise, I took your problem and wrote an alternative solution just for the fun of it. I'm not saying mine is "better", just an alternative. I had fun writing it, thanks for the problem idea! #include <algorithm> #include <cassert> #include <iostream> #include <string> std::string encode(const std::string & str) { std::string result{str}; size_t dest = 0; int counter = 0; for (size_t source = 0; source < str.size(); ++source) { if (str[source] == str[dest]) { if (counter == 0) { result[dest++] = str[source]; } ++counter; } else { if (counter > 1) { std::string count = std::to_string(counter); std::copy(count.begin(), count.end(), &result[dest]); dest += count.size(); counter = 1; } result[dest++] = str[source]; } } if (counter > 1) { std::string count = std::to_string(counter); std::copy(count.begin(), count.end(), &result[dest]); dest += count.size(); } if (dest < str.size()) { result.erase(dest); } return result; } int main() { assert(std::string{"a"} == encode("a")); assert(std::string{"a3"} == encode("aaa")); assert(std::string{"a2b2c"} == encode("aabbc")); assert(std::string{"a3bc3d"} == encode("aaabcccd")); } -- "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline> The Terminals Wiki <http://terminals-wiki.org> The Computer Graphics Museum <http://computergraphicsmuseum.org> Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com> |
"Öö Tiib" <ootiib@hot.ee>: Oct 17 11:15AM -0700 On Wednesday, 17 October 2018 16:29:23 UTC+3, Paul wrote: > So the claim about needing a StringBuilder is aimed at java > coders. Is it still wrong? Why would java and C++ have different > += functions? Because Java does not have keywords to denote immutability. Therefore in Java StringBuilder is a mutable sequence of characters and String is immutable sequence of characters. You assume that since Java needs other class but String for efficiency concatenating strings then C++ should have same problem. That is not the case, std::string does it alone and already very efficiently. In C++ std::string is mutable sequence of characters and const std::string is immutable sequence of characters. Additionally C++17 added std::string_view that can be used to refer to any string, character array or sub-part of such. When using it with string literals it can be compile-time constant. So from C++ job candidate who claims senior level I might ask to implement a program that does such "compression" compile-time. Such request would not make even sense in Java. ;) |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Oct 17 02:34PM -0700 On 10/17/2018 2:43 AM, Paul wrote: > But aren't basic string operations like "+=" optimised for efficiency in C++? > Is there anything wrong with my solution below? > Thanks, [snip code] That works fine. Since we can assume the string only has upper and lower case (a - z), we can store abcdxxxxxefgh as abcdx5efgh instead of: a1b1c1d1x5e1f1g1h1. The decoder can check for bytes that are out-of-range, perhaps isdigit(). If a byte is outside the range, then it is reading a character count. |
Horizon68 <horizon@horizon.com>: Oct 17 09:39AM -0700 Hello.. My C++ synchronization objects library was updated, now it is much more stable and fast. You can download the new updated version from: https://sites.google.com/site/scalable68/c-synchronization-objects-library Thank you, Amine Moulay Ramdane. |
ram@zedat.fu-berlin.de (Stefan Ram): Oct 17 04:37PM >A Java string cannot be modified once created, so we need a separate >StringBuilder to build strings. Instances of the class »java.lang.String« can be concatenated with not StringBuilder in sight. Main.java public final class Main { public static void main( final java.lang.String[] args ) { final java.lang.String a = "a"; final java.lang.String b = "b"; final java.lang.String ab = a + b; java.lang.System.out.println( ab ); }} transcript ab |
Juha Nieminen <nospam@thanks.invalid>: Oct 17 07:33AM > AFAIK C++-strings are not guaranteed to be zero-terminated without > calling c_str(). I.e. when you do s[s.size()] you may be access the > string out of bounds. If you take a raw pointer to the contents with s.data() or &s[0] then you are correct: It's not guaranteed to be a null terminated raw string. However, the question was about s[s.size()], which is different. |
boltar@cylonhq.com: Oct 17 08:55AM On Tue, 16 Oct 2018 09:40:30 -0700 (PDT) >Can you post how your preprocessor implementation of >std::initializer_list looks like and how it behaves in similar >situation? I've never found a use for constexpr in real world code so that fact that it can't be emulated in C is irrelevant to me and auto is dynamic typing and if you want that use a scripting language, I hate it. It saves the coder a few seconds but makes debugging a pain. As for complex type or function parameter initialisation, C99 introduced compound literals. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 17 10:50AM +0100 On Wed, 17 Oct 2018 07:33:54 -0000 (UTC) > > string out of bounds. > If you take a raw pointer to the contents with s.data() ... then > you are correct: It's not guaranteed to be a null terminated raw string. std::string::data() does terminate the string with null from C++11 onwards, so there seems to be no difference from std::string::c_str(). This change probably arises because, from C++11 onwards, std::string::size() must be of constant complexity so null termination won't have any significant effect on efficiency. |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment