- Elementary string handling -- trying to optimise big O - 8 Updates
- size_t - 5 Updates
- Need to #include <string>? - 5 Updates
- Need to #include <string>? - 7 Updates
Paul <pepstein5@gmail.com>: Jun 05 09:10AM -0700 I am writing code to compress string representations by representing (for example) "aaa" by a3. The details of the problem, together with my solution, are provided below. I'm a bit concerned about my Append functions. The reason I wrote them this way, rather than simply using the normal library function, append, is that push_back guarantees amortized constant time (I think), meaning that my version of append is linear in the length of the appended word. In contrast to this, I couldn't find any guarantee that standard append is any better than being linear in the length of the concatenated word. Is this reasoning sound? Many thanks for your help. Paul #include <string> // Implementing a method to perform basic string compression using the counts // of repeated characters. For example, the string aabcccccaaa would become // a2b1c5a3. If the compressed string would not become smaller than the original // string, return the original string. You can assume the string has only upper // and lower case letters (a - z). void Append(string& str1, const string& str2) { for(auto c: str2) str1.push_back(c); } void Append(string& str1, int x) { Append(str1, to_string(x)); } string compression(string input) { const int inputSize = input.length(); if(inputSize <= 2) return input; string results; int repeated = 1; // Counting occurrences of each character for(int i = 1; i < inputSize; ++i) if(input[i] == input[i-1]) ++repeated; else { results.push_back(input[i-1]); Append(results, repeated); repeated = 1; } results.push_back(input[inputSize - 1]); Append(results, repeated); return results.length() < inputSize ? results : input; } |
Paavo Helde <myfirstname@osa.pri.ee>: Jun 05 11:54AM -0500 Paul <pepstein5@gmail.com> wrote in > I am writing code to compress string representations by representing > (for example) "aaa" by a3. So, how can you tell later if the original string was "aaa" or "a3"? Oh, I see, it's always a char and a count, so "abc" gets "compressed" to "a1b1c1". OK, fine, but how do you uncompress "a111b5"? It might be "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbb" or "a1bbbbb". > couldn't find any guarantee that standard append is any better than > being linear in the length of the concatenated word. Is this > reasoning sound? Many thanks for your help. Regardless of standardise, your Append() is most probably slower than library string append. Your Append() is pushing each char separately, meaning that for each char std::string needs to check for free buffer space and update the string length field. Standard append does these things only once for the appended string. But as always with optimizations, there is no other way to find out the real truth than to measure it. Please post back your speed testing results here, it would be interesting to know such things. > Append(str1, to_string(x)); > } > string compression(string input) If you are concerned about speed, this should be const string& input. > results.push_back(input[inputSize - 1]); > Append(results, repeated); > return results.length() < inputSize ? results : input; Oh, I see you still cannot tell if "a3" should be uncompressed to "aaa" or "a3" ;-) > } Cheers Paavo |
Paavo Helde <myfirstname@osa.pri.ee>: Jun 05 12:12PM -0500 Paavo Helde <myfirstname@osa.pri.ee> wrote in >> { >> Append(str1, to_string(x)); >> } Another note: It appears the string append is called only once per each to_string() function call. If this undisclosed to_string() indeed does conversion to the decimal string representation (equiv to printf("%d")), then you can probably forget any concerns about the speed of your Append() - the int-to-string conversion is a bit heavyweight operation and will probably dominate in the timings. Your profiler will tell you more exactly. hth Paavo |
Luca Risolia <luca.risolia@linux-projects.org>: Jun 05 07:21PM +0200 Il 05/06/2015 18:10, Paul ha scritto: > to this, I couldn't find any guarantee that standard append is any > better than being linear in the length of the concatenated word. Is > this reasoning sound? Theory is one thing, reality is (often) another thing. For example, std::vector is known to be preferable than std::list in many practical cases where complexity theory would otherwise suggest using the latter. As a general rule, I suggest that you at least measure the performance of your standard library implementation against concrete use cases before considering reinventing it. |
Paavo Helde <myfirstname@osa.pri.ee>: Jun 05 12:50PM -0500 Paul <pepstein5@gmail.com> wrote in > (for example) "aaa" by a3. The details of the problem, together with > string compression(string input) > { [...] > return results.length() < inputSize ? results : input; > } It is interesting to note that this approach attempts to defeat the pigeonhole principle and compress all inputs to at least the same size than the original. However, this is mathematically not possible (assuming lossless compression). See: http://en.wikipedia.org/wiki/Pigeonhole_principle#Uses_and_applications So, it seems some more work is needed with the design of this compression algorithm ;-) Cheers Paavo |
Mel <mel@zzzzz.com>: Jun 05 09:49PM +0300 On Fri, 05 Jun 2015 12:50:03 -0500, Paavo Helde > Paul <pepstein5@gmail.com> wrote in > news:11bd72d2-fdbf-4447-9bf0-a32078836aa9@googlegroups.com: > > I am writing code to compress string representations by representing > > (for example) "aaa" by a3. The details of the problem, together with > > } > It is interesting to note that this approach attempts to defeat the > pigeonhole principle and compress all inputs to at least the same size than > the original. However, this is mathematically not possible (assuming > lossless compression). See: Question is: is this compession or hashing :))) -- Press any key to continue or any other to quit |
Paul <pepstein5@gmail.com>: Jun 05 11:51AM -0700 On Friday, June 5, 2015 at 6:50:18 PM UTC+1, Paavo Helde wrote: > algorithm ;-) > Cheers > Paavo Thanks, Paavo You make a number of interesting points. I'll consider some of them in turn. Uncompression is uniquely determined by the constraint, given in the comments, that inputs only contain chars that are letters of the English alphabet. Furthermore, supposing that we ignore this constraint and thereby conclude that some strings can be uncompressed in different ways. I don't understand why this would be any type of problem -- we would simply have a non-injective function. Nobody said that we wanted to uncompress strings. Excellent point about the function signature -- I was careless here. I don't understand why you object to this code: return results.length() < inputSize ? results : input; If you believe that it sometimes results in a runtime error, please can you give inputs that cause incorrect output? If you believe that this is inefficient, please can you explain why? Thank You, Paul |
Paul <pepstein5@gmail.com>: Jun 05 11:55AM -0700 On Friday, June 5, 2015 at 7:49:25 PM UTC+1, Mel wrote: > (assuming > > lossless compression). See: > Question is: is this compession or hashing :))) I am trying to implement the above function and "compression" is merely the name I am choosing to give the function. If I called the function "hashing" or "elephant" that wouldn't have changed my question. I don't understand the computer science theory of this at all, but I'm reading CLR on algorithms so they might explain this. Paul |
Doug Mika <dougmmika@gmail.com>: Jun 05 11:25AM -0700 I found this on a discussion group, and it said that the following will cause an infinite loop, but why? for (size_t i = strlen (str) - 1; i >= 0; i--){ ... } |
Wouter van Ooijen <wouter@voti.nl>: Jun 05 08:33PM +0200 Doug Mika schreef op 05-Jun-15 om 8:25 PM: > for (size_t i = strlen (str) - 1; i >= 0; i--){ > ... > } siez_t is likely to be (I dunno, it might even be required to be) an unsigned type. An unsigned variable will always be >= 0. Wouter |
scott@slp53.sl.home (Scott Lurndal): Jun 05 06:39PM >for (size_t i = strlen (str) - 1; i >= 0; i--){ > ... >} size_t is an unsigned type, it can never be negative. |
Doug Mika <dougmmika@gmail.com>: Jun 05 11:39AM -0700 On Friday, June 5, 2015 at 1:33:30 PM UTC-5, Wouter van Ooijen wrote: > siez_t is likely to be (I dunno, it might even be required to be) an > unsigned type. An unsigned variable will always be >= 0. > Wouter But of course, for the loop to stop we need to fall below 0, which is not going to be possible with size_t as our data type...I should have looked at it closer. |
Mel <mel@zzzzz.com>: Jun 05 09:50PM +0300 On Fri, 5 Jun 2015 11:25:45 -0700 (PDT), Doug Mika > I found this on a discussion group, and it said that the following will cause an infinite loop, but why? > for (size_t i = strlen (str) - 1; i >= 0; i--){ > ... > } size_t is unsigned... -- Press any key to continue or any other to quit |
ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 10:50AM The program #include <iostream> #include <ostream> using namespace ::std::literals; int main() { ::std::cout <<( "a"s + "b"s )<< "\n"s; ::std::cout <<( "a"s < "b"s )<< "\n"s; } prints ab 1 here. But I am correct when I suspect that it is actually wrong, because it actually needs to #include <string> ? |
ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 11:00AM >{ ::std::cout <<( "a"s + "b"s )<< "\n"s; > ::std::cout <<( "a"s < "b"s )<< "\n"s; } And without the »+« and »<«, is »#include <string>« already required for »::std:cout << "\n"s«? |
ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 03:59PM >>#include <string> >>? >Yes, you should include all the headers you need. And /do/ I need the header »<string>« for ::std::cout << ""s; ? After all, AFAIK, I don't need »<cstring>« for ::std::cout << ""; . (And what about »""s+""s«?) |
ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 05:36PM >- the int-to-string conversion is a bit heavyweight operation and will >probably dominate in the timings. Your profiler will tell you more exactly. #include <iostream> // ::std::cout #include <ostream> // << #include <string> static constexpr char const * string[] = { "0", "1", "2", "3", "4", "5", "6" }; static inline char const * to_string ( int const i ) { return i < 7 ? string[ i ] : ::std::to_string( i ).c_str(); } static void test( ::std::ostream & out ) { out << to_string( 0 )<< '\n'; out << to_string( 1 )<< '\n'; out << to_string( 2 )<< '\n'; out << to_string( 20 )<< '\n'; out << to_string( 40 )<< '\n'; } int main(){ test( ::std::cout ); } |
ram@zedat.fu-berlin.de (Stefan Ram): Jun 05 06:38PM >for (size_t i = strlen (str) - 1; i >= 0; i--){ > ... >} Works fine here and prints »alpha«. #include <stdio.h> #include <string.h> int main() { char * str = "ahpla"; for (size_t i = strlen (str) - 1; i >= 0; i--){ if( i > 9999 )break; printf( "%c", i[ str ] ); } puts( "" ); } |
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 05 07:43AM -0400 On 6/5/2015 7:00 AM, Stefan Ram wrote: >> ::std::cout <<( "a"s < "b"s )<< "\n"s; } > And without the »+« and »<«, is »#include <string>« > already required for »::std:cout << "\n"s«? The 's' suffix is user-defined, isn't it? It's some kind of language (library) extension or did I miss it in the Standard somewhere? If your code that has the 's' suffix compiles OK without explicit inclusion of <string>, why are you concerned with '+' needing it? Seems evident to me that in order to use the 's' suffix you either rely on the indirect inclusion of it, or do it explicitly... V -- I do not respond to top-posted replies, please don't ask |
Louis Krupp <lkrupp@nospam.pssw.com.invalid>: Jun 05 06:03AM -0600 On Fri, 05 Jun 2015 07:43:54 -0400, Victor Bazarov ><string>, why are you concerned with '+' needing it? Seems evident to >me that in order to use the 's' suffix you either rely on the indirect >inclusion of it, or do it explicitly... I found the 's' suffix here: http://en.cppreference.com/w/cpp/language/user_literal <string> is required, it seems, but if it happens to be #included by <iostream> or <ostream>, then the program compiles without explicit inclusion. It's not a good idea, though; it would be too easy for another programmer to move the 'cout' somewhere else, remove the <ostream> and <stream> #includes, and then be surprised when the remaining code doesn't compile. Louis |
Daniel <danielaparker@gmail.com>: Jun 05 05:12AM -0700 On Friday, June 5, 2015 at 7:44:07 AM UTC-4, Victor Bazarov wrote: > The 's' suffix is user-defined, isn't it? It's some kind of language > (library) extension or did I miss it in the Standard somewhere? It's in standard C++14, converts a character array literal to basic_string. Daniel |
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 05 08:57AM -0400 On 6/5/2015 8:12 AM, Daniel wrote: >> The 's' suffix is user-defined, isn't it? It's some kind of language >> (library) extension or did I miss it in the Standard somewhere? > It's in standard C++14, converts a character array literal to basic_string. OK, thanks. So, since 'basic_string' is a template declared in <string>, it must be included to provide the conversion, so there is no surprise that operator+ is also resolved, right? V -- I do not respond to top-posted replies, please don't ask |
"Öö Tiib" <ootiib@hot.ee>: Jun 05 07:23AM -0700 On Friday, 5 June 2015 15:57:31 UTC+3, Victor Bazarov wrote: > OK, thanks. So, since 'basic_string' is a template declared in > <string>, it must be included to provide the conversion, so there is no > surprise that operator+ is also resolved, right? It is all about |
Bo Persson <bop@gmb.dk>: Jun 05 05:40PM +0200 On 2015-06-05 12:50, Stefan Ram wrote: > actually wrong, because it actually needs to > #include <string> > ? Yes, you should include all the headers you need. On the other hand, any standard C++ header is allowed to include any other standard C++ header, so sometimes the header you need happens to be included anyway. Bo Persson |
Victor Bazarov <v.bazarov@comcast.invalid>: Jun 05 12:31PM -0400 On 6/5/2015 11:59 AM, Stefan Ram wrote: >> Yes, you should include all the headers you need. > And /do/ I need the header »<string>« for > ::std::cout << ""s; You need it for ""s AFAICT, regardless of how you use it. > ? After all, AFAIK, I don't need »<cstring>« for > ::std::cout << ""; No, this is part of <ostream>, included by <iostream>. > . (And what about »""s+""s«?) Since to use ""s requires <string>, anything else is moot, isn't it? V -- I do not respond to top-posted replies, please don't ask |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment