- Concatenating strings efficiently - 6 Updates
- Concatenating strings efficiently - 2 Updates
- Why isn't this an incomplete class definition? - 3 Updates
- In template class, temporaries are substituted for const ref arguments -- trying to diagnose why this causes a crash. - 1 Update
- 'Cannot bind packed field' - 2 Updates
- C++ 2017 -- win, lose and draw - 3 Updates
- std::rotate substitute - 1 Update
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Oct 20 02:53AM -0400 Paul wrote: > sophisticated like creating a stringBuilder class. > But aren't basic string operations like "+=" optimised for efficiency in C++? > Is there anything wrong with my solution below? It depends on your definition of "wrong". Below are a few thoughts: 1. encode() should provide required functionality most of the time (assuming the current locale-dependent encoding of counts was agreed with the interviewer). This is not wrong. 2. encode() may fail under extreme memory pressure where more memory-efficient solution would work. Whether this is wrong depends on whether the interviewee asked the interviewer all the correct questions and on interviewer answers. (*) 3. encode() is not most memory-efficient and, at most architectures, not most time-efficient. Whether this is wrong depends on (*). A more memory-efficient (and probably more time-efficient) solution would calculate the exact capacity required by the result, then use only one memory allocation to reserve that capacity in the result, then actually encode. The interviewee would have to solve a couple of challenges to implement it that would allow him or her demonstrate his or her more advanced skills in this solution, specifically: - how to avoid code duplication (demonstrate knowledge of "template method" design pattern and maybe of lambdas); - how to correctly and efficiently compute the encoded size of to-be-encoded string::size_type value, possibly making it a reusable [template] function; - reason about whether locale-independent (or fixed "C" locale) encoding of size_type is more appropriate for this problem; demonstrate how to encode string::size_type to a pre-existing memory buffer without additional allocations; - managerial skills (start with a simple solution, accurately estimate for the interviewer the benefits and implementation cost for every following improvement). - maybe more (throw in parallelism?) Long story short, this interview question can turn out more or less simple (and the ultra-naive C++ solution -- more or less wrong) depending on how far the interviewer plans to go with it and whether interviewee is willing to play along. This may be one reason for why it is popular. HTH -Pavel |
Ralf Goertz <me@myprovider.invalid>: Oct 20 09:03AM +0200 Am 19 Oct 2018 19:36:30 GMT > Yes. I didn't mean to disagree with you, but Ralf seemed to have the > misconception that there is a terminating \0 in any std::string -- > thus carrying an unnecessary handicap over from C into C++. Actually, I didn't think that. Having programmed in C++ for more than two decades (although not as a professional programmer but in order to do my job efficiently) I could not imagine it would be legal to access a string in that way. My "misconception" was that I thought that string was basically a class which (among others, notably iterators) contains two size_t variables, one containing the capacity and the other the size of the string. Accessing it beyond its size would be UB but wouldn't fail as long as it is within the bounds of the capacity. Therefore, operator[](size_t pos) would do nothing more than returning the character stored at "pos". The legality of using str.size() as index complicates things as Manfred put it and I still see no point in being allowed to do so. (I guess this is true for you as well if I read your other post correctly.) I know very little about optimisation but I think the need to check for one special value of "pos" can a heavy burden performance-wise. > (Not that there's really a '\0' in any C string either; it's halfway > there, halfway not. Traditionally, a fertile source of bugs.) Is there not? I thought that's how strings are terminated in C. |
Paul <pepstein5@gmail.com>: Oct 20 08:23AM -0700 On Saturday, October 20, 2018 at 7:54:06 AM UTC+1, Pavel wrote: > the ultra-naive C++ solution -- more or less wrong) depending on how far the > interviewer plans to go with it and whether interviewee is willing to play > along. This may be one reason for why it is popular. Very interesting thoughts. Any ideas for references to implement some of these more advanced ideas? I'm googling the template design method and everything else in your post to try to create a better solutions to learn from this interview, and I think I would do much better with more concrete guidance. Many thanks, Paul |
Manfred <noname@add.invalid>: Oct 20 08:45PM +0200 On 10/20/2018 8:53 AM, Pavel wrote: >> But aren't basic string operations like "+=" optimised for efficiency in C++? >> Is there anything wrong with my solution below? > It depends on your definition of "wrong". Below are a few thoughts: [...] > - managerial skills (start with a simple solution, accurately estimate for the > interviewer the benefits and implementation cost for every following improvement). > - maybe more (throw in parallelism?) OTOH the proposal would have the side effect of being unusable with stream I/O of indefinite length. Although this is not a requirement posed by the assignment as is, it is relevant in the sense that such a RLE algorithm is independent of the length of the string, i.e. it can be implemented as an inline filter. Constraining it to strings whose length is to be known in advance would be a significant limitation of its potential applications. This is one example where optimization should be evaluated in the light of the specific application. For the sake of curiosity, here is how it would like to make it entirely streamlined: =============================================================== #include <iterator> #include <cstdlib> namespace RleIt { using Char = char; using InputIterator = std::istreambuf_iterator<Char>; using OutputIterator = std::ostreambuf_iterator<Char>; // this is one solution to avoid allocating a string for the // formatted counter static void encodeDecimal(OutputIterator& outIt, long val) { if(0 < val) { std::ldiv_t d = std::div(val, 10L); encodeDecimal(outIt, d.quot); // this only works if ascii is a subset of Char *outIt++ = OutputIterator::traits_type::to_char_type('0' + d.rem); } } void encode(InputIterator inIt, OutputIterator outIt) { static const InputIterator endIt{}; // end() static const Char nul{}; // '\0' size_t cnt{0}; Char cur{nul}; while(endIt != inIt) { cur = *inIt++; if(endIt!= inIt && *inIt == cur) { ++cnt; } else { *outIt++ = cur; encodeDecimal(outIt, cnt+1); cur = endIt != inIt ? *inIt : nul; cnt = 0; } } } } // namespace RleIt #include <iostream> int main() { RleIt::encode( RleIt::InputIterator{std::cin}, RleIt::OutputIterator{std::cout}); } =============================================================== $ echo "abaabbcaaaqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq" | ./a.out a1b1a2b2c1a3q50 1$ (the last 1 is the terminating LF inserted by 'echo') =============================================================== |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Oct 20 05:39PM -0400 Paul wrote: > On Saturday, October 20, 2018 at 7:54:06 AM UTC+1, Pavel wrote: ... > everything else in your post to try to create a better solutions to learn > from this interview, and I think I would do much better with more > concrete guidance. Please see below. I implemented the more efficient solution I referred to. Admittedly, it is difficult to go all the way to this solution during the interview; but it is still possible to keep moving in the right direction by incremental improvements: the interviewee will either gain himself/herself more time to finish the problem or the interviewer moves on to another problem when it is obvious for for them the interviewee knows what s/he is doing and will finish the job eventually. It is sometimes difficult to write a generic template method like RunDumbRle below correctly off the bat. I usually write at least 2 instances and make them as similar as possible after which the correct line between generic and specific behaviors usually becomes more clear. As an illustration, I left the two original concrete functions in the comments below the code. HTH, -Pavel > Many thanks, > Paul // ------------ cut here --------- #include <algorithm> #include <cassert> #include <iostream> #include <string> #include <vector> using namespace std; typedef string::size_type Size; Size CalcEncodedCounterSize(Size counter) { for (Size result = 1; ; ++result) if ((counter /= 10) == 0) return result; } vector<Size> CalcEncodedCounterSizeLimits(Size maxSize) { const Size MaxEncodedSize = CalcEncodedCounterSize(maxSize); vector<Size> result; result.push_back(9); for (vector<Size>::size_type nNines = 2; nNines < MaxEncodedSize; ++nNines) result.push_back(result.back() * 10 + 9); return result; } Size GetEncodedCounterSize(Size counter) { static const Size MaxCounter = string().max_size(); assert(counter <= MaxCounter); static const vector<Size> EncodedCounterSizeLimits( CalcEncodedCounterSizeLimits(MaxCounter)); const vector<Size>::size_type nLimits = EncodedCounterSizeLimits.size(); for (vector<Size>::size_type i = 0; i < nLimits; ++i) { if (counter <= EncodedCounterSizeLimits[i]) return i + 1; } return nLimits; } void EncodeCounterToString(string& result, Size counter) { const static char Digits[] = "0123456789"; Size fromIndex = result.size(); do { result += Digits[counter % 10]; counter /= 10; } while (!!counter); reverse(result.begin() + fromIndex, result.end()); } template<typename R/*result type*/, typename I/*result initializer*/, typename U/*result updater*/> R RunDumbRle(const string& str, I initFunc, U updateFunc) { assert(!str.empty()); R result(initFunc()); char prevChar = str[0]; const Size strLen = str.size(); if (strLen == 1) { updateFunc(result, 1, prevChar); return result; } Size counter = 1; for (Size i = 1; ; ++i) { if (i == strLen) { updateFunc(result, counter, prevChar); break; } const char curChar = str[i]; if (curChar == prevChar) { ++counter; continue; } updateFunc(result, counter, prevChar); prevChar = curChar; counter = 1; } return result; } string DumbRleEncode(const string& str) { if (str.empty()) return str; const Size resultSize = RunDumbRle<Size>(str, []{return 0;}, [](Size& result, Size counter, char){ result += GetEncodedCounterSize(counter) + 1; }); return RunDumbRle<string>(str, [&resultSize](){ string result; result.reserve(resultSize); return result; }, [](string& result, Size counter, char c) { result += c; EncodeCounterToString(result, counter); }); } int main(int, char*[]) { assert(CalcEncodedCounterSize(1) == 1); assert(GetEncodedCounterSize(1) == 1); assert(DumbRleEncode("") == ""); assert(DumbRleEncode("a") == "a1"); assert(DumbRleEncode("ab") == "a1b1"); assert(DumbRleEncode("abc") == "a1b1c1"); assert(DumbRleEncode("abbbbbbc") == "a1b6c1"); assert(DumbRleEncode("aaaaaaaaaac") == "a10c1"); assert(DumbRleEncode(string(1000, 'b')) == "b1000"); return 0; } /* string DumbRleEncodeOrig(const string& str) { if (str.empty()) return string(); string result; result.reserve(CalcDumbRleEncodedSize(str)); char prevChar = str[0]; const Size strLen = str.size(); if (strLen == 1) { result += prevChar; EncodeCounterToString(result, 1); return result; } Size counter = 1; for (Size i = 1; ; ++i) { if (i == strLen) { result += prevChar; EncodeCounterToString(result, counter); break; } const char curChar = str[i]; if (curChar == prevChar) { ++counter; continue; } result += prevChar; EncodeCounterToString(result, counter); prevChar = curChar; counter = 1; } return result; } Size CalcDumbRleEncodedSize(const string& str) { assert(!str.empty()); Size result = 0; char prevChar = str[0]; Size counter = 1; const Size strLen = str.size(); for (Size i = 1; ; ++i) { if (i == strLen) { result += GetEncodedCounterSize(counter) + 1; break; } const char curChar = str[i]; if (curChar == prevChar) { ++counter; continue; } result += GetEncodedCounterSize(counter) + 1; prevChar = curChar; counter = 1; } return result; } */ |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Oct 20 06:11PM -0400 Manfred wrote: >> - maybe more (throw in parallelism?) > OTOH the proposal would have the side effect of being unusable with stream I/O > of indefinite length. Agree. This is another direction where the interviewee can try steer the interview after providing quick-and-dirty solution to the original problem. The interviewer will less likely play along as this changes his requirements -- but who knows what they have in mind. An interview is as much psychological as a technical challenge... Neat solution, BTW. I like your recursive encodeDecimal, very haskellish. There is no need to compare inIt to endIt more than once though. -Pavel |
ram@zedat.fu-berlin.de (Stefan Ram): Oct 20 03:41PM >result.reserve(2*str.size()) will guarantee that result will accommodate >"a1b1" (i.e. 4 chars) with no reallocation (see also capacity()). >In both cases the terminating '\0' is handled transparently by std::string. BTW: P0966 intends to change the semantics of ::std::basic_string::reserve() in C++2x so that it does not shrink to fit anymore and behaves like ::std::vector::reserve(). |
ram@zedat.fu-berlin.de (Stefan Ram): Oct 20 04:09PM >BTW: P0966 intends to change the semantics of P0966R1 >::std::basic_string::reserve() in C++2x so that >it does not shrink to fit anymore and behaves >like ::std::vector::reserve(). The authors explain: An append function might look like: void Append ( string& out, string_view arg1, string_view arg2, string_view arg3 ) { out.reserve(out.size() + arg1.size() + arg2.size() + arg3.size()); out += arg1; out += arg2; out += arg3; } . But when the reserved capacity for »out« was already increased by a previous »+«, this can actually /shrink/ it again, worsening the overall performance as long as »reserve« has that shrinking behavior. Mitigation for now: if (len > out.capacity()) { out.reserve(len); } . |
Paul <pepstein5@gmail.com>: Oct 20 03:38AM -0700 Below code (after CODE BEGINS marker) compiles and runs successfully with gcc but I would have expected an error. See the part between the asterisks. Here we have an operation returning AndSpecification<T> before AndSpecification<T> has been fully defined. So I would expect the compiler to complain because it doesn't know (at this stage) that first and second are members of AndSpecification<T> I don't know why the compiler accepts the code between the asterisks. I would have said it's an error because AndSpecification<T> has not been fully defined so we don't know what return {first, second} means. Many thanks for your help, Paul *************************************************************************** template <typename T> AndSpecification<T> operator&& (const Specification<T>& first, const Specification<T>& second) { return { first, second }; } ************************************************************************* CODE BEGINS // open closed principle // open for extension, closed for modification #include <string> #include <vector> #include <iostream> using namespace std; enum class Color { red, green, blue }; enum class Size { small, medium, large }; struct Product { string name; Color color; Size size; }; struct ProductFilter { typedef vector<Product*> Items; Items by_color(Items items, const Color color) { Items result; for (auto& i : items) if (i->color == color) result.push_back(i); return result; } Items by_size(Items items, const Size size) { Items result; for (auto& i : items) if (i->size == size) result.push_back(i); return result; } Items by_size_and_color(Items items, const Size size, const Color color) { Items result; for (auto& i : items) if (i->size == size && i->color == color) result.push_back(i); return result; } }; template <typename T> struct AndSpecification; template <typename T> struct Specification { virtual ~Specification() = default; virtual bool is_satisfied(T* item) const = 0; // new: breaks OCP if you add it post-hoc /*AndSpecification<T> operator&&(Specification<T>&& other) { return AndSpecification<T>(*this, other); }*/ }; // new: template <typename T> AndSpecification<T> operator&& (const Specification<T>& first, const Specification<T>& second) { return { first, second }; } template <typename T> struct Filter { virtual vector<T*> filter(vector<T*> items, Specification<T>& spec) = 0; }; struct BetterFilter : Filter<Product> { vector<Product*> filter(vector<Product*> items, Specification<Product> &spec) override { vector<Product*> result; for (auto& p : items) if (spec.is_satisfied(p)) result.push_back(p); return result; } }; struct ColorSpecification : Specification<Product> { Color color; ColorSpecification(Color color) : color(color) {} bool is_satisfied(Product *item) const override { return item->color == color; } }; struct SizeSpecification : Specification<Product> { Size size; explicit SizeSpecification(const Size size) : size{ size } { } bool is_satisfied(Product* item) const override { return item->size == size; } }; template <typename T> struct AndSpecification : Specification<T> { const Specification<T>& first; const Specification<T>& second; AndSpecification(const Specification<T>& first, const Specification<T>& second) : first(first), second(second) {} bool is_satisfied(T *item) const override { return first.is_satisfied(item) && second.is_satisfied(item); } }; int main() { Product apple{"Apple", Color::green, Size::small}; Product tree{"Tree", Color::green, Size::large}; Product house{"House", Color::blue, Size::large}; const vector<Product*> all { &apple, &tree, &house }; BetterFilter bf; ColorSpecification green(Color::green); auto green_things = bf.filter(all, green); for (auto& x : green_things) cout << x->name << " is green\n"; SizeSpecification large(Size::large); AndSpecification<Product> green_and_large(green, large); //auto big_green_things = bf.filter(all, green_and_large); // use the operator instead (same for || etc.) auto spec = green && large; for (auto& x : bf.filter(all, spec)) cout << x->name << " is green and large\n"; // warning: the following will compile but will NOT work // auto spec2 = SizeSpecification{Size::large} && // ColorSpecification{Color::blue}; getchar(); return 0; } |
"Öö Tiib" <ootiib@hot.ee>: Oct 20 04:14AM -0700 On Saturday, 20 October 2018 13:38:40 UTC+3, Paul wrote: > I would have said it's an error because AndSpecification<T> has not > been fully defined so we don't know what return {first, second} means. > Many thanks for your help, With templates it does not matter so much where it was declared or where it was defined. More important is that it is defined at place where it is instantiated. Template of operator&& is instantiated where it is called. If it is called nowhere then even syntax errors in its definition may compile. Standard says ill-formed, no diagnostics required. Similar are references or pointers to some type. These can be declared to incomplete type. Complete type is required where such reference is used or pointer is dereferenced. That makes programming in C++ not linear process but also it gives plenty of ways to break out of circular references between types and functions without sacrificing any type-safety. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 20 06:04PM +0200 On 20.10.2018 13:14, Öö Tiib wrote: > That makes programming in C++ not linear process but also it gives > plenty of ways to break out of circular references between types > and functions without sacrificing any type-safety. Also, both beginners and more experienced C++ programmers may guess wrong about where the first diagnostic, if any, in this code occurs: struct Gah; auto foo() -> Gah; void bar( Gah&& ); auto main() -> int { bar( foo() ); } I think that's nice. In a way. Cheers!, - Alf |
Paul <pepstein5@gmail.com>: Oct 20 08:36AM -0700 The below code compiles but crashes runtime with an unexpected termination. I'm not sure why but it has something to do with the scope of temporaries or something. Please could someone explain exactly what the problem is? Thank you. Paul #include <string> #include <vector> #include <iostream> using namespace std; enum class Color { red, green, blue }; enum class Size { small, medium, large }; struct Product { string name; Color color; Size size; }; template <typename T> struct AndSpecification; template <typename T> struct Specification { virtual ~Specification() = default; virtual bool is_satisfied(T* item) const = 0; }; // new: template <typename T> AndSpecification<T> operator&& (const Specification<T>& first, const Specification<T>& second) { return { first, second }; } template <typename T> struct Filter { virtual vector<T*> filter(vector<T*> items, Specification<T>& spec) = 0; }; struct BetterFilter : Filter<Product> { vector<Product*> filter(vector<Product*> items, Specification<Product> &spec) override { vector<Product*> result; for (auto& p : items) if (spec.is_satisfied(p)) result.push_back(p); return result; } }; struct ColorSpecification : Specification<Product> { Color color; ColorSpecification(Color color) : color(color) {} bool is_satisfied(Product *item) const override { return item->color == color; } }; struct SizeSpecification : Specification<Product> { Size size; explicit SizeSpecification(const Size size) : size{ size } { } bool is_satisfied(Product* item) const override { return item->size == size; } }; template <typename T> struct AndSpecification : Specification<T> { const Specification<T>& first; const Specification<T>& second; AndSpecification(const Specification<T>& first, const Specification<T>& second) : first(first), second(second) {} bool is_satisfied(T *item) const override { return first.is_satisfied(item) && second.is_satisfied(item); } }; int main() { Product apple{"Apple", Color::green, Size::small}; Product tree{"Tree", Color::green, Size::large}; Product house{"House", Color::blue, Size::large}; const vector<Product*> all { &apple, &tree, &house }; BetterFilter bf; // warning: the following will compile but will NOT work auto spec2 = SizeSpecification(Size::large)&& ColorSpecification(Color::green); for (auto& x : bf.filter(all, spec2)){} } |
Christian Gollwitzer <auriocus@gmx.de>: Oct 20 10:21AM +0200 Am 20.10.18 um 00:35 schrieb Alf P. Steinbach: > just overload the function that unpacks, with a variant that accepts the > whole struct as argument. > I would not go as far as to copy bytes. My "practical solution" now is a C-style cast to cast away the "packed"-thing. What bugs me most is that basically this downgrades the applicability of a template. The reason to have the template at all was to copy, e.g. the content of a std::vector<cl_double4> with a length of 4 to a, e.g. std::array<cl_float4, 4>. There are at least three different, but functionally equivalent data types in this program. If I'd write it out by hand, it would work. If I used a macro, it would work. If I could overload the operator= from the "outside" for a std::array, it would work, but it must be a member (why?). If I drop the packed, it compiles but breaks with invalid memory accesses due to different alignments between the two compilers. Only because of a pedantic compiler I'm forced to jump through these hoops. Christian |
"Öö Tiib" <ootiib@hot.ee>: Oct 20 02:37AM -0700 On Saturday, 20 October 2018 11:21:28 UTC+3, Christian Gollwitzer wrote: > to copy, e.g. the content of a std::vector<cl_double4> with a length of > 4 to a, e.g. std::array<cl_float4, 4>. There are at least three > different, but functionally equivalent data types in this program. Templates are very useful for bit-crunching and hacking with packed structs and bitfields in structs. Doing it by hand or with macros is way bigger headache. The templates just have to be made for that. The issue as you posted wasn't related to templates at all. It was attempting to take address of element of packed struct. GCC can't ignore it since it is often used to compile for platforms that immediately raise signals on wrongly aligned references. It is not always safe with visual studio either. For example the XMVECTOR and XMMATRIX of DirectX11 must be 16-byte aligned and will result with confusing run-time SEH exceptions (crashes) when not aligned. There it would be lot better if visual studio was also pedantic about alignment. ;) |
woodbrian77@gmail.com: Oct 19 09:55PM -0700 On Wednesday, February 15, 2017 at 11:53:10 AM UTC-6, Scott Lurndal wrote: > >> deallocation, but not without it's own cost. In a single > >> threaded program perhaps unique_ptr would be better. > Why do you consider 588 bytes significant? Originally in this thread I called std::variant a "win". But after watching this: CppCon 2018: Rishi Wani "Datum: A Compact Bitwise Copyable Variant Type" https://duckduckgo.com/?q=datum+cppcon&t=ffab&ia=videos&iax=videos&iai=YdzbrFerlRY my opinion of std::variant is not so high. Bloomberg has done something interesting here with Datum. I'm glad I haven't spent much time working on serialization support for std::variant as datum is superior to it in a number of ways. Brian Ebenezer Enterprises - Enjoying programming again. https://github.com/Ebenezer-group/onwards |
"Öö Tiib" <ootiib@hot.ee>: Oct 19 11:54PM -0700 > done something interesting here with Datum. I'm glad I > haven't spent much time working on serialization support for > std::variant as datum is superior to it in a number of ways. That lecture concentrates on how to pack stuff into the 52 often unused bits of signaling NaN of double on 32-bit platform. Sure, we can store lot of stuff into 52 bits on 32-bit platform if we so want. I did not know that Bloomberg was so heavily into embedded development. |
"Öö Tiib" <ootiib@hot.ee>: Oct 20 01:23AM -0700 > But after watching this: > CppCon 2018: Rishi Wani "Datum: A Compact Bitwise Copyable Variant Type" > https://duckduckgo.com/?q=datum+cppcon&t=ffab&ia=videos&iax=videos&iai=YdzbrFerlRY Note that benchmark at 26:17 is clearly defective. Worst case mentioned tag+string is 40 bytes on 64 bit platform so 100 000 variants should take 4 000 000 bytes. His chart however shows memory usage of 28 000 000 bytes. What was the object with size of 280 bytes? How it was packed into 6.5 bytes of signaling NaNs? I trust that I would stick with StarOffice or MSOffice. No way I would consider BloombergOffice since the guys there can't calculate. |
bitrex <user@example.net>: Oct 20 01:24AM -0400 On 10/19/2018 02:42 AM, David Brown wrote: > No. Just.. no. > (Alf has already posted the ideas I had.) Ah, right, memmove! I've definitely got string.h, it's been a long time since I've had to resort to dusty ol' C... |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment