soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

Google Groups

neoGFX C++ ECS - 4 Updates
Concatenating strings efficiently - 10 Updates
Copy on write strings - 5 Updates
std::rotate substitute - 6 Updates

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Oct 19 06:42PM +0100

Hi,

My C++ ECS is now rendering scenes! Now to add collision detection and
multi-threaded support so that different ECS systems can run in different
threads.

https://github.com/i42output/neoGFX

\o/

/Flibble

--
"You won't burn in hell. But be nice anyway." – Ricky Gervais

"I see Atheists are fighting and killing each other again, over who
doesn't believe in any God the most. Oh, no..wait.. that never happens." –
Ricky Gervais

"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 19 11:04AM -0700

On Friday, October 19, 2018 at 1:42:13 PM UTC-4, Mr Flibble wrote:
> My C++ ECS is now rendering scenes! Now to add collision detection and
> multi-threaded support so that different ECS systems can run in different
> threads.

FWIW, I'm not a fan of the dark/brown color theme you've chosen.

Also, the Unreal Gaming engine is available for free for non-
commericial apps, and for a fee-based model for commercial
apps:

https://en.wikipedia.org/wiki/Unreal_Engine

https://www.unrealengine.com

Why would someone use your engine when they can develop in a pro-
fessional gaming engine for free, and when they start selling
their product move to a fee-based model?

What benefits will MrAndersonGFX have when it's released in 9+
months?

--
Rick C. Hodgin

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 19 11:11AM -0700

On Friday, October 19, 2018 at 2:04:34 PM UTC-4, Rick C. Hodgin wrote:
> > multi-threaded support so that different ECS systems can run in different
> > threads.

> FWIW, I'm not a fan of the dark/brown color theme you've chosen.

I do think your buttons look awesome though. They are among the
nicest I've seen. I may implement that bright border effect on
the themes I use, with then a shadow around it because I prefer
light back colors, and darker fore colors.

I also like the blue highlights around things. Seems to be a
common UI theme in recent years.

--
Rick C. Hodgin

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Oct 19 07:14PM +0100

On 19/10/2018 19:04, Rick C. Hodgin wrote:
>> multi-threaded support so that different ECS systems can run in different
>> threads.

> FWIW, I'm not a fan of the dark/brown color theme you've chosen.

People can choose whatever colour they want (this should be obvious) and
slate grey is the default.

> Why would someone use your engine when they can develop in a pro-
> fessional gaming engine for free, and when they start selling
> their product move to a fee-based model?

I will decide on an appropriate licensing model at the appropriate time.

> What benefits will MrAndersonGFX have when it's released in 9+
> months?

Benefits are already listed on the website and it isn't just a game engine
it will, upon release, also be a direct competitor to Qt for traditional
GUI application development.

/Flibble

--
"You won't burn in hell. But be nice anyway." – Ricky Gervais

"I see Atheists are fighting and killing each other again, over who
doesn't believe in any God the most. Oh, no..wait.. that never happens." –
Ricky Gervais

"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."

Concatenating strings efficiently

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 19 02:39AM +0200

On 17.10.2018 11:43, Paul wrote:
> compression using the counts of repeated characters. For example, the
> string aabcccccaaa would become a2blc5a3. You can assume the string
> has only uppercase and lowercase letters (a - z).

For most strings encountered by a program this will be string
/expansion/, not compression.

I'm not sure about what interviewers of fresh-from-college new employees
know or understand these days, but as an interviewer I'd expect the
prospective employee to note that there is a definite potential for
improvement in the scheme.

> way of approaching this problem (which will be tackled by users of other
> languages, too) assumes that operations like += are inefficient because
> s+="12" involves moving everything in the string.

I've never heard of a "standard way" to do that, and the assumption that
`+=` is inefficient, is wrong.

> The conventional wisdom is therefore that you need to do something more
> sophisticated like creating a stringBuilder class.
> But aren't basic string operations like "+=" optimised for efficiency in C++?

They are.

And from C++11 there are overloads that pick up the temporary nature of
partial results in e.g. `s = string() + "a" + "b" + "c" + "d" + "e";`.
The complexity is linear in the final result size. Maybe the very
incorrect "wisdom" you heard was for C++03, or Java.

C++17 § 24.3.3.1 paragraphs 1 and 2:
<paraphrase>
template<class charT, class traits, class Allocator>
auto operator+(
basic_string<charT, traits, Allocator>&& lhs,
const basic_string<charT, traits, Allocator>& rhs
)
-> basic_string<charT, traits, Allocator>;

Returns: std::move(lhs.append(rhs))
</paraphrase>

Also do ignore the advice you've got about using `ostringstream`. That
introduces inefficiencies in a number of ways, including that it
needlessly involves the C++ locale machinery.

> Is there anything wrong with my solution below?

Well, let's look at it.

> #include <iostream>

No no, don't include that in a header, if you can make do with `<iosfwd>`.

I guess it's only for the `main` function.

Then I'd put that `#include` just before the `main` function.

> std::string result;
> if(str.empty())
> return result;

My eyes find `str == ""` to be a more clear condition.

Less words to digest.

> char prev = str[0];
> int counter = 1;

At this point you know that the result size will be roughly 2*n where n
is the size of `str`. Maybe a little more. So I'd do a

const int n = str.size();
result.reserve( 3*n );

> for(int i = 1; i < str.size(); ++i)

Comparing signed and unsigned will make compilers cough up a warning.

If you didn't already have a definition of `n` then a good way to both
avoid the warning and guarantee a single call to `size` would be

for( int i = 1, n = str.size(); i < n; ++i )

But let's assume `n` is defined, as shown. Also, since you can /assume/
that the string contains only letters, and not arbitrary binary data,
I'd make that

for( int i = 1; i <= n; ++i )

... which takes care of processing also the last character within the
loop body.

And which gives an opportunity to explain to the interviewer how that is
well-defined behavior.

> prev = str[i];
> counter = 1;
> }

Uhm, indentation. Tip: free programs such AStyle can format your code
for you. Also most programmers' editors have that functionality, or can
use e.g. that the AStyle program as a plug-in.

> result += prev;
> result += std::to_string(counter);

See above; it can be avoided.

> {
> std::cout << encode("a") << "\n" << encode ("aaa") << "\n" << encode ("aabbc");

> }

Cheers & hth.,

- Alf

Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 19 05:33AM

On Thu, 2018-10-18, bitrex wrote:
...
> abstract and not add a lot of my own verbage as to how the test might
> relate to some real world task it cuts down on chance for
> mis-interpretation.

YMMV, but I think the interview is more useful (to both parts) if
it's about more than coding to specifications, no matter how silly.
(Unless the job is like that, which no job should be.)

> ask for that, specifically. i.e. in the OP's post the subject says
> "efficient" but the text of the post says nothing about efficiency
> requirement.

If I was the interviewer I'd want the interviewee to ask for
efficiency requirements, or go for (as you say) quick-and-correct, but
explain that that's what she does. Anyway, I wouldn't like to see her
getting bogged down in micro-optimizations noone asked for.

> Maybe this seems a little nitpicky and it may be but damn people lose
> out on good jobs they want over stuff like this!

Don't know about where you are, but here it's just as much about the
interviewee finding out if the job is any good. Although it's unclear
if test like this one helps anyone.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ralf Goertz <me@myprovider.invalid>: Oct 19 09:26AM +0200

> > by factor of 2 is what that "packing" usually
> > achieves.

> In fact that is the worst case length of the encoded string.

I certainly see why that is the case. If there are no runs (immediate
repetitions) one would need to write a 1 after each letter. OTOH each
run gets either encoded to a sequence of the same (length 2) or shorter
(length >2) size. But what about the '\0' at the end of the string?
Since I learned elsethread that str[str.size()] is valid, doesn't it
need to be stored as well? So if str is "ab" in reality three chars are
stored, right? If we reserve four chars for result it is not enough for
the five chars needed for "a1b1". I guess this is taken care of by
always reserving one char more than asked for.

Manfred <noname@add.invalid>: Oct 19 12:57PM +0200

On 10/19/2018 9:26 AM, Ralf Goertz wrote:
> stored, right? If we reserve four chars for result it is not enough for
> the five chars needed for "a1b1". I guess this is taken care of by
> always reserving one char more than asked for.

In the context of std::string "ab" is 2 chars long, and calling
result.reserve(2*str.size()) will guarantee that result will accommodate
"a1b1" (i.e. 4 chars) with no reallocation (see also capacity()).
In both cases the terminating '\0' is handled transparently by std::string.

Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 19 11:56AM

On Fri, 2018-10-19, Manfred wrote:
> result.reserve(2*str.size()) will guarantee that result will accommodate
> "a1b1" (i.e. 4 chars) with no reallocation (see also capacity()).
> In both cases the terminating '\0' is handled transparently by std::string.

A better way of stating it is: there *is* no zero-termination with
std::string, except in the C strings compatibility part of the
interface.

And you can have a std::string which /contains/ '\0'. It's just
another character.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ralf Goertz <me@myprovider.invalid>: Oct 19 01:58PM +0200

Am Fri, 19 Oct 2018 12:57:22 +0200
> accommodate "a1b1" (i.e. 4 chars) with no reallocation (see also
> capacity()). In both cases the terminating '\0' is handled
> transparently by std::string.

Hm, transparently means that when I access str[str.size()] it will
internally be handled differently than str[str.size()-1] (assuming that
str is not empty)? I always thought operator [] does unchecked access so
that it doesn't care whether it crosses boundaries. How can that work at
runtime when the str is indexed is a variable whose value happens to be
str.size()?

jameskuyper@alumni.caltech.edu: Oct 19 06:35AM -0700

On Friday, October 19, 2018 at 7:56:50 AM UTC-4, Jorgen Grahn wrote:
...
> A better way of stating it is: there *is* no zero-termination with
> std::string, except in the C strings compatibility part of the
> interface.

Do you consider std::string::operator[] and std::string::at() to be in
the C strings compatibility part of the interface? 20.3.2.5p2 requires
that, given std::string<charT> str, then str[str.size()] return "a
reference to an object of type charT with value charT()". Paragraph 6
requires the same of str.at(str.size()).

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 19 06:36PM +0200

On 19.10.2018 02:39, Alf P. Steinbach wrote:
> is the size of `str`. Maybe a little more. So I'd do a

> const int n = str.size();
> result.reserve( 3*n );

Uhm, that was dumb. The result size will never be >2n. So,

result.reserve( 2*n );

Sorry, not sure where that blindness came from. :(

Cheers!,

- Alf

Manfred <noname@add.invalid>: Oct 19 06:50PM +0200

On 10/19/2018 1:58 PM, Ralf Goertz wrote:

> Hm, transparently means that when I access str[str.size()] it will
> internally be handled differently than str[str.size()-1] (assuming that
> str is not empty)?

No, transparently means that reserve(X) would internally reserve (X+1),
as an example, in the same way as capacity() would return
(__internal_bufsize-1).

I always thought operator [] does unchecked access so
> that it doesn't care whether it crosses boundaries. How can that work at
> runtime when the str is indexed is a variable whose value happens to be
> str.size()?

Pre C++11 std::string's /could/ work this way (having something like: if
pos == size() return __nulchar; in the implementation of
operator[](size_t pos)).
Nonetheless the additional requirements of C++11 and after do complicate
things.
In any case, this is about implementation details that are out of the
scope of the dictate of the standard.

Manfred <noname@add.invalid>: Oct 19 06:57PM +0200

On 10/19/2018 1:56 PM, Jorgen Grahn wrote:

> A better way of stating it is: there *is* no zero-termination with
> std::string, except in the C strings compatibility part of the
> interface.

I was pretty much tempted to write so, but then I realized that the
example given by cppreference.com to access the string buffer:

const char* c = &str[0];

(https://en.cppreference.com/w/cpp/string/basic_string/operator_at)
does make it /very/ difficult to have a std::string buffer without a
terminating '\0'.
That said, also because of your note below, I agree that std::string's
are best manipulated without assuming a terminating '\0';
This just yields to better code, since the rationale of std::string was
meant to be alternative to C strings.

Copy on write strings

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 19 02:47AM +0200

On 16.10.2018 01:44, Richard wrote:
> [snip]
> Alf, are you aware of any CoW string implementation that is
> maintained?

No, sorry.

I guess Mr. Google can help, or the weekly posted list of C++ libraries.

Off the cuff, aspects to look for in a COW string type that doesn't aim
to conform to the `std::string` specification, but instead to take
maximum advantage:

* Does it construct from literal in constant time?
If not then it's clearly ungood.

* Does it support repeated concatenation in linear time, as in CPython?
If not then it's clearly ungood.

* Does it support guaranteed cloning for thread safety?
If not then it's clearly ungood.

* Does it avoid imposing silly and inefficient thread safety measures?
If not then it's clearly ungood.

* Does it provide substrings in constant time?
Nice if it does, but also we now have `std::string_view`.

Cheers!,

- Alf

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 19 02:59AM +0200

On 16.10.2018 15:23, Öö Tiib wrote:
>> Hardcore fans of this or that are usually immune to facts and logic, and
>> I've encountered that a great many times so I should have known, but.

> I do not understand what is "bug" there?

It was possible to invalidate a pointer into a string's buffer, by
operations that one would not expect to do that. I'm a bit hazy on the
details, except that it necessarily involves making a (logical) copy.

> string is part of core language.

> However the non-sanctimonious arguments that I have heard are
> also plenty:

Hm, I had to google that word, "sanctimonious". Learned something. :-)

> lot of algorithms (and can make those more efficient than with
> zero terminated strings of C) but with CoW it is behind additional
> level of indirection.

These are all wrong. COW and short buffer optimization are orthogonal
things. Copying of COW strings need not be more thread safe than copying
of non-COW strings; copying of `std::string` is not thread safe. COW
string efficiency has little or nothing to do with `const` correctness.
Nothing prevents `size()` available directly in a COW string: the string
size can't be changed by other COW strings that refer to the buffer.

> So the major shops voted for making one-pointer CoW string illegal
> and now there are all (usually ... 32 byte) non-CoW strings.

Well, the offered arguments do not cut it. And I think my hypothesis has
at least some chance of being at least partially right. But the battle
for possible COW implementations of `std::string` is lost so it doesn't
matter much except for how to relate to discussions about it...

> performance optimization. But in 95% of code the std::string is
> efficient enough and non-CoW is bit more robust and scalable
> than CoW for that code.

For the `std::string` specification yes, I agree the C++11 spec makes
for more robust client code than the COW-supporting C++03 spec.

So something was gained, and something was lost. :)

Cheers!,

- Alf

"Öö Tiib" <ootiib@hot.ee>: Oct 19 04:43AM -0700

On Friday, 19 October 2018 03:59:23 UTC+3, Alf P. Steinbach wrote:

> It was possible to invalidate a pointer into a string's buffer, by
> operations that one would not expect to do that. I'm a bit hazy on the
> details, except that it necessarily involves making a (logical) copy.

Can't find cites of it. Was it about thread safety?

> > However the non-sanctimonious arguments that I have heard are
> > also plenty:

> Hm, I had to google that word, "sanctimonious". Learned something. :-)

That indicates I have read needlessly lot of philosophical and political
stuff. :D

> string efficiency has little or nothing to do with `const` correctness.
> Nothing prevents `size()` available directly in a COW string: the string
> size can't be changed by other COW strings that refer to the buffer.

The arguments were particularly about C++ std::string in different
implementations not CoW in general. C++03 did not formally have
threads but in practice we had more and more multi-core systems to
work with. There 2) and 3) are very convincing. 2) CoW string has to have
unexposed reference count, that can be shared between threads without
any external ways to indicate if it is or not. 3) CoW obviously provides
benefits only when refcount > 1 often and efficient usage avoids altering
that when not needed. Calls of non-const methods have to check that
refcount == 1 and to make actual copy when it is not. Interface of string
has lot of non-const overloads: at, [], front, back, begin, end, rbegin,
rend and since C++17 also data(). Avoiding unneeded calls of those
means const correctness. The 1) and 4) were perhaps critique of
some concrete CoW string implementations.

> at least some chance of being at least partially right. But the battle
> for possible COW implementations of `std::string` is lost so it doesn't
> matter much except for how to relate to discussions about it...

Specialists of major shops can also buy political, philosophical and
scientific arguments so it may be. The decisions have to be also
supported with explanations why profiling shows about the actual
products what it shows and what it will cost to improve it. thing about Usenet is that here we are free of that.

> For the `std::string` specification yes, I agree the C++11 spec makes
> for more robust client code than the COW-supporting C++03 spec.

> So something was gained, and something was lost. :)

Yes and surgery that makes things simpler isn't typically worse
than the disease that it cures. :D

Juha Nieminen <nospam@thanks.invalid>: Oct 19 02:24PM

> Copying of COW strings need not be more thread safe than copying
> of non-COW strings; copying of `std::string` is not thread safe.

I think one problem with an "unofficial" (in the sense that the standard
doesn't mandate it, nor define how exactly it should work) copy-on-write
scheme in std::string is that, by the virtue of being "unofficial",
there's no "standard" way of checking if the current string is being
shared by another instance, and to make a deep copy if that's so.

In some situations you want to make a deep copy of a data structure
(eg. to have a copy that's local to a thread, and not shared with
any other thread), but you may want to do this deep-copying only
if it's necessary, ie. if the data is being currently shared.
Always doing a deep copy "just in case", even if it's not needed,
could be inefficient.

And, of course, there's also the issue that if you have something
like this:

void foo(std::string s)
{
std::size_t length = s.length();
...
}

you normally wouldn't expect having to enclose those reads in
mutexes because 's' is being taken by value and should be a
local copy. When you suddenly can't trust that a parameter
taken by value cannot be handled without mutexes, it raises all
kinds of problems.

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 19 06:22PM +0200

On 19.10.2018 16:24, Juha Nieminen wrote:
> if it's necessary, ie. if the data is being currently shared.
> Always doing a deep copy "just in case", even if it's not needed,
> could be inefficient.

Agreed. The apparently simple solution of just making a copy and
pretend-modifying it, would introduce inefficiency for non-COW strings.
So that's not portable.

> local copy. When you suddenly can't trust that a parameter
> taken by value cannot be handled without mutexes, it raises all
> kinds of problems.

But here it appears that you confuse responsibilities. It's not foo's
responsibility to make a deep copy of s in order to support execution of
foo in a separate thread. It's the caller's responsibility.

In C++03 there were no standard threads, so it was not addressed by the
language.

In C++11 threads were introduced and the COW support of the design
removed, and so in C++11 this issue not addressed by the language
either. One way it could have been addressed is, as you note at the
start, by providing a deep copy operation. Then the caller of foo would
have to use that for using foo as a thread function.

Cheers!,

- Alf

std::rotate substitute

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 19 03:15AM +0200

On 19.10.2018 01:23, bitrex wrote:
> have C++11 compiler but really nothing else available to work with. I'd
> like to be able to rotate the elements of a short array of characters of
> arbitrary length e.g. ABCD -> BCDA -> CDAB etc.

If the microprocessor is 32-bit or better, just use its rotate instruction.

> Something like the following should be good enough, but what's good to
> use for "swap"? "algorithm" is a non-starter here, too.

Don't move data by repeated swapping. For moving, move.

> swap(s[i-1], s[i]);
> return s;
> }

You want a mutating routine for efficiency, not a function.

Like,

void rotate_buffer( char* buf, int size )
{
// Use memmov for the moving.
}

void rotate( string& s )
{
rotate_buffer( s.data(), s.size() );
}

Cheers & hth.,

- Alf

bitrex <user@example.net>: Oct 18 07:23PM -0400

I need an implementation of rotate for a microprocessor platform where I
have C++11 compiler but really nothing else available to work with. I'd
like to be able to rotate the elements of a short array of characters of
arbitrary length e.g. ABCD -> BCDA -> CDAB etc.

Something like the following should be good enough, but what's good to
use for "swap"? "algorithm" is a non-starter here, too.

#include <algorithm>

string rotate(string s) {
for (int i = 1; i < s.size(); i++)
swap(s[i-1], s[i]);
return s;
}

bitrex <user@example.net>: Oct 18 07:27PM -0400

On 10/18/2018 07:23 PM, bitrex wrote:
> swap(s[i-1], s[i]);
> return s;
> }

Perhaps the old "hack":

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

?

David Brown <david.brown@hesbynett.no>: Oct 19 08:42AM +0200

On 19/10/18 01:27, bitrex wrote:
>> }

> Perhaps the old "hack":

> #define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

No. Just.. no.

(Alf has already posted the ideas I had.)

Juha Nieminen <nospam@thanks.invalid>: Oct 19 02:27PM

> Perhaps the old "hack":

> #define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

Which at least in modern architectures is slower than the "naive"
swap.

Juha Nieminen <nospam@thanks.invalid>: Oct 19 02:30PM

> swap(s[i-1], s[i]);
> return s;
> }

Why swapping? You are performing tons of unnecessary assignments.
In fact, you are performing twice as many as would be needed.
(For a string of length n, you only need n+1 assignments,
not (n-1)*3 assignments as you are doing there.)

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

soft and program

Friday, October 19, 2018

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

No comments:

Blog Archive

About Me