Wednesday, September 29, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

red floyd <no.spam.here@its.invalid>: Sep 28 05:05PM -0700

On 9/28/2021 2:42 PM, Chris M. Thomasson wrote:
 
> working with it.
 
> I wonder if she knows what a #LoadStore | #LoadLoad barrier is. That
> MEMBAR instruction was fun on the SPARC. ;^)
 
Not quite a membar, but on PowerPC, the EIEIO instruction was also fun.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 28 07:28PM -0700

On 9/28/2021 5:05 PM, red floyd wrote:
 
>> I wonder if she knows what a #LoadStore | #LoadLoad barrier is. That
>> MEMBAR instruction was fun on the SPARC. ;^)
 
> Not quite a membar, but on PowerPC, the EIEIO instruction was also fun.
 
Yup. IIRC, it was for IO. It's been a while since I programmed a PPC. I
wrote about some of the pitfalls of using LL/SC on it a while back on
comp.arch:
 
https://groups.google.com/g/comp.arch/c/yREvvvKvr6k/m/nRZ5tpLwDNQJ
 
Wow, this was way back in 2005! Jeeze! ;^o
Branimir Maksimovic <branimir.maksimovic@gmail.com>: Sep 29 02:31AM


> Waiting on something is acquire by nature. Show me where its not?
> Actually, show me where to place the acquire and release barriers in a
> semaphore? Can you do it? Should be a piece of cake, right?
 
semaphore wait then acquire
 
--
 
7-77-777
Evil Sinner!
Branimir Maksimovic <branimir.maksimovic@gmail.com>: Sep 29 02:32AM

> want to discuss it with people to check the details. Since Bonita
> responds with nothing but insults and arrogance, I think it is safe to
> assume her code is flawed and leave her with it.
She is clever, but a newb. Forgive her for that.
 
--
 
7-77-777
Evil Sinner!
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 28 08:14PM -0700

On 9/28/2021 7:31 PM, Branimir Maksimovic wrote:
>> Actually, show me where to place the acquire and release barriers in a
>> semaphore? Can you do it? Should be a piece of cake, right?
 
> semaphore wait then acquire
 
Yup! You got it. I like how C++ has standalone membars via
std::atomic_thread_fence. Makes me reminisce about the SPARC where all
atomic ops are naked, or relaxed in C++ terms. An acquire on the SPARC
was MEMBAR #LoadStore | #LoadLoad, ah the good ol' days. ;^)
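For illustration, here is a toy spin-based counting semaphore sketched in
C++11 atomics, showing where the acquire and release orderings land. This is
my own sketch, not code from the thread; the class and member names are made
up for the example:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Toy counting semaphore (illustrative only, busy-waits instead of blocking).
class toy_semaphore {
    std::atomic<int> count;
public:
    explicit toy_semaphore(int n) : count(n) {}

    // wait: the successful decrement is the acquire point.
    void wait() {
        int c = count.load(std::memory_order_relaxed);
        for (;;) {
            while (c < 1)
                c = count.load(std::memory_order_relaxed); // spin until positive
            if (count.compare_exchange_weak(c, c - 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed))
                return; // writes made before the matching post() are now visible
        }
    }

    // post: the increment is the release point.
    void post() {
        count.fetch_add(1, std::memory_order_release);
    }
};
```

The point being made above: the wait side only needs acquire (it synchronizes
with the post() that made the decrement possible); nothing in wait() needs
release semantics.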
David Brown <david.brown@hesbynett.no>: Sep 29 08:35AM +0200

On 29/09/2021 02:05, red floyd wrote:
 
>> working with it.
 
>> I wonder if she knows what a #LoadStore | #LoadLoad barrier is. That
>> MEMBAR instruction was fun on the SPARC. ;^)
 
The most challenging architecture, AFAIK, was the Alpha. (I have no
experience of it myself.)
 
 
> Not quite a membar, but on PowerPC, the EIEIO instruction was also fun.
 
It is certainly the best named instruction around!
 
It was useful on early PPC microcontroller cores, to avoid unwanted
reordering or buffering of accesses to hardware registers. Later cores
had an MPU that supported setting up memory areas for direct unbuffered
accesses, which is a more convenient and safer method.
Bonita Montero <Bonita.Montero@gmail.com>: Sep 29 12:41PM +0200

Am 28.09.2021 um 23:42 schrieb Chris M. Thomasson:
 
> Yeah. Damn. She does not seem to know a whole lot about memory barriers,
 
I use them a lot and correctly.
Bonita Montero <Bonita.Montero@gmail.com>: Sep 29 12:42PM +0200

Am 29.09.2021 um 08:35 schrieb David Brown:
 
> The most challenging architecture, AFAIK, was the Alpha.
 
The Alpha isn't challenging because it has the most relaxed memory
ordering of all CPUs: if there were a C++11 compiler for the Alpha,
you would use conventional membars and the code would execute
virtually like on every other CPU.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 29 12:04PM -0700

On 9/29/2021 3:42 AM, Bonita Montero wrote:
> -ordering of all CPUs: If there would be a C++11-compiler for the
> alpha you would use conventional membars and the code virtually
> executes like on every other CPU.
 
Oh my. You cannot even implement the read side of RCU without a damn
membar on an Alpha! You don't think that is challenging at all? I don't
think you ever implemented RCU. Also, have you implemented SMR? That
actually requires an explicit membar on an x86! Joe Seigh came up with a
way to combine RCU and SMR to get rid of the membar by putting SMR in a
read-side RCU critical section. It improved the performance of loading
an SMR pointer by orders of magnitude. Well done Joe.
 
:^)
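For a concrete picture of what the Alpha forces on the read side, here is a
minimal publish/subscribe sketch in C++11 terms. This is my own example, not
the thread's code: on most CPUs the data-dependent load through the pointer
would be ordered for free, but the Alpha can reorder even dependent loads,
which is why an RCU read side needs a barrier there.

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

std::atomic<Node*> g_head{nullptr};

// Writer: fill in the node, then publish it with release ordering.
void publish(Node* n) {
    g_head.store(n, std::memory_order_release);
}

// Reader: acquire ordering here plays the role of the Alpha's read-side
// membar; with a relaxed load the Alpha could observe the pointer before
// the pointed-to value.
int read_value() {
    Node* n = g_head.load(std::memory_order_acquire);
    return n ? n->value : -1;
}
```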
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 29 12:05PM -0700

On 9/29/2021 3:41 AM, Bonita Montero wrote:
> Am 28.09.2021 um 23:42 schrieb Chris M. Thomasson:
 
>> Yeah. Damn. She does not seem to know a whole lot about memory barriers,
 
> I use them a lot and correctly.
 
A wait on a semaphore does not need release semantics. It only needs
acquire.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 29 12:05PM -0700

On 9/28/2021 7:32 PM, Branimir Maksimovic wrote:
>> responds with nothing but insults and arrogance, I think it is safe to
>> assume her code is flawed and leave her with it.
> She is clever, but newb. Forgive her for that.
 
She is clever.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 29 12:09PM -0700

On 9/20/2021 1:43 PM, Bonita Montero wrote:
> So this is the complete function but i put two empty lines around the
> code I mentioned.
 
> void dual_monitor::wait( bool b )
[...]
 
Have you posted the whole dual_monitor code? If you did, I missed it.
Sorry.
Paavo Helde <myfirstname@osa.pri.ee>: Sep 29 12:18PM +0300

29.09.2021 00:34 Ian Collins kirjutas:
> real problem here.  Presumably this would be the same for C printing
> functions as well.  I can see why std::to_chars would have an advantage
> here.
 
Right. Actually it is not so important where the output is collected;
it's the formatting step that is the bottleneck. It is even possible to
use the standard stream for collecting the output, for those who are
repelled by the idea of a string buffer. The trick is to ignore the
stream part and only use the streambuf part, see the test program below.
 
> I haven't seen such a high overhead on Unix/Linux, I wonder if the
> Windows way of doing things is more burdensome?  An example with timings
> would be useful!
 
Here you are. These timings are for 50 million ints. You can try the
demo program out yourself with your favorite compiler and hardware.
On my Windows the "traditional" streaming is ca. 10 times slower than
the alternatives; on my Linux the difference is smaller, just ca. 2 times.
 
MSVC++ 2019 on Windows, x64 Release build:
 
Traditional streaming: 16479 ms
Streaming with std::to_chars() directly into streambuf: 1612 ms
Collecting the content in a string buffer of 418 MB: 1126 ms
Writing the string buffer of 418 MB into a disk file: 796 ms
 
 
g++ 8.3 on Linux:
$ g++ -Wall -O3 -std=c++17 test6.cpp
$ ./a.out
Traditional streaming: 2080 ms
Streaming with std::to_chars() directly into streambuf: 903 ms
Collecting the content in a string buffer of 418 MB: 655 ms
Writing the string buffer of 418 MB into a disk file: 146 ms
 
Source code:
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <charconv>
#include <chrono>
#include <cstdint>
#include <fstream>

class A {
public:
    A();

    // traditional operator<<
    friend std::ostream& operator<<(std::ostream& os, const A& a);

    // to_string() member
    std::string to_string() const;

    // Select whether operator<< uses stream or streambuf interface.
    bool useStreamBufOnly = false;

private:
    std::vector<int> data;
};

A::A() {
    // Initialize data to 50 million ints.
    data.resize(50000000);
    std::iota(data.begin(), data.end(), 0);
}

std::ostream& operator<<(std::ostream& os, const A& a) {
    if (a.useStreamBufOnly) {
        // Ignore stream, use streambuf only.
        auto streamBuf = os.rdbuf();
        const size_t k = 64;
        char buff[k];
        for (auto& x: a.data) {
            auto q = std::to_chars(buff, buff+k, x).ptr;
            *q++ = ' ';
            streamBuf->sputn(buff, q-buff);
        }
    } else {
        // Traditional stream output
        for (auto& x: a.data) {
            os << x << ' ';
        }
    }
    return os;
}

std::string A::to_string() const {
    const size_t k = 64;
    char buffer[k];
    std::string result;
    for (auto x: data) {
        auto q = std::to_chars(buffer, buffer+k, x).ptr;
        *q++ = ' ';
        result.append(buffer, q-buffer);
    }
    return result;
}

using sclock = std::chrono::steady_clock;

std::int64_t ms(sclock::duration lapse) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(lapse).count();
}

int main() {

    A a;

    std::ofstream sink1("sink1.txt"), sink2("sink2.txt"), sink3("sink3.txt");

    // traditional streaming
    a.useStreamBufOnly = false;
    sclock::time_point start1 = sclock::now();
    sink1 << a;
    sink1.close();
    sclock::time_point finish1 = sclock::now();
    std::cout << "Traditional streaming: " << ms(finish1-start1) << " ms\n";

    // streaming with to_chars() and streambuf
    a.useStreamBufOnly = true;
    sclock::time_point start2 = sclock::now();
    sink2 << a;
    sink2.close();
    sclock::time_point finish2 = sclock::now();
    std::cout << "Streaming with std::to_chars() directly into streambuf: "
        << ms(finish2-start2) << " ms\n";

    // to_string()
    sclock::time_point start3 = sclock::now();
    std::string s3 = a.to_string();
    sclock::time_point finish3 = sclock::now();
    std::cout << "Collecting the content in a string buffer of "
        << s3.length()/(1024*1024) << " MB: " << ms(finish3-start3) << " ms\n";

    sclock::time_point start4 = sclock::now();
    sink3.rdbuf()->sputn(s3.data(), s3.length());
    sink3.close();
    sclock::time_point finish4 = sclock::now();
    std::cout << "Writing the string buffer of " << s3.length()/(1024*1024)
        << " MB into a disk file: " << ms(finish4-start4) << " ms\n";

}
Ian Collins <ian-news@hotmail.com>: Sep 29 10:46PM +1300

On 29/09/2021 22:18, Paavo Helde wrote:
> Streaming with std::to_chars() directly into streambuf: 903 ms
> Collecting the content in a string buffer of 418 MB: 655 ms
> Writing the string buffer of 418 MB into a disk file: 146 ms
 
Interesting, thanks for posting. It's a similar ratio on my machine. I
can see that the Windows way of doing things is definitely more
burdensome.
 
It also gives me another argument for upgrading our embedded target
compiler to one which supports C++17!
 
<snip>
 
--
Ian.
Paavo Helde <myfirstname@osa.pri.ee>: Sep 29 01:07PM +0300

29.09.2021 12:46 Ian Collins kirjutas:
 
> It also gives me another argument for upgrading our embedded target
> compiler to one which supports C++17!
 
Beware that some g++ versions do not support std::to_chars() with
floating-point, even when otherwise supporting C++17.
Juha Nieminen <nospam@thanks.invalid>: Sep 29 12:10PM


> Because, for most objects that are not already strings, assembling into
> a local temporary string, then dumping the whole string at once, might
> be faster than calling some external character-at-a-time routine.
 
I did not ask whether outputting one concatenated string is faster than
outputting two strings one character at a time.
 
I asked if concatenating the two strings and then outputting the result
is faster than just outputting the two strings.
Juha Nieminen <nospam@thanks.invalid>: Sep 29 12:16PM

>> Suppose that the 'a' object above consists of 2 large strings, and you
>> want to write them to 'os' concatenated.
 
> This is another task.
 
No, it isn't.
 
I just said "custom type" which wants to output its contents. I did not
specify what the content exactly is. You seem to be assuming that I was
talking about, for example, a data container containing numbers.
 
> then of course these can be written directly to a file. But for that you
> don't need a C++ std::ostream interface, you can write directly into the
> file descriptor, or C++ streambuf().
 
No you can't, if you don't have a reference to the ostream object (or FILE
pointer).
 
Sure, you could just call a member function of the class giving it
that object, but then you cannot use that in generic code. You would
merely be trying to work your way around not overloading operator<<,
which does exactly that thing, in a way that allows it to be used in
generic code.
 
So my question is, once again: If overloading operator<< for this
is such a horrible thing, what exactly is your alternative?
Bart <bc@freeuk.com>: Sep 29 02:34PM +0100

On 29/09/2021 13:10, Juha Nieminen wrote:
> outputting two strings one character at a time.
 
> I asked if concatenating the two strings and then outputting the result
> is faster than just outputting the two strings.
 
I was talking about the net effect when outputting lots of different
objects, since the gains can offset the losses.
 
But, OK, the answer to your specific question: I guess it depends. On
the overheads of calling the o/p routine, and the efficiency of string
concatenation.
Bart <bc@freeuk.com>: Sep 29 03:54PM +0100

On 29/09/2021 14:34, Bart wrote:
 
> But, OK, the answer to your specific question: I guess it depends. On
> the overheads of calling the o/p routine, and the efficiency of string
> concatenation.
 
 
Here's a random observation using script code. A and B are both
100-character strings:
 
to 1 million do
println A,,B
od
 
The above prints 1M lines of A and B (the ",," means no gap), so outputs
2M strings of 100 chars each.
 
The following combines A+B into one string each time, and writes 1M
strings of 200 chars each:
 
to 1 million do
println A + B
od
 
The first took around 4.5 seconds, the second about 4.3 seconds. (Run on
Windows and directing output to a file.)
 
If I instead printed 10,000 strings of 10,000 chars each, the results
were much closer (approx 3.5 seconds for both).
 
I didn't observe a slow-down due to having to 'pointlessly' create a
temporary string object and then tear it down again.
 
So in answer to this:
 
>> I asked if concatenating the two strings and then outputting the result
>> is faster than just outputting the two strings.
 
Yes, it can be. At least, you shouldn't just dismiss the possibility.
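A rough C++ analogue of the same experiment, for anyone who wants to repeat
it with iostreams. This is my own sketch; the file names, string lengths and
iteration count are arbitrary:

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <string>

// Write n lines of two 100-char strings to `path`, either as two
// separate << calls or as one concatenated temporary; return elapsed ms.
long long run(const std::string& path, int n, bool concat) {
    const std::string A(100, 'a'), B(100, 'b');
    std::ofstream f(path);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        if (concat)
            f << A + B << '\n';   // builds a temporary string each time
        else
            f << A << B << '\n';  // no temporary, two stream calls
    }
    f.close();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}
```

As the measurements above suggest, which side wins depends on the call
overhead versus the allocation cost of the temporary; neither outcome is a
foregone conclusion.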
Paavo Helde <myfirstname@osa.pri.ee>: Sep 29 08:15PM +0300

29.09.2021 15:16 Juha Nieminen kirjutas:
>>> want to write them to 'os' concatenated.
 
>> This is another task.
 
> No, it isn't.
 
Yes it is. Outputting two strings is a different task from formatting
*and* outputting millions of numbers.
 
 
> I just said "custom type" which wants to output its contents. I did not
> specify what the content exactly is. You seem to be assuming that I was
> talking about, for example, a data container containing numbers.
 
Yes, that's the scenario which was in my mind as this is the problematic
one.
 
>> file descriptor, or C++ streambuf().
 
> No you can't, if you don't have a reference to the ostream object (or FILE
> pointer).
 
In that case you can concatenate them and return them as a string to the
point where they need to be used (not necessarily a file).
 
> generic code.
 
> So my question is, once again: If overloading operator<< for this
> is such a horrible thing, what exactly is your alternative?
 
I have nothing against operator<<, it just tends to be slow if
implemented as taught in the books. I believe it was you who stressed
the importance of efficiency in C++.
 
Here are two examples of operator<< which are potentially faster:
 
std::ostream& operator<<(std::ostream& os, const A& a) {
    std::string buff = a.to_string();
    os.rdbuf()->sputn(buff.data(), buff.length());
    return os;
}

std::ostream& operator<<(std::ostream& os, const A& a) {
    auto streamBuf = os.rdbuf();
    const size_t k = 64;
    char buff[k];
    for (auto x: a.data) {
        auto q = std::to_chars(buff, buff+k, x).ptr;
        *q++ = ' ';
        streamBuf->sputn(buff, q-buff);
    }
    return os;
}
Branimir Maksimovic <branimir.maksimovic@gmail.com>: Sep 29 02:26AM

> another type. Other operations would be unsafe because the compiler
> cannot know what kind of object is really pointed to. Consequently,
> other operations result in compile-time errors."
 
The problem is that you can convert to an unknown type, but the compiler
doesn't just convert to any other type. This is simply pointless, as the
compiler cannot possibly know what the unknown type is, cast or not :P
Let's take *any* implicit conversion then, like new languages DO :P
 
 
--
 
7-77-777
Evil Sinner!
Branimir Maksimovic <branimir.maksimovic@gmail.com>: Sep 29 02:29AM

> has that hypothetical bug. It can require some trivial extra work
> for an implementation that shares headers between C and C++, but
> implementers are well aware of the issue.
 
We have a problem of overload resolution then with the constant zero.
Still, despite nullptr. You can accidentally mean 0 the integer
and have an overload with a pointer argument, and have a COMPILER BUG.
This should be corrected in a future standard...
 
--
 
7-77-777
Evil Sinner!
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Sep 28 08:34PM -0700

> Still, despite nullptr. You can accidentally mean 0 the integer
> and have overload with pointer argument, and hav COMPILER BUG.
> This should be corrected in future standard...
 
I don't know what "COMPILER BUG" you're talking about.
 
Using 0 as a null pointer constant in the presence of overloaded
functions can lead to a *programming* bug. I'm not aware of any C++
compiler that has a bug in this area (i.e., handles it in a way that's
inconsistent with what the language requires).
 
That kind of programming bug can be avoided by using nullptr rather than
0 (something that wasn't possible before C++11).
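A minimal illustration of the overload behaviour being discussed (my own
example, not from the thread):

```cpp
#include <cassert>
#include <cstring>

// 0 is a valid null pointer constant, but in overload resolution it is
// first and foremost an int; nullptr is unambiguously a pointer.
const char* f(int)         { return "int"; }
const char* f(const char*) { return "ptr"; }
```

Calling f(0) selects the int overload even if the caller meant a null
pointer; f(nullptr) selects the pointer overload, making the intent explicit.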
 
If you're suggesting changing the language so that 0 is no longer a null
pointer constant, that would break tons of existing code.
 
Programming bugs, compiler bugs, and language bugs are three very
different things.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
David Brown <david.brown@hesbynett.no>: Sep 29 08:52AM +0200

On 29/09/2021 05:34, Keith Thompson wrote:
> 0 (something that wasn't possible before C++11).
 
> If you're suggesting changing the language so that 0 is no longer a null
> pointer constant, that would break tons of existing code.
 
Yes. But if you want that effect, some compilers will give you it with
the right flags (-Werror=zero-as-null-pointer-constant in gcc).
 
The downside of backwards compatibility in C and C++ is that it is hard
to remove features, even if they have been shown to be dangerous or
replaced by significantly better alternatives. Look how long it has
taken C to get rid of non-prototype function declarations - 33 years,
assuming C23 comes out on plan.
 
It would be nice if there were some way to standardise options like this
in a cross-compiler fashion (perhaps with pragmas rather than compiler
flags, so that they are included in the source code). The C++ and C
committees used [[attributes]] to standardise common extensions (gcc/clang
__attribute__, MSVC __declspec). So I live in hope!
 
 
Branimir Maksimovic <branimir.maksimovic@gmail.com>: Sep 29 02:30AM


> Off-topic, but could you please stop top-posting? It's annoying.
 
> Just edit the original post, leaving the relevant part you are
> responding to, and write your response below. Like I'm doing here.
I haven't mentioned that. Sure, I can post normally, no problem,
I just wanted to MAKE A POINT :P
 
--
 
7-77-777
Evil Sinner!