Monday, September 27, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 5 topics

Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:25AM

> streaming: (A) sending 100 million ints to std::ostream separately, and
> (B) formatting them first into a huge std::string, then sending that to
> std::ostream in one go. My results are here:
 
You are constructing a string with the contents of the data in both cases.
This is not what I'm talking about. It's quite obvious (and I have never
had any illusion otherwise) that std::ostringstream is extraordinarily
inefficient. Obviously using other methods for constructing a string
is going to be a million times faster. (I myself pretty much never
use std::ostringstream if I can avoid it.)
 
But I am not talking about constructing a string from the content of
the data. In fact, I'm talking about the exact opposite: *Avoiding*
constructing a string into memory with the data.
 
How do you add support to the standard output functions for custom
types without requiring those custom types to create strings from
their data, and being able to directly output the data?
 
Many people responding to this challenge are arguing against it
by comparing the speed of std::ostream to the speed of some other
ways of outputting data. This is not relevant to my question.
Just substitute std::ostream with something more efficient.
The question remains: How do you add native support for custom
types to that output method, without requiring the custom types
to create dynamically allocated strings in memory, and instead
being able to directly use that output method to print their
contents (in whichever way they choose)?
Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:34AM


> But if there is really a need to convert a value of some abstract type
> T into a sequence of characters, then how else can it be done other than
> providing support functions which do that conversion?
 
By providing a way for the custom type to directly output its contents
to the output (in whichever way it chooses), rather than forcing it
to create a dynamically allocated string in memory (which then gets
immediately destroyed afterwards).
 
In C++, when you overload operator<< for std::ostream, the custom type
does not need to construct any strings with its contents. It can output
directly to that std::ostream object. (The speed of std::ostream itself
is not the relevant thing here.)
 
I am not asking to replicate the way in which C++ solved that problem.
I am asking what's your own suggestion for a better alternative (that
does not involve forcing types to create dynamically allocated strings).
 
> Probably 99% of all print items would have an intermediate string less
> than 100 characters along, so that a simple fixed buffer would suffice.
 
Would that be a fixed buffer per object, or per type?
 
If it's a fixed buffer per object, then if you have a million objects
that would be 100 million bytes in buffers in total.
 
If it's per type, then that's not very thread-safe. Nor is it completely
safe even in single-threaded mode, if the references to the returned
strings can outlive their retrieval (so that you can have several
references to the strings returned by several objects... which would
all in fact refer to the same fixed buffer, whose contents would only
be those of the last object called, making the other references invalid,
with no warning.)
Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:36AM

> os << a.to_string();
> return os;
> }
 
While you are at it, why not just output the contents of that A object
directly, rather than making it construct a string?
 
At this point that to_string() method is completely superfluous.
Paavo Helde <myfirstname@osa.pri.ee>: Sep 27 11:19AM +0300

27.09.2021 08:36 Juha Nieminen kirjutas:
>> }
 
> While you are at it, why not just output the contents of that A object
> directly, rather than making it construct a string?
 
 
Because of speed. I just showed elsethread that serializing a large
object into an in-memory string can be up to 10x faster than writing it
into a std::ostream piece-by-piece.
 
Also, because of better modularity and easier usage. A string is
basically just a raw memory buffer which is easy to transport and use.
Streams are more complicated.
 
Say, I want to write my large data structure into a file in AWS cloud.
AmazonStreamingWebServiceRequest::SetBody() takes a pointer to an input
stream and reads data from it later when I call S3Object::PutObject().
 
Say, for my large data structure I have proper streaming support which
writes the data into an std::ostream. So now what? How do I connect this
output stream to an input stream used by the AWS library so that they
would "flow together"? Sure it can be done, but seems not so easy.
Threads or coroutines come to mind.
 
The easiest way is to dump the data into a temporary file, then let the
AWS library read it. We do not need a file on disk, so this ought to
be an in-memory file. And guess what is the fastest way to create an
in-memory file? Answer: serializing the data into a raw memory buffer
such as std::string. IOW the dreaded to_string() method.
Ian Collins <ian-news@hotmail.com>: Sep 27 09:27PM +1300

On 27/09/2021 21:19, Paavo Helde wrote:
> be an in-memory file. And guess what is the fastest way to create an
> in-memory file? Answer: serializing the data into a raw memory buffer
> such as std::string. IOW the dreaded to_string() method.
 
You can stream it into an in-memory stream buffer.
 
I can't see how adding to a string can be any faster, and you have to
convert each field to a string representation, which is what streams do
for you.
 
--
Ian.
Paavo Helde <myfirstname@osa.pri.ee>: Sep 27 11:37AM +0300

27.09.2021 08:25 Juha Nieminen kirjutas:
>> streaming: (A) sending 100 million ints to std::ostream separately, and
>> (B) formatting them first into a huge std::string, then sending that to
>> std::ostream in one go. My results are here:
 
I used std::ostringstream only to exclude disk access from timings.
 
> You are constructing a string with the contents of the data in both cases.
 
No. In one case I indeed construct a string with the data, but with the
to_string() approach I construct this string *twice*! And it's still
faster than the first method!
 
> inefficient. Obviously using other methods for constructing a string
> are going to be a million times faster. (I myself pretty much never
> use std::ostringstream if I can avoid it.)
 
It's the general std::ostream interface which is slow. One can easily
switch to std::ofstream in my example if that feels better; this won't
change the timings much. Here are the results for std::ofstream:
 
MSVC++ 2019 x64 Release build:
 
Traditional streaming: 32367 ms
to_string() streaming: 3979 ms
to_string() is 8.13446 times faster than traditional streaming.
 
g++ 8.3 on Linux:
$ g++ -Wall -O2 test5.cpp -std=c++17
$ ./a.out
Traditional streaming: 3451 ms
to_string() streaming: 1771 ms
to_string() is 1.94862 times faster than traditional streaming.
 
Source code below.
 
 
 
> But I am not talking about constructing a string from the content of
> the data. In fact, I'm talking about the exact opposite: *Avoiding*
> constructing a string into memory with the data.
 
Why? In my practice this is a major usage scenario.
 
 
> How do you add support to the standard output functions for custom
> types without requiring those custom types to create strings from
> their data, and being able to directly output the data?
 
Why? What's wrong with creating strings?
 
 
> Just substitute std::ostream with something more efficient.
 
I just did. It's called to_string().
 
> to create dynamically allocated strings in memory, and instead
> being able to directly use that output method to print their
> contents (in whichever way they choose)?
 
Why? A lot of data transfer mechanisms use internal memory buffers,
which often are dynamically allocated.
 
 
Test source code without std::ostringstream:
 
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <charconv>
#include <chrono>
#include <cstdint>
#include <fstream>

class A {
public:
    A();

    // traditional operator<<
    friend std::ostream& operator<<(std::ostream& os, const A& a);

    // to_string() method
    std::string to_string() const;

private:
    std::vector<int> data;
};

A::A() {
    // Initialize data to something
    data.resize(100000000);
    std::iota(data.begin(), data.end(), 0);
}

std::ostream& operator<<(std::ostream& os, const A& a) {
    for (auto& x: a.data) {
        os << x << ' ';
    }
    return os;
}

std::string A::to_string() const {
    const size_t k = 64;
    char buffer[k];
    std::string result;
    for (auto x: data) {
        auto q = std::to_chars(buffer, buffer+k, x).ptr;
        *q++ = ' ';
        result.append(buffer, q-buffer);
    }
    return result;
}

using sclock = std::chrono::steady_clock;

std::int64_t ms(sclock::duration lapse) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(lapse).count();
}

int main() {

    std::ofstream sink1("sink1.txt"), sink2("sink2.txt");
    A a;

    // traditional streaming
    sclock::time_point start1 = sclock::now();
    sink1 << a;
    sclock::time_point finish1 = sclock::now();
    std::cout << "Traditional streaming: " << ms(finish1-start1) << " ms\n";

    // to_string()
    sclock::time_point start2 = sclock::now();
    sink2 << a.to_string();
    sclock::time_point finish2 = sclock::now();
    std::cout << "to_string() streaming: " << ms(finish2-start2) << " ms\n";

    double ratio = double(ms(finish1-start1))/ms(finish2-start2);
    std::cout << "to_string() is " << ratio << " times "
              << (ratio>1.0 ? "faster" : "slower")
              << " than traditional streaming.\n";
}
Paavo Helde <myfirstname@osa.pri.ee>: Sep 27 03:49PM +0300

27.09.2021 11:27 Ian Collins kirjutas:
>> in-memory file? Answer: serializing the data into a raw memory buffer
>> such as std::string. IOW the dreaded to_string() method.
 
> You can string it into an in memory straeam buffer.
 
This is the slow part.
 
> I can't see how adding to a string can be any faster
 
See my demo programs and timings elsethread.
 
> and you have to
> convert each field to a string representation which is what streams do
> for you.
 
Yes, and that's the slow part. When adding to a string I can choose what
conversion function to use. There is a reason why std::to_chars() was
added to C++.
Christian Gollwitzer <auriocus@gmx.de>: Sep 27 11:51PM +0200

Am 25.09.21 um 14:37 schrieb Paavo Helde:
> 3 they now require parens:
 
> Python2:  print 1, 2, 3
 
> Python3: print(1, 2, 3)
 
It wasn't "too simple", whatever that means, but in Python2, "print" was
a keyword and specially treated in the interpreter, whereas the core
developers thought that it should not be set apart from regular
functions, because there is nothing that "print" can do that any other
old function could not do. Variable number of arguments of varying type
is possible for any Python function.
 
Christian
Bart <bc@freeuk.com>: Sep 27 11:41PM +0100

On 27/09/2021 22:51, Christian Gollwitzer wrote:
> developers thought that it should not be set apart from regular
> functions, because there is nothing that "print" can do that any other
> old function could not do.
 
Other than provide a more ergonomic syntax.
 
Some languages have features that allow 'if' and 'for' statements to be
implemented as functions. But just because you can, should you?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 27 12:18AM -0700

On 9/20/2021 1:43 PM, Bonita Montero wrote:
> WAITER_B_VALUE) + VISITOR_VALUE;
>     } while( !m_flagAndCounters.compare_exchange_weak( cmp, chg,
> memory_order_release, memory_order_relaxed ) );
[...]
 
Humm... For some reason I feel the need for std::memory_order_acq_rel
here wrt the cas. Humm... I need to port your algorithm over to a form
that Relacy can understand. It's been a while! I just got a strange
feeling. Waiting usually has acquire semantics. Humm... Sorry, I need to
examine it further, and port it over. Then run it in certain scenarios.
Relacy has the capability to crack it wide open if there are any issues.
 
I should have some time later on tomorrow. I am busy with my fractal
software right now:
 
https://fractalforums.org/gallery/1612-270921004032.jpeg
 
http://siggrapharts.ning.com/photo/alien-anatomy
 
lol. ;^)
Bonita Montero <Bonita.Montero@gmail.com>: Sep 27 02:44PM +0200

Am 27.09.2021 um 09:18 schrieb Chris M. Thomasson:
> [...]
 
> Humm... For some reason I feel the need for std::memory_order_acq_rel
> here wrt the cas. ...
 
No, you only write nonsense. When I wait, the lock is released,
so it's release-consistency.
 
You always write nonsense.
red floyd <no.spam.here@its.invalid>: Sep 27 10:54AM -0700

On 9/27/2021 5:44 AM, Bonita Montero wrote:
[redacted]
 
> No, you only write nonsense. When I wait the lock is released,
> so it's release-consistency.
 
> You always write nonsense.
 
If you're going to ask for opinions and then reject any criticism
as nonsense, then why the heck are you even bothering to post your
code?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 27 02:06PM -0700

On 9/27/2021 5:44 AM, Bonita Montero wrote:
>>> So this is the complete function but i put two empty lines around the
>>> code I mentioned.
 
>>> void dual_monitor::wait( bool b )
[...]
 
> No, you only write nonsense. When I wait the lock is released,
> so it's release-consistency.
 
> You always write nonsense.
 
Decrementing a semaphore requires acquire semantics. Incrementing a
semaphore requires release semantics. Trying to do both at once in a
single atomic operation requires acquire/release semantics.
 
I still need to port it to Relacy, but it seems like you need acq_rel here.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 27 02:10PM -0700

On 9/27/2021 10:54 AM, red floyd wrote:
 
> If you're going to ask for opinions and then reject any criticism
> as nonsense, then why the heck are you even bothering to post your
> code?
 
Yeah, no shi%. Wow. Fwiw, it's been a while since I have worked on such
things. I just need to port Bonita's code over to Relacy, and give it a
go in the simulator. It can find obscure memory order issues pretty damn
fast. The problem is that I need to find the time. I mean, Bonita is not
paying me. ;^)
Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:47AM

> In C++ (unlike C), NULL is a macro defined to 0, so there is no
> difference.
 
Actually, many standard library implementations define NULL to be
nullptr (if we are in C++11 or newer).
 
(I haven't checked the standard, but this tells me that the standard does
not mandate NULL to be defined as 0.)
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 26 11:05PM -0700


> Nothing inherently wrong with that, but in C, it would be more traditional to use NULL. I believe the value assigned is going to be the same, whether 0 or NULL is used.
 
> Is one style preferred over the other in C++? Why?
 
> Thanks.
 
void* foo = 0;
void* foobar = NULL;
void* foobarCpp = nullptr;
 
Means foo == foobar == foobarCpp.
 
So, they should all be the same. Well, an impl can define these things
to mean a "null" pointer on their system, so to speak. Magic! nullptr
might mean something odd, and exotic under the hood... So does 0 wrt
pointers... ;^)
 
_____________________________
#include <iostream>
 
int main()
{
void* foo = 0;
void* foobar = NULL;
void* foobarCpp = nullptr;
 
std::cout << "foo = " << foo << "\n";
std::cout << "foobar = " << foobar << "\n";
std::cout << "foobarCpp = " << foobarCpp << "\n";
 
return 0;
}
_____________________________
 
Well, shit... What's your output?
Can you even compile the damn thing?
 
;^)
Bo Persson <bo@bo-persson.se>: Sep 27 08:32AM +0200

On 2021-09-27 at 07:47, Juha Nieminen wrote:
> nullptr (if we are in C++11 or newer).
 
> (I haven't checked the standard, but this tells me that the standard does
> not mandate NULL to be defined as 0.)
 
It just says
 
"The macro NULL is an implementation-defined null pointer constant."
 
And then a footnote saying that 0 is one possibility, but (void*)0 is not.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 26 11:37PM -0700

On 9/26/2021 11:32 PM, Bo Persson wrote:
 
> It just says
 
> "The macro NULL is an implementation-defined null pointer constant."
 
> And then a footnote saying that 0 is one possibility, but (void*)0 is not.
 
OT comment: For some damn odd reason; thinking of this subject makes me
think of the following song:
 
https://youtu.be/y3hf0T4qpYg
 
Strange!
 
http://fractallife247.com/test/hmac_cipher/ver_0_0_0_1?ct_hmac_cipher=e320776c84d666caf19b80ac7925f3d9e30ab3d99e0ab58d634535629abb3d2f4b8a981dc0fbd9024aca3d2a2b29de38323340cf7e700b8599ddfac7d6d5972d0a2e8b8e9d751ecf0ea7a25e9394a86496ab208cb5b846f01bdff721feb48f8ece892344689b3d8db8bb39c3b21dfe4aad2f65608c0ef1ca3737a23b63c09ba2b0dad9ccd9a81cbf3a53a480bc0a55f9be590f6e021c787972bddce2f249e45137f75884f82bc74fa8115f0339b4c1515b55dfefd1f8322f16de06c50b5e3b7381f4d044ad9cdfad661d9c677e63a5c440ef9ac49c3a78c5397fe4ee2039d79cc7d790fe11036f99b6a3e9b8a6c738a84deccdf24d1277cbc081ae42398979a04346e34e6f3a135cdf6a3cf78b771a7bf052564c27e6767ad769141be938f1c35dff31c353311989339523a3dad8a8530e2301303329aa050ce085a6135338f3bdcef27485f2843df96ce01cee17b17ef5db63b621392c7dc08487add5c382d40199a67b6978f83650e3c586d67207731ed42b954b433ef6ff8f84b06456b9394eb610b116cfefe266a185
 
 
decrypts to the following plaintext using the default key:
_____________________________
#include <iostream>
 
int main()
{
void* foo = 0;
void* foobar = NULL;
void* foobarCpp = nullptr;
 
std::cout << "foo = " << foo << "\n";
std::cout << "foobar = " << foobar << "\n";
std::cout << "foobarCpp = " << foobarCpp << "\n";
 
return 0;
}
_____________________________
 
;^)
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Sep 27 10:32AM +0200

On 27 Sep 2021 08:32, Bo Persson wrote:
 
> It just says
 
> "The macro NULL is an implementation-defined null pointer constant."
 
> And then a footnote saying that 0 is one possibility, but (void*)0 is not.
 
C++17 §7.11/1:
❝A /null pointer constant/ is an integer literal with value zero or a
prvalue of type `std::nullptr_t`.❞
 
As I recall the insistence on a null pointer constant being a literal
was introduced in C++11; the C++03 definition was
 
C++03 §4.10/1:
❝A /null pointer constant/ is an integral constant expression rvalue of
integer type that evaluates to zero.❞
 
A subtle but perhaps important change.
 
 
- Alf
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Sep 27 11:32AM -0700

> ❝A /null pointer constant/ is an integral constant expression rvalue
> of integer type that evaluates to zero.❞
 
> A subtle but perhaps important change.
 
Perhaps subtle, but I don't think it's all that important. It means
that (2-2) is a null pointer constant in C++03 but not in C++17 -- but I
can't think of any good reason to use (2-2) as a null pointer constant
outside of deliberately contrived code.
 
Of course the addition of `std::nullptr_t` is important.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
James Kuyper <jameskuyper@alumni.caltech.edu>: Sep 27 03:03PM -0400

On 9/27/21 1:47 AM, Juha Nieminen wrote:
> nullptr (if we are in C++11 or newer).
 
> (I haven't checked the standard, but this tells me that the standard does
> not mandate NULL to be defined as 0.)
 
The mandate is quite clear:
 
"The macro NULL is an implementation-defined null pointer constant."
(17.2p3).
 
Note that "null pointer constant" has a different definition in C++ than
in C, and when compiling using C++, NULL must have a definition that
meets C++ requirements rather than C requirements. In particular, that
means that NULL can expand to "a prvalue of type std::nullptr_t." (7.3.11p1)
Bonita Montero <Bonita.Montero@gmail.com>: Sep 27 05:35PM +0200

Am 26.09.2021 um 07:26 schrieb Bonita Montero:
 
> So I can adjust the spinning-loop according
> to pause_singleton::getNsPerPause().
 
I dropped it! I simply made a spinning-loop according to the TSC
if the CPU has a TSC and it is invariant (these are also invariant
across sockets!). Reading the TSC can be done roughly every 10
nanoseconds on my PC (TR3990X, Zen3, Win10, SMT off). It's not
accurate since it might overlap with instructions before or afterwards,
but accuracy isn't relevant when you spin hundreds of clock-cycles.
And I changed to a single PAUSE per spin loop instead of a row of
PAUSEs which sums up to 30ns (which is roughly the most common
value on newer Intel CPUs). This more eager spinning may gain the
lock earlier, although it may generate more interconnect-traffic.
 
But as I'm using RDTSC: I'm asking myself how fast RDTSC is on
different CPUs. So I modified my test-program to measure different
routines to test a loop of 10 RDTSCs per loop. Here it is:
 
#include <iostream>
#include <chrono>
#include <limits>
#include <functional>
#if defined(_MSC_VER)
#include <intrin.h>
