- rational numbers - 9 Updates
- Tricky ... - 5 Updates
- NULL versus 0 - 7 Updates
- Some help needed - further help - 2 Updates
- "C++20 Coroutines" by Martin Bond - 2 Updates
| Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:25AM > streaming: (A) sending 100 million ints to std::ostream separately, and > (B) formatting them first into a huge std::string, then sending that to > std::ostream in one go. My results are here: You are constructing a string with the contents of the data in both cases. This is not what I'm talking about. It's quite obvious (and I have never had any illusion otherwise) that std::ostringstream is extraordinarily inefficient. Obviously using other methods for constructing a string are going to be a million times faster. (I myself pretty much never use std::ostringstream if I can avoid it.) But I am not talking about constructing a string from the content of the data. In fact, I'm talking about the exact opposite: *Avoiding* constructing a string into memory with the data. How do you add support to the standard output functions for custom types without requiring those custom types to create strings from their data, and being able to directly output the data? Many people responding to this challenge are arguing against it by comparing the speed of std::ostream to the speed of some other ways of outputting data. This is not relevant to my question. Just substitute std::ostream with something more efficient. The question remains: How do you add native support for custom types to that output method, without requiring the custom types to create dynamically allocated strings in memory, and instead being able to directly use that output method to print their contents (in whichever way they choose)? |
| Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:34AM > But if there is really a need to convert a value of some abstract type > T into a sequence of characters, then how else can it be done other than > providing support functions which do that conversion? By providing a way for the custom type to directly output its contents to the output (in whichever way it chooses), rather than forcing it to create a dynamically allocated string in memory (which then gets immediately destroyed afterwards). In C++, when you overload operator<< for std::ostream, the custom type does not need to construct any strings with its contents. It can output directly to that std::ostream object. (The speed of std::ostream itself is not the relevant thing here.) I am not asking to replicate the way in which C++ solved that problem. I am asking what's your own suggestion for a better alternative (one that does not involve forcing types to create dynamically allocated strings). > Probably 99% of all print items would have an intermediate string less > than 100 characters long, so that a simple fixed buffer would suffice. Would that be a fixed buffer per object, or per type? If it's a fixed buffer per object, then if you have a million objects that would be 100 million bytes in buffers in total. If it's per type, then that's not very thread-safe. Nor is it completely safe even in single-threaded mode, if the references to the returned strings can outlive their retrieval (so that you can have several references to the strings returned by several objects... which would all in fact refer to the same fixed buffer, whose contents would only be those of the last object called, making the other references invalid, with no warning.) |
| Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:36AM > os << a.to_string(); > return os; > } While you are at it, why not just output the contents of that A object directly, rather than making it construct a string? At this point that to_string() method is completely superfluous. |
| Paavo Helde <myfirstname@osa.pri.ee>: Sep 27 11:19AM +0300 27.09.2021 08:36 Juha Nieminen kirjutas: >> } > While you are at it, why not just output the contents of that A object > directly, rather than making it construct a string? Because of speed. I just showed elsethread that serializing a large object into an in-memory string can be up to 10x faster than writing it into a std::ostream piece-by-piece. Also, because of better modularity and easier usage. A string is basically just a raw memory buffer which is easy to transport and use. Streams are more complicated. Say, I want to write my large data structure into a file in AWS cloud. AmazonStreamingWebServiceRequest::SetBody() takes a pointer to an input stream and reads data from it later when I call S3Object::PutObject(). Say, for my large data structure I have proper streaming support which writes the data into an std::ostream. So now what? How do I connect this output stream to an input stream used by the AWS library so that they would "flow together"? Sure it can be done, but it seems not so easy. Threads or coroutines come to mind. The easiest way is to dump the data into a temporary file, then let the AWS library read it. We do not need a file on disk, so this ought to be an in-memory file. And guess what is the fastest way to create an in-memory file? Answer: serializing the data into a raw memory buffer such as std::string. IOW the dreaded to_string() method. |
| Ian Collins <ian-news@hotmail.com>: Sep 27 09:27PM +1300 On 27/09/2021 21:19, Paavo Helde wrote: > be an in-memory file. And guess what is the fastest way to create an > in-memory file? Answer: serializing the data into a raw memory buffer > such as std::string. IOW the dreaded to_string() method. You can stream it into an in-memory stream buffer. I can't see how adding to a string can be any faster, and you have to convert each field to a string representation, which is what streams do for you. -- Ian. |
| Paavo Helde <myfirstname@osa.pri.ee>: Sep 27 11:37AM +0300 27.09.2021 08:25 Juha Nieminen kirjutas: >> streaming: (A) sending 100 million ints to std::ostream separately, and >> (B) formatting them first into a huge std::string, then sending that to >> std::ostream in one go. My results are here: I used std::ostringstream only to exclude disk access from timings. > You are constructing a string with the contents of the data in both cases. No. In one case I construct a string with data indeed, but with the to_string() approach I construct this string *twice*! And it's still faster than the first method! > inefficient. Obviously using other methods for constructing a string > are going to be a million times faster. (I myself pretty much never > use std::ostringstream if I can avoid it.) It's the general std::ostream interface which is slow. One can easily switch to ofstream in my example if this feels better; this won't change the timings much. Here are the results for std::ofstream:
MSVC++ 2019 x64 Release build:
Traditional streaming: 32367 ms
to_string() streaming: 3979 ms
to_string() is 8.13446 times faster than traditional streaming.
g++ 8.3 on Linux:
$ g++ -Wall -O2 test5.cpp -std=c++17
$ ./a.out
Traditional streaming: 3451 ms
to_string() streaming: 1771 ms
to_string() is 1.94862 times faster than traditional streaming.
Source code below. > But I am not talking about constructing a string from the content of > the data. In fact, I'm talking about the exact opposite: *Avoiding* > constructing a string into memory with the data. Why? In my practice this is a major usage scenario. > How do you add support to the standard output functions for custom > types without requiring those custom types to create strings from > their data, and being able to directly output the data? Why? What's wrong with creating strings? > Just substitute std::ostream with something more efficient. I just did. It's called to_string(). 
> to create dynamically allocated strings in memory, and instead > being able to directly use that output method to print their > contents (in whichever way they choose)? Why? A lot of data transfer mechanisms use internal memory buffers, which often are dynamically allocated. Test source code without std::ostringstream:
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <charconv>
#include <chrono>
#include <cstdint>
#include <fstream>

class A {
public:
    A();
    // traditional operator<<
    friend std::ostream& operator<<(std::ostream& os, const A& a);
    // to_string() operator
    std::string to_string() const;
private:
    std::vector<int> data;
};

A::A() {
    // Initialize data to something
    data.resize(100000000);
    std::iota(data.begin(), data.end(), 0);
}

std::ostream& operator<<(std::ostream& os, const A& a) {
    for (auto& x: a.data) {
        os << x << ' ';
    }
    return os;
}

std::string A::to_string() const {
    const size_t k = 64;
    char buffer[k];
    std::string result;
    for (auto x: data) {
        auto q = std::to_chars(buffer, buffer+k, x).ptr;
        *q++ = ' ';
        result.append(buffer, q-buffer);
    }
    return result;
}

using sclock = std::chrono::steady_clock;

std::int64_t ms(sclock::duration lapse) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(lapse).count();
}

int main() {
    std::ofstream sink1("sink1.txt"), sink2("sink2.txt");
    A a;

    // traditional streaming
    sclock::time_point start1 = sclock::now();
    sink1 << a;
    sclock::time_point finish1 = sclock::now();
    std::cout << "Traditional streaming: " << ms(finish1-start1) << " ms\n";

    // to_string()
    sclock::time_point start2 = sclock::now();
    sink2 << a.to_string();
    sclock::time_point finish2 = sclock::now();
    std::cout << "to_string() streaming: " << ms(finish2-start2) << " ms\n";

    double ratio = double(ms(finish1-start1))/ms(finish2-start2);
    std::cout << "to_string() is " << ratio << " times "
              << (ratio>1.0 ? "faster" : "slower")
              << " than traditional streaming.\n";
}
|
| Paavo Helde <myfirstname@osa.pri.ee>: Sep 27 03:49PM +0300 27.09.2021 11:27 Ian Collins kirjutas: >> in-memory file? Answer: serializing the data into a raw memory buffer >> such as std::string. IOW the dreaded to_string() method. > You can stream it into an in-memory stream buffer. This is the slow part. > I can't see how adding to a string can be any faster See my demo programs and timings elsethread. > and you have to convert each field to a string representation which is > what streams do for you. Yes, and that's the slow part. When adding to a string I can choose what conversion function to use. There is a reason why std::to_chars() was added to C++. |
| Christian Gollwitzer <auriocus@gmx.de>: Sep 27 11:51PM +0200 Am 25.09.21 um 14:37 schrieb Paavo Helde: > 3 they now require parens: > Python2: print 1, 2, 3 > Python3: print(1, 2, 3) It wasn't "too simple", whatever that means, but in Python2, "print" was a keyword and specially treated in the interpreter, whereas the core developers thought that it should not be set apart from regular functions, because there is nothing that "print" can do that any other old function could not do. Variable number of arguments of varying type is possible for any Python function. Christian |
| Bart <bc@freeuk.com>: Sep 27 11:41PM +0100 On 27/09/2021 22:51, Christian Gollwitzer wrote: > developers thought that it should not be set apart from regular > functions, because there is nothing that "print" can do that any other > old function could not do. Other than provide a more ergonomic syntax. Some languages have features that allow 'if' and 'for' statements to be implemented as functions. But just because you can, should you? |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 27 12:18AM -0700 On 9/20/2021 1:43 PM, Bonita Montero wrote: > WAITER_B_VALUE) + VISITOR_VALUE; > } while( !m_flagAndCounters.compare_exchange_weak( cmp, chg, > memory_order_release, memory_order_relaxed ) ); [...] Humm... For some reason I feel the need for std::memory_order_acq_rel here wrt the cas. Humm... I need to port your algorithm over to a form that Relacy can understand. It's been a while! I just got a strange feeling. Waiting is usually, acquire semantics. Humm... Sorry, need to examine it further, and port it over. Then run it in certain scenarios. Relacy has the capability to crack it wide open if there are any issues. I should have some time later on tomorrow. I am busy with my fractal software right now: https://fractalforums.org/gallery/1612-270921004032.jpeg http://siggrapharts.ning.com/photo/alien-anatomy lol. ;^) |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 27 02:44PM +0200 Am 27.09.2021 um 09:18 schrieb Chris M. Thomasson: > [...] > Humm... For some reason I feel the need for std::memory_order_acq_rel > here wrt the cas. ... No, you only write nonsense. When I wait the lock is released, so it's release-consistency. You always write nonsense. |
| red floyd <no.spam.here@its.invalid>: Sep 27 10:54AM -0700 On 9/27/2021 5:44 AM, Bonita Montero wrote: [redacted] > No, you only write nonsense. When I wait the lock is released, > so it's release-consistency. > You always write nonsense. If you're going to ask for opinions and then reject any criticism as nonsense, then why the heck are you even bothering to post your code? |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 27 02:06PM -0700 On 9/27/2021 5:44 AM, Bonita Montero wrote: >>> So this is the complete function but i put two empty lines around the >>> code I mentioned. >>> void dual_monitor::wait( bool b ) [...] > No, you only write nonsense. When I wait the lock is released, > so it's release-consistency. > You always write nonsense. Decrementing a semaphore requires acquire semantics. Incrementing a semaphore requires release semantics. Trying to do both at once in a single atomic operation requires acquire/release semantics. I still need to port it to Relacy, but it seems like you need acq_rel here. |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 27 02:10PM -0700 On 9/27/2021 10:54 AM, red floyd wrote: > If you're going to ask for opinions and then reject any criticism > as nonsense, then why the heck are you even bothering to post your > code? Yeah, no shi%. Wow. Fwiw, it's been a while since I have worked on such things. I just need to port Bonita's code over to Relacy, and give it a go in the simulator. It can find obscure memory order issues pretty damn fast. The problem is that I need to find the time. I mean, Bonita is not paying me. ;^) |
| Juha Nieminen <nospam@thanks.invalid>: Sep 27 05:47AM > In C++ (unlike C), NULL is a macro defined to 0, so there is no > difference. Actually, many standard library implementations define NULL to be nullptr (if we are in C++11 or newer). (I haven't checked the standard, but this tells me that the standard does not mandate NULL to be defined as 0.) |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 26 11:05PM -0700 > Nothing inherently wrong with that, but in C, it would be more traditional to use NULL. I believe the value assigned is going to be the same, whether 0 or NULL is used. > Is one style preferred over the other in C++? Why? > Thanks. void* foo = 0; void* foobar = NULL; void* foobarCpp = nullptr; Means foo == foobar == foobarCpp. So, they should all be the same. Well, an impl can define these things to mean a "null" pointer on their system, so to speak. Magic! nullptr might mean something odd, and exotic under the hood... So does 0 wrt pointers... ;^)
_____________________________
#include <iostream>

int main() {
    void* foo = 0;
    void* foobar = NULL;
    void* foobarCpp = nullptr;

    std::cout << "foo = " << foo << "\n";
    std::cout << "foobar = " << foobar << "\n";
    std::cout << "foobarCpp = " << foobarCpp << "\n";

    return 0;
}
_____________________________
Well, shit... What's your output? Can you even compile the damn thing? ;^) |
| Bo Persson <bo@bo-persson.se>: Sep 27 08:32AM +0200 On 2021-09-27 at 07:47, Juha Nieminen wrote: > nullptr (if we are in C++11 or newer). > (I haven't checked the standard, but this tells me that the standard does > not mandate NULL to be defined as 0.) It just says "The macro NULL is an implementation-defined null pointer constant." And then a footnote saying that 0 is one possibility, but (void*)0 is not. |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Sep 26 11:37PM -0700 On 9/26/2021 11:32 PM, Bo Persson wrote: > It just says > "The macro NULL is an implementation-defined null pointer constant." > And then a footnote saying that 0 is one possibility, but (void*)0 is not. OT comment: For some damn odd reason; thinking of this subject makes me think of the following song: https://youtu.be/y3hf0T4qpYg Strange! http://fractallife247.com/test/hmac_cipher/ver_0_0_0_1?ct_hmac_cipher=e320776c84d666caf19b80ac7925f3d9e30ab3d99e0ab58d634535629abb3d2f4b8a981dc0fbd9024aca3d2a2b29de38323340cf7e700b8599ddfac7d6d5972d0a2e8b8e9d751ecf0ea7a25e9394a86496ab208cb5b846f01bdff721feb48f8ece892344689b3d8db8bb39c3b21dfe4aad2f65608c0ef1ca3737a23b63c09ba2b0dad9ccd9a81cbf3a53a480bc0a55f9be590f6e021c787972bddce2f249e45137f75884f82bc74fa8115f0339b4c1515b55dfefd1f8322f16de06c50b5e3b7381f4d044ad9cdfad661d9c677e63a5c440ef9ac49c3a78c5397fe4ee2039d79cc7d790fe11036f99b6a3e9b8a6c738a84deccdf24d1277cbc081ae42398979a04346e34e6f3a135cdf6a3cf78b771a7bf052564c27e6767ad769141be938f1c35dff31c353311989339523a3dad8a8530e2301303329aa050ce085a6135338f3bdcef27485f2843df96ce01cee17b17ef5db63b621392c7dc08487add5c382d40199a67b6978f83650e3c586d67207731ed42b954b433ef6ff8f84b06456b9394eb610b116cfefe266a185 decrypts to the following plaintext using the default key:
_____________________________
#include <iostream>

int main() {
    void* foo = 0;
    void* foobar = NULL;
    void* foobarCpp = nullptr;

    std::cout << "foo = " << foo << "\n";
    std::cout << "foobar = " << foobar << "\n";
    std::cout << "foobarCpp = " << foobarCpp << "\n";

    return 0;
}
_____________________________
;^) |
| "Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Sep 27 10:32AM +0200 On 27 Sep 2021 08:32, Bo Persson wrote: > It just says > "The macro NULL is an implementation-defined null pointer constant." > And then a footnote saying that 0 is one possibility, but (void*)0 is not. C++17 §7.11/1: ❝A /null pointer constant/ is an integer literal with value zero or a prvalue of type `std::nullptr_t`.❞ As I recall the insistence on a null pointer constant being a literal was introduced in C++11; the C++03 definition was C++03 §4.10/1: ❝A /null pointer constant/ is an integral constant expression rvalue of integer type that evaluates to zero.❞ A subtle but perhaps important change. - Alf |
| Keith Thompson <Keith.S.Thompson+u@gmail.com>: Sep 27 11:32AM -0700 > ❝A /null pointer constant/ is an integral constant expression rvalue > of integer type that evaluates to zero.❞ > A subtle but perhaps important change. Perhaps subtle, but I don't think it's all that important. It means that (2-2) is a null pointer constant in C++03 but not in C++17 -- but I can't think of any good reason to use (2-2) as a null pointer constant outside of deliberately contrived code. Of course the addition of `std::nullptr_t` is important. -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips void Void(void) { Void(); } /* The recursive call of the void */ |
| James Kuyper <jameskuyper@alumni.caltech.edu>: Sep 27 03:03PM -0400 On 9/27/21 1:47 AM, Juha Nieminen wrote: > nullptr (if we are in C++11 or newer). > (I haven't checked the standard, but this tells me that the standard does > not mandate NULL to be defined as 0.) The mandate is quite clear: "The macro NULL is an implementation-defined null pointer constant." (17.2p3). Note that "null pointer constant" has a different definition in C++ than in C, and when compiling using C++, NULL must have a definition that meets C++ requirements rather than C requirements. In particular, that means that NULL can expand to "a prvalue of type std::nullptr_t." (7.3.11p1) |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 27 05:35PM +0200 Am 26.09.2021 um 07:26 schrieb Bonita Montero: > So I can adjust the spinning-loop according > to pause_singleton::getNsPerPause(). I dropped it! I simply made a spinning-loop according to the TSC if the CPU has a TSC and it is invariant (these are also invariant across sockets!). Reading the TSC can be done roughly every 10 nanoseconds on my PC (TR3990X, Zen3, Win10, SMT off). It's not accurate since it might overlap with instructions before or afterwards, but accuracy isn't relevant when you spin hundreds of clock-cycles. And I changed to a single PAUSE per spin loop instead of a row of PAUSEs which sum up to 30ns (which is roughly the most common value on newer Intel CPUs). This more eager spinning may gain the lock earlier, although it may generate more interconnect-traffic. But as I'm using RDTSC: I'm asking myself how fast RDTSC is on different CPUs. So I modified my test-program to measure different routines, testing a loop of 10 RDTSCs per iteration. Here it is: #include <iostream> #include <chrono> #include <limits> #include <functional> #if defined(_MSC_VER) #include <intrin.h>