Bonita Montero <Bonita.Montero@gmail.com>: May 29 08:02AM +0200

>> when closing a file-handle file data is usually written back
>> asynchronously
> That's wrong for Windows NT, ...

CloseHandle() doesn't flush.
Bonita Montero <Bonita.Montero@gmail.com>: May 29 08:16AM +0200

>>> asynchronously
>> That's wrong for Windows NT, ...
> CloseHandle() doesn't flush.

If I run this ...

#include <Windows.h>
#include <iostream>
#include <cstdint>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return -1;
    HANDLE hFile = CreateFileA( argv[1], GENERIC_READ | GENERIC_WRITE,
        0, nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
    if( hFile == INVALID_HANDLE_VALUE )
        return -1;
    char buf[0x10000];
    DWORD dwWritten;
    // write 4 GiB in 64 KiB chunks
    for( uint64_t n = (size_t)4 * 1024 * 1024 * 1024; n; n -= sizeof buf )
        if( !WriteFile( hFile, buf, sizeof buf, &dwWritten, nullptr ) )
            return -1;
    // with an extra command-line argument, flush the data to disk first
    if( argc >= 3 )
        FlushFileBuffers( hFile );
    CloseHandle( hFile );
}

... without a third argument the time is ...

real 6671.72ms
user 0.00ms
sys 1890.62ms
cycles 7.367.367.816

... and with a third argument it is ...

real 11051.36ms
user 0.00ms
sys 2078.12ms
cycles 7.361.105.580
| "Öö Tiib" <ootiib@hot.ee>: May 29 01:42AM -0700 On Friday, 28 May 2021 at 14:02:11 UTC+3, Chris Vine wrote: > catch with a catch-all but which (if you do catch it) you must rethrow > in your catch block - in other words, you cannot stop cancellation by > using a catch block once it has started but you can do clean-up. Does that pseudo exception not cause std::terminate when thrown from noexcept function? Does it mean that implementing noexcept functions by catching and handling everything is impossible? > multi-threaded programs. Furthermore some POSIX blocking functions > (including pthread_cond_wait) are specified as not interruptible by > EINTR. What are the scenarios where there are no opportunities to check the flags? The blocking functions I've used on posix have timeouts or versions with timeouts, nonblocking options (O_NONBLOCK) or do wake up spuriously frequently enough. But it is sure possible that I've missed scenario as there have been only dozen or so posix projects. > apart from a lack of familiarity arising from the fact it does not > feature in the C++ standard and is not usuably available on the windows > platform. Testing of magical, non-mockable features is really no problem? I can't find programmers that do no defects so we need to test. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 01:05PM +0100

On Sat, 29 May 2021 01:42:19 -0700 (PDT)
> Does that pseudo-exception not cause std::terminate when thrown from a
> noexcept function? Does that mean that implementing noexcept functions
> by catching and handling everything is impossible?

If a function is cancellable, then with the glibc implementation (NPTL)
it cannot be noexcept. I don't know about other cancellation
implementations which unwind the stack. It is up to the programmer to
determine whether the function is to be cancellable or is to be
noexcept: if the function contains a POSIX cancellation point (or
applies a function containing a POSIX cancellation point), the
programmer can either allow or disallow cancellation during the
execution of the function. Of course, the same is true if you use
flags and exceptions to emulate cancellation.

> wake up spuriously frequently enough. But it is certainly possible
> that I've missed a scenario, as there have been only a dozen or so
> POSIX projects.

The base case is a thread waiting for something to happen which it
turns out can no longer happen in the manner required. There are
numerous variations on this of course. If the blocking function in
question has a timeout option then yes, you could use timeouts and loop
on the timeout to check a flag and throw an exception if the flag is
set. If the function in question does not have a timeout option but
has an EAGAIN option (say you have a file descriptor set O_NONBLOCK),
polling a flag in a loop is possible but usually sub-optimal, because
you have to mitigate an otherwise tight loop by use of sched_yield(),
pthread_yield(), usleep() or similar in the loop - and using usleep()
is problematic because if the waited-for event occurs, you introduce
variable latency into its handling. (Obviously if you have a
non-blocking file descriptor and you are selecting on the descriptor
the issue is different, but then you are not looking at thread
cancellation: instead your business is to remove the descriptor from
the selection set.)

For blocking functions which return on an interrupt (not all do),
using EINTR via pthread_kill is a possibility, but then the signal in
question cannot be one of a set on which sigwait() is waiting and you
cannot set SA_RESTART for it. Relying on asynchronous signals and
EINTR, and throwing an exception if a quit flag is set, is just a poor
version of what cancellation does better, in my view.

Thread cancellation is not something you need often. But when you need
it, you need it, either by emulating it in some way or employing it
directly.

> > platform.
> Is testing of magical, non-mockable features really no problem? I can't
> find programmers who produce no defects, so we need to test.

Testing anything to do with the interaction of different threads with
one another is difficult, including any scheme to emulate cancellation
of one thread by another in the way you have mentioned. Thread
cancellation, if properly done, turns out to be another synchronization
exercise.
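[Editorial sketch, not part of the original post: the allow/disallow
choice described above, expressed with POSIX pthread_setcancelstate
under glibc/NPTL. The function names are invented for illustration.]

#include <pthread.h>
#include <unistd.h>

// May be cancelled: write() is a POSIX cancellation point, so under
// NPTL this function can emit the cancellation pseudo-exception and
// therefore must not be declared noexcept.
void cancellable_write( int fd, const char *buf, size_t len )
{
    ssize_t r = write( fd, buf, len );
    (void)r;
}

// Must not be cancelled: cancellation is disabled around the same
// work, which makes the function safe to declare noexcept.
void uncancellable_write( int fd, const char *buf, size_t len ) noexcept
{
    int old;
    pthread_setcancelstate( PTHREAD_CANCEL_DISABLE, &old );
    ssize_t r = write( fd, buf, len );
    (void)r;
    pthread_setcancelstate( old, nullptr );
}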
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 01:44PM +0100

On Sat, 29 May 2021 13:05:49 +0100
> Thread cancellation is not something you need often. But when you need
> it, you need it, either by emulating it in some way or employing it
> directly.

By the way, since you have mentioned O_NONBLOCK as an option, I would
not expect that doing a blocking read of a file descriptor would
normally require resorting to cancellation, or some cancellation
substitute involving a non-blocking read and polling on a flag. That
is because the closing of the remote end of a pipe or socket will
cause the blocking read to return anyway. If you are in control of the
remote end of the pipe or socket you can close the remote end; if you
are not in control you are probably happy to wait anyway until the
remote end is closed. And if you are writing to a pipe or socket which
has had its remote end closed you will get SIGPIPE or EPIPE.
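[Editorial sketch, not part of the original post: a minimal
demonstration of the point above - closing the write end of a pipe
wakes a reader blocked in read(), which then returns 0 (EOF).]

#include <unistd.h>
#include <cstdio>
#include <thread>

int main()
{
    int fds[2];
    if( pipe( fds ) != 0 )
        return 1;
    std::thread reader( [fd = fds[0]] {
        char buf[64];
        ssize_t n = read( fd, buf, sizeof buf );   // blocks until data or EOF
        std::printf( "read returned %zd (0 means EOF)\n", n );
    } );
    sleep( 1 );          // crude, but lets the reader block first
    close( fds[1] );     // closing the remote (write) end unblocks the read
    reader.join();
    close( fds[0] );
}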
| "Öö Tiib" <ootiib@hot.ee>: May 29 06:34AM -0700 On Saturday, 29 May 2021 at 15:44:26 UTC+3, Chris Vine wrote: > are not in control you are probably happy to wait anyway until the > remote end is closed. And if you are writing to a pipe or socket which > has had its remote end closed you will get SIGPIPE or EPIPE. But on case the remote end did neither do its operation nor close within reasonable time-frame? Is it reasonable to assume that the remote software was programmed by god? Are cases when code hanging somewhere or forgetting that it is communicating and leaking the descriptor unusual? When should my code resort to that cancellation? For me it is most normal, everyday case that there is some proof of concept or worse level garbage made by startups in hope to raise funding or something that was good before but is now maintained by some kind of least bidder. I prefer to close my end and to blame accordingly without that cancellation if possible. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 02:48PM +0100

On Sat, 29 May 2021 06:34:13 -0700 (PDT)
> funding, or something that was once good but is now maintained by some
> kind of lowest bidder. I prefer to close my end and assign blame
> accordingly, without that cancellation, if possible.

It is undefined behaviour for a thread to close a descriptor upon
which another thread is blocking. In fact, it appears linux continues
to block in that case:

http://lkml.iu.edu/hypermail/linux/kernel/0106.0/0768.html

If that is your scenario and you don't want to use cancellation, I
would resort to asynchronous (non-blocking) i/o and use poll() or
select() even if you don't otherwise need it. You could of course put
a timeout on the poll() or select() as well.
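[Editorial sketch, not part of the original post: polling a quit flag
on a poll() timeout, as suggested above. The function name and the
100 ms timeout are arbitrary choices for the illustration.]

#include <poll.h>
#include <unistd.h>
#include <atomic>

std::atomic<bool> quit{ false };   // another thread sets this to request shutdown

// Returns bytes read, 0 on EOF, -1 on error or on a quit request.
ssize_t read_or_quit( int fd, char *buf, size_t len )
{
    pollfd pfd{ fd, POLLIN, 0 };
    for( ;; )
    {
        int r = poll( &pfd, 1, 100 );     // wake at least every 100 ms
        if( quit.load( std::memory_order_relaxed ) )
            return -1;                    // shutdown requested
        if( r < 0 )
            return -1;                    // poll failed (EINTR handling elided)
        if( r > 0 )
            return read( fd, buf, len );  // descriptor is readable
        // r == 0: timed out - loop and re-check the flag
    }
}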
| "Öö Tiib" <ootiib@hot.ee>: May 29 06:51AM -0700 On Saturday, 29 May 2021 at 15:06:02 UTC+3, Chris Vine wrote: > by use of sched_yield(), pthread_yield(), usleep() or similar in the > loop - and using usleep() is problematic because if the waiting event > occurs, you introduce variable latency into its handling. Why simply not to ask about 2000-3000$ per hour in project where work may not have few ms latency but where killing the whole thread doing it without killing the process itself is allowable? There are lot of more interesting and fruitful work to do. |
| "Öö Tiib" <ootiib@hot.ee>: May 29 07:59AM -0700 On Saturday, 29 May 2021 at 16:48:09 UTC+3, Chris Vine wrote: > resort to asynchronous (non-blocking) i/o and use poll() or select() > even if you don't otherwise need it. You could of course put a timeout > on the poll() or select() as well. Yes, I repeat "The blocking functions I've used on posix have timeouts or versions with timeouts, nonblocking options (O_NONBLOCK) or do wake up spuriously frequently enough." I have met no reason to use anything else and was asking what is the scenario that does not let my thread to check frequently enough if the work it is doing or the whole thread has became obsolete and so should wrap it up. Also I can mock that flag checking function and just make it to tell that now it is time to wrap up on 6987th check of flags. How to do same with cancellation? |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 06:15PM +0100

On Sat, 29 May 2021 07:59:02 -0700 (PDT)
> Also, I can mock that flag-checking function and just make it report
> that it is time to wrap up on the 6987th check of the flags. How to do
> the same with cancellation?

I am not certain what you mean by "wrap up on", but you can instrument
a cancellation by including a catch-all in your checking function
which logs that cancellation has begun, and such other state as is
available to it to record, and then rethrows. One happy outcome of
using POSIX functions is that the only exception-like thing they can
emit is a cancellation pseudo-exception. But perhaps you meant
something else.

As to usage, I have used cancellation with a thread blocking on
accept(). It would have been possible to make the socket descriptor
non-blocking and block on select() after adding the socket to the set
of read descriptors (when a non-blocking accept receives a connection,
select() will signal it as ready for reading), and to poll a flag on a
select timeout, but just cancelling it proved much easier and more
obvious. I can recall using it to kill a thread waiting on
pthread_join(), but I cannot now remember the reasons why.
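[Editorial sketch, not part of the original post: instrumenting a
cancellation under glibc/NPTL with a catch-all that logs and rethrows,
around a thread blocked in a cancellation point.]

#include <pthread.h>
#include <unistd.h>
#include <cstdio>

void *worker( void * )
{
    try
    {
        char buf[64];
        ssize_t n = read( STDIN_FILENO, buf, sizeof buf );  // cancellation point
        (void)n;
    }
    catch( ... )   // NPTL delivers cancellation as a pseudo-exception
    {
        std::fputs( "cancellation has begun\n", stderr );
        throw;     // must rethrow: swallowing it aborts the process
    }
    return nullptr;
}

int main()
{
    pthread_t t;
    pthread_create( &t, nullptr, worker, nullptr );
    sleep( 1 );               // let the worker block in read()
    pthread_cancel( t );      // worker logs, unwinds and terminates
    pthread_join( t, nullptr );
}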
Paavo Helde <myfirstname@osa.pri.ee>: May 29 02:24AM +0300

28.05.2021 20:54 Bonita Montero wrote:
> as the set and map trees are usually red-black-trees there's some
> kind of binary-lookup inside the nodes (up to 4 descendants), so the
> memory access-patterns become not so random.

Are you kidding? Making big-O tests with N=1000? This is an insult to
big-O!

Here are timings of your program with a bit more meaningful N. I also
added unordered_set for curiosity, in the last column.

N              ROUNDS      Set time /ns/  Vec time /ns/  hash_set /ns/
1'000          10'000'000           52.3           50.0           4.88
100'000           100'000          208            115            14.6
10'000'000          1'000         1322            340            53
100'000'000           100         2242            690           122
1'000'000'000          10         5320           1380           420

So, std::set is the clear loser here. std::unordered_set lookup is
fastest, but OTOH its memory consumption, construction and destruction
time are the worst (not shown in the table, but building and destroying
the last hash set with 1G elements took ages and it ate up almost 64GB
RAM!). A (sorted) vector takes *much* less memory, is *much* faster to
construct and destruct, and binary lookup in it is better than
std::set, regardless of size.
Bonita Montero <Bonita.Montero@gmail.com>: May 29 05:18AM +0200

> A thousand integers will fit in the L1 cache.
> Try it with a hundred million.

It will be slower - but the relationship will be the same.
Bonita Montero <Bonita.Montero@gmail.com>: May 29 05:25AM +0200

> Your test is wrong. Too much iterations while data fits in cache:
> ...

When I do this:

#include <iostream>
#include <set>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
    using hrc_tp = time_point<high_resolution_clock>;
    size_t const N = 1'000'000, ROUNDS = 100'000;
    set<int> si;
    for( int i = N; i--; )
        si.insert( i );
    mt19937_64 mt( (default_random_engine())() );
    // note: keys are drawn from [0, 999] only, so the lookups touch
    // just the first 1000 of the N elements
    uniform_int_distribution<int> uid( 0, 999 );
    int const *volatile pvi;
    hrc_tp start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        pvi = &*si.find( uid( mt ) );
    double ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "set: " << ns << endl;
    vector<int> vi( N );
    for( size_t i = N; i--; vi[i] = (int)i );
    bool volatile found;
    start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        found = binary_search( vi.begin(), vi.end(), uid( mt ) );
    ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "vec: " << ns << endl;
}

set is about 78.2 ns and vector is about 53.8 ns.
Not a big difference ...
Bonita Montero <Bonita.Montero@gmail.com>: May 29 05:42AM +0200

And this is an extension to unordered_set:

#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
    using hrc_tp = time_point<high_resolution_clock>;
    size_t const N = 10'000'000, ROUNDS = 10'000'000;
    set<int> si;
    for( int i = N; i--; )
        si.insert( i );
    mt19937_64 mt( (default_random_engine())() );
    // note: keys are still drawn from [0, 999] only, so the lookups
    // touch just the first 1000 of the N elements
    uniform_int_distribution<int> uid( 0, 999 );
    int const *volatile pvi;
    hrc_tp start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        pvi = &*si.find( uid( mt ) );
    double ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "set: " << ns << endl;
    unordered_set<int> usi;
    usi.max_load_factor( 2.0f );
    usi.reserve( N );
    for( int i = N; i--; )
        usi.insert( i );
    bool volatile found;
    start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        pvi = &*usi.find( uid( mt ) );
    ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "uset: " << ns << endl;
    vector<int> vi( N );
    for( size_t i = N; i--; vi[i] = (int)i );
    start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        found = binary_search( vi.begin(), vi.end(), uid( mt ) );
    ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "vec: " << ns << endl;
}

So with unordered_set the lookup is 15.1 ns. I think that's also
because the lookups get OoO-parallelized, as the bucket chains aren't
so deep.
Branimir Maksimovic <branimir.maksimovic@gmail.com>: May 29 10:44AM

> }
> set is about 78.2 ns and vector is about 53.8 ns.
> Not a big difference ...

That still fits in cache. Try with 8-16 million, depending on your
cache size.

--
current job title: senior software engineer
skills: x86 assembler, c++, c, rust, go, nim, haskell...
press any key to continue or any other to quit...
Bonita Montero <Bonita.Montero@gmail.com>: May 29 01:00PM +0200

> That still fits in cache. Try with 8-16 million, depending on your
> cache size.

10 million ints are 40MB with the vector version. I've got only 4MB
L2-cache, but 256MB (8 * 32MB) L3-cache. But a cacheline in the
L3-cache of one die of my multi-die CPU is only populated by the cores
of the same die; other dies can't populate it. These are the numbers
from memory:

set:  1872.5
uset:  188.302
vec:   695.665
| "Öö Tiib" <ootiib@hot.ee>: May 29 08:17AM -0700 On Friday, 28 May 2021 at 08:23:54 UTC+3, Bonita Montero wrote: > you have a lot of random access memory accesses which aren't > prectible by the prefetcher and the number of memory-accesses > is usually higher than that of a hash-set. If it is that or other way can be redicted per use case. But as std::lower_bound is constexpr but std::unordered_map is not the latter can not compete on field of O(0) not doing it runtime at all. |
Bonita Montero <Bonita.Montero@gmail.com>: May 29 06:12PM +0200

> Whether it is that way or the other can be predicted per use case.

If you have a reasonable load-factor on a hash-set, it always
outperforms the vector.

> But as std::lower_bound is constexpr ...

If you've got an array with static content for lower_bound to scan,
and a very smart compiler, this would help - otherwise not.
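[Editorial sketch, not part of the original posts: the static-content
case just mentioned. Since C++20, std::lower_bound is constexpr, so a
lookup in a constexpr sorted array can be done entirely at compile
time - the "O(0)" above.]

#include <algorithm>
#include <array>

constexpr std::array<int, 5> table{ 2, 3, 5, 7, 11 };

constexpr bool contains( int x )
{
    auto it = std::lower_bound( table.begin(), table.end(), x );
    return it != table.end() && *it == x;
}

static_assert( contains( 7 ) );    // evaluated by the compiler, no runtime cost
static_assert( !contains( 8 ) );

int main() {}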
| "Öö Tiib" <ootiib@hot.ee>: May 28 05:10PM -0700 On Friday, 28 May 2021 at 22:10:07 UTC+3, Keith Thompson wrote: > behavior of any strictly conforming program. > I'm not sure how (or why!) you'd forbid extensions that happen to act > almost like integer types. I hoped to managed to express it. Technically the integer types are causing some of trouble as the rules of promotion and implicit conversion especially with implementation defined features in mix seem not to be intuitive to many programmers. Therefore desire to regulate it is welcome and appearance of regulating something without actually regulating anything is double unwelcome. > cause serious problems with ABIs (there are standard library functions > that take arguments of type intmax_t). The alternative would have been > not to support 128-bit integers at all. I think of it as nonsense. The monsters are just thinking they are clever and fooling each other and so there is illusion of consensus. Actual desire is to have support to 128 bit (or perhaps arbitrary amount of bit) integers in their Golangs, Swifts, Javas or C#s before C and C++ and so hopefully others. But their proprieritary language infrastructures are mostly written in C or C++ so I see no point in pretending that we don't see it through. |
David Brown <david.brown@hesbynett.no>: May 29 11:47AM +0200

On 28/05/2021 21:09, Keith Thompson wrote:
> cause serious problems with ABIs (there are standard library functions
> that take arguments of type intmax_t). The alternative would have been
> not to support 128-bit integers at all.

The definition of intmax_t is a problem - it is a limitation for
integer types in C and C++. I'd have preferred to see functions like
"abs" be type-generic macros in C and template functions in C++. From
C90 there was "abs" and "labs" - C99 could have skipped "llabs" and
"imaxabs", and similar functions. The "div" functions wouldn't need to
be extended for bigger types - they are a hangover from an era of
weaker compilers. No doubt there would be complications with some
other functions that today use intmax_t types - and no doubt there
would be alternative ways of handling them, given a bit of thought.

But of course it is too late to change all that now. The gcc solution
of __int128 covers most purposes without affecting backwards
compatibility.

> I'd like to see full support for 128-bit integers, but gcc's __int128 is
> IMHO better than nothing (though to be honest I've never used it except
> in small test programs).

There is nothing stopping the C++ standard library introducing types
std::int<N> and std::uint<N>, where implementations can choose which
sizes of N they support (but requiring support for any N for which
std::intN_t exists). These would work just like integer types for most
purposes, but not be /called/ integer types. So in gcc, std::int<128>
would be the same as __int128_t. Constants would be handled by
user-defined literals.

(I also don't see much need for 128-bit or bigger types - until you get
to cryptography-sized integers - but I guess some people do.)
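[Editorial sketch, not part of the original post: one way a library
could map the hypothetical std::int<N> above onto the widths an
implementation supports. The names (sketch::int_t, int_width) are
invented; this is not a real standard-library facility.]

#include <cstdint>
#include <type_traits>

namespace sketch {
    template<int N> struct int_width;    // left undefined: width not supported
    template<> struct int_width<8>   { using type = std::int8_t;  };
    template<> struct int_width<16>  { using type = std::int16_t; };
    template<> struct int_width<32>  { using type = std::int32_t; };
    template<> struct int_width<64>  { using type = std::int64_t; };
#ifdef __SIZEOF_INT128__
    template<> struct int_width<128> { using type = __int128;     };  // gcc/clang extension
#endif
    template<int N> using int_t = typename int_width<N>::type;
}

static_assert( std::is_same_v<sketch::int_t<32>, std::int32_t> );

int main() {}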