Saturday, May 29, 2021

Digest for comp.lang.c++@googlegroups.com - 20 updates in 3 topics

Bonita Montero <Bonita.Montero@gmail.com>: May 29 08:02AM +0200

>> when closing a file-handle file data is usually written back asynchronously
 
> That's wrong for Windows NT, ...
 
CloseHandle() doesn't flush.
Bonita Montero <Bonita.Montero@gmail.com>: May 29 08:16AM +0200

>>> asynchronously
 
>> That's wrong for Windows NT, ...
 
> CloseHandle() doesn't flush.
 
If I run this ...
 
#include <Windows.h>
#include <iostream>
#include <cstdint>
 
using namespace std;
 
int main( int argc, char **argv )
{
    if( argc < 2 )
        return -1;
    // create (or truncate) the file named by the first argument
    HANDLE hFile = CreateFileA( argv[1], GENERIC_READ | GENERIC_WRITE, 0,
        nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
    if( hFile == INVALID_HANDLE_VALUE )
        return -1;
    char buf[0x10000];
    DWORD dwWritten;
    // write 4GB in 64kB chunks
    for( uint64_t n = (uint64_t)4 * 1024 * 1024 * 1024; n; n -= sizeof buf )
        if( !WriteFile( hFile, buf, sizeof buf, &dwWritten, nullptr ) )
            return -1;
    // flush only if a second command-line argument was given
    if( argc >= 3 )
        FlushFileBuffers( hFile );
    CloseHandle( hFile );
}
 
... run without the extra flush argument the time is ...
 
real 6671.72ms
user 0.00ms
sys 1890.62ms
cycles 7.367.367.816
 
... and with the flush argument it is ...
 
real 11051.36ms
user 0.00ms
sys 2078.12ms
cycles 7.361.105.580
"Öö Tiib" <ootiib@hot.ee>: May 29 01:42AM -0700

On Friday, 28 May 2021 at 14:02:11 UTC+3, Chris Vine wrote:
> catch with a catch-all but which (if you do catch it) you must rethrow
> in your catch block - in other words, you cannot stop cancellation by
> using a catch block once it has started but you can do clean-up.
 
Does that pseudo-exception not cause std::terminate when thrown from
a noexcept function? Does it mean that implementing noexcept functions
by catching and handling everything is impossible?
 
> multi-threaded programs. Furthermore some POSIX blocking functions
> (including pthread_cond_wait) are specified as not interruptible by
> EINTR.
 
What are the scenarios where there are no opportunities to check
the flags? The blocking functions I've used on POSIX have timeouts
or versions with timeouts, non-blocking options (O_NONBLOCK), or
wake up spuriously frequently enough. But it is certainly possible
that I've missed a scenario, as I've only worked on a dozen or so
POSIX projects.
 
> apart from a lack of familiarity arising from the fact it does not
> feature in the C++ standard and is not usuably available on the windows
> platform.
 
Testing of magical, non-mockable features is really no problem?
I can't find programmers who produce no defects, so we need to test.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 01:05PM +0100

On Sat, 29 May 2021 01:42:19 -0700 (PDT)
 
> Does that pseudo exception not cause std::terminate when thrown from
> noexcept function? Does it mean that implementing noexcept functions
> by catching and handling everything is impossible?
 
If a function is cancellable, then with the glibc implementation (NPTL)
it cannot be noexcept. I don't know about other cancellation
implementations which unwind the stack. It is up to the programmer to
determine whether the function is to be cancellable or is to be
noexcept: if the function contains a POSIX cancellation point (or
calls a function containing a POSIX cancellation point), the
programmer can either allow or disallow cancellation during the
execution of the function.
 
Of course, the same is true if you use flags and exceptions to emulate
cancellation.
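 
For completeness, the disallowing is done with pthread_setcancelstate().
A minimal sketch (the function name is illustrative only):
 
#include <pthread.h>
 
// make a noexcept function safe against NPTL cancellation by
// disabling cancellation for its duration
void do_work_noexcept() noexcept
{
    int old;
    pthread_setcancelstate( PTHREAD_CANCEL_DISABLE, &old );
    // ... calls containing POSIX cancellation points run here ...
    pthread_setcancelstate( old, &old );
}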
 
> do wake up spuriously frequently enough. But it is sure possible
> that I've missed scenario as there have been only dozen or so
> posix projects.
 
The base case is a thread waiting for something to happen which it
turns out can no longer happen in the manner required. There are
numerous variations on this of course.
 
If the blocking function in question has a timeout option then yes you
could use timeouts and loop on the timeout to check a flag and throw an
exception if the flag is set. If the function in question does not
have a timeout option but has an EAGAIN option (say you have a file
descriptor set O_NONBLOCK), polling a flag in a loop is possible but
usually sub-optimal because you have to mitigate an otherwise tight loop
by use of sched_yield(), pthread_yield(), usleep() or similar in the
loop - and using usleep() is problematic because if the waiting event
occurs, you introduce variable latency into its handling. (Obviously if
you have a non-blocking file descriptor and you are selecting on the
descriptor the issue is different, but then you are not looking at
thread cancellation: instead your business is to remove the descriptor
from the selection set.) For blocking functions which return on an
interrupt (not all do), using EINTR via pthread_kill is a possibility
but then the signal in question cannot be one of a set on which
sigwait() is waiting and you cannot set SA_RESTART for it. Relying on
asynchronous signals and EINTR, and throwing an exception if a quit
flag is set, is just a poor version of what cancellation does better in
my view.
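 
As a sketch of the loop-on-timeout variant (names illustrative; a
condition variable stands in for any blocking wait with a timeout
option):
 
#include <condition_variable>
#include <mutex>
#include <chrono>
#include <stdexcept>
 
std::mutex m;
std::condition_variable cv;
bool ready = false;   // the event being waited for
bool quit = false;    // set, under the mutex, by the cancelling thread
 
void wait_until_ready()
{
    std::unique_lock<std::mutex> lk( m );
    while( !ready )
    {
        if( quit )
            throw std::runtime_error( "emulated cancellation" );
        // wake at least every 100ms to re-check the quit flag
        cv.wait_for( lk, std::chrono::milliseconds( 100 ) );
    }
}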
 
Thread cancellation is not something you need often. But when you need
it, you need it, either by emulating it in some way or employing it
directly.
 
> > platform.
 
> Testing of magical, non-mockable features is really no problem?
> I can't find programmers that do no defects so we need to test.
 
Testing anything to do with the interaction of different threads with
one another is difficult, including any scheme to emulate cancellation
of one thread by another in the way you have mentioned. Thread
cancellation, if properly done, turns out to be another synchronization
exercise.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 01:44PM +0100

On Sat, 29 May 2021 13:05:49 +0100
> Thread cancellation is not something you need often. But when you need
> it, you need it, either by emulating it in some way or employing it
> directly.
 
By the way, since you have mentioned O_NONBLOCK as an option, I would
not expect that doing a blocking read of a file descriptor would
normally require resorting to cancellation, or some cancellation
substitute involving a non-blocking read and polling on a flag. That
is because the closing of the remote end of a pipe or socket will cause
the blocking read to return anyway. If you are in control of the
remote end of the pipe or socket you can close the remote end; if you
are not in control you are probably happy to wait anyway until the
remote end is closed. And if you are writing to a pipe or socket which
has had its remote end closed you will get SIGPIPE or EPIPE.
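 
A toy demonstration of that point (not from real code):
 
#include <unistd.h>
#include <cstdio>
 
int main()
{
    int fds[2];
    if( pipe( fds ) != 0 )
        return 1;
    close( fds[1] );                    // close the "remote" (write) end
    char c;
    ssize_t n = read( fds[0], &c, 1 );  // returns 0 (EOF), does not block
    std::printf( "read returned %zd\n", n );
    close( fds[0] );
}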
"Öö Tiib" <ootiib@hot.ee>: May 29 06:34AM -0700

On Saturday, 29 May 2021 at 15:44:26 UTC+3, Chris Vine wrote:
> are not in control you are probably happy to wait anyway until the
> remote end is closed. And if you are writing to a pipe or socket which
> has had its remote end closed you will get SIGPIPE or EPIPE.
 
But what about the case where the remote end neither does its operation
nor closes within a reasonable time-frame? Is it reasonable to assume
that the remote software was programmed by a god?
Are cases of code hanging somewhere, or forgetting that
it is communicating and leaking the descriptor, unusual?
When should my code resort to that cancellation?
 
For me the most normal, everyday case is some proof-of-concept (or
worse) level garbage made by startups in hope of raising funding, or
something that was good before but is now maintained by some kind of
lowest bidder. I prefer to close my end and
assign blame accordingly, without that cancellation, if possible.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 02:48PM +0100

On Sat, 29 May 2021 06:34:13 -0700 (PDT)
> funding or something that was good before but is now maintained
> by some kind of least bidder. I prefer to close my end and
> to blame accordingly without that cancellation if possible.
 
It is undefined behaviour for a thread to close a descriptor upon which
another thread is blocking. In fact, it appears linux continues to
block in that case:
http://lkml.iu.edu/hypermail/linux/kernel/0106.0/0768.html
 
If that is your scenario and you don't want to use cancellation I would
resort to asynchronous (non-blocking) i/o and use poll() or select()
even if you don't otherwise need it. You could of course put a timeout
on the poll() or select() as well.
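 
A sketch of what I mean (names illustrative; the descriptor is assumed
non-blocking, the flag set by the thread which would otherwise have
closed the descriptor):
 
#include <poll.h>
#include <cerrno>
#include <atomic>
#include <stdexcept>
 
// wait until 'fd' is readable, giving up if another thread sets 'quit'
void wait_readable( int fd, const std::atomic<bool> &quit )
{
    pollfd pfd{ fd, POLLIN, 0 };
    for( ;; )
    {
        if( quit.load( std::memory_order_acquire ) )
            throw std::runtime_error( "emulated cancellation" );
        int rc = poll( &pfd, 1, 100 );   // 100ms timeout, then re-check
        if( rc > 0 )
            return;                      // ready (or POLLERR/POLLHUP)
        if( rc < 0 && errno != EINTR )
            throw std::runtime_error( "poll failed" );
    }
}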
"Öö Tiib" <ootiib@hot.ee>: May 29 06:51AM -0700

On Saturday, 29 May 2021 at 15:06:02 UTC+3, Chris Vine wrote:
> by use of sched_yield(), pthread_yield(), usleep() or similar in the
> loop - and using usleep() is problematic because if the waiting event
> occurs, you introduce variable latency into its handling.
 
Why not simply ask for $2000-3000 per hour on a project where the work
may not have a few ms of latency, but where killing the whole thread
doing it, without killing the process itself, is allowable? There is a
lot of more interesting and fruitful work to do.
"Öö Tiib" <ootiib@hot.ee>: May 29 07:59AM -0700

On Saturday, 29 May 2021 at 16:48:09 UTC+3, Chris Vine wrote:
> resort to asynchronous (non-blocking) i/o and use poll() or select()
> even if you don't otherwise need it. You could of course put a timeout
> on the poll() or select() as well.
 
Yes, I repeat: "The blocking functions I've used on POSIX have timeouts
or versions with timeouts, non-blocking options (O_NONBLOCK), or wake
up spuriously frequently enough." I have met no reason to use anything
else, and was asking what scenario does not let my thread check
frequently enough whether the work it is doing, or the whole thread,
has become obsolete and so should wrap up.
 
Also, I can mock that flag-checking function and just make it report
that it is time to wrap up on the 6987th check of the flags. How do I
do the same with cancellation?
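 
The kind of mock I mean is trivial, something like this (names
invented):
 
#include <functional>
 
struct Worker
{
    std::function<bool()> should_quit;   // injected, hence mockable
 
    void run()
    {
        while( !should_quit() )
        {
            // ... one unit of work between flag checks ...
        }
        // wrap up: flush state, release resources
    }
};
 
// test double: reports "time to wrap up" from the 6987th check onwards
bool quit_on_6987th()
{
    static int calls = 0;
    return ++calls >= 6987;
}
 
In the test, Worker{ quit_on_6987th }.run() must terminate and wrap up
cleanly.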
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 29 06:15PM +0100

On Sat, 29 May 2021 07:59:02 -0700 (PDT)
 
> Also I can mock that flag checking function and just make it to tell that
> now it is time to wrap up on 6987th check of flags. How to do same
> with cancellation?
 
I am not certain what you mean by "wrap up on", but you can instrument
a cancellation by including a catch-all in your checking function which
logs that cancellation has begun and such other state as is available
to it to record, and then rethrows. One happy outcome of using POSIX
functions is that the only exception-like thing they can emit is a
cancellation pseudo-exception. But perhaps you meant something else.
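 
In sketch form (the blocking call is a stand-in for whatever your
checking function wraps):
 
#include <iostream>
 
void blocking_call_with_cancellation_point();   // stand-in
 
void checked_work()
{
    try
    {
        blocking_call_with_cancellation_point();
    }
    catch( ... )
    {
        // log and rethrow: with NPTL, swallowing the cancellation
        // pseudo-exception would abort the process
        std::cerr << "cancellation (or exception) in flight\n";
        throw;
    }
}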
 
As to usage I have used cancellation with a thread blocking on
accept(). It would have been possible to make the socket descriptor
non-blocking and block on select() after adding the socket to the set of
read descriptors (when a non-blocking accept receives a connection
select() will signal it as ready for reading), and polled a flag on a
select timeout, but just cancelling it proved much easier and more
obvious. I can recall using it to kill a thread waiting on
pthread_join() but I cannot now remember the reasons why.
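 
In outline, the accept() case is something like this (illustrative
only):
 
#include <pthread.h>
#include <sys/socket.h>
 
void *acceptor( void *arg )
{
    int listen_fd = *(int *)arg;
    for( ;; )
    {
        // accept() is a POSIX cancellation point: pthread_cancel()
        // from another thread unwinds this thread while it blocks here
        int conn = accept( listen_fd, nullptr, nullptr );
        if( conn < 0 )
            break;
        // ... hand the connection off to a worker ...
    }
    return nullptr;
}
 
// elsewhere: pthread_cancel( tid ); then pthread_join( tid, nullptr );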
Paavo Helde <myfirstname@osa.pri.ee>: May 29 02:24AM +0300

28.05.2021 20:54 Bonita Montero wrote:
> as the set and map trees are usually red-black-trees there's some
> kind of binary-lookup inside the nodes (up to 4 descendants), so the
> memory access-patterns become not so random.
 
Are you kidding? Making big-O tests with N=1000? That is an insult
to big-O!
 
Here are timings of your program with a bit more meaningful N. I also
added unordered_set for curiosity, in the last column.
 
N              ROUNDS      Set time /ns/  Vec time /ns/  hash_set /ns/
1'000          10'000'000  52.3           50.0           4.88
100'000        100'000     208            115            14.6
10'000'000     1000        1322           340            53
100'000'000    100         2242           690            122
1'000'000'000  10          5320           1380           420
 
So, std::set is the clear loser here. std::unordered_set lookup is
fastest, but OTOH its memory consumption, construction and destruction
time are the worst (not shown in the table, but building and destroying
the last hash set with 1G elements took ages and it ate up almost 64GB
RAM!). A (sorted) vector takes *much* less memory, is *much* faster to
construct and destruct, and binary lookup in it is better than std::set,
regardless of size.
Bonita Montero <Bonita.Montero@gmail.com>: May 29 05:18AM +0200

> A thousand integers will fit in the L1 cache.
> Try it with a hundred million.
 
It will be slower - but the relationship will be the same.
Bonita Montero <Bonita.Montero@gmail.com>: May 29 05:25AM +0200

> Your test is wrong. Too many iterations while data fits in cache:
> ...
 
When I do this:
 
#include <iostream>
#include <set>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>
 
using namespace std;
using namespace chrono;
 
int main()
{
    using hrc_tp = time_point<high_resolution_clock>;
    size_t const N = 1'000'000,
                 ROUNDS = 100'000;
    set<int> si;
    for( int i = N; i--; )
        si.insert( i );
    mt19937_64 mt( (default_random_engine())() );
    // note: lookups only ever ask for keys 0..999, so only a tiny
    // slice of the million-element containers is actually touched
    uniform_int_distribution<int> uid( 0, 999 );
    int const *volatile pvi;
    hrc_tp start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        pvi = &*si.find( uid( mt ) );
    double ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "set: " << ns << endl;
    vector<int> vi( N );
    for( size_t i = N; i--; vi[i] = (int)i );
    bool volatile found;
    start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        found = binary_search( vi.begin(), vi.end(), uid( mt ) );
    ns = (int64_t)duration_cast<nanoseconds>( high_resolution_clock::now()
        - start ).count() / (double)ROUNDS;
    cout << "vec: " << ns << endl;
}
 
set is about 78.2 ns and vector about 53.8 ns.
Not a big difference ...
Bonita Montero <Bonita.Montero@gmail.com>: May 29 05:42AM +0200

And this is an extension to unordered_set:
 
#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>
 
using namespace std;
using namespace chrono;
 
int main()
{
    using hrc_tp = time_point<high_resolution_clock>;
    size_t const N = 10'000'000,
                 ROUNDS = 10'000'000;
    set<int> si;
    for( int i = N; i--; )
        si.insert( i );
    mt19937_64 mt( (default_random_engine())() );
    // as before, lookups are confined to keys 0..999 of the 10M elements
    uniform_int_distribution<int> uid( 0, 999 );
    int const *volatile pvi;
    hrc_tp start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        pvi = &*si.find( uid( mt ) );
    double ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    cout << "set: " << ns << endl;
    unordered_set<int> usi;
    usi.max_load_factor( 2.0f );
    usi.reserve( N );
    for( int i = N; i--; )
        usi.insert( i );
    bool volatile found;
    start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        pvi = &*usi.find( uid( mt ) );
    ns = (int64_t)duration_cast<nanoseconds>( high_resolution_clock::now()
        - start ).count() / (double)ROUNDS;
    cout << "uset: " << ns << endl;
    vector<int> vi( N );
    for( size_t i = N; i--; vi[i] = (int)i );
    start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r--; )
        found = binary_search( vi.begin(), vi.end(), uid( mt ) );
    ns = (int64_t)duration_cast<nanoseconds>( high_resolution_clock::now()
        - start ).count() / (double)ROUNDS;
    cout << "vec: " << ns << endl;
}
 
So with unordered_set the lookup is 15.1 ns. I think that's also
because the lookups get OoO-parallelized, as the bucket-chains
aren't so deep.
Branimir Maksimovic <branimir.maksimovic@gmail.com>: May 29 10:44AM

> }
 
> set is about 78.2 ns and vector about 53.8 ns.
> Not a big difference ...
 
That still fits in cache. Try with 8-16 million depending on your cache
size.
 
 
--
current job title: senior software engineer
skills: x86 assembler,c++,c,rust,go,nim,haskell...
 
press any key to continue or any other to quit...
Bonita Montero <Bonita.Montero@gmail.com>: May 29 01:00PM +0200

> That still fits in cache. Try with 8-16 million depending on your cache
> size.
 
10E6 ints are 40MB with the vector version. I've got only 4MB of
L2 cache, but 256MB (8 * 32MB) of L3 cache. But a cacheline in the
L3 cache of one die of my multi-die CPU is only populated by the
cores of that same die; other dies can't populate it.
 
These are the numbers from memory:
 
set: 1872.5
uset: 188.302
vec: 695.665
"Öö Tiib" <ootiib@hot.ee>: May 29 08:17AM -0700

On Friday, 28 May 2021 at 08:23:54 UTC+3, Bonita Montero wrote:
> you have a lot of random memory accesses which aren't
> predictable by the prefetcher and the number of memory-accesses
> is usually higher than that of a hash-set.
 
Whether it is one way or the other can be predicted per use case.
But as std::lower_bound is constexpr while std::unordered_map is
not, the latter cannot compete on the field of O(0): not doing the
lookup at runtime at all.
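 
A sketch of what I mean, assuming C++20 (where std::lower_bound is
constexpr):
 
#include <algorithm>
#include <array>
 
constexpr std::array<int, 8> table{ 1, 3, 5, 7, 11, 13, 17, 19 };
 
constexpr bool contains( int x )
{
    auto it = std::lower_bound( table.begin(), table.end(), x );
    return it != table.end() && *it == x;
}
 
static_assert( contains( 11 ) );    // evaluated entirely at compile time
static_assert( !contains( 12 ) );   // no runtime lookup at all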
Bonita Montero <Bonita.Montero@gmail.com>: May 29 06:12PM +0200

> Whether it is one way or the other can be predicted per use case.
 
If you have a reasonable load-factor on a hash-set it always outperforms
the vector.
 
> But as std::lower_bound is constexpr ...
 
If you've got an array with static content that lower_bound is
scanning, and a very smart compiler, this would help - otherwise not.
"Öö Tiib" <ootiib@hot.ee>: May 28 05:10PM -0700

On Friday, 28 May 2021 at 22:10:07 UTC+3, Keith Thompson wrote:
> behavior of any strictly conforming program.
 
> I'm not sure how (or why!) you'd forbid extensions that happen to act
> almost like integer types.
 
I hope I managed to express it. Technically the integer types are
causing some of the trouble, as the rules of promotion and implicit
conversion, especially with implementation-defined features in the
mix, seem not to be intuitive to many programmers. Therefore the
desire to regulate it is welcome, and the appearance of regulating
something without actually regulating anything is doubly unwelcome.
 
> cause serious problems with ABIs (there are standard library functions
> that take arguments of type intmax_t). The alternative would have been
> not to support 128-bit integers at all.
 
I think of it as nonsense. The monsters just think they are clever,
fooling each other, and so there is an illusion of consensus. The
actual desire is to have support for 128-bit (or perhaps
arbitrary-width) integers in their Golangs, Swifts, Javas or C#s
before C and C++, and so hopefully before others. But their
proprietary language infrastructures are mostly written in C or C++,
so I see no point in pretending that we don't see through it.
David Brown <david.brown@hesbynett.no>: May 29 11:47AM +0200

On 28/05/2021 21:09, Keith Thompson wrote:
 
> cause serious problems with ABIs (there are standard library functions
> that take arguments of type intmax_t). The alternative would have been
> not to support 128-bit integers at all.
 
The definition of intmax_t is a problem - it is a limitation for integer
types in C and C++. I'd have preferred to see functions like "abs" be
type-generic macros in C and template functions in C++. From C90 there
was "abs" and "labs" - C99 could have skipped "llabs" and "imaxabs", and
similar functions. The "div" functions wouldn't need to be extended for
bigger types - they are a hangover from an era of weaker compilers. No
doubt there would be complications with some other functions that today
use intmax_t types - no doubt there would be alternative ways of
handling them, given a bit of thought.
 
But of course it is too late to change all that now. The gcc solution
of __int128 covers most purposes without affecting backwards compatibility.
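 
A small illustration of that extension (it is not a standard integer
type, so there is no printf conversion or iostream support for it):
 
#include <cstdio>
 
int main()
{
    unsigned __int128 v = (unsigned __int128)1 << 100;
    // print the two 64-bit halves, since printf cannot take __int128
    std::printf( "high=%llu low=%llu\n",
                 (unsigned long long)(v >> 64),
                 (unsigned long long)v );
}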
 
 
> I'd like to see full support for 128-bit integers, but gcc's __int128 is
> IMHO better than nothing (though to be honest I've never used it except
> in small test programs).
 
There is nothing stopping the C++ standard library introducing types
std::int<N> and std::uint<N>, where implementations can choose
which sizes of N they support (but requiring support for any N for which
std::intN_t exists). These would work just like integer types for most
purposes, but not be /called/ integer types. So in gcc, std::int<128>
would be the same as __int128_t.
 
Constants would be handled by user-defined literals.
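 
As a sketch of how that could work today, assuming gcc's __int128 as
the backing type (int_t, int_n and the _i128 suffix are invented for
illustration, not a real proposal):
 
#include <cstdint>
 
template<int N> struct int_t;                             // chosen widths only
template<> struct int_t<64>  { using type = std::int64_t; };
template<> struct int_t<128> { using type = __int128; };  // gcc extension
 
template<int N>
using int_n = typename int_t<N>::type;
 
// raw literal operator: it receives the digits as text, so the
// constant may exceed what unsigned long long can hold
constexpr int_n<128> operator""_i128( const char *s )
{
    int_n<128> v = 0;
    for( ; *s; ++s )
        v = v * 10 + (*s - '0');
    return v;
}
 
static_assert( 18446744073709551616_i128 == int_n<128>( 1 ) << 64 );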
 
(I also don't see much need for 128-bit or bigger types - until you get
to cryptography-sized integers - but I guess some people do.)