Thursday, May 27, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

"Öö Tiib" <ootiib@hot.ee>: May 27 05:52PM -0700

On Thursday, 27 May 2021 at 21:00:13 UTC+3, Chris Vine wrote:
> include std::bad_alloc in the mix. C++11 threads don't deal with
> cancellation and will never be able to do so because they require OS
> support - in this case POSIX support.
 
I am always in doubt about thread cancellation, especially when it is discussed
in the context of RAII and the destructor of something like std::thread.
 
One of the major reasons for making a thread is to turn synchronous work into
asynchronous work. We put the work into a thread, where it carries on with its
blocking operations (potentially checking whether the work has become obsolete
and/or reporting progress between them) and, once complete (or failed), hands
the results back to the caller thread. That way the caller thread gets all the
benefits of asynchronous work. It stays fully responsive, so it can take care of
other responsibilities, entertain users, report progress, mark ongoing work
obsolete and/or put out the results (or failures) once something completes.
 
But what is cancellation? It is like telling the worker thread to die in the
middle of an unknown state, potentially in the kernel, with an unknown number of
locks taken, files open or mapped to virtual memory, some fragile external
hardware device produced by random morons in the middle of something,
and what not. std::terminate feels better and more honest than cancelling
the thread ... at least our users will see that we screwed it all up in a major
way. Of course it all gets lost in examples, because these talk about "foo()".
Bonita Montero <Bonita.Montero@gmail.com>: May 28 03:55AM +0200

> at defined points in the code which are able to deal with it (normally
> where a wait is to occur). Done properly, thread cancellation is far
> easier to use than exceptions, ...
 
Handling cancellation via flags and exceptions is cleaner and just as easy
to handle. It is cleaner because the destructors of objects on the stack
get called. So never use POSIX cancellation in C++ programs!
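
A minimal sketch of the flag-plus-exception approach described above, assuming
a worker that polls a shared flag at points where it can safely stop. The names
cancelled, resource_guard, check_cancel and run_worker are hypothetical and
only illustrate the idea; the sleep stands in for real work.

```cpp
#include <atomic>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical exception type used only to unwind the worker's stack.
struct cancelled : std::runtime_error {
    cancelled() : std::runtime_error("work cancelled") {}
};

// Hypothetical RAII guard; its destructor records that it ran.
struct resource_guard {
    std::atomic<bool>& released;
    explicit resource_guard(std::atomic<bool>& r) : released(r) {}
    ~resource_guard() { released.store(true); }
};

// Called at points where the worker is able to stop cleanly.
inline void check_cancel(const std::atomic<bool>& stop) {
    if (stop.load(std::memory_order_relaxed)) throw cancelled{};
}

// Runs until `stop` is set; returns true when it ended by cancellation.
inline bool run_worker(std::atomic<bool>& stop, std::atomic<bool>& released) {
    try {
        resource_guard guard(released);
        for (;;) {
            check_cancel(stop);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    } catch (const cancelled&) {
        return true;  // guard's destructor has already run at this point
    }
}
```

Because the stop request travels as an ordinary C++ exception, every object
between the throw and the catch is destroyed in the usual way.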
Bonita Montero <Bonita.Montero@gmail.com>: May 28 04:02AM +0200

> ofstream's destructor has to ignore errors on closing the system handle ---
> a bad choice, data loss without detection.
 
See it this way: when a file handle is closed, the file data is usually
written back asynchronously, so you can't tell whether the data was
written properly unless you flush beforehand. So silently
closing the handle is OK.
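
A small sketch of checking for errors before the destructor gets a chance to
swallow them, assuming std::ofstream. The helper name write_checked is
hypothetical. Note that flush() only hands the data to the operating system;
whether it has reached the disk is a separate question this sketch does not
address.

```cpp
#include <fstream>
#include <string>

// Writes `text` to `path` and reports whether every step succeeded.
inline bool write_checked(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;   // could not open the file
    out << text;
    out.flush();              // push buffered data to the OS now
    if (!out) return false;   // the write or the flush failed
    out.close();              // close() sets failbit if closing fails
    return !out.fail();
}
```

Calling close() explicitly and testing the stream state afterwards makes a
failure visible to the caller instead of leaving it to the destructor.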
Ian Collins <ian-news@hotmail.com>: May 28 02:59PM +1200

On 28/05/2021 12:52, Öö Tiib wrote:
> and what not? std::terminate feels better and more honest than to cancel
> the thread ... at least our users will see that we screwed all up in major
> way. It all gets of course lost in examples because these talk about "foo()".
 
Cancellation is well defined in POSIX. Cancellation points are defined
by the standard and threads can manage their own cancellation type and
state.
 
If you want to manage resources in a thread that may be cancelled, you
can use cleanup functions. I haven't tried it in Linux, but certainly
in Solaris, the C++ runtime will destroy objects if the creating thread
is cancelled. This is one of those annoying behaviours not covered by
either the C++ or POSIX standards...
 
A quick check on Linux shows that yes, destructors are called when a
thread is cancelled.
 
--
Ian
Bonita Montero <Bonita.Montero@gmail.com>: May 28 05:24AM +0200

> can use cleanup functions.  I haven't tried it in Linux, but certainly
> in Solaris, the C++ runtime will destroy objects if the creating thread
> is cancelled. ...
 
With Linux that doesn't work:
 
#include <iostream>
#include <limits>
#include <pthread.h>
#include <unistd.h>

using namespace std;

struct destr
{
    ~destr();
};

destr::~destr()
{
    cout << "destr::~destr()" << endl;
}

int main()
{
    auto thr = []( void * ) -> void *
    {
        cout << "thread is running" << endl;
        int oldStat;
        if( pthread_setcanceltype( PTHREAD_CANCEL_DEFERRED, &oldStat ) != 0
            || pthread_setcanceltype( PTHREAD_CANCEL_ENABLE, &oldStat ) != 0 )
        {
            cout << "can't enable cancelling" << endl;
            return nullptr;
        }
        destr d;
        sleep( numeric_limits<int>::max() );
        return nullptr;
    };
    pthread_t pt;
    if( pthread_create( &pt, nullptr, thr, nullptr ) != 0 )
    {
        cout << "can't create thread" << endl;
        return -1;
    }
    if( pthread_cancel( pt ) != 0 )
    {
        cout << "can't cancel thread" << endl;
        return -1;
    }
    if( pthread_join( pt, nullptr ) != 0 )
    {
        cout << "can't join thread" << endl;
        return -1;
    }
}
 
Remember that a cancellation-request remains queued even if cancellation
isn't enabled yet.
Bonita Montero <Bonita.Montero@gmail.com>: May 28 05:27AM +0200

> ) != 0
>             || pthread_setcanceltype( PTHREAD_CANCEL_ENABLE,   &oldStat
> ) != 0 )
 
Oh, a little bug:
if( pthread_setcanceltype( PTHREAD_CANCEL_DEFERRED, &oldStat ) != 0
|| pthread_setcancelstate( PTHREAD_CANCEL_ENABLE, &oldStat ) != 0 )
But it doesn't change anything.
"Öö Tiib" <ootiib@hot.ee>: May 27 08:37PM -0700

On Friday, 28 May 2021 at 05:59:41 UTC+3, Ian Collins wrote:
> either the C++ or POSIX standards...
 
> A quick check on Linux shows that yes, destructors are called when a
> thread is cancelled.
 
Very interesting ... but magical solutions make me even more worried.
Is some kind of secret exception thrown, or some kind of alternative
stack unwinding used, or what? If it is a secret exception, does
catch(...) { mopup(); throw; } work, or does it have to be full RAII? If it is
alternative stack unwinding, what does it cost, and do noexcept(true) functions
in the call stack still compile into that rainbow table because of it?
Ian Collins <ian-news@hotmail.com>: May 28 04:50PM +1200

On 28/05/2021 15:37, Öö Tiib wrote:
> catch(...) { mopup(); throw; } work or has it to be full RAII? If alternative
> stack unwinding then what it costs and do noexcept(true) functions
> in call stack compile still into that rainbow table because of it?
 
I believe (at least on Solaris), the "magic" is the runtime using
pthread_cleanup_push/pthread_cleanup_pop to manage the destructor
calling, there's no need for exceptions. There's nothing to stop you
doing this by hand...
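
A minimal sketch of doing the cleanup by hand with pthread_cleanup_push and
pthread_cleanup_pop, assuming a POSIX system. The names cleanup_ran, release,
worker and cancel_and_join are hypothetical, and the atomic flag stands in for
whatever resource the thread would really have to release.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>

// Set by the cleanup handler so the caller can observe that it ran.
std::atomic<bool> cleanup_ran{false};

// Cleanup handler: receives the argument given to pthread_cleanup_push.
void release(void* arg) {
    static_cast<std::atomic<bool>*>(arg)->store(true);
}

void* worker(void*) {
    // push and pop must be paired in the same lexical scope (they may be macros).
    pthread_cleanup_push(release, &cleanup_ran);
    for (;;) {
        sleep(1);            // sleep() is a cancellation point
    }
    pthread_cleanup_pop(0);  // never reached; needed to close the pair
    return nullptr;
}

// Starts the worker, cancels it and reports whether it ended by cancellation.
bool cancel_and_join() {
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return false;
    if (pthread_cancel(t) != 0) return false;
    void* result = nullptr;
    if (pthread_join(t, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

The cancellation request stays pending until the worker reaches sleep(), so
the handler is registered before the request can be acted upon.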
 
There are a number of behaviours which fall between two standards and we
have to rely on the quality of the implementation, this just happens to
be one.
 
A simple test case:
 
#include <pthread.h>
#include <iostream>
#include <unistd.h>

struct foo
{
    foo() { std::cout << "constructor" << std::endl; }

    ~foo() { std::cout << "destructor" << std::endl; }
};

void*
thread(void*)
{
    foo f;

    while (true)
    {
        sleep(5);
    }

    return nullptr;
}


int
main()
{
    pthread_t t;

    pthread_create(&t, 0, thread, 0);

    sleep(1);
    pthread_cancel(t);

    pthread_join(t, 0);
}
 
--
Ian.
Bonita Montero <Bonita.Montero@gmail.com>: May 28 06:53AM +0200

>   pthread_cancel(t);
 
>   pthread_join(t, 0);
> }
 
You have to make the thread cancelable.
"Öö Tiib" <ootiib@hot.ee>: May 27 11:07PM -0700

On Friday, 28 May 2021 at 07:51:05 UTC+3, Ian Collins wrote:
 
> There are a number of behaviours which fall between two standards and we
> have to rely on the quality of the implementation, this just happens to
> be one.
 
My threads typically do sequential work that can be time-consuming
and complex (otherwise why use a thread?). So I am unsure how to unit-test or
automatically test cancelling them cheaply enough, as the magic happens in the
supernatural realm.

When I do it by hand, setting an atomic flag (which the thread checks to see
if it should stop) and then joining, I can measure it and can mock the
flag-checking function. Isn't that a lot easier?
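
To illustrate why the hand-rolled flag is easy to test, here is a sketch in
which the stop check is passed in as a callable, so a test can substitute a
mock. The names do_work and run_in_thread are hypothetical, and the loop body
is a stand-in for one unit of real work.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>

// Performs up to `total_steps` units of work, asking `should_stop` before
// each one. Returns the number of steps completed.
inline std::size_t do_work(std::size_t total_steps,
                           const std::function<bool()>& should_stop) {
    std::size_t done = 0;
    while (done < total_steps && !should_stop()) {
        ++done;  // stand-in for one unit of real work
    }
    return done;
}

// Production-style wiring: the check reads an atomic flag owned by the caller.
inline std::size_t run_in_thread(std::size_t total_steps,
                                 std::atomic<bool>& stop) {
    std::size_t done = 0;
    std::thread t([&] {
        done = do_work(total_steps,
                       [&] { return stop.load(std::memory_order_relaxed); });
    });
    t.join();
    return done;
}
```

A test can call do_work directly with a lambda that starts returning true
after a chosen number of calls, with no thread or timing involved.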
Bonita Montero <Bonita.Montero@gmail.com>: May 28 08:26AM +0200

> When I do by hand, set some atomic flag (that thread checks if it
> should stop) and then join then I can measure, can mock that flag
> checking function. It is lot easier?
 
And if you load that flag with relaxed ordering, its cache line stays
shared among the caches of the different cores and the check is usually
predicted as false by the branch predictor, so checking the flag
has almost no overhead.
Juha Nieminen <nospam@thanks.invalid>: May 28 06:27AM


>>> Good morning Mr Happy, things going well in Finland today?
 
>>Fuck off, asshole.
 
> Oh dear, another bad day? Have a lie down and cuddle the therapy teddy.
 
Just fuck off already, fucking asshole.
Louis Krupp <lkrupp@invalid.pssw.com.invalid>: May 27 07:03PM -0600

On 5/27/2021 2:38 PM, Vir Campestris wrote:
 
> But having glanced through that code It's not clear to me _why_ set
> works so much better for that case. Most likely lots of insertions or
> searches.
 
The update code for vectors loops through elements until it finds a
match. In the Code Project page:
 
for (auto it = _aPixelActionsVec.begin(); it != _aPixelActionsVec.end(); it++)
{
    if (it->ManipulationType == PIXEL_MANIPULATION &&
        it->ColumnOrPalIdx == lCol &&
        it->Row == lRow)
    {
        it->NewColIdxOrRef = dwNewColIdxOrRef;
        it->NewMask = bNewMask;
        return true;
    }
}
 
The update code for sets calls the set's "find" method:
 
IMAGMANIPULATION im;
SetPixelManipulation(im, lCol, lRow, dwOldColIdxOrRef, dwNewColIdxOrRef,
                     bOldMask, bNewMask);

auto it = _aPixelActionsSet.find(im);
if (it != _aPixelActionsSet.end())
{
    if (it->NewColIdxOrRef != dwNewColIdxOrRef ||
        it->NewMask != bNewMask )
    {
        _aPixelActionsSet.erase(it);
        _aPixelActionsSet.insert(im);
    }
    return true;
}
 
A set *could* be implemented as a vector, but it's almost certainly not;
as Scott said, it uses a self-balancing binary search tree. The set's
"find" method would take advantage of this to find matches quickly;
think of how you would code a binary search to find something in a
sorted array.
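
A self-contained sketch of the same find-then-replace pattern, assuming a
simplified record. PixelAction, its fields and update_color are hypothetical
stand-ins, not the article's actual types. Only the key fields take part in
the ordering, which is what lets find() locate an entry whose payload differs.

```cpp
#include <set>
#include <tuple>

// Simplified stand-in for the article's record.
struct PixelAction {
    int type;
    long col;
    long row;
    unsigned new_color;  // payload, not part of the ordering

    // Order by the key fields only, so find() matches on them alone.
    bool operator<(const PixelAction& other) const {
        return std::tie(type, col, row) <
               std::tie(other.type, other.col, other.row);
    }
};

// Updates the payload for an existing key. Set elements are const, so the
// element is erased and re-inserted. Returns false if the key is absent.
inline bool update_color(std::set<PixelAction>& actions,
                         const PixelAction& wanted) {
    auto it = actions.find(wanted);  // logarithmic lookup
    if (it == actions.end()) return false;
    if (it->new_color != wanted.new_color) {
        actions.erase(it);
        actions.insert(wanted);
    }
    return true;
}
```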
 
If there's no need to cycle through all the elements of a set in a
particular sequence, an unordered set might be even faster than a set;
from what I've read, it uses a hash table instead of a binary search tree.
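
A sketch of what the unordered variant could look like, assuming the key is a
pixel position. PixelKey, PixelKeyHash, PixelSet and contains are hypothetical
names; std::unordered_set needs an equality comparison and a hash function for
a user-defined key, and the hash chosen here is only one reasonable option.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical key: a pixel position.
struct PixelKey {
    std::int32_t col;
    std::int32_t row;
    bool operator==(const PixelKey& o) const {
        return col == o.col && row == o.row;
    }
};

// Packs both coordinates into one 64-bit value and hashes that.
struct PixelKeyHash {
    std::size_t operator()(const PixelKey& k) const {
        std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(k.col)) << 32) |
            static_cast<std::uint32_t>(k.row);
        return std::hash<std::uint64_t>{}(packed);
    }
};

using PixelSet = std::unordered_set<PixelKey, PixelKeyHash>;

// Average constant-time membership test.
inline bool contains(const PixelSet& s, std::int32_t col, std::int32_t row) {
    return s.find(PixelKey{col, row}) != s.end();
}
```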
 
If there's a need to cycle through the elements in their order of
insertion, it might be possible to maintain both a vector and a set,
adding a vector element pointer to the set element. The code could find
matches quickly using the set while still being able to cycle through
the elements of the vector. The bookkeeping required to keep the vector
and the set in sync might or might not be worth the time and trouble.
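
A rough sketch of that dual-container idea. It swaps in two things that are
assumptions of this sketch rather than part of the suggestion above: a hash
map instead of a set, and a vector index instead of an element pointer,
because pointers into a vector are invalidated when it reallocates.
OrderedIndex and its members are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Keeps values in insertion order and indexes them by key.
class OrderedIndex {
public:
    // Inserts a new key or overwrites the value of an existing one.
    void upsert(const std::string& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            items_[it->second].second = value;  // found via the index
            return;
        }
        index_.emplace(key, items_.size());     // remember the position
        items_.emplace_back(key, value);
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &items_[it->second].second;
    }

    // Elements in their order of insertion.
    const std::vector<std::pair<std::string, int>>& in_order() const {
        return items_;
    }

private:
    std::vector<std::pair<std::string, int>> items_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

Erasing from the middle of the vector would require renumbering the stored
indices, which is the kind of bookkeeping cost mentioned above.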
 
Louis
"Öö Tiib" <ootiib@hot.ee>: May 27 08:10PM -0700

On Wednesday, 26 May 2021 at 23:27:45 UTC+3, Lynn McGuire wrote:
> "STL: Amazing Speed Differences between std::vector and std::set
> (Observed with an UndoRedoAction)"
 
> https://www.codeproject.com/Tips/5303529/STL-Amazing-Speed-Differences-between-std-vector-a
 
The article is such TL;DR garbage about undo_redo_actions with pixels that I
don't care about and so can't profile. It is likely important to a lot of other
people, but I would like to have a minimal reproducible example without that
befoggement and confusion.

What is its point? Is it that searching an unsorted vector can be orders of
magnitude slower than searching a set? Big surprise.
 
> I have seen this myself. We used a std::map for a very large set
> (>10,000 members) just because it is much faster than std::vector.
 
As performance does not matter in the majority of code, I have quite a lot of
std::sets in code. Where performance matters, I use a sorted std::vector,
std::unordered_set or boost::intrusive::set instead.
I have not met a case where std::set outperforms all three. So
lately my default searchable container is std::unordered_set,
and sorted std::vector / boost::intrusive::set are performance
optimizations.
Bonita Montero <Bonita.Montero@gmail.com>: May 28 07:23AM +0200

> lately my default of searchable containers is std::unordered_set
> and sorted std::vector / boost::intrusive::set are performance
> optimizations.
 
Of course: if you sort a vector and do a binary search on it,
you get a lot of random memory accesses which aren't
predictable by the prefetcher, and the number of memory accesses
is usually higher than with a hash set.
Paavo Helde <myfirstname@osa.pri.ee>: May 28 08:59AM +0300

28.05.2021 04:03 Louis Krupp kirjutas:
> match. In the Code Project page:
 
>             for  (auto  it = _aPixelActionsVec.begin(); it !=
> _aPixelActionsVec.end(); it++)
 
One can have efficient search in vectors, beating std::set. But for that
the vector must be sorted and one must use std::lower_bound(), not
linear search.
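
A minimal sketch of the sorted-vector approach with std::lower_bound, using
int elements for brevity. The helper names insert_sorted and contains_sorted
are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Inserts `value` into an already sorted vector, keeping it sorted and
// free of duplicates. Returns false if the value was already present.
inline bool insert_sorted(std::vector<int>& v, int value) {
    auto pos = std::lower_bound(v.begin(), v.end(), value);
    if (pos != v.end() && *pos == value) return false;
    v.insert(pos, value);
    return true;
}

// Binary search: logarithmic number of comparisons instead of a linear scan.
inline bool contains_sorted(const std::vector<int>& v, int value) {
    auto pos = std::lower_bound(v.begin(), v.end(), value);
    return pos != v.end() && *pos == value;
}
```

Lookups are cheap, but each insertion still shifts the elements after the
insertion point, so this suits data that is searched far more often than it
is modified.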
Keith Thompson <Keith.S.Thompson+u@gmail.com>: May 27 04:58PM -0700


> Except the actual size of a FILE * object in the filesystem. Size_t
> can hold the size of the FILE * structure but if the actual file size
> is greater than 4 GB in a Win32 program, size_t will be wrong.
 
A FILE* object is a pointer, likely 4 or 8 bytes. A FILE object is
probably a struct, 216 bytes on my system.
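
A small sketch that separates the three sizes being confused here: the size
of the pointer, the size of the FILE object, and the size of a file on disk.
The function names are hypothetical, and the actual values are
implementation-specific.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>

// Size of the pointer itself, which is what sizeof(FILE*) reports.
inline std::size_t file_pointer_size() { return sizeof(std::FILE*); }

// Size of the library's bookkeeping object, not of any file on disk.
inline std::size_t file_object_size() { return sizeof(std::FILE); }

// True if a file of `bytes` bytes can be described by a size_t here.
inline bool fits_in_size_t(std::uintmax_t bytes) {
    return bytes <= std::numeric_limits<std::size_t>::max();
}
```

On a platform with a 32-bit size_t, fits_in_size_t returns false for a 5 GB
file, while both sizeof results are unaffected by how large any file is.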
 
> typedef int ptrdiff_t;
> typedef int intptr_t;
>
