Monday, May 31, 2021

Digest for comp.lang.c++@googlegroups.com - 24 updates in 5 topics

Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 27 07:00PM +0100

On Thu, 27 May 2021 19:32:58 +0200
> design very complex if you really want to make it interruptible at
> every time while processing a call. That's not necessary if you'd
> design it not that stupid.
 
Interrupts were necessary with single-threaded programs. They are not
usually necessary with multi-threaded programs, and POSIX provides
everything you need in consequence. You block the signals of interest
in all your threads and then have a single handler thread which blocks
on sigwait() and deals with the signals synchronously (synchronously
for the handler thread, that is). Job done.
 
One other thing POSIX provides and C++11 doesn't is thread cancellation.
Some programmers who don't have relevant experience think thread
cancellation is a bad idea because it allows arbitrary cancellation
which cannot be adequately controlled (the "combing your hair with a
fork" jibe). Whilst that is true with Windows, it absolutely isn't with
POSIX. What you do is use deferred cancellation (the default in POSIX)
and block cancellation as normal policy, and only enable cancellation
at defined points in the code which are able to deal with it (normally
where a wait is to occur). Done properly, thread cancellation is far
easier to use than exceptions, which can jump out of anywhere if you
include std::bad_alloc in the mix. C++11 threads don't deal with
cancellation and never will, because cancellation requires OS
support - in this case POSIX support.
Paavo Helde <myfirstname@osa.pri.ee>: May 26 05:35PM +0300

26.05.2021 14:51 Juha Nieminen kirjutas:
 
> the program will still crash, because std::thread does not like being
> destroyed without being joined or detached, so it just terminates if that's
> the case.
 
std::thread is based on boost::thread, which automatically detaches in
the destructor. Alas, this creates a runaway thread which cannot be
joined any more.
 
I guess when it got standardized, they felt that such an automatic detach
was no good, but did not dare to enforce an automatic join either, for
some reason. So they chose std::terminate(), which is the worst of them
all IMO.
 
See https://isocpp.org/files/papers/p0206r0.html (Discussion about
std::thread and RAII).
Juha Nieminen <nospam@thanks.invalid>: May 25 05:16AM

> while ((num = fread (buffer, sizeof (char), sizeof (buffer) - 1,
> pOutputFile)) > 0)
 
The standard mandates that sizeof(char) is 1. It cannot have any other
value.
 
> {
> // nothing to catch since any error causes this code to bypass
> }
 
If you are catching a memory allocation failure, why not catch it
explicitly and return an error code or print an informative error
message that indicates what happened, rather than silently failing and
doing nothing?
 
try
{
    // your code here
}
catch(const std::bad_alloc& e)
{
    // Could do, for example:
    std::cout << "Memory allocation failed: " << e.what() << "\n";
}
catch(...)
{
    std::cout << "Unknown exception thrown while trying to read file\n";
}
Lynn McGuire <lynnmcguire5@gmail.com>: May 27 02:08PM -0500

On 5/27/2021 12:50 AM, Christian Gollwitzer wrote:
> size_t for unsigned or ptrdiff_t for signed. That will correspond to a
> 32bit integer in 32 bit and a 64 bit integer in 64 bit.
 
>     Christian
 
Done. With checking against SIZE_MAX before casting the variable to size_t.
 
Yeah, if 1 GB is having trouble in Win32, 2+ GB will be much worse. The
code is now OK to fail without crashing the program. Porting to Win64
is needed in the near future. So many things to do, so little time. I
will be 61 in a couple of weeks, kinda hoping to retire before 75.
 
Thanks,
Lynn
Lynn McGuire <lynnmcguire5@gmail.com>: May 25 12:46PM -0500

On 5/25/2021 12:04 PM, Paavo Helde wrote:
 
> OP was a bit unclear in this point. But there would be no point to have
> a 32-bit Windows 7 on a machine with 16 GB RAM, so I hope he has got a
> 64-bit OS after all.
 
Yes, I have Windows 7 x64 Pro. I cannot convert to Win64 at this time.
When we do convert, it will be a steep hill, as we started this code in
Win16 in 1987. The Win32 port was a very steep hill in 2000.
 
Lynn
Lynn McGuire <lynnmcguire5@gmail.com>: May 25 12:49PM -0500

On 5/25/2021 12:16 AM, Juha Nieminen wrote:
> {
> std::cout << "Unknown exception thrown while trying to read file\n";
> }
 
Thanks, I did not know the code for catching the bad_alloc explicitly.
 
Lynn
MrSpook_Ann1u@d0v_5eh1bqgd.com: May 25 04:10PM

On Tue, 25 May 2021 18:42:54 +0300
>> as real memory or as configured backing store (i.e. swap space).
 
>For fitting a 900 MB std::string into a memory one needs 900 MB of
>contiguous address space and Bonita is right in pointing out this might
 
Why? Only the virtual memory address space needs to be contiguous, the
real memory pages storing the string could be all over the place.
Bonita Montero <Bonita.Montero@gmail.com>: May 26 06:16AM +0200

Maybe it would be an idea to process your file in pieces?
Christian Gollwitzer <auriocus@gmx.de>: May 26 08:36AM +0200

Am 25.05.21 um 19:20 schrieb Lynn McGuire:
> not funny.  My calculation engine is 850,000 lines of F77 and about
> 20,000 lines of C++ but it is still portable to the Unix boxen, probably
> mainframes too if any engineers ran them anymore.
 
Have you actually tried to recompile in 64bit? The Win32-API is the same
AFAIUI. Are you casting pointers to ints/longs? I haven't ported
million-LOC programs to 64bit, but in smaller projects there was
surprisingly little to do to make it work. I have no idea what goes
wrong when you link to F77, though.
 
Christian
Bonita Montero <Bonita.Montero@gmail.com>: May 26 09:22AM +0200

> Things would have to be pretty badly FUBARed for the VM to run out of virtual
> memory address space on a 64 bit system given the 16 exabyte max size!
 
Actually AMD64 supports "only" 48-bit virtual addresses, of which the
lower 47 bits fall into user space. Recent Intel CPUs add 5-level paging
with 57-bit virtual addresses, but I think that's rather for systems
with large file mappings.
Bonita Montero <Bonita.Montero@gmail.com>: May 26 08:17AM +0200

I'm too lazy to read your code.
 
> #define CT_L2_ALIGNMENT 128
 
But this is a false assumption. Apart from the Intel Pentium 4 and
the IBM POWER L4 caches (embedded DRAM), there are AFAIK no caches
that have a 128-byte cacheline size. 128-byte cachelines have proven
to be inefficient below a certain cache size because the cachelines
are used too partially when the working set has too little reuse.
Better use this:
https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 25 07:03PM -0700

Fwiw, here is some crude Windows code I just coded up for a single
linked list using a futex. It only supports push and flush operations
for now. flush will wait if the stack is empty. I was too lazy to
implement an ABA counter. However, it shows an interesting way to use a
futex for a lock-free algorithm.
 
Can you get it to run? Thanks.
__________________________________________________
 
 
// Futex Single Linked List by Chris M. Thomasson
//___________________________________________________
 
 
 
#include <iostream>
#include <thread>
#include <vector>
#include <functional>
#include <cassert>
 
 
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
 
 
#define CT_L2_ALIGNMENT 128
#define CT_ITERS 6666666
#define CT_NODES 42
 
 
#define CT_WAITBIT 0x1UL
 
 
static LONG g_memory_allocations = 0;
static LONG g_memory_deallocations = 0;
static LONG g_futex_signals = 0;
static LONG g_futex_waits = 0;
 
struct ct_node
{
ct_node* m_next;
 
ct_node() : m_next(NULL)
{
InterlockedAdd(&g_memory_allocations, 1);
}
 
~ct_node()
{
InterlockedAdd(&g_memory_deallocations, 1);
}
};
 
 
#define CT_NODE_SET_WAITBIT(mp_ptr) ((ct_node*)(((ULONG_PTR)(mp_ptr)) | CT_WAITBIT))
#define CT_NODE_CHECK_WAITBIT(mp_ptr) (((ULONG_PTR)(mp_ptr)) & CT_WAITBIT)
#define CT_NODE_CLEAR_WAITBIT(mp_ptr) ((ct_node*)(((ULONG_PTR)(mp_ptr)) & ~CT_WAITBIT))
 
 
void ct_node_flush(ct_node* node)
{
while (node)
{
ct_node* next = node->m_next;
delete node;
node = next;
}
}
 
 
struct ct_futex_slist
{
alignas(CT_L2_ALIGNMENT) ct_node* m_head;
 
 
ct_futex_slist() : m_head(nullptr)
{
 
}
 
 
void push(ct_node* node)
{
ct_node* head = m_head;
 
for (;;)
{
ct_node* xchg = CT_NODE_CLEAR_WAITBIT(head);
node->m_next = xchg;
 
ct_node* ret =
(ct_node*)InterlockedCompareExchangePointer((PVOID*)&m_head, node, head);
 
if (ret == head)
{
if (CT_NODE_CHECK_WAITBIT(ret))
{
InterlockedAdd(&g_futex_signals, 1);
WakeByAddressSingle(&m_head);
}
 
return;
}
 
head = ret;
}
}
 
 
ct_node* flush()
{
ct_node* head_raw =
(ct_node*)InterlockedExchangePointer((PVOID*)&m_head, NULL);
ct_node* head = CT_NODE_CLEAR_WAITBIT(head_raw);
 
if (! head)
{
for (;;)
{
head_raw =
(ct_node*)InterlockedExchangePointer((PVOID*)&m_head, (ct_node*)CT_WAITBIT);
head = CT_NODE_CLEAR_WAITBIT(head_raw);
 
if (head)
{
break;
}
 
InterlockedAdd(&g_futex_waits, 1);
ct_node* waitbit = (ct_node*)CT_WAITBIT;
WaitOnAddress(&m_head, &waitbit, sizeof(PVOID), INFINITE);
}
}
 
return head;
}
};
 
 
struct ct_shared
{
ct_futex_slist m_slist;
 
 
~ct_shared()
{
ct_node* head_raw = m_slist.m_head;
 
if (CT_NODE_CHECK_WAITBIT(head_raw))
{
std::cout << "\n\nWAITBIT LEAK!\n";
}
 
ct_node_flush(head_raw);
 
if (g_memory_allocations != g_memory_deallocations)
{
std::cout << "\n\nMEMORY LEAK!\n";
}
 
std::cout << "\ng_memory_allocations = " <<
g_memory_allocations << "\n";
std::cout << "g_memory_deallocations = " <<
g_memory_deallocations << "\n";
std::cout << "g_futex_waits = " << g_futex_waits << "\n";
std::cout << "g_futex_signals = " << g_futex_signals << "\n";
}
};
 
 
void ct_thread(ct_shared& shared)
{
for (unsigned long i = 0; i < CT_ITERS; ++i)
{
for (unsigned long n = 0; n < CT_NODES; ++n)
{
shared.m_slist.push(new ct_node());
}
 
ct_node* node = shared.m_slist.flush();
ct_node_flush(node);
}
 
shared.m_slist.push(new ct_node());
}
 
 
int main()
{
unsigned int threads_n = std::thread::hardware_concurrency();
 
std::vector<std::thread> threads(threads_n);
 
std::cout << "Futex Single Linked List by Chris M. Thomasson\n\n";
 
std::cout << "Launching " << threads_n << " threads...\n";
std::cout.flush();
 
{
ct_shared shared;
 
for (unsigned long i = 0; i < threads_n; ++i)
{
threads[i] = std::thread(ct_thread, std::ref(shared));
}
 
std::cout << "Processing...\n";
std::cout.flush();
 
for (unsigned long i = 0; i < threads_n; ++i)
{
threads[i].join();
}
}
 
std::cout << "\nCompleted!\n";
 
return 0;
}
__________________________________________________
 
 
 
Here is my output:
__________________________________________________
Futex Single Linked List by Chris M. Thomasson
 
Launching 4 threads...
Processing...
 
g_memory_allocations = 1119999892
g_memory_deallocations = 1119999892
g_futex_waits = 21965
g_futex_signals = 55630
 
Completed!
__________________________________________________
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 25 11:21PM -0700

On 5/25/2021 11:17 PM, Bonita Montero wrote:
> haviour of the working-set.
> Better use this:
> https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size
 
Yes, you are correct. std::hardware_destructive_interference_size is
the way to go. Thanks for taking a look at my code to begin with, Bonita!
 
:^)
Bonita Montero <Bonita.Montero@gmail.com>: May 25 08:32AM +0200

> I can give you a generic waiting thing called an, oh wait. I want
> to see if you can do it for yourself. I can encrypt my answer....
 
You're stupid. We're talking about whether a lock-free queue
is possible without polling. And it obviously is not; if it
didn't poll, it wouldn't be lock-free.
Bonita Montero <Bonita.Montero@gmail.com>: May 25 08:30AM +0200

>> an item in the queue the only opportunity that is left is polling.
 
> You are missing things again. Think! Remember back when you said that
> mutexes must use CAS?
 
I never said that, because I implemented a mutex with LOCK XADD a long
time ago. And this has nothing to do with the fact that you don't
understand that lock-free programming isn't possible without polling.
Describe to me a lock-free algorithm that works without polling!
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 24 11:33PM -0700

On 5/24/2021 11:32 PM, Bonita Montero wrote:
>> to see  if you can do it for yourself. I can encrypt my answer....
 
> You're stupid. We're talking about whether a lock-free queue
> is possible without polling. And it is obviously not
 
Why poll, when we can check the predicate then wait?
 
 
Bonita Montero <Bonita.Montero@gmail.com>: May 25 08:36AM +0200

>> that lock-free programming isn't possible without polling.
>> Describe me a lock-free algorithm that works without polling !
 
> http://fractallife247.com/test/hmac_cipher/ver_0_0_0_1?ct_hmac_cipher=27becd4634e98df52a89d06b2906ce8a7fdf24968475da63b3d57719e7eb01d4ab5f0e28d0a76c73883661af6abd0514412e774530b3af251b118a66f38c521f1bb4c5652c303411c9de7606f4690777a997ad6d65ac625c456673216ae073a067fd34e5f37010811231569824a22be1f5ab1ee525a30626b87986672f
 
Have you ever been checked by a mental health doctor ?
That has nothing to do with what we're talking about.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 25 12:07AM -0700

On 5/25/2021 12:05 AM, Bonita Montero wrote:
>>> Otherwise they woudln't be lock-free.
 
>> Oh argh! Did you read where I mentioned predicates?
 
> Yes, these predicates are polled by all lock-free algorithms.
 
The predicate is polled in a condvar as well, yawn. Getting tired. Just
try to think about it some more. I will get back to you tomorrow. I have
some other work to do. Will bust out Relacy. It's fun to use.
Bonita Montero <Bonita.Montero@gmail.com>: May 25 12:53PM +0200

>> I've shown that fork()ing is much slower than creating a thread.
>> And even more so when pages are CoW'd.
 
> For the use cases of fork the time of creation is irrelevant.
 
E.g. if you have a webserver that fork()s for every request,
that's relevant, as in most other cases.
Bonita Montero <Bonita.Montero@gmail.com>: May 25 09:10AM +0200

> The predicate is polled in a condvar as well, yawn. ...
 
A condvar isn't a lock-free structure.
MrSpook_m0gtwlu6_g@j_cyo9l9.edu: May 25 09:05AM

On Mon, 24 May 2021 18:51:58 +0200
>means that plugins, extensions, tabs, etc., can crash without bringing
>down the whole browser. It costs - especially in ram - but you get
>benefits from it.
 
Yup. Also some browsers - eg Firefox - started out using separate processes
for separate tabs but switched over to multithreading - presumably to make
porting to Windows easier - with a consequent drop in reliability. Now
some have gone back to multi-process, which also has the benefit of making
some kinds of JavaScript hacks difficult if not impossible if each tab is a
separate process.
Bonita Montero <Bonita.Montero@gmail.com>: May 25 12:54PM +0200

>> thread( func, params ... ).detach() and forget the thread.
 
> What a pity I can't find some ascii art of someone with their head in their
> hands.
 
The above is a trivial statement. And you don't have to have any
shared memory; you simply pass the parameters to the thread and
they're destructed when the thread ends. That's magnitudes more
convenient than fork()ing and having shared memory.
Bonita Montero <Bonita.Montero@gmail.com>: May 24 02:19PM +0200

> creating a thread. Creating a thread on my Linux Ryzen 7 1800X is about
> 17,000 clock cycles and I bet fork()ing is magnitudes higher if you
> include the costs of making copy-on-writes of every written page.
 
I just wrote a little program:
 
#include <unistd.h>
#include <iostream>
#include <chrono>
#include <cstdlib>
#include <cctype>
#include <cstring>
#include <cstdint>
 
using namespace std;
using namespace chrono;
 
int main( int argc, char **argv )
{
using sc_tp = time_point<steady_clock>;
if( argc < 2 )
return EXIT_FAILURE;
uint64_t k = 1'000 * (unsigned)atoi( argv[1] );
sc_tp start = steady_clock::now();
char buf[0x10000];
for( uint64_t n = 0; fork() == 0; ++n )
{
memset( buf, 0, sizeof buf );
if( n == k )
{
double ns = (double)(int64_t)duration_cast<nanoseconds>(
steady_clock::now() - start ).count() / (int64_t)k;
cout << ns << endl;
break;
}
}
}
 
On my Ryzen 7 1800X Linux PC it gives me a fork cost of about
180,000 to 222,000 clock cycles per fork. Any questions? I'm
doing a minimal write into a 64kB block (hopefully this isn't
optimized away) to simulate some copy-on-write. As you can see,
forking is magnitudes slower than creating a thread (17,000
cycles on the same machine).
Bonita Montero <Bonita.Montero@gmail.com>: May 31 09:33AM +0200

> If here really was the wrong place then I would not have been
> able to get Kaz to come back to comp.theory by posting here.
 
No matter what you say - your halting-problem ideas are always
off-topic here.