- std::thread does not follow RAII principles - 2 Updates
- recovering from std::bad_alloc in std::string reserve - 8 Updates
- A single linked list using a futex... - 3 Updates
- Nice C++14-feature - 10 Updates
- Would D ever stop if simulating halt decider H never stopped simulating it? - 1 Update
| Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 27 07:00PM +0100 On Thu, 27 May 2021 19:32:58 +0200 > design very complex if you really want to make it interruptible at > every time while processing a call. That's not necessary if you'd > design it not that stupid. Interrupts were necessary with single-threaded programs. They are not usually necessary with multi-threaded programs, and POSIX provides everything you need in consequence. You block the signals of interest in all your threads and then have a single handler thread which blocks on sigwait() and deals with the signals synchronously (synchronously for the handler thread, that is). Job done. One other thing POSIX provides and C++11 doesn't is thread cancellation. Some programmers who don't have relevant experience think thread cancellation is a bad idea because it allows arbitrary cancellation which cannot be adequately controlled (the "combing your hair with a fork" jibe). Whilst that is true with Windows, it absolutely isn't with POSIX. What you do is use deferred cancellation (the default in POSIX) and block cancellation as normal policy, and only enable cancellation at defined points in the code which are able to deal with it (normally where a wait is to occur). Done properly, thread cancellation is far easier to use than exceptions, which can jump out of anywhere if you include std::bad_alloc in the mix. C++11 threads don't deal with cancellation and will never be able to do so because they require OS support - in this case POSIX support. |
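A minimal sketch of the sigwait() pattern Chris describes, assuming a POSIX platform and compilation with -pthread; the signal choice and the shutdown logic are placeholders:

#include <signal.h>
#include <pthread.h>
#include <cstdio>
#include <thread>

int main()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);

    // Block the signals before creating any other thread, so every
    // thread inherits the mask and none of them gets interrupted.
    pthread_sigmask(SIG_BLOCK, &set, nullptr);

    std::thread handler([set] {
        int sig = 0;
        sigwait(&set, &sig);   // blocks until SIGINT or SIGTERM arrives
        std::printf("got signal %d, telling workers to shut down\n", sig);
        // ... set a flag / notify the worker threads here ...
    });

    // ... worker threads run here with the signals blocked ...

    handler.join();
}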
| Paavo Helde <myfirstname@osa.pri.ee>: May 26 05:35PM +0300 26.05.2021 14:51 Juha Nieminen wrote: > the program will still crash, because std::thread does not like being > destroyed without being joined or detached, so it just terminates if that's > the case. std::thread is based on boost::thread which automatically detaches in the destructor. Alas, this would create a runaway thread which cannot be joined any more. I guess when it got standardized, they felt that such automatic detach is no good, but did not dare to enforce automatic join either, for some reason. So they chose std::terminate(), which is the worst of them all IMO. See https://isocpp.org/files/papers/p0206r0.html (Discussion about std::thread and RAII). |
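For contrast with the terminate-on-destruction behaviour, a minimal sketch of an RAII wrapper that joins in its destructor (roughly what C++20 later standardized as std::jthread, minus the stop token); the class name is just for illustration:

#include <thread>
#include <utility>

// Joins in its destructor instead of calling std::terminate().
class joining_thread
{
    std::thread t_;
public:
    template<class F, class... Args>
    explicit joining_thread(F&& f, Args&&... args)
        : t_(std::forward<F>(f), std::forward<Args>(args)...) {}

    joining_thread(joining_thread&&) noexcept = default;

    ~joining_thread() { if (t_.joinable()) t_.join(); }
};

// usage: joining_thread t([] { /* work */ });  // joined automatically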
| Juha Nieminen <nospam@thanks.invalid>: May 25 05:16AM > while ((num = fread (buffer, sizeof (char), sizeof (buffer) - 1, > pOutputFile)) > 0) The standard mandates that sizeof(char) is 1. It cannot have any other value. > { > // nothing to catch since any error causes this code to bypass > } If you are catching a memory allocation failure, why not catch it explicitly and return an error code or print an informative error message or something that indicates what happened, rather than silently failing and doing nothing?
try
{
    // your code here
}
catch(const std::bad_alloc& e)
{
    // Could do, for example:
    std::cout << "Memory allocation failed: " << e.what() << "\n";
}
catch(...)
{
    std::cout << "Unknown exception thrown while trying to read file\n";
} |
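Applied to the std::string::reserve() case in the thread title, a minimal sketch (the 900 MB figure is taken from the discussion, not from the original code):

#include <iostream>
#include <new>
#include <string>

int main()
{
    std::string buffer;
    try
    {
        buffer.reserve(900u * 1024u * 1024u);   // ~900 MB, may fail on Win32
    }
    catch (const std::bad_alloc& e)
    {
        std::cerr << "Could not reserve the buffer: " << e.what() << "\n";
        return 1;   // or fall back to processing the file in smaller pieces
    }
    // ... read the file into buffer ...
}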
| Lynn McGuire <lynnmcguire5@gmail.com>: May 27 02:08PM -0500 On 5/27/2021 12:50 AM, Christian Gollwitzer wrote: > size_t for unsigned or ptrdiff_t for signed. That will correspond to a > 32bit integer in 32 bit and a 64 bit integer in 64 bit. > Christian Done. With checking against SIZE_MAX before casting the variable to size_t. Yeah, if 1 GB is having trouble in Win32, 2+ GB will be much worse. The code is now OK to fail without crashing the program. Porting to Win64 is needed in the near future. So many things to do, so little time. I will be 61 in a couple of weeks, kinda hoping to retire before 75. Thanks, Lynn |
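A sketch of the SIZE_MAX check Lynn mentions; the function name and the assumption that the byte count arrives as a 64-bit value are illustrative, not taken from his code:

#include <cstdint>
#include <cstddef>
#include <cstdio>

bool to_size_t(std::uint64_t bytes, std::size_t& out)
{
    if (bytes > SIZE_MAX)   // can only happen on a 32-bit build
    {
        std::fprintf(stderr, "request of %llu bytes is too large for size_t\n",
                     static_cast<unsigned long long>(bytes));
        return false;       // fail gracefully instead of crashing
    }
    out = static_cast<std::size_t>(bytes);
    return true;
}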
| Lynn McGuire <lynnmcguire5@gmail.com>: May 25 12:46PM -0500 On 5/25/2021 12:04 PM, Paavo Helde wrote: > OP was a bit unclear in this point. But there would be no point to have > a 32-bit Windows 7 on a machine with 16 GB RAM, so I hope he has got a > 64-bit OS after all. Yes, I have Windows 7 x64 Pro. I cannot convert to win64 at this time. When we do convert, it will be a steep hill as we started this code in Win16 in 1987. The Win32 port was a very steep hill in 2000. Lynn |
| Lynn McGuire <lynnmcguire5@gmail.com>: May 25 12:49PM -0500 On 5/25/2021 12:16 AM, Juha Nieminen wrote: > { > std::cout << "Unknown excpetion thrown while trying to read file\n"; > } Thanks, I did not know the code for catching the bad_alloc explicitly. Lynn |
| MrSpook_Ann1u@d0v_5eh1bqgd.com: May 25 04:10PM On Tue, 25 May 2021 18:42:54 +0300 >> as real memory or as configured backing store (i.e. swap space). >For fitting a 900 MB std::string into a memory one needs 900 MB of >contiguous address space and Bonita is right in pointing out this might Why? Only the virtual memory address space needs to be contiguous, the real memory pages storing the string could be all over the place. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 26 06:16AM +0200 Maybe it would be an idea to process your file in pieces ? |
| Christian Gollwitzer <auriocus@gmx.de>: May 26 08:36AM +0200 Am 25.05.21 um 19:20 schrieb Lynn McGuire: > not funny. My calculation engine is 850,000 lines of F77 and about > 20,000 lines of C++ but it is still portable to the Unix boxen, probably > mainframes too if any engineers ran them anymore. Have you actually tried to recompile in 64bit? The Win32-API is the same AFAIUI. Are you casting pointers to ints/longs? I haven't ported million-LOC programs to 64bit, but in smaller projects there was surprisingly little to do to make it work. I have no idea what goes wrong when you link to F77, though. Christian |
| Bonita Montero <Bonita.Montero@gmail.com>: May 26 09:22AM +0200 > Things would have to be pretty badly FUBARed for the VM to run out of virtual > memory address space on a 64 bit system given the 16 exabyte max size! Actually AMD64 supports "only" 48 bit page-tables where the lower 47 bit fall into user-space. Intel 64 has a recent change to 56 bit page-tables, but I think that's rather for systems with large file-mappings. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 26 08:17AM +0200 I'm too lazy to read your code. > #define CT_L2_ALIGNMENT 128 But this is a false assumption. Apart from the Intel Pentium 4 and the IBM POWER's L4-caches (embedded DRAM) there are AFAIK no caches that have 128 byte cacheline size. 128 byte cachelines have been proven to be inefficient below a certain cache size because the cachelines are used only partially, as the working set shows too little reuse. Better use this: https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size |
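A sketch of what using the cppreference constant instead of a hard-coded 128 can look like, with a fallback because some standard libraries still do not ship it:

#include <atomic>
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t cache_line = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t cache_line = 64;   // common fallback
#endif

// Two frequently written counters kept on separate cache lines to avoid
// false sharing.
struct counters
{
    alignas(cache_line) std::atomic<long> produced{0};
    alignas(cache_line) std::atomic<long> consumed{0};
};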
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 25 07:03PM -0700 Fwiw, here is some crude Windows code I just coded up for a single linked list using a futex. It only supports push and flush operations for now. flush will wait if the stack is empty. I was to lazy to implement an ABA counter. However, it shows an interesting way to use a futex for a lock-free algorihtm. Can you get it to run? Thanks. __________________________________________________ // Futex Single Linked List by Chris M. Thomasson //___________________________________________________ #include <iostream> #include <thread> #include <vector> #include <functional> #include <cassert> #define WIN32_LEAN_AND_MEAN #include <Windows.h> #define CT_L2_ALIGNMENT 128 #define CT_ITERS 6666666 #define CT_NODES 42 #define CT_WAITBIT 0x1UL static LONG g_memory_allocations = 0; static LONG g_memory_deallocations = 0; static LONG g_futex_signals = 0; static LONG g_futex_waits = 0; struct ct_node { ct_node* m_next; ct_node() : m_next(NULL) { InterlockedAdd(&g_memory_allocations, 1); } ~ct_node() { InterlockedAdd(&g_memory_deallocations, 1); } }; #define CT_NODE_SET_WAITBIT(mp_ptr) ((ct_node*)(((ULONG_PTR)(mp_ptr)) | CT_WAITBIT)) #define CT_NODE_CHECK_WAITBIT(mp_ptr) (((ULONG_PTR)(mp_ptr)) & CT_WAITBIT) #define CT_NODE_CLEAR_WAITBIT(mp_ptr) ((ct_node*)(((ULONG_PTR)(mp_ptr)) & ~CT_WAITBIT)) void ct_node_flush(ct_node* node) { while (node) { ct_node* next = node->m_next; delete node; node = next; } } struct ct_futex_slist { ct_node* alignas(CT_L2_ALIGNMENT) m_head; ct_futex_slist() : m_head(nullptr) { } void push(ct_node* node) { ct_node* head = m_head; for (;;) { ct_node* xchg = CT_NODE_CLEAR_WAITBIT(head); node->m_next = xchg; ct_node* ret = (ct_node*)InterlockedCompareExchangePointer((PVOID*)&m_head, node, head); if (ret == head) { if (CT_NODE_CHECK_WAITBIT(ret)) { InterlockedAdd(&g_futex_signals, 1); WakeByAddressSingle(&m_head); } return; } head = ret; } } ct_node* flush() { ct_node* head_raw = (ct_node*)InterlockedExchangePointer((PVOID*)&m_head, NULL); ct_node* head = CT_NODE_CLEAR_WAITBIT(head_raw); if (! head) { for (;;) { head_raw = (ct_node*)InterlockedExchangePointer((PVOID*)&m_head, (ct_node*)CT_WAITBIT); head = CT_NODE_CLEAR_WAITBIT(head_raw); if (head) { break; } InterlockedAdd(&g_futex_waits, 1); ct_node* waitbit = (ct_node*)CT_WAITBIT; WaitOnAddress(&m_head, &waitbit, sizeof(PVOID), INFINITE); } } return head; } }; struct ct_shared { ct_futex_slist m_slist; ~ct_shared() { ct_node* head_raw = m_slist.m_head; if (CT_NODE_CHECK_WAITBIT(head_raw)) { std::cout << "\n\nWAITBIT LEAK!\n"; } ct_node_flush(head_raw); if (g_memory_allocations != g_memory_deallocations) { std::cout << "\n\nMEMORY LEAK!\n"; } std::cout << "\ng_memory_allocations = " << g_memory_allocations << "\n"; std::cout << "g_memory_deallocations = " << g_memory_deallocations << "\n"; std::cout << "g_futex_waits = " << g_futex_waits << "\n"; std::cout << "g_futex_signals = " << g_futex_signals << "\n"; } }; void ct_thread(ct_shared& shared) { for (unsigned long i = 0; i < CT_ITERS; ++i) { for (unsigned long n = 0; n < CT_NODES; ++n) { shared.m_slist.push(new ct_node()); } ct_node* node = shared.m_slist.flush(); ct_node_flush(node); } shared.m_slist.push(new ct_node()); } int main() { unsigned int threads_n = std::thread::hardware_concurrency(); std::vector<std::thread> threads(threads_n); std::cout << "Futex Single Linked List by Chris M. 
Thomasson\n\n"; std::cout << "Launching " << threads_n << " threads...\n"; std::cout.flush(); { ct_shared shared; for (unsigned long i = 0; i < threads_n; ++i) { threads[i] = std::thread(ct_thread, std::ref(shared)); } std::cout << "Processing...\n"; std::cout.flush(); for (unsigned long i = 0; i < threads_n; ++i) { threads[i].join(); } } std::cout << "\nCompleted!\n"; return 0; } __________________________________________________ Here is my output: __________________________________________________ Futex Single Linked List by Chris M. Thomasson Launching 4 threads... Processing... g_memory_allocations = 1119999892 g_memory_deallocations = 1119999892 g_futex_waits = 21965 g_futex_signals = 55630 Completed! __________________________________________________ |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 25 11:21PM -0700 On 5/25/2021 11:17 PM, Bonita Montero wrote: > haviour of the working-set. > Better use this: > https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size Yes, you are correct. std::hardware_constructive_interference_size is the way to go. Thanks for taking a look at my code to begin with Bonita! :^) |
| Bonita Montero <Bonita.Montero@gmail.com>: May 25 08:32AM +0200 > I can give you a generic waiting thing called an, oh wait. I want > to see if you can do it for yourself. I can encrypt my answer.... You're stupid. We're talking about whether a lock-free queue is possible without polling. And it is obviously not; if it wouldn't be, it wouldn't be lock-free. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 25 08:30AM +0200 >> an item in the queue the only opportunity that is left is polling. > You are missing things again. Think! Remember back when you said that > mutexes must use CAS? I never said that because I implemented a mutex with LOCK XADD a long time ago. And this has nothing to do with the fact that you don't understand that lock-free programming isn't possible without polling. Describe to me a lock-free algorithm that works without polling! |
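For illustration only (not Bonita's code), this is what a fetch_add-based mutex can look like; std::atomic::fetch_add typically compiles to LOCK XADD on x86:

#include <atomic>
#include <thread>

// Ticket lock: lock() grabs a ticket with fetch_add and spins until that
// ticket is being served; unlock() advances the serving counter.
class ticket_lock
{
    std::atomic<unsigned> next_{0};
    std::atomic<unsigned> serving_{0};
public:
    void lock()
    {
        unsigned ticket = next_.fetch_add(1, std::memory_order_relaxed);
        while (serving_.load(std::memory_order_acquire) != ticket)
            std::this_thread::yield();   // waiters still poll the counter
    }
    void unlock()
    {
        serving_.fetch_add(1, std::memory_order_release);
    }
};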
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 24 11:33PM -0700 On 5/24/2021 11:32 PM, Bonita Montero wrote: >> to see if you can do it for yourself. I can encrypt my answer.... > You're stupid. We're talking about whether a lock-free queue > is possible without polling. And it is obviously not Why poll, when we can check the predicate then wait? ; if it |
| Bonita Montero <Bonita.Montero@gmail.com>: May 25 08:36AM +0200 >> that lock-free programming isn't possible without polling. >> Describe me a lock-free algorithm that works without polling ! > http://fractallife247.com/test/hmac_cipher/ver_0_0_0_1?ct_hmac_cipher=27becd4634e98df52a89d06b2906ce8a7fdf24968475da63b3d57719e7eb01d4ab5f0e28d0a76c73883661af6abd0514412e774530b3af251b118a66f38c521f1bb4c5652c303411c9de7606f4690777a997ad6d65ac625c456673216ae073a067fd34e5f37010811231569824a22be1f5ab1ee525a30626b87986672f Have you ever been checked by a mental health doctor ? That has nothing to do with what we're talking about. |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: May 25 12:07AM -0700 On 5/25/2021 12:05 AM, Bonita Montero wrote: >>> Otherwise they woudln't be lock-free. >> Oh argh! Did you read where I mentioned predicates? > Yes, this predicates are polled with all lock-free algorithms. The predicate is polled in a condvar as well, yawn. Getting tired. Just try to think about it some more. I will get back to you tomorrow. I have some other work to do. Will but out Relacy. Its fun to use. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 25 12:53PM +0200 >> I've shown that fork()ing is much more slower than to create a thread. >> And even more a lot slower when pages are CoWd. > For the use cases of fork the time of creation is irrelevant. For example, if you have a webserver that fork()s for every request, the creation time is very relevant, as it is in most other cases. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 25 09:10AM +0200 > The predicate is polled in a condvar as well, yawn. ... A condvar isn't a lock-free structure. |
| MrSpook_m0gtwlu6_g@j_cyo9l9.edu: May 25 09:05AM On Mon, 24 May 2021 18:51:58 +0200 >means that plugins, extensions, tabs, etc., can crash without bringing >down the whole browser. It costs - especially in ram - but you get >benefits from it. Yup. Also some browsers - e.g. Firefox - started out using separate processes for separate tabs but switched over to multithreading - presumably to make porting to Windows easier - with the consequent drop in reliability. Now some have gone back to multi-process, which also has the benefit of making some kinds of JavaScript hacks difficult if not impossible if each tab is a separate process. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 25 12:54PM +0200 >> thread( func, params ... ).detach() and forget the thread. > What a pity I can't find some ascii art of someone with their head in their > hands. The above is a trivial statement. And you don't have to have any shared memory: you simply pass the parameters to the thread, and they're destructed when the thread ends. That's magnitudes more convenient than fork()ing and having shared memory. |
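A minimal sketch of the thread(func, params...).detach() pattern Bonita describes; handle_request and spawn are made-up names, and note that detached threads still have to finish before main() returns:

#include <cstdio>
#include <string>
#include <thread>
#include <utility>

static void handle_request(std::string payload, int id)
{
    std::printf("request %d: %zu bytes\n", id, payload.size());
}

void spawn(std::string payload, int id)
{
    // The arguments are moved/copied into storage owned by the new thread,
    // so nothing is shared with the caller; they are destroyed when the
    // thread function returns.
    std::thread(handle_request, std::move(payload), id).detach();
}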
| Bonita Montero <Bonita.Montero@gmail.com>: May 24 02:19PM +0200 > creating a thread. Creating a thread on my Linux Ryzen 7 1800X is about > 17.000 clock cycles and I bet fork()ing is magnitudes higher if you > include the costs of making copy-on-writes of every written page. I just wrote a little program:
#include <unistd.h>
#include <iostream>
#include <chrono>
#include <cstdlib>
#include <cctype>
#include <cstring>
#include <cstdint>

using namespace std;
using namespace chrono;

int main( int argc, char **argv )
{
    using sc_tp = time_point<steady_clock>;
    if( argc < 2 )
        return EXIT_FAILURE;
    uint64_t k = 1'000 * (unsigned)atoi( argv[1] );
    sc_tp start = steady_clock::now();
    char buf[0x10000];
    for( uint64_t n = 0; fork() == 0; ++n )
    {
        memset( buf, 0, sizeof buf );
        if( n == k )
        {
            double ns = (double)(int64_t)duration_cast<nanoseconds>( steady_clock::now() - start ).count() / (int64_t)k;
            cout << ns << endl;
            break;
        }
    }
}
On my Ryzen 7 1800X Linux PC it gives me a fork-cost of about 180.000 to 222.000 clock cycles per fork. Any questions? I'm doing a minimal write to a 64kB block (hopefully this isn't optimized away) to simulate some copy-on-write. As you can see, forking is magnitudes slower than creating a thread (17.000 cycles on the same machine). |
| Bonita Montero <Bonita.Montero@gmail.com>: May 31 09:33AM +0200 > If here really was the wrong place then I would not have been > able to get Kaz to come back to comp.theory by posting here. No matter what you say - your halting-problem ideas are always off-topic here. |