- recovering from std::bad_alloc in std::string reserve - 1 Update
- lambda-optimization - 8 Updates
- Would D ever stop if simulating halt decider H never stopped simulating it? - 3 Updates
- std::thread does not follow RAII principles - 3 Updates
- "STL: Amazing Speed Differences between std::vector and std::set (Observed with an UndoRedoAction)" - 1 Update
| "daniel...@gmail.com" <danielaparker@gmail.com>: May 30 03:15PM -0700 On Saturday, May 29, 2021 at 5:47:50 AM UTC-4, David Brown wrote: > The definition of intmax_t is a problem - it is a limitation for integer > types in C and C++. Hopefully eventually deprecate intmax_t. One proposal is to make intmax_t mean int64_t, and leave it at that. Have no requirement that integer types can't be larger. No more ABI problem. > type-generic macros in C and template functions in C++. From C90 there > was "abs" and "labs" - C99 could have skipped "llabs" and "imaxabs", and > similar functions. Yes, of course, and to_integer<T> and from_integer<T>, and others. Many libraries have to reinvent their own version of these things. > The gcc solution of __int128 covers most purposes without affecting > backwards compatibility. Hardly "most purposes", far from it. Without compiling with "-std=gnu++11", you don't even have std::numeric_limits<__int128>. The absence of standard support for int128_t makes genericity much harder. While other languages such as rust with better type support see rapid growth of open source libraries that cover all manner of data interchange standards, C++ is comparatively stagnant. Daniel |
| Bonita Montero <Bonita.Montero@gmail.com>: May 30 01:18PM +0200
I've just checked:

    #include <iostream>
    using namespace std;
    int main()
    {
        int i, j, k;
        auto f = [&]() -> int { return i + j + k; };
        cout << sizeof f << endl;
    }

Can anyone tell me whether the lambda has three pointers (24 bytes on 64-bit systems) instead of just one pointer to the stack frame, which could be an easy optimization?
| "Öö Tiib" <ootiib@hot.ee>: May 30 10:00AM -0700 On Sunday, 30 May 2021 at 14:18:20 UTC+3, Bonita Montero wrote: > Can anyone tell me whether the lambda has three pointers (24 bytes > on 64 bit systems) instead of just one pointer inside the stack-frame, > which could be an easy optimization ? C++ standard does not require any optimizations there so the question is about quality of implementation (QOI). QOI questions do not make sense without mentioning implementation name and version. I suspect most compilers are turning your program into equivalent of int main() { std::cout << 42 << std::endl; } The 42 there being unlikely but valid by standard. Even if you used it then most compilers would probably inline call of it and so the number would be again meaningless. |
| Bonita Montero <Bonita.Montero@gmail.com>: May 30 07:20PM +0200
> version.
> I suspect most compilers are turning your program into equivalent of
> int main() { std::cout << 42 << std::endl; }
I think you don't understand what I'm asking for.
> The 42 there being unlikely but valid by standard.
No, 24. And my question is why the compiler doesn't do the simple optimization of storing just a single pointer to the enclosing stack frame inside the lambda-object. That would result in fewer memory accesses and it would save registers.
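The transformation being asked about can be written by hand, which makes the size difference visible - a minimal sketch, where the Frame struct and its name are illustrative rather than anything a compiler actually emits:

    #include <iostream>

    int main()
    {
        // Group the locals so the lambda captures one reference to all
        // of them instead of three separate references:
        struct Frame { int i, j, k; } frame{1, 2, 3};
        auto f = [&frame]() -> int { return frame.i + frame.j + frame.k; };
        std::cout << sizeof f << std::endl;   // typically 8 on 64-bit
    }

The standard leaves it unspecified whether by-reference captures occupy data members at all, so a frame-pointer representation would be conforming; in practice compilers appear to settle on one pointer per captured entity, presumably for ABI stability, since sizeof makes the layout observable.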
| Real Troll <real.troll@trolls.com>: May 30 06:23PM +0100
On 30/05/2021 12:18, Bonita Montero wrote:
> Can anyone tell me whether the lambda has three pointers (24 bytes
> on 64 bit systems) instead of just one pointer inside the stack-frame,
> which could be an easy optimization ?
Yes, it is 24 on a 64-bit machine; compiled with VS2019 using:
> C:\Users\*******\Documents\cmdLine\cpp>cl /EHsc lambda01.cpp
> Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29915 for x64
> Copyright (C) Microsoft Corporation. All rights reserved.
German Manager helped by a German player won the Champions League for Chelsea (a UK team) in Portugal!!! How wonderful it is.
| Bonita Montero <Bonita.Montero@gmail.com>: May 30 07:34PM +0200
> Yes it is 24 on 64 bit machine; Compiled with VS2019 using
>> C:\Users\*******\Documents\cmdLine\cpp>cl /EHsc lambda01.cpp
It isn't different with -Ox, but it could have been.
| David Brown <david.brown@hesbynett.no>: May 30 08:31PM +0200
On 30/05/2021 13:18, Bonita Montero wrote:
> Can anyone tell me whether the lambda has three pointers (24 bytes
> on 64 bit systems) instead of just one pointer inside the stack-frame,
> which could be an easy optimization ?
The question doesn't really make sense. When optimising, a compiler will not generate anything for any of the variables or the lambda. When looking at this kind of thing, I like to use the online compiler at <https://godbolt.org> and look at the assembly. This is easier if you don't try to generate printed output:

    int foo()
    {
        int i, j, k;
        auto f = [&]() { return i + j + k; };
        return sizeof(f);
    }

gcc generates:

    foo():
        movl $24, %eax
        ret

So the compiler is trying to give you the size of storage it would need in general for a lambda that took three references. But since the optimised lambda is entirely removed, it is not the actual size of "f" in the optimised code. AFAIK the standard doesn't say anything about what size lambdas should be, or anything else about their types. But I would guess that compilers try to give consistent results for the sizeof of a lambda regardless of the optimisation level or details of the implementation.
| Bonita Montero <Bonita.Montero@gmail.com>: May 30 08:46PM +0200
> in general for a lambda that took three references. But since the
> optimised lambda is entirely removed, it is not the actual size of "f"
> in the optimised code.
The compiler could also give the size of an optimized lambda. Consider this:

    #include <iostream>
    #include <functional>
    using namespace std;
    int main()
    {
        int i = 123, j = 456, k = 789;
        auto f = [&]() -> int { return i + j + k; };
        function<int()> ff = f;
        function<int()> *volatile pFf = &ff;
        cout << sizeof f << " " << (*pFf)() << endl;
    }

I pack f into a function<> and then assign its address to a volatile pointer to prevent any optimizations on the call of the function-object. So according to what you suggest, the compiler would have the chance to optimize the three references - but it doesn't.
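An aside on why wrapping in function<> doesn't settle the question: std::function type-erases a copy of the closure, and whether a 24-byte closure fits its small-object buffer is implementation-specific. A sketch, with typical rather than guaranteed sizes in the comments:

    #include <functional>
    #include <iostream>

    int main()
    {
        int i = 1, j = 2, k = 3;
        auto f = [&]() -> int { return i + j + k; };
        std::function<int()> ff = f;   // stores a copy of the 24-byte closure
        // libstdc++ and MSVC typically report 32 and 64 here respectively;
        // a closure too large for the internal buffer is heap-allocated:
        std::cout << sizeof f << " " << sizeof ff << std::endl;
    }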
| "Öö Tiib" <ootiib@hot.ee>: May 30 12:26PM -0700 On Sunday, 30 May 2021 at 21:46:22 UTC+3, Bonita Montero wrote: > function<int()> *volatile pFf = &ff; > cout << sizeof f << " " << (*pFf)() << endl; > } Looks like horrible pile of garbage that still does nothing. > pointer to prevent any optimizations on calling the function-object. > So according to what you suggesst the compiler would have the chance > to optimize the three references - but it doesn't. If some optimization in some compiler of some feature is missing and so your resulting program is slow then you should write less garbage code that compiler can't optimize. Or you can take source code of compiler, implement the optimization you need and put up a pull request. |
| olcott <NoOne@NoWhere.com>: May 29 06:32PM -0500
I am cross-posting this to comp.lang.c and comp.lang.c++ because any C/C++ professional can correctly answer it, and the code is written in C.

    #include <stdint.h>
    #define u32 uint32_t

    int H(u32 P, u32 I);   // simulating partial halt decider (not shown)

    int Simulate(u32 P, u32 I)
    {
        ((void(*)(u32))P)(I);
        return 1;
    }

    int D(u32 P)
    {
        if ( H(P, P) )
            return 0;
        return 1;
    }

    int main()
    {
        H((u32)D, (u32)D);
    }

H is a simulating partial halt decider based on an x86 emulator. Its input is the machine address of a C function that has been cast to a 32-bit unsigned integer. H simulates its first parameter on the input of its second parameter. In the above case H would simulate D(D).
--
Copyright 2021 Pete Olcott
"Great spirits have always encountered violent opposition from mediocre minds." Einstein
| Bonita Montero <Bonita.Montero@gmail.com>: May 30 05:15AM +0200 STOP POSTING in comp.lang.c/c++. |
| red floyd <no.spam.here@its.invalid>: May 30 12:03PM -0700 On 5/29/2021 8:15 PM, Bonita Montero wrote: > STOP POSTING in comp.lang.c/c++. He's obviously not going to stop, so just killfile the idiot. |
| "Öö Tiib" <ootiib@hot.ee>: May 29 07:03PM -0700 On Saturday, 29 May 2021 at 20:15:56 UTC+3, Chris Vine wrote: > a cancellation by including a catch-all in your checking function which > logs that cancellation has begun and such other state as is available > to it to record, and then rethrows. Sorry for my bad English. I meant the thread to complete its work early or to cancel it. The flags are for that purpose. I check those in known places and so repeating same test the total count of such checks is same from run to run. > One happy outcome of using POSIX > functions is that the only exception-like thing they can emit is a > cancellation pseudo-exception. But perhaps you meant something else. So you suggest I can mock the POSIX functions to work like always but then throw sometimes something unusual for testing? It is plan but feels like quite lot of work compared to mocking the flag checking. > select() will signal it as ready for reading), and polled a flag on a > select timeout, but just cancelling it proved much easier and more > obvious. OK, but how it is easier and more obvious? Indicating with flags and letting the running work to decide itself where and how to complete early feels most obvious split of responsibilities. Otherwise the canceling thread has to know and monitor the work progress details of threads that it can potentially cancel. That feels fragile and risky. > I can recall using it to kill a thread waiting on > pthread_join() but I cannot now remember the reasons why. It can be the thread it was joining had gone insane and hung. I prefer to abort whole process or power-cycle devices on such cases if possible. That guarantees the programming errors are fixed quickest and so insane processes and trashy devices do least damage to customers. But I've met it more on Windows where some closed source garbage I'm forced to use does hang. |
| Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 30 11:48AM +0100
On Sat, 29 May 2021 19:03:08 -0700 (PDT)
> On Saturday, 29 May 2021 at 20:15:56 UTC+3, Chris Vine wrote:
> > On Sat, 29 May 2021 07:59:02 -0700 (PDT)
> > Öö Tiib <oot...@hot.ee> wrote:
[snip]
> So you suggest I can mock the POSIX functions to work like always but
> then throw sometimes something unusual for testing? It is plan but feels
> like quite lot of work compared to mocking the flag checking.
You want to know when and how many times a cancellation "request" (ie flag change) has been made in respect of a mocked version of a blocking function (that is, a function blocking on some event and/or a quit flag request), possibly without carrying out any cancellation/quitting? The "possibly without carrying out any cancellation/quitting" makes such mocking unfeasible with thread cancellation, since once cancellation has started you can catch and rethrow it (and instrument and count that) but you cannot stop it. For that you would have to mock the function which applies pthread_cancel instead.
> early feels most obvious split of responsibilities. Otherwise the
> canceling thread has to know and monitor the work progress details
> of threads that it can potentially cancel. That feels fragile and risky.
The cancelling thread doesn't need to know any of the work details of the thread which is accepting connections. With deferred cancellation the accepting thread is master of its own cancellation and (in the case I have in mind) allows cancellation only when applying accept(), so disallowing cancellation whenever accept() returns with a new connection. Once a new connection occurs, it completes the establishment of the connection and hands off the new connection socket for another thread to deal with in the normal way, during which time it is uncancellable. When it has completed the hand-off unhindered, it loops back to accept() and makes itself cancellable again, and so on. It would work the same way as having the accepting thread check a flag via a timeout and kill itself with an exception or in some other way, but with much less faff.
The simple scheme I have described works on the basis that you may want to terminate the accepting thread but not any thread(s) still dealing with previously established connections. That may not always be what you want, but since every thread is master of its own cancellation, you can arrange for the termination of other threads to occur in any way you want, and (at the points where cancellation is allowed) you can save work, clean up, log etc. in an appropriate catch-all block which rethrows when it has done the saving and clean-up.
If the thread(s) handling previously established connections are doing so asynchronously via an event loop, you probably wouldn't use cancellation at all: you would bring the relevant event loop(s) maintained by those threads to an end. (That wasn't the position in the case I dealt with, but I can easily imagine it could be.)
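A compressed sketch of that accept-loop pattern under POSIX deferred cancellation (the default cancel type) - error handling is elided and handle_connection is a hypothetical stand-in for the real hand-off:

    #include <pthread.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    void handle_connection(int fd) { close(fd); }  // hypothetical hand-off

    void* acceptor(void* arg)
    {
        int listen_fd = *static_cast<int*>(arg);
        for (;;) {
            pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, nullptr);
            int conn = accept(listen_fd, nullptr, nullptr);  // cancellation point
            pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, nullptr);
            if (conn >= 0)
                handle_connection(conn);  // uncancellable while handing off
        }
        return nullptr;
    }

    int main()
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};               // port 0: kernel picks one
        addr.sin_family = AF_INET;
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
        listen(fd, 8);
        pthread_t t;
        pthread_create(&t, nullptr, acceptor, &fd);
        pthread_cancel(t);                // acts only when blocked in accept()
        pthread_join(t, nullptr);
        close(fd);
    }

The cancelling thread only needs the pthread_t; the acceptor alone decides where cancellation may land.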
| "Öö Tiib" <ootiib@hot.ee>: May 30 09:35AM -0700 On Sunday, 30 May 2021 at 13:48:55 UTC+3, Chris Vine wrote: > flag change) has been made in respect of a mocked version of a blocking > function (that is, a function blocking on some event and/or a quit flag > request), possibly without carrying out any cancellation/quitting? Maybe. Idiomatic case is that we want a work to end early as it has become obsolete. Other idiomatic case is that we want thread to stop early as whatever work it can possibly do has became obsolete. With flags we can indicate that such or other case is now and then have ending the work or stopping the thread early. Also we want to test everything. By mocking flag checking functions we can test it with granularity of one check. If only sole work or whole thread is stopped early does not matter to the question as we are testing outcome of such abrupt stop. If it didn't stop then we have defect. > has started you can catch and rethrow it (and instrument and count that) > but you cannot stop it. For that you would have to mock the function > which applies pthread_cancel instead. So we have to have both flags and cancellation in place when same thread is reused for different works? Or maybe flags and then if flags do not work then cancellation as wheelchair to defective program? > would work the same way as having the accepting thread checking a flag > via a timeout and killing itself with an exception or in some other > way, but with much less faff. Sounds like some kind of master of itself suicide pattern. It is very interesting as I've never needed that thing. What happens when it kills itself? Does it signal someone to join it? > want, and (at the points where cancellation is allowed) you can save > work, clean-up, log etc. in an appropriate catch-all block which > rethrows when it has done the saving and clean-up. Ok so I can have the dead corpse of responsible removed. How to try to figure what happened? > would bring the relevant event loop(s) maintained by those threads to > an end. (That wasn't the position in the case I dealt with, but I can > easily imagine it could be.) Threads are doing anything that can take time, anything that can take time may become obsolete before it is completed. It can be let to run to end and then obsolete product discarded or it may be stopped early as performance optimization. Network communication is usually behind bottle neck of available network cards and as our processors are typically tremendously quicker just one thread can easily handle all communication going through one network card. |
| Juha Nieminen <nospam@thanks.invalid>: May 30 06:40AM
>> each other) traversing set is awfully slow.
> A vector isn't cache-friendly either when you do a binary search.
> Random-access memory-accesses are always slow.
He was talking about traversing the set, not searching it. In other words, for(auto& element: theSet). (This, of course, assumes that the amount of data is so large that it won't fit entirely even in the L3 cache, or that we are traversing the set for the first time since all of its contents were flushed from the caches.)
Of course even with std::vector it depends on the size of the element. Traversing a (very large) std::vector linearly from beginning to end isn't magically going to be very fast either, if each element is large enough. And "large enough" is actually quite small. If I remember correctly, cache line sizes are typically 64 bytes or so. This means that if the vector element type is an object of size 64 bytes or more, and you are accessing just one member variable of each object, then you'll get no benefit from linear traversal compared to random access (in the case that the contents of the vector are not already in the caches).
You only get a speed advantage for (very large) vectors whose element size is very small, like 4 or 8 bytes. For example, if the vector represents a bitmap image, with each "pixel" element taking e.g. 4 bytes, then a linear traversal will be quite efficient (assuming none of the vector contents were in any cache to begin with, you'll get an extremely heavy cache miss only every 16 pixels).
Of course almost none of this applies if the vector or set is small enough to fit in the L1 cache and has already been loaded in there in its entirety. Then almost none of this matters. It starts mattering a bit more if the vector is too large for L1 but small enough for the L2 cache, and again if it's too large for L2 but small enough for the L3 cache. Modern CPUs tend to have a quite large L3 cache, which mitigates the cost of cache misses in many cases. For example my CPU has an L3 cache of 12 MB. Thus if I need to, for example, repeatedly perform some operation on an image that fits comfortably within those 12 MB, it will be very fast. It's only when the dataset is much larger than L3 that cache locality really starts having a very pronounced effect (when performing operations repeatedly on the entire dataset).
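A small illustration of the element-size point above; the 64-byte figure is the typical cache-line size mentioned in the post, and Big and sum_keys are illustrative names:

    #include <vector>

    struct Big {
        int key;
        char pad[60];   // sizeof(Big) == 64 on typical ABIs: one cache line
    };

    long long sum_keys(const std::vector<Big>& v)
    {
        long long s = 0;
        for (const Big& b : v)
            s += b.key;   // reads one int but pulls in a full line per element
        return s;
    }

    int main()
    {
        std::vector<Big> v(1 << 20);   // ~64 MB: far larger than a typical L3
        return static_cast<int>(sum_keys(v) & 0xff);
    }

With a 4-byte element instead, the same loop would take a cache miss only every 16 elements, which is the linear-traversal advantage described above.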