- The unbelievable slowness of larger int types (in my experiment) - 9 Updates
- We've won! - 13 Updates
- Hateful abusive posts by Leigh Johnston, round #2 - 3 Updates
Juha Nieminen <nospam@thanks.invalid>: Nov 16 12:22PM

> This took 2.5 seconds. However, 10 million is small enough that you can
> replace 1ull by 1 with no problems. This cut the time from 2.5 seconds
> to 0.98 seconds, and I did this experiment many times.

I haven't checked, but when creating such benchmarks you should *always* check that the compiler isn't actually optimizing the code away because the result is unused. It's a very typical mistake to make.

Just recently I had a conversation with someone in another forum who thought that his O(n^2) linked list traversal was faster than traversing a std::list with iterators... only to find out that the compiler was optimizing his code away completely because it noticed that the end result was not being used for anything.

(Not saying you have made such a mistake. Just pointing out that you should make sure that's not what's happening.)

One way to check what the compiler is doing is to make it output asm and examine the resulting code. (Of course this requires a modicum of understanding of asm.) At the very least you should always do something with the end result, such as printing it, even if you aren't interested in what it actually is. Even then, the compiler can still be overly smart and optimize away the very code you are trying to benchmark. |
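As a minimal illustration of the point about using the result, here is a sketch of a timing harness along those lines. The naive popcount loop and the 10 million iteration count stand in for whatever is actually being measured; the names are mine, not from the original post.

    #include <chrono>
    #include <cstdint>
    #include <iostream>

    // Placeholder for the code under test.
    static int popcount_naive(std::uint64_t x)
    {
        int count = 0;
        for (int place = 0; place < 64; ++place)
            if (x & (1ull << place))
                ++count;
        return count;
    }

    int main()
    {
        using clock = std::chrono::steady_clock;

        std::uint64_t sink = 0;              // accumulate results so they are "used"
        const auto start = clock::now();
        for (std::uint64_t i = 0; i < 10000000; ++i)
            sink += popcount_naive(i);
        const auto stop = clock::now();

        // Printing the sink makes the result observable behaviour, so the
        // compiler can no longer remove the whole loop as dead code.
        std::cout << "sink = " << sink << '\n';
        std::cout << std::chrono::duration<double>(stop - start).count()
                  << " seconds\n";
    }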
bitrex <user@example.net>: Nov 16 01:09AM -0500

On 11/15/2018 03:35 PM, Paul wrote:
> ++count;
> return count;
> }

Why not use Compiler Explorer: <https://godbolt.org/>? Write your code there and you can see precisely what it generates, for many compiler versions and many architectures. |
Paul <pepstein5@gmail.com>: Nov 17 01:25PM -0800

On Saturday, November 17, 2018 at 9:17:31 PM UTC, Ben Bacarisse wrote:
> return count;
> It may of course be slower, but when you have only a few bits set it may
> pay off.

I think this trick, popularised (or maybe even invented) by Kernighan, is known to be much faster in general than naive bit-by-bit methods. It has a better big O: your method is O(number of bits set), which is better than O(log n). It doesn't need "only a few bits set" to pay off; it pays off in general. It might be worse in cases where almost all the bits are set -- I haven't tested.

Paul |
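For reference, a small self-contained sketch putting the two loops from this thread side by side (the function names are mine): the x &= x - 1 version iterates once per set bit, while the shift version iterates once per bit position up to the highest set bit.

    #include <cassert>

    // Iterates once per set bit (the x &= x - 1 trick from Ben's post).
    int popcount_kernighan(unsigned x)
    {
        int count = 0;
        for (; x; ++count)
            x &= x - 1;       // clears the least significant set bit
        return count;
    }

    // Iterates once per bit position up to the highest set bit.
    int popcount_shift(unsigned x)
    {
        int count = 0;
        while (x) {
            count += x & 1;
            x >>= 1;
        }
        return count;
    }

    int main()
    {
        assert(popcount_kernighan(0x8001u) == 2);  // 2 loop iterations
        assert(popcount_shift(0x8001u) == 2);      // 16 loop iterations
    }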
Paul <pepstein5@gmail.com>: Nov 17 01:20PM -0800

> return count;
> }
> FWIW, GCC generates a rather shorter loop for that too.

Your code is significantly faster than mine and avoids the problematic comparison in my code. The reason I coded it the way I did is that I was looking for the most direct approach, asking "What's happening in the kth binary place?", which led me to introduce 1 << place. x /= 2; seems to execute faster than x >>= 1; so maybe a problem with my code is that bit shifting isn't as fast as the alternatives. I'm not sure why my code is so much slower than yours -- it seems to take about 1.5 times as long to execute.

Paul |
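Following bitrex's Compiler Explorer suggestion, here is a minimal pair one could paste into <https://godbolt.org/> (with, say, -O2) to see what the compiler actually emits for the two forms. For unsigned operands the two are typically compiled to the same shift instruction, so any measured difference likely comes from elsewhere in the surrounding loop.

    // Paste into Compiler Explorer and compare the generated assembly.

    unsigned half_by_division(unsigned x)
    {
        return x / 2;     // division by a power of two
    }

    unsigned half_by_shift(unsigned x)
    {
        return x >> 1;    // explicit right shift
    }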
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 17 09:17PM

>>> On Thu, 15 Nov 2018 12:35:46 -0800 (PST), Paul <pepstein5@gmail.com>
<snip>
>>> > ++count;
>>> > return count;
>>> > }
<snip>
> return count;
> }
> FWIW, GCC generates a rather shorter loop for that too.

There's a neat way to clear the least significant set bit:

    int count = 0;
    for (; x; count++)
        x &= x - 1;
    return count;

It may of course be slower, but when you have only a few bits set it may pay off.

--
Ben. |
Robert Wessel <robertwessel2@yahoo.com>: Nov 17 09:38AM -0600

On Fri, 16 Nov 2018 00:15:53 -0800 (PST), Paul <pepstein5@gmail.com> wrote:

>I'm going to attempt to use these instructions to fix the issue:
>https://dev.my-gate.net/2014/09/15/how-to-use-mingw-w64-with-clion/
>Please let me know if you have anything else to add.

If you want a short, bit-by-bit algorithm, why not just do the more conventional basic approach of repeatedly shifting x right? Note that I've changed the type to unsigned. Not tested, etc.:

    int popcnt(unsigned x)
    {
        int count = 0;
        while (x) {
            count += x & 1;
            x /= 2;
        }
        return count;
    }

FWIW, GCC generates a rather shorter loop for that too. |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 17 04:15PM On Thu, 2018-11-15, Paul wrote: > This explains why it wasn't all optimized away. > I changed to -O3 optimization. I'm not sure whether this is best > or what you would advise (?). With gcc I tend to use -O2 or -Os. Sometimes -O3, if I can show that it generates measurably faster code. But the main thing is enabling the optimizer at all. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
Paul <pepstein5@gmail.com>: Nov 16 12:17AM -0800 On Friday, November 16, 2018 at 6:10:00 AM UTC, bitrex wrote: > <https://godbolt.org/> > and write your code there and you can see precisely what it generates > for many compiler versions and many architectures Thanks, good tip. Paul |
Paul <pepstein5@gmail.com>: Nov 16 12:15AM -0800 > performance difference on most processors. Are you by any chance > building 32-bit code? That will show a significant slowdown when > using the 64-bit type. I'm sure that building 32-bit code is the problem. Thanks for pointing this out. I'm using gcc with CLion. I'm going to attempt to use these instructions to fix the issue: https://dev.my-gate.net/2014/09/15/how-to-use-mingw-w64-with-clion/ Please let me know if you have anything else to add. Paul |
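If it is unclear whether a given toolchain setup is producing 32-bit or 64-bit code, one quick check (a trivial sketch, nothing more) is to print the pointer size from the same build configuration used for the benchmark:

    #include <iostream>

    int main()
    {
        // On common platforms: 4 bytes for a 32-bit target, 8 bytes for a 64-bit target.
        std::cout << "pointer size: " << sizeof(void*) << " bytes\n";
        std::cout << "unsigned long long size: " << sizeof(unsigned long long) << " bytes\n";
    }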
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 23 02:51AM -0500

Chris M. Thomasson wrote:
> Well, there was a problem with a GCC optimization that broke a POSIX mutex. It
> was an old post on comp.programming.threads:
> https://groups.google.com/d/topic/comp.programming.threads/Y_Y2DZOWErM/discussion

I had never read that discussion before and got interested by your mention of it. I found the claims of people who believed that acquiring a POSIX mutex is somehow supposed to protect some "associated" variable highly dubious. Just in case, I double-checked and could not find any such guarantee in my copy of the POSIX standard (The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition).

AFAIK (and I have been using POSIX synchronization primitives for decades), the only thing that successful locking of a POSIX mutex by a thread guarantees is that no other thread has or will have that mutex locked until the first thread releases it. How to use this guarantee is entirely up to the programmer.

In particular, I believe the OP's code was buggy if the intention was to maintain a total count of successful locks by all threads in the `acquires_count' variable, exactly because the programmer did not take responsibility for memory access sequencing. Respectively, IMHO the code optimization by GCC was absolutely legal. |
David Brown <david.brown@hesbynett.no>: Nov 23 10:02AM +0100

On 23/11/18 08:51, Pavel wrote:
> variable, exactly because the programmer did not take responsibility for memory
> access sequencing. Respectively, IMHO the code optimization by GCC was
> absolutely legal.

I agree. C has sequence points for /logical/ ordering of operations according to the abstract machine, but these do not need to be followed in the actual generated code. Only observable behaviour needs to follow a specific order (at least until 5.1.2.4 "Multi-threaded execution" was introduced in C11).

So with the "trylock" function as given, the compiler can't rely on a value of "acquires_count" read before the call to "pthread_mutex_trylock", because that call may change the global variable. However, after the call returns, the compiler knows it has full control of the system, and that it can do as it likes with any non-volatile variables here. It is free to read acquires_count before checking res, and it is free to write it as it wants. It can read it and write it as many times as it likes, with whatever values, regardless of "res", as long as the last value written by the end of the function matches the value expected by the abstract machine.

    int trylock()
    {
        int res;
        res = pthread_mutex_trylock(&mutex);
        if (res == 0)
            ++acquires_count;
        return res;
    }

The answer, of course, is to use "volatile" (or now "atomic") accesses on acquires_count. Alternatively, OS-specific or compiler-specific equivalents can be used, like __sync_fetch_and_add, memory barriers, etc. |
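A minimal sketch of the atomic-counter fix David mentions, written here with C++ <mutex> and <atomic> rather than the pthread types from the original post. The names mirror the snippet above, but this is only an illustration, not the code from the old thread.

    #include <atomic>
    #include <mutex>

    static std::mutex mutex;
    static std::atomic<int> acquires_count{0};

    int trylock()
    {
        if (mutex.try_lock()) {
            // A single atomic read-modify-write: the compiler cannot tear it,
            // drop it, or invent extra writes the way it could with a plain int.
            acquires_count.fetch_add(1, std::memory_order_relaxed);
            return 0;   // caller now owns the lock and must unlock() later
        }
        return 1;       // non-zero on failure, loosely mirroring pthread_mutex_trylock
    }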
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 23 01:23AM -0800 On 11/23/2018 1:02 AM, David Brown wrote: >>>>>>>>> On 2018-11-17, Vir Campestris <vir.campestris@invalid.invalid> wrote: >>>>>>>>>> On 16/11/2018 22:09, Melzzzzz wrote: >>>>>>>>>>> On 2018-11-16, Vir Campestris <vir.campestris@invalid.invalid> wrote: [...] > The answer, of course, is to use "volatile" (or now "atomic") accesses > on acquires_count. Alternatively, OS-specific or compiler specific > equivalents can be used, like __sync_fetch_and_add, memory barriers, etc. Well, wrt C11 and mtx_trylock and a return value of thrd_success, acquires_count does not need any special decorations wrt the standard. Period. Imvho, POSIX should strive for the same protections... |
David Brown <david.brown@hesbynett.no>: Nov 23 11:14AM +0100

On 23/11/18 10:23, Chris M. Thomasson wrote:
>> equivalents can be used, like __sync_fetch_and_add, memory barriers, etc.
> Well, wrt C11 and mtx_trylock and a return value of thrd_success,
> acquires_count does not need any special decorations wrt the standard.

Yes, it does need special consideration. A mutex lock is not a general barrier - it is an acquire operation that synchronises with other mutex operations on the same mutex. That is /all/. "acquires_count" should be atomic, and use atomic operations. Without atomic, or volatile, the compiler can do exactly the same sort of optimisations.

In fact, using C11 mtx_trylock() allows even more optimisations than pthread_mutex_trylock() would. The compiler knows exactly what mtx_trylock() does, since it is in the standard library - it could use that knowledge to see that the call could not possibly read or write acquires_count, and then hoist the reading of acquires_count above the call to mtx_trylock().

> Period. Imvho, POSIX should strive for the same protections...

Well, since you are (AFAIUI) wrong about the guarantees you get from C11 mutex functions, this point is moot. But if you /did/ get extra guarantees from mtx_trylock(), it would be due to special features of those functions in the standard, and a general normal function unknown to the compiler cannot get the same features. The nearest you could get would be an atomic_thread_fence() call in the function, but that only affects atomic operations. |
scott@slp53.sl.home (Scott Lurndal): Nov 23 03:50PM

>There's one thing you can't do in C with the same performance like with
>C++: error handling. RAII with table-driven exception-handling has zero
>overhead for the performance-relevant case that no exception is thrown.

For some value of zero greater than zero. There's always overhead, whether it's instruction cache footprint, more instructions executed, or poor locality of reference. You can't get more efficient than longjmp(). |
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 17 03:32AM -0800 On Saturday, November 17, 2018 at 3:35:59 AM UTC-5, Juha Nieminen wrote: > I suppose something like it could be possible, but I'm not sure > how. Especially if it has to have zero overhead when exceptions > are not thrown. You have to maintain a separate manual stack that runs in parallel to the machine stack, one which holds the current winding state at each point in code. It allows structured access for recovery at any point, including a new return to x; keyword. It allows for nested exceptions / immediate or local exceptions, as well as deep parent exceptions, simply. -- Rick C. Hodgin |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 16 08:00PM -0800 On 11/16/2018 1:48 PM, Vir Campestris wrote: > That's odd. > There shouldn't be anything you can do in C++ you can't do in C - albeit > with more pain. Hardcore nests of pain. Fwiw, sometimes I use a very simplistic method of interfaces in C: https://pastebin.com/raw/f52a443b1 It is very minimalist, but it works for what I need it to work for. :) |
Juha Nieminen <nospam@thanks.invalid>: Nov 17 08:35AM > There shouldn't be anything you can do in C++ you can't do in C - albeit > with more pain. I'm not exactly sure how you would implement exceptions in C (which, need I remind, needs to take RAII into account when unwinding the stack). I suppose something like it could be possible, but I'm not sure how. Especially if it has to have zero overhead when exceptions are not thrown. |
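Along the lines of the parallel "winding" stack Rick describes above, here is one very rough, entirely hypothetical sketch of exception-like unwinding in plain C-style code: each scope registers a cleanup record, and an error path runs the pending cleanups before longjmp-ing back to a recovery point. The record layout, the helper names, and "data.txt" are all made up for illustration; this is not Rick's design, and it plainly does not meet the zero-overhead requirement Juha mentions, since the bookkeeping costs something even when nothing is thrown.

    #include <csetjmp>
    #include <cstdio>

    // A manual "winding" stack of cleanup records, one per protected resource.
    struct unwind_record {
        void (*cleanup)(void *);   // how to release the resource
        void *resource;            // what to release
        unwind_record *prev;       // link to the enclosing scope's record
    };

    static unwind_record *unwind_top = nullptr;
    static std::jmp_buf recovery_point;

    static void push_unwind(unwind_record *r) { r->prev = unwind_top; unwind_top = r; }
    static void pop_unwind()                  { unwind_top = unwind_top->prev; }

    // "Throw": run all pending cleanups, then jump back to the recovery point.
    static void raise_error()
    {
        for (unwind_record *r = unwind_top; r; r = r->prev)
            r->cleanup(r->resource);
        unwind_top = nullptr;
        std::longjmp(recovery_point, 1);
    }

    static void close_file(void *f) { std::fclose(static_cast<std::FILE *>(f)); }

    static void work()
    {
        std::FILE *f = std::fopen("data.txt", "r");
        if (!f)
            raise_error();
        unwind_record rec{close_file, f, nullptr};
        push_unwind(&rec);

        raise_error();             // simulate a failure deep in the call chain

        pop_unwind();              // normal exit path (not reached in this demo)
        std::fclose(f);
    }

    int main()
    {
        if (setjmp(recovery_point) == 0)
            work();
        else
            std::puts("recovered after unwinding");
    }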
Melzzzzz <Melzzzzz@zzzzz.com>: Nov 16 10:09PM

> That's odd.
> There shouldn't be anything you can do in C++ you can't do in C - albeit
> with more pain.

How about implementing an efficient qsort function? C++ beats C there, and this is a std function...

--
press any key to continue or any other to quit... |
Vir Campestris <vir.campestris@invalid.invalid>: Nov 16 09:48PM On 15/11/2018 22:07, Stefan Ram wrote: > AFAIK, now there C++ is reported to > be faster than C! Finally after all those years and > all those efforts by many contributors, C++ has won! That's odd. There shouldn't be anything you can do in C++ you can't do in C - albeit with more pain. Andy |
scott@slp53.sl.home (Scott Lurndal): Nov 16 09:51PM >That's odd. >There shouldn't be anything you can do in C++ you can't do in C - albeit >with more pain. And "won"? It's not a contest. |
Vir Campestris <vir.campestris@invalid.invalid>: Nov 17 05:33PM On 16/11/2018 22:09, Melzzzzz wrote: >> with more pain. > How about implenting efficient qsort function? C++ beats C there and > this is std function... An efficient _generic_ qsort function should be much _easier_ in C++. Templates, etc. But if you know what you're sorting and are willing to put that into the sort code - I don't know why C++ should be faster. Andy |
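The gap Melzzzzz reports is commonly attributed to the comparator: qsort calls an opaque function through a pointer for every comparison, while std::sort is a template whose comparator can be inlined into the instantiated sort. A short sketch of the two call styles, with the timing itself left out:

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // C style: the comparator is an opaque function pointer, called per comparison.
    static int compare_ints(const void *a, const void *b)
    {
        const int x = *static_cast<const int *>(a);
        const int y = *static_cast<const int *>(b);
        return (x > y) - (x < y);
    }

    void sort_c_style(std::vector<int> &v)
    {
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    }

    // C++ style: the comparator is part of the instantiated template and can be inlined.
    void sort_cpp_style(std::vector<int> &v)
    {
        std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    }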
Melzzzzz <Melzzzzz@zzzzz.com>: Nov 17 05:42PM

> An efficient _generic_ qsort function should be much _easier_ in C++.
> Templates, etc. But if you know what you're sorting and are willing to
> put that into the sort code - I don't know why C++ should be faster.

Well, qsort in the C library is much slower than C++ std::sort...

--
press any key to continue or any other to quit... |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 15 08:57PM -0800 On 11/15/2018 3:07 AM, Rick C. Hodgin wrote: > Sin was here (on Earth, by man). > God had to become one of us (a man) ... here ... where we are, to > save us from sin where sin was, where we were condemned. I am wondering how many planets God has visited wrt cloning itself into the sentient intelligent inhabitants that He created anyway. God has experienced being a Human on Earth within the physical realm. Did God do the same thing for other planets in the creation? [...] |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Nov 16 04:42PM On 16/11/2018 16:33, Rick C. Hodgin wrote: [snip tl;dr] Anti-gravity shoes, mate. /Flibble -- "You won't burn in hell. But be nice anyway." – Ricky Gervais "I see Atheists are fighting and killing each other again, over who doesn't believe in any God the most. Oh, no..wait.. that never happens." – Ricky Gervais "Suppose it's all true, and you walk up to the pearly gates, and are confronted by God," Bryne asked on his show The Meaning of Life. "What will Stephen Fry say to him, her, or it?" "I'd say, bone cancer in children? What's that about?" Fry replied. "How dare you? How dare you create a world to which there is such misery that is not our fault. It's not right, it's utterly, utterly evil." "Why should I respect a capricious, mean-minded, stupid God who creates a world that is so full of injustice and pain. That's what I would say." |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 16 05:55PM -0800 On 11/16/2018 12:04 PM, Mr Flibble wrote: >>> [snip tl;dr] ... > [snip] > Anti-gravity shoes, mate. Moon boots: https://youtu.be/ELBy5stH3b8?t=93 Still, it really does sound like Jesus is a clone of a higher being wrt the Bible. It also seems like Jesus and God share the same soul. |