- We've won! - 4 Updates
- The unbelievable slowness of larger int types (in my experiment) - 8 Updates
- A non-simultaneity of evaluation - 1 Update
- destructors explained - 3 Updates
- Hateful abusive posts by Leigh Johnston, round #2 - 2 Updates
David Brown <david.brown@hesbynett.no>: Nov 18 02:21PM +0100 On 17/11/2018 18:42, Melzzzzz wrote: >> Templates, etc. But if you know what you're sorting and are willing to >> put that into the sort code - I don't know why C++ should be faster. > Well, qsort in clib is much slower then C++ sort... That is for two reasons. One is that the standard C library qsort function is generic - as Vir said, /generic/ functions can be more efficient in C++ (at least while you want things to be clear, maintainable, etc.), but a specialised sort for a specific type will be as fast in C as in C++. The other issue is that the actual algorithm for sorting is not specified - it is quite possible that the C++ library you tested has a more efficient algorithm than the C library you tested. |
Christian Gollwitzer <auriocus@gmx.de>: Nov 18 07:29PM +0100 Am 18.11.18 um 14:21 schrieb David Brown: > function is generic - as Vir said, /generic/ functions can be more > efficient in C++ (at least while you want things to be clear, > maintainable, etc.), I'm not even sure that this holds for a sort function using a callback. The main difference between the C and the C++ version is that "qsort" is defined in a separately compiled file, while std::sort is defined in the header file and compiled simultaneously. If you define qsort like inline void qsort(void* base, size_t num, size_t size, int (*compar)(const void*,const void*)) { ... code... } *inside* a header file, and you call it with a fixed callback function, then the compiler can inline the comparator in the same manner it inlines it in C++ and I see no reason why it should be slower. Christian |
David Brown <david.brown@hesbynett.no>: Nov 18 07:40PM +0100 On 18/11/2018 19:29, Christian Gollwitzer wrote: > *inside* a header file, and you call it with a fixed callback function, > then the compiler can inline the comparator in the same manner it > inlines it in C++ and I see no reason why it should be slower. Yes, you could do that (with a compiler that is smart enough to inline functions called through pointers). But then you would not be using the qsort from the C library. So this is all proving the same point - you /can/ get the same performance from C as from C++, but it is harder to do so when you are talking about generic functions. |
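[As an aside, the point being made in the posts above can be sketched in a few lines. This is an illustrative example, not code from the thread: one copy of the data goes through the C library's qsort, whose comparator is reached through a function pointer that a separately compiled qsort generally cannot inline, and another copy goes through the std::sort template, which is instantiated at the call site so the int comparison is typically inlined. The function names are made up for the example.]

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator for qsort: called through a function pointer, which a
// separately compiled C-library qsort generally cannot inline.
static int cmp_int(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Sort one copy with the generic C qsort and one with the std::sort
// template; return whether the two results agree.
bool sortsAgree(std::vector<int> v) {
    std::vector<int> a = v, b = v;
    std::qsort(a.data(), a.size(), sizeof(int), cmp_int);
    std::sort(b.begin(), b.end());  // comparison typically inlined here
    return a == b;
}
```

[Both produce the same ordering, of course; the performance difference discussed above comes entirely from the indirect call versus the inlined comparison.]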
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 18 03:23PM -0800 On Saturday, November 17, 2018 at 12:42:21 PM UTC-5, Melzzzzz wrote: > Well, qsort in clib is much slower then C++ sort... It's much slower still in Hungarian folk dance: Quick-sort with Hungarian (Küküllőmenti legényes) folk dance https://www.youtube.com/watch?v=ywWBy6J5gz8 -- Rick C. Hodgin |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 18 02:35AM > is known to be much faster in general than naive bit by bit methods. > It's a better big O. Your method is O(number of bits set) which is > better than O(log n). You can't really say one is asymptotically better than the other unless you start to talk about the distribution of data values, and most people don't want to go there when using O(...) notation. When not otherwise stated O(...) is usually used to denote the worst-case performance, and here both algorithms are O(log n). And if you switch to "average case" from "worst case", for uniformly distributed data, O(number of bits set) = O((log n)/2) = O(log n). You'd need some non-uniform assumption about the values to be able to say that O(number of bits set) is better than O(log n). -- Ben. |
Paul <pepstein5@gmail.com>: Nov 18 01:18AM -0800 On Sunday, November 18, 2018 at 2:36:04 AM UTC, Ben Bacarisse wrote: > distributed data, O(number of bits set) = O((log n)/2) = O(log n). > You'd need some non-uniform assumption about the values to be able to > say that O(number of bits set) is better than O(log n). Ben, Thanks. Some very valid and good points here. For clarity, the two methods being discussed on these last few posts are posted at the end of this thread. The fact of the matter, which you will see if you test them, is that (with the possible exception of some rare cases) the first implementation is much faster. Your initial post on the first method doesn't acknowledge that. Try it and see if you agree with me. Naively, you'd expect it to take about half the time in general (because with random numbers you'd expect roughly half the bits to be set to 1). I just tried testing the two with numbers up to 10 million. The x &= x-1 trick reduced the time by 40%. The fact that this is not quite 50% suggests that, if nearly all the original bits are set to 1, iterating x by x/=2 is narrowly faster. Obviously, these thoughts are very tentative because they depend on compiler settings, systematic trials etc. Paul

int popcnt(unsigned x)
{
    int count = 0;
    for (; x; count++)
        x &= x - 1;
    return count;
}

and

int popcnt(unsigned x)
{
    int count = 0;
    while (x) {
        count += x & 1;
        x /= 2;
    }
    return count;
} |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 18 12:41PM > (with the possible exception of some rare cases) the first implementation > is much faster. Your initial post on the first method doesn't > acknowledge that. Yes, I was way too vague. It's going to take a bad compiler to make it slower than the other method given here. Other constant-time methods may beat it depending on the architecture, compiler etc. > expect roughly half the bits to be set to 1). I just tried testing > the two with numbers up to 10 million. The x &= x-1 trick reduced the > time by 40%. I got a slightly better reduction in time on uniform data. On worst case data they take about the same time (as you would expect). But on my machine __builtin_popcount beats all other methods hands-down by at least a factor of 5. By the way, the while (x) x /= 2 loop does not test all bits either but it does test about twice the number that the other method tests. (I've not done the exact computation. It would be a neat exercise for a maths class.) If I get time, I may try to time these against a table look-up method. It's a long time since I've tried that. -- Ben. |
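[For reference, the table look-up method Ben mentions above can be sketched as below. This is a hypothetical helper, not code from the thread: a 256-entry table of per-byte bit counts is built once (using the same x &= x - 1 trick), and a 32-bit popcount then becomes four table lookups and three additions.]

```cpp
#include <array>
#include <cstdint>

// Popcount via a byte-indexed lookup table, built once on first call.
int tablePopcount(std::uint32_t x) {
    static const std::array<std::uint8_t, 256> table = [] {
        std::array<std::uint8_t, 256> t{};
        for (int i = 0; i < 256; ++i) {
            int c = 0;
            for (int v = i; v; v &= v - 1)  // Kernighan's trick, per byte
                ++c;
            t[i] = static_cast<std::uint8_t>(c);
        }
        return t;
    }();
    // Sum the counts of the four bytes of the word.
    return table[x & 0xFF] + table[(x >> 8) & 0xFF]
         + table[(x >> 16) & 0xFF] + table[(x >> 24) & 0xFF];
}
```

[Whether this beats the loop methods depends on cache behaviour, which is presumably why Ben wants to time it rather than guess.]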
Paul <pepstein5@gmail.com>: Nov 18 05:12AM -0800 On Sunday, November 18, 2018 at 12:41:25 PM UTC, Ben Bacarisse wrote: ... on > my machine __builtin_popcount beats all other methods hands-down by at > least a factor of 5. ... Thanks for your thoughts. Could you clarify "all other methods" please? How about the below? My tests make it more or less an exact match for the builtin, and googling for discussions, that's what others are saying, too. I don't think the below is unambiguously slower than the builtin. Admittedly, SWAR is very difficult to explain to someone who hasn't seen it before. Perhaps you mean "All other methods which are easily explainable in a few minutes so long as someone knows a bit of C++". Thanks again, Paul

int swarPopcount(unsigned i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
} |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 18 03:37PM > ... > Thanks for your thoughts. Could you clarify "all other methods" > please? A mistake. I should have said both other methods -- the two loops you were comparing. I have not tested all others by any means! > How about the below? > My tests make it more or less an exact match for the builtin, and googling > for discussions, that's what others are saying, too. Optimised (-O2 and -O3), it's faster than the __builtin_popcount on my system and takes about twice the time unoptimised. This:

unsigned c;
__asm__ volatile ("POPCNT %1, %0;" : "=r"(c) : "r"(x));
return c;

is faster than the SWAR version (and therefore the builtin one and the loops) but the difference becomes small when both are optimised. But of course it's horribly non-portable. (I'd assumed that __builtin_popcount would use this method but it appears not.) (BTW, I don't know much about __asm__ in gcc so if anyone wants to tell me how I should have done it, I'd be grateful.) > Admittedly, SWAR is very difficult to explain to someone who hasn't seen > it before. Perhaps you mean "All other methods which are easily > explainable in a few minutes so long as someone knows a bit of C++". No just a slip of the fingers. "All" should have been just the two loops being compared in those recent posts. -- Ben. |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 18 04:44PM On Sat, 2018-11-17, Paul wrote: > On Saturday, November 17, 2018 at 3:38:07 PM UTC, robert...@yahoo.com wrote: ... > did is that I was looking for the most direct approach so I said > "What's happening in the kth binary place" So this led to me > introducing 1 << place. I was wondering about that (but I didn't ask). What you write makes sense: scan through position 0, 1, 2, ... and look for set bits. Yet my instinct is to do it like robert: draining all the bits out through the bottom, as it were. > x/=2; seems to execute faster than x >>= 1; so maybe a problem with > my code is that bit shifting isn't as fast as the alternatives. That seems unlikely. Back in the 1980s, people used to write n>>2 instead of n/4, because compilers were poor and DIV instructions very expensive. But I soon learned that a half-decent optimizer would turn both into the same assembly code (if n was unsigned, anyway). /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
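[Jorgen's point above is easy to check: for unsigned operands, x / 2 and x >> 1 are defined to produce the same value, which is exactly why an optimizer is free to emit the same shift instruction for both. A minimal sketch, with hypothetical function names chosen for illustration:]

```cpp
#include <cstdint>

// For unsigned values these two are equivalent by definition, so a
// half-decent optimizer compiles both to the same shift.
// (For negative *signed* values they would differ: division rounds
// toward zero, an arithmetic right shift rounds toward -infinity.)
std::uint32_t half_div(std::uint32_t x)   { return x / 2; }
std::uint32_t half_shift(std::uint32_t x) { return x >> 1; }
```

[Compiling both with, say, gcc -O2 -S (or on Compiler Explorer) typically shows identical shift instructions, supporting Jorgen's suspicion that any measured difference between x /= 2 and x >>= 1 lies elsewhere.]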
David Brown <david.brown@hesbynett.no>: Nov 18 07:36PM +0100 On 18/11/2018 16:37, Ben Bacarisse wrote: > loops) but the difference becomes small when both are optimised. But of > course it's horribly non-portable. I'd assumed that __builtin_popcount > would use this method but it appear not.) POPCNT was added with the "nehalem" generation of Intel chips, along with the SSE4 instructions, as far as I can tell. In the AMD line, it came a little later. __builtin_popcount on gcc (and presumably also clang) will use POPCNT on x86, and equivalent instructions for other processors, if it is told to generate code for a device family that supports it. Thus if you just ask for "x86-64" instructions, gcc won't use the POPCNT instruction as it is not supported by all x86-64 devices. But if you ask for "-march=nehalem", you /will/ get it for __builtin_popcount. In particular, if you use "-march=native", gcc will generate code that makes use of all the features of the cpu you are using at the time. That is a good idea for making programs that you will run on your own system, but limits portability of the binary on other processors. If you really want fast code /and/ portability, gcc supports generation of multiple copies of functions with run-time selection of the actual version based on the processor in use at the time. > (BTW, I don't know much about __asm__ in gcc so if anyone wants to tell > me how I should have done it, I'd be grateful.) It looks fine to me. You can use names instead of numbered parameters, which is useful for more advanced inline assembly, but numbered parameters are okay here. |
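[A small illustration of David's point: the source below never changes, but the code generated for it does depend on the -march flag. The wrapper name is made up for the example.]

```cpp
// GCC/Clang builtin. Compiled for generic x86-64 this becomes a call to
// a fallback routine (or an inline SWAR-style sequence); compiled with
// -march=nehalem, or -march=native on a POPCNT-capable CPU, it becomes
// a single POPCNT instruction.
int builtinPopcount(unsigned x) {
    return __builtin_popcount(x);
}
```

[The result is the same either way; only the instruction selection differs, which is why Ben's timings changed so much once he recompiled with -march=native.]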
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 18 07:48PM > On 18/11/2018 16:37, Ben Bacarisse wrote: >> Paul <pepstein5@gmail.com> writes: <snip> > clang) will use POPCNT on x86, and equivalent instructions for other > processors, if it is told to generate code for a device family that > supports it. <snip> > In particular, if you use "-march=native", gcc will generate code that > makes use of all the features of the cpu you are using at the > time. Interesting. I had forgotten that I was just doing a generic compile. I've tried -march=native and the timing is exactly as for the __asm__ version. Thanks for that tip. (Yet another way in which benchmarking code like this is increasingly complicated.) <snip> -- Ben. |
Paavo Helde <myfirstname@osa.pri.ee>: Nov 18 09:09AM +0200 On 18.11.2018 3:56, Stefan Ram wrote: > . But an evaluation happens at /run time/. How is an > evaluation which happens at run time supposed to evaluate > an expression which only exists until compile-time? Because the object code for evaluating the expression has been prepared at compile time, doh. This is the same how we can have chicken soup for dinner even though no chicken exists in the supermarket. |
ram@zedat.fu-berlin.de (Stefan Ram): Nov 18 01:08AM Ways how to explain object lifetime: "When the destructor is called, this is like when the doctor says to the object, 'I'm sorry, but you should get your things in order.'". "The type of the expression (rvalue or glvalue) is like the last will and testament of the object, determining whether or not its estate can be taken by others." |
ram@zedat.fu-berlin.de (Stefan Ram): Nov 18 01:56AM Supersedes: <evaluation-20181118024605@ram.dialup.fu-berlin.de> An expression is an entity of the source-code model. It sits right there in your code and consists of characters you typed. At run-time, there are no expressions; they have been lost during the translation. They are not part of the run-time model. For example, a program usually cannot print its expressions. Yet, we use the term "to evaluate an expression". E.g., |The postfix expression before the dot or arrow is evaluated; n4762 . But an evaluation happens at /run time/. How is an evaluation which happens at run time supposed to evaluate an expression which only exists until compile-time? |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 17 03:45PM -0800 On 11/14/2018 7:25 PM, Pavel wrote: > alien-like fungi that enslave and kill ants (warning: not for the faint of > heart). https://www.youtube.com/watch?v=XuKjBIBBAL8 ... sorry for the > offshoot-off-topic. Will take a look. Thank you. Busy working on some animations right now. Fwiw, my new software has dozens of points to interpolate with fun functions. Here is one: https://youtu.be/1jjiCMC2izc This animation has created some interesting 3d volumetric renderings: https://plus.google.com/101799841244447089430/posts/9s9DZ6SYz5x (BUGS!) https://plus.google.com/101799841244447089430/posts/E75s7Ya6fkg https://plus.google.com/101799841244447089430/posts/VLBFc2cS9ap (the GHOST!) ;^D |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 17 03:47PM -0800 On 11/14/2018 7:25 PM, Pavel wrote: >>>>> On Monday, November 12, 2018 at 1:38:05 PM UTC-5, Kenny McCormack wrote: >>>>>> In article <9a34066c-9aa4-4512-a9ef-3d8af47206cc@googlegroups.com>, >>>>>> Rick C. Hodgin <rick.c.hodgin@gmail.com> wrote: [...] > alien-like fungi that enslave and kill ants (warning: not for the faint of > heart). https://www.youtube.com/watch?v=XuKjBIBBAL8 ... sorry for the > offshoot-off-topic. ARGH! That is gnarly! https://youtu.be/XuKjBIBBAL8?t=110 Not for the faint of heart. ouch. ;^o https://youtu.be/XuKjBIBBAL8?t=134 oohhh... |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |