Friday, November 23, 2018

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

Juha Nieminen <nospam@thanks.invalid>: Nov 16 12:22PM

> This took 2.5 seconds. However, 10 million is small enough that you can
> replace 1ull by 1 with no problems. This cut the time from 2.5 seconds
> to 0.98 seconds, and I did this experiment many times.
 
I haven't checked, but when creating such benchmarks you should
*always* check that the compiler isn't actually optimizing the
code away because the result is unused.
 
It's a very typical mistake to make. Just recently I had
a conversation with someone in another forum who thought that
his O(n^2) linked list traversal was faster than traversing
std::list with iterators... only to find out that the compiler
was optimizing his code away completely because it noticed that
the end result was not being used for anything.
 
(Not saying you have made such a mistake. Just pointing out that
you should make sure that's not what's happening.)
 
One way to check what the compiler is doing is to make it output
in asm and examine the resulting code. (Of course this requires
a modicum of understanding of asm.)
 
At the very least you should always do something with the end
result (such as printing it, even if you aren't interested in
what it actually is). Even then, there can still be situations
where the compiler is overly smart and optimizes away the very
code you are trying to benchmark.
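
For example, a minimal benchmark shape along these lines keeps the loop
alive by accumulating and printing its results. (This is only a sketch for
illustration; popcount_loop and the iteration count are made up here, not
taken from the code being discussed.)

#include <chrono>
#include <cstdint>
#include <iostream>

// Illustrative function under test -- not the original poster's code.
static int popcount_loop(std::uint64_t x)
{
    int count = 0;
    while (x) { count += x & 1; x >>= 1; }
    return count;
}

int main()
{
    std::uint64_t sink = 0;                 // every result feeds into "sink"
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < 10000000; ++i)
        sink += popcount_loop(i);
    auto stop = std::chrono::steady_clock::now();

    // Printing the accumulated value makes the result observable, so the
    // timed loop is not trivially dead code.
    std::cout << "sink = " << sink << ", "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}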
bitrex <user@example.net>: Nov 16 01:09AM -0500

On 11/15/2018 03:35 PM, Paul wrote:
> ++count;
 
> return count;
> }
 
Why not use Compiler Explorer:

<https://godbolt.org/>

Write your code there and you can see precisely what it generates for
many compiler versions and many architectures.
Paul <pepstein5@gmail.com>: Nov 17 01:25PM -0800

On Saturday, November 17, 2018 at 9:17:31 PM UTC, Ben Bacarisse wrote:
> return count;
 
> It may of course be slower, but when you have only a few bits set it may
> pay off.
 
I think this trick, popularised (or maybe even invented) by Kernighan,
is known to be much faster in general than naive bit-by-bit methods.
 
It has a better big-O: your method is O(number of bits set), which is
better than O(log n). It doesn't need "only a few bits set" to pay off;
it pays off in general. It might be worse in cases where almost all the
bits are set -- I haven't tested.
 
Paul
Paul <pepstein5@gmail.com>: Nov 17 01:20PM -0800

> return count;
> }
 
> FWIW, GCC generates a rather shorter loop for that too.
 
Your code is significantly faster than mine and avoids the problematic
comparison in my code.
The reason I coded it the way I did is that I was looking for the most
direct approach, so I asked "What's happening in the kth binary place?",
which led me to introduce 1 << place.
x /= 2; seems to execute faster than x >>= 1; so maybe one problem with my
code is that bit shifting isn't as fast as the alternatives. I'm not sure
why my code is so much slower than yours -- it seems to take about 1.5
times as long to execute.
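
(For what it's worth, for unsigned operands compilers normally turn the
division by two into the very same shift. A minimal pair like the
following, pasted into Compiler Explorer as suggested earlier in the
thread, makes the generated code easy to compare; the function names are
made up for the illustration.)

unsigned half_by_division(unsigned x) { return x / 2; }   // strength-reduced to a shift
unsigned half_by_shift(unsigned x)    { return x >> 1; }  // explicit shift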
 
Paul
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 17 09:17PM


>>> On Thu, 15 Nov 2018 12:35:46 -0800 (PST), Paul <pepstein5@gmail.com>
<snip>
>>> > ++count;
 
>>> > return count;
>>> > }
<snip>
> return count;
> }
 
> FWIW, GCC generates a rather shorter loop for that too.
 
There's a neat way to clear the least significant set bit:
 
int count = 0;
for (; x; count++) x &= x - 1;
return count;
 
It may of course be slower, but when you have only a few bits set it may
pay off.
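
A self-contained version of that loop, with a couple of quick checks (the
function name and the test values here are illustrative only):

#include <cassert>

int popcount_clear_lowest(unsigned x)
{
    int count = 0;
    for (; x; ++count)
        x &= x - 1;                 // clears the lowest set bit each pass
    return count;
}

int main()
{
    assert(popcount_clear_lowest(0u) == 0);
    assert(popcount_clear_lowest(6u) == 2);     // binary 110
    assert(popcount_clear_lowest(0xFFu) == 8);
}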
 
--
Ben.
Robert Wessel <robertwessel2@yahoo.com>: Nov 17 09:38AM -0600

On Fri, 16 Nov 2018 00:15:53 -0800 (PST), Paul <pepstein5@gmail.com>
wrote:
 
>I'm going to attempt to use these instructions to fix the issue:
>https://dev.my-gate.net/2014/09/15/how-to-use-mingw-w64-with-clion/
 
>Please let me know if you have anything else to add.
 
 
If you want a short, bit-by-bit algorithm, why not just do the more
conventional basic approach of repeatedly shifting x right? Note that
I've changed the type to unsigned. Not tested, etc:
 
int popcnt(unsigned x)
{
    int count = 0;
    while (x)
    {
        count += x & 1;
        x /= 2;
    }
    return count;
}
 
FWIW, GCC generates a rather shorter loop for that too.
Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 17 04:15PM

On Thu, 2018-11-15, Paul wrote:
> This explains why it wasn't all optimized away.
 
> I changed to -O3 optimization. I'm not sure whether this is best
> or what you would advise (?).
 
With gcc I tend to use -O2 or -Os. Sometimes -O3, if I can show that
it generates measurably faster code. But the main thing is enabling
the optimizer at all.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Paul <pepstein5@gmail.com>: Nov 16 12:17AM -0800

On Friday, November 16, 2018 at 6:10:00 AM UTC, bitrex wrote:
 
> <https://godbolt.org/>
 
> and write your code there and you can see precisely what it generates
> for many compiler versions and many architectures
 
Thanks, good tip.
 
Paul
Paul <pepstein5@gmail.com>: Nov 16 12:15AM -0800

> performance difference on most processors. Are you by any chance
> building 32-bit code? That will show a significant slowdown when
> using the 64-bit type.
 
I'm sure that building 32-bit code is the problem.
Thanks for pointing this out.
I'm using gcc with CLion.
I'm going to attempt to use these instructions to fix the issue:
https://dev.my-gate.net/2014/09/15/how-to-use-mingw-w64-with-clion/
 
Please let me know if you have anything else to add.
 
Paul
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 23 02:51AM -0500

Chris M. Thomasson wrote:
 
> Well, there was a problem with a GCC optimization that broke a POSIX mutex. It
> was an old post on comp.programming.threads:
 
> https://groups.google.com/d/topic/comp.programming.threads/Y_Y2DZOWErM/discussion
 
I have never read that discussion before, and your mentioning of it got me
interested. I found the claims of people who believed that acquiring a
POSIX mutex is somehow supposed to protect some "associated" variable
highly dubious. Just in case, I double-checked and could not find any such
guarantee in my copy of the POSIX standard (The Open Group Base
Specifications Issue 6, IEEE Std 1003.1, 2004 Edition).
 
AFAIK (and I have been using POSIX synchronization primitives for decades), the
only thing that successful locking of a POSIX mutex by a thread guarantees is
that no other thread has or will have that mutex locked until the first thread
releases it. How to use this guarantee is entirely up to the programmer. In
particular, I believe that the OP's code was buggy if the intention was to
maintain a total count of successful locks by all threads in the `acquires_count'
variable, exactly because the programmer did not take responsibility for memory
access sequencing. Respectively, IMHO the code optimization by GCC was
absolutely legal.
David Brown <david.brown@hesbynett.no>: Nov 23 10:02AM +0100

On 23/11/18 08:51, Pavel wrote:
> variable, exactly because the programmer did not take responsibility for memory
> access sequencing. Respectively, IMHO the code optimization by GCC was
> absolutely legal.
 
I agree. C has sequence points for /logical/ ordering of operations
according to the abstract machine, but these do not need to be followed
in the actual generated code. Only observable behaviour needs to follow
a specific order (until 5.1.2.4 "Multi-threaded executions and data races"
was introduced in C11). So with the "trylock" function as given, the
compiler can't rely on the value of "acquires_count" read before the
call to "pthread_mutex_trylock", because that call may change the global
variable. However, after the call returns, the compiler knows it has
full control of the system, and that it can do as it likes with any
non-volatile variables here. It is free to read acquires_count before
checking res, and it is free to write it as it wants. It can read it
and write it as many times as it likes, with whatever values, regardless
of "res", as long as the last value written by end of the function
matches the value expected by the abstract machine.
 
int trylock()
{
    int res;
    res = pthread_mutex_trylock(&mutex);
    if (res == 0)
        ++acquires_count;
    return res;
}
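
One concrete shape of such a transformation, written back out as source (a
sketch of what the optimizer is allowed to produce for the function above,
not code anyone would write; the "as_if" name is made up):

int trylock_as_if()
{
    int res = pthread_mutex_trylock(&mutex);
    int tmp = acquires_count;       // read regardless of res
    acquires_count = tmp + 1;       // speculative store
    if (res != 0)
        acquires_count = tmp;       // undo, so the final value matches
                                    // what the abstract machine expects
    return res;
}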
 
The answer, of course, is to use "volatile" (or now "atomic") accesses
on acquires_count. Alternatively, OS-specific or compiler specific
equivalents can be used, like __sync_fetch_and_add, memory barriers, etc.
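
For completeness, a sketch of one of those fixes, using the GCC builtin
named above (the declarations are repeated so the fragment stands alone):

#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int acquires_count = 0;

int trylock()
{
    int res = pthread_mutex_trylock(&mutex);
    if (res == 0)
        __sync_fetch_and_add(&acquires_count, 1);   // atomic read-modify-write
    return res;
}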
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 23 01:23AM -0800

On 11/23/2018 1:02 AM, David Brown wrote:
>>>>>>>>> On 2018-11-17, Vir Campestris <vir.campestris@invalid.invalid> wrote:
>>>>>>>>>> On 16/11/2018 22:09, Melzzzzz wrote:
>>>>>>>>>>> On 2018-11-16, Vir Campestris <vir.campestris@invalid.invalid> wrote:
[...]
 
> The answer, of course, is to use "volatile" (or now "atomic") accesses
> on acquires_count. Alternatively, OS-specific or compiler specific
> equivalents can be used, like __sync_fetch_and_add, memory barriers, etc.
 
Well, wrt C11 and mtx_trylock and a return value of thrd_success,
acquires_count does not need any special decorations wrt the standard.
Period. Imvho, POSIX should strive for the same protections...
David Brown <david.brown@hesbynett.no>: Nov 23 11:14AM +0100

On 23/11/18 10:23, Chris M. Thomasson wrote:
>> equivalents can be used, like __sync_fetch_and_add, memory barriers, etc.
 
> Well, wrt C11 and mtx_trylock and a return value of thrd_success,
> acquires_count does not need any special decorations wrt the standard.
 
Yes, it needs special consideration. A mutex lock is not a general
barrier - it is an acquire operation that synchronises with other mutex
operations on the same mutex. That is /all/. "acquires_count" should
be atomic, and use atomic operations. Without atomic, or volatile, the
compiler can do exactly the same sort of optimisations.
 
In fact, using C11 mtx_trylock() allows even more optimisations than
pthread_mutex_trylock() would do. The compiler knows exactly what
mtx_trylock() does, since it is in the standard library - it could use
that knowledge to see that the call could not possibly read or write
acquires_count - and then hoist the reading of acquires_count above the
call to mtx_trylock().
 
> Period. Imvho, POSIX should strive for the same protections...
 
Well, since you are wrong (AFAIUI) about the guarantees you get from C11
mutex functions, this point is moot. But if you /did/ get extra
guarantees from mtx_trylock(), then it would be due to special features
of those functions in the standard, and a general normal function
unknown to the compiler cannot get the same features. The nearest you
could get would be to have an atomic_thread_fence() call in the function,
but that only affects atomic operations.
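
As an illustration of the "should be atomic" point, a C++ analogue using
std::mutex and std::atomic (a C11 version with mtx_trylock and an _Atomic
counter has the same shape; the names here are illustrative):

#include <atomic>
#include <mutex>

static std::mutex mtx;
static std::atomic<int> acquires_count{0};

int trylock()
{
    if (!mtx.try_lock())
        return 1;
    // The counter is atomic: the compiler may not invent extra stores to it,
    // and the successful try_lock() is an acquire operation it cannot move
    // the increment in front of.
    acquires_count.fetch_add(1, std::memory_order_relaxed);
    return 0;
}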
scott@slp53.sl.home (Scott Lurndal): Nov 23 03:50PM


>There's one thing you can't do in C with the same performance like with
>C++: error handling. RAII with table-driven exception-handling has zero
>overhead for the performance-relevant case that no exception is thrown.
 
For some value of zero greater than zero. There's always overhead,
whether it's instruction cache footprint, more instructions executed
or poor locality of reference.
 
You can't get more efficient than longjmp().
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 17 03:32AM -0800

On Saturday, November 17, 2018 at 3:35:59 AM UTC-5, Juha Nieminen wrote:
> I suppose something like it could be possible, but I'm not sure
> how. Especially if it has to have zero overhead when exceptions
> are not thrown.
 
You have to maintain a separate manual stack that runs in parallel
to the machine stack, one which holds the current winding state
at each point in code. It allows structured access for recovery
at any point, including a new "return to x;" keyword. It allows for
nested exceptions (immediate or local exceptions), as well as deep
parent exceptions, simply.
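
One very rough sketch of such a parallel cleanup stack, using setjmp/longjmp
(this only illustrates the general idea with made-up names; it is not the
design described above and has no new keywords):

#include <csetjmp>
#include <cstdio>

// Each scope that owns a resource pushes a cleanup record; "raising" an
// error runs the pending cleanups innermost-first, then jumps to the handler.
struct Cleanup { void (*fn)(void*); void* arg; };

static Cleanup      unwind_stack[64];
static int          unwind_top = 0;
static std::jmp_buf handler;

static void push_cleanup(void (*fn)(void*), void* arg)
{
    unwind_stack[unwind_top++] = Cleanup{fn, arg};
}

static void raise_error(int code)
{
    while (unwind_top > 0) {
        Cleanup c = unwind_stack[--unwind_top];
        c.fn(c.arg);                        // run cleanups, innermost first
    }
    std::longjmp(handler, code ? code : 1);
}

static void close_file(void* p) { std::fclose(static_cast<std::FILE*>(p)); }

int main()
{
    if (setjmp(handler)) {                  // control resumes here after raise_error()
        std::puts("recovered; cleanups already ran");
        return 0;
    }
    std::FILE* f = std::fopen("example.tmp", "w");
    if (f)
        push_cleanup(close_file, f);
    raise_error(42);                        // closes the file, then jumps back
}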
 
--
Rick C. Hodgin
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 16 08:00PM -0800

On 11/16/2018 1:48 PM, Vir Campestris wrote:
 
> That's odd.
 
> There shouldn't be anything you can do in C++ you can't do in C - albeit
> with more pain.
 
Hardcore nests of pain. Fwiw, sometimes I use a very simplistic method
of interfaces in C:
 
https://pastebin.com/raw/f52a443b1
 
It is very minimalist, but it works for what I need it to work for.
 
:)
Juha Nieminen <nospam@thanks.invalid>: Nov 17 08:35AM

> There shouldn't be anything you can do in C++ you can't do in C - albeit
> with more pain.
 
I'm not exactly sure how you would implement exceptions in C
(which, need I remind, needs to take RAII into account when
unwinding the stack).
 
I suppose something like it could be possible, but I'm not sure
how. Especially if it has to have zero overhead when exceptions
are not thrown.
Melzzzzz <Melzzzzz@zzzzz.com>: Nov 16 10:09PM


> That's odd.
 
> There shouldn't be anything you can do in C++ you can't do in C - albeit
> with more pain.
 
How about implementing an efficient qsort function? C++ beats C there, and
it is a std function...
 
--
press any key to continue or any other to quit...
Vir Campestris <vir.campestris@invalid.invalid>: Nov 16 09:48PM

On 15/11/2018 22:07, Stefan Ram wrote:
> AFAIK, now there C++ is reported to
> be faster than C! Finally after all those years and
> all those efforts by many contributors, C++ has won!
 
That's odd.
 
There shouldn't be anything you can do in C++ you can't do in C - albeit
with more pain.
 
Andy
scott@slp53.sl.home (Scott Lurndal): Nov 16 09:51PM


>That's odd.
 
>There shouldn't be anything you can do in C++ you can't do in C - albeit
>with more pain.
 
And "won"? It's not a contest.
Vir Campestris <vir.campestris@invalid.invalid>: Nov 17 05:33PM

On 16/11/2018 22:09, Melzzzzz wrote:
>> with more pain.
 
> How about implementing an efficient qsort function? C++ beats C there, and
> it is a std function...
 
An efficient _generic_ qsort function should be much _easier_ in C++:
templates, etc. But if you know what you're sorting and are willing to
put that into the sort code, I don't know why C++ should be faster.
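
A small sketch of the generic case being discussed (the helper names are
arbitrary): qsort receives the comparison through a function pointer, while
std::sort gets it baked into the template instantiation.

#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator handed to qsort via function pointer -- an indirect call per
// comparison that the optimizer usually cannot inline.
static int cmp_int(const void* a, const void* b)
{
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_with_qsort(std::vector<int>& v)
{
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);
}

void sort_with_std_sort(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());      // comparison via operator<, inlined
}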
 
Andy
Melzzzzz <Melzzzzz@zzzzz.com>: Nov 17 05:42PM


> An efficient _generic_ qsort function should be much _easier_ in C++.
> Templates, etc. But if you know what you're sorting and are willing to
> put that into the sort code - I don't know why C++ should be faster.
 
Well, qsort in clib is much slower than C++ sort...
 
 
--
press any key to continue or any other to quit...
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 15 08:57PM -0800

On 11/15/2018 3:07 AM, Rick C. Hodgin wrote:
 
> Sin was here (on Earth, by man).
 
> God had to become one of us (a man) ... here ... where we are, to
> save us from sin where sin was, where we were condemned.
 
I am wondering how many planets God has visited wrt cloning itself into
the sentient intelligent inhabitants that He created anyway. God has
experienced being a Human on Earth within the physical realm. Did God do
the same thing for other planets in the creation?
 
[...]
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Nov 16 04:42PM

On 16/11/2018 16:33, Rick C. Hodgin wrote:
 
[snip tl;dr]
 
Anti-gravity shoes, mate.
 
/Flibble
 
--
"You won't burn in hell. But be nice anyway." – Ricky Gervais
 
"I see Atheists are fighting and killing each other again, over who
doesn't believe in any God the most. Oh, no..wait.. that never happens." –
Ricky Gervais
 
"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Nov 16 05:55PM -0800

On 11/16/2018 12:04 PM, Mr Flibble wrote:
>>> [snip tl;dr] ...
 
> [snip]
 
> Anti-gravity shoes, mate.
 
Moon boots:
 
https://youtu.be/ELBy5stH3b8?t=93
 
Still, it really does sound like Jesus is a clone of a higher being wrt
the Bible. It also seems like Jesus and God share the same soul.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
