- How to improve pow() - 13 Updates
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:46AM +0200 Am 06.09.2022 um 20:49 schrieb Lynn McGuire: > You need to unroll the loop for the common cases of e == 1, e == 2, e == > 3, etc. > Lynn LOL. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:48AM +0200 Am 06.09.2022 um 19:08 schrieb Bonita Montero: > v *= sq; > return e >= 0 ? v : 1.0 / v; > } I've got it ! I've optimized away one jump and the performance is now the same: double powPow( double d, int e ) { auto run = [&]<bool Neg>( bool_constant<Neg> ) -> double { unsigned eAbs; if constexpr( !Neg ) eAbs = e; else eAbs = -e; double v = 1.0, sq = d; for( ; eAbs; eAbs >>= 1, sq *= sq ) if( eAbs & 1 ) v *= sq; if constexpr( Neg ) v = 1.0 / v; return v; }; if( e >= 0 ) return run( false_type() ); else return run( true_type() ); } |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 07:17AM +0200 Now with loop-unrolling, but not the one Lynn suggested: double powPow( double d, int e ) { auto run = [&]<bool Neg>( bool_constant<Neg> ) -> double { auto unroll_n = [&]<typename Fn, size_t ... Indices>( Fn fn, std::index_sequence<Indices ...> ) { return (fn.template operator ()<Indices>() && ...); }; unsigned eAbs; if constexpr( !Neg ) eAbs = e; else eAbs = -e; double v = 1.0, sq = d; while( unroll_n( [&]<size_t I>() -> bool { if( !eAbs ) return false; if( eAbs & 1 ) v *= sq; sq *= sq; eAbs >>= 1; return true; }, std::make_index_sequence<5>() ) ); if constexpr( Neg ) v = 1.0 / v; return v; }; if( e >= 0 ) return run( false_type() ); else return run( true_type() ); } Now it's 13% faster than MSVCRT. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 09:42AM +0200 This is +110% faster than MSVCRT's implementation: ouble powPow( double d, int e ) { constexpr size_t UNROLL_FACTOR = 1; auto run = [&]<bool Neg>( bool_constant<Neg> ) -> double { unsigned eAbs = !Neg ? e : -e; double v = 1.0, sq = d; for( ; eAbs; eAbs >>= 1, sq *= sq ) v *= (int)(eAbs & 1) * sq + (int)(eAbs & 1 ^ 1); v = !Neg ? v : 1.0 / v; return v; }; if( e >= 0 ) return run( false_type() ); else return run( true_type() ); } There are no conditional moves for floating-point operations, so I did the decision whether to add the squared value with that: (int)(eAbs & 1) * sq + (int)(eAbs & 1 ^ 1). |
| "Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Sep 07 10:37AM +0200 On 7 Sep 2022 06:48, Bonita Montero wrote: > else > return run( true_type() ); > } Maybe you could measure also my old code with slightly different approach (no `constexpr`), at <url: https://github.com/progrock-libraries/kickstart/blob/master/source/library/kickstart/main_library/core/ns%E2%96%B8language/operations/intpow.hpp>: #include <assert.h> #include <type_traits> // is_floating_point_v // Important to not introduce possible future name conflicts with <math.h>, hence // the "lx" (short for "language extension") ns▸ namespace kickstart::language::lx::_definitions { using std::is_floating_point_v; namespace impl { // Essentially this is Horner's rule adapted to calculating a power, so that the // number of floating point multiplications is at worst O(log₂n). template< class Number_type > constexpr inline auto intpow_( const Number_type base, const int exponent ) -> Number_type { Number_type result = 1; Number_type weight = base; for( int n = exponent; n != 0; weight *= weight ) { if( n % 2 != 0 ) { result *= weight; } n /= 2; } return result; } } // namespace impl template< class Number_type > constexpr inline auto intpow( const Number_type base, const int exponent ) -> Number_type { // TODO: proper failure handling if( exponent < 0 and not is_floating_point_v<Number_type> ) { assert( "For negative powers the base must be floating point" and false ); } return (0?0 : exponent > 0? impl::intpow_<Number_type>( base, exponent ) : exponent == 0? 1 : Number_type( 1.0/impl::intpow_<Number_type>( base, -exponent ) ) ); } //----------------------------------------------------------- @exported: namespace d = _definitions; namespace exported_names { using d::intpow; } // namespace exported names } // namespace kickstart::language::lx::_definitions namespace kickstart::language::lx { using namespace _definitions::exported_names; } In the above copy+paste I removed an include of a header that defines a `bool` replacement type called `Truth`, b/c I can't see that it's used. The main difference from your code is that I discriminate the different cases in a public function that just calls the internal implementation with appropriate adjustments of arguments and result. Also, I use IMO more clear notation like `/= 2` instead of `>>= 1`. Should yield same machine code though. I never measured this so it's interesting how it would stack up to your code, and MSVC's. - Alf |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 01:38PM +0200 I think that it's very rare that someone uses an integer type for pow( x, intType ). And with a double you have more precision within the calculcaton itself and you'd just to cast back the result. So ther's no need to have that generic. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 07 01:02PM > There are no conditional moves for floating-point operations, > so I did the decision whether to add the squared value with > that: (int)(eAbs & 1) * sq + (int)(eAbs & 1 ^ 1). test.cc:3:18: error: 'size_t' does not name a type test.cc:4:34: error: 'bool_constant' has not been declared test.cc:14:28: error: 'false_type' was not declared in this scope test.cc:16:28: error: 'true_type' was not declared in this scope |
| scott@slp53.sl.home (Scott Lurndal): Sep 07 02:13PM >> 3, etc. >> Lynn >LOL. Instead of insults, how about you simply disassemble the MS runtime version of the function and figure out for yourself how it works? A competent programmer would have no problem doing that. |
| scott@slp53.sl.home (Scott Lurndal): Sep 07 02:14PM >test.cc:4:34: error: 'bool_constant' has not been declared >test.cc:14:28: error: 'false_type' was not declared in this scope >test.cc:16:28: error: 'true_type' was not declared in this scope Typical unreadable code from Christoph/Bonnie. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 05:18PM +0200 Am 07.09.2022 um 16:14 schrieb Scott Lurndal: >> test.cc:14:28: error: 'false_type' was not declared in this scope >> test.cc:16:28: error: 'true_type' was not declared in this scope > Typical unreadable code from Christoph/Bonnie. There's nothing unreadable with that. It's simply that you're both too stupid to adapt my style. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 05:19PM +0200 Am 07.09.2022 um 16:13 schrieb Scott Lurndal: >> LOL. > Instead of insults, how about you simply disassemble the MS runtime > version of the function and figure out for yourself how it works? That's not necessary any more since I've developed some code that is more than twice as fast as the MSVC implementation. |
| Muttley@dastardlyhq.com: Sep 07 03:45PM On Wed, 7 Sep 2022 17:18:32 +0200 >> Typical unreadable code from Christoph/Bonnie. >There's nothing unreadable with that. >It's simply that you're both too stupid to adapt my style. Its quite clear you love complexity. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 07 06:07PM +0200 >> There's nothing unreadable with that. >> It's simply that you're both too stupid to adapt my style. > Its quite clear you love complexity. I've written complex code, but this absolutely isn't complex. You're just overstrained. |
| You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment