Tuesday, April 18, 2017

Digest for comp.lang.c++@googlegroups.com - 10 updates in 2 topics

Ralf Goertz <me@myprovider.invalid>: Apr 18 12:00PM +0200

On Sun, 16 Apr 2017 13:39:14 +0200
 
> Ouch, thanks. I should have looked at the value. So there seems to be
> no constant 2*π defined in math.h? Isn't that constant worth being
> defined? Of course I can use 2*M_PI, but then I could also use 2/M_PI.
 
I am really interested in an answer to that. Would the value produced
by the compiler for 2/M_PI be less accurate than the one for 2*M_PI,
so that one really needs M_2_PI defined but no constant for 2*π?
Ralf Goertz <me@myprovider.invalid>: Apr 18 12:43PM +0200

On Wed, 12 Apr 2017 14:57:52 +0200
> single call on platforms that allow that. This makes use of the power
> of numerical algorithms to calculate both values simultaneously in
> significantly less time than the two separate calls.
 
I thought I'd give it a try and define my own sincos function using a
Taylor approximation. I thought all I need are the results for the
interval [0,M_PI/4] and trigonometric identities to figure out the
rest. I guess M_PI/4 is small enough for the series to be cut at x^10.
 
The output of the program below is
 
root mean square errors: sine=1.663e-11 cosine=1.663e-11
 
But I don't know how accurate sincos is, so the error might be smaller.
Or I am measuring noise ;-). At least it is no surprise that both errors
are the same. Anyway, the function is quite fast although not as fast as
sincos or even sin and cos in succession. (By the way, is it possible to
stop g++ from optimizing the single calls into one sincos call with -O2
still intact? In order to do that here I had to cheat and use different
angles for sine and cosine, respectively.) So with my loop of 10000000
calls I got the following times:
 
mysincos: 0.670069 seconds
sincos: 0.525864 seconds
sin/cos: 0.605176 seconds
 
Here is the function and the program to find out the rms error. Comments
are very welcome.
 
 
 
#include <iostream>
#include <cmath>
 
using namespace std;
 
template<typename T> inline void mysincos(const T &rho, T *s, T *c) {
    // Taylor coefficients for the terms x^2 .. x^11
    const static T coeffs[] {
        -T(1)/2,       -T(1)/6,        T(1)/24,     T(1)/120,
        -T(1)/720,     -T(1)/5040,     T(1)/40320,  T(1)/362880,
        -T(1)/3628800, -T(1)/39916800
    };
 
    // sin(-x) = -sin(x), cos(-x) = cos(x), so only deal with positive angles
    bool neg = rho < 0;
    T r(fabs(rho));

    // shift the angle into the interval [0, 2π)
    r = fmod(r, 2*M_PI);

    // figure out in which of the 8 octants the angle lies
    int octo(0);
    while (r >= (octo+1)*M_PI/4) ++octo;

    // put the angle into the first octant
    r = fmod(r, M_PI/4);

    // reflect r at π/8 if the original value was in an odd octant
    if (octo % 2) r = M_PI/4 - r;
 
    // initialize the Taylor approximation
    *s = r; *c = 1;
 
    // initialize the powers of x
    T x(r*r);
    // loop over the Taylor coefficients
    for (size_t i = 0; i < sizeof(coeffs)/sizeof(T); ++i) {
        if (i % 2) // sine term
            *s += x*coeffs[i];
        else       // cosine term
            *c += x*coeffs[i];
        // increase the exponent of x
        x *= r;
    }
 
    // figure out what to do to get back to the full interval:
    // for octo ∈ {1,2,5,6} swap sine and cosine
    auto o4 = octo % 4;
    if (o4 && o4 < 3) swap(*s, *c);
 
    switch (octo) {
        case 2:
        case 3: *c = -*c; break;
        case 4:
        case 5: *s = -*s; *c = -*c; break;
        case 6:
        case 7: *s = -*s; break;
        default: break;
    }
    if (neg) *s = -*s;
}
 
 
int main() {
    double s, c;
    const int max_rms_n = 256;
    double rms_s(0), rms_c(0);
    for (int i = 0; i < max_rms_n; ++i) {
        double phi(2*i*M_PI/max_rms_n);
        sincos(phi, &s, &c);
        double sq(s - sin(phi));
        rms_s += sq*sq;
        sq = c - cos(phi);
        rms_c += sq*sq;
    }
    cout << "root mean square errors: sine=" << sqrt(rms_s/max_rms_n)
         << " cosine=" << sqrt(rms_c/max_rms_n) << endl;
    return 0;
}
Ralf Goertz <me@myprovider.invalid>: Apr 18 12:49PM +0200

On Tue, 18 Apr 2017 12:43:57 +0200
> for (int i = 0; i < max_rms_n; ++i) {
> double phi(2*i*M_PI/max_rms_n);
> sincos(phi, &s, &c);
 
correction: mysincos(phi, &s, &c)
 
Manfred <noname@invalid.add>: Apr 18 01:13PM +0200

On 04/18/2017 12:00 PM, Ralf Goertz wrote:
> by the compiler for 2/M_PI be less accurate than the one for 2*M_PI,
> so that one really needs M_2_PI defined but no constant for
> 2*π?
 
I don't have a definite answer. An obvious observation is that in a
radix-2 FP representation if you have a constant x then the expression
2.0*x has exactly the same representation accuracy (being only an
increment in exponent), so there might be other reasons than accuracy or
there may be a difference for non radix-2 representations.
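 
A quick check of that (assuming IEEE doubles; this little test is my
own, not anything from the standard):
 
#include <cmath>
#include <cstdio>
 
int main() {
    // frexp splits a double into a mantissa in [0.5,1) and a base-2
    // exponent; in radix-2, 2.0*M_PI only bumps the exponent, so the
    // mantissas of M_PI and 2*M_PI come out identical
    int e1, e2;
    double m1 = std::frexp(M_PI, &e1);
    double m2 = std::frexp(2.0*M_PI, &e2);
    std::printf("mantissas: %.17g %.17g exponents: %d %d\n", m1, m2, e1, e2);
}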
David Brown <david.brown@hesbynett.no>: Apr 18 01:37PM +0200

On 18/04/17 13:13, Manfred wrote:
> 2.0*x has exactly the same representation accuracy (being only an
> increment in exponent), so there might be other reasons than accuracy or
> there may be a difference for non radix-2 representations.
 
Another obvious observation is that as long as you keep everything
declared "const" (or "constexpr") or use literals, and have optimisation
enabled, then the compiler should do all these calculations at compile
time rather than run time. That may also help avoid rounding errors.
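 
For example (this works with glibc, where M_PI expands to a plain
literal; note that M_PI itself is POSIX rather than ISO C++, so this is
an assumption about the platform):
 
#include <cmath>
 
// both expressions are folded at compile time, so there is no run-time
// rounding beyond what is inherent in the double constants themselves
constexpr double two_pi = 2.0 * M_PI;
constexpr double two_over_pi = 2.0 / M_PI;   // the value M_2_PI names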
David Brown <david.brown@hesbynett.no>: Apr 18 02:34PM +0200

On 18/04/17 12:43, Ralf Goertz wrote:
> sin/cos: 0.605176 seconds
 
> Here is the function and the program to find out the rms error. Comments
> are very welcome.
 
I have a few brief comments. I haven't tested any of these points, so
they may or may not have a significant effect.
 
1. The function is too long to be "inline" - remove that, and rely on
the compiler to inline the function if appropriate.
 
2. Use local variables for your sin and cos, rather than *s and *c.
This is especially relevant when the function is not inlined.
 
3. Make sure "-ffast-math" is in use (for gcc - other compilers may have
similar options). This can make quite a difference to some floating
point code speed.
 
4. Taylor series are easy to understand, but they are not the fastest
converging polynomial series for sin/cos.
 
5. You first put the angle into the range [0, π/4). Taylor series are
faster and/or more accurate for a small range, centred around 0.
Consider rotating another π/8 to get in the range [-π/8, π/8). Use:
 
sin(x ± π/8) = cos(π/8)·sin(x) ± sin(π/8)·cos(x)
cos(x ± π/8) = cos(π/8)·cos(x) ∓ sin(π/8)·sin(x)
 
6. Every iteration of your loop, you multiply x by r. This makes it
easy to calculate, but your rounding errors can build up over the ten
cycles. Tracking r^i and r^(i + 1) separately, multiplying each by r²,
will let you have a loop of 5 steps, avoid the conditional on (i % 2),
allow better instruction scheduling, and halve the error buildup (see
the sketch after point 7).
 
7. Consider only calculating sin or cos, and using c = sqrt(1 - s²) to
calculate the other half.
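 
To make point 6 concrete, the loop could be reshaped roughly like this
(an untested sketch reusing your coeffs[] layout, where even entries
are cosine terms and odd entries sine terms):
 
#include <cstddef>
 
template<typename T>
void taylor_core(T r, const T *coeffs, std::size_t n, T *s, T *c) {
    T r2 = r*r;
    T xe = r2;       // even powers: r^2, r^4, ...
    T xo = r2*r;     // odd powers:  r^3, r^5, ...
    *s = r; *c = 1;
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        *c += xe*coeffs[i];      // cosine term
        *s += xo*coeffs[i + 1];  // sine term
        xe *= r2;                // jump two powers at a time
        xo *= r2;
    }
}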
Ralf Goertz <me@myprovider.invalid>: Apr 18 05:15PM +0200

On Tue, 18 Apr 2017 14:34:25 +0200
> > Comments are very welcome.
 
> I have a few brief comments. I haven't tested any of these points, so
> they may or may not have a significant effect.
 
Hi David, thanks for your comments!
 
> 1. The function is too long to be "inline" - remove that, and rely on
> the compiler to inline the function if appropriate.
 
Point taken. But the program is actually faster with inline, see below.
 
> 2. Use local variables for your sin and cos, rather than *s and *c.
> This is especially relevant when the function is not inlined.
 
Yep, normally I would have used references, but I wanted to be
compatible with the sincos function call. I am aware that references are
probably nothing but pointers in disguise, so the problem wouldn't go
away.
 
> 3. Make sure "-ffast-math" is in use (for gcc - other compilers may
> have similar options). This can make quite a difference to some
> floating point code speed.
 
I always thought that -ffast-math is dangerous. But with that option my
function actually wins (when inlined)!
 
mysincos: 0.496899 seconds. (inline)
mysincos: 0.545307 seconds. (no inline)
sincos: 0.522989 seconds.
 
> 4. Taylor series are easy to understand, but they are not the fastest
> converging polynomial series for sin/cos.
 
Sigh, as you said, Taylor is easy!
 
> Consider rotating another π/8 to get in the range [-π/8, π/8). Use:
 
> sin(x ± π/8) = cos(π/8).sin(x) ± sin(π/8).cos(x)
> cos(x ± π/8) = cos(π/8).cos(x) ± -sin(π/8).sin(x)
 
This would mean using hexo instead of octo ;-) Looking at the
extraordinary match of the tenth Taylor polynomial even for an interval
like [-5,5]
<https://upload.wikimedia.org/wikipedia/commons/c/cc/Sine_GIF.gif> I
thought I could get away with π/4. And I am pretty sure that I don't
need to use the negative half.
 
> cycles. Tracking r^i and r^(i + 1) separately, multiplying each by
> r², will let you have a loop of 5 steps, avoid the conditional on (i
> % 2), allow better instruction scheduling, and halve the error buildup.
 
I will consider this.
 
> 7. Consider only calculating sin or cos, and using c = sqrt(1 - s²) to
> calculate the other half.
 
But won't sqrt be as costly as cos itself?
 
Thanks again for your time.
 
Ralf
Christian Gollwitzer <auriocus@gmx.de>: Apr 18 08:48PM +0200

On 18.04.17 at 17:15, Ralf Goertz wrote:
>> 4. Taylor series are easy to understand, but they are not the fastest
>> converging polynomial series for sin/cos.
 
> Sigh, as you said, Taylor is easy!
 
Taylor is a bad choice for one simple reason: the Taylor approximation
is exact at the expansion point, matching all derivatives up to the
order of the series, and the error grows as you move away from that
point. So the series must be extended until the error at the endpoint
of the interval is small enough, which means that close to the point of
expansion the error is much smaller than needed and precision is wasted.
 
Instead, for numerical approximation, a series should spread the error
evenly across the interval, which is called a "minimax
approximation". A Chebyshev series approximates this with polynomials.
A very good approximation can often be done with rational functions
(Rational Chebyshev approximation), which cost only a single division in
the end. There is the Remez algorithm to compute this kind of series,
which is a bit complex. A simpler approach yields almost the same result
by least-squares fitting the rational function; see
http://www.aip.de/groups/soe/local/numres/bookfpdf/f5-13.pdf
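 
To make that concrete, here is a rough, untested sketch of the
chebft/chebev pair from that chapter (coefficients via a discrete
cosine transform over the Chebyshev nodes, evaluation via the Clenshaw
recurrence); the names follow Numerical Recipes, the code is my own
paraphrase:
 
#include <cmath>
#include <cstdio>
#include <vector>
 
// fit: sample f at the n Chebyshev nodes of [a,b] and project onto T_j
template<class F>
std::vector<double> chebft(double a, double b, int n, F f) {
    std::vector<double> fv(n), c(n);
    for (int k = 0; k < n; ++k) {
        double y = std::cos(M_PI*(k + 0.5)/n);   // node in [-1,1]
        fv[k] = f(0.5*(y*(b - a) + a + b));      // mapped into [a,b]
    }
    for (int j = 0; j < n; ++j) {
        double sum = 0;
        for (int k = 0; k < n; ++k)
            sum += fv[k]*std::cos(M_PI*j*(k + 0.5)/n);
        c[j] = 2.0*sum/n;
    }
    return c;
}
 
// evaluate the series truncated to m terms with the Clenshaw recurrence
double chebev(double a, double b, const std::vector<double> &c, int m,
              double x) {
    double y = (2.0*x - a - b)/(b - a), y2 = 2.0*y;
    double d = 0, dd = 0;
    for (int j = m - 1; j >= 1; --j) {
        double sv = d;
        d = y2*d - dd + c[j];
        dd = sv;
    }
    return y*d - dd + 0.5*c[0];
}
 
int main() {
    auto c = chebft(-M_PI/4, M_PI/4, 16, [](double t){ return std::sin(t); });
    // the c[j] fall off rapidly; truncate where they drop below the
    // accuracy you are after
    double x = 0.3;
    std::printf("%.17g vs %.17g\n",
                chebev(-M_PI/4, M_PI/4, c, 9, x), std::sin(x));
}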
 
Boost includes an implementation of the Remez algorithm to implement
special functions
http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/remez.html
 
For sin/cos there is another method: there are recurrence formulae for
doubling or tripling the angle. If you divide the angle in the octant by
4, you can apply a shorter series and then use the angle-doubling
formula twice. This was done in this article:
https://arxiv.org/pdf/cs/0406049
This method also generates both sine and cosine together, and should
therefore be faster than any of the aforementioned methods.
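 
In sketch form the trick looks like this (my paraphrase of the idea,
not the paper's code; it assumes the argument was already reduced to an
octant, as in mysincos, and the term counts would need tuning for full
double precision):
 
#include <cmath>
 
inline void sincos_quartered(double r, double *s, double *c) {
    double x = 0.25*r;      // quarter the angle: the series converges faster
    double x2 = x*x;
    double ss = x*(1 - x2/6*(1 - x2/20));          // sin(x) up to x^5
    double cc = 1 - x2/2*(1 - x2/12*(1 - x2/30));  // cos(x) up to x^6
    for (int i = 0; i < 2; ++i) {                  // double the angle twice
        double s2 = 2*ss*cc;                       // sin(2x) = 2·sin(x)·cos(x)
        double c2 = 1 - 2*ss*ss;                   // cos(2x) = 1 - 2·sin²(x)
        ss = s2; cc = c2;
    }
    *s = ss; *c = cc;
}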
 
Christian
David Brown <david.brown@hesbynett.no>: Apr 18 09:44PM +0200

On 18/04/17 17:15, Ralf Goertz wrote:
> compatible to sincos function call. I am aware that references are
> probably nothing but pointers in disguise, so the problem wouldn't go
> away.
 
Don't use references either - a reference is just a pointer in disguise.
Use local variables for your calculations, and assign the results to
*s and *c at the end. When you are trying to make fast code, never use
your external data for intermediary results. (Personally, I prefer not
to do that anyway - I don't like my "real" data objects to hold
half-calculated values.)
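 
Shape-wise, I mean something like this (just a skeleton; the library
calls stand in for your real range reduction and series):
 
#include <cmath>
 
void mysincos(double rho, double *s, double *c) {
    double ls = std::sin(rho);   // all intermediate work happens on
    double lc = std::cos(rho);   // the locals ls and lc
    *s = ls;                     // write through the pointers exactly
    *c = lc;                     // once, at the end
}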
 
>> floating point code speed.
 
> I always thought that -ffast-math is dangerous. But with that option my
> function actually wins (when inlined)!
 
-ffast-math is not "dangerous" - but it does mean that you lose certain
IEEE floating point guarantees. For example, it assumes your maths is
all finite - no NaNs, infinities, etc. And it lets the compiler treat
maths as associative, use multiplication by reciprocals instead of
division, etc.
 
 
>> sin(x ± π/8) = cos(π/8)·sin(x) ± sin(π/8)·cos(x)
>> cos(x ± π/8) = cos(π/8)·cos(x) ∓ sin(π/8)·sin(x)
 
> This would mean to use hexo instead of octo ;-)
 
No, still octants. But instead of using 0..45 degree octants, you
balance them about 0 for -22.5 to +22.5 degree sectors.
 
> <https://upload.wikimedia.org/wikipedia/commons/c/cc/Sine_GIF.gif> I
> thought I could get away with π/4. And I am pretty sure that I don't
> need to use the negative half.
 
It is all a balance between more complicated algorithms, speed, and
accuracy.
 
 
>> 7. Consider only calculating sin or cos, and using c = sqrt(1 - s²) to
>> calculate the other half.
 
> But won't sqrt be as costly as cos itself?
 
Not necessarily - sqrt is usually cheaper.
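 
In sketch form (untested; the square root loses the sign, so the octant
logic must restore it, and std::sin here is just a stand-in for the
polynomial series):
 
#include <cmath>
 
void sin_then_cos(double r, double *s, double *c) {
    *s = std::sin(r);               // stand-in for the series
    *c = std::sqrt(1.0 - *s * *s);  // |cos(r)|; the octant fixes the sign
}
 
For r in [0, π/4) this is also well conditioned, since s² stays at or
below 1/2.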
 
 
> Thanks again for your time.
 
Another method that I didn't mention, but which you might like to look
up on Wikipedia, is the CORDIC algorithm.
scott@slp53.sl.home (Scott Lurndal): Apr 18 05:26PM

>> off format=flowed then it too is a somewhat deficient newsreader.)
 
>Flowed format is old, common technology now (I remember we added support
>for it in clc++m in the 2000's), and it solves the problem.
 
Usenet was in existence for 30 years at that point, without
format=flowed. Google groups adding format=flowed was unnecessary.
 
>It's a simple technical solution to a technical problem.
 
>If Emacs doesn't handle it well, then just use a newsreader that does. ;-)
 
Not an acceptable solution to many.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
