- What's the correct fmod()-algorithm - 18 Updates
| Keith Thompson <Keith.S.Thompson+u@gmail.com>: Apr 04 04:23PM -0700 Bart <bc@freeuk.com> writes: [...] > I don't have trunc() on Windows. If I emulate it with a simple version > the converts to int64 then I get the same result as you. [...] How do you not have trunc()? It's been standard C since C99, and standard C++ since, what, C++11? #include <math.h> double trunc(double x); float truncf(float x); long double truncl(long double x); Description The trunc functions round their argument to the integer value, in floating format, nearest to but no larger in magnitude than the argument. -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
| Richard Damon <Richard@Damon-Family.org>: Apr 04 07:32PM -0400 On 4/4/21 10:32 AM, Bonita Montero wrote: > ... this doesn't always give correct values, f.e. if x is very large > without a fraction and y is very samll. So what's the correct alogithm > for fmod ? The basic issue is that you need to realize that computing the value of x / y when done in floating point is going to be, in general inexact, and that to use this sort of formula, you are going to need to take this into account. In particular, if the rounding of x / y cause it to round up to an integer value, the value of trunc(x/y) will be off, so you need to guard for that. One option would be to see if trunc(x / y)*y is bigger than x, and if so then reduce this value by y to effecitvely decrease the trunc by one (or even save the trunc value and decrease it by 1). (This assumes that x and y are positive, if they might be negative you need to handle those signs properly). You also need to handle the case where the value of x/y doesn't even have a 1's digit because the value is too big, in this case you have really had a total loss of precision, I would suspect the defined answer in this case is 0, and accept that in this case it just won't match at all what you would get with x - trunc(x/y)*y since that might require the value to be bigger than y. |
| Bart <bc@freeuk.com>: Apr 05 01:00AM +0100 On 05/04/2021 00:23, Keith Thompson wrote: > The trunc functions round their argument to the integer value, in > floating format, nearest to but no larger in magnitude than the > argument. I forget to check other compilers which do have it. But it's not part of msvcrt.dll which my bcc relies on. It doesn't seem to be defined on top of anything else either. But there /is/ 'floor' which I'd forgotten about. |
| Keith Thompson <Keith.S.Thompson+u@gmail.com>: Apr 04 07:04PM -0700 >> argument. > I forget to check other compilers which do have it. > But it's not part of msvcrt.dll which my bcc relies on. Wikipedia says that's the C standard library for the Visual C++ (MSVC) compiler from version 4.2 to 6.0. I have that file on my Windows 10 laptop, and it appears that it doesn't provide the trunc() function -- which means that the compiler has to find it somewhere else. Are you using an old version of Microsoft's C compiler? Or some other compiler that doesn't know where to look for library functions that aren't in msvcrt.dll? I'm able to use trunc() with Visual Studio 2019. -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
| Kaz Kylheku <563-365-8930@kylheku.com>: Apr 05 03:06AM ["Followup-To:" header set to comp.lang.c.] > ... this doesn't always give correct values, f.e. if x is very large > without a fraction and y is very samll. So what's the correct alogithm > for fmod ? While mathematically, fmod(x, y) is always defined for real x, y, y /= 0, it simply might not make sense in floating point for fabs(x) >> fabs(y). Think about it. Suppose that x is so larger that the distance between x and x + epsilon (x and the next consecutive floating-point value) is larger than y. The intuitition behind fmod(x, y) is that if we repeat y enough times (start with 0 and add y), we will get to a value that is just below x. The remaining gap to x is then fmod(x, y). But that point "just below x" is not necessarily representable in that range where consecutive floats are farther apart than y! Nevertheless, the mathematical value fabs(y) exists, and is some real number in the [0, y) range, which may have an excellent floating-point approximation, regardless of the above. Take the next part with a grain of salt; I am no numerical analyst. I'm thinking that it could be fruitful to scale the exponent (by powers of two) of y, to get it into the "ballpark" as x. So that is, suppose we multiply y by pow(2, k), just so that y * pow(2, k) <= x, but y * pow(2, k + 1) > x. I.e. k is the highest exponent such that y * pow(2, k) does not exceed x. Multiplications by powers of two preserve the mantissa exactly: they are precise. Having procured this k we, could subtract these two terms to obtain z: z = x - y*pow(2, k) Note that all we have done is subtract a multiple of y from x. The result has the same fmod against y, namely: fmod(z, y) = fmod(x, y) we can calculate fmod(z, y) instead. Even though x >> y, it is not the case that z >> y. If we can work with floats at the bit level, the calculation for determining k should be fast. Basically we just take the exponent field from x, and plant it into y, tweaking it by at most one to produce a value that does not exceed x, by comparing mantissas. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Apr 04 09:06PM -0700 On 4/4/2021 7:32 AM, Bonita Montero wrote: > ... this doesn't always give correct values, f.e. if x is very large > without a fraction and y is very samll. So what's the correct alogithm > for fmod ? I do not think that standard C/C++ gives any hardcore guarantees on the level of precision its going to give a fmod. You should reap existing code in compilers. |
| antispam@math.uni.wroc.pl: Apr 05 04:46AM > x/y =5290912749423342.00000000000000000000 > trunc(x/y) =5290912749423342.00000000000000000000 > (x/y)*y =52909127494233424.00000000000000000000 Actually (x/y)*y does not equal 52909127494233424, and that is core problem in this example. Namely, decimal 52909127494233424 is not representable exactly in floating point. > x/y (y is 10.0) should be 5290912749423341.6, but it gives ...3342.0. This does not matter much in this example: 5290912749423342 is the closest integer to quitient. Look: 52909127494233416 - 5290912749423342*10 (33) - 4 The result is -4 and it is enough to add 10 to get exact result. -- Waldek Hebisch |
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Apr 04 11:39PM -0700 On 4/4/2021 7:32 AM, Bonita Montero wrote: > ... this doesn't always give correct values, f.e. if x is very large > without a fraction and y is very samll. So what's the correct alogithm > for fmod ? The fight between a programmer and a floating point number: https://youtu.be/TJRaLnLDMWg |
| Bonita Montero <Bonita.Montero@gmail.com>: Apr 05 08:45AM +0200 > I do not think that standard C/C++ gives any hardcore guarantees on the > level of precision its going to give a fmod. You should reap existing > code in compilers. fmod works correctly in current implementations and gives results >= 0 and < y; |
| Bonita Montero <Bonita.Montero@gmail.com>: Apr 05 08:46AM +0200 > if (std::signbit(result)) result += y; > return std::copysign(result, x); > } Sorry, that doesn't represent the difficulty in implenting fmod(). |
| Bonita Montero <Bonita.Montero@gmail.com>: Apr 05 08:49AM +0200 That's a correct trunc(): double trunc( double d ) { static_assert(sizeof(double) == sizeof(uint64_t), "sizeof(double) not equal to sizeof(uint64_t)"); static_assert(numeric_limits<double>::is_iec559, "double must be IEEE-754"); // assume size_t is our CPU's native register-width static_assert(sizeof(size_t) == sizeof(uint64_t) || sizeof(size_t) == sizeof(uint32_t), "register-width must be 32 or 64 bit"); if constexpr( sizeof(size_t) == sizeof(uint64_t) ) // we have 64 bit registers { unsigned const MANTISSA_BITS = 52, EXP_BIAS = 0x3FF, INF_NAN_BASE = 0x7FF; uint64_t const EXP_MASK = (uint64_t)0x7FF << MANTISSA_BITS, SIGN_MASK = (uint64_t)0x800 << MANTISSA_BITS , MANTISSA_MASK = 0x000FFFFFFFFFFFFFu, NAN_MASK = 0x0008000000000000u, MIN_INTEGRAL_DIGITS_EXP = (uint64_t) EXP_BIAS << MANTISSA_BITS, MIN_INTEGRAL_ONLY_EXP = (uint64_t)(EXP_BIAS + MANTISSA_BITS) << MANTISSA_BITS, INF_NAN_EXP = (uint64_t)INF_NAN_BASE << MANTISSA_BITS; int64_t const MANTISSA_SHIFT_MASK = 0xFFF0000000000000u; union { double du; uint64_t dx; }; du = d; uint64_t exp = dx & EXP_MASK; if( exp >= MIN_INTEGRAL_DIGITS_EXP ) // value has integral digits if( exp < MIN_INTEGRAL_ONLY_EXP ) { // there are fraction-digits to mask out, mask them unsigned shift = (unsigned)(exp >> MANTISSA_BITS) - EXP_BIAS; dx &= MANTISSA_SHIFT_MASK >> shift; return du; } else if( exp < INF_NAN_EXP ) // value is integral return du; else { uint64_t mantissa = dx & MANTISSA_MASK; // infinite, NaN: return value if( !mantissa || mantissa & NAN_MASK ) return du; // SNaN: raise exception on SNaN if necessary feraiseexcept( FE_INVALID ); return du; } else { // below +/-1.0 // return +/-0.0 dx &= SIGN_MASK; return du; } } else if constexpr( sizeof(size_t) == sizeof(uint32_t) ) // we have 32 bit registers { unsigned const MANTISSA_BITS = 52, HI_MANTISSA_BITS = 20, EXP_BIAS = 0x3FF, INF_NAN_BASE = 0x7FF; uint32_t const EXP_MASK = (uint32_t)0x7FFu << HI_MANTISSA_BITS, SIGN_MASK = (uint32_t)0x800u << HI_MANTISSA_BITS, HI_MANTISSA_MASK = 0x000FFFFFu, NAN_MASK = 0x00080000u, MIN_INTEGRAL_DIGITS_EXP = (uint32_t) EXP_BIAS << HI_MANTISSA_BITS, MAX_INTEGRAL32_EXP = (uint32_t)(EXP_BIAS + HI_MANTISSA_BITS) << HI_MANTISSA_BITS, MIN_INTEGRAL_ONLY_EXP = (uint32_t)(EXP_BIAS + MANTISSA_BITS) << HI_MANTISSA_BITS, INF_NAN_EXP = (uint32_t)INF_NAN_BASE << HI_MANTISSA_BITS, NEG_LO_MANTISSA_SHIFT_MASK = 0xFFFFFFFFu; int32_t const HI_MANTISSA_SHIFT_MASK = 0xFFF00000; union { double du; struct { uint32_t dxLo; uint32_t dxHi; }; }; du = d; uint32_t exp = dxHi & EXP_MASK; if( exp >= MIN_INTEGRAL_DIGITS_EXP ) // value has integral digits if( exp < MIN_INTEGRAL_ONLY_EXP ) // there are fraction-digits to mask out if( exp <= MAX_INTEGRAL32_EXP ) { // the fraction digits are in the upper dword, mask them and zero the lower dword unsigned shift = (unsigned)(exp >> HI_MANTISSA_BITS) - EXP_BIAS; dxHi &= HI_MANTISSA_SHIFT_MASK >> shift; dxLo = 0; return du; } else { // the fraction digits are in the lower dword, mask them unsigned shift = (unsigned)(exp >> HI_MANTISSA_BITS) - EXP_BIAS - HI_MANTISSA_BITS; dxLo &= ~(NEG_LO_MANTISSA_SHIFT_MASK >> shift); return du; } else if( exp < INF_NAN_EXP ) // value is integral return du; else { uint32_t hiMantissa = dxHi & HI_MANTISSA_MASK; // infinite, NaN: return value if( !(hiMantissa | dxLo) || hiMantissa & NAN_MASK ) return du; // SNaN: raise exception on SNaN if necessary feraiseexcept( FE_INVALID ); return du; } else { // below +/-1.0 // return +/-0.0 dxHi &= SIGN_MASK; dxLo = 0; return du; } } } |
| Paavo Helde <myfirstname@osa.pri.ee>: Apr 05 10:17AM +0300 05.04.2021 03:00 Bart kirjutas: >> argument. > I forget to check other compilers which do have it. > But it's not part of msvcrt.dll which my bcc relies on. In my setup, trunc() resides in ucrtbase.dll (C:\Windows\System32\ucrtbase.dll). UCRT stands for "universal C run-time" which is an optional component when installing Visual Studio (I gather "universal" means it does not depend on the MSVC version). |
| Bonita Montero <Bonita.Montero@gmail.com>: Apr 05 09:49AM +0200 That's the code from glibc: /* @(#)e_fmod.c 5.1 93/09/24 */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ #if defined(LIBM_SCCS) && !defined(lint) static char rcsid[] = "$NetBSD: e_fmod.c,v 1.8 1995/05/10 20:45:07 jtc Exp $";
Subscribe to:
Post Comments (Atom)
|
No comments:
Post a Comment