Home » Posts tagged 'remez exchange'

# Tag Archives: remez exchange

## Minimax approximation to arctan / atan / atan2

Even during 10 years in a past life working in a group that designed FIR filters, I never grappled with the Remez exchange algorithm hands-on.  This week I finally found the excuse I needed.

A while ago, I needed a crude but very fast approximation to arctan.  In fact it was atan2, but I’ll start with the easier problem first.  With a release of the application imminent, I wanted to credit the original source I got the formula from, but I couldn’t find it any longer.  So I set about recomputing the formula for myself.

The version I found a while ago looked like this:

`atan(x) ≈ (0.97179803008 * x) - (0.19065470515 * x**3)`

for x in the range [-1,+1], i.e. the result is in the range [-pi/4,+pi/4].  The maximum error in the result is just less than the equivalent of 0.3 degrees.

[If the person behind these coefficients gets in touch, I’ll be more than happy to give credit and link to the original work here.]

Having now got the Remez exchange algorithm working, I found a slightly improved version is

`atan(x) ≈ (0.97239 * x) - (0.19195 * x**3)`

which has a worst-case error of 0.2837 degrees.  Even better, a plot shows the 6 extrema all have this absolute error, as expected.

Moving to the original atan2 problem, the requirement was rather unusual.  Starting with the x and y arguments as integers, the answer is to be expressed as a 32-bit integer, where 2**32 represents 360 degrees.  This allows subsequent rotations to be done using integer add and subtract, with the convenient property that multiples of 360 degrees are lost by the modulo wraparound at 2**32.  The resulting function looks like this, with the coefficients as shown earlier:

```#define K1 (0.97239)
#define K2 (0.19195)
#define K3 ((float)((double)(1<<29) * 4.0 * K1 / M_PI))
#define K4 ((float)((double)(1<<29) * 4.0 * K2 / M_PI))

int32_t atan2i_approx2(int32_t y, int32_t x)
{
int32_t u, v, s;
uint32_t w;
float f, f2, g;
int32_t r;
float fx, fy;
float tt, bb;

fx = (float) x;
fy = (float) y;
u = x + y;
v = x - y;
w = (u ^ v) >> 31;
tt =  ? -fx : fy;
bb =  ? fy : fx;
f = tt / bb;
f2 = f * f;
g = f * (K3 - K4*f2);
s = (((u >> 30) & 3) ^ ((v >> 31) & 1)) << 30;
r = s + (int32_t) g;
return r;
}```