=== Newton–Raphson division ===

Newton–Raphson division uses Newton's method to find the reciprocal of D and multiplies that reciprocal by N to find the final quotient Q.

The steps of Newton–Raphson division are:
• Calculate an estimate X_0 for the reciprocal 1/D of the divisor D.
• Compute successively more accurate estimates X_1,X_2,\ldots,X_S of the reciprocal. This is where one employs the Newton–Raphson method as such.
• Compute the quotient by multiplying the dividend by the reciprocal of the divisor: Q = N X_S.

In order to apply Newton's method to find the reciprocal of D, it is necessary to find a function f(X) that has a zero at X=1/D. The obvious such function is f(X)=DX-1, but the Newton–Raphson iteration for this is unhelpful, since it cannot be computed without already knowing the reciprocal of D (moreover it attempts to compute the exact reciprocal in one step, rather than allowing for iterative improvements). A function that does work is f(X)=(1/X)-D, for which the Newton–Raphson iteration gives

:X_{i+1} = X_i - {f(X_i)\over f'(X_i)} = X_i - {1/X_i - D\over -1/X_i^2} = X_i + X_i(1-DX_i) = X_i(2-DX_i),

which can be calculated from X_i using only multiplication and subtraction, or using two fused multiply–adds.

From a computational point of view, the expressions X_{i+1} = X_i + X_i(1-DX_i) and X_{i+1} = X_i(2-DX_i) are not equivalent. To obtain a result with a precision of 2n bits while using the second expression, one must compute the product between X_i and (2-DX_i) with double the given precision of X_i (n bits). In contrast, the product between X_i and (1-DX_i) need only be computed with a precision of n bits, because the leading n bits (after the binary point) of (1-DX_i) are zeros.

If the error is defined as \varepsilon_i = 1 - D X_i, then:

:\begin{align} \varepsilon_{i+1} &= 1 - D X_{i+1} \\ &= 1 - D (X_i(2-DX_i)) \\ &= 1 - 2DX_i + D^2X_i^2 \\ &= (1 - DX_i)^2 \\ &= {\varepsilon_i}^2. \end{align}

This squaring of the error at each iteration step, the so-called quadratic convergence of Newton–Raphson's method, has the effect that the number of correct digits in the result roughly doubles for every iteration, a property that becomes extremely valuable when the numbers involved have many digits (e.g. in the large-integer domain). But it also means that the initial convergence of the method can be comparatively slow, especially if the initial estimate X_0 is poorly chosen.
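As an illustration, the iteration and its quadratic convergence can be sketched in a few lines of Python. This is a sketch only: the divisor, dividend, and starting estimate below are arbitrary example values, and ordinary double-precision floats stand in for the working precision.

<syntaxhighlight lang="python">
# Newton-Raphson reciprocal iteration: X_{i+1} = X_i + X_i*(1 - D*X_i).
# Example values; D is assumed already scaled into [0.5, 1].
D = 0.75        # divisor
N = 0.6         # dividend
X = 1.0         # crude initial estimate of 1/D

for i in range(5):
    X = X + X * (1.0 - D * X)              # one iteration
    print(f"X_{i+1} = {X:.17f}, error = {abs(1.0 - D * X):.3e}")

print("quotient N/D =", N * X)             # Q = N * X_S; exact value is 0.8
</syntaxhighlight>

The printed error roughly squares at each step (0.0625, 0.0039, 1.5e-5, ...), which is the quadratic convergence described above.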
==== Initial estimate ====

For the subproblem of choosing an initial estimate X_0, it is convenient to apply a bit-shift to the divisor D to scale it so that 0.5 ≤ D ≤ 1; applying the same bit-shift to the numerator N ensures the quotient does not change. Once within this bounded range, a simple polynomial approximation can be used to find an initial estimate.

The linear approximation with minimum worst-case absolute error on the interval [0.5, 1] is:

:X_0 = {48 \over 17} - {32 \over 17} D.

The coefficients of the linear approximation T_0 + T_1 D are determined as follows. The absolute value of the error is |\varepsilon_0| = |1 - D(T_0 + T_1 D)|. The minimum of the maximum absolute value of the error is determined by the Chebyshev equioscillation theorem applied to F(D) = 1 - D(T_0 + T_1 D). The local minimum of F(D) occurs when F'(D) = 0, which has the solution D = -T_0/(2T_1). The function at that minimum must be of opposite sign to the function at the endpoints, namely F(1/2) = F(1) = -F(-T_0/(2T_1)). The two equations in the two unknowns have a unique solution T_0 = 48/17 and T_1 = -32/17, and the maximum error is F(1) = 1/17. Using this approximation, the absolute value of the error of the initial estimate is bounded by

:\vert \varepsilon_0 \vert \leq {1 \over 17} \approx 0.059.

The best quadratic fit to 1/D on the interval is

:X_0 = \frac{140}{33} - \frac{64}{11} D + \frac{256}{99} D^2.

It is chosen to make the error equal to a rescaled third-order Chebyshev polynomial of the first kind, and gives an absolute value of the error less than or equal to 1/99. This improvement is equivalent to \log_2(\log 99/\log 17) \approx 0.7 Newton–Raphson iterations, at a computational cost of less than one iteration. It is possible to generate a polynomial fit of degree larger than 2, computing the coefficients using the Remez algorithm. The trade-off is that the initial guess requires more computational cycles, hopefully in exchange for fewer iterations of Newton–Raphson.

Since for this method the convergence is exactly quadratic, it follows that, from an initial error \varepsilon_0, S iterations give an answer accurate to

:P = -2^S \log_2 \varepsilon_0 - 1 = 2^S \log_2(1/\varepsilon_0) - 1

binary places. Typically, a quadratic initial estimate plus two iterations is accurate enough for IEEE single precision, while three iterations are marginal for double precision; a linear initial estimate plus four iterations is sufficient for both the double and double-extended formats.
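The stated worst-case errors can be checked numerically. The following sketch samples |1 − D·X_0| for both fits over [0.5, 1]; the grid size is an arbitrary choice, so this is an empirical check rather than a proof.

<syntaxhighlight lang="python">
# Numerical check of the worst-case errors of the two initial estimates.
def linear(D):                      # minimax linear fit on [0.5, 1]
    return 48/17 - 32/17 * D

def quadratic(D):                   # quadratic fit with error <= 1/99
    return 140/33 - 64/11 * D + 256/99 * D**2

grid = [0.5 + 0.5 * k / 100000 for k in range(100001)]
for name, fit, bound in (("linear", linear, 1/17), ("quadratic", quadratic, 1/99)):
    worst = max(abs(1 - D * fit(D)) for D in grid)
    print(f"{name:9s} max error = {worst:.6f}  (bound = {bound:.6f})")
</syntaxhighlight>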
==== Pseudocode ====

The following computes the quotient of N and D with a precision of P binary places:

Express D as M × 2^e where 1 ≤ M < 2 (standard floating-point representation)
D' := D / 2^(e+1)   // scale between 0.5 and 1, can be performed with bit shift / exponent subtraction
N' := N / 2^(e+1)
X := 48/17 − 32/17 × D'   // precompute constants with same precision as D
repeat \left \lceil \log_2 \frac{P + 1}{\log_2 17} \right \rceil times   // can be precomputed based on fixed P
    X := X + X × (1 − D' × X)
end
return N' × X

For example, for a double-precision floating-point division, this method uses 10 multiplies, 9 adds, and 2 shifts.
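The pseudocode translates directly into Python. In the sketch below, ordinary floats stand in for true P-bit arithmetic, and math.frexp performs the scaling step (it returns a mantissa already in [0.5, 1), so the divisor needs no further shift).

<syntaxhighlight lang="python">
import math

def divide(N, D, P=53):
    """Sketch of Newton-Raphson division: N/D to roughly P binary places."""
    m, e = math.frexp(D)                 # D = m * 2**e with 0.5 <= m < 1
    Dp = m                               # scaled divisor, already in [0.5, 1)
    Np = math.ldexp(N, -e)               # same scaling: exponent subtraction
    X = 48/17 - 32/17 * Dp               # linear initial estimate
    S = math.ceil(math.log2((P + 1) / math.log2(17)))   # iteration count
    for _ in range(S):
        X = X + X * (1 - Dp * X)
    return Np * X

print(divide(355.0, 113.0))              # ~3.14159292, close to 355/113
</syntaxhighlight>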
==== Cubic iteration ====

There is an iteration which uses three multiplications to cube the error:

:\varepsilon_i = 1 - D X_i,
:Y_i = X_i \varepsilon_i,
:X_{i+1} = X_i + Y_i + Y_i \varepsilon_i.

The Y_i \varepsilon_i term is new. Expanding out the above, X_{i+1} can be written as

:\begin{align} X_{i+1} &= X_i + X_i\varepsilon_i + X_i\varepsilon_i^2 \\ &= X_i + X_i(1-DX_i) + X_i(1-DX_i)^2 \\ &= 3X_i - 3DX_i^2 + D^2X_i^3, \end{align}

with the result that the error term

:\begin{align} \varepsilon_{i+1} &= 1 - DX_{i+1} \\ &= 1 - 3DX_i + 3D^2X_i^2 - D^3X_i^3 \\ &= (1 - DX_i)^3 \\ &= \varepsilon_i^3. \end{align}

This requires 3/2 times the computation of the quadratic iteration per step, but achieves \log 3 / \log 2 \approx 1.585 times as much convergence, so it is slightly more efficient. Put another way, two iterations of this method raise the error to the ninth power at the same computational cost as three quadratic iterations, which only raise the error to the eighth power. The number of correct bits after S iterations is

:P = -3^S \log_2 \varepsilon_0 - 1 = 3^S \log_2(1/\varepsilon_0) - 1

binary places. Typically, a quadratic initial estimate plus two cubic iterations provides ample precision for an IEEE double-precision result.

It is also possible to use a mixture of quadratic and cubic iterations. Using at least one quadratic iteration ensures that the error is positive, i.e. the reciprocal is underestimated. This can simplify a subsequent rounding step if an exactly rounded quotient is required. Using higher-degree polynomials in either the initialization or the iteration results in a degradation of performance, because the extra multiplications required would be better spent on doing more iterations.
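A sketch of the cubic iteration in Python, using an arbitrary example divisor and the linear initial estimate from above; the printed error shrinks roughly as \varepsilon^3 per step.

<syntaxhighlight lang="python">
# Cubic iteration: three multiplications per step, error is cubed.
D = 0.6                        # example divisor, scaled into [0.5, 1]
X = 48/17 - 32/17 * D          # linear initial estimate, |eps_0| <= 1/17

for i in range(3):
    e = 1.0 - D * X            # eps_i    (first multiplication)
    Y = X * e                  # Y_i      (second multiplication)
    X = X + Y + Y * e          # X_{i+1}  (third multiplication)
    print(f"step {i + 1}: error = {abs(1.0 - D * X):.3e}")
</syntaxhighlight>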
=== Goldschmidt division ===

Goldschmidt division (after Robert Elliott Goldschmidt) uses an iterative process of repeatedly multiplying both the dividend and divisor by a common factor F_i, chosen such that the divisor converges to 1. This causes the dividend to converge to the sought quotient Q:

:Q = \frac{N}{D} \frac{F_1}{F_1} \frac{F_2}{F_2} \cdots.

The steps for Goldschmidt division are:
• Generate an estimate for the multiplication factor F_i.
• Multiply the dividend and divisor by F_i.
• If the divisor is sufficiently close to 1, return the dividend; otherwise, loop to step 1.

Assuming N/D has been scaled so that 0 < D < 1, each F_i is based on D_i:

:F_{i+1} = 2 - D_i.

Multiplying the dividend and divisor by the factor yields:

:\frac{N_{i+1}}{D_{i+1}} = \frac{N_i}{D_i}\frac{F_{i+1}}{F_{i+1}}.

After a sufficient number k of iterations, Q = N_k.

The Goldschmidt method is used in AMD Athlon CPUs and later models. It is also known as the Anderson Earle Goldschmidt Powers (AEGP) algorithm and is implemented by various IBM processors. Although it converges at the same rate as a Newton–Raphson implementation, one advantage of the Goldschmidt method is that the multiplications in the numerator and in the denominator can be done in parallel.
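A minimal Goldschmidt sketch in Python. The values are illustrative, D is assumed pre-scaled into (0, 1), and a real implementation would run a fixed, precomputed number of steps rather than testing for convergence.

<syntaxhighlight lang="python">
# Goldschmidt division: multiply N and D by F = 2 - D until D converges to 1.
N, D = 0.3, 0.75                   # example values with 0 < D < 1

while abs(1.0 - D) > 1e-15:        # hardware uses a fixed iteration count
    F = 2.0 - D                    # F_{i+1} = 2 - D_i
    N, D = N * F, D * F            # the two multiplies can run in parallel
print("quotient:", N)              # 0.3 / 0.75 = 0.4
</syntaxhighlight>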
==== Binomial theorem ====

The Goldschmidt method can be used with factors that allow simplifications by the binomial theorem. Assume N/D has been scaled by a power of two such that D\in\left(\tfrac{1}{2},1\right]. We write D = 1-x and choose F_{i} = 1+x^{2^i}. This yields

:\frac{N}{1-x} = \frac{N\cdot(1+x)}{1-x^2} = \frac{N\cdot(1+x)\cdot(1+x^2)}{1-x^4} = \cdots = Q' = \frac{N'}{D'},

where N' = N\cdot(1+x)\cdot(1+x^2)\cdots\left(1+x^{2^{n-1}}\right) and D' = 1-x^{2^n} \approx 1.

After n steps \left( x\in\left[0,\tfrac{1}{2}\right) \right), the denominator 1-x^{2^n} can be rounded to 1 with a relative error

:\varepsilon_n = \frac{Q' - N'}{Q'} = x^{2^n},

which approaches its maximum value 2^{-2^n} as x approaches \tfrac{1}{2}, thus providing a minimum precision of 2^n binary digits.
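In this form no estimation step is needed for the factors, since each new factor follows from the previous one by squaring x, as the following Python sketch illustrates (example values).

<syntaxhighlight lang="python">
# Binomial-theorem variant: with D = 1 - x, the factors are 1 + x^(2^i).
N, D = 0.5, 0.75               # example values with D in (1/2, 1]
x = 1.0 - D                    # x in [0, 1/2)

for _ in range(5):             # after n steps the denominator is 1 - x^(2^n)
    N *= 1.0 + x               # multiply the numerator by F_i = 1 + x^(2^i)
    x *= x                     # next power x^(2^(i+1)) by squaring
print("quotient:", N)          # 0.5 / 0.75 = 0.666...
</syntaxhighlight>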
== Large-integer methods ==