Horner's method

Given the polynomial

p(x) = \sum_{i=0}^n a_i x^i = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n,

where a_0, \ldots, a_n are constant coefficients, the problem is to evaluate the polynomial at a specific value x_0 of x. For this, a new sequence of constants is defined recursively as follows:

\begin{align} b_n & := a_n \\ b_{n-1} & := a_{n-1} + b_n x_0 \\ & ~~~ \vdots \\ b_1 & := a_1 + b_2 x_0 \\ b_0 & := a_0 + b_1 x_0. \end{align}

Then b_0 is the value of p(x_0). To see why this works, the polynomial can be written in the nested form

p(x) = a_0 + x \bigg(a_1 + x \Big(a_2 + x \big(a_3 + \cdots + x(a_{n-1} + x \, a_n) \cdots \big) \Big) \bigg).

Setting x = x_0 and repeatedly replacing the innermost bracket by the corresponding b_i gives

\begin{align} p(x_0) & = a_0 + x_0\Big(a_1 + x_0\big(a_2 + \cdots + x_0(a_{n-1} + b_n x_0) \cdots \big)\Big) \\ & = a_0 + x_0\Big(a_1 + x_0\big(a_2 + \cdots + x_0 b_{n-1}\big)\Big) \\ & ~~ \vdots \\ & = a_0 + x_0 b_1 \\ & = b_0. \end{align}

Similarly, it can be shown that

p(x) = \left(b_1 + b_2 x + b_3 x^2 + b_4 x^3 + \cdots + b_{n-1} x^{n-2} + b_n x^{n-1}\right) \left(x - x_0\right) + b_0.

This suggests a convenient procedure for the polynomial division p(x) / (x - x_0): b_1, \ldots, b_n are the coefficients of the quotient, and b_0 (which equals p(x_0)) is the remainder. If x_0 is a root of p(x), then b_0 = 0 (the remainder is 0) and x - x_0 is a factor of p(x).

Examples

Evaluate f(x) = 2x^3 - 6x^2 + 2x - 1 for x = 3. We use synthetic division as follows:

\begin{array}{cc} \begin{array}{r} \\ 3 \\ \\ \\ \end{array} & \begin{array}{|rrrr} \ 2 & -6 & 2 & -1 \\ & 6 & 0 & 6 \\ \hline 2 & 0 & 2 & 5 \end{array} \end{array}

The entries in the first row are the coefficients of the polynomial to be evaluated. Each entry in the second row is the product of the x-value (3 in this example) with the third-row entry immediately to the left. The entries in the third row are the sums of those in the first two. Then the remainder of f(x) on division by x - 3 is 5.
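The recurrence for the b_i, together with the division identity, can be sketched in Python as a minimal illustration rather than a reference implementation. The function names horner_divide and horner_eval are hypothetical, and coefficients are assumed to be listed from a_n down to a_0, in the same order as the first row of the tableau.

```python
def horner_divide(coeffs, x0):
    """Divide p(x) by (x - x0) with Horner's recurrence (synthetic division).

    coeffs lists a_n, a_{n-1}, ..., a_0 (highest degree first), as in the
    first row of a synthetic-division tableau. Returns (quotient, remainder),
    where quotient is [b_n, ..., b_1] (coefficients of the degree n-1
    quotient, highest first) and remainder is b_0 = p(x0).
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    b = [coeffs[0]]                  # b_n := a_n
    for a in coeffs[1:]:
        b.append(a + b[-1] * x0)     # b_{i-1} := a_{i-1} + b_i * x0
    return b[:-1], b[-1]


def horner_eval(coeffs, x0):
    """Return p(x0), which is the final value b_0 of the recurrence."""
    return horner_divide(coeffs, x0)[1]


# f(x) = 2x^3 - 6x^2 + 2x - 1 at x = 3: the third row of the tableau is 2, 0, 2, 5.
quotient, remainder = horner_divide([2, -6, 2, -1], 3)
print(quotient, remainder)   # [2, 0, 2] 5
```

Passing the coefficients [1, -6, 11, -6] with x0 = 2 reproduces the quotient x^2 - 4x + 3 and remainder 0 of the long-division example.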
By the polynomial remainder theorem, this remainder is f(3). Thus, f(3) = 5. In this example, with a_3 = 2, a_2 = -6, a_1 = 2, a_0 = -1, the entries in the third row are b_3 = 2, b_2 = 0, b_1 = 2, b_0 = 5. Synthetic division (which was actually invented and published by Ruffini 10 years before Horner's publication) is easier to use, and it can be shown to be equivalent to Horner's method.

By the division identity above, the first three entries in the third row are the coefficients of the second-degree quotient of f(x) on division by x - 3, namely 2x^2 + 0x + 2 = 2x^2 + 2, and the remainder is 5. This makes Horner's method useful for polynomial long division.

Divide x^3 - 6x^2 + 11x - 6 by x - 2:

\begin{array}{cc} \begin{array}{r} \\ 2 \\ \\ \\ \end{array} & \begin{array}{|rrrr} \ 1 & -6 & 11 & -6 \\ & 2 & -8 & 6 \\ \hline 1 & -4 & 3 & 0 \end{array} \end{array}

The quotient is x^2 - 4x + 3, and the remainder is 0, so x - 2 is a factor.

Let f_1(x) = 4x^4 - 6x^3 + 3x - 5 and f_2(x) = 2x - 1. Divide f_1(x) by f_2(x) using Horner's method. Since f_2(x) = 2(x - 0.5), the tableau uses x_0 = 0.5 and divides by the leading coefficient 2 of f_2:

\begin{array}{cc} \begin{array}{r} \\ 0.5 \\ \\ \\ \end{array} & \begin{array}{|rrrrr} \ 4 & -6 & 0 & 3 & -5 \\ & 2 & -2 & -1 & 1 \\ \hline 2 & -2 & -1 & 1 & -4 \end{array} \end{array}

The third row is the sum of the first two rows, divided by 2, except for the last entry (the remainder, -4), which is not divided. Each entry in the second row is the product of 1 (that is, x_0 = 0.5 multiplied by the leading coefficient 2 of f_2) with the third-row entry to the left. The answer is

\frac{f_1(x)}{f_2(x)} = 2x^3 - 2x^2 - x + 1 - \frac{4}{2x - 1}.

Efficiency

Evaluation using the monomial form of a degree-n polynomial requires at most n additions and (n^2+n)/2 multiplications, if powers are calculated by repeated multiplication and each monomial is evaluated individually. The cost can be reduced to n additions and 2n-1 multiplications by evaluating the powers of x by iteration. If numerical data are represented in terms of digits (or bits), then the naive algorithm also entails storing approximately 2n times the number of bits of x: the evaluated polynomial has approximate magnitude x^n, and one must also store x^n itself.
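To make these operation counts concrete, the following sketch evaluates the monomial form in the two ways just described and, for contrast, in Horner's nested form, counting the multiplications and additions performed on coefficients and x. The helper names are hypothetical, and here the coefficients are assumed to be listed from a_0 up to a_n.

```python
def eval_naive(a, x):
    """Monomial form, each power x^i rebuilt by repeated multiplication.

    a[i] is the coefficient a_i. Returns (value, multiplications, additions).
    """
    value, mults, adds = a[0], 0, 0
    for i in range(1, len(a)):
        term = a[i]
        for _ in range(i):          # a_i * x * ... * x: i multiplications
            term *= x
            mults += 1
        value += term
        adds += 1
    return value, mults, adds


def eval_iterated_powers(a, x):
    """Monomial form, reusing x^(i-1) to get x^i with one multiplication."""
    value, mults, adds = a[0], 0, 0
    power = None
    for i in range(1, len(a)):
        if power is None:
            power = x                # x^1 costs nothing
        else:
            power *= x               # x^i = x^(i-1) * x
            mults += 1
        value += a[i] * power        # one multiplication per coefficient
        mults += 1
        adds += 1
    return value, mults, adds


def eval_horner(a, x):
    """Nested form: one multiplication and one addition per step."""
    value, mults, adds = a[-1], 0, 0
    for coeff in reversed(a[:-1]):
        value = value * x + coeff
        mults += 1
        adds += 1
    return value, mults, adds


# f(x) = 2x^3 - 6x^2 + 2x - 1, so n = 3; coefficients from a_0 to a_3.
f = [-1, 2, -6, 2]
for evaluate in (eval_naive, eval_iterated_powers, eval_horner):
    print(evaluate.__name__, evaluate(f, 3))
```

For n = 3 the three strategies return the same value 5 with 6, 5 and 3 multiplications respectively, matching (n^2+n)/2, 2n-1 and n.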
By contrast, Horner's method requires only n additions and n multiplications, and its storage requirements are only n times the number of bits of x. Alternatively, Horner's method can be computed with n fused multiply–adds. Horner's method can also be extended to evaluate the first k derivatives of the polynomial with kn additions and multiplications.

Horner's method is optimal, in the sense that any algorithm to evaluate an arbitrary polynomial must use at least as many operations. Alexander Ostrowski proved in 1954 that the number of additions required is minimal. Victor Pan proved in 1966 that the number of multiplications is minimal. However, when x is a matrix, Horner's method is not optimal.

This optimality result assumes that the polynomial is evaluated in monomial form and that no preconditioning of the representation is allowed, which makes sense if the polynomial is evaluated only once. However, if preconditioning is allowed and the polynomial is to be evaluated many times, then faster algorithms are possible. They involve a transformation of the representation of the polynomial. In general, a degree-n polynomial can be evaluated using only \lfloor n/2 \rfloor + 2 multiplications and n additions.

Parallel evaluation

A disadvantage of Horner's rule is that all of the operations are sequentially dependent, so it is not possible to take advantage of instruction-level parallelism on modern computers. In most applications where the efficiency of polynomial evaluation matters, many low-order polynomials are evaluated simultaneously (for each pixel or polygon in computer graphics, or for each grid square in a numerical simulation), so it is not necessary to find parallelism within a single polynomial evaluation.
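Parallelism across many independent evaluations, rather than within one, can be illustrated with a small sketch. The hypothetical function horner_many advances all evaluations one Horner step at a time: the outer loop over coefficients is inherently sequential, but the updates inside each step do not depend on one another, which is the structure vector units or GPUs can exploit. Plain Python lists are used only to show the data flow; they are not themselves executed in parallel.

```python
def horner_many(coeffs, xs):
    """Evaluate one polynomial at many points in lock-step.

    coeffs lists a_n, ..., a_0 (highest degree first). Each pass of the outer
    loop is one Horner step; the per-point updates within a pass are mutually
    independent and could run in separate SIMD lanes or threads.
    """
    acc = [coeffs[0]] * len(xs)                        # b_n for every point
    for a in coeffs[1:]:
        acc = [b * x + a for b, x in zip(acc, xs)]     # b_{i-1} := a_{i-1} + b_i x
    return acc


# f(x) = 2x^3 - 6x^2 + 2x - 1 at four points at once.
print(horner_many([2, -6, 2, -1], [0, 1, 2, 3]))       # [-1, -3, -5, 5]
```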
If, however, one is evaluating a single polynomial of very high order, it may be useful to break it up as follows (taking a_i = 0 for i > n):

\begin{align} p(x) & = \sum_{i=0}^n a_i x^i \\[1ex] & = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n \\[1ex] & = \left( a_0 + a_2 x^2 + a_4 x^4 + \cdots\right) + \left(a_1 x + a_3 x^3 + a_5 x^5 + \cdots \right) \\[1ex] & = \left( a_0 + a_2 x^2 + a_4 x^4 + \cdots\right) + x \left(a_1 + a_3 x^2 + a_5 x^4 + \cdots \right) \\[1ex] & = \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i} x^{2i} + x \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i+1} x^{2i} \\[1ex] & = p_0(x^2) + x p_1(x^2), \end{align}

where p_0(y) = \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i} y^i and p_1(y) = \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i+1} y^i collect the even- and odd-indexed coefficients. More generally, the summation can be broken into k parts:

p(x) = \sum_{i=0}^n a_i x^i = \sum_{j=0}^{k-1} x^j \sum_{i=0}^{\lfloor n/k \rfloor} a_{ki+j} x^{ki} = \sum_{j=0}^{k-1} x^j p_j(x^k),

where the inner summations may be evaluated using separate parallel instances of Horner's method. This requires slightly more operations than the basic Horner's method, but allows k-way SIMD execution of most of them. Modern compilers generally evaluate polynomials this way when advantageous, although for floating-point calculations this requires enabling (unsafe) reassociative math. Another use of breaking a polynomial down this way is to calculate steps of the inner summations in an alternating fashion to take advantage of instruction-level parallelism.

Application to floating-point multiplication and division

Horner's method is a fast, code-efficient method for multiplication and division of binary numbers on a microcontroller with no hardware multiplier. One of the binary numbers to be multiplied is represented as a trivial polynomial in which (using the above notation) x = 2 and every retained coefficient is a_i = 1, since terms for zero bits are omitted. Then x (or x to some power) is repeatedly factored out; in this binary numeral system (base 2), that means powers of 2 are repeatedly factored out.
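Treating the bits of one factor as the coefficients a_i of a polynomial in x = 2 turns multiplication into a Horner evaluation in which every multiplication by x is a shift. The following is a minimal sketch for non-negative integers, assuming Python's unbounded integers in place of a fixed-width register; the name shift_add_multiply is hypothetical. It walks the bits of d from most to least significant, so only left shifts occur and the result is exact.

```python
def shift_add_multiply(d, m):
    """Compute d * m for non-negative integers using only shifts and additions.

    The bits of d are the coefficients a_i of d = sum a_i 2^i, and the product
    is evaluated with Horner's rule at x = 2: acc := 2 * acc + a_i * m, taking
    bits from the most significant to the least significant.
    """
    if d < 0 or m < 0:
        raise ValueError("this sketch handles non-negative integers only")
    acc = 0
    for bit in bin(d)[2:]:          # bits of d, most significant first
        acc <<= 1                   # multiply by x = 2
        if bit == "1":
            acc += m                # add a_i * m (a_i is 0 or 1)
    return acc


print(shift_add_multiply(13, 11))   # 143
```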
Example

For example, to find the product of two numbers (0.15625) and m:

\begin{align} (0.15625) m & = (0.00101_b) m = \left( 2^{-3} + 2^{-5} \right) m = 2^{-3} m + 2^{-5} m \\ & = 2^{-3} \left(m + 2^{-2} m\right). \end{align}

Method

To find the product of two binary numbers d and m:

1. A register holding the intermediate result is initialized to d.
2. Begin with the least significant (rightmost) non-zero bit in m.
3. If all the non-zero bits were counted, then the intermediate result register now holds the final result. Otherwise, add d to the intermediate result, and continue in step 2 with the next most significant bit in m.

Derivation

In general, for a binary number with bit values (d_3 d_2 d_1 d_0) the product is

(d_3 2^3 + d_2 2^2 + d_1 2^1 + d_0 2^0) m = d_3 2^3 m + d_2 2^2 m + d_1 2^1 m + d_0 2^0 m.

At this stage in the algorithm, terms with zero-valued coefficients are dropped, so that only binary coefficients equal to one are counted. Division by zero is therefore not an issue, despite the ratios that appear in the factored equation

= d_0\left(m + 2 \frac{d_1}{d_0} \left(m + 2 \frac{d_2}{d_1} \left(m + 2 \frac{d_3}{d_2} (m)\right)\right)\right).

Because every remaining coefficient equals one, the ratios all equal one, and this reduces to

= d_0(m + 2 d_1 (m + 2 d_2 (m + 2 d_3 (m)))),

or equivalently, factoring out the weight of the most significant bit instead (as in the example above, and consistent with the method described above),

= 2^3 d_3 (m + 2^{-1} d_2 (m + 2^{-1} d_1 (m + 2^{-1} d_0 m))).

In binary (base-2) math, multiplication by a power of 2 is merely a register shift operation. Thus, multiplying by 2 is calculated in base 2 by an arithmetic shift. A factor of 2^{-1} is a right arithmetic shift, a factor of 2^0 results in no operation (since 2^0 = 1 is the multiplicative identity element), and a factor of 2^1 results in a left arithmetic shift.
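The right-shift form of the factorization, which starts from the least significant non-zero bit as in the example 0.15625 m = 2^{-3}(m + 2^{-2} m), can be sketched as follows. All of the following are illustrative assumptions: the fractional factor is given as a positive integer numerator over 2^F (so 0.15625 = 0b00101 / 2^5), m is a non-negative integer standing in for a fixed-point register, and the function name is hypothetical. Zero bits are skipped, mirroring the rule that zero-valued terms are dropped; each intermediate right shift truncates, which is one way such shift-based multiplication loses accuracy.

```python
def shift_multiply_fraction(m, numerator, frac_bits):
    """Approximate m * (numerator / 2**frac_bits) with shifts and additions.

    Follows the nested right-shift form, e.g.
        0.15625 * m = 2^-3 * (m + 2^-2 * m):
    start at the least significant non-zero bit of the factor, and for each
    more significant non-zero bit shift the register right by the gap between
    the two bits and add m. A final shift applies the weight of the most
    significant bit. Right shifts truncate, so the result can be slightly low.
    """
    if m < 0 or numerator <= 0:
        raise ValueError("sketch assumes m >= 0 and a positive factor")
    # Positions of the non-zero bits of the factor, least significant first.
    positions = [i for i in range(numerator.bit_length()) if (numerator >> i) & 1]
    acc = m                                          # innermost term: m
    for lower, upper in zip(positions, positions[1:]):
        acc = m + (acc >> (upper - lower))           # m + 2^-(gap) * (inner)
    weight = positions[-1] - frac_bits               # exponent of the top bit
    return acc << weight if weight >= 0 else acc >> -weight


# 0.15625 = 0b00101 / 2**5; exact here because 1024 is divisible by 32.
print(shift_multiply_fraction(1024, 0b00101, 5))     # 160
```

With m = 7 and the factor 3 (numerator 3, frac_bits 0) the sketch returns 20 instead of 21, because the inner right shift discards the low bit of 7.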
The multiplication product can now be quickly calculated using only arithmetic shift operations, addition and subtraction. The method is particularly fast on processors supporting a single-instruction shift-and-addition-accumulate. Compared to a C floating-point library, Horner's method sacrifices some accuracy; however, it is nominally 13 times faster (16 times faster when the "canonical signed digit" (CSD) form is used) and uses only 20% of the code space.

Other applications

Horner's method can be used to convert between different positional numeral systems, in which case x is the base of the number system and the a_i coefficients are the digits of the base-x representation of a given number. It can also be used if x is a matrix, in which case the gain in computational efficiency is even greater. However, for such cases faster methods are known.

Polynomial root finding