Toom–Cook multiplication

This section discusses exactly how to perform Toom-k for any given value of k, and is a simplification of a description of Toom–Cook polynomial multiplication described by Marco Bodrato. The algorithm has five main steps: • Splitting • Evaluation • Pointwise multiplication • Interpolation • Recomposition In a typical large integer implementation, each integer is represented as a sequence of digits in positional notation, with the base or radix set to some (typically large) value b; for this example we use b = 10000, so that each digit corresponds to a group of four decimal digits (in a computer implementation, b would typically be a power of 2 instead). Say the two integers being multiplied are: : These are much smaller than would normally be processed with Toom–Cook (grade-school multiplication would be faster) but they will serve to illustrate the algorithm. Splitting In Toom-k, we want to split the factors into k parts. The first step is to select the base B = bi, such that the number of digits of both m and n in base B is at most k (e.g., 3 in Toom-3). A typical choice for i is given by: :i = \max\left\{\left\lfloor\frac{\left\lfloor\log_b m\right\rfloor}{k}\right\rfloor, \left\lfloor\frac{\left\lfloor\log_b n\right\rfloor}{k}\right\rfloor\right\} + 1. In our example we'll be doing Toom-3, so we choose . We then separate m and n into their base B digits mi, ni: : \begin{align} m_2 & {} = 123456 \\ m_1 & {} = 78901234 \\ m_0 & {} = 56789012 \\ n_2 & {} = 98765 \\ n_1 & {} = 43219876 \\ n_0 & {} = 54321098 \end{align} We then use these digits as coefficients in degree- polynomials p and q, with the property that p(B) = m and q(B) = n: :p(x) = m_2x^2 + m_1x + m_0 = 123456x^2 + 78901234x + 56789012 \, :q(x) = n_2x^2 + n_1x + n_0 = 98765x^2 + 43219876x + 54321098 \, The purpose of defining these polynomials is that if we can compute their product , our answer will be . In the case where the numbers being multiplied are of different sizes, it's useful to use different values of k for m and n, which we'll call km and kn. For example, the algorithm "Toom-2.5" refers to Toom–Cook with km = 3 and kn = 2. In this case the i in B = bi is typically chosen by: :i = \max\left\{\left\lfloor\frac{\left\lceil\log_b m\right\rceil}{k_m}\right\rfloor, \left\lfloor\frac{\left\lceil\log_b n\right\rceil}{k_n}\right\rfloor\right\}. Evaluation The Toom–Cook approach to computing the polynomial product p(x)q(x) is a commonly used one. Note that a polynomial of degree d is uniquely determined by d+1 points (for example, a line – polynomial of degree one is specified by two points). The idea is to evaluate p(\cdot) and q(\cdot) at various points. Then multiply their values at these points to get points on the product polynomial. Finally interpolate to find its coefficients. Since \deg(pq) = \deg(p) + \deg(q), we will need \deg(p) + \deg(q) + 1 = k_m + k_n - 1 points to determine the final result. Call this d. In the case of Toom-3, d = 5. The algorithm will work no matter what points are chosen (with a few small exceptions, see matrix invertibility requirement in Interpolation), but in the interest of simplifying the algorithm it's better to choose small integer values like 0, 1, −1, and −2. One unusual point value that is frequently used is infinity, written \infty or 1/0. To "evaluate" a polynomial p at infinity actually means to take the limit of p(x)/x^{\deg p} as x goes to infinity. Consequently, p(\infty) is always the value of its highest-degree coefficient (in the example above coefficient m_2). In our Toom-3 example, we will use the points 0, 1, -1, -2, and \infty. These choices simplify evaluation, producing the formulas: : \begin{array}{lrlrlr} p(0) & = & m_0 + m_1(0) + m_2(0)^2 & = & m_0 \\ p(1) & = & m_0 + m_1(1) + m_2(1)^2 & = & m_0 + m_1 + m_2 \\ p(-1) & = & m_0 + m_1(-1) + m_2(-1)^2 & = & m_0 - m_1 + m_2 \\ p(-2) & = & m_0 + m_1(-2) + m_2(-2)^2 & = & m_0 - 2m_1 + 4m_2 \\ p(\infty) & = & m_2 & & \end{array} and analogously for q. In our example, the values we get are: : \begin{array}{lrlrlr} p(0) & = & m_0 & = & 56789012 & = & 56789012 \\ p(1) & = & m_0 + m_1 + m_2 & = & 56789012 + 78901234 + 123456 & = & 135813702 \\ p(-1) & = & m_0 - m_1 + m_2 & = & 56789012 - 78901234 + 123456 & = & -21988766 \\ p(-2) & = & m_0 - 2m_1 + 4m_2 & = & 56789012 - 2\times 78901234 + 4\times 123456 & = & -100519632 \\ p(\infty) & = & m_2 & = & 123456 & = & 123456 \\[4pt] q(0) & = & n_0 & = & 54321098 & = & 54321098 \\ q(1) & = & n_0 + n_1 + n_2 & = & 54321098 + 43219876 + 98765 & = & 97639739 \\ q(-1) & = & n_0 - n_1 + n_2 & = & 54321098 - 43219876 + 98765 & = & 11199987 \\ q(-2) & = & n_0 - 2n_1 + 4n_2 & = & 54321098 - 2\times 43219876 + 4\times 98765 & = & -31723594 \\ q(\infty) & = & n_2 & = & 98765 & = & 98765 \end{array} As shown, these values may be negative. For the purpose of later explanation, it will be useful to view this evaluation process as a matrix-vector multiplication, where each row of the matrix contains powers of one of the evaluation points, and the vector contains the coefficients of the polynomial: : \left(\begin{matrix}p(0) \\ p(1) \\ p(-1) \\ p(-2) \\ p(\infty)\end{matrix}\right) = \left(\begin{matrix} 0^0 & 0^1 & 0^2 \\ 1^0 & 1^1 & 1^2 \\ (-1)^0 & (-1)^1 & (-1)^2 \\ (-2)^0 & (-2)^1 & (-2)^2 \\ 0 & 0 & 1 \end{matrix}\right) \left(\begin{matrix}m_0 \\ m_1 \\ m_2\end{matrix}\right) = \left(\begin{matrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & -2 & 4 \\ 0 & 0 & 1 \end{matrix}\right) \left(\begin{matrix}m_0 \\ m_1 \\ m_2\end{matrix}\right). The dimensions of the matrix are d by km for p and d by kn for q. The row for infinity is always all zero except for a 1 in the last column. Faster evaluation Multipoint evaluation can be obtained faster than with the above formulas. The number of elementary operations (addition/subtraction) can be reduced. The sequence given by Bodrato for Toom-3, executed here over the first operand (polynomial p) of the running example is the following: \begin{array}{l c l c l c r} p_0 & \leftarrow & m_0 + m_2 & = & 56789012 + 123456 & = & 56912468 \\ p(0) & = & m_0 & = & 56789012 & = & 56789012 \\ p(1) & = & p_0 + m_1 & = & 56912468 + 78901234 & = & 135813702 \\ p(-1) & = & p_0 - m_1 & = & 56912468 - 78901234 & = & -21988766 \\ p(-2) & = & (p(-1) + m_2) \times 2 - m_0 & = & (-21988766 + 123456) \times 2 - 56789012 & = & -100519632 \\ p(\infty) & = & m_2 & = & 123456 & = & 123456. \end{array} This sequence requires five addition/subtraction operations, one less than the straightforward evaluation. Moreover the multiplication by 4 in the calculation of p(-2) was saved. Pointwise multiplication Unlike multiplying the polynomials p(\cdot) and q(\cdot), multiplying the evaluated values p(a) and q(a) just involves multiplying integers — a smaller instance of the original problem. We recursively invoke our multiplication procedure to multiply each pair of evaluated points. In practical implementations, as the operands become smaller, the algorithm will switch to schoolbook long multiplication. Letting r be the product polynomial, in our example we have: \begin{array}{l c l c l c r} r(0) & = & p(0)\,q(0) & = & 56789012 \times 54321098 & = & 3084841486175176 \\ r(1) & = & p(1)\,q(1) & = & 135813702 \times 97639739 & = & 13260814415903778 \\ r(-1) & = & p(-1)\,q(-1) & = & -21988766 \times 11199987 & = & -246273893346042 \\ r(-2) & = & p(-2)\,q(-2) & = & -100519632 \times -31723594 & = & 3188843994597408 \\ r(\infty) & = & p(\infty)\,q(\infty) & = & 123456 \times 98765 & = & 12193131840. \end{array} As shown, these can also be negative. For large enough numbers, this is the most expensive step, the only step that is not linear in the sizes of m and n. Interpolation This is the most complex step, the reverse of the evaluation step: given our d points on the product polynomial r(\cdot), we need to determine its coefficients. In other words, we want to solve this matrix equation for the vector on the right-hand side: : \begin{align} \left(\begin{matrix}r(0) \\ r(1) \\ r(-1) \\ r(-2) \\ r(\infty)\end{matrix}\right) & {} = \left(\begin{matrix} 0^0 & 0^1 & 0^2 & 0^3 & 0^4 \\ 1^0 & 1^1 & 1^2 & 1^3 & 1^4 \\ (-1)^0 & (-1)^1 & (-1)^2 & (-1)^3 & (-1)^4 \\ (-2)^0 & (-2)^1 & (-2)^2 & (-2)^3 & (-2)^4 \\ 0 & 0 & 0 & 0 & 1 \end{matrix}\right) \left(\begin{matrix}r_0 \\ r_1 \\ r_2 \\ r_3 \\ r_4\end{matrix}\right) \\ & {} = \left(\begin{matrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & -2 & 4 & -8 & 16 \\ 0 & 0 & 0 & 0 & 1 \end{matrix}\right) \left(\begin{matrix}r_0 \\ r_1 \\ r_2 \\ r_3 \\ r_4\end{matrix}\right). \end{align} This matrix is constructed the same way as the one in the evaluation step, except that it's d\times d. We could solve this equation with a technique like Gaussian elimination, but this is too expensive. Instead, we use the fact that, provided the evaluation points were chosen suitably, this matrix is invertible (see also Vandermonde matrix), and so: : \begin{align} \left(\begin{matrix}r_0 \\ r_1 \\ r_2 \\ r_3 \\ r_4\end{matrix}\right) & {} = \left(\begin{matrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & -2 & 4 & -8 & 16 \\ 0 & 0 & 0 & 0 & 1 \end{matrix}\right)^{-1} \left(\begin{matrix}r(0) \\ r(1) \\ r(-1) \\ r(-2) \\ r(\infty)\end{matrix}\right) \\ & {} = \left(\begin{matrix} 1 & 0 & 0 & 0 & 0 \\ \tfrac12 & \tfrac13 & -1 & \tfrac16 & -2 \\ -1 & \tfrac12 & \tfrac12 & 0 & -1 \\ -\tfrac12 & \tfrac16 & \tfrac12 & -\tfrac16 & 2 \\ 0 & 0 & 0 & 0 & 1 \end{matrix}\right) \left(\begin{matrix}r(0) \\ r(1) \\ r(-1) \\ r(-2) \\ r(\infty)\end{matrix}\right). \end{align} All that remains is to compute this matrix-vector product. Although the matrix contains fractions, the resulting coefficients will be integers — so this can all be done with integer arithmetic, just additions, subtractions, and multiplication/division by small constants. A difficult design challenge in Toom–Cook is to find an efficient sequence of operations to compute this product; one sequence given by Bodrato for Toom-3 is the following, executed here over the running example: \begin{array}{l c l c r} r_0 & \leftarrow & r(0) & = & 3084841486175176 \\ r_4 & \leftarrow & r(\infty) & = & 12193131840 \\ r_3 & \leftarrow & (r(-2) - r(1))/3 & = & (3188843994597408 - 13260814415903778)/3 \\ & & & = & -3357323473768790 \\ r_1 & \leftarrow & (r(1) - r(-1))/2 & = & (13260814415903778 - (-246273893346042))/2 \\ & & & = & 6753544154624910 \\ r_2 & \leftarrow & r(-1) - r(0) & = & -246273893346042 - 3084841486175176 \\ & & & = & -3331115379521218 \\ r_3 & \leftarrow & (r_2 - r_3)/2 + 2r(\infty) & = & (-3331115379521218 - (-3357323473768790))/2 + 2 \times 12193131840 \\ & & & = & 13128433387466 \\ r_2 & \leftarrow & r_2 + r_1 - r_4 & = & -3331115379521218 + 6753544154624910 - 12193131840 \\ & & & = & 3422416581971852 \\ r_1 & \leftarrow & r_1 - r_3 & = & 6753544154624910 - 13128433387466 \\ & & & = & 6740415721237444 \end{array} We now know our product polynomial r: : \begin{array}{rrrl} r(x) = && 3084841486175176 & \\ & + & 6740415721237444 & \!\!\!\! x \\ & + & 3422416581971852 & \!\!\!\! x^2 \\ & + & 13128433387466 & \!\!\!\! x^3 \\ & + & 12193131840 & \!\!\!\! x^4 \end{array} If we were using different k_m, k_n, or evaluation points, the matrix and so our interpolation strategy would change; but it does not depend on the inputs and so can be hard-coded for any given set of parameters. Recomposition Finally, we evaluate r(B) to obtain our final answer. This is straightforward since B is a power of b and so the multiplications by powers of B are all shifts by a whole number of digits in base b. In the running example b = 104 and B = b2 = 108. : And this is in fact the product of 1234567890123456789012 and 987654321987654321098. ==Asymmetric splitting==