Binary long multiplication The method taught in school for multiplying decimal numbers is based on calculating partial products, shifting them to the left and then adding them together. The most difficult part is to obtain the partial products, as that involves multiplying a long number by one digit (from 0 to 9): 123 × 456 ===== 738 (this is 123 × 6) 615 (this is 123 × 5, shifted one position to the left) + 492 (this is 123 × 4, shifted two positions to the left) ===== 56088 A binary computer does exactly the same multiplication as decimal numbers do, but with binary numbers. In binary encoding each long number is multiplied by one digit (either 0 or 1), and that is much easier than in decimal, as the product by 0 or 1 is just 0 or the same number. Therefore, the multiplication of two binary numbers comes down to calculating partial products (which are 0 or the first number),
shifting them left, and then adding them together (a binary addition, of course): 1011 (this is binary for decimal 11) × 1110 (this is binary for decimal 14) ====== 0000 (this is 1011 × 0) 1011 (this is 1011 × 1, shifted one position to the left) 1011 (this is 1011 × 1, shifted two positions to the left) + 1011 (this is 1011 × 1, shifted three positions to the left) ========= 10011010 (this is binary for decimal 154) This is much simpler than in the decimal system, as there is no table of multiplication to remember: just shifts and adds. This method is mathematically correct and has the advantage that a small CPU may perform the multiplication by using the shift and add features of its arithmetic logic unit rather than a specialized circuit. The method is slow, however, as it involves many intermediate additions. These additions are time-consuming. Faster multipliers may be engineered in order to do fewer additions; a modern processor might implement a dedicated parallel adder for partial products, letting the multiplication of two 64-bit numbers be done with only 6 rounds of additions, rather than 63.
A concrete view Suppose we want to multiply two
unsigned 8-bit integers together:
a[7:0] and
b[7:0]. We can produce eight partial products by performing eight 1-bit multiplications, one for each bit in multiplicand
a: p0[7:0] = a[0] × b[7:0] = {8{a[0]}} & b[7:0] p1[7:0] = a[1] × b[7:0] = {8{a[1]}} & b[7:0] p2[7:0] = a[2] × b[7:0] = {8{a[2]}} & b[7:0] p3[7:0] = a[3] × b[7:0] = {8{a[3]}} & b[7:0] p4[7:0] = a[4] × b[7:0] = {8{a[4]}} & b[7:0] p5[7:0] = a[5] × b[7:0] = {8{a[5]}} & b[7:0] p6[7:0] = a[6] × b[7:0] = {8{a[6]}} & b[7:0] p7[7:0] = a[7] × b[7:0] = {8{a[7]}} & b[7:0] where the following
Verilog notation is used: • {8{a[0]}} means repeating a[0] (the 0th bit of a) 8 times. •
a[7:0] stands for the selecting
a from its 7th bit to its 0th bit, inclusive, for a total of 8 bits. • & symbol stands for a bitwise AND. In order to obtain our product, we then need to add up all eight of our partial products, as shown here: p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0] + p1[7] p1[6] p1[5] p1[4] p1[3] p1[2] p1[1] p1[0] 0 + p2[7] p2[6] p2[5] p2[4] p2[3] p2[2] p2[1] p2[0] 0 0 + p3[7] p3[6] p3[5] p3[4] p3[3] p3[2] p3[1] p3[0] 0 0 0 + p4[7] p4[6] p4[5] p4[4] p4[3] p4[2] p4[1] p4[0] 0 0 0 0 + p5[7] p5[6] p5[5] p5[4] p5[3] p5[2] p5[1] p5[0] 0 0 0 0 0 + p6[7] p6[6] p6[5] p6[4] p6[3] p6[2] p6[1] p6[0] 0 0 0 0 0 0 + p7[7] p7[6] p7[5] p7[4] p7[3] p7[2] p7[1] p7[0] 0 0 0 0 0 0 0 ----------------------------------------------------------------------------------------------- P[15] P[14] P[13] P[12] P[11] P[10] P[9] P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0] In other words,
P[15:0] is produced by summing
p0,
p1 << 1,
p2 << 2, and so forth, to produce our final unsigned 16-bit product:
P[15:0] = p0[7:0] + (p1[7:0] << 1) + (p2[7:0] << 2) + (p3[7:0] << 3) + (p4[7:0] << 4) + (p5[7:0] << 5) + (p6[7:0] << 6) + (p7[7:0] << 7)
Building on top of smaller blocks Suppose we now want to multiply two
unsigned 16-bit integers together:
u[15:0] and
v[15:0] using the 8-by-8-bit multiplier from before. Using the
partial products method (justified by the associativity of multiplication) and running in 8-bit chunks, we have: p0[15:0] = u[7:0] × v[7:0] p1[15:0] = u[15:8] × v[7:0] p2[15:0] = u[7:0] × v[15:8] p3[15:0] = u[15:8] × v[15:8] The final product would then be: P[31:0] = p0 + p1 << 8 + p2 << 8 + p3 << 16 One notices that if only P[15:0] (the lower 16 bits of the result) is required, no computation of p3 is needed. ==Hardware implementation==