{{short description|Algorithm to multiply two numbers}}
{{Use dmy dates|date=May 2019|cs1-dates=y}}
A '''multiplication algorithm''' is an [[algorithm]] (or method) to [[multiplication|multiply]] two numbers. Depending on the size of the numbers, different algorithms are more efficient than others. Numerous algorithms are known and there has been much research into the topic.
 
The oldest and simplest method, known since [[Ancient history|antiquity]] as '''long multiplication''' or '''grade-school multiplication''', consists of multiplying every digit in the first number by every digit in the second and adding the results. This has a [[time complexity]] of <math>O(n^2)</math>, where ''n'' is the number of digits. When done by hand, this may also be reframed as [[grid method multiplication]] or [[lattice multiplication]]. In software, this may be called "shift and add" due to [[bitshifts]] and addition being the only two operations needed.
 
In 1960, [[Anatoly Karatsuba]] discovered [[Karatsuba multiplication]], unleashing a flood of research into fast multiplication algorithms. This method uses three multiplications rather than four to multiply two two-digit numbers. (A variant of this can also be used to multiply [[complex numbers]] quickly.) Done [[recursively]], this has a time complexity of <math>O(n^{\log_2 3})</math>. Splitting numbers into more than two parts results in [[Toom–Cook multiplication]]; for example, using three parts results in the '''Toom-3''' algorithm. Using many parts can set the exponent arbitrarily close to 1, but the constant factor also grows, making it impractical.
 
In 1968, the [[Schönhage–Strassen algorithm]], which makes use of a [[Fourier transform]] over a [[Modulus (modular arithmetic)|modulus]], was discovered. It has a time complexity of <math>O(n\log n\log\log n)</math>. In 2007, [[Martin Fürer]] proposed an algorithm with complexity <math>O(n\log n 2^{\Theta(\log^* n)})</math>. In 2014, Harvey, [[Joris van der Hoeven]], and Lecerf proposed one with complexity <math>O(n\log n 2^{3\log^* n})</math>, thus making the [[implicit constant]] explicit; this was improved to <math>O(n\log n 2^{2\log^* n})</math> in 2018. Lastly, in 2019, Harvey and van der Hoeven came up with a [[galactic algorithm]] with complexity <math>O(n\log n)</math>. This matches a guess by Schönhage and Strassen that this would be the optimal bound, although this remains a [[conjecture]] today.
 
Integer multiplication algorithms can also be used to multiply polynomials by means of the method of [[Kronecker substitution]].
 
==Long multiplication==
If a [[numeral system|positional numeral system]] is used, a natural way of multiplying numbers is taught in schools
as '''long multiplication''', sometimes called '''grade-school multiplication''', sometimes called the '''Standard Algorithm''':
multiply the [[wikt:multiplicand|multiplicand]] by each digit of the [[wikt:multiplier|multiplier]] and then add up all the properly shifted results. It requires memorization of the [[multiplication table]] for single digits.
 
This is the usual algorithm for multiplying larger numbers by hand in base 10. A person doing long multiplication on paper will write down all the products and then add them together; an [[abacus]]-user will sum the products as soon as each one is computed.
 
===Example===
This example uses ''long multiplication'' to multiply 23,958,233 (multiplicand) by 5,830 (multiplier) and arrives at 139,676,498,390 for the result (product).
       23958233
 ×         5830
 ———————————————
       00000000 ( =      23,958,233 ×     0)
      71874699  ( =      23,958,233 ×    30)
    191665864   ( =      23,958,233 ×   800)
 + 119791165    ( =      23,958,233 × 5,000)
 ———————————————
   139676498390 ( = 139,676,498,390        )
 
====Other notations====
In some countries such as [[Germany]], the above multiplication is depicted similarly but with the original product kept horizontal and computation starting with the first digit of the multiplier:<ref>{{Cite web |title=Multiplication |url=https://www.mathematische-basteleien.de/multiplication.htm |access-date=2022-03-15 |website=www.mathematische-basteleien.de}}</ref>
 
 23958233 · 5830
 ———————————————
 119791165
  191665864
    71874699
     00000000
 ———————————————
 139676498390
 
The pseudocode below describes the process of the above multiplication. It keeps only one row to maintain the sum, which finally becomes the result. Note that the '+=' operator is used to denote sum to existing value and store operation (akin to languages such as Java and C) for compactness.
 
<syntaxhighlight lang="pascal" line>
multiply(a[1..p], b[1..q], base)                   // Operands containing rightmost digits at index 1
  product = [1..p+q]                               // Allocate space for result (initialized to zero)
  for b_i = 1 to q                                 // for all digits in b
    carry = 0
    for a_i = 1 to p                               // for all digits in a
      product[a_i + b_i - 1] += carry + a[a_i] * b[b_i]
      carry = floor(product[a_i + b_i - 1] / base)
      product[a_i + b_i - 1] = product[a_i + b_i - 1] mod base
    product[b_i + p] = carry                       // last digit comes from final carry
  return product
</syntaxhighlight>
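For illustration, the pseudocode above translates directly into the following runnable Python sketch (an editorial addition; the function name and 0-based digit lists are conventions of this sketch, not of the article):
<syntaxhighlight lang="python">
def long_multiply(a, b, base=10):
    """Schoolbook multiplication; operands hold least significant digits at index 0."""
    product = [0] * (len(a) + len(b))      # allocate space for result
    for bi in range(len(b)):               # for all digits in b
        carry = 0
        for ai in range(len(a)):           # for all digits in a
            product[ai + bi] += carry + a[ai] * b[bi]
            carry = product[ai + bi] // base
            product[ai + bi] %= base
        product[bi + len(a)] = carry       # last digit comes from the final carry
    return product

# 23,958,233 × 5,830 with digits stored least significant first:
digits = long_multiply([3, 3, 2, 8, 5, 9, 3, 2], [0, 3, 8, 5])
assert int("".join(map(str, digits[::-1]))) == 139676498390
</syntaxhighlight>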
 
===Usage in computers<span class="anchor" id="Shift and add"></span>===
Some [[Integrated circuit|chips]] implement long multiplication, in [[computer hardware|hardware]] or in [[microcode]], for various integer and floating-point word sizes. In [[arbitrary-precision arithmetic]], it is common to use long multiplication with the base set to 2<sup>''w''</sup>, where ''w'' is the number of bits in a word, for multiplying relatively small numbers. To multiply two numbers with ''n'' digits using this method, one needs about ''n''<sup>2</sup> operations. More formally, multiplying two ''n''-digit numbers using long multiplication requires [[Bachmann-Landau notation|Θ]](''n''<sup>2</sup>) single-digit operations (additions and multiplications).

When implemented in software, long multiplication algorithms must deal with overflow during additions, which can be expensive. A typical solution is to represent the number in a small base, ''b'', such that, for example, 8''b'' is a representable machine integer. Several additions can then be performed before an overflow occurs. When the number becomes too large, we add part of it to the result, or we carry and map the remaining part back to a number that is less than ''b''. This process is called ''normalization''. Richard Brent used this approach in his Fortran package, MP.<ref>{{cite journal|first1=Richard P|last1=Brent|title=A Fortran Multiple-Precision Arithmetic Package. |doi=10.1145/355769.355775|journal=ACM Transactions on Mathematical Software|date=March 1978|volume=4|pages=57–70|citeseerx=10.1.1.117.8425|s2cid=8875817}}</ref>

Computers initially used a very similar algorithm to long multiplication in base 2, but modern processors have optimized circuitry for fast multiplications using more efficient algorithms, at the price of a more complex hardware realization.{{cn|date=March 2022}} In base two, long multiplication is sometimes called '''"shift and add"''', because the algorithm simplifies and just consists of shifting left (multiplying by powers of two) and adding. Most currently available microprocessors implement this or other similar algorithms (such as [[Booth encoding]]) for various integer and floating-point sizes in [[hardware multiplier]]s or in [[microcode]].{{cn|date=March 2022}}

On currently available processors, a bit-wise shift instruction is usually (but not always) faster than a multiply instruction and can be used to multiply (shift left) and divide (shift right) by powers of two. Multiplication by a constant and [[division algorithm#Division by a constant|division by a constant]] can be implemented using a sequence of shifts and adds or subtracts. For example, there are several ways to multiply by 10 using only bit-shift and addition.
<syntaxhighlight lang="php">
((x << 2) + x) << 1 # Here 10*x is computed as (x*2^2 + x)*2
(x << 3) + (x << 1) # Here 10*x is computed as x*2^3 + x*2
</syntaxhighlight>
In some cases such sequences of shifts and adds or subtracts will outperform hardware multipliers and especially dividers. A division by a number of the form <math>2^n</math> or <math>2^n \pm 1</math> often can be converted to such a short sequence.

These types of sequences have to always be used for computers that do not have a "multiply" instruction,<ref>[http://techref.massmind.org/techref/method/math/muldiv.htm "Novel Methods of Integer Multiplication and Division"] by G. Reichborn-Kjennerud</ref> and can also be used by extension to floating point numbers if one replaces the shifts with computation of ''2*x'' as ''x+x'', as these are logically equivalent.

===Optimizing space complexity===
{{unreferenced section|date=September 2012}}
Let ''n'' be the total number of digits in the two input numbers in [[Radix|base]] ''D''. If the result must be kept in memory, then the space complexity is trivially Θ(''n''). However, in certain applications, the entire result need not be kept in memory; instead, the digits of the result can be streamed out as they are computed (for example, to the system console or a file). In these scenarios, long multiplication has the advantage that it can easily be formulated as a [[FL (complexity)|log space]] algorithm; that is, an algorithm that only needs working space proportional to the logarithm of the number of digits in the input ([[Bachmann-Landau notation|Θ]](log&nbsp;''n'')). This is the ''double'' logarithm of the numbers being multiplied themselves (log&nbsp;log&nbsp;''N''). Note that the operands themselves still need to be kept in memory, and their Θ(''n'') space is not considered in this analysis.

The method is based on the observation that each digit of the result can be computed from right to left while knowing only the carry from the previous step. Let ''a''<sub>''i''</sub> and ''b''<sub>''i''</sub> be the ''i''-th digits of the operands, with ''a'' and ''b'' padded on the left by zeros to be length ''n'', let ''r''<sub>''i''</sub> be the ''i''-th digit of the result, and let ''c''<sub>''i''</sub> be the carry generated for ''r''<sub>''i''</sub> (''i''&nbsp;=&nbsp;1 is the rightmost digit). Then

:<math>\begin{align}
r_i &= \left( c_{i-1} + \sum_{j+k=i+1} a_j b_k \right) \bmod D \\
c_i &= \left\lfloor \left( c_{i-1} + \sum_{j+k=i+1} a_j b_k \right) / D \right\rfloor \\
c_0 &= 0
\end{align}</math>
or, unrolling the recurrence,
:<math>
c_i = \left\lfloor \sum_{m = 0}^{i - 1} \frac{1}{D^{m+1}} \sum_{j + k = i + 1 - m} a_j b_k \right\rfloor.
</math>

A simple inductive argument shows that the carry can never exceed ''n'' and the total sum for ''r''<sub>''i''</sub> can never exceed ''D''&nbsp;·&nbsp;''n'': the carry into the first column is zero, and for all other columns, there are at most ''n'' digit products in the column, and a carry of at most ''n'' from the previous column (by the induction hypothesis). The sum is at most ''D''&nbsp;·&nbsp;''n'', and the carry to the next column is at most ''D''&nbsp;·&nbsp;''n''&nbsp;/&nbsp;''D'', or ''n''. Thus both these values can be stored in O(log&nbsp;''n'') digits.

In pseudocode, the log-space algorithm is:
<syntaxhighlight lang="pascal">
multiply(a[1..p], b[1..q], base)                 // Operands containing rightmost digits at index 1
  tot = 0
  for ri = 1 to p + q - 1                        // For each digit of result
    for bi = MAX(1, ri - p + 1) to MIN(ri, q)    // Digits from b that need to be considered
      ai = ri - bi + 1                           // Digits from a follow "symmetry"
      tot = tot + (a[ai] * b[bi])
    product[ri] = tot mod base
    tot = floor(tot / base)
  product[p+q] = tot mod base                    // Last digit of the result comes from last carry
  return product
</syntaxhighlight>
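A 0-based Python rendering of the same log-space routine follows (an editorial sketch, using the digit conventions of the earlier example; only the running value <code>tot</code> and one result digit at a time are needed, so each digit could be streamed out as soon as it is computed):
<syntaxhighlight lang="python">
def multiply_logspace(a, b, base=10):
    """Log-space long multiplication; least significant digits at index 0."""
    p, q = len(a), len(b)
    product = [0] * (p + q)
    tot = 0
    for ri in range(p + q - 1):                  # for each digit of the result
        for bi in range(max(0, ri - p + 1), min(ri, q - 1) + 1):
            ai = ri - bi                         # digits from a follow the "symmetry"
            tot += a[ai] * b[bi]
        product[ri] = tot % base                 # this digit could be streamed out here
        tot //= base
    product[p + q - 1] = tot % base              # last digit comes from the last carry
    return product

assert multiply_logspace([3, 3, 2, 8, 5, 9, 3, 2], [0, 3, 8, 5]) == \
    [0, 9, 3, 8, 9, 4, 6, 7, 6, 9, 3, 1]         # 139,676,498,390 reversed
</syntaxhighlight>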
 
==Algorithms for multiplying by hand==
 
In addition to the standard long multiplication, there are several other methods used to perform multiplication by hand. Such algorithms may be devised for speed, ease of calculation, or educational value, particularly when computers or [[multiplication table]]s are unavailable.
 
===Grid method===
{{main|Grid method multiplication}}
The [[grid method multiplication|grid method]] (or box method) is an introductory method for multiple-digit multiplication that is often taught to pupils at [[primary school]] or [[elementary school]]. It has been a standard part of the national primary school mathematics curriculum in England and Wales since the late 1990s.<ref>{{cite news |first=Gary |last=Eason |url=http://news.bbc.co.uk/1/hi/education/639937.stm |title=Back to school for parents |publisher=[[BBC News]] |date=13 February 2000}}<br>{{cite news |first=Rob |last=Eastaway |author-link=Rob Eastaway |url=https://www.bbc.co.uk/news/magazine-11258175 |title=Why parents can't do maths today |publisher=BBC News |date=10 September 2010}}</ref>
 
Both factors are broken up ("partitioned") into their hundreds, tens and units parts, and the products of the parts are then calculated explicitly in a relatively simple multiplication-only stage, before these contributions are then totalled to give the final answer in a separate addition stage.
 
The calculation 34 × 13, for example, could be computed using the grid:
<div style="float:right">
<pre> 300
40
90
+ 12
————
442</pre></div>
{| class="wikitable" style="text-align: center;"
! width="40" scope="col" | ×
! width="40" scope="col" | 30
! width="40" scope="col" | 4
|-
! scope="row" | 10
|300
|40
|-
! scope="row" | 3
|90
|12
|}
 
followed by addition to obtain 442, either in a single sum (see right), or through forming the row-by-row totals
: (300 + 40) + (90 + 12) = 340 + 102 = 442.
 
This calculation approach (though not necessarily with the explicit grid arrangement) is also known as the [[partial products algorithm]]. Its essence is the calculation of the simple multiplications separately, with all addition being left to the final gathering-up stage.
 
The grid method can in principle be applied to factors of any size, although the number of sub-products becomes cumbersome as the number of digits increases. Nevertheless, it is seen as a usefully explicit method to introduce the idea of multiple-digit multiplications; and, in an age when most multiplication calculations are done using a calculator or a spreadsheet, it may in practice be the only multiplication algorithm that some students will ever need.
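The partitioning stage and the final gathering-up stage can be made explicit in a few lines of Python (an editorial sketch; the helper names are our own):
<syntaxhighlight lang="python">
def place_value_parts(n):
    """Partition a number into its place-value parts, e.g. 34 -> [30, 4]."""
    s = str(n)
    return [int(d) * 10 ** (len(s) - 1 - i) for i, d in enumerate(s) if d != "0"]

def grid_multiply(x, y):
    # Multiplication-only stage: every part of x times every part of y,
    # followed by a separate addition stage that totals the sub-products.
    return sum(px * py for px in place_value_parts(x) for py in place_value_parts(y))

assert grid_multiply(34, 13) == 442   # 300 + 40 + 90 + 12
</syntaxhighlight>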
 
===Lattice multiplication===
{{main|Lattice multiplication}}
[[File:Hindu lattice.svg|thumb|right|First, set up the grid by marking its rows and columns with the numbers to be multiplied. Then, fill in the boxes with tens digits in the top triangles and units digits on the bottom.]]
[[File:Hindu lattice 2.svg|thumb|right|Finally, sum along the diagonal tracts and carry as needed to get the answer]]
 
Lattice, or sieve, multiplication is algorithmically equivalent to long multiplication. It requires the preparation of a lattice (a grid drawn on paper) which guides the calculation and separates all the multiplications from the [[addition]]s. It was introduced to Europe in 1202 in [[Fibonacci]]'s [[Liber Abaci]]. Fibonacci described the operation as mental, using his right and left hands to carry the intermediate calculations. [[Matrakçı Nasuh]] presented 6 different variants of this method in his 16th-century book, Umdet-ul Hisab. It was widely used in [[Enderun]] schools across the Ottoman Empire.<ref>{{cite journal |last1=Corlu |first1=M. S. |last2=Burlbaw |first2=L. M. |last3=Capraro |first3=R. M. |last4=Corlu |first4=M. A. |last5=Han |first5=S. |title=The Ottoman Palace School Enderun and the Man with Multiple Talents, Matrakçı Nasuh |journal=Journal of the Korea Society of Mathematical Education Series D: Research in Mathematical Education |volume=14 |issue=1 |pages=19–31 |date=2010 |url=https://koreascience.kr/article/JAKO201017337333137.page}}</ref> [[Napier's bones]], or [[Napier's rods]], also used this method, as published by Napier in 1617, the year of his death.
 
As shown in the example, the multiplicand and multiplier are written above and to the right of a lattice, or a sieve. It is found in [[Muhammad ibn Musa al-Khwarizmi]]'s "Arithmetic", one of Leonardo's sources mentioned by Sigler, author of "Fibonacci's Liber Abaci", 2002.{{citation needed|date=January 2016}}
* During the multiplication phase, the lattice is filled in with two-digit products of the corresponding digits labeling each row and column: the tens digit goes in the top-left corner.
* During the addition phase, the lattice is summed on the diagonals.
* Finally, if a carry phase is necessary, the answer as shown along the left and bottom sides of the lattice is converted to normal form by carrying ten's digits as in long addition or multiplication.
 
====Example====
The pictures on the right show how to calculate 345 × 12 using lattice multiplication. As a more complicated example, consider the picture below displaying the computation of 23,958,233 multiplied by 5,830 (multiplier); the result is 139,676,498,390. Notice 23,958,233 is along the top of the lattice and 5,830 is along the right side. The products fill the lattice and the sum of those products (on the diagonal) are along the left and bottom sides. Then those sums are totaled as shown.
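The lattice's separation of multiplications from additions can be mirrored in code. The sketch below (an editorial illustration; names are our own) fills the cells, sums along the diagonals, and carries at the end:
<syntaxhighlight lang="python">
def lattice_multiply(x, y):
    xd = [int(d) for d in str(x)]              # digits, most significant first
    yd = [int(d) for d in str(y)]
    diag = [0] * (len(xd) + len(yd))           # one bucket per diagonal
    for i, dy in enumerate(yd):
        for j, dx in enumerate(xd):
            tens, units = divmod(dx * dy, 10)  # the two triangles of each cell
            k = (len(xd) - 1 - j) + (len(yd) - 1 - i)
            diag[k] += units                   # bottom triangle joins diagonal k
            diag[k + 1] += tens                # top triangle joins the next diagonal
    result, carry = 0, 0
    for k, s in enumerate(diag):               # carry phase, as in long addition
        total = s + carry
        result += (total % 10) * 10 ** k
        carry = total // 10
    return result                              # carry is 0 here: diag covers all digits

assert lattice_multiply(345, 12) == 4140
assert lattice_multiply(23958233, 5830) == 139676498390
</syntaxhighlight>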
 
===Russian peasant multiplication===
{{Main|Peasant multiplication}}
The binary method is also known as peasant multiplication, because it has been widely used by people who are classified as peasants and thus have not memorized the [[multiplication table]]s required for long multiplication.<ref>{{Cite web|url=https://www.cut-the-knot.org/Curriculum/Algebra/PeasantMultiplication.shtml|title=Peasant Multiplication|author-link=Alexander Bogomolny|last=Bogomolny|first= Alexander |website=www.cut-the-knot.org|access-date=2017-11-04}}</ref>{{failed verification|date=March 2020}} The algorithm was in use in ancient Egypt.<ref>{{Cite book |first=D. |last=Wells | author-link=David G. Wells | year=1987 |page=44 |title=The Penguin Dictionary of Curious and Interesting Numbers |publisher=Penguin Books |isbn=978-0-14-008029-2}}</ref> Its main advantages are that it can be taught quickly, requires no memorization, and can be performed using tokens, such as [[poker chips]], if paper and pencil aren't available. The disadvantage is that it takes more steps than long multiplication, so it can be unwieldy for large numbers.
 
====Description====
On paper, write down in one column the numbers you get when you repeatedly halve the multiplier, ignoring the remainder; in a column beside it repeatedly double the multiplicand. Cross out each row in which the last digit of the first number is even, and add the remaining numbers in the second column to obtain the product.
 
====Examples====
This example uses peasant multiplication to multiply 11 by 3 to arrive at a result of 33.
 
 Decimal:     Binary:
 11   3       1011    11
 5    6        101   110
 2   12         10  1100
 1   24          1 11000
     ——            —————
     33           100001
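The halve-and-double procedure is equally short in code. The following Python sketch (editorial; not part of the article) keeps a running total instead of crossing out rows:
<syntaxhighlight lang="python">
def peasant_multiply(x, y):
    total = 0
    while x >= 1:
        if x % 2 == 1:   # rows where the halving column is odd are kept
            total += y
        x //= 2          # halve, ignoring the remainder
        y *= 2           # double
    return total

assert peasant_multiply(11, 3) == 33
</syntaxhighlight>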
 
===Quarter square multiplication===
 
This formula can in some cases be used to make multiplication tasks easier to complete:
 
: <math>
\left\lfloor \frac{\left(x+y\right)^2}{4} \right\rfloor - \left\lfloor \frac{\left(x-y\right)^2}{4} \right\rfloor =
\frac{1}{4}\left(\left(x^2+2xy+y^2\right) - \left(x^2-2xy+y^2\right)\right) =
\frac{1}{4}\left(4xy\right) = xy.
</math>
 
In the case where <math>x</math> and <math>y</math> are integers, we have that
:<math> (x+y)^2 \equiv (x-y)^2 \bmod 4</math>
because <math>x+y</math> and <math>x-y</math> are either both even or both odd. This means that
:<math>\begin{align}
xy &= \frac14(x+y)^2 - \frac14(x-y)^2 \\
&= \left((x+y)^2 \text{ div } 4\right)- \left((x-y)^2 \text{ div } 4\right)
\end{align}</math>
and it is sufficient to (pre-)compute the integral part of squares divided by 4, as in the following example.
 
====Examples====
Below is a lookup table of quarter squares with the remainder discarded for the digits 0 through 18; this allows for the multiplication of numbers up to {{math|9×9}}.
 
{| class="wikitable" style="text-align: center;"
|-
! ''n''
| 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18
|-
! ⌊''n''<sup>2</sup>/4⌋
| 0 || 0 || 1 || 2 || 4 || 6 || 9 || 12 || 16 || 20 || 25 || 30 || 36 || 42 || 49 || 56 || 64 || 72 || 81
|}
If, for example, you wanted to multiply 9 by 3, you observe that the sum and difference are 12 and 6 respectively. Looking both those values up on the table yields 36 and 9, the difference of which is 27, which is the product of 9 and 3.
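A small Python sketch (editorial; the table name is our own) shows the table-driven procedure:
<syntaxhighlight lang="python">
# Quarter squares for 0..18, enough for products of single digits 0..9.
QUARTER_SQUARES = [n * n // 4 for n in range(19)]

def quarter_square_multiply(x, y):
    # xy = floor((x + y)^2 / 4) - floor((x - y)^2 / 4)
    return QUARTER_SQUARES[x + y] - QUARTER_SQUARES[abs(x - y)]

assert quarter_square_multiply(9, 3) == 27   # 36 - 9, as in the text
</syntaxhighlight>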
 
====History of quarter square multiplication====
The quarter square method relies on the [[Floor and ceiling functions|floor function]]; some sources<ref>{{citation |title= Quarter Tables Revisited: Earlier Tables, Division of Labor in Table Construction, and Later Implementations in Analog Computers |last=McFarland |first=David|url=https://escholarship.org/uc/item/5n31064n |page=1 |year=2007}}</ref><ref>{{cite book| title=Mathematics in Ancient Iraq: A Social History |last=Robson |first=Eleanor |page=227 |year=2008 |publisher=Princeton University Press |isbn= 978-0691201405 }}</ref> attribute it to [[Babylonian mathematics]] (2000–1600 BC).
 
Antoine Voisin published a table of quarter squares from 1 to 1000 in 1817 as an aid in multiplication. A larger table of quarter squares from 1 to 100000 was published by Samuel Laundy in 1856,<ref>{{Citation |title=Reviews |journal=The Civil Engineer and Architect's Journal |year=1857 |pages=54–55 |url=https://books.google.com/books?id=gcNAAAAAcAAJ&pg=PA54 |postscript=.}}</ref> and a table from 1 to 200000 by Joseph Blater in 1888.<ref>{{Citation|title=Multiplying with quarter squares |first=Neville |last=Holmes| journal=The Mathematical Gazette |volume=87 |issue=509 |year=2003 |pages=296–299 |jstor=3621048|postscript=.|doi=10.1017/S0025557200172778 |s2cid=125040256 }}</ref>
 
Quarter square multipliers were used in [[analog computer]]s to form an [[analog signal]] that was the product of two analog input signals. In this application, the sum and difference of two input [[voltage]]s are formed using [[operational amplifier]]s. The square of each of these is approximated using [[piecewise linear function|piecewise linear]] circuits. Finally the difference of the two squares is formed and scaled by a factor of one fourth using yet another operational amplifier.
 
In 1980, Everett L. Johnson proposed using the quarter square method in a [[Digital data|digital]] multiplier.<ref name=eljohnson>{{Citation |last = Johnson |first = Everett L. |date = March 1980 |title = A Digital Quarter Square Multiplier |periodical = IEEE Transactions on Computers |___location = Washington, DC, USA |publisher = IEEE Computer Society |volume = C-29 |issue = 3 |pages = 258–261 |issn = 0018-9340 |doi =10.1109/TC.1980.1675558 |s2cid = 24813486 }}</ref> To form the product of two 8-bit integers, for example, the digital device forms the sum and difference, looks both quantities up in a table of squares, takes the difference of the results, and divides by four by shifting two bits to the right. For 8-bit integers the table of quarter squares will have 2<sup>9</sup>&minus;1=511 entries (using for negative differences the technique of 2-complements and 9-bit masking, which avoids testing the sign of differences), each entry being 16-bit wide (the entry values are from (0²/4)=0 to (510²/4)=65025).
 
The quarter square multiplier technique has also benefited 8-bit systems that do not have any support for a hardware multiplier. Charles Putney implemented this for the [[MOS Technology 6502|6502]].<ref name=cputney>{{Cite journal |last=Putney |first=Charles |title=Fastest 6502 Multiplication Yet |date=March 1986 |journal=Apple Assembly Line |volume=6 |issue=6 |url=http://www.txbobsc.com/aal/1986/aal8603.html#a5}}</ref>
 
==Computational complexity of multiplication==
{{Anchor|Computational complexity}}
{{unsolved|computer science|What is the fastest algorithm for multiplication of two <math>n</math>-digit numbers?}}
 
A line of research in [[theoretical computer science]] is about the number of single-bit arithmetic operations necessary to multiply two <math>n</math>-bit integers. This is known as the [[computational complexity]] of multiplication. Usual algorithms done by hand have asymptotic complexity of <math>O(n^2)</math>, but in 1960 [[Anatoly Karatsuba]] discovered that better complexity was possible (with the [[Karatsuba algorithm]]).<ref>{{cite web | url=https://youtube.com/watch?v=AMl6EJHfUWo | title= The Genius Way Computers Multiply Big Numbers| website=[[YouTube]]| date= 2 January 2025}}</ref>
 
Currently, the algorithm with the best computational complexity is a 2019 algorithm of [[David Harvey (mathematician)|David Harvey]] and [[Joris van der Hoeven]], which uses the strategies of using [[number-theoretic transform]]s introduced with the [[Schönhage–Strassen algorithm]] to multiply integers using only <math>O(n\log n)</math> operations.<ref>{{cite journal | last1 = Harvey | first1 = David | last2 = van der Hoeven | first2 = Joris | author2-link = Joris van der Hoeven | doi = 10.4007/annals.2021.193.2.4 | issue = 2 | journal = [[Annals of Mathematics]] | mr = 4224716 | pages = 563–617 | series = Second Series | title = Integer multiplication in time <math>O(n \log n)</math> | volume = 193 | year = 2021| s2cid = 109934776 | url = https://hal.archives-ouvertes.fr/hal-02070778v2/file/nlogn.pdf }}</ref> This is conjectured to be the best possible algorithm, but lower bounds of <math>\Omega(n\log n)</math> are not known.
 
===Karatsuba multiplication===
{{Main|Karatsuba algorithm}}
 
Karatsuba multiplication is an O(''n''<sup>log<sub>2</sub>3</sup>) ≈ O(''n''<sup>1.585</sup>) divide-and-conquer algorithm that uses recursion to merge together sub-calculations.
 
Rewriting the product in the following way makes it possible to break the computation into smaller sub-calculations, and carrying these out recursively yields a fast algorithm.
 
Let <math>x</math> and <math>y</math> be represented as <math>n</math>-digit strings in some base <math>B</math>. For any positive integer <math>m</math> less than <math>n</math>, one can write the two given numbers as
 
:<math>x = x_1 B^m + x_0,</math>
:<math>y = y_1 B^m + y_0,</math>
where <math>x_0</math> and <math>y_0</math> are less than <math>B^m</math>. The product is then

<math>
\begin{align}
xy &= (x_1 B^m + x_0)(y_1 B^m + y_0) \\
&= x_1 y_1 B^{2m} + (x_1 y_0 + x_0 y_1) B^m + x_0 y_0 \\
&= z_2 B^{2m} + z_1 B^m + z_0, \\
\end{align}
</math>
 
where
 
:<math>z_2 = x_1 y_1,</math>
:<math>z_1 = x_1 y_0 + x_0 y_1,</math>
:<math>z_0 = x_0 y_0.</math>
 
These formulae require four multiplications and were known to [[Charles Babbage]].<ref>Charles Babbage, Chapter VIII – Of the Analytical Engine, Larger Numbers Treated, [https://archive.org/details/bub_gb_Fa1JAAAAMAAJ/page/n142 <!-- pg=125 --> Passages from the Life of a Philosopher], Longman Green, London, 1864; page 125.</ref> Karatsuba observed that <math>xy</math> can be computed in only three multiplications, at the cost of a few extra additions. With <math>z_0</math> and <math>z_2</math> as before one can observe that
 
:<math>
\begin{align}
z_1 &= x_1 y_0 + x_0 y_1 \\
&= x_1 y_0 + x_0 y_1 + x_1 y_1 - x_1 y_1 + x_0 y_0 - x_0 y_0 \\
&= x_1 y_0 + x_0 y_0 + x_0 y_1 + x_1 y_1 - x_1 y_1 - x_0 y_0 \\
&= (x_1 + x_0) y_0 + (x_0 + x_1) y_1 - x_1 y_1 - x_0 y_0 \\
&= (x_1 + x_0) (y_0 + y_1) - x_1 y_1 - x_0 y_0 \\
&= (x_1 + x_0) (y_1 + y_0) - z_2 - z_0. \\
\end{align}
</math>
 
Because of the overhead of recursion, Karatsuba's multiplication is slower than long multiplication for small values of ''n''; typical implementations therefore switch to long multiplication for small values of ''n''.
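A compact recursive Python version (an editorial sketch; the base <math>B = 2</math> and the cut-over threshold are arbitrary choices of this sketch) shows the three-multiplication recursion:
<syntaxhighlight lang="python">
def karatsuba(x, y, threshold=64):
    if x < threshold or y < threshold:
        return x * y                               # fall back to long multiplication
    m = max(x.bit_length(), y.bit_length()) // 2   # split point; base B = 2
    x1, x0 = x >> m, x & ((1 << m) - 1)            # x = x1 * 2^m + x0
    y1, y0 = y >> m, y & ((1 << m) - 1)            # y = y1 * 2^m + y0
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0     # only three recursive products
    return (z2 << (2 * m)) + (z1 << m) + z0

assert karatsuba(23958233, 5830) == 23958233 * 5830
</syntaxhighlight>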
 
==== General case with multiplication of N numbers ====
 
By exploring patterns after expansion, one sees the following:
 
<math display="block">\begin{alignat}{5} (x_1 B^{ m} + x_0) (y_1 B^{m} + y_0) (z_1 B^{ m} + z_0) (a_1 B^{ m} + a_0) &=
a_1 x_1 y_1 z_1 B^{4 m} &+ a_1 x_1 y_1 z_0 B^{3m} &+ a_1 x_1 y_0 z_1 B^{3 m} &+ a_1 x_0 y_1 z_1 B^{3 m} \\
&+ a_0 x_1 y_1 z_1 B^{3 m} &+ a_1 x_1 y_0 z_0 B^{2 m} &+ a_1 x_0 y_1 z_0 B^{2 m} &+ a_0 x_1 y_1 z_0 B^{2 m}\\
&+ a_1 x_0 y_0 z_1 B^{2 m} &+ a_0 x_1 y_0 z_1 B^{2 m} &+ a_0 x_0 y_1 z_1 B^{2 m} &+ a_1 x_0 y_0 z_0 B^{m\phantom{1}}\\
&+ a_0 x_1 y_0 z_0 B^{m\phantom{1}} &+ a_0 x_0 y_1 z_0 B^{m\phantom{1}} &+ a_0 x_0 y_0 z_1 B^{m\phantom{1}} &+ a_0 x_0 y_0 z_0 \phantom{B^{1 m}}
\end{alignat}</math>
 
Each summand is associated to a unique binary number from 0 to
<math> 2^{N}-1 </math>; for example <math> a_1 x_1 y_1 z_1 \longleftrightarrow 1111,\ a_1 x_0 y_1 z_0 \longleftrightarrow 1010 </math>, etc. Furthermore, ''B'' is raised to the power of the number of 1s in this binary string, multiplied by ''m''.
 
If we express this in fewer terms, we get:
 
<math display="block">\prod_{j=1}^N (x_{j,1} B^{ m} + x_{j,0}) = \sum_{i=1}^{2^{N+1}-1}\prod_{j=1}^N x_{j,c(i,j)}B^{m\sum_{j=1}^N c(i,j)} = \sum_{j=0}^{N}z_jB^{jm}
</math>, where <math> c(i,j) </math> means digit in number i at position j. Notice that <math> c(i,j) \in \{0,1\} </math>
 
<math display="block">
\begin{align}
z_{0} &= \prod_{j=1}^N x_{j,0}
\\
z_{N} &= \prod_{j=1}^N x_{j,1}
\\
z_{N-1} &= \prod_{j=1}^N (x_{j,0} + x_{j,1}) - \sum_{\substack{0 \le i \le N \\ i \ne N-1}} z_i
\end{align}
</math>
 
==== History ====
Karatsuba's algorithm was the first known algorithm for multiplication that is asymptotically faster than long multiplication,<ref>D. Knuth, ''The Art of Computer Programming'', vol. 2, sec. 4.3.3 (1998)</ref> and can thus be viewed as the starting point for the theory of fast multiplications.
 
===Toom–Cook===
{{Main|Toom–Cook multiplication}}
Another method of multiplication is called Toom–Cook or Toom-3. The Toom–Cook method splits each number to be multiplied into multiple parts. The Toom–Cook method is one of the generalizations of the Karatsuba method. A three-way Toom–Cook can do a size-''3N'' multiplication for the cost of five size-''N'' multiplications. This accelerates the operation by a factor of 9/5, while the Karatsuba method accelerates it by a factor of 4/3.
 
Although using more and more parts can reduce the time spent on recursive multiplications further, the overhead from additions and digit management also grows. For this reason, the method of Fourier transforms is typically faster for numbers with several thousand digits, and asymptotically faster for even larger numbers.
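A Toom-3 sketch in Python follows (editorial; the evaluation points 0, 1, −1, 2 and ∞ and the interpolation steps are one standard choice among several, and the threshold is arbitrary):
<syntaxhighlight lang="python">
def toom3(x, y):
    if x < 0 or y < 0:                              # recursive calls can go negative
        sign = -1 if (x < 0) != (y < 0) else 1
        return sign * toom3(abs(x), abs(y))
    if x < 10**6 or y < 10**6:
        return x * y                                # small operands: long multiplication
    m = max(x.bit_length(), y.bit_length()) // 3 + 1
    mask = (1 << m) - 1                             # split into three m-bit parts
    x0, x1, x2 = x & mask, (x >> m) & mask, x >> (2 * m)
    y0, y1, y2 = y & mask, (y >> m) & mask, y >> (2 * m)
    # Evaluate both polynomials at 0, 1, -1, 2 and infinity: five products.
    r0   = toom3(x0, y0)
    r1   = toom3(x0 + x1 + x2, y0 + y1 + y2)
    rm1  = toom3(x0 - x1 + x2, y0 - y1 + y2)
    r2   = toom3(x0 + 2 * x1 + 4 * x2, y0 + 2 * y1 + 4 * y2)
    rinf = toom3(x2, y2)
    # Interpolate the five coefficients of the product polynomial (all divisions exact).
    w0, w4 = r0, rinf
    w2 = (r1 + rm1) // 2 - w0 - w4
    t = (r1 - rm1) // 2                             # = w1 + w3
    u = (r2 - w0 - 4 * w2 - 16 * w4) // 2           # = w1 + 4*w3
    w3 = (u - t) // 3
    w1 = t - w3
    return w0 + (w1 << m) + (w2 << (2 * m)) + (w3 << (3 * m)) + (w4 << (4 * m))

assert toom3(3**50, 2**70 + 1) == 3**50 * (2**70 + 1)
</syntaxhighlight>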
 
===Schönhage–Strassen===
{{Main|Schönhage–Strassen algorithm}}
[[File:Integer multiplication by FFT.svg|thumb|350px|Demonstration of multiplying 1234 × 5678 = 7006652 using fast Fourier transforms (FFTs). [[Number-theoretic transform]]s in the integers modulo 337 are used, selecting 85 as an 8th root of unity. Base 10 is used in place of base 2<sup>''w''</sup> for illustrative purposes.]]
 
Every number in base ''B'' can be written as a polynomial:
 
<math display="block"> X = \sum_{i=0}^N {x_iB^i} </math>
 
Furthermore, multiplication of two numbers could be thought of as a product of two polynomials:
 
<math display="block">XY = (\sum_{i=0}^N {x_iB^i})(\sum_{j=0}^N {y_iB^j}) </math>
 
Because, for <math> B^k </math>: <math>c_k =\sum_{(i,j):\,i+j=k} {a_ib_j} = \sum_{i=0}^k {a_ib_{k-i}}</math>,
we have a convolution.
 
By using FFT (fast Fourier transformation) with the convolution rule, we can get
 
<math display="block"> \hat{f}(a * b) = \hat{f}(\sum_{i=0}^k {a_ib_{k-i}}) = \hat{f}(a) \bullet \hat{f}(b) </math>. That is; <math> C_k = a_k \bullet b_k </math>, where <math> C_k </math>
is the corresponding coefficient in Fourier space. This can also be written as: <math>\mathrm{fft}(a * b) = \mathrm{fft}(a) \bullet \mathrm{fft}(b)</math>.
 
We have the same coefficient due to linearity under the Fourier transformation, and because these polynomials
only consist of one unique term per coefficient:
 
<math display="block"> \hat{f}(x^n) = \left(\frac{i}{2\pi}\right)^n \delta^{(n)} </math> and
<math display="block"> \hat{f}(a\, X(\xi) + b\, Y(\xi)) = a\, \hat{X}(\xi) + b\, \hat{Y}(\xi)</math>
 
* Convolution rule: <math> \hat{f}(X * Y) = \ \hat{f}(X) \bullet \hat{f}(Y) </math>
 
We have reduced our convolution problem to a product problem, through FFT.
 
By finding the IFFT (polynomial interpolation), for each <math>c_k</math>, one gets the desired coefficients.
 
The algorithm uses a divide-and-conquer strategy to divide the problem into subproblems.
 
It has a time complexity of O(''n''&nbsp;log(''n'')&nbsp;log(log(''n''))).

Using [[number-theoretic transform]]s instead of [[discrete Fourier transform]]s avoids [[rounding error]] problems by using modular arithmetic instead of [[floating point|floating-point]] arithmetic. In order to apply the factoring which enables the FFT to work, the length of the transform must be factorable to small primes and must be a factor of {{nowrap|''N'' − 1}}, where ''N'' is the field size. In particular, calculation using a Galois field GF(''k''<sup>2</sup>), where ''k'' is a [[Mersenne prime]], allows the use of a transform sized to a power of 2; e.g. {{nowrap|1=''k'' = 2<sup>31</sup> − 1}} supports transform sizes up to 2<sup>32</sup>.
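The whole pipeline — digit decomposition, forward transforms, pointwise product, inverse transform, and carrying — can be sketched with a floating-point FFT in Python (editorial; NumPy's FFT stands in for the modular number-theoretic transform of the actual algorithm, which avoids rounding error):
<syntaxhighlight lang="python">
import numpy as np

def fft_multiply(x, y, base=10):
    a = [int(d) for d in str(x)[::-1]]   # digit polynomials, least significant first
    b = [int(d) for d in str(y)[::-1]]
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2                           # transform length: power of two, no wrap-around
    c = np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)
    result, carry = 0, 0                 # carrying turns coefficients back into digits
    for k, coeff in enumerate(c):
        total = int(round(coeff)) + carry
        result += (total % base) * base ** k
        carry = total // base
    return result + carry * base ** n

assert fft_multiply(1234, 5678) == 7006652   # as in the figure
</syntaxhighlight>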
 
==== History ====
 
The algorithm was invented by [[Volker Strassen|Strassen]] (1968). It was made practical and theoretical guarantees were provided in 1971 by [[Arnold Schönhage|Schönhage]] and Strassen resulting in the [[Schönhage–Strassen algorithm]].<ref name="schönhage">{{cite journal |first1=A. |last1=Schönhage |first2=V. |last2=Strassen |title=Schnelle Multiplikation großer Zahlen |journal=Computing |volume=7 |issue= 3–4|pages=281–292 |date=1971 |doi=10.1007/BF02242355 |s2cid=9738629 |url=https://link.springer.com/article/10.1007/BF02242355|url-access=subscription }}</ref>
 
=== Further improvements ===
 
In 2007 the [[asymptotic complexity]] of integer multiplication was improved by the Swiss mathematician [[Martin Fürer]] of Pennsylvania State University to <math display="inline">O(n \log n \cdot {2}^{\Theta(\log^*(n))})</math> using Fourier transforms over [[complex number]]s,<ref name="fürer_1">{{cite book |first=M. |last=Fürer |chapter=Faster Integer Multiplication |chapter-url=https://ivv5hpp.uni-muenster.de/u/cl/WS2007-8/mult.pdf |doi=10.1145/1250790.1250800 |title=Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, June 11–13, 2007, San Diego, California, USA |publisher= |___location= |date=2007 |isbn=978-1-59593-631-8 |pages=57–66 |s2cid=8437794 |url=}}</ref> where log<sup>*</sup> denotes the [[iterated logarithm]]. Anindya De, Chandan Saha, Piyush Kurur and Ramprasad Saptharishi gave a similar algorithm using [[modular arithmetic]] in 2008 achieving the same running time.<ref>{{cite book |first1=A. |last1=De |first2=C. |last2=Saha |first3=P. |last3=Kurur |first4=R. |last4=Saptharishi |chapter=Fast integer multiplication using modular arithmetic |chapter-url= |doi=10.1145/1374376.1374447 |title=Proceedings of the 40th annual ACM Symposium on Theory of Computing (STOC) |publisher= |___location= |date=2008 |isbn=978-1-60558-047-0 |pages=499–506 |url= |arxiv=0801.1416|s2cid=3264828 }}</ref> In context of the above material, what these latter authors have achieved is to find ''N'' much less than 2<sup>3''k''</sup> + 1, so that ''Z''/''NZ'' has a (2''m'')th root of unity. This speeds up computation and reduces the time complexity. However, these latter algorithms are only faster than Schönhage–Strassen for impractically large inputs.
 
In 2014, Harvey, [[Joris van der Hoeven]] and Lecerf<ref>{{cite journal
| last1 = Harvey | first1 = David
| last2 = van der Hoeven | first2 = Joris
| last3 = Lecerf | first3 = Grégoire
| arxiv = 1407.3360
| doi = 10.1016/j.jco.2016.03.001
| journal = Journal of Complexity
| mr = 3530637
| pages = 1–30
| title = Even faster integer multiplication
| volume = 36
| year = 2016}}</ref> gave a new algorithm that achieves a running time of <math>O(n\log n \cdot 2^{3\log^* n})</math>, making explicit the implied constant in the <math>O(\log^* n)</math> exponent. They also proposed a variant of their algorithm which achieves <math>O(n\log n \cdot 2^{2\log^* n})</math> but whose validity relies on standard conjectures about the distribution of [[Mersenne prime]]s. In 2016, Covanov and Thomé proposed an integer multiplication algorithm based on a generalization of [[Fermat primes]] that conjecturally achieves a complexity bound of <math>O(n\log n \cdot 2^{2\log^* n})</math>. This matches the 2015 conditional result of Harvey, van der Hoeven, and Lecerf but uses a different algorithm and relies on a different conjecture.<ref>{{cite journal |first1=Svyatoslav |last1=Covanov |first2=Emmanuel |last2=Thomé |title=Fast Integer Multiplication Using Generalized Fermat Primes |journal=[[Mathematics of Computation|Math. Comp.]] |volume=88 |year=2019 |issue=317 |pages=1449–1477 |doi=10.1090/mcom/3367 |arxiv=1502.02800 |s2cid=67790860 }}</ref> In 2018, Harvey and van der Hoeven used an approach based on the existence of short lattice vectors guaranteed by [[Minkowski's theorem]] to prove an unconditional complexity bound of <math>O(n\log n \cdot 2^{2\log^* n})</math>.<ref>{{cite journal |first1=D. |last1=Harvey |first2=J. |last2=van der Hoeven |year=2019 |title=Faster integer multiplication using short lattice vectors |journal=The Open Book Series |volume=2 |pages=293–310 |doi=10.2140/obs.2019.2.293 |arxiv=1802.07932|s2cid=3464567 }}</ref>
 
In March 2019, [[David Harvey (mathematician)|David Harvey]] and [[Joris van der Hoeven]] announced their discovery of an {{nowrap|''O''(''n'' log ''n'')}} multiplication algorithm.<ref>{{Cite magazine|url=https://www.quantamagazine.org/mathematicians-discover-the-perfect-way-to-multiply-20190411/|title=Mathematicians Discover the Perfect Way to Multiply|last=Hartnett|first=Kevin|magazine=Quanta Magazine|date=11 April 2019|access-date=2019-05-03}}</ref> It was published in the ''[[Annals of Mathematics]]'' in 2021.<ref>{{cite journal | last1 = Harvey | first1 = David | last2 = van der Hoeven | first2 = Joris | author2-link = Joris van der Hoeven | doi = 10.4007/annals.2021.193.2.4 | issue = 2 | journal = [[Annals of Mathematics]] | mr = 4224716 | pages = 563–617 | series = Second Series | title = Integer multiplication in time <math>O(n \log n)</math> | volume = 193 | year = 2021| s2cid = 109934776 | url = https://hal.archives-ouvertes.fr/hal-02070778v2/file/nlogn.pdf }}</ref> Because Schönhage and Strassen predicted that ''n''&nbsp;log(''n'') is the "best possible" result, Harvey said: "...{{nbsp}}our work is expected to be the end of the road for this problem, although we don't know yet how to prove this rigorously."<ref>{{cite news |last1=Gilbert |first1=Lachlan |title=Maths whiz solves 48-year-old multiplication problem |url=https://newsroom.unsw.edu.au/news/science-tech/maths-whiz-solves-48-year-old-multiplication-problem |access-date=18 April 2019 |publisher=UNSW |date=4 April 2019}}</ref>
 
===Lower bounds===
There is a trivial lower bound of [[Big O notation#Family of Bachmann–Landau notations|Ω]](''n'') for multiplying two ''n''-bit numbers on a single processor; neither a matching algorithm (on conventional machines, that is, on Turing-equivalent machines) nor any sharper lower bound is known. The [[Hartmanis–Stearns conjecture]] would imply that <math>O(n)</math> cannot be achieved. Multiplication lies outside of [[ACC0|AC<sup>0</sup>[''p'']]] for any prime ''p'', meaning there is no family of constant-depth, polynomial (or even subexponential) size circuits using AND, OR, NOT, and MOD<sub>''p''</sub> gates that can compute a product. This follows from a constant-depth reduction of MOD<sub>''q''</sub> to multiplication.<ref>{{cite book |first1=Sanjeev |last1=Arora |first2=Boaz |last2=Barak |title=Computational Complexity: A Modern Approach |publisher=Cambridge University Press |date=2009 |isbn=978-0-521-42426-4 |url={{GBurl|8Wjqvsoo48MC|pg=PR7}}}}</ref> Lower bounds for multiplication are also known for some classes of [[branching program]]s.<ref>{{cite journal |first1=F. |last1=Ablayev |first2=M. |last2=Karpinski |title=A lower bound for integer multiplication on randomized ordered read-once branching programs |journal=Information and Computation |volume=186 |issue=1 |pages=78–89 |date=2003 |doi=10.1016/S0890-5401(03)00118-4 |url=https://core.ac.uk/download/pdf/82445954.pdf}}</ref>
 
==Complex number multiplication==
Complex multiplication normally involves four multiplications and two additions.
 
:<math>(a+bi) (c+di) = (ac-bd) + (bc+ad)i.</math>
 
Or, in tabular form:
 
:<math>
\begin{array}{c|c|c}
\times & a & bi \\
\hline
c & ac & bci \\
\hline
di & adi & -bd
\end{array}
</math>
 
As observed by Peter Ungar in 1963, one can reduce the number of multiplications to three, using essentially the same computation as [[Karatsuba's algorithm]].<ref name="taocp-vol2-sec464-ex41">{{Citation | last1=Knuth | first1=Donald E. | author1-link=Donald Knuth | title=The Art of Computer Programming volume 2: Seminumerical algorithms | publisher=[[Addison-Wesley]] | year=1988 | pages=519, 706| title-link=The Art of Computer Programming }}
</ref> The product (''a''&nbsp;+&nbsp;''bi'') · (''c''&nbsp;+&nbsp;''di'') can be calculated in the following way.
 
:''k''<sub>1</sub> = ''c'' · (''a'' + ''b'')
:''k''<sub>2</sub> = ''a'' · (''d'' − ''c'')
:''k''<sub>3</sub> = ''b'' · (''c'' + ''d'')
:Real part = ''k''<sub>1</sub> − ''k''<sub>3</sub>
:Imaginary part = ''k''<sub>1</sub> + ''k''<sub>2</sub>.
 
This algorithm uses only three multiplications, rather than four, and five additions or subtractions, rather than two. If a multiplication is more expensive than three additions or subtractions, as when calculating by hand, then there is a gain in speed. On modern computers a multiplication and an addition can take about the same time, so there may be no speed gain. There is a trade-off in that there may be some loss of precision when using floating point.
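For illustration, a minimal Python sketch of the three-multiplication scheme (the function name is illustrative, not from any particular library):

<syntaxhighlight lang="python">
def mul3(a, b, c, d):
    """Return (real, imag) of (a + bi)(c + di) using three
    multiplications and five additions/subtractions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2  # real part, imaginary part

# Check against Python's built-in complex arithmetic:
z = complex(3, 4) * complex(5, 6)
assert mul3(3, 4, 5, 6) == (z.real, z.imag)
</syntaxhighlight>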
 
For [[fast Fourier transform]]s (FFTs) (or any [[Linear map|linear transformation]]) the complex multiplies are by constant coefficients ''c''&nbsp;+&nbsp;''di'' (called [[twiddle factor]]s in FFTs), in which case two of the additions (''d''−''c'' and ''c''+''d'') can be precomputed. Hence, only three multiplies and three adds are required.<ref>{{cite journal |first1=P. |last1=Duhamel |first2=M. |last2=Vetterli |title=Fast Fourier transforms: A tutorial review and a state of the art |journal=Signal Processing |volume=19 |issue=4 |pages=259–299 See Section 4.1 |date=1990 |doi=10.1016/0165-1684(90)90158-U |bibcode=1990SigPr..19..259D |url=https://core.ac.uk/download/pdf/147907050.pdf}}</ref> However, trading off a multiplication for an addition in this way may no longer be beneficial with modern [[floating-point unit]]s.<ref>{{cite journal |first1=S.G. |last1=Johnson |first2=M. |last2=Frigo |title=A modified split-radix FFT with fewer arithmetic operations |journal=IEEE Trans. Signal Process. |volume=55 |issue= 1|pages=111–9 See Section IV |date=2007 |doi=10.1109/TSP.2006.882087 |bibcode=2007ITSP...55..111J |s2cid=14772428 |url=https://www.fftw.org/newsplit.pdf }}</ref>
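A sketch of the precomputed-constant variant (names are illustrative): when ''c''&nbsp;+&nbsp;''di'' is fixed, the sums ''d''&nbsp;−&nbsp;''c'' and ''c''&nbsp;+&nbsp;''d'' are computed once, leaving three multiplications and three additions per product:

<syntaxhighlight lang="python">
def make_twiddle(c, d):
    """Precompute the two sums that depend only on the constant c + di."""
    return c, d - c, c + d

def mul_by_constant(a, b, tw):
    """Multiply a + bi by a precomputed constant: 3 multiplies, 3 adds."""
    c, d_minus_c, c_plus_d = tw
    k1 = c * (a + b)
    return k1 - b * c_plus_d, k1 + a * d_minus_c

tw = make_twiddle(5, 6)
z = complex(3, 4) * complex(5, 6)
assert mul_by_constant(3, 4, tw) == (z.real, z.imag)
</syntaxhighlight>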
 
==Polynomial multiplication==
All the above multiplication algorithms can also be expanded to multiply [[polynomial]]s. Alternatively the [[Kronecker substitution]] technique may be used to convert the problem of multiplying polynomials into a single binary multiplication.<ref>{{citation |first1 = Joachim |last1 = von zur Gathen | author1-link = Joachim von zur Gathen |first2 = Jürgen | last2 = Gerhard |title = Modern Computer Algebra |publisher = Cambridge University Press |year = 1999 |isbn = 978-0-521-64176-0 |pages = 243–244 |url = https://books.google.com/books?id=AE5PN5QGgvUC&pg=PA245 }}.</ref>
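As a rough illustration of Kronecker substitution (the function name and packing strategy here are illustrative, not taken from the cited text): pack the coefficients into one integer with enough zero padding that the product's coefficients cannot overlap, perform a single big-integer multiplication, and unpack the result. This simple packing assumes nonnegative integer coefficients.

<syntaxhighlight lang="python">
def poly_mul_kronecker(p, q):
    """Multiply polynomials with nonnegative integer coefficients
    (lowest degree first) via one big-integer multiplication."""
    # No coefficient of the product can exceed this bound:
    bound = max(p) * max(q) * min(len(p), len(q))
    s = bound.bit_length() + 1  # field width in bits per coefficient
    xp = sum(c << (i * s) for i, c in enumerate(p))  # p evaluated at 2**s
    xq = sum(c << (i * s) for i, c in enumerate(q))  # q evaluated at 2**s
    prod = xp * xq              # the single binary multiplication
    mask = (1 << s) - 1
    return [(prod >> (i * s)) & mask for i in range(len(p) + len(q) - 1)]

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
assert poly_mul_kronecker([1, 2, 3], [4, 5]) == [4, 13, 22, 15]
</syntaxhighlight>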
 
Long multiplication methods can be generalised to allow the multiplication of algebraic formulae:
           14ac   -3ab    2
             ac    -ab    1
  ————————————————————————————————————
  14a<sup>2</sup>c<sup>2</sup>  -14a<sup>2</sup>bc  14ac
           -3a<sup>2</sup>bc         3a<sup>2</sup>b<sup>2</sup>  -3ab
                           2ac                -2ab   2
 ———————————————————————————————————————
 14a<sup>2</sup>c<sup>2</sup> -17a<sup>2</sup>bc 16ac 3a<sup>2</sup>b<sup>2</sup> -5ab +2
 <nowiki>=======================================</nowiki><ref>{{cite book |last1=Castle |first1=Frank |title=Workshop Mathematics |url=https://archive.org/details/workshopmathema00castgoog |date=1900 |publisher=MacMillan and Co |___location=London |page=[https://archive.org/details/workshopmathema00castgoog/page/n88 74]}}</ref>
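In code, this generalised long multiplication amounts to multiplying every term by every term and collecting like monomials. The sketch below uses its own monomial encoding (a sorted tuple of variable–exponent pairs), which is an illustrative choice, not a standard representation:

<syntaxhighlight lang="python">
from collections import defaultdict

def mono_mul(m1, m2):
    """Combine two monomials, each a sorted tuple of (variable, exponent)."""
    exps = defaultdict(int)
    for v, e in m1 + m2:
        exps[v] += e
    return tuple(sorted(exps.items()))

def formula_mul(p, q):
    """Schoolbook multiplication of formulae given as {monomial: coefficient}."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[mono_mul(m1, m2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

# (14ac - 3ab + 2)(ac - ab + 1); the empty tuple is the constant term.
ac = (('a', 1), ('c', 1))
ab = (('a', 1), ('b', 1))
result = formula_mul({ac: 14, ab: -3, (): 2}, {ac: 1, ab: -1, (): 1})
# result holds 14a^2c^2 - 17a^2bc + 16ac + 3a^2b^2 - 5ab + 2
assert result[(('a', 2), ('b', 1), ('c', 1))] == -17
</syntaxhighlight>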
 
As a further example of column-based multiplication, consider multiplying 23 long tons (t), 12 hundredweight (cwt) and 2 quarters (qtr) by 47. This example uses [[avoirdupois]] measures: 1 t = 20 cwt, 1 cwt = 4 qtr.
      t    cwt   qtr
     23     12     2
                  47  x
 ————————————————————
    141     94    94
    940    470
     29     23
 ————————————————————
   1110      7     2
 <nowiki>====================</nowiki>  Answer: 1110 ton 7 cwt 2 qtr
 
First multiply the quarters by 47, the result 94 is written into the first workspace. Next, multiply cwt 12*47 = (2 + 10)*47 but don't add up the partial results (94, 470) yet. Likewise multiply 23 by 47 yielding (141, 940). The quarters column is totaled and the result placed in the second workspace (a trivial move in this case). 94 quarters is 23 cwt and 2 qtr, so place the 2 in the answer and put the 23 in the next column left. Now add up the three entries in the cwt column giving 587. This is 29 t 7 cwt, so write the 7 into the answer and the 29 in the column to the left. Now add up the tons column. There is no adjustment to make, so the result is just copied down.
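The same bookkeeping can be expressed in a few lines of Python (a sketch under the stated conversion factors; the function name is illustrative): convert to the smallest unit, multiply, and convert back.

<syntaxhighlight lang="python">
def multiply_mixed(t, cwt, qtr, factor):
    """Multiply a tons/hundredweight/quarters quantity by an integer,
    using 1 t = 20 cwt and 1 cwt = 4 qtr."""
    total_qtr = ((t * 20 + cwt) * 4 + qtr) * factor
    cwt, qtr = divmod(total_qtr, 4)   # carry quarters into hundredweight
    t, cwt = divmod(cwt, 20)          # carry hundredweight into tons
    return t, cwt, qtr

# 23 t 12 cwt 2 qtr times 47:
assert multiply_mixed(23, 12, 2, 47) == (1110, 7, 2)
</syntaxhighlight>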
 
The same layout and methods can be used for any traditional measurements and non-decimal currencies such as the old British [[£sd]] system.
Line 406 ⟶ 492:
==See also==
* [[Binary multiplier]]
* [[Dadda multiplier]]
* [[Division algorithm]]
* [[Horner scheme]] for evaluating a polynomial
* [[Logarithm]]
* [[Matrix multiplication algorithm]]
* [[Mental calculation]]
* [[Number-theoretic transform]]
* [[Prosthaphaeresis]]
* [[Slide rule]]
* [[Trachtenberg system]]
* {{section link|Residue number system#Multiplication}} for another fast multiplication algorithm, especially efficient when many operations are done in sequence, such as in linear algebra
* [[Wallace tree]]
 
==References==
{{Reflist}}

==Further reading==
* {{Cite book |title=Hacker's Delight |first=Henry S. |last=Warren Jr. |date=2013 |edition=2 |publisher=[[Addison Wesley]] - [[Pearson Education, Inc.]] |isbn=978-0-321-84268-8|title-link=Hacker's Delight }}
* {{cite web |title=Advanced Arithmetic Techniques |author-first=John J. G. |author-last=Savard |date=2018 |orig-year=2006 |work=quadibloc |url=http://www.quadibloc.com/comp/cp0202.htm |access-date=2018-07-16 |url-status=live |archive-url=https://web.archive.org/web/20180703001722/http://www.quadibloc.com/comp/cp0202.htm |archive-date=2018-07-03}}
* {{cite book |title=Low Power and Low Complexity Shift-and-Add Based Computations |author-first=Kenny |author-last=Johansson |series=Linköping Studies in Science and Technology |year=2008 |type=Dissertation thesis |id=No. 1201 |publisher=Department of Electrical Engineering, [[Linköping University]]<!-- printed by LiU-Tryck --> |publication-place=Linköping, Sweden |edition=1 |isbn=978-91-7393-836-5 |issn=0345-7524 |url=https://www.diva-portal.org/smash/get/diva2:1733/FULLTEXT02.pdf |access-date=2021-08-23 |url-status=live |archive-url=https://web.archive.org/web/20170813200504/http://www.diva-portal.org/smash/get/diva2:1733/FULLTEXT02.pdf |archive-date=2017-08-13}} (x+268 pages)
 
==External links==
 
===Basic arithmetic===
* [https://www.nychold.com/em-arith.html The Many Ways of Arithmetic in UCSMP Everyday Mathematics]
* [https://math.widulski.net/slides/CH05_MustAllGoodThings.ppt A Powerpoint presentation about ancient mathematics]
* [https://www.pedagonet.com/maths/lattice.htm Lattice Multiplication Flash Video]
 
===Advanced algorithms===
* [https://gmplib.org/manual/Multiplication-Algorithms.html#Multiplication%20Algorithms Multiplication Algorithms used by GMP]
 
{{Number-theoretic algorithms}}
[[Category:Computer arithmetic algorithms|*]]
[[Category:Multiplication]]
[[Category:Articles with example pseudocode]]