Pollard's rho algorithm: Difference between revisions

Content deleted Content added
The example {{mvar|n}} = 10403 = 101 · 103: Remove C++ and Python code; pseudocode is enough. The table with elaborated example can stay.
m added wikilinks
Line 1:
{{about|the integer factorization algorithm|the discrete logarithm algorithm|Pollard's rho algorithm for logarithms}}
 
'''Pollard's rho algorithm''' is an [[algorithm]] for [[integer factorization]]. It was invented by [[John Pollard (mathematician)|John Pollard]] in 1975.<ref>{{cite journal |last=Pollard |first=J. M. |year=1975 |title=A Monte Carlo method for factorization |journal=BIT Numerical Mathematics |volume=15 |issue=3 |pages=331–334 |doi=10.1007/bf01933667}}</ref> It uses only a small amount of space, and its expected running time is proportional to the [[square root]] of the size of the smallest [[prime factor]] of the [[composite number]] being factorized.
 
== Core ideas ==
The algorithm is used to factorize a number <math>n = pq</math>, where <math>p</math> is a non-trivial factor. A [[polynomial]] modulo <math>n</math>, called <math>g(x)</math> (e.g., <math>g(x) = (x^2 + 1) \bmod n</math>), is used to generate a [[pseudorandom sequence]]: A starting value, say 2, is chosen, and the sequence continues as <math>x_1 = g(2)</math>, <math>x_2 = g(g(2))</math>, <math>x_3 = g(g(g(2)))</math>, etc. The sequence is related to another sequence <math>\{x_k \bmod p\}</math>. Since <math>p</math> is not known beforehand, this sequence cannot be explicitly computed in the algorithm. Yet, in it lies the core idea of the algorithm.
 
Because the number of possible values for these sequences is finite, both the <math>\{x_k\}</math> sequence, which is mod <math>n</math>, and <math>\{x_k \bmod p\}</math> sequence will eventually repeat, even though these values are unknown. If the sequences were to behave like random numbers, the [[birthday paradox]] implies that the number of <math>x_k</math> before a repetition occurs would be expected to be <math>O(\sqrt N)</math>, where <math>N</math> is the number of possible values. So the sequence <math>\{x_k \bmod p\}</math> will likely repeat much earlier than the sequence <math>\{x_k\}</math>. Once a sequence has a repeated value, the sequence will cycle, because each value depends only on the one before it. This structure of eventual cycling gives rise to the name "rho algorithm", owing to similarity to the shape of the Greek characterletter ''ρ'' when the values <math>x_1 \bmod p</math>, <math>x_2 \bmod p</math>, etc. are represented as nodes in a [[directed graph]].
 
[[File:Pollard rho cycle.svg|thumb|Cycle diagram resembling the Greek letter&nbsp;''ρ'']]
 
This is detected by [[Floyd's cycle-finding algorithm]]: two nodes <math>i</math> and <math>j</math> (i.e., <math>x_i</math> and <math>x_j</math>) are kept. In each step, one moves to the next node in the sequence and the other moves forward by two nodes. After that, it is checked whether <math>\gcd(x_i - x_j, n) \ne 1</math>. If it is not 1, then this implies that there is a repetition in the <math>\{x_k \bmod p\}</math> sequence (i.e. <math>x_i \bmod p = x_j \bmod p)</math>. This works because if the <math>x_i</math> is the same as <math>x_j</math>, the difference between <math>x_i</math> and <math>x_j</math> is necessarily a multiple of <math>p</math>. Although this always happens eventually, the resulting [[Greatestgreatest common divisor]] (GCD) is a divisor of <math>n</math> other than 1. This may be <math>n</math> itself, since the two sequences might repeat at the same time. In this (uncommon) case the algorithm fails, and can be repeated with a different parameter.
 
== Algorithm ==
The algorithm takes as its inputs {{mvar|n}}, the [[integer]] to be factored; and {{tmath|g(x)}}, a polynomial in {{mvar|x}} computed modulo {{mvar|n}}. In the original algorithm, <math>g(x) = (x^2 - 1) \bmod n</math>, but nowadays it is more common to use <math>g(x) = (x^2 + 1) \bmod n</math>. The output is either a non-trivial factor of {{mvar|n}}, or failure. It performs the following steps:<ref>{{cite book |last=Cormen |first=Thomas H. |authorlink=Thomas H. Cormen |last2=Leiserson |first2=Charles E. |authorlink2=Charles E. Leiserson |last3=Rivest |first3=Ronald L. |authorlink3=Ronald L. Rivest |last4=Stein |first4=Clifford |authorlink4=Clifford Stein |name-list-style=amp |chapter=Section 31.9: Integer factorization |title=[[Introduction to Algorithms]] |year=2009 |edition=third |publisher=MIT Press |___location=Cambridge, MA |pages=975–980|isbn=978-0-262-03384-8 }} (this section discusses only Pollard's rho algorithm).</ref>
 
x ← 2
Line 29:
'''return''' d
 
Here {{mvar|x}} and {{mvar|y}} corresponds to {{tmath|x_i}} and {{tmath|x_j}} in the previous section about core idea. Note that this algorithm may fail to find a nontrivial factor even when {{mvar|n}} is composite. In that case, the method can be tried again, using a starting value other than 2 or a different {{tmath|g(x)}}.
 
== Example factorization ==
Line 35:
[[File:Rho-example-animated.gif|thumb|348x348px|Pollard's rho algorithm example factorization for <math>n=253</math> and <math>g(x)=x^2 \bmod 253</math>, with starting value 2. The example is using [[Floyd's cycle-finding algorithm]].]]
{| class="wikitable" style="text-align:right"
! width=30 | {{mvar|i}} || width=60 | {{mvar|x}} || width=60 | {{mvar|y}} || {{math|GCDgcd({{abs|''x'' &minus; ''y''}}, 8051)}}
|-
| 1 || 5 || 26 || 1
Line 51:
In 1980, [[Richard Brent (scientist)|Richard Brent]] published a faster variant of the rho algorithm. He used the same core ideas as Pollard but a different method of cycle detection, replacing [[Floyd's cycle-finding algorithm]] with the related [[Cycle detection#Brent.27s algorithm|Brent's cycle finding method]].<ref>{{cite journal |last=Brent |first=Richard P. |authorlink=Richard Brent (scientist) |year=1980 |title=An Improved Monte Carlo Factorization Algorithm |journal=BIT |volume=20 |pages=176–184 |url=http://maths-people.anu.edu.au/~brent/pub/pub051.html |doi=10.1007/BF01933190}}</ref>
 
A further improvement was made by Pollard and Brent. They observed that if <math>\gcd (a,n) > 1</math>, then also <math>\gcd (ab,n) > 1</math> for any positive integer {{tmath|b}}. In particular, instead of computing <math>\gcd (|x-y|,n)</math> at every step, it suffices to define {{tmath|z}} as the product of 100 consecutive <math>|x-y|</math> terms modulo {{tmath|n}}, and then compute a single <math>\gcd (z,n)</math>. A major speed up results as 100 {{math|''gcd''}} steps are replaced with 99 multiplications modulo {{tmath|n}} and a single {{math|''gcd''}}. Occasionally it may cause the algorithm to fail by introducing a repeated factor, for instance when {{tmath|n}} is a [[square (algebra)|square]]. But it then suffices to go back to the previous {{math|''gcd''}} term, where <math>\gcd(z,n)=1</math>, and use the regular ''ρ'' algorithm from there.
 
== Application ==