'''Average-case complexity''' is a subfield of [[computational complexity theory]] that studies the complexity of [[algorithms]] over inputs drawn randomly from a particular [[probability distribution]]. It is frequently contrasted with [[worst-case complexity]], which considers the maximal complexity of the algorithm over all possible inputs.
There are three primary motivations for studying average-case complexity.<ref name="gol07">O. Goldreich and S. Vadhan, Special issue on worst-case versus average-case complexity, Comput. Complex. 16, 2007.</ref> First, the inputs that elicit an algorithm's worst-case behavior may rarely occur in practice, so average-case complexity can be a more accurate measure of the algorithm's performance. Second, cryptography requires problems that are intractable not merely in the worst case but on randomly generated instances. Third, average-case analysis can distinguish among algorithms, such as sorting algorithms, that share the same worst-case complexity.
==History and background==
The average-case performance of algorithms has been studied since modern notions of computational efficiency were developed in the 1950s. Much of this initial work focused on problems for which worst-case polynomial-time algorithms were already known.<ref name="bog06">A. Bogdanov and L. Trevisan, "Average-Case Complexity," Foundations and Trends in Theoretical Computer Science, Vol. 2, No. 1 (2006).</ref>
An efficient algorithm for [[NP-complete]] problems is generally characterized as one which runs in polynomial time for all inputs; this is equivalent to requiring efficient worst-case complexity. However, an algorithm which is inefficient on a "small" number of inputs may still be efficient for "most" inputs that occur in practice. Thus, it is desirable to study the properties of these algorithms where the average-case complexity may differ from the worst-case complexity and find methods to relate the two.
The fundamental notions of average-case complexity were developed by [[Leonid Levin]] in 1986 when he published a one-page paper<ref name="levin86">L. Levin, "Average case complete problems," SIAM Journal on Computing, vol. 15, no. 1, 1986.</ref> defining average-case complexity and completeness, and giving an example of a complete problem for distNP, the average-case analogue of NP.
==Definitions==
===Efficient average-case complexity===
The first task is to precisely define what is meant by an algorithm which is efficient "on average". An initial attempt might define an efficient average-case algorithm as one which runs in expected polynomial time over all possible inputs. Such a definition has various shortcomings; in particular, it is not robust to changes in the computational model. For example, suppose algorithm A runs in time t<sub>A</sub>(x) on input x and algorithm B runs in time t<sub>A</sub>(x)<sup>2</sup> on input x; that is, B is quadratically slower than A. Intuitively, any definition of average-case efficiency should capture the idea that A is efficient on average if and only if B is efficient on average. Suppose, however, that the inputs are drawn randomly from the uniform distribution over strings of length n, and that A runs in time n<sup>2</sup> on all inputs except the string 1<sup>n</sup>, for which A takes time 2<sup>n</sup>. Then it can be easily checked that the expected running time of A is polynomial but the expected running time of B is exponential.<ref name="bog06" />
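This can be seen by a direct calculation (an illustrative check of the claim above): under the uniform distribution over strings of length n, the single string 1<sup>n</sup> has probability 2<sup>−n</sup>, so

<math>
\operatorname{E}[t_A] = \left(1 - 2^{-n}\right)n^2 + 2^{-n}\cdot 2^n \leq n^2 + 1,
</math>

which is polynomial, whereas

<math>
\operatorname{E}[t_B] = \left(1 - 2^{-n}\right)n^4 + 2^{-n}\cdot 2^{2n} \geq 2^n,
</math>

which is exponential.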
To create a more robust definition of average-case efficiency, it makes sense to allow an algorithm A to run longer than polynomial time on some inputs, provided that the fraction of inputs requiring larger and larger running time becomes smaller and smaller. This intuition is captured by the following formula for average polynomial running time, which balances the polynomial trade-off between running time and fraction of inputs:

<math>
\Pr_{x \in_R D_n}\left[t_A(x) \geq t\right] \leq \frac{p(n)}{t^\epsilon}
</math>

for every n, t, ε > 0 and polynomial p, where t<sub>A</sub>(x) denotes the running time of algorithm A on input x.<ref name="wangsurvey">J. Wang, "Average-case computational complexity theory," Complexity Theory Retrospective II, 1997.</ref> Alternatively, this can be written as

<math>
\operatorname{E}_{x \in_R D_n}\left[\frac{t_A(x)^{\epsilon}}{n}\right] \leq C
</math>

for some constant C, where n = |x|.<ref name="ab09">S. Arora and B. Barak, Computational Complexity: A Modern Approach, Cambridge University Press, New York, NY, 2009.</ref> In other words, an algorithm A has good average-case complexity if, after running for t<sub>A</sub>(n) steps, A can solve all but a <math>\frac{n^c}{(t_A(n))^{\epsilon}}</math> fraction of inputs of length n, for some ε, c > 0.<ref name="bog06"/>
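To see why this definition is robust in the sense demanded above, note (an illustrative check) that both A and B from the earlier example satisfy it under the uniform distribution. For algorithm A, take ε = 1 and p(n) = n<sup>2</sup>: the bound is trivial for t ≤ n<sup>2</sup>, and for n<sup>2</sup> < t ≤ 2<sup>n</sup> only the string 1<sup>n</sup> runs that long, so

<math>
\Pr\left[t_A(x) \geq t\right] \leq 2^{-n} \leq \frac{n^2}{t},
</math>

while the probability is zero for t > 2<sup>n</sup>. A similar calculation with ε = 1/2 shows that B is also efficient on average, as desired.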
===Distributional problem===
The next step is to define the "average" input to a particular problem. This is achieved by associating the inputs of each problem with a particular probability distribution. That is, an "average-case" problem consists of a language L and an associated probability distribution D which forms the pair (L, D).<ref name="ab09"/> The two most common classes of distributions which are allowed are:
# Polynomial-time computable distributions (P-computable): these are distributions for which it is possible to compute the cumulative density of any given input x. More formally, given a probability distribution μ and a string x ∈ {0, 1}<sup>n</sup> it is possible to compute the value <math>\mu(x) = \sum_{y \in \{0,1\}^n : y \leq x} \Pr[y]</math> in polynomial time. This implies that Pr[x] is also computable in polynomial time.
# Polynomial-time samplable distributions (P-samplable): these are distributions from which it is possible to draw random samples in polynomial time (see the sketch below for a simple example).

A distributional problem (L, D) is in the complexity class AvgP if there is an efficient average-case algorithm for L, as defined in the previous section. A distributional problem (L, D) is in the complexity class distNP if L is in NP and D is P-computable.<ref name="ab09"/>
Together, AvgP and distNP define the average-case analogues of P and NP, respectively.<ref name="ab09"/>
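As a simple illustration (a sketch added here for concreteness, not drawn from the cited references; the function names are chosen only for exposition), the uniform distribution over strings of length n is both P-samplable and P-computable:

<syntaxhighlight lang="python">
import random

def sample_uniform(n: int) -> str:
    """Draw a string from the uniform distribution on {0,1}^n (P-samplable)."""
    return ''.join(random.choice('01') for _ in range(n))

def cumulative_density_uniform(x: str) -> float:
    """Compute mu(x) = Pr[y <= x] for the uniform distribution on {0,1}^n,
    where strings are compared as n-bit integers (P-computable)."""
    n = len(x)
    return (int(x, 2) + 1) / 2 ** n

# Example: for n = 3, cumulative_density_uniform("111") == 1.0
# and cumulative_density_uniform("000") == 1/8.
</syntaxhighlight>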
==Reductions between distributional problems==
Let (L,D) and (L',D') be two distributional problems. (L, D) average-case reduces to (L', D') (written (L, D) ≤<sub>AvgP</sub> (L', D')) if there is a function f that for every n, on input x can be computed in time polynomial in n and
# (Correctness) x ∈ L if and only if f(x) ∈ L'
# (Domination) There are polynomials p and m such that, for every n and y, <math>\sum_{x: f(x) = y} D_n(x) \leq p(n)\, D'_{m(n)}(y).</math>
The domination condition enforces the notion that if problem (L, D) is hard on average, then (L', D') is also hard on average. Intuitively, a reduction should provide a way to solve an instance x of problem L by computing f(x) and feeding the output to the algorithm which solves L'. Without the domination condition, this may not be possible: the algorithm which solves L' in polynomial time on average may take super-polynomial time on a set of inputs that is rare under D', but f may map a much larger mass of D onto that set, in which case the composed algorithm no longer runs in polynomial time on average. The domination condition ensures that such strings occur, via f, only polynomially more often under D than they do under D'.<ref name="wangsurvey"/>
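For example (an illustrative special case), if the reduction f is injective, then each y has at most one preimage and the domination condition simplifies to

<math>
D_n(x) \leq p(n)\, D'_{m(n)}(f(x))
</math>

for every x; that is, no instance may become more than polynomially less likely after the reduction.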
===DistNP-complete problems===
The average-case analogue to NP-completeness is distNP-completeness. A distributional problem (L', D') is distNP-complete if (L', D') is in distNP and for every (L, D) in distNP, (L, D) is average-case reducible to (L', D').<ref name="ab09" />
In his original paper, Levin showed an example of a distributional tiling problem that is average-case NP-complete.<ref name="levin86"/> A survey of known distNP-complete problems is available online.<ref name="wangsurvey"/>
One area of active research involves finding new distNP-complete problems. However, finding such problems can be complicated due to a result of Gurevich which shows that any distributional problem with a flat distribution cannot be distNP-complete unless [[EXP]] = [[NEXP]].<ref name="gur87">Y. Gurevich, "Complete and incomplete randomized NP problems", Proc. 28th Annual Symp. on Found. of Computer Science, IEEE (1987).</ref> (A flat distribution is one in which there exists an ε > 0 such that every string x has probability at most <math>2^{-|x|^{\epsilon}}</math>.)
==Applications==
===Sorting algorithms===
As mentioned above, much early work relating to average-case complexity focused on problems for which polynomial-time algorithms already existed, such as sorting. For example, many sorting algorithms which utilize randomness, such as [[Quicksort]], have a worst-case running time of O(n<sup>2</sup>), but an average-case running time of O(n log(n)), where n is the length of the input to be sorted.<ref name="clrs">Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.</ref>
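A minimal Python sketch of a randomized Quicksort (an illustration added here, not taken from the cited references) makes this concrete: each call partitions the input around a randomly chosen pivot, giving expected O(n log(n)) comparisons, while an unlucky sequence of pivot choices can still cost O(n<sup>2</sup>):

<syntaxhighlight lang="python">
import random

def quicksort(items):
    """Randomized quicksort: expected O(n log n) comparisons on any input,
    but O(n^2) comparisons in the worst case over the pivot choices."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

# Example: quicksort([3, 1, 2]) returns [1, 2, 3].
</syntaxhighlight>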
===Cryptography===
For most problems, average-case complexity analysis is undertaken to find efficient algorithms for a problem that is considered difficult in the worst case. In cryptographic applications, however, the opposite is true: the worst-case complexity is irrelevant; we instead want a guarantee that the average-case complexity of every algorithm which "breaks" the cryptographic scheme is inefficient.
Thus, all secure cryptographic schemes rely on the existence of [[one-way functions]].<ref name="bog06"/> Although the existence of one-way functions is still an open problem, many candidate one-way functions are based on problems believed to be hard on average, such as [[integer factorization]] or computing the [[discrete log]]. Note that it is not desirable for the candidate function to be NP-complete since this would only guarantee that there is likely no efficient algorithm for solving the problem in the worst case; what we actually want is a guarantee that no efficient algorithm can solve the problem over random inputs (i.e. the average case). In fact, both the integer factorization and discrete log problems are in NP ∩ [[coNP]], and are therefore not believed to be NP-complete.<ref name="ab09"/> The fact that all of cryptography is predicated on the existence of average-case intractable problems in NP is one of the primary motivations for studying average-case complexity.
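As an illustrative sketch (added here, not part of the cited references; the parameter values are toy choices made for this example), the discrete-log candidate can be viewed as a function that is easy to evaluate yet presumed hard to invert on random inputs:

<syntaxhighlight lang="python">
import random

def modexp_oneway(x: int, g: int = 2, p: int = 101) -> int:
    """Candidate one-way function f(x) = g^x mod p.
    Evaluating f is easy (fast modular exponentiation); recovering x from
    f(x) is the discrete log problem, believed hard on average for
    suitably large p. The toy values g = 2, p = 101 are for illustration
    only; real schemes use primes thousands of bits long."""
    return pow(g, x, p)

# Forward direction: easy for a random exponent.
x = random.randrange(1, 100)
y = modexp_oneway(x)
# Recovering x from y (the discrete log) is what is assumed to be
# infeasible on average when the parameters are cryptographically sized.
</syntaxhighlight>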
==Other results==
In 1990, Impagliazzo and Levin showed that if there is an efficient average-case algorithm for a distNP-complete problem under the uniform distribution, then there is an average-case algorithm for every problem in NP under any polynomial-time samplable distribution.<ref name="imp90">R. Impagliazzo and L. Levin, "No Better Ways to Generate Hard NP Instances than Picking Uniformly at Random," in Proceedings of the 31st IEEE Symposium on Foundations of Computer Science, 1990.</ref>
In 1992, Ben-David et al. showed that if all languages in distNP have good-on-average decision algorithms, they also have good-on-average search algorithms. Further, they show that this conclusion holds under a weaker assumption: if every language in NP is easy on average for decision algorithms with respect to the uniform distribution, then it is also easy on average for search algorithms with respect to the uniform distribution.<ref name="bd92">S. Ben-David, B. Chor, O. Goldreich, and M. Luby, "On the theory of average case complexity," Journal of Computer and System Sciences, vol. 44, no. 2, 1992.</ref>
In 1993, Feigenbaum and Fortnow showed that it is not possible to prove, under non-adaptive random reductions, that the existence of a good-on-average algorithm for a distNP-complete problem under the uniform distribution implies the existence of worst-case efficient algorithms for all problems in NP.<ref name="ff93">J. Feigenbaum and L. Fortnow, "Random-self-reducibility of complete sets," SIAM Journal on Computing, vol. 22, 1993.</ref>
==See also==
==References==
{{Reflist|30em}}
==Further reading==