In [[computational complexity theory]], the '''average-case complexity''' of an [[algorithm]] is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with [[worst-case complexity]], which considers the maximal complexity of the algorithm over all possible inputs.
There are three primary motivations for studying average-case complexity.<ref name="gol07"/> First, although some problems may be intractable in the worst case, the inputs which elicit this behavior may rarely occur in practice, so the average case may be a more accurate measure of an algorithm's performance. Second, average-case complexity analysis gives a way to generate hard instances of problems, which is needed in areas such as [[cryptography]] and [[derandomization]]. Third, average-case complexity allows discriminating the most efficient algorithm in practice among algorithms of equivalent worst-case complexity (for instance, [[Quicksort]]).
Average-case analysis requires a notion of an "average" input to an algorithm, which leads to the problem of devising a [[probability distribution]] over inputs. Alternatively, a [[randomized algorithm]] can be used. The analysis of such algorithms leads to the related notion of an '''expected complexity'''.<ref name="clrs"/>{{rp|28}}
==History and background==
The average-case performance of algorithms has been studied since modern notions of computational efficiency were developed in the 1950s. Much of this initial work focused on problems for which worst-case polynomial time algorithms were already known.<ref name="bog06"/> In 1973, [[Donald Knuth]]<ref>{{cite book | last = Knuth | first = Donald | author-link = Donald Knuth | title = The Art of Computer Programming | volume = 3: Sorting and Searching | publisher = Addison-Wesley | year = 1973}}</ref> published Volume 3 of ''[[The Art of Computer Programming]]'', which extensively surveys average-case performance of algorithms for problems solvable in worst-case polynomial time, such as sorting and median-finding.

An efficient algorithm for [[NP-complete|{{math|'''NP'''}}-complete]] problems is generally characterized as one which runs in polynomial time for all inputs; this is equivalent to requiring efficient worst-case complexity. However, an algorithm which is inefficient on a "small" number of inputs may still be efficient for "most" inputs that occur in practice. Thus, it is desirable to study the properties of these algorithms where the average-case complexity may differ from the worst-case complexity, and to find methods to relate the two.
The fundamental notions of average-case complexity were developed by [[Leonid Levin]] in 1986 when he published a one-page paper<ref name="levin86"/> defining average-case complexity and completeness, and giving an example of a complete problem for {{math|'''distNP'''}}, the average-case analogue of {{math|'''NP'''}}.
==Definitions==
===Efficient average-case complexity===
The first task is to define precisely what is meant by an algorithm which is efficient "on average". An algorithm {{mvar|A}} is said to run in polynomial time on average with respect to a family of input distributions {{math|''D''<sub>''n''</sub>}}, where {{math|''D''<sub>''n''</sub>}} is a distribution over the inputs of length {{mvar|n}}, if
:<math>
\Pr_{x \sim_R D_n}\left[t_A(x) \geq t\right] \leq \frac{p(n)}{t^\varepsilon}
</math>
for every {{math|''n'', ''t'' > 0}} and polynomial {{mvar|p}}, where {{math|''t''<sub>''A''</sub>(''x'')}} denotes the running time of algorithm {{mvar|A}} on input {{mvar|x}}, and {{mvar|ε}} is a positive constant.<ref name="wangsurvey"/>
Equivalently, an algorithm {{mvar|A}} runs in average polynomial time if
:<math>
E_{x \sim_R D_n}\left[\frac{t_A(x)^\varepsilon}{n}\right] \leq C
</math>
for some constants {{mvar|C}} and {{mvar|ε}}, where {{math|''n'' {{=}} {{abs|''x''}}}}.<ref name="ab09"/>
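As an illustration of why the exponent {{mvar|ε}} is needed (an illustrative calculation, not drawn from the cited sources), consider an algorithm which runs in time {{math|''n''<sup>2</sup>}} on all but a {{math|2<sup>−''n''/2</sup>}} fraction of the inputs of length {{mvar|n}}, and in time {{math|2<sup>''n''</sup>}} on that fraction. Its expected running time is roughly {{math|''n''<sup>2</sup> + 2<sup>''n''/2</sup>}}, which is exponential, yet taking {{math|''ε'' {{=}} 1/4}} gives
:<math>
E_{x \sim_R D_n}\left[\frac{t_A(x)^{1/4}}{n}\right] \leq \frac{(n^2)^{1/4}}{n} + \frac{2^{-n/2} \cdot (2^n)^{1/4}}{n} = n^{-1/2} + \frac{2^{-n/4}}{n} \leq C,
</math>
so the algorithm does run in polynomial time on average under this definition. The exponent thus makes the definition tolerant of rare inputs on which the algorithm is very slow, whereas naive expected running time is not.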
===Distributional problem===
A distributional problem {{math|(''L'', ''D'')}} consists of a language {{mvar|L}} together with a probability distribution {{mvar|D}} over problem instances. The class {{math|'''distNP'''}} consists of the distributional problems {{math|(''L'', ''D'')}} in which {{mvar|L}} is in {{math|'''NP'''}} and {{mvar|D}} is {{math|'''P'''}}-computable.<ref name="ab09"/>
The average-case analogue to {{math|'''NP'''}}-completeness is {{math|'''distNP'''}}-completeness. A distributional problem {{math|(''L′'', ''D′'')}} is {{math|'''distNP'''}}-complete if {{math|(''L′'', ''D′'')}} is in {{math|'''distNP'''}} and for every {{math|(''L'', ''D'')}} in {{math|'''distNP'''}}, {{math|(''L'', ''D'')}} is average-case reducible to {{math|(''L′'', ''D′'')}}.<ref name="ab09" />
An example of a {{math|'''distNP'''}}-complete problem is the Bounded Halting Problem, {{math|({{mvar|BH}},''D'')}} (for any {{math|'''P'''}}-computable ''D'') defined as follows:
<math>BH = \{(M, x, 1^t) : M \text{ is a non-deterministic Turing machine that accepts } x \text{ in at most } t \text{ steps}\}</math><ref name="ab09"/> Intuitively, any distributional problem {{math|(''L'', ''D'')}} in {{math|'''distNP'''}} reduces to {{mvar|BH}} by mapping an instance {{mvar|x}} to the triple consisting of the non-deterministic machine for {{mvar|L}}, the instance {{mvar|x}}, and its polynomial time bound; the technical difficulty is to arrange the reduction so that the image distribution is dominated by {{mvar|D}}, which Levin achieved by encoding {{mvar|x}} with a compression scheme tailored to its probability under {{mvar|D}}.
In his original paper, Levin showed an example of a distributional tiling problem that is average-case {{math|'''NP'''}}-complete.<ref name="levin86"/> A survey of known {{math|'''distNP'''}}-complete problems is available online.<ref name="wangsurvey"/>
One area of active research involves finding new {{math|'''distNP'''}}-complete problems. However, finding such problems can be complicated due to a result of Gurevich which shows that any distributional problem with a flat distribution (one that assigns every string of length {{mvar|n}} probability at most {{math|2<sup>−''n''<sup>''ε''</sup></sup>}} for some {{math|''ε'' > 0}}) cannot be {{math|'''distNP'''}}-complete unless [[EXP|{{math|'''EXP'''}}]] = [[NEXP|{{math|'''NEXP'''}}]].<ref name="gur87"/>
==Applications==
===Sorting algorithms===
As mentioned above, much early work relating to average-case complexity focused on problems for which polynomial-time algorithms already existed, such as sorting. For example, many sorting algorithms which utilize randomness, such as [[Quicksort]], have a worst-case running time of {{math|O(''n''<sup>2</sup>)}}, but an average-case running time of {{math|O(''n'' log(''n''))}}, where {{mvar|n}} is the length of the input to be sorted.<ref name="clrs">{{cite book | last1 = Cormen | first1 = Thomas H. | last2 = Leiserson | first2 = Charles E. | last3 = Rivest | first3 = Ronald L. | last4 = Stein | first4 = Clifford | title = Introduction to Algorithms | edition = 3rd | publisher = MIT Press | year = 2009 | isbn = 978-0-262-03384-8}}</ref>
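The contrast can be made concrete with a short sketch (illustrative code, not drawn from the cited texts): choosing the pivot uniformly at random makes the expected number of comparisons {{math|O(''n'' log ''n'')}} on every input, while a fixed-pivot variant is forced into its {{math|O(''n''<sup>2</sup>)}} worst case by already-sorted inputs.
<syntaxhighlight lang="python">
import random

def quicksort(a):
    """Randomized quicksort.

    Choosing the pivot uniformly at random makes the *expected* number of
    comparisons O(n log n) for every input; the O(n^2) behaviour survives
    only as a low-probability event, not as a fixed family of bad inputs.
    """
    if len(a) <= 1:
        return a
    pivot = random.choice(a)  # the randomness behind the average-case bound
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))

def quicksort_first_pivot(a):
    """Deterministic variant: always pivots on the first element, so an
    already-sorted input forces Theta(n^2) comparisons (the worst case)."""
    if len(a) <= 1:
        return a
    pivot = a[0]
    return (quicksort_first_pivot([x for x in a[1:] if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort_first_pivot([x for x in a[1:] if x > pivot]))

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
</syntaxhighlight>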
===Cryptography===
For most problems, average-case complexity analysis is undertaken to find efficient algorithms for a problem that is considered difficult in the worst case. In cryptographic applications, however, the opposite is true: the worst-case complexity is irrelevant; we instead want a guarantee that every algorithm which "breaks" the cryptographic scheme is inefficient on average.<ref name="katz07"/>
Thus, all secure cryptographic schemes rely on the existence of [[one-way functions]].<ref name="bog06"/> Although the existence of one-way functions is still an open problem, many candidate one-way functions are based on hard problems such as [[integer factorization]] or computing the [[discrete log]]. Note that it is not desirable for the candidate function to be {{math|'''NP'''}}-complete since this would only guarantee that there is likely no efficient algorithm for solving the problem in the worst case; what we actually want is a guarantee that no efficient algorithm can solve the problem over random inputs (i.e. the average case). In fact, both the integer factorization and discrete log problems are in {{math|'''NP''' ∩ }}[[coNP|{{math|'''coNP'''}}]], and are therefore not believed to be {{math|'''NP'''}}-complete.<ref name="ab09"/> The fact that all of cryptography is predicated on the existence of average-case intractable problems in {{math|'''NP'''}} is one of the primary motivations for studying average-case complexity.
==Other results==
[[Yao's principle]], from a 1977 paper by [[Andrew Yao]], shows that for broad classes of computational problems, average-case complexity for a hard input distribution and a deterministic algorithm adapted to that distribution is the same thing as expected complexity for a fast randomized algorithm and its worst-case input.<ref>{{citation
 | last = Yao | first = Andrew Chi-Chih | author-link = Andrew Yao
 | contribution = Probabilistic computations: Toward a unified measure of complexity
 | doi = 10.1109/SFCS.1977.24
 | pages = 222–227
 | title = Proceedings of the 18th IEEE Symposium on Foundations of Computer Science (FOCS)
 | year = 1977}}</ref>
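In one standard formulation (the notation here is generic, not specific to any one source): let <math>\mathcal{X}</math> be a finite set of inputs, <math>\mathcal{A}</math> a finite set of deterministic algorithms, and {{math|''T''(''A'', ''x'')}} the cost of running algorithm {{mvar|A}} on input {{mvar|x}}. Then for every input distribution {{mvar|D}} and every randomized algorithm {{mvar|R}} (viewed as a distribution over <math>\mathcal{A}</math>),
:<math>
\min_{A \in \mathcal{A}} \operatorname{E}_{x \sim D}\left[T(A, x)\right] \;\leq\; \max_{x \in \mathcal{X}} \operatorname{E}_{A \sim R}\left[T(A, x)\right],
</math>
so the average-case cost of the best deterministic algorithm against a hard input distribution lower-bounds the worst-case expected cost of every randomized algorithm.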
In 1990, Impagliazzo and Levin showed that if there is an efficient average-case algorithm for a {{math|'''distNP'''}}-complete problem under the uniform distribution, then there is an average-case algorithm for every problem in {{math|'''NP'''}} under any polynomial-time samplable distribution.<ref name="imp90">R. Impagliazzo and L. Levin, "No Better Ways to Generate Hard NP Instances than Picking Uniformly at Random," in Proceedings of the 31st IEEE Symposium on Foundations of Computer Science, pp. 812–821, 1990.</ref> Applying this theory to natural distributional problems remains an outstanding open question.<ref name="bog06"/>
In 1992, Ben-David et al. showed that if all languages in {{math|'''distNP'''}} have good-on-average decision algorithms, they also have good-on-average search algorithms. Further, they showed that this conclusion holds under a weaker assumption: if every language in {{math|'''NP'''}} is easy on average for decision algorithms with respect to the uniform distribution, then it is also easy on average for search algorithms with respect to the uniform distribution.<ref name="bd92"/>
In 1993, Feigenbaum and Fortnow showed that it is not possible to prove, under non-adaptive random reductions, that the existence of a good-on-average algorithm for a {{math|'''distNP'''}}-complete problem under the uniform distribution implies the existence of worst-case efficient algorithms for all problems in {{math|'''NP'''}}.<ref name="ff93"/>
==See also==
* [[Probabilistic analysis of algorithms]]
* [[Best, worst and average case]]
* [[Worst-case complexity]]
==Further reading==
Pedagogical presentations:
* {{Cite book |last=Impagliazzo |first=R. |chapter=A personal view of average-case complexity |date=1995 |title=Proceedings of Structure in Complexity Theory. Tenth Annual IEEE Conference |publisher=IEEE Comput. Soc. Press |pages=134–147 |doi=10.1109/SCT.1995.514853 |isbn=978-0-8186-7052-7}}
* {{cite book |last1=Wang |first1=Jie |url=https://www.cs.uml.edu/~wang/acc-forum/avgcomp.pdf |title=Complexity Theory: Retrospective II |date=1997 |publisher=Springer Science & Business Media |editor-last1=Hemaspaandra |editor-first1=Lane A. |volume=2 |pages=295–328 |chapter=Average-case computational complexity theory |editor-last2=Selman |editor-first2=Alan L.}}
* {{Citation |last=Goldreich |first=Oded |title=Average Case Complexity, Revisited |date=2011 |work=Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation |series=Lecture Notes in Computer Science |volume=6650 |pages=422–450 |editor-last=Goldreich |editor-first=Oded |url=https://www.wisdom.weizmann.ac.il/~oded/COL/aver.pdf |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-22670-0_29 |isbn=978-3-642-22669-4}}
* {{cite book |last1=Arora |first1=Sanjeev |title=Computational Complexity: A Modern Approach |last2=Barak |first2=Boaz |date=2009 |publisher=Cambridge University Press |___location=Cambridge; New York |chapter=18. Average case complexity: Levin’s theory}}
The literature of average-case complexity includes the following works:
*{{citation
| last = Levin | first = Leonid | author-link = Leonid Levin
 | doi = 10.1137/0215020
 | issue = 1
 | journal = SIAM Journal on Computing
 | pages = 285–286
 | title = Average case complete problems
 | volume = 15
 | year = 1986}}.
*{{citation
 | last = Franco | first = John
 | doi = 10.1016/0020-0190(86)90051-7
 | issue = 2
 | journal = Information Processing Letters
 | pages = 103–106
 | title = On the probabilistic performance of algorithms for the satisfiability problem
 | volume = 23
 | year = 1986}}.
*{{citation
| last1 = Flajolet | first1 = Philippe | author1-link = Philippe Flajolet
 | last2 = Vitter | first2 = J. S.
 | publisher = Institut National de Recherche en Informatique et en Automatique (INRIA)
 | series = Tech. report
 | title = Average-case analysis of algorithms and data structures
 | year = 1987}}.
*{{citation
| last1 = Ben-David | first1 = Shai
| last2 = Chor | first2 = Benny | author2-link = Benny Chor
| last3 = Goldreich | first3 = Oded | author3-link = Oded Goldreich
| last4 = Luby | first4 = Michael | author4-link = Michael Luby
 | doi = 10.1016/0022-0000(92)90019-F
 | issue = 2
 | journal = Journal of Computer and System Sciences
 | pages = 193–219
 | title = On the theory of average case complexity
 | volume = 44
 | year = 1992}}.
[[Category:Randomized algorithms]]