Element distinctness problem: Difference between revisions

Content deleted Content added
m Use journal version
Undid revision 1264526708 by 37.244.251.1 (talk) uncorrect. More than one time is more than n/n times, not more than n/2 times.
 
(48 intermediate revisions by 25 users not shown)
Line 1:
In [[computational complexity theory]], the '''element distinctness problem''' or '''element uniqueness problem''' is the problem of determining whether all the elements of a list are distinct.
 
It is a well studied problem in many different models of computation. The problem may be solved by [[sorting]] the list and then checking if there are any consecutive equal elements; it may also be solved in linear expected time by a [[randomized algorithm]] that inserts each item into a [[hash table]] and compares only those pairs of elements that are placed in the same hash table cell.<ref>{{Cite journalcitation
| last1 = Gil | first1 = J.
| last2 = Meyer auf der Heide | first2 = F.
| last3 = Wigderson | first3 = A. | author3-link = Avi Wigderson
| contribution = Not all keys can be hashed in constant time
| doi = 10.1145/100216.100247
| pages = 244–253
| title = [[Symposium on Theory of Computing|Proc. 22nd ACM Symposium on Theory of Computing (STOC 1990)]]
| year = 1990| s2cid = 11943779
| doi-access = free
| postscript = <!--None-->}}.</ref>
}}.</ref>
 
Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that the solution of the element uniqueness problem may be quickly found after solving the problem in question.
It is known that the problem's [[time complexity]] is [[Big O notation|&Theta;]](''n'' log ''n''), i.e., both the upper and lower bounds on its time complexity are of order of the [[linearithmic function]] in a [[model of computation]] known as the [[algebraic computation tree]] model,<ref>[[Michael Ben-Or]], "Lower bounds for algebraic computation trees", Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 1983, pp. 80-86</ref> a model of computation in which the elements may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. In other words, an [[asymptotically optimal]] algorithm of linearithmic time complexity is known for this model. The algebraic computation tree model basically means that the allowable algorithms are only the ones that can perform polynomial operations of bounded degree on the input data and comparisons of the results of these computations.
 
==Decision tree complexity==
The same lower bound was later proved for the [[randomized complexity|randomized]] [[algebraic decision tree]] model.<ref>{{cite journal|doi=10.1007/BF01270387|title=A lower bound for randomized algebraic decision trees|year=1996|author=Grigoriev, Dima|journal=Computational Complexity|volume=6|pages=357|last2=Karpinski|first2=Marek|last3=Heide|first3=Friedhelm Meyer|last4=Smolensky|first4=Roman}}</ref>
The number of comparisons needed to solve the problem of size <math>n</math>, in a comparison-based model of computation such as a [[decision tree]] or [[algebraic decision tree]], is <math>\Theta(n\log n)</math>. Here, <math>\Theta</math> invokes [[Big O notation|big theta notation]], meaning that the problem can be solved in a number of comparisons proportional to <math>n\log n</math> (a [[linearithmic function]]) and that all solutions require this many comparisons.<ref>{{citation|first=Michael|last=Ben-Or|contribution=Lower bounds for algebraic computation trees|title=[[Symposium on Theory of Computing|Proc. 15th ACM Symposium on Theory of Computing]]|year=1983|pages=80–86|doi=10.1145/800061.808735|doi-access=free}}.</ref> In these models of computation, the input numbers may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. For these models, an algorithm based on [[comparison sort]] solves the problem within a constant factor of the best possible number of comparisons. The same lower bound applies as well to the [[expected value|expected number]] of comparisons in the [[randomized complexity|randomized]] [[algebraic decision tree]] model.<ref>{{citation|doi=10.1007/BF01270387|title=A lower bound for randomized algebraic decision trees|year=1996|last1=Grigoriev|first1=Dima|authorlink=Dima Grigoriev|journal=Computational Complexity|volume=6|pages=357|last2=Karpinski|first2=Marek|author2-link=Marek Karpinski|last3=Heide|first3=Friedhelm Meyer|last4=Smolensky|first4=Roman|issue=4|s2cid=1462184 }}.</ref><ref>{{citation|doi=10.1007/s000370050002|title=Complexity lower bounds for randomized computation trees over zero characteristic fields|year=1999|last=Grigoriev|first=Dima|authorlink=Dima Grigoriev|journal=Computational Complexity|volume=8|pages=316–329|issue=4|s2cid=10641238 }}.</ref>
 
==Real RAM Complexity==
It is also known that [[quantum algorithm]]s can solve this problem faster in <math>\Theta(N^{2/3})</math> queries. The optimal algorithm is by [[Andris Ambainis]],<ref>{{Citation | last1=Ambainis | first1=Andris | author1-link=Andris Ambainis | title=Quantum walk algorithm for element distinctness | id = {{arxiv|quant-ph/0311001}} | doi = 10.1137/S0097539705447311 | journal = [[SIAM Journal on Computing]] | volume = 37 | issue = 1 | year = 2007 | pages=210–239 }}.</ref> and the lower bound is due to [[Scott Aaronson]] and Yaoyun Shi.<ref>{{Cite journal | last1=Aaronson | first1=S. | last2=Shi | first2=Y. | title=Quantum lower bounds for the collision and the element distinctness problems | publisher=Association for Computing Machinery, Inc, One Astor Plaza, 1515 Broadway, New York, NY, 10036-5701, USA, | year=2004 | journal=[[Journal of the ACM]] | issn=0004-5411 | volume=51 | issue=4 | pages=595–605 | doi=10.1145/1008731.1008735 | postscript=<!--None-->}}</ref>
If the elements in the problem are [[real number|real numbers]], the decision-tree lower bound extends to the [[real RAM|real random-access machine]] model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor").<ref>{{citation|doi=10.1137/S0097539797329397|title=Topological Lower Bounds on Algebraic Random Access Machines|year=2001|last1=Ben-Amram|first1=Amir M.|journal=SIAM Journal on Computing|volume=31|issue=3|pages=722–761|last2=Galil|first2=Zvi|author2-link=Zvi Galil}}.</ref> It follows that the problem's complexity in this model is also <math>\Theta(n\log n)</math>. This RAM model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions.
 
==Turing Machine complexity==
Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that the solution of the element uniqueness problem may be quickly found after solving the problem in question.
A single-tape deterministic [[Turing machine]] can solve the problem, for ''n'' elements of {{math|''m'' &geq; log ''n''}} bits each, in time {{math|''O''(''n''<sup>2</sup>''m''(''m''+2–log ''n''))}},
while on a nondeterministic machine the time complexity is {{math|''O''(''nm''(''n'' + log ''m''))}}.<ref>{{citation|doi=10.1007/s00236-003-0125-8|title=Element distinctness on one-tape Turing machines: a complete solution.|year=2003|last1=Ben-Amram|first1=Amir M.|journal=Acta Informatica|volume=40|issue=2|pages=81–94|last2=Berkman|first2=Omer|last3=Petersen|first3=Holger|s2cid=24821585 }}</ref>
 
==Quantum complexity==
==Restrictions==
It is also known that [[quantumQuantum algorithm]]s can solve this problem faster, in <math display=inline>\Theta(Nn^{2/3})</math> queries. The optimal algorithm is by [[Andris Ambainis]],.<ref>{{Citationcitation | last1=Ambainis | first1=Andris | author1-link=Andris Ambainis | title=Quantum walk algorithm for element distinctness | id = {{arxiv|quant-ph/0311001}} | doi = 10.1137/S0097539705447311 | journal = [[SIAM Journal on Computing]] | volume = 37 | issue = 1 | year = 2007 | pages=210–239 | arxiv=quant-ph/0311001 }}.</ref> and the lower bound is due to [[ScottYaoyun AaronsonShi]] andfirst Yaoyunproved Shi.<ref>{{Citea journal | last1=Aaronson | first1=S. | last2=Shi | first2=Y. | title=Quantumtight lower boundsbound forwhen the collision and the element distinctness problems | publisher=Association for Computing Machinery, Inc, One Astor Plaza, 1515 Broadway, New York, NY, 10036-5701, USA, | year=2004 | journal=[[Journalsize of the ACM]]range |is issn=0004-5411sufficiently | volume=51 | issue=4 | pages=595–605 | doi=10large.1145/1008731.1008735 | postscript=<!--None-->}}</ref>
[[Decision tree model]]s are inapplicable for determining lower bounds for algorithmic problems for objects that have some ''a priori'' properties which can be exploited in construction of algorithms. For example, if it is known that the ''n'' objects are [[integer number]]s from the range [1..''n''], then the element uniqueness problem may be solved in [[Big O notation|O]](''n'') time by a modification of [[bucket sort]].
{{cite conference
| last1=Shi | first1=Y.
| year=2002
| title=Quantum lower bounds for the collision and the element distinctness problems
| conference = Proceedings of the 43rd [[Symposium on Foundations of Computer Science]]
| pages=513–519
| arxiv = quant-ph/0112086
| doi=10.1109/SFCS.2002.1181975
}}</ref> Ambainis<ref>
{{cite journal
|last=Ambainis |first=A.
|year=2005
|title=Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range
|journal=[[Theory of Computing]]
|volume=1 |issue=1 |pages=37–46
|arxiv=
|bibcode=
|doi=10.4086/toc.2005.v001a003
|doi-access=free
}}</ref> and Kutin<ref>
{{cite journal
|last1=Kutin |first1=S.
|year=2005
|title=Quantum Lower Bound for the Collision Problem with Small Range
|journal=[[Theory of Computing]]
|volume=1 |issue=1 |pages=29–36
|arxiv=
|bibcode=
|doi=10.4086/toc.2005.v001a002
|doi-access=free
}}</ref> independently (and via different proofs) extended his work to obtain the lower bound for all functions.
 
==Generalization: Finding repeated elements==
{{main|Misra–Gries heavy hitters algorithm}}
Elements that occur more than <math>n/k</math> times in a multiset of size <math>n</math> may be found by a comparison-based algorithm, the [[Misra–Gries heavy hitters algorithm]], in time <math>O(n \log k)</math>. The elelemntelement distinctness problem is a special case of this problem where <math>k=n</math>. This algorithmtime is optimal under the [[decision tree model]] of computation. <ref> {{citation|first1=J. |last1=Misra and |first2=D. |last2=Gries.|author2-link=David Gries|title=Finding repeated elements.|journal=Science Sci.of Comput. Prog.,Computer Programming|volume=2 (|year=1982|issue=2|pages=143–152|doi=10.1016/0167-6423(82) 14390012-1520|hdl=1813/6345|hdl-access=free}}. </ref>
 
The algorithm is a generalization of the one for a special case of k=2, which had a rather convolute history of publication.<ref name=bm>[[R.S. Boyer]], [[J. S. Moore]], "MJRTY - A Fast Majority Vote Algorithm," In R.S. Boyer (ed.), Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991, pp. 105-117. </ref>
 
==See also==
The above algorithms rely only on the test of identity of the elements. If sorting is allowed, previously known [[order statistics]] finding algorithms may be exploited. For example, for k=2, a [[median]] may be found first in [[linear time]], and then it may be easily tested whether there are more than n/2 median elements. However the above algorithms require less comparisons than the order statistics algorithms.<ref name=bm/>
*[[Collision problem]]
 
==References==