Element distinctness problem

In [[computational complexity theory]], the '''element distinctness problem''' or '''element uniqueness problem''' is the problem of determining whether all the elements of a list are distinct.
 
It is a well-studied problem in many different models of computation. The problem may be solved by [[sorting]] the list and then checking whether there are any consecutive equal elements; it may also be solved in linear expected time by a [[randomized algorithm]] that inserts each item into a [[hash table]] and compares only those elements that are placed in the same hash table cell.<ref>{{citation
| last1 = Gil | first1 = J.
| last2 = Meyer auf der Heide | first2 = F.
| last3 = Wigderson | first3 = A. | author3-link = Avi Wigderson
| contribution = Not all keys can be hashed in constant time
| doi = 10.1145/100216.100247
| pages = 244–253
| title = [[Symposium on Theory of Computing|Proc. 22nd ACM Symposium on Theory of Computing (STOC 1990)]]
| year = 1990}}.</ref>
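
As an illustration (not taken from the cited paper), the following Python sketch contrasts the two approaches; Python's built-in <code>set</code> stands in for the hash table and hides the bucket-level comparisons that the cited randomized analysis concerns.

<syntaxhighlight lang="python">
def distinct_by_sorting(items):
    """Sort, then look for equal neighbours: O(n log n) comparisons."""
    s = sorted(items)
    return all(a != b for a, b in zip(s, s[1:]))

def distinct_by_hashing(items):
    """Insert each item into a hash table and stop at the first repeat;
    expected linear time.  Python's set hides the bucketing details."""
    seen = set()
    for x in items:
        if x in seen:
            return False
        seen.add(x)
    return True
</syntaxhighlight>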
 
It is known that the problem's [[time complexity]] is [[Big O notation|&Theta;]](''n'' log ''n''), i.e., both the upper and lower bounds on its time complexity are of the order of the [[linearithmic function]], in the [[algebraic decision tree]] [[model of computation]],<ref>{{citation|first=Michael|last=Ben-Or|contribution=Lower bounds for algebraic computation trees|title=[[Symposium on Theory of Computing|Proc. 15th Annual ACM Symposium on Theory of Computing]]|year=1983|pages=80–86|doi=10.1145/800061.808735}}.</ref> a model of computation in which the elements may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. In other words, an [[asymptotically optimal]] algorithm of linearithmic time complexity is known for this model. The algebraic computation tree model restricts the allowable algorithms to those that perform polynomial operations of bounded degree on the input data and comparisons of the results of these computations.
 
The same lower bound was later proved for the [[randomized complexity|randomized]] [[algebraic decision tree]] model.<ref>{{citation|last1=Grigoriev|first1=Dima|author1-link=Dima Grigoriev|last2=Karpinski|first2=Marek|author2-link=Marek Karpinski|last3=Meyer auf der Heide|first3=Friedhelm|last4=Smolensky|first4=Roman|title=A lower bound for randomized algebraic decision trees|journal=Computational Complexity|volume=6|issue=4|year=1996|pages=357|doi=10.1007/BF01270387}}.</ref>
 
It is also known that [[quantum algorithm]]s can solve this problem faster, in <math>\Theta\left(n^{2/3}\right)</math> queries. The optimal algorithm is by [[Andris Ambainis]],<ref>{{citation | last1=Ambainis | first1=Andris | author1-link=Andris Ambainis | title=Quantum walk algorithm for element distinctness | doi=10.1137/S0097539705447311 | journal=[[SIAM Journal on Computing]] | volume=37 | issue=1 | year=2007 | pages=210–239 | arxiv=quant-ph/0311001}}.</ref> and the lower bound is due to [[Scott Aaronson]] and Yaoyun Shi.<ref>{{citation | last1=Aaronson | first1=S. | author1-link=Scott Aaronson | last2=Shi | first2=Y. | title=Quantum lower bounds for the collision and the element distinctness problems | year=2004 | journal=[[Journal of the ACM]] | issn=0004-5411 | volume=51 | issue=4 | pages=595–605 | doi=10.1145/1008731.1008735}}.</ref>
 
Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that a solution to the element uniqueness problem can be obtained quickly once the problem in question has been solved.
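
For instance, a one-dimensional ''minimum gap'' (closest pair) routine immediately answers element distinctness, so any such routine inherits the &Theta;(''n'' log ''n'') lower bound of the algebraic decision tree model. The sketch below only illustrates the direction of such a reduction; the naive quadratic <code>minimum_gap</code> is a stand-in for whatever algorithm is being analysed, not part of any cited construction.

<syntaxhighlight lang="python">
def minimum_gap(numbers):
    """Naive O(n^2) stand-in for the closest-pair routine being analysed."""
    return min(abs(a - b) for i, a in enumerate(numbers) for b in numbers[i + 1:])

def distinct_via_minimum_gap(numbers):
    """The elements are distinct exactly when the smallest pairwise gap is nonzero,
    so a fast minimum-gap algorithm would yield an equally fast distinctness test."""
    return len(numbers) < 2 or minimum_gap(numbers) != 0
</syntaxhighlight>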
 
==Generalization: Finding repeated elements==
Elements that occur more than ''n''/''k'' times in a multiset of size ''n'' may be found in time O(''n'' log ''k''). The element distinctness problem is the special case ''k'' = ''n''. This algorithm is optimal under the [[decision tree model]] of computation.<ref>{{citation|first1=J.|last1=Misra|first2=D.|last2=Gries|author2-link=David Gries|title=Finding repeated elements|journal=Science of Computer Programming|volume=2|year=1982|pages=143–152}}.</ref>
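
A minimal Python sketch of this kind of candidate-plus-verification counting scheme (a simplification in the spirit of the cited algorithm, not taken verbatim from the paper) is:

<syntaxhighlight lang="python">
from collections import Counter

def heavy_hitters(items, k):
    """Return the elements occurring more than len(items)/k times.
    At most k - 1 candidates are kept; a second pass verifies them."""
    counters = {}
    for x in items:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # decrement every candidate and drop those that reach zero
            counters = {y: c - 1 for y, c in counters.items() if c > 1}
    n = len(items)
    exact = Counter(x for x in items if x in counters)
    return [x for x, c in exact.items() if c > n / k]
</syntaxhighlight>

For example, <code>heavy_hitters([1, 2, 1, 3, 1], 2)</code> returns <code>[1]</code>, since 1 occurs three times in a list of five elements.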
 
The algorithm is a generalization of the one for the special case ''k'' = 2, which had a rather convoluted history of publication.<ref name=bm>{{citation|first1=R. S.|last1=Boyer|author1-link=Robert S. Boyer|first2=J S.|last2=Moore|author2-link=J Strother Moore|contribution=MJRTY - A Fast Majority Vote Algorithm|editor-first=R. S.|editor-last=Boyer|title=Automated Reasoning: Essays in Honor of Woody Bledsoe|series=Automated Reasoning Series|publisher=Kluwer Academic Publishers|___location=Dordrecht, The Netherlands|year=1991|pages=105–117}}.</ref>
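
The ''k'' = 2 case is the Boyer–Moore majority vote algorithm of the cited paper; a short illustrative Python version, with the second pass that confirms the candidate, is:

<syntaxhighlight lang="python">
def majority_candidate(items):
    """One pass, O(1) extra space: returns the only possible majority element."""
    candidate, count = None, 0
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate

def has_majority(items):
    """A second pass confirms the candidate really occurs more than n/2 times."""
    c = majority_candidate(items)
    return items.count(c) > len(items) / 2 if items else False
</syntaxhighlight>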
 
The above algorithms rely only on tests of identity of the elements. If sorting is allowed, previously known algorithms for finding [[order statistics]] may be exploited. For example, for ''k'' = 2, a [[median]] may be found first in [[linear time]], and then it may easily be tested whether there are more than ''n''/2 median elements. However, the above algorithms require fewer comparisons than the order statistics algorithms.<ref name=bm/>
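
A sketch of that order-statistics approach for ''k'' = 2 (using Python's <code>statistics.median_low</code>, which sorts, rather than the linear-time selection the argument assumes):

<syntaxhighlight lang="python">
import statistics

def majority_via_median(items):
    """Any element occurring more than n/2 times must be the median, so one
    selection plus one counting pass decides the question.  median_low sorts,
    so this illustrative version runs in O(n log n) rather than linear time."""
    if not items:
        return None
    m = statistics.median_low(items)   # an actual element of the list
    return m if items.count(m) > len(items) / 2 else None
</syntaxhighlight>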