Talk:Sorting algorithm: Difference between revisions

Content deleted Content added
Irate~enwiki (talk | contribs)
new section to bottom
 
(618 intermediate revisions by more than 100 users not shown)
Line 1:
{{Talk header}}
== Algorithm tester ==
{{Banner holder |collapsed=yes|1=
{{ArticleHistory
|action1=FAC
|action1date=03:15, 30 Aug 2004
|action1link=Wikipedia:Featured article candidates/Archived nominations/August 2004#Sorting algorithm
|action1result=not promoted
|action1oldid=5698861
|currentstatus=FFAC
}}
{{WikiProject banner shell|class=B|vital=yes|1=
{{WikiProject Computer science|importance=Top}}
}}
}}
{{User:MiszaBot/config
|archiveheader = {{Talk archive}}
|algo = old(365d)
|maxarchivesize = 100K
|minthreadsleft = 5
|minthreadstoarchive = 1
|counter = 3
|archive = Talk:Sorting algorithm/Archive %(counter)d
}}
 
== tk pic ==
Here's a Python module for testing comparison-based sort routines written in Python. (Obviously this won't work for radix and trie sorts, but it should work for most others.) The purpose of this module is to verify that code presented as an implementation of a sort algorithm does, in fact, sort correctly. (Whether it is an implementation of the right algorithm is a little more difficult to verify mechanically.)
 
What is the point of the [[tk (software)|tk]] screenshot at the top?? —[[User:Tamfang|Tamfang]] ([[User talk:Tamfang|talk]]) 00:46, 21 April 2023 (UTC)
Reasons for giving algorithm implementations in Python rather than pseudocode are described on [[talk:Algorithm]].
 
:Seeing no response ... [[User:Tamfang|—Tamfang]] ([[User talk:Tamfang|talk]]) 04:52, 3 November 2023 (UTC)
I call this module sorttest.py.
 
:: I agree that it wasn't very good. But I think something should be there. Do you see any good ones at Category:Sort algorithms? Perhaps [[:File:Visualization of Gnome sort.gif]] [[User:Bubba73|Bubba73]] <sup>[[User talk:Bubba73|You talkin' to me?]]</sup> 05:29, 3 November 2023 (UTC)
<pre>
:::Perhaps an animation of a less stupid sort. ;) [[Heapsort]] or [[Quicksort]] for example. [[User:Tamfang|—Tamfang]] ([[User talk:Tamfang|talk]]) 06:23, 3 November 2023 (UTC)
"""This is a Python module for testing sort routines. The sort routines
must obey the following interface:
 
::: There are some better ones at the Wikimedia Commons: Category:Animations of sort algorithms, see [https://commons.wikimedia.org/wiki/Category:Animations_of_sort_algorithms this] [[User:Bubba73|Bubba73]] <sup>[[User talk:Bubba73|You talkin' to me?]]</sup> 06:31, 3 November 2023 (UTC)
def sort(array, cmp=lambda x, y: x > y):
 
== AI Sorting ==
It doesn't matter what it returns, but it must not throw an exception
in any of the tests.
 
Just wondered if AI sorts should be added:
Note that in Python, the parameter names are part of the interface, so
* https://www.youtube.com/watch?v=n2qCry_o2Fs
the parameters must be called 'array' and 'cmp'. (There can be
* https://www.youtube.com/watch?v=nF81NVBFLcE
additional optional parameters, though.)
[[User:GrahamHardy|GrahamHardy]] ([[User talk:GrahamHardy|talk]]) 16:42, 23 June 2023 (UTC)
 
:By which I guess you mean improved algorithms recently discovered by AIs. If they are novel concepts they ought to be listed, but the videos seem to say they are speedier versions of known approaches. (I'll try to read the articles linked under the first video.) —[[User:Tamfang|Tamfang]] ([[User talk:Tamfang|talk]]) 23:45, 18 July 2023 (UTC)
When it returns, it should have mutated the array 'array' so that it
contains the same items (in the sense of 'is'), but possibly in a
different order. The new order must be such that, for any i in
range(len(array-1)), cmp(array[i], array[i+1]) is false. So, by
default, it sorts ascending.
 
::The DeepMind AI sorting algorithm targets sorting 3 to 5 elements and in the case of 5 elements, saves one compare. Not much of a breakthrough. [[User:Rcgldr|Rcgldr]] ([[User talk:Rcgldr|talk]]) 07:38, 24 July 2023 (UTC)
It must not mutate anything the array points to --- that's cheating
::: That could potentially be significant if it is in a recursive divide-and-conquer algorithm. [[User:Bubba73|Bubba73]] <sup>[[User talk:Bubba73|You talkin' to me?]]</sup> 05:36, 3 November 2023 (UTC)
--- but this is hard to test.
"""
 
== Table visibility ==
import random
 
In the [[Sorting algorithm#Comparison sorts]], quite a few parts of the text on the table are quite hard to see in dark mode. {{Tl|mvar}} seems to get the colours right though. Will try to find a work around. <span style="font-family:monospace;background:#368;padding:.2rem;color:white">''[[User:APenguinThatIsSilly|<span style="color:white">APenguinThatIsSilly</span>]]''(''[[User talk:APenguinThatIsSilly|<span style="color:#FEC">"talk"</span>]]'')</span> 00:23, 14 January 2025 (UTC)
def bubblesort(array, cmp=lambda x, y: x > y):
"""The simplest working sort routine, for testing purposes.
 
== (Multiway) Powersort ==
This is here to make it possible to test sort-testing routines."""
indices = range(len(array))
indices.reverse()
for j in indices:
for i in range(j):
if cmp(array[i], array[i+1]):
(array[i], array[i+1]) = (array[i+1], array[i])
 
[[Powersort]] is an improvement on [[Timsort]] and has replaced it as the default sort in CPython. Moreover, multiway Powersort is probably even faster and can plausibly claim to be state of the art. Can we find room for one or both in the comparison table? [[Special:Contributions/98.114.142.14|98.114.142.14]] ([[User talk:98.114.142.14|talk]]) 05:06, 22 July 2025 (UTC)
OutOfOrder = "sorttest.OutOfOrder"
ScrewedUpArray = "sorttest.ScrewedUpArray"
 
def brokensort1(array, cmp=lambda x, y: x > y):
"""Broken 'sort' routine that overwrites the whole array with 1 element."""
for i in range(len(array)): array[i] = array[0]
 
def brokensort2(array, cmp=lambda x, y: x > y):
"""Broken 'sort' routine that bubblesorts backwards."""
bubblesort(array, lambda x, y, cmp=cmp: cmp(y, x))
 
def testsort_onearray(array, sort, cmp=None):
"""Given a sort routine and an array, raise an exception if it sorts wrong.
"""
arraycopy = array[:] # copy array
if cmp is None: sort(array)
else: sort(array, cmp)
 
# verify that the array is in order
if cmp is None: cmp = lambda x, y: x > y
for i in range(len(array)-1):
if cmp(array[i], array[i+1]):
raise OutOfOrder, (i, arraycopy, array, array[i], array[i+1])
 
# verify that it consists of the elements of the original array:
# doesn't contain any fewer elements:
if len(array) != len(arraycopy):
raise ScrewedUpArray, ("array length changed", arraycopy, len(array),
len(arraycopy))
# and doesn't contain any more copies of any element than the original
# array:
idmap = {}
for i in arraycopy:
if not idmap.has_key(id(i)): idmap[id(i)] = 0
idmap[id(i)] = idmap[id(i)] + 1
for i in array:
if not idmap.has_key(id(i)):
raise ScrewedUpArray, ("element wasn't in original array",
arraycopy, i)
idmap[id(i)] = idmap[id(i)] - 1
if idmap[id(i)] &lt; 0:
raise ScrewedUpArray, ("element was in original array fewer times",
arraycopy, i)
 
def qwi(string):
"""Turn a string containing a list of ints into a list of ints."""
return map(int, string.split())
 
def testsort(sort):
"""Test a sort routine on a variety of inputs. Main entry point."""
def backwards(x, y): return x &lt; y
# simplest case: empty array
testsort_onearray([], sort)
testsort_onearray([], sort, backwards)
# sorting a short already-in-order array
testsort_onearray([1, 2, 3], sort)
testsort_onearray([3, 2, 1], sort, backwards)
# an actual array that needs some sorting
testsort_onearray([4, 2, 5, 3, 6, 0], sort)
testsort_onearray([4, 2, 5, 3, 6, 0], sort, backwards)
# and with duplicate elements
testsort_onearray(qwi('0 0 1 2 2 3 3 3 4 5 5'), sort)
testsort_onearray(qwi('5 5 5 4 3 2 1 1'), sort, backwards)
testsort_onearray(qwi('0 0 1 2 2 3 3 3 4 5 5'), sort, backwards)
testsort_onearray(qwi('5 5 5 4 3 2 1 1'), sort)
# more more-or-less random tests with lists of integers
testsort_onearray(qwi('1 33 1 3 1 3 42 1 5 5 1 -1 17 40'), sort)
testsort_onearray(qwi('1 1 1 1 1'), sort)
testsort_onearray([1], sort)
# I'd like to use a bigger random list, but O(N^2) sorts get too slow
rg = random.Random()
testsort_onearray([rg.randint(0, 1000) for x in range(100)], sort)
# verify that it works on things other than integers
testsort_onearray('vow to bring enlightenment to all sentient beings'
.split(), sort)
testsort_onearray(map(lambda x: [x], qwi('1 3 1 4 5')), sort,
lambda x, y: x[0] > y[0])
 
def test():
"""Test routine to verify that sort-testing routines work.
 
This routine runs when the module loads to ensure that it still
works correctly.
 
"""
 
testsort_onearray([], bubblesort)
testsort_onearray([], bubblesort, lambda x, y: x &lt; y)
testsort_onearray([1, 2, 3], bubblesort)
testsort_onearray([1, 2, 3], bubblesort, lambda x, y: x &lt; y)
testsort_onearray([3, 2, 1], bubblesort)
testsort_onearray([3, 2, 1], bubblesort, lambda x, y: x &lt; y)
testsort_onearray(map(int, '2 3 3 1 41 31 1 0 1'.split()), bubblesort)
 
ok = 0
try: testsort_onearray([1, 2], brokensort2)
except: ok = 1
assert ok
 
ok = 0
try: testsort_onearray([1, 2], brokensort1)
except: ok = 1
assert ok
 
testsort(bubblesort)
 
ok = 0
try: testsort(brokensort1)
except: ok = 1
assert ok
 
ok = 0
try: testsort(brokensort2)
except: ok = 1
assert ok
 
test()
 
</pre>
 
== Selection sort ==
 
Could somebody elaborate a bit on the remark to [[Selection sort]] please?
 
''works in-place, but loses stability or gains complexity''
 
--HJH
 
I think they mean the following: if you use the obvious in-place implementation of selection sort (i.e.: find the smallest element in the list of not-yet-sorted elements, then swap it to the correct position), then it won't stable anymore. If you want to keep it stable, you need to use a more complex algorithm. [[User:AxelBoldt|AxelBoldt]] 18:39 Oct 17, 2002 (UTC)
 
: This is true of all unstable sorting algorithms, so I don't think it's necessary. [[User:MyRedDice|Martin]]
 
----
 
Combsort comments moved to [[talk:comb sort]]
 
== List or table? ==
I considered using an unordered list instead of a table - please take a look and decide which is (A) easiest to read and (B) easiest to edit... [[User:MyRedDice|Martin]]
 
* [[Bubble sort]]: O(''n''²), but O(''n'') on already sorted data; works in-place; stable
* [[Cocktail sort]] (bidirectional bubble sort): O(''n''²), but O(''n'') on already sorted data; works in-place; stable
* [[Comb sort]]: O(''n'' log ''n''); works in-place; stable
* [[Selection sort]]: O(''n''²); works in-place, but loses stability or gains complexity; unstable
* Straight [[insertion sort]]: O(''n''²), but O(''n'') on already sorted data; works in-place; stable
* [[Bucket sort]]: O(''n''); requires O(''n'') extra memory; stable
* [[Counting sort]]: O(''n''+''k''); requires O(''n''+''k'') extra memory; stable
* [[Heapsort]]: O(''n'' log ''n''); works in-place; unstable
* [[Smoothsort]]: O(''n'' log ''n''), but O(''n'') on already sorted data; works in-place; unstable
* [[Merge sort]]: O(''n'' log ''n''); requires O(''n'') extra memory; stable
* [[Quicksort]]: O(''n'' log ''n''), but O(''n''²) in worst case; requires O(log n) extra memory; unstable
* [[binary tree sort]]: O(''n'' log ''n''), but O(''n''²) on already sorted data; requires O(''n'') extra memory, typically several pointers per input item; stable
* [[Pigeonhole sort]]: O(''n''+''k''); requires O(''k'') extra memory; can only be used when all keys are unique
* [[Radix sort]]: O(''n'' log ''k''); requires O(''n'') extra space; stable
* [[Shell sort]]: Worst case O(''n''<sup>1.5</sup>), Average case O(''n''<sup>1.25</sup>, Best case O(''n'' log ''n'') on already sorted data; works in-place; stable
* [[Bogosort]]: Worst case O(''n''!), Average case O((''n''/2)!), Best case O(''n''), depending on luck; requires O(''n'') extra space; unstable
 
The table is easier to read and not that difficult to edit (in fact it may be easier). The list gains flexibility, but it's not really needed. Verbose descriptions should be handled in the individual articles. This is just a quick overview and comparison, which tables are well suited for in this case. -[[User:Nknight|nknight]] 10:25 Apr 26, 2003 (UTC)
 
: Thanks for the feedback - I was uncertain, hence why I wasn't [[meta:bold|bold]]... :)
: Another suggestion - what if the complexity columns were merged into one column? Might save space/width/... [[User:MyRedDice|Martin]] 13:31 Apr 26, 2003 (UTC)
 
I found and corrected a Big Fat omission from the [[list of mathematical topics]]: It did not link to this page! Now it does, but it does not link to most of the separate articles on individual sort algorithms. Since there are only 24 hours in a day, I will not fix that now, so this notice is your chance to beat me to it. [[User:Michael Hardy|Michael Hardy]] 23:39 May 1, 2003 (UTC)
 
Is sort algorithm a mathematical topic? I think it is a computer science topic. Anyway, about table. I don't like the current table, which is not easy to read (remember rendering of tables heavily depends on your environment) and not eay to edit. But I don't see the list version you posted better. I will edit the article by my version, which looks better to me. -- [[User:TakuyaMurata|Taku]] 00:10 May 2, 2003 (UTC)
 
:It is a computer science topic, but it is ''also'' a mathematical topic. [[User:Michael Hardy|Michael Hardy]] 00:15 May 2, 2003 (UTC)
-------
The table was moved here. The article used to have a table but was converted to have a list. The list is usually more readable.
 
{| cellpadding=2 border=1
|+ Sorting algorithms
|-
! sort algorithm
! runtime order
! extra memory
! stability
! note
|-
| [[Bogosort]]
| O((''n''/2)!)
| O(''n'')
| stable
| intentionally poor algorithm
|-
| [[Bubble sort]]
| O(''n''²)
|
| stable
|
|-
| [[Cocktail sort]]
| O(''n''²)
|
| stable
| bidirectional bubble sort
|-
| [[Comb sort]]
| O(''n'' log ''n'')
|
| unstable
|
|-
| [[Selection sort]]
| O(''n''²)
|
| unstable
|
|-
| [[Insertion sort]]
| O(''n''²)
|
| unstable
|
|-
| [[Bucket sort]]
| O(''n'')
| O(''n'')
| stable
|
|-
| [[Counting sort]]
| O(''n''+''k'')
| O(''n''+''k'')
| stable
|
|-
| [[Heapsort]]
| O(''n'' log ''n'')
|
| unstable
|
|-
| [[Smoothsort]]
| O(''n'' log ''n'')
|
| unstable
|
|-
| [[Merge sort]]
| O(''n'' log ''n'')
|
| stable
|
|-
| [[Quicksort]]
| O(''n'' log ''n'')
|
| unstable
|
|-
| [[Binary tree sort]]
| O(''n'' log ''n'')
| O(''n'')
| stable
|
|-
| [[Pigeonhole sort]]
| O(''n''+''k'')
| O(''k'')
| stable
|
|-
| [[Radix sort]]
| O(''nk'')
| O(''n'')
| stable
|
|+ test
|-
| [[Shell sort]]
| O(''n''<sup>1.25</sup>)
|
| stable
|
|}
 
-- [[User:TakuyaMurata|Taku]] 22:03, Oct 28, 2003 (UTC)
 
I guess I should have looked at this page before I inserted the table. But I think a table would be best: it allows one to quickly find an algorithm with an acceptible complexity. Someone said that we shouldn't try to make that information the focus because the reader can go to the individual page, but I will argue that the over view section should make comparisons easier, and not simply list them.
-Shawn P. '[[User:Indefual|Indefual]]' Conroy
 
:I think it is a matter of preference. The list looks much nicer to me and the list still has the same information as that the table can have. The alternative is to have a separate article of the table with more complete information such as if the algorithm is implemented using work-in-place. I must say that the trick is the table can be rendered very ugly in a certain envrionment while the list is always more liable. -- [[User:TakuyaMurata|Taku]] 21:31, Oct 29, 2003 (UTC)
 
Bring back the table. Then you can also add the best, average and worst runtime orders, like this: -- [[User:Dissident|Dissident]] 05:00, 11 Apr 2004 (UTC)
<table cellpadding=2 border=1>
<caption>Sorting algorithms</caption>
<tr>
<th rowspan="2">sort algorithm</th>
<th colspan="3">runtime order</th>
<th rowspan="2">extra memory</th>
<th rowspan="2">stability</th>
<th rowspan="2">note</th>
</tr>
<tr>
<th>best</th>
<th>average</th>
<th>worst</th>
</tr>
</table>
 
== Theta notation ==
 
Is there any reason why the algorithms asymtotic growth is shown in big-oh-notation instead of big-theta-notation? I mean, isn't a tight fit better than an upper bound (for instance, you could correctly say that heap sort is O(n^10))? [[User:Gkhan|Gkhan]]
 
== what about alphabet size? ==
 
Shouldn't alphabet size be mentioned somewhere? For an alphabet of size m, an input of size n contains n / log m elements. Also, it takes O(log m) to do an operation on an element, or pair of elements. Some examples: pigeon hole sort: not really O(n), but O(n + m log n - log log m). Radix sort: not O(n), but O(n log m). Heap sort: not O(n log n), but O(n log n log m). And so on. This might be interesting for people who deal with alphabets that do not fit into a single machine word.
 
== Permutation sort? ==
 
Should a page be added for it or should it be included with Bogosort? Should Bogosort be secondary to permutation sort?
 
-- [[User:UTSRelativity|UTSRelativity]] 20:23 2004-12-12 (UTC)
 
== Speed of in-place merge sort ==
 
An anonymous user changed my addition
 
* In-place [[merge sort]] - O(''n''<sup>2</sup>)
 
to
 
* In-place [[merge sort]] - O(''n'' log ''n''); several times slower, more complicated
 
I added it based on this:
 
"In-Place Mergesort is yet another abomination. Mergesort is supposed to run in O(n log n), but the implementation here runs in O(n * n)." [http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html]
 
Can anyone give me an implementation of in-place merge sort that's O(n log n)? -- [[User:Smjg|Smjg]] 17:06, 1 Mar 2005 (UTC)
 
== Other sort pages ==
 
I found these pages which should be integrated here or merged in with other sort articles:
 
* [[Stooge sort]]
* [[Double Bubble sort]] - same as cocktail? (haven't had time to analyze)
 
[[User:Daniel Quinlan|Daniel Quinlan]] 04:58, Mar 8, 2005 (UTC)
 
:: That page gave "Double bubble sort" as a misleading name for [[insertion sort]]. But AFAICS this is just wrong - other websites give it as meaning [[cocktail sort]]. As such, I'm changing it to redirect there. -- [[User:Smjg|Smjg]] 21:55, 30 Mar 2005 (UTC)
 
:::It's not a cocktail sort as it doesn't alternate direction. It is not the same, it get's it's name because the first bubble find something out of place and then starts a bubble in the opposite direction to correct the order. It is very efficent for data that is almost in order.--[[User:Irate|Jirate]] 22:13, 2005 Mar 30 (UTC)
 
:::: What are you referring to as "it" here? We're talking about two things:
::::* What the Wikipedia article refers to as "double bubble sort", which is insertion sort
::::* What other websites refer to as "double bubble sort", which is cocktail sort.
:::: What was the point of reverting it to be a duplicate of [[Insertion sort]]? We should make up our mind: either
::::* Redirect to [[Cocktail sort]], what the other sites I checked use it to mean (the only exception being a Wikipedia mirror) and a more intuitive meaning IMO
::::* Redirect to [[Insertion sort]], which it currently describes
::::* Get rid of the page, considering that the title's miscapitalised anyway. -- [[User:Smjg|Smjg]] 12:08, 31 Mar 2005 (UTC)
::::::It is not an isertion sort. I think you don't actually understand either elgorythm.--[[User:Irate|Jirate]] 12:38, 2005 Mar 31 (UTC)
 
::::::: Why are you being so vague? What do you consider to be the difference between them? Simply that your "double bubble sort" wastes time doing whole swaps? Or is there some other subtle difference that really renders them distinct algorithms? -- [[User:Smjg|Smjg]] 15:46, 31 Mar 2005 (UTC)
 
:::::::The wasteful swaping bit is the bubble. [http://linux.wku.edu/~lamonml/algor/sort/insertion.html] is what I understand to be an insertion sort as opposed to a bubble. Have you checked Knuth?--[[User:Irate|Jirate]] 19:53, 2005 Mar 31 (UTC)
::::::The other point is that I think you insertion sort is actually not an insertion sort. It a modified double bubble.--[[User:Irate|Jirate]] 12:40, 2005 Mar 31 (UTC)
 
::::::: Do you know of an algorithm that is more accurately called insertion sort? -- [[User:Smjg|Smjg]] 15:46, 31 Mar 2005 (UTC)
::::::::[http://linux.wku.edu/~lamonml/algor/sort/insertion.html] --[[User:Irate|Jirate]] 19:53, 2005 Mar 31 (UTC)
 
AFAIK, both are implementations of insertion sort. The implementation Jirate points to doesn't have the wasteful swapping (xor swap or not). &mdash;[[User:UTSRelativity|UTSRelativity]] 01:15, 1 Apr 2005 (UTC)
:[http://www.math-info.univ-paris5.fr/~ycart/mst/mst031/Group3/db_sort/index_db_sort.html][http://engr.smu.edu/cse/2353/] [http://www.users.bigpond.com/robin_v/f-cont.htm][http://www.cs.uiowa.edu/~rlawrenc/teaching/030/Notes/Code/sorts.cxx]--[[User:Irate|Jirate]] 01:34, 2005 Apr 1 (UTC)
::This is I understand it is an insertion sort [http://linux.wku.edu/~lamonml/algor/sort/insertion.html], which is different and requires an extra variable, which can be vital on very small processors.--[[User:Irate|Jirate]] 01:41, 2005 Apr 1 (UTC)