Talk:Sorting algorithm: Difference between revisions

Content deleted Content added
Line 716:
{{Reflist-talk}}
*{{cite book|ref=harv|last1=Knuth|first1=Donald|title=The Art of Computer Programming, Volume 3: Sorting and Searching|date=1998|publisher=Addison-Wesley Professional|edition=2nd}}
 
== NEW - release of hsort (triangulation sort) and why it matters ==
http://sourceforge.net/projects/headsort/
 
This isn't on the main page or book but i beleive DOES matter. Simply because it was never tried in publications doesn't mean it doesn't matter; results matter, and are in forma gnu sort(1) for trail.
 
Mergesort is stable and fast, it heuristicly "bins" like radix and speedily compares LINES of data using strcmp - making MULTI-DIMENSIONAL sorting (multi field) very fast.
 
However - Mergesort heuristic failure is comparing item that have ALREADY been compared.
 
Traditionally it is thought a "triangulation sort" cannot be fast because it needs two or thee dimensions to compare characters (also no cpu strcmp to use). However it gains by never comparing two things twice. That still lags behind merge sort best case, but does win in some cases.
 
"hsort" or headsort wins in TWO major ways. first: when the first bin is done (which is triangulated and cannot finish sooner), headsort program can begin piping sorted output. (unlike a simple "sift up" sort, it can finish the tail quickly as well).
 
ONE: "hsort" found two NEW improvements which may well make it a good algo to use in general. First instead of data[line][field][characters] it tried data[field][line][characters] and nearly being deleted attempt: found a special case which allowed a dimension reduction of 1, at moderate cost (a good improvement). Doing so ate up all time to try the other...
 
TWO: Next and recently, HSORT tried a simpler dimensional reduction. Using a 1d array sorted that output ptr to 3d array. Impossible per say: but distribution counting allows it on the tail, so it is fully 1/2 true.
 
THREE: the old adage *foo[][][] is expesive is not as true as it had been in the 8086 days. New CPU can do some extra dimensions in 1 CPU tick, which remove more of the remaining handicap of the 'hsort' algorithm in practice.
 
With those two (or fee) dimensional reductions my tests show that hsort beats "merge sort" used in gnu sort heavily in best cases, and also still wins in headsort's worst cases: just all around faster FOR multi dimensional sorting.
 
I'd like others comments but I'm well aware its' rare any proffessors exchange comments on algorithms.
 
hsort is debuggable (unlike recursive heuristics, it's flow is mostly natural)
 
hsort also has implications in distributive computing because it can recurse dimensions and doesn't require merging or waiting per say (implementation of piping finished bins in 123 order aside which is easy, which is always true single process)