Burrows–Wheeler transform: Difference between revisions

Content deleted Content added
m remove redundant URL
Removed reference to the inadequacy of standard library sorting functions. This sentence was originally meant to allude to the ability to do the "sorting" here in linear time by exploiting the relationship between the BWT and the suffix array, as indicated by the revision description at https://en.wikipedia.org/w/index.php?title=Burrows%E2%80%93Wheeler_transform&oldid=186184455. But that point is now clearly stated in the paragraph that follows, so this vague allusion is now just confusing.
Line 280:
 
==Optimization==
A number of [[Optimization (computer science)|optimizations]] can make these algorithms run more efficiently without changing the output. There is no need to represent the table in either the encoder or decoder. In the encoder, each row of the table can be represented by a single pointer into the strings, and the sort performed using the indices. Some care must be taken to ensure that the sort does not exhibit bad [[Best, worst and average case|worst-case]] behavior: Standard library sort functions are unlikely to be appropriate.{{Clarify|reason=What kind of sort is recommended, and what is the bad runtime that standard library sorts are likely to have?|date=August 2020}} In the decoder, there is also no need to store the table, and in fact no sort is needed at all. In time proportional to the alphabet size and string length, the decoded string may be generated one character at a time from right to left. A "character" in the algorithm can be a byte, or a bit, or any other convenient size.
 
One may also make the observation that mathematically, the encoded string can be computed as a simple modification of the [[suffix array]], and suffix arrays can be computed with linear time and memory. The BWT can be defined with regards to the suffix array SA of text T as (1-based indexing):