Details of implementation are important for performance, particularly for decoding. For encoding, no clear advantage is gained by using a [[linked list]], so using an [[Array data structure|array]] to store the list is acceptable, with worst-case performance [[Big O notation|O]](<var>n</var><var>k</var>), where <var>n</var> is the length of the data to be encoded and <var>k</var> is the number of values (generally a constant for a given implementation).
The typical performance is better because frequently used symbols are more likely to be at the front and will produce earlier hits.
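For illustration, the array-based encoder described above might be sketched as follows (a sketch only; the function name and the 256-entry byte alphabet used as the initial dictionary are assumptions of this example):

def encode(plain_text: str) -> list[int]:
    # Initial dictionary: one entry per byte value (assumed alphabet)
    dictionary = [chr(i) for i in range(256)]
    compressed_data = []
    for c in plain_text:
        # Linear search in a plain list gives the O(nk) worst case noted above
        rank = dictionary.index(c)
        compressed_data.append(rank)
        # Move the symbol to the front of the dictionary
        dictionary.pop(rank)
        dictionary.insert(0, c)
    return compressed_data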
However, for decoding, we can use specialized data structures to greatly improve performance.{{Example needed|date=February 2012}}
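Even without such structures, a straightforward list-based decoder mirrors the encoder: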
def decode(compressed_data: list[int], dictionary: list[str]) -> str:
    plain_text = []
    # Read in each rank in the encoded text
    for rank in compressed_data:
        # Look up the symbol at that rank in the dictionary
        plain_text.append(dictionary[rank])
        # Update the dictionary: move the symbol to the front
        e = dictionary.pop(rank)
        dictionary.insert(0, e)
    return "".join(plain_text)
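Using the encoder sketched above, a round trip recovers the original text (the example string is illustrative; both sides must start from the same initial dictionary):

ranks = encode("banana")
print(ranks)  # [98, 98, 110, 1, 1, 1]: repeated symbols yield small ranks
print(decode(ranks, [chr(i) for i in range(256)]))  # "banana"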
As an example, imagine we wish to compress [[To be, or not to be|Hamlet's soliloquy]] (''To be, or not to be...''). We can calculate the size of this message to be 7033 bits. Naively, we might try to apply the MTF transform directly. The result is a message with 7807 bits (higher than the original). The reason is that English text does not in general exhibit a high level of local frequency correlation. However, if we first apply the Burrows–Wheeler transform, and then the MTF transform, we get a message with 6187 bits. Note that the Burrows–Wheeler transform does not decrease the entropy of the message; it only reorders the bytes in a way that makes the MTF transform more effective.
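This effect can be demonstrated with a short sketch (a naive Burrows–Wheeler transform via sorted rotations; the sentinel byte and the reuse of the encode sketch above are assumptions of this illustration):

def bwt(s: str) -> str:
    # Append an assumed end-of-text sentinel so the transform is invertible
    s += "\x03"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

text = "to be, or not to be, that is the question"
mtf_only = encode(text)           # MTF alone: ranks remain scattered
bwt_then_mtf = encode(bwt(text))  # BWT groups similar contexts first
# Runs of equal bytes after the BWT tend to encode as zeros
print(sum(r == 0 for r in mtf_only), sum(r == 0 for r in bwt_then_mtf))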
One problem with the basic MTF transform is that it makes the same changes for any character, regardless of frequency: rarely occurring characters can push frequently occurring ones to higher ranks, diminishing compression. Various alterations and alternatives have been developed for this reason. One common change is to restrict how far infrequent characters move, so that a character found beyond a certain rank is only moved up to a fixed threshold rather than to the front. Another is to keep a running count of each character's local frequency and use these counts to choose the characters' order at any point. Many of these transforms still reserve zero for repeated characters, since these are often the most common in data after the Burrows–Wheeler transform.
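The threshold idea might look like the following (a hypothetical sketch: the function name, the cutoff value, and this particular reading of the rule are inventions of the example, not a standardized variant; a matching decoder would have to apply the same rule):

def encode_threshold(plain_text: str, threshold: int = 16) -> list[int]:
    dictionary = [chr(i) for i in range(256)]
    ranks = []
    for c in plain_text:
        rank = dictionary.index(c)
        ranks.append(rank)
        dictionary.pop(rank)
        # Symbols found beyond the threshold advance only to the
        # threshold position instead of all the way to the front
        position = 0 if rank < threshold else threshold
        dictionary.insert(position, c)
    return ranks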
==Move-to-front linked list==