Prefix code: Difference between revisions

Content deleted Content added
Techniques: another example
Techniques: I hope this reference can be replaced by an even better one
Line 17:
If every word in the code has the same length, the code is called a [[fixed-length code]]. For example, [[ISO 8859-15]] letters are always 8 bits long. [[UTF-32/UCS-4]] letters are always 32 bits long. [[Asynchronous Transfer Mode|ATM packets]] are always 424 bits long. Prefixes cannot exist in a fixed-length code. Unfortunately, fixed length encodings are inefficient in situations where some words are much more likely to be transmitted than others.
 
Some codes mark the end of a code word with a special "comma" symbol, different from normal data.
Some variable-length codes mark the end of a code word with a special "comma" symbol. This is somewhat analogous to the period at the end of a sentence; it marks where one sentence ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is prefix-free. However, modern communication systems send everything as sequences of "1" and "0" – adding a third symbol would be expensive, and using it only at the ends of words would be inefficient. [[Morse code]] is an everyday example of a variable-length code with a comma. The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, [[Fibonacci coding]] uses a "11" to mark the end of every code word.
<ref>
[http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf "Development of Trigger and Control Systems for CMS"] by J. A. Jones: "Synchronisation" p. 70
</ref>
Some variable-length codes mark the end of a code word with a special "comma" symbol. This is somewhat analogous to the period at the end of a sentence; it marks where one sentence ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is prefix-free. However, modern communication systems send everything as sequences of "1" and "0" &ndash; adding a third symbol would be expensive, and using it only at the ends of words would be inefficient. [[Morse code]] is an everyday example of a variable-length code with a comma. The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, [[Fibonacci coding]] uses a "11" to mark the end of every code word.
 
[[Huffman coding]] is a more sophisticated technique for constructing variable-length prefix codes. The Huffman coding algorithm takes as input the frequencies that the code words should have, and constructs a prefix code that minimizes the weighted average of the code word lengths.