Prefix code: Difference between revisions

Content deleted Content added
Your edit summary made no sense. Alternative names for subject go in the first sentence, see MOS
Undid revision 888928174 by 46.208.152.26 (talk) There is no requirement to clog up sentences with inessential parentheticals
Line 1:
A '''prefix code''' (also known as '''prefix-free code''', '''prefix condition code''' and '''instantaneous code''')is a type of [[code]] system (typically a [[variable-length code]]) distinguished by its possession of the "prefix property", which requires that there is no whole [[code word]] in the system that is a [[prefix (computer science)|prefix]] (initial segment) of any other code word in the system. For example, a code with code words {9, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of "59" and also of "55". A prefix code is a [[uniquely decodable code]]: given a complete and accurate sequence, a receiver can identify each word without requiring a special marker between words. However, there are uniquely decodable codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely decodable (it is a suffix code), but it is not necessarily a prefix code.
 
Prefix codes are also known as '''prefix-free codes''', '''prefix condition codes''' and '''instantaneous codes'''. Although [[Huffman coding]] is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. The term '''comma-free code''' is sometimes also applied as a synonym for prefix-free codes<ref>US [[Federal Standard 1037C]]</ref><ref>{{citation|title=ATIS Telecom Glossary 2007|url=http://www.atis.org/glossary/definition.aspx?id=6416|accessdate=December 4, 2010}}</ref> but in most mathematical books and articles (e.g.<ref>{{citation|last1=Berstel|first1=Jean|last2=Perrin|first2=Dominique|title=Theory of Codes|publisher=Academic Press|year=1985}}</ref><ref>{{citation|doi=10.4153/CJM-1958-023-9|last1=Golomb|first1=S. W.|author1-link=Solomon W. Golomb|last2=Gordon|first2=Basil|author2-link=Basil Gordon|last3=Welch|first3=L. R.|title=Comma-Free Codes|journal=Canadian Journal of Mathematics|volume=10|issue=2|pages=202–209|year=1958|url=https://books.google.com/books?id=oRgtS14oa-sC&pg=PA202}}</ref>) a comma-free code is used to mean a [[self-synchronizing code]], a subclass of prefix codes.
 
Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any [[Out-of-band data|out-of-band]] markers or (alternatively) special markers between words to [[framing (telecommunication)|frame]] the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing sequences that form valid code words. This is not generally possible with codes that lack the prefix property, for example {0,&nbsp;1,&nbsp;10,&nbsp;11}: a receiver reading a "1" at the start of a code word would not know whether that was the complete code word "1", or merely the prefix of the code word "10" or "11"; so the string "10" could be interpreted either as a single codeword or as the concatenation of the words "1" then "0".
Line 12:
 
==Techniques==
If every word in the code has the same length, the code is called a '''fixed-length code''', or a '''block code''' (though the term [[block code]] is also used for fixed-size [[error-correcting code]]s in [[channel coding]]). For example, [[ISO 8859-15]] letters are always 8 bits long. [[UTF-32/UCS-4]] letters are always 32 bits long. [[Asynchronous Transfer Mode|ATM cells]] are always 424 bits (53 bytes) long. A fixed-length code of fixed length ''k'' bits can encode up to <math>2^{k}</math> source symbols.
 
A fixed-length code is necessarily a prefix code. It is possible to turn any code into a fixed-length code by padding fixed symbols to the shorter prefixes in order to meet the length of the longest prefixes. Alternately, such padding codes may be employed to introduce redundancy that allows autocorrection and/or synchronisation. However, fixed length encodings are inefficient in situations where some words are much more likely to be transmitted than others.
Line 25:
 
==Related concepts==
A '''suffix code''' is a set of words none of which is a suffix of any other; equivalently, a set of words which are the reverse of a prefix code. As with a prefix code, the representation of a string as a concatenation of such words is unique. A '''bifix code''' is a set of words which is both a prefix and a suffix code.<ref name=BPR58>Berstel et al (2010) p.58</ref>
An '''optimal prefix code''' is a prefix code with minimal average length. That is, assume an alphabet of {{mvar|n}} symbols with probabilities <math>p(A_i)</math> for a prefix code {{mvar|C}}. If {{mvar|C'}} is another prefix code and <math>\lambda'_i</math> are the lengths of the codewords of {{mvar|C'}}, then <math>\sum_{i=1}^n { \lambda_i p(A_i) } \leq \sum_{i=1}^n { \lambda'_i p(A_i) } \!</math>.<ref>[http://www.cim.mcgill.ca/~langer/423/lecture2.pdf McGill COMP 423 Lecture notes]</ref>
 
==Prefix codes in use today==