{{Short description|Type of code system}}
A '''prefix code''' is a type of [[code]] system distinguished by its possession of the '''prefix property''', which requires that there is no whole [[Code word (communication)|code word]] in the system that is a [[prefix (computer science)|prefix]] (initial segment) of any other code word in the system. The prefix property is trivially satisfied by fixed-length codes, so it is only a point of consideration for [[variable-length code|variable-length codes]].
For example, a code with code words {9, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of "59" and also of "55". A prefix code is a [[uniquely decodable code]]: given a complete and accurate sequence, a receiver can identify each word without requiring a special marker between words. However, there are uniquely decodable codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely decodable (it is a suffix code), but it is not necessarily a prefix code.
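The prefix property can be tested mechanically. A minimal Python sketch (the function name is illustrative, not from any standard library): sorting the code words lexicographically places every prefix immediately before some word it begins, so checking adjacent pairs suffices.

```python
def has_prefix_property(code):
    """Return True if no code word in the set is a prefix of another."""
    words = sorted(code)  # any prefix sorts immediately before one of its extensions
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

print(has_prefix_property({"9", "55"}))             # True
print(has_prefix_property({"9", "5", "59", "55"}))  # False: "5" prefixes "59" and "55"
```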
Prefix codes are also known as '''prefix-free codes''', '''prefix condition codes''' and '''instantaneous codes'''. Although [[Huffman coding]] is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. The term '''comma-free code''' is sometimes also applied as a synonym for prefix-free codes<ref>US [[Federal Standard 1037C]]</ref><ref>{{citation|title=ATIS Telecom Glossary 2007|url=http://www.atis.org/glossary/definition.aspx?id=6416|access-date=December 4, 2010|archive-date=July 8, 2010|archive-url=https://web.archive.org/web/20100708083829/http://www.atis.org/glossary/definition.aspx?id=6416|url-status=dead}}</ref> but in most mathematical books and articles (e.g.<ref>{{citation|last1=Berstel|first1=Jean|last2=Perrin|first2=Dominique|title=Theory of Codes|publisher=Academic Press|year=1985}}</ref><ref>{{citation|doi=10.4153/CJM-1958-023-9|last1=Golomb|first1=S. W.|author1-link=Solomon W. Golomb|last2=Gordon|first2=Basil|author2-link=Basil Gordon|last3=Welch|first3=L. R.|title=Comma-Free Codes|journal=Canadian Journal of Mathematics|volume=10|issue=2|pages=202–209|year=1958|s2cid=124092269 |url=https://books.google.com/books?id=oRgtS14oa-sC&pg=PA202|doi-access=free}}</ref>) a comma-free code is used to mean a [[self-synchronizing code]], a subclass of prefix codes.
Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any [[Out-of-band data|out-of-band]] markers or (alternatively) special markers between words to [[framing (telecommunication)|frame]] the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing sequences that form valid code words.
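This "find and remove" decoding can be sketched as follows (an illustrative example with made-up code words; practical decoders typically walk a code tree rather than a dictionary):

```python
def decode(encoded, codebook):
    """Decode a bit string by repeatedly stripping the code word that starts it.

    The prefix property guarantees that at most one code word can ever match."""
    inverse = {word: symbol for symbol, word in codebook.items()}
    out, buf = [], ""
    for bit in encoded:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("input ends mid-code-word")
    return "".join(out)

codebook = {"a": "0", "b": "10", "c": "11"}  # a binary prefix code
print(decode("010110", codebook))            # "abca"
```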
The variable-length [[Huffman coding|Huffman codes]], [[country calling codes]], the country and publisher parts of [[ISBN]]s, the Secondary Synchronization Codes used in the [[UMTS]] [[W-CDMA]] 3G Wireless Standard, and the [[instruction set]]s (machine language) of most computer microarchitectures are prefix codes.
Prefix codes are not [[error-correcting codes]]. In practice, a message might first be compressed with a prefix code, and then encoded again with [[channel coding]] (including error correction) before transmission.
For any [[Variable-length_code#Uniquely_decodable_codes|uniquely decodable]] code there is a prefix code that has the same code word lengths.<ref name=LTU2015>Le Boudec, Jean-Yves, Patrick Thiran, and Rüdiger Urbanke. Introduction aux sciences de l'information: entropie, compression, chiffrement et correction d'erreurs. PPUR Presses polytechniques, 2015.</ref> [[Kraft's inequality]] characterizes the sets of code word lengths that are possible in a [[Variable-length_code#Uniquely_decodable_codes|uniquely decodable]] code.<ref name=BRS75>Berstel et al (2010) p.75</ref>
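As an illustration of Kraft's inequality, the sketch below (function name ours) computes <math>\sum_i r^{-\ell_i}</math> for a list of proposed code word lengths; the lengths are achievable by some prefix code over an alphabet of size ''r'' exactly when the sum is at most 1:

```python
from fractions import Fraction

def kraft_sum(lengths, radix=2):
    """Sum of radix**(-l) over proposed code word lengths (exact arithmetic)."""
    return sum(Fraction(1, radix ** l) for l in lengths)

print(kraft_sum([1, 2, 2]))  # 1: achievable, e.g. by {0, 10, 11}
print(kraft_sum([1, 1, 2]))  # 5/4 > 1: no uniquely decodable code has these lengths
```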
== Techniques ==
If every word in the code has the same length, the code is called a '''fixed-length code''', or a '''block code''' (though the term [[block code]] is also used for fixed-size [[error-correcting code]]s in [[channel coding]]).
[[Truncated binary encoding]] is a straightforward generalization of fixed-length codes to deal with cases where the number of symbols ''n'' is not a power of two. Source symbols are assigned codewords of length ''k'' and ''k''+1, where ''k'' is chosen so that ''2<sup>k</sup> < n ≤ 2<sup>k+1</sup>''.
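A sketch of this assignment (illustrative; symbols are assumed to be numbered 0 through ''n''−1, with the first 2<sup>''k''+1</sup>−''n'' symbols receiving the short code words):

```python
def truncated_binary(x, n):
    """Code word for symbol x in 0..n-1 under truncated binary encoding."""
    k = n.bit_length() - 1      # 2**k <= n < 2**(k+1)
    u = 2 ** (k + 1) - n        # number of short (k-bit) code words
    if x < u:
        return format(x, f"0{k}b") if k else ""
    return format(x + u, f"0{k + 1}b")  # long (k+1-bit) code words

print([truncated_binary(x, 5) for x in range(5)])
# ['00', '01', '10', '110', '111']
```

Note that the resulting set is prefix-free: the short words occupy the low end of the ''k''-bit range, and each long word begins with a ''k''-bit value outside that range.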
Some codes mark the end of a code word with a special "comma" symbol (also called a [[Sentinel value]]), different from normal data.<ref>{{cite web |url=http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf |title=Development of Trigger and Control Systems for CMS |first1=J. |last1=A. Jones |page=70 |publisher=High Energy Physics, Blackett Laboratory, Imperial College, London |url-status=dead |archive-url= https://web.archive.org/web/20110613183447/http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf |archive-date= Jun 13, 2011 }}</ref> This is somewhat analogous to the spaces between words in a sentence; they mark where one word ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is automatically prefix-free. However, modern communication systems send everything as sequences of "1" and "0" – adding a third symbol would be expensive, and using it only at the ends of words would be inefficient. [[Morse code]] is an everyday example of a variable-length code with a comma. The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, [[Fibonacci coding]] uses a "11" to mark the end of every code word.
[[Huffman coding]] is a more sophisticated technique for constructing variable-length prefix codes. The Huffman coding algorithm takes as input the frequencies that the code words should have, and constructs a prefix code that minimizes the weighted average of the code word lengths. (This is closely related to minimizing the entropy.) This is a form of [[lossless data compression]] based on [[entropy encoding]].
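A compact sketch of Huffman's algorithm (illustrative; the sketch uses tuples internally for tree nodes, so symbols are assumed not to be tuples themselves):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a binary prefix code minimizing expected code word length."""
    tick = count()  # tie-breaker so heapq never compares symbols/trees directly
    heap = [(f, next(tick), symbol) for symbol, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # repeatedly merge the two least frequent subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    code = {}
    def walk(node, word):
        if isinstance(node, tuple):
            walk(node[0], word + "0")
            walk(node[1], word + "1")
        else:
            code[node] = word or "0"  # single-symbol alphabet still needs one bit
    walk(heap[0][2], "")
    return code

print(huffman({"a": 0.5, "b": 0.25, "c": 0.25}))
```

For these frequencies the most probable symbol receives a one-bit code word and the other two receive two-bit code words (the exact bit patterns depend on tie-breaking).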
[[Self-synchronizing code]]s are prefix codes that allow [[frame synchronization]].
==Related concepts==
A '''suffix code''' is a set of words none of which is a suffix of any other; equivalently, a set of words which are the reverse of a prefix code. As with a prefix code, the representation of a string as a concatenation of such words is unique. A '''bifix code''' is a set of words which is both a prefix and a suffix code.<ref name=BPR58>Berstel et al (2010) p.58</ref>
An '''optimal prefix code''' is a prefix code with minimal average length. That is, assume an alphabet of {{mvar|n}} symbols with probabilities <math>p(A_i)</math> for a prefix code {{mvar|C}} whose code words have lengths <math>\lambda_i</math>. If {{mvar|C'}} is another prefix code and <math>\lambda'_i</math> are the lengths of the codewords of {{mvar|C'}}, then <math>\sum_{i=1}^n { \lambda_i p(A_i) } \leq \sum_{i=1}^n { \lambda'_i p(A_i) } \!</math>.<ref>[http://www.cim.mcgill.ca/~langer/423/lecture2.pdf McGill COMP 423 Lecture notes]</ref>
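As a worked example (probabilities chosen for illustration), the average length of an optimal code for a skewed distribution can be compared with that of a fixed-length prefix code:

```python
def average_length(code, probs):
    """Expected code word length: sum of len(word_i) * p_i over all symbols."""
    return sum(len(code[s]) * p for s, p in probs.items())

probs = {"a": 0.5, "b": 0.25, "c": 0.25}
optimal = {"a": "0", "b": "10", "c": "11"}   # a Huffman code for these probabilities
fixed = {"a": "00", "b": "01", "c": "10"}    # a fixed-length prefix code

print(average_length(optimal, probs))  # 1.5 bits per symbol
print(average_length(fixed, probs))    # 2.0 bits per symbol
```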
==Prefix codes in use today==
Examples of prefix codes include:
* variable-length [[Huffman coding|Huffman codes]]
* [[country calling codes]]
* [[Chen–Ho encoding]]
* the country and publisher parts of [[ISBN]]s
* the Secondary Synchronization Codes used in the [[UMTS]] [[W-CDMA]] 3G Wireless Standard
* [[VCR Plus|VCR Plus+ codes]]
* [[Unicode Transformation Format]], in particular the [[UTF-8]] system for encoding [[Unicode]] characters, which is both a prefix-free code and a [[self-synchronizing code]]<ref>{{cite web
| url = http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
| title = UTF-8 history
| first = Rob
| last = Pike
| date = 2003-04-03
}}</ref>
* [[variable-length quantity]]
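In UTF-8, the leading byte of each sequence by itself determines how many bytes follow, which is the prefix property at the byte-sequence level. A sketch (helper name ours) that segments an encoded byte stream one code word at a time:

```python
def utf8_length(first_byte):
    """Sequence length implied by a UTF-8 leading byte."""
    if first_byte < 0x80:          # 0xxxxxxx: ASCII, one byte
        return 1
    for length, lead in ((2, 0xC0), (3, 0xE0), (4, 0xF0)):
        # compare the (length+1) leading bits against the 110/1110/11110 patterns
        if first_byte >> (7 - length) == lead >> (7 - length):
            return length
    raise ValueError("continuation or invalid leading byte")

encoded = "héllo".encode("utf-8")
i, chars = 0, []
while i < len(encoded):            # walk the stream one code word at a time
    n = utf8_length(encoded[i])
    chars.append(encoded[i:i + n].decode("utf-8"))
    i += n
print(chars)  # ['h', 'é', 'l', 'l', 'o']
```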
===Techniques===
Commonly used techniques for constructing prefix codes include [[Huffman coding|Huffman codes]] and the earlier [[Shannon–Fano coding|Shannon–Fano codes]], and [[universal code (data compression)|universal codes]] such as:
* [[Elias delta coding]]
* [[Elias gamma coding]]
* [[Unary coding]]
* [[Golomb Rice code]]
* [[Straddling checkerboard]] (simple cryptography technique which produces prefix codes)
* binary coding<ref>{{citation|doi=10.25209/2079-3316-2018-9-4-239-252|last1=Shevchuk|first1=Y. V.|author1-link=Yury V. Shevchuk|title=Vbinary: variable length integer coding revisited|journal=Program Systems: Theory and Applications|volume=9|issue=4|pages=239–252|year=2018|url=http://psta.psiras.ru//read/psta2018_4_239-252.pdf|doi-access=free}}</ref>
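As one concrete universal code from the list above, [[Elias gamma coding]] prefixes the binary form of a positive integer ''n'' with ⌊log<sub>2</sub> ''n''⌋ zeros; a decoder counts the zeros to learn how many bits follow, which makes the code prefix-free. A sketch:

```python
def elias_gamma(n):
    """Elias gamma code: floor(log2 n) zeros, then n written in binary."""
    if n < 1:
        raise ValueError("gamma coding is defined for positive integers only")
    binary = format(n, "b")
    return "0" * (len(binary) - 1) + binary

print([elias_gamma(n) for n in (1, 2, 3, 4)])
# ['1', '010', '011', '00100']
```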
==Notes==
{{reflist}}
==References==
* {{cite book | last1=Berstel | first1=Jean | last2=Perrin | first2=Dominique | last3=Reutenauer | first3=Christophe | title=Codes and automata | series=Encyclopedia of Mathematics and its Applications | volume=129 | ___location=Cambridge | publisher=[[Cambridge University Press]] | year=2010 | url=http://www-igm.univ-mlv.fr/~berstel/LivreCodes/Codes.html | isbn=978-0-521-88831-8 | zbl=1187.94001 }}
* {{cite journal | last=Elias | first=Peter | author-link=Peter Elias | title=Universal codeword sets and representations of the integers | journal=IEEE Trans. Inf. Theory | volume=21 | number=2 | year=1975 | pages=194–203 | issn=0018-9448 | zbl=0298.94011 | doi=10.1109/tit.1975.1055349}}
* D.A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", ''Proceedings of the I.R.E.'', Sept. 1952, pp. 1098–1101.
* [https://web.archive.org/web/20070220234037/http://www.huffmancoding.com/david/scientific.html Profile: David A. Huffman], [[Scientific American]], Sept. 1991, pp. 54–58.
* [[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Second Edition. MIT Press and McGraw-Hill, 2001. {{ISBN|0-262-03293-7}}. Section 16.3, pp. 385–392.
* {{FS1037C}}
==External links==
* [http://plus.maths.org/issue10/features/infotheory/index.html Codes, trees and the prefix property] by Kona Macphee
{{Compression methods}}
[[Category:Coding theory]]
[[Category:Prefixes|code]]
[[Category:Data compression]]
[[Category:Lossless compression algorithms]]