Huffman coding
[[de:Huffman-Code]] [[nl:Huffmancodering]] [[pl:Kodowanie Huffmana]]
 
In [[computer science]], '''Huffman coding''' is an [[entropy encoding]] [[algorithm]] used for [[data compression]].
It was developed by [[David A. Huffman]], who published it in [[1952]] as ''A Method for the Construction of Minimum-Redundancy Codes''.
 
The basic idea is borrowed from an older and slightly less efficient method called
[[Data compression/Shannon-Fano coding|Shannon-Fano coding]].
 
The text to be compressed is treated as a [[string]] of symbols.
Symbols that are likely to be frequent are represented by short sequences of [[bit]]s, and symbols that are likely to be rare are represented by longer sequences of bits.
 
Huffman coding uses a specific method for choosing the representations for each symbol, resulting in a prefix-free code (i.e. no bit string of any symbol is a prefix of the bit string of any other symbol).
It has been proven that Huffman coding is the most effective compression method of this type: no other mapping of source symbols to strings of bits will produce a smaller output when the actual symbol frequencies agree with those used to create the code.
For a set of symbols whose cardinality is a power of two and a uniform probability distribution, Huffman coding is equivalent to simple binary block encoding.
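The prefix-free property is what makes decoding unambiguous: a decoder can emit a symbol the moment a complete codeword has been read, with no separators between codewords. A minimal sketch in Python, using a hypothetical four-symbol code (the symbols and codewords here are made-up illustrations):

```python
# A hypothetical prefix-free code: no codeword is a prefix of another.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

def decode(bits, code):
    """Decode a bit string by matching codewords greedily left to right."""
    inverse = {v: k for k, v in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:      # a complete codeword has been read
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

encoded = "".join(code[s] for s in "abacd")
decoded = decode(encoded, code)   # recovers "abacd" unambiguously
```

Because no codeword is a prefix of another, the greedy left-to-right match can never commit to a wrong symbol, so no backtracking is needed.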
 
Huffman coding produces an output whose length matches the entropy of the source exactly when each symbol probability is a negative power of two.
[[Data compression/Arithmetic coding|Arithmetic coding]] produces slight gains over Huffman coding, but in practice these gains have not been large enough to offset arithmetic coding's higher computational complexity and patent royalties (as of November 2001, IBM owns patents on the core concepts of arithmetic coding in several jurisdictions).
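The dyadic case can be checked directly: when every probability is a negative power of two, the codeword lengths -log2(p) are whole numbers of bits, and the average code length equals the entropy of the source. A small Python sketch (the four-symbol distribution is a made-up example):

```python
import math

# A hypothetical dyadic distribution: every probability is 2**-k.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Optimal codeword lengths for a dyadic source are exactly -log2(p),
# which come out to whole numbers of bits: 1, 2, 3 and 3 here.
lengths = {sym: int(-math.log2(p)) for sym, p in probs.items()}

# The expected code length therefore equals the entropy of the source.
avg_length = sum(p * lengths[sym] for sym, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
# avg_length == entropy == 1.75 bits per symbol
```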
 
Huffman coding works by building a [[binary tree]] of symbols: the two trees of lowest weight are repeatedly merged under a new parent node until a single tree remains, and each symbol's codeword is read off the path from the root to its leaf (for instance, 0 for a left branch and 1 for a right branch).
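The tree-building step can be sketched in Python with a priority queue; this is an illustrative sketch (the function and variable names are our own), not a production encoder:

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a Huffman code for `text`; returns {symbol: bit string}.

    Repeatedly merges the two lowest-weight trees until one remains,
    then reads each symbol's codeword off the root-to-leaf path.
    """
    freq = Counter(text)
    # A tree is either a leaf symbol (str) or a (left, right) pair.
    # The counter in each heap entry breaks weight ties so that
    # trees themselves are never compared.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"  # degenerate one-symbol input
        else:
            walk(tree[0], prefix + "0")  # 0 for the left branch
            walk(tree[1], prefix + "1")  # 1 for the right branch
    walk(heap[0][2], "")
    return codes
```

Concatenating the codewords of a string gives the compressed bit stream; because the resulting code is prefix-free, the stream can be decoded greedily from left to right.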
Extreme (maximally unbalanced) cases of Huffman codes are connected with the Fibonacci numbers. For example, see http://mathforum.org/discuss/sci.math/t/207334.
 
Huffman coding today is often used as a "back-end" to some other compression method.
[[Data compression/Deflation|Deflation]] (the algorithm used by PKZIP) and multimedia codecs such as [[JPEG]] and [[MP3]] use a front-end model and quantization step followed by Huffman coding.
 
The ''n''-ary Huffman algorithm uses the {0, 1, ..., ''n''&minus;1} alphabet to encode messages; the tree it builds is an ''n''-ary tree.
 
The Huffman template algorithm makes it possible to use non-numerical weights (costs, frequencies).
For example, see http://alexvn.freeservers.com/s1/huffman_template_algorithm.html