Huffman coding
For a set of symbols whose cardinality is a power of two and whose probability distribution is uniform, Huffman coding is equivalent to simple binary block encoding.
 
== Basic technique ==
 
The technique works by creating a [[binary tree]] of symbols (a code sketch follows the list below):
#Start with as many trees as there are symbols.
#While there is more than one tree:
##Find the two trees with the smallest total weight.
##Combine the trees into one, setting one as the left child and the other as the right.
#Now the tree contains all the symbols. A '0' represents following the left child; a '1' represents following the right child.
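To make these steps concrete, here is a minimal sketch in Python; the function name <code>huffman_code</code> and its interface are illustrative, not part of any standard library:

<syntaxhighlight lang="python">
import heapq
from itertools import count

def huffman_code(weights):
    """Return a {symbol: bitstring} prefix code for a {symbol: weight} dict.

    A minimal sketch of the algorithm above; assumes symbols themselves
    are not tuples, since tuples are used here for internal tree nodes.
    """
    tie = count()  # tie-breaker so the heap never compares trees directly
    # Step 1: start with as many single-leaf trees as there are symbols.
    heap = [(w, next(tie), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    # Step 2: while more than one tree remains, merge the two lightest.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    # Step 3: read codes off the tree: '0' = left child, '1' = right child.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf node: a symbol
            codes[node] = prefix or "0"    # lone-symbol edge case
    walk(heap[0][2], "")
    return codes
</syntaxhighlight>

For example, <code>huffman_code({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125})</code> assigns codes of lengths 1, 2, 3 and 3, which is optimal for these power-of-two probabilities.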
 
== Variations ==
Many variations of Huffman coding exist.
The frequencies used can be generic ones for the application ___domain, based on average experience, or they can be the actual frequencies found in the text being compressed.
(The latter variation requires that a [[frequency table]] or some other hint about the encoding be stored with the compressed text; implementations employ various tricks to store these tables efficiently.)
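Counting the actual frequencies from the text itself might look like this, reusing the illustrative <code>huffman_code</code> function sketched above:

<syntaxhighlight lang="python">
from collections import Counter

text = "this is an example of a huffman tree"
weights = Counter(text)                      # actual per-symbol frequencies
codes = huffman_code(weights)                # sketch from the section above
encoded = "".join(codes[ch] for ch in text)
# The decoder needs the code table (or the frequency table it was built
# from), which is why it must be stored alongside the compressed text.
</syntaxhighlight>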
A variation called "adaptive Huffman coding" calculates the frequencies dynamically, based on recent actual frequencies in the source string; this is somewhat related to the [[LZ77|LZ]] family of algorithms.
 
Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only a way to order weights and to add them.
The Huffman template algorithm allows the use of non-numerical weights (costs, frequencies).
For example, see http://alexvn.freeservers.com/s1/huffman_template_algorithm.html
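Since the sketch above only compares weights (via the heap) and adds them with <code>+</code>, it already acts as a template over the weight type; a cost-like weight, invented here purely for illustration, might look like this:

<syntaxhighlight lang="python">
from dataclasses import dataclass, field

@dataclass(order=True)
class Cost:
    """Illustrative weight: ordered and added by time; the label tags along."""
    time: float
    label: str = field(default="", compare=False)

    def __add__(self, other):
        return Cost(self.time + other.time, f"({self.label}+{other.label})")

# Works unchanged with the huffman_code sketch, which needs only < and +.
codes = huffman_code({'a': Cost(4, 'a'), 'b': Cost(2, 'b'), 'c': Cost(1, 'c')})
</syntaxhighlight>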
 
The ''n''-ary Huffman algorithm uses the {0, 1, ..., ''n''−1} alphabet to encode the message, and the tree it builds is ''n''-ary: at each step the ''n'' lowest-weight trees are merged (zero-weight placeholder symbols may be added first so that the final tree is full).
 
Extreme cases of Huffman codes are connected with [[Fibonacci number]]s: when the weights are consecutive Fibonacci numbers, each merge sum is itself the next Fibonacci number, so every merge absorbs the tree built so far, producing the deepest possible code tree. For example, see http://mathforum.org/discuss/sci.math/t/207334.
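This can be checked with the <code>huffman_code</code> sketch above:

<syntaxhighlight lang="python">
# Consecutive Fibonacci weights give a maximally skewed (deepest) tree.
fib_weights = dict(zip("abcdef", [1, 1, 2, 3, 5, 8]))
codes = huffman_code(fib_weights)
print(sorted(len(c) for c in codes.values()))   # [1, 2, 3, 4, 5, 5]
</syntaxhighlight>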
 
Huffman coding is optimal when the probability of each input symbol is a power of two.
Prefix-free codes tend to have a slight inefficiency on small alphabets, where probabilities often fall between powers of two.
[[Arithmetic coding]] produces slight gains over Huffman coding, but in practice these gains have not been large enough to offset arithmetic coding's higher computational complexity and patent royalties ([[As of 2001|as of November 2001]], IBM owns patents on the core concepts of arithmetic coding in several jurisdictions).
In addition, alternating between Huffman codes for symbols in even and odd positions can, in effect, allocate fractional numbers of bits per symbol, improving results on small alphabets.
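To see how large the gap can be, compare the [[information entropy|entropy]] of a skewed two-symbol source with the one bit per symbol a Huffman code must spend; a small sketch, again using the illustrative <code>huffman_code</code> function:

<syntaxhighlight lang="python">
import math

weights = {'a': 0.99, 'b': 0.01}
entropy = -sum(p * math.log2(p) for p in weights.values())
codes = huffman_code(weights)                    # both codes have length 1
avg_len = sum(p * len(codes[s]) for s, p in weights.items())
print(f"entropy = {entropy:.3f} bits/symbol")    # about 0.081
print(f"Huffman = {avg_len:.3f} bits/symbol")    # exactly 1.000
</syntaxhighlight>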
 
== Applications ==
Huffman coding today is often used as a "back-end" to some other compression method.
[[DEFLATE]] ([[PKZIP]]'s algorithm) and multimedia [[codec]]s such as [[JPEG]] and [[MP3]] use a front-end model and [[quantization]] step followed by Huffman coding.
 