Content deleted Content added
ClueBot NG (talk | contribs) m Reverting possible vandalism by 139.5.254.134 to version by Priyanshu1rai. Report False Positive? Thanks, ClueBot NG. (4221365) (Bot) |
Tags: Reverted Visual edit |
||
Line 159:
== Basic technique ==
▲[[File:Huffman_coding_visualisation.svg|thumb|360px|Visualisation of the use of Huffman coding to encode the message "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_{{break}}BED". In steps 2 to 6, the letters are sorted by increasing frequency, and the least frequent two at each step are combined and reinserted into the list, and a partial tree is constructed. The final tree in step 6 is traversed to generate the dictionary in step 7. Step 8 uses it to encode the message.]]
[[Image:Huffman coding example.svg|thumb|A source generates 4 different symbols <math>\{a_1 , a_2 , a_3 , a_4 \}</math> with probability <math>\{0.4 ; 0.35 ; 0.2 ; 0.05 \}</math>. A binary tree is generated from left to right taking the two least probable symbols and putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols. The process is repeated until there is just one symbol. The tree can then be read backwards, from right to left, assigning different bits to different branches. The final Huffman code is:
Line 217 ⟶ 216:
[[Image:Huffman huff demo.gif|center]]
-->
===Decompression===
Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must be somehow reconstructed. In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use. If the data is compressed using [[canonical Huffman code|canonical encoding]], the compression model can be precisely reconstructed with just <math>B\cdot 2^B</math> bits of information (where {{mvar|B}} is the number of bits per symbol). Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value of 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree building routine simply reads the next 8 bits to
== Main properties ==
|