Revision as of 11:40, 10 March 2023 edit ClueBot NG (talk \| contribs) Bots, Pending changes reviewers, Rollbackers 6,477,693 edits m Reverting possible vandalism by 139.5.254.134 to version by Priyanshu1rai. Report False Positive? Thanks, ClueBot NG. (4221365) (Bot) Tag: Rollback ← Previous edit		Revision as of 10:59, 19 April 2023 edit undo 103.90.157.163 (talk) →Decompression Tags: Reverted Visual edit Next edit →
Line 159: == Basic technique == ~~[[File:Huffman_coding_visualisation.svg\|thumb\|360px\|Visualisation of the use of Huffman coding~~ding to encode the message "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_{{break}}BED". In steps 2 to 6, the letters are sorted by increasing frequency, and the least frequent two at each step are combined and reinserted into the list, and a partial tree is constructed. The final tree in step 6 is traversed to generate the dictionary in step 7. Step 8 uses it to encode the message.]]▼ ~~===Compression===~~ ▲[[File:Huffman_coding_visualisation.svg\|thumb\|360px\|Visualisation of the use of Huffman coding to encode the message "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_{{break}}BED". In steps 2 to 6, the letters are sorted by increasing frequency, and the least frequent two at each step are combined and reinserted into the list, and a partial tree is constructed. The final tree in step 6 is traversed to generate the dictionary in step 7. Step 8 uses it to encode the message.]] [[Image:Huffman coding example.svg\|thumb\|A source generates 4 different symbols <math>\{a_1 , a_2 , a_3 , a_4 \}</math> with probability <math>\{0.4 ; 0.35 ; 0.2 ; 0.05 \}</math>. A binary tree is generated from left to right taking the two least probable symbols and putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols. The process is repeated until there is just one symbol. The tree can then be read backwards, from right to left, assigning different bits to different branches. The final Huffman code is: Line 217 ⟶ 216: [[Image:Huffman huff demo.gif\|center]] --> ===Decompression=== Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must be somehow reconstructed. In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use. If the data is compressed using [[canonical Huffman code\|canonical encoding]], the compression model can be precisely reconstructed with just <math>B\cdot 2^B</math> bits of information (where {{mvar\|B}} is the number of bits per symbol). Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value of 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree building routine simply reads the next 8 bits to determine the character value of that particular leaf. The process continues recursively until the last leaf node is reached; at that point, the Huffman tree will thus be faithfully reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an 8-bit alphabet). Many other techniques are possible as well. In any case, since the compressed data can include unused "trailing bits" the decompressor must be able to determine when to stop producing output. This can be accomplished by either transmitting the length of the decompressed data along with the compression model or by defining a special code symbol to signify the end of input (the latter method can adversely affect code length optimality, however). == Main properties ==

Huffman coding: Difference between revisions