Content deleted Content added
IznoRepeat (talk | contribs) m add WP:TEMPLATECAT to remove from template; genfixes |
|||
Line 1:
{{Short description|Technique to compress data}}
{{Use dmy dates|date=May 2019|cs1-dates=y}} [[Image:Huffman tree 2.svg|thumb|Huffman tree generated from the exact frequencies of the text "this is an example of a huffman tree". Encoding the sentence with this code requires 135 (or 147) bits, as opposed to 288 (or 180) bits if 36 characters of 8 (or 5) bits were used (This assumes that the code tree structure is known to the decoder and thus does not need to be counted as part of the transmitted information). The frequencies and codes of each character are shown in the accompanying table.
Line 246 ⟶ 247:
=== ''n''-ary Huffman coding ===
The '''''n''-ary Huffman''' algorithm uses the {0, 1,..., ''n'' − 1} alphabet to encode message and build an ''n''-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (<math alt="''n'' equals 2">n = 2</math>) codes, except that the ''n'' least probable symbols are taken together, instead of just the 2 least probable. Note that for ''n'' greater than 2, not all sets of source words can properly form an ''n''-ary tree for Huffman coding. In these cases, additional 0-probability place holders must be added. This is because the tree must form an ''n'' to 1 contractor;{{
=== Adaptive Huffman coding ===
Line 260 ⟶ 261:
In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is ''N'' digits will always have a cost of ''N'', no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing.
''Huffman coding with unequal letter costs'' is the generalization without this assumption: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of [[Morse code]], where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding, though it has been solved by [[Richard M. Karp]]<ref>{{Cite journal |last=Karp |first=Richard M. |author-link=Richard M. Karp |title=Minimum-redundancy coding for the discrete noiseless channel |url=https://ieeexplore.ieee.org/document/1057615?arnumber=1057615 |journal=IRE Transactions on Information Theory |publisher=IEEE |publication-date=31 January 1961 |volume=7 |issue=1 |pages=27–38 |doi=10.1109/TIT.1961.1057615 |url-access=subscription |via=IEEE}}</ref> whose solution has been refined for the case of integer costs by Mordecai J. Golin.<ref>{{Cite journal |last=Golin |first=Mordekai J. |date=January 1998 |publication-date=1 September 1998 |title=A Dynamic Programming Algorithm for Constructing Optimal Prefix-Free Codes with Unequal Letter Costs |url=https://page.mi.fu-berlin.de/rote/Papers/pdf/A+dynamic+programming+algorithm+for+constructing+optimal+prefix-free+codes+for+unequal+letter+costs.pdf |access-date=10 September 2024 |s2cid=2265146 |doi=10.1109/18.705558 |journal=IEEE Transactions on Information Theory |volume=44 |issue=5 |pages=
=== {{anchor|Hu-Tucker coding|Alphabetic Huffman coding|Alphabetic Huffman tree}}Optimal alphabetic binary trees (Hu–Tucker coding) ===
Line 300 ⟶ 301:
[[Category:Lossless compression algorithms]]
[[Category:Binary trees]]
[[Category:Data compression]]
|