Huffman coding: Difference between revisions

Content deleted Content added
m Added mention of Hamming codes
Clean up/copyedit, improve readability
Line 40:
|}
]]
In [[computer science]] and [[information theory]], a '''Huffman code''' is a particular type of optimal [[prefix code]] that is commonly used for [[lossless data compression]]. The process of finding or using such a code is '''Huffman coding''', an algorithm developed by [[David A. Huffman]] while he was a [[Doctor of Science|Sc.D.]] student at [[Massachusetts Institute of Technology|MIT]], and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".<ref name=":0">{{Cite journal | last1 = Huffman | first1 = D. |author-link1=David A. Huffman| title = A Method for the Construction of Minimum-Redundancy Codes | doi = 10.1109/JRPROC.1952.273898 | journal = [[Proceedings of the IRE]]| volume = 40 | issue = 9 | pages = 1098–1101 | year = 1952 | url = http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf}}</ref>
 
The output from Huffman's algorithm can be viewed as a [[variable-length code]] table for encoding a source symbol (such as a character in a file). The algorithm derives this table from the estimated probability or frequency of occurrence (''weight'') for each possible value of the source symbol. As in other [[entropy encoding]] methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time [[linear time|linear]] to the number of input weights if these weights are sorted.<ref>{{cite journal | first = Jan | last = Van Leeuwen | author-link = Jan van Leeuwen | url = http://www.staff.science.uu.nl/~leeuw112/huffman.pdf | title = On the construction of Huffman trees | journal = ICALP | year =1976 | pages = 382–410 | access-date = 20 February 2014}}</ref> However, although optimal among methods encoding symbols separately, Huffman coding [[#Optimality|is not always optimal]] among all compression methods – it is replaced with [[arithmetic coding]]<ref name="LiDrew2014">{{cite book|author1=Ze-Nian Li|author2=Mark S. Drew|author3=Jiangchuan Liu|title=Fundamentals of Multimedia|url=https://books.google.com/books?id=R6vBBAAAQBAJ|date=9 April 2014|publisher=Springer Science & Business Media|isbn=978-3-319-05290-8}}</ref> or [[asymmetric numeral systems]]<ref name=PCS2015>[https://ieeexplore.ieee.org/document/7170048 J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp, ''The use of asymmetric numeral systems as an accurate replacement for Huffman coding''], Picture Coding Symposium, 2015.</ref> if a better compression ratio is required.
Line 248:
 
=== ''n''-ary Huffman coding ===
The '''''n''-ary Huffman''' algorithm uses thean alphabet of size ''n'', typically {0, 1, ..., ''n'' − -1} alphabet, to encode messagemessages and build an ''n''-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (<math alt="''n'' equals 2">n = 2</math>) codes, except that the ''n'' least probable symbols are taken together,but instead of justcombining the 2two least probable.likely Note that for ''n'' greater than 2symbols, not all sets of source words can properly form an ''n''-ary tree for Huffman coding. In these cases, additional 0-probability place holders must be added. This is because the tree must form an ''n'' toleast 1likely contractor;{{clarifysymbols |date=Septemberare 2023}}grouped for binary coding, this is a 2 to 1 contractor, and any sized set can form such a contractortogether. If the number of source words is congruent to 1 modulo ''n''&nbsp;&minus;&nbsp;1, then the set of source words will form a proper Huffman tree.
 
Note that for ''n'' > 2, not all sets of source words can properly form a complete ''n''-ary tree for Huffman coding. In these cases, additional placeholder symbols with 0 probability may need to be added. This is because the structure of the tree needs to repeatedly join ''n'' branches into one - also known as an "''n'' to 1" combination. For binary coding, this is a "2 to 1" combination, which works with any number of symbols. For ''n''-ary coding, a complete tree is only possible when the total number of symbols (real + placeholders) leaves a remainder of 1 when divided by (n-1). <ref name=":0" />
 
=== Adaptive Huffman coding ===