In the field of [[data compression]], '''Shannon–Fano coding''', named after [[Claude Shannon]] and [[Robert Fano]], is a name given to two different but related techniques for constructing a [[prefix code]] based on a set of symbols and their probabilities (estimated or measured):
* '''Shannon's method''' chooses a prefix code where a source symbol <math>i</math> is given the codeword length <math>l_i = \lceil - \log_2 p_i\rceil</math>. One common way of choosing the codewords uses the binary expansion of the cumulative probabilities. This method was proposed in Shannon's "[[A Mathematical Theory of Communication]]" (1948), his article introducing the field of [[information theory]].
* '''Fano's method''' divides the source symbols into two sets ("0" and "1") with probabilities as close to 1/2 as possible. Then those sets are themselves divided in two, and so on, until each set contains only one symbol. The codeword for that symbol is the string of "0"s and "1"s that records which half of each successive division it fell into. This method was proposed in a later (in print) [[technical report]] by Fano (1949).
Shannon–Fano codes are [[Optimization (mathematics)|suboptimal]] in the sense that they do not always achieve the lowest possible expected codeword length, as [[Huffman coding]] does.<ref name="Kaur">{{cite journal |last1=Kaur |first1=Sandeep |last2=Singh |first2=Sukhjeet |title=Entropy Coding and Different Coding Techniques |journal=Journal of Network Communications and Emerging Technologies |date=May 2016 |volume=6 |issue=5 |page=5 |s2cid=212439287 |url=https://pdfs.semanticscholar.org/4253/7898a836d0384c6689a3c098b823309ab723.pdf}}</ref>
Shannon–Fano coding should not be confused with [[Shannon–Fano–Elias coding]] (also known as Elias coding), the precursor to [[arithmetic coding]].
==Naming==
Regarding the confusion in the two different codes being referred to by the same name, Krajči et al. write:<ref name="Kraj">Stanislav Krajči, Chin-Fu Liu, Ladislav Mikeš and Stefan M. Moser (2015), "Performance analysis of Fano coding", ''2015 IEEE International Symposium on Information Theory (ISIT)''.</ref>
<blockquote>
Around 1948, both Claude E. Shannon (1948) and Robert M. Fano (1949) independently proposed two different source coding algorithms for an efficient description of a discrete memoryless source. Unfortunately, in spite of being different, both schemes became known under the same name ''Shannon–Fano coding''.
There are several reasons for this mixup. For one thing, in the discussion of his coding scheme, Shannon mentions Fano’s scheme and calls it “substantially the same” (Shannon, 1948, p. 17 [reprint]).<ref>{{Cite book |url=https://archive.org/details/sim_att-technical-journal_1948-07_27_3/page/402/ |title=The Bell System Technical Journal 1948-07: Vol 27 Iss 3 |date=1948-07-01 |publisher=AT & T Bell Laboratories |pages=403 |language=en}}</ref> For another, both Shannon’s and Fano’s coding schemes are similar in the sense that they both are efficient, but ''suboptimal'' prefix-free coding schemes with a similar performance.
</blockquote>
Shannon's (1948) method, using predefined word lengths, is called '''Shannon–Fano coding''' by Cover and Thomas.<ref>Thomas M. Cover and Joy A. Thomas (2006), ''Elements of Information Theory'' (2nd ed.), Wiley–Interscience. "Historical Notes" to Chapter 5.</ref>
Fano's (1949) method, using binary division of probabilities, is called '''Shannon–Fano coding''' by Salomon<ref>David Salomon (2013), ''Data Compression: The Complete Reference'', Springer. Section 2.6.</ref> and Gupta.<ref>Prakash C. Gupta (2006), ''Data Communications and Computer Networks'', Phi Publishing. Subsection 1.11.5.</ref>
==Shannon's code: predefined word lengths==
A second method makes use of cumulative probabilities. First, the probabilities are written in decreasing order <math>p_1 \geq p_2 \geq \cdots \geq p_n</math>. Then, the cumulative probabilities are defined as
:<math>c_1 = 0, \qquad c_i = \sum_{j=1}^{i-1} p_j \quad \text{for } i \geq 2,</math>
so <math>c_1 = 0, c_2 = p_1, c_3 = p_1 + p_2</math> and so on.
The codeword for symbol <math>i</math> is chosen to be the first <math>l_i</math> binary digits in the [[binary number|binary expansion]] of <math>c_i</math>.
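This construction can be sketched in Python as follows; the function name <code>shannon_code</code> and its dictionary interface are illustrative choices, not part of Shannon's description.

<syntaxhighlight lang="python">
from math import ceil, log2

def shannon_code(probabilities):
    """Shannon's method: sort the probabilities in decreasing order, give
    symbol i a codeword length l_i = ceil(-log2 p_i), and read the codeword
    off the binary expansion of the cumulative probability c_i."""
    # Symbols in order of decreasing probability.
    symbols = sorted(probabilities, key=probabilities.get, reverse=True)
    codes = {}
    cumulative = 0.0
    for s in symbols:
        p = probabilities[s]
        length = ceil(-log2(p))
        # Take the first `length` binary digits of the cumulative probability.
        code = ""
        frac = cumulative
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            code += str(bit)
            frac -= bit
        codes[s] = code
        cumulative += p
    return codes

# Example with the frequencies used below (15, 7, 6, 6, 5 out of 39 symbols):
freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
probs = {s: n / 39 for s, n in freqs.items()}
print(shannon_code(probs))  # codeword lengths: 2 bits for A, 3 bits for B, C, D and E
</syntaxhighlight>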
===Example===
This example shows the construction of a Shannon–Fano code for a small alphabet. There are 5 different source symbols. Suppose 39 total symbols have been observed with the following frequencies, from which we can estimate the symbol probabilities.
:{| class="wikitable" style="text-align: center;"
Note that although the codewords under the two methods are different, the word lengths are the same. We have lengths of 2 bits for A, and 3 bits for B, C, D and E, giving an average length of
:<math display="block">\frac{2\,\text{bits}\cdot(15) + 3\,\text{bits} \cdot (7+6+6+5)}{39\, \text{symbols}} \approx 2.62\,\text{bits per symbol,}</math>
which is within one bit of the entropy.
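Indeed, the entropy of this source is
:<math display="block">H(X) = -\sum_{i=1}^5 p_i \log_2 p_i \approx 2.19\,\text{bits per symbol.}</math>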
===Expected word length===
In Shannon's method, the codeword lengths satisfy <math>l_i = \lceil -\log_2 p_i\rceil \leq -\log_2 p_i + 1</math>. Hence the expected word length satisfies
:<math display="block">\mathbb E L = \sum_{i=1}^n p_il_i \leq \sum_{i=1}^n p_i (-\log_2 p_i + 1) = -\sum_{i=1}^n p_i \log_2 p_i + \sum_{i=1}^n p_i = H(X) + 1.</math>
Here, <math>H(X) = - \textstyle\sum_{i=1}^n p_i \log_2 p_i</math> is the [[Entropy (information theory)|entropy]], and [[Shannon's source coding theorem]] says that any code must have an average length of at least <math>H(X)</math>. Hence we see that the Shannon–Fano code is always within one bit of the optimal expected word length.
==Fano's code: binary splitting==
In Fano's method, the symbols are arranged in order from most probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned: symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets to determine successive digits of their codes; when a set has been reduced to one symbol, that symbol's code is complete.

The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon–Fano coding does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon–Fano coding.
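A short Python sketch of this recursive splitting; the function name <code>fano_code</code> and the greedy choice of split point are one straightforward reading of the description above, not a reference implementation.

<syntaxhighlight lang="python">
def fano_code(freqs):
    """Fano's method: repeatedly split the (sorted) symbol list into two
    contiguous parts with totals as close to equal as possible, appending
    '0' to codes in the first part and '1' to codes in the second."""
    # Symbols in order of decreasing frequency.
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(freqs[s] for s in group)
        # Choose the split point whose first-half total is closest to half.
        running, best_i, best_diff = 0, 1, float("inf")
        for i, s in enumerate(group[:-1], start=1):
            running += freqs[s]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_i, best_diff = i, diff
        first, second = group[:best_i], group[best_i:]
        for s in first:
            codes[s] += "0"
        for s in second:
            codes[s] += "1"
        split(first)
        split(second)

    split(symbols)
    return codes

# Example with the frequencies used in this article (15, 7, 6, 6, 5):
print(fano_code({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
# Expected lengths: 2 bits for A, B and C; 3 bits for D and E.
</syntaxhighlight>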
Fano's version of Shannon–Fano coding is used in the <code>IMPLODE</code> compression method, which is part of the <code>ZIP</code> file format.<ref>{{cite web
 | url = http://www.pkware.com/documents/casestudies/APPNOTE.TXT
 | title = APPNOTE.TXT - .ZIP File Format Specification
 | publisher = PKWARE Inc
 | date = 2007-09-28
}}</ref>
This results in lengths of 2 bits for A, B and C, and 3 bits for D and E, giving an average length of
:<math display="block">\frac{2\,\text{bits}\cdot(15+7+6) + 3\,\text{bits} \cdot (6+5)}{39\, \text{symbols}} \approx 2.28\,\text{bits per symbol.}</math>
We see that Fano's method, with an average length of 2.28, has outperformed Shannon's method, with an average length of 2.62.
==Comparison with other coding methods==
Neither Shannon–Fano algorithm is guaranteed to generate an optimal code. For this reason, Shannon–Fano codes are almost never used; [[Huffman coding]] is almost as computationally simple and produces prefix codes that always achieve the lowest possible expected code word length, under the constraint that each symbol is represented by a code formed of an integral number of bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are [[Statistical independence|independent]] and are some power of a half, i.e., <math>\textstyle 1 / 2^k</math>. In most situations, [[arithmetic coding]] can produce greater overall compression than either Huffman or Shannon–Fano, since it can encode in fractional numbers of bits which more closely approximate the actual information content of the symbol. However, arithmetic coding has not superseded Huffman the way that Huffman coding superseded Shannon–Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents.<ref name="LiDrew2014">{{cite book |last1=Li |first1=Ze-Nian |last2=Drew |first2=Mark S. |last3=Liu |first3=Jiangchuan |title=Fundamentals of Multimedia |edition=2nd |publisher=Springer |year=2014}}</ref>
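For instance, for a two-symbol source with probabilities 0.99 and 0.01 (an illustrative case), any symbol-by-symbol code must spend at least one bit per symbol, so Huffman coding achieves an average of 1 bit per symbol, even though the entropy is only
:<math display="block">-0.99\log_2 0.99 - 0.01\log_2 0.01 \approx 0.081\,\text{bits per symbol;}</math>
arithmetic coding, by effectively spreading codewords over runs of symbols, can approach this entropy.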
===Huffman coding===
{{Main|Huffman coding}}
A few years later, [[David A. Huffman]] (1952) gave a different algorithm that always produces an optimal tree for any given symbol probabilities. While Fano's Shannon–Fano tree is created by dividing from the root to the leaves, the Huffman algorithm works in the opposite direction, merging from the leaves to the root.
# Create a leaf node for each symbol and add it to a [[priority queue]], using its frequency of occurrence as the priority.
# While there is more than one node in the queue:
## Remove the two nodes of lowest probability from the queue.
## Prepend 0 and 1 respectively to any code already assigned to these nodes.
## Create a new node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
## Add the new node to the queue.
# The remaining node is the root node and the tree is complete.
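A compact Python sketch of this procedure, using the standard library's <code>heapq</code> module as the priority queue (the function name <code>huffman_code</code> is an illustrative choice):

<syntaxhighlight lang="python">
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code by repeatedly merging the two lowest-frequency
    nodes taken from a priority queue (a binary heap)."""
    # Each heap entry is (frequency, tie-breaker, list of (symbol, code) pairs).
    tiebreak = count()
    heap = [(f, next(tiebreak), [(s, "")]) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        # Prepend 0 to codes in the first group and 1 to codes in the second,
        # then push the merged node back with the summed frequency.
        merged = [(s, "0" + c) for s, c in left] + [(s, "1" + c) for s, c in right]
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return dict(heap[0][2])

# Example with the frequencies used in this article:
print(huffman_code({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
# Ties may be broken differently from the worked example below, but the
# code lengths are the same: 1 bit for A and 3 bits for B, C, D and E.
</syntaxhighlight>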
In this case D & E have the lowest frequencies and so are allocated 0 and 1 respectively and grouped together with a combined probability of 0.282. The lowest pair now are B and C so they're allocated 0 and 1 and grouped together with a combined probability of 0.333. This leaves BC and DE now with the lowest probabilities so 0 and 1 are prepended to their codes and they are combined. This then leaves just A and BCDE, which have 0 and 1 prepended respectively and are then combined. This leaves us with a single node and our algorithm is complete.
The code lengths for the different characters this time are 1 bit for A and 3 bits for all other characters.
:{| class="wikitable"
This results in lengths of 1 bit for A and 3 bits for B, C, D and E, giving an average length of
:<math display="block">\frac{1\,\text{bit}\cdot 15 + 3\,\text{bits} \cdot (7+6+6+5)}{39\, \text{symbols}} \approx 2.23\,\text{bits per symbol.}</math>
We see that the Huffman code has outperformed both types of Shannon–Fano code, which had expected lengths of 2.62 and 2.28.
==References==
* {{cite journal | first = R.M. | last = Fano | author-link = Robert Fano | title = The transmission of information | journal = Technical Report No. 65 | publisher = Research Laboratory of Electronics at MIT | location = Cambridge (Mass.), USA | year = 1949}}
* {{cite journal | first = C.E. | last = Shannon | author-link = Claude Shannon | title = A Mathematical Theory of Communication | journal = Bell System Technical Journal | volume = 27 | issue = 3 | pages = 379–423 | date = July 1948}}
{{Compression methods}}
{{DEFAULTSORT:Shannon-Fano coding}}
[[Category:Lossless compression algorithms]]
[[Category:Claude Shannon]]
[[Category:Entropy coding]]
[[Category:Data compression]]