{{short description|Limit on data transfer rate}}
{{redirect|Shannon's theorem|text=Shannon's name is also associated with the [[sampling theorem]]}}
In [[information theory]], the '''noisy-channel coding theorem''' (sometimes '''Shannon's theorem''' or '''Shannon's limit''') establishes that for any given degree of [[Noisy channel model|noise contamination of a communication channel]], it is possible to communicate discrete data (digital [[information]]) nearly error-free up to a computable maximum rate through the channel. This result was presented by [[Claude Shannon]] in 1948 and was based in part on earlier work and ideas of [[Harry Nyquist]] and [[Ralph Hartley]].
The '''Shannon limit''' or '''Shannon capacity''' of a communication channel refers to the maximum [[Code rate|rate]] of error-free data that can theoretically be transferred over the channel if the link is subject to random data transmission errors, for a particular noise level. It was first described by Shannon (1948), and shortly afterward published in a book by [[Claude E. Shannon|Claude Elwood Shannon]] and [[Warren Weaver]] in 1949 entitled ''[[The Mathematical Theory of Communication]]'' ({{ISBN|0252725484}}). This work founded the modern discipline of [[information theory]].
== Overview ==
Stated by [[Claude Shannon]] in 1948, the theorem describes the maximum possible efficiency of [[error-correcting code|error-correcting methods]] versus levels of noise interference and data corruption. Shannon's theorem has wide-ranging applications in both communications and [[data storage device|data storage]]. This theorem is of foundational importance to the modern field of [[information theory]]. Shannon only gave an outline of the proof. The first rigorous proof for the discrete case is due to Amiel Feinstein in 1954.
Shannon's theorem states that, given a noisy channel with [[channel capacity]] ''C'' and information transmitted at a rate ''R'', if <math>R < C</math> there exist [[code]]s that allow the [[probability of error]] at the receiver to be made arbitrarily small. This means that, theoretically, it is possible to transmit information nearly without error at any rate below the limiting rate ''C''.
The channel capacity <math>C</math> can be calculated from the physical properties of a channel; for a band-limited channel with Gaussian noise, using the [[Shannon–Hartley theorem]].
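As an illustration, the Shannon–Hartley capacity of such a channel can be evaluated numerically; the bandwidth and signal-to-noise ratio in the sketch below are assumed example values, not figures from the sources cited here.

<syntaxhighlight lang="python">
# Illustrative sketch: Shannon–Hartley capacity C = B * log2(1 + S/N)
# for a band-limited channel with Gaussian noise. The bandwidth and
# SNR below are assumed example values.
import math

def shannon_hartley_capacity(bandwidth_hz, snr_linear):
    """Capacity in bits per second of a band-limited Gaussian channel."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

snr_db = 30.0                      # assumed signal-to-noise ratio in dB
snr = 10 ** (snr_db / 10)          # convert dB to a linear power ratio
print(shannon_hartley_capacity(3000.0, snr))   # ≈ 29,900 bits per second
</syntaxhighlight>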
Simple schemes such as "send the message 3 times and use a best 2 out of 3 voting scheme if the copies differ" are inefficient error-correction methods, unable to asymptotically guarantee that a block of data can be communicated free of error. Advanced techniques such as [[Reed–Solomon code]]s and, more recently, [[low-density parity-check code|low-density parity-check]] (LDPC) codes and [[turbo code]]s, come much closer to reaching the theoretical Shannon limit, but at a cost of high computational complexity. Using these highly efficient codes and with the computing power in today's [[digital signal processors]], it is now possible to reach very close to the Shannon limit. In fact, it was shown that LDPC codes can reach within 0.0045 dB of the Shannon limit (for binary [[additive white Gaussian noise]] (AWGN) channels, with very long block lengths).
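A rough numerical comparison (with an assumed crossover probability, not a value from the sources above) shows why the three-fold repetition scheme falls far short of capacity on a binary symmetric channel:

<syntaxhighlight lang="python">
# Illustrative comparison of the 3-fold repetition code with capacity on a
# binary symmetric channel. The crossover probability is an assumed value.
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p = 0.01                                 # assumed crossover probability
rep_error = 3 * p**2 * (1 - p) + p**3    # majority vote fails if >= 2 of 3 bits flip
capacity = 1 - h2(p)                     # BSC capacity in bits per channel use

print(f"repetition code: rate 1/3, residual bit error {rep_error:.2e}")
print(f"capacity: {capacity:.4f} bits/use, approachable with vanishing error")
</syntaxhighlight>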
== Mathematical statement ==
[[Image:Noisy-channel coding theorem — channel capacity graph.png|thumb|right|300px|Graph showing the proportion of a channel’s capacity (''y''-axis) that can be used for payload based on how noisy the channel is (probability of bit flips; ''x''-axis)]]
The basic mathematical model for a communication system is the following:
: <math title="Channel model">\xrightarrow[\text{Message}]{W}
\begin{array}{ |c| }\hline \text{Encoder} \\ f_n \\ \hline\end{array} \xrightarrow[\mathrm{Encoded \atop sequence}]{X^n} \begin{array}{ |c| }\hline \text{Channel} \\ p(y|x) \\ \hline\end{array} \xrightarrow[\mathrm{Received \atop sequence}]{Y^n} \begin{array}{ |c| }\hline \text{Decoder} \\ g_n \\ \hline\end{array} \xrightarrow[\mathrm{Estimated \atop message}]{\hat W}</math>
A '''message''' ''W'' is transmitted through a noisy channel by using encoding and decoding functions. An '''encoder''' maps ''W'' into a pre-defined sequence of channel symbols of length ''n''. In its most basic model, the channel distorts each of these symbols independently of the others. The output of the channel (the received sequence) is fed into a '''decoder''' which maps the sequence into an estimate of the message. In this setting, the probability of error is defined as:
:: <math> P_e = \text{Pr}\left\{ \hat{W} \neq W \right\}. </math>
'''Theorem''' (Shannon, 1948):
: 1. For every discrete memoryless channel, the [[channel capacity]]
:: <math>\ C = \sup_{p_X} I(X;Y)</math><ref>For a description of the "sup" function, see [[Supremum]]</ref>
: has the following property. For any <math>\epsilon>0</math> and <math>R<C</math>, for large enough <math>N</math>, there exists a code of length <math>N</math> and rate <math>\geq R</math> and a decoding algorithm, such that the maximal probability of block error is <math>\leq \epsilon</math>.
: 2. If a probability of bit error <math>p_b</math> is acceptable, rates up to <math>R(p_b)</math> are achievable, where
:: <math>R(p_b) = \frac{C}{1-H_2(p_b)},</math>
: and <math> H_2(p_b)</math> is the ''[[binary entropy function]]''
:: <math>H_2(p_b) = -\left[ p_b \log_2 p_b + (1-p_b) \log_2 (1-p_b) \right].</math>
: 3. For any <math>p_b</math>, rates greater than <math>R(p_b)</math> are not achievable.
(MacKay (2003), p. 162; cf. Gallager (1968), ch. 5; Cover and Thomas (1991), p. 198; Shannon (1948), thm. 11)
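For a concrete reading of parts 1 and 2, the sketch below evaluates <math>C</math> and <math>R(p_b)</math> for a binary symmetric channel; the crossover probability and the tolerated bit-error rates are assumed example values.

<syntaxhighlight lang="python">
# Sketch of the rates in parts 1–2 of the theorem for a binary symmetric
# channel: C = 1 - H2(p) and, if a residual bit-error rate p_b is tolerated,
# R(p_b) = C / (1 - H2(p_b)). The probabilities below are assumed values.
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p_channel = 0.1                     # assumed BSC crossover probability
C = 1 - h2(p_channel)               # channel capacity (bits per channel use)

for p_b in (0.0, 0.01, 0.05):       # tolerated residual bit-error rates
    R = C / (1 - h2(p_b))
    print(f"p_b = {p_b:4.2f}: rates up to {R:.4f} bits/use are achievable")
</syntaxhighlight>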
== Outline of proof ==
As with several other major results in information theory, the proof of the noisy-channel coding theorem includes an achievability result and a matching converse result. These two components serve to bound the set of possible rates at which one can communicate over a noisy channel, and the matching converse shows that these bounds are tight.
The following outlines are only one set of many different styles available for study in information theory texts.
=== Achievability for discrete memoryless channels ===
This particular proof of achievability follows the style of proofs that make use of the [[asymptotic equipartition property]] (AEP). Another style can be found in information theory texts using [[error exponent]]s.
Given a channel, length-<math>n</math> strings of source symbols <math>X_1^n</math>, and length-<math>n</math> strings of channel outputs <math>Y_1^n</math>, a jointly typical set can be defined as follows:

: <math>A_\varepsilon^{(n)} = \{(x^n, y^n) \in \mathcal X^n \times \mathcal Y^n :</math>
::: <math>2^{-n(H(X)+\varepsilon)} \le p(X_1^n) \le 2^{-n(H(X) - \varepsilon)},</math>
::: <math>2^{-n(H(Y) + \varepsilon)} \le p(Y_1^n) \le 2^{-n(H(Y)-\varepsilon)},</math>
::: <math>2^{-n(H(X,Y) + \varepsilon)} \le p(X_1^n, Y_1^n) \le 2^{-n(H(X,Y) -\varepsilon)} \}.</math>
We say that two sequences <math>{X_1^n}</math> and <math>Y_1^n</math> are ''jointly typical'' if they lie in the jointly typical set defined above.
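The membership test for <math>A_\varepsilon^{(n)}</math> can be written out directly from the three conditions above. The sketch below assumes i.i.d. symbols with a known joint distribution; the function names and the toy distribution are illustrative, not part of the standard proofs.

<syntaxhighlight lang="python">
# Minimal sketch of a joint-typicality check, using the equivalent form
# |(-1/n) log2 p(sequence) - H| <= eps for each of the three conditions.
import math

def entropy(dist):
    """Shannon entropy (bits) of a finite distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def log2_prob(seq, dist):
    """log2-probability of an i.i.d. sequence under the distribution."""
    return sum(math.log2(dist[s]) for s in seq)

def jointly_typical(xs, ys, p_x, p_y, p_xy, eps):
    """Check the three conditions defining A_eps^(n)."""
    n = len(xs)
    def close(seq, dist):
        return abs(-log2_prob(seq, dist) / n - entropy(dist)) <= eps
    return close(xs, p_x) and close(ys, p_y) and close(list(zip(xs, ys)), p_xy)

# Toy example: X uniform on {0, 1}, Y a noisy copy of X (crossover 0.1).
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.5, 1: 0.5}
p_xy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(jointly_typical([0, 1, 0, 1], [0, 1, 1, 1], p_x, p_y, p_xy, eps=0.5))
</syntaxhighlight>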
'''Steps'''
# In the style of the random coding argument, we randomly generate <math> 2^{nR} </math> codewords of length n from a probability distribution Q.
# This code is revealed to the sender and receiver. It is also assumed that one knows the transition matrix <math>p(y|x)</math> for the channel being used.
# A message W is chosen according to the uniform distribution on the set of codewords. That is, <math>Pr(W = w) = 2^{-nR}, w = 1, 2, \dots, 2^{nR}</math>.
# The message W is sent across the channel.
# The receiver receives a sequence according to <math>P(y^n|x^n(w))= \prod_{i = 1}^n p(y_i|x_i(w)).</math>
# Sending these codewords across the channel, we receive <math>Y_1^n</math>, and decode to some source sequence if there exists exactly one codeword that is jointly typical with Y. If there are no jointly typical codewords, or if there are more than one, an error is declared. An error also occurs if a decoded codeword does not match the original codeword. This is called ''typical set decoding''.
The probability of error of this scheme is divided into two parts:
# First, error can occur if no jointly typical X sequences are found for a received Y sequence
# Second, error can occur if an incorrect X sequence is jointly typical with a received Y sequence.
* By the randomness of the code construction, we can assume that the average probability of error averaged over all codes does not depend on the index sent. Thus, without loss of generality, we can assume ''W'' = 1.
* From the joint AEP, we know that the probability that no jointly typical X exists goes to 0 as n grows large. We can bound this error probability by <math>\varepsilon</math>.
* Also from the joint AEP, we know the probability that a particular <math>X_1^{n}(i)</math> and the <math>Y_1^n</math> resulting from ''W'' = 1 are jointly typical is <math>\le 2^{-n(I(X;Y) - 3\varepsilon)}</math>.
Define: <math>E_i = \{(X_1^n(i), Y_1^n) \in A_\varepsilon^{(n)}\}, i = 1, 2, \dots, 2^{nR}</math> as the event that codeword ''i'' is jointly typical with the sequence received when message 1 is sent.
The probability of error can then be bounded using the union bound and the joint AEP:

: <math>
\begin{align}
P(\text{error}) & {} = P(\text{error}|W=1) \le P(E_1^c) + \sum_{i=2}^{2^{nR}}P(E_i) \\
& {} \le P(E_1^c) + (2^{nR}-1) 2^{-n(I(X;Y)-3\varepsilon)} \\
& {} \le \varepsilon + 2^{-n(I(X;Y)-R-3\varepsilon)}.
\end{align}
</math>

We can observe that as <math>n</math> goes to infinity, if <math>R < I(X;Y)</math> for the channel, the probability of error goes to 0.
Finally, given that the average codebook is shown to be "good", we know that there exists a codebook whose performance is better than the average, and so satisfies our need for an arbitrarily low error probability when communicating across the noisy channel.
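The random-coding argument can be mimicked numerically. The sketch below draws random codebooks for a binary symmetric channel and decodes by minimum Hamming distance rather than by the typical-set decoder described above (a deliberate simplification); the block length, rate, crossover probability and trial count are assumed values chosen only so the experiment runs quickly.

<syntaxhighlight lang="python">
# Monte Carlo sketch of random coding over a binary symmetric channel.
# Decoding is by minimum Hamming distance, a stand-in for the typical-set
# decoder of the proof. All parameters are assumptions.
import random

def simulate(n=20, R=0.25, p=0.05, trials=500, seed=0):
    rng = random.Random(seed)
    M = 2 ** int(n * R)                       # number of codewords, 2^{nR}
    errors = 0
    for _ in range(trials):
        codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]
        w = rng.randrange(M)                  # message chosen uniformly
        received = [b ^ (rng.random() < p) for b in codebook[w]]
        dist = lambda c: sum(a != b for a, b in zip(c, received))
        w_hat = min(range(M), key=lambda i: dist(codebook[i]))
        errors += (w_hat != w)
    return errors / trials

# Here R = 0.25 is well below C = 1 - H2(0.05) ≈ 0.71, so decoding
# errors become rare as the block length n grows.
print(simulate())
</syntaxhighlight>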
=== Weak converse for discrete memoryless channels ===
Suppose a code consists of <math>2^{nR}</math> codewords. Let ''W'' be drawn uniformly over this set as an index. Let <math>X^n</math> and <math>Y^n</math> be the transmitted codeword and the received sequence, respectively.
# <math>nR = H(W) = H(W|Y^n) + I(W;Y^n)</math> using identities involving entropy and mutual information
# <math>\le H(W|Y^n) + I(X^n(W);Y^{n})</math> since X is a function of W
# <math>\le 1 + P_e^{(n)}nR + I(X^n(W);Y^n)</math> by the use of [[Fano's Inequality]]
# <math>\le 1 + P_e^{(n)}nR + nC</math> by the fact that capacity is maximized mutual information.
The result of these steps is that <math> P_e^{(n)} \ge 1 - \frac{1}{nR} - \frac{C}{R} </math>. As the block length <math>n</math> goes to infinity, <math> P_e^{(n)}</math> remains bounded away from 0 if <math>R</math> is greater than <math>C</math>; an arbitrarily low probability of error can be achieved only if <math>R</math> is less than <math>C</math>.
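To see the conclusion numerically, the bound can be evaluated for an assumed capacity and an above-capacity rate (both values below are illustrative):

<syntaxhighlight lang="python">
# Numerical reading of the weak-converse bound derived above:
# P_e^(n) >= 1 - 1/(n*R) - C/R whenever R > C.
# The capacity C and the rate R are assumed example values.
C = 0.5            # assumed channel capacity (bits per channel use)
R = 0.6            # attempted rate above capacity
for n in (10, 100, 1000, 10**6):
    lower_bound = 1 - 1 / (n * R) - C / R
    print(f"n = {n:>7}: P_e >= {lower_bound:.4f}")
# The bound tends to 1 - C/R ≈ 0.167, so the error probability stays
# bounded away from zero for any rate above capacity.
</syntaxhighlight>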
=== Strong converse for discrete memoryless channels ===
A strong converse theorem, proven by Wolfowitz in 1957,<ref>{{cite book |first=Robert |last=Gallager |title=Information Theory and Reliable Communication |publisher=Wiley |location=New York |year=1968 |isbn=0-471-29048-3}}</ref> states that,

: <math>
P_e \geq 1- \frac{4A}{n(R-C)^2} - e^{-\frac{n(R-C)}{2}}
</math>
for some finite positive constant <math>A</math>. While the weak converse states that the error probability is bounded away from zero as <math>n</math> goes to infinity, the strong converse states that the error goes to 1. Thus, <math>C</math> is a sharp threshold between perfectly reliable and completely unreliable communication.
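With an assumed constant <math>A</math> and illustrative rates, the contrast with the weak converse is visible directly from the bound:

<syntaxhighlight lang="python">
# Numerical reading of Wolfowitz's strong-converse bound quoted above:
# P_e >= 1 - 4*A/(n*(R - C)**2) - exp(-n*(R - C)/2).
# The constant A, the capacity C and the rate R are assumed example values.
import math

A, C, R = 1.0, 0.5, 0.6
for n in (10**3, 10**4, 10**5, 10**6):
    bound = 1 - 4 * A / (n * (R - C) ** 2) - math.exp(-n * (R - C) / 2)
    print(f"n = {n:>7}: P_e >= {bound:.4f}")
# The bound approaches 1: above capacity the block error probability tends
# to 1, not merely staying bounded away from 0 as in the weak converse.
</syntaxhighlight>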
== Channel coding theorem for non-stationary memoryless channels ==
We assume that the channel is memoryless, but its transition probabilities change with time, in a fashion known at the transmitter as well as the receiver.
Then the channel capacity is given by
: <math>
C = \liminf_{n\to\infty}\, \max_{p^{(X_1)}, p^{(X_2)}, \ldots} \frac{1}{n}\sum_{i=1}^n I(X_i;Y_i).
</math>
The maximum is attained at the capacity-achieving distributions for each respective channel. That is,

: <math>
C = \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n C_i,
</math>

where <math>C_i</math> is the capacity of the ''i''th channel.
=== Outline of the proof ===
The proof runs through in almost the same way as that of the channel coding theorem. Achievability follows from random coding with each symbol chosen randomly from the capacity-achieving distribution for that particular channel. Typicality arguments use the definition of typical sets for non-stationary sources defined in the [[asymptotic equipartition property]] article.
The technicality of [[lim inf]] comes into play when <math>\frac{1}{n}\sum_{i=1}^n C_i</math> does not converge.
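A small numerical sketch of this definition, with per-use capacities alternating between two assumed binary symmetric channels, is given below; since the running averages converge in this example, the lim inf coincides with the ordinary limit.

<syntaxhighlight lang="python">
# Sketch of the capacity expression for a non-stationary memoryless channel:
# the lim inf of the running averages (1/n) * sum_i C_i. Here the per-use
# capacities C_i alternate between two assumed binary symmetric channels.
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

caps = [1 - h2(0.05) if i % 2 == 0 else 1 - h2(0.2) for i in range(10**5)]

running_avg, total = [], 0.0
for i, c in enumerate(caps, start=1):
    total += c
    running_avg.append(total / i)

# The averages converge here, so the lim inf equals the ordinary limit,
# roughly (C(0.05) + C(0.2)) / 2 ≈ 0.50 bits per channel use.
print(min(running_avg[1000:]))
</syntaxhighlight>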
== See also ==
* [[Asymptotic equipartition property]] (AEP)
* [[Fano's inequality]]
* [[Turbo code]]
== Notes ==
{{reflist}}
== References ==
* {{cite web |first=B. |last=Aazhang |title=Shannon's Noisy Channel Coding Theorem |date=2004 |work=Connections |url=https://www.cse.iitd.ac.in/~vinay/courses/CSL858/reading/m10180.pdf}}
* {{cite book |last1=Cover |first1=Thomas M. |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |publisher=[[John Wiley & Sons]] |year=1991 |isbn=0-471-06259-6}}
* {{cite book |last=Fano |first=Robert M. |author-link=Robert Fano |title=Transmission of Information: A Statistical Theory of Communications |publisher=[[MIT Press]] |year=1961 |isbn=0-262-06001-9}}
* {{cite journal |last1=Feinstein |first1=Amiel |title=A new basic theorem of information theory |journal=Transactions of the IRE Professional Group on Information Theory |date=September 1954 |volume=4 |issue=4 |pages=2–22 |doi=10.1109/TIT.1954.1057459 |hdl=1721.1/4798 |bibcode=1955PhDT........12F |hdl-access=free}}
* {{cite journal |first=Lars |last=Lundheim |title=On Shannon and Shannon's Formula |journal=Telektronik |volume=98 |issue=1 |pages=20–29 |date=2002 |url=http://www.cs.miami.edu/home/burt/learning/Csc524.142/LarsTelektronikk02.pdf}}
* {{cite book |last=MacKay |first=David J. C. |author-link=David J.C. MacKay |title=Information Theory, Inference, and Learning Algorithms |publisher=[[Cambridge University Press]] |year=2003 |isbn=0-521-64298-1 |url=http://www.inference.phy.cam.ac.uk/mackay/itila/book.html}} [free online]
* {{cite journal |author-link=Claude E. Shannon |last1=Shannon |first1=C. E. |title=A Mathematical Theory of Communication |journal=Bell System Technical Journal |year=1948 |volume=27 |issue=3 |pages=379–423 |doi=10.1002/j.1538-7305.1948.tb01338.x}}
* {{cite book |author-link=Claude E. Shannon |first=C. E. |last=Shannon |title=A Mathematical Theory of Communication |publisher=University of Illinois Press |orig-year=1948 |date=1998 |url=http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html}}
* {{cite journal |first=J. |last=Wolfowitz |title=The coding of messages subject to chance errors |journal=Illinois J. Math. |volume=1 |issue=4 |pages=591–606 |date=1957 |doi=10.1215/ijm/1255380682 |url=https://projecteuclid.org/download/pdf_1/euclid.ijm/1255380682 |doi-access=free}}
{{DEFAULTSORT:Noisy-Channel Coding Theorem}}
[[Category:Information theory]]
[[Category:Theorems in discrete mathematics]]