Noisy-channel coding theorem

In information theory, the noisy-channel coding theorem establishes that, however contaminated with noise a communication channel may be, it is possible to communicate digital data (information) nearly error-free at any rate below a maximum determined by the channel. This surprising result, sometimes called the fundamental theorem of information theory, or just Shannon's theorem, was first presented by Claude Shannon in 1948.

The Shannon limit or Shannon capacity of a communications channel is the theoretical maximum information transfer rate of the channel, for a particular noise level.

Overview

Proved by Claude Shannon in 1948, the theorem describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption. The theorem does not describe how to construct the error-correcting method; it only tells us how good the best possible method can be. Shannon's theorem has wide-ranging applications in both communications and data storage, and it is of foundational importance to the modern field of information theory.

The Shannon theorem states that given a noisy channel with information capacity C and information transmitted at a rate R, then if

R<C\,

there exists a coding technique which allows the probability of error at the receiver to be made arbitrarily small. This means that, in theory, it is possible to transmit information nearly without error at any rate below the limit C.

The converse is also important. If

R>C\,

the probability of error at the receiver cannot be made arbitrarily small: it stays bounded away from zero, and the bound grows as the rate is increased. So information cannot be transmitted reliably at rates beyond the channel capacity. The theorem does not address the rare situation in which rate and capacity are equal.

Simple schemes such as "send the message 3 times and use a best 2 out of 3 voting scheme if the copies differ" are inefficient error-correction methods, unable to asymptotically guarantee that a block of data can be communicated free of error. Advanced techniques such as Reed-Solomon codes and, more recently, turbo codes come much closer to reaching the theoretical Shannon limit, but at a cost of high computational complexity. With turbo codes and the computing power in today's digital signal processors, it is now possible to reach within 1/10 of one decibel of the Shannon limit.
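The weakness of the repetition scheme can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a binary symmetric channel with crossover probability p (a channel model the paragraph above does not specify), and the function names are hypothetical. It computes the exact probability that majority voting over n copies decodes a single bit wrongly, alongside the rate 1/n of the code.

```python
from math import comb


def repetition_rate(n: int) -> float:
    """Rate of an n-fold repetition code: one data bit per n channel uses."""
    return 1.0 / n


def repetition_bit_error(p: float, n: int = 3) -> float:
    """Exact probability that majority voting over n copies is wrong.

    Assumes a binary symmetric channel that flips each transmitted bit
    independently with probability p; n must be odd to avoid ties.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the majority vote has no ties")
    # Decoding fails when more than half of the n copies are flipped.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.1  # assumed crossover probability
    for n in (1, 3, 5, 7, 9):
        print(n, repetition_rate(n), repetition_bit_error(p, n))
```

Under these assumptions the error probability per bit falls as n grows, but only because the rate 1/n falls towards zero; for any fixed n the per-bit error stays positive, so a long block of data is almost surely corrupted somewhere. The theorem, by contrast, promises arbitrarily small error at any fixed rate below C.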

Mathematical statement

Theorem (Shannon, 1948):

1. For every discrete memoryless channel, the channel capacity

C=\max _{P_{X}}\,I(X;Y)

has the following property. For any ε > 0 and R < C, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε.

2. If a probability of bit error p_b is acceptable, rates up to R(p_b) are achievable, where

R(p_{b})={\frac {C}{1-H_{2}(p_{b})}}

and H_2(p_b) is the binary entropy function

H_{2}(p_{b})=p_{b}\log _{2}\left({\frac {1}{p_{b}}}\right)+(1-p_{b})\log _{2}\left({\frac {1}{1-p_{b}}}\right).

3. For any p_b, rates greater than R(p_b) are not achievable.

(MacKay (2003), p. 162; cf Gallager (1968), ch.5; Cover and Thomas (1991), p. 198; Shannon (1948) thm. 11)
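As a numerical illustration of the quantities in the theorem, the following sketch evaluates the capacity and R(p_b) for one specific channel. It assumes a binary symmetric channel with flip probability f, which is only one example of a discrete memoryless channel, and it approximates the maximization over input distributions by a simple grid search; all function names are hypothetical.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Binary entropy H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))


def bsc_mutual_information(q: float, f: float) -> float:
    """I(X;Y) in bits for a binary symmetric channel.

    q is P(X = 1) and f is the flip probability (both assumptions of this
    sketch). Uses I(X;Y) = H(Y) - H(Y|X) = H_2(P(Y = 1)) - H_2(f).
    """
    p_y1 = q * (1 - f) + (1 - q) * f
    p_y1 = min(1.0, max(0.0, p_y1))  # guard against rounding outside [0, 1]
    return binary_entropy(p_y1) - binary_entropy(f)


def bsc_capacity(f: float, grid: int = 1000) -> float:
    """Approximate C = max over P_X of I(X;Y) by a grid search over q."""
    return max(bsc_mutual_information(k / grid, f) for k in range(grid + 1))


def achievable_rate(capacity: float, p_b: float) -> float:
    """R(p_b) = C / (1 - H_2(p_b)) for a tolerated bit error 0 <= p_b < 1/2."""
    if not 0.0 <= p_b < 0.5:
        raise ValueError("p_b must satisfy 0 <= p_b < 1/2")
    return capacity / (1 - binary_entropy(p_b))


if __name__ == "__main__":
    c = bsc_capacity(0.1)
    print("capacity:", c)
    print("R(0.01):", achievable_rate(c, 0.01))
```

For this channel the grid search agrees with the known closed form C = 1 - H_2(f), attained by the uniform input distribution. Tolerating a nonzero bit error p_b raises the achievable rate above C, and setting p_b = 0 recovers R = C.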

Outline of proof

As with several other major results in information theory, the proof of the noisy channel coding theorem includes an achievability result and a matching converse result. These two components bound the set of rates at which one can communicate over a noisy channel, and the fact that they match shows that the bounds are tight.

The following outlines are only one of the many styles of proof found in information theory texts.

Achievability for discrete memoryless channels

Converse for discrete memoryless channels

Suppose a code has $2^{nR}$ codewords. Let W be an index drawn uniformly over this set. Let $X^{n}$ and $Y^{n}$ be the transmitted codeword and the received sequence, respectively.

1)

nR=H(W)=H(W|Y^{n})+I(W;Y^{n})\;

using identities involving entropy and mutual information

2)

\leq H(W|Y^{n})+I(X^{n}(W);Y^{n})

since $X^{n}$ is a function of W

3)

\leq 1+P_{e}^{(n)}nR+I(X^{n}(W);Y^{n})

by the use of Fano's Inequality

4)

\leq 1+P_{e}^{(n)}nR+nC

by the fact that capacity is maximized mutual information.

The result of these steps is that $P_{e}^{(n)}\geq 1-{\frac {1}{nR}}-{\frac {C}{R}}$. As the block length n goes to infinity, $P_{e}^{(n)}$ is bounded away from 0 if R is greater than C; the probability of error can be made arbitrarily small only if R does not exceed C.
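Rearranging step 4, nR ≤ 1 + P_e^{(n)} nR + nC, gives the lower bound P_e^{(n)} ≥ 1 - 1/(nR) - C/R. The sketch below simply evaluates that bound for chosen values of n, R and C; the function name and the example numbers are hypothetical, and the result is clipped at zero because a negative bound on a probability carries no information.

```python
def error_lower_bound(n: int, rate: float, capacity: float) -> float:
    """Evaluate the converse bound 1 - 1/(n R) - C/R, clipped at 0.

    n is the block length, rate is R and capacity is C, both measured
    in bits per channel use.
    """
    if n <= 0 or rate <= 0:
        raise ValueError("n and rate must be positive")
    return max(0.0, 1.0 - 1.0 / (n * rate) - capacity / rate)


if __name__ == "__main__":
    capacity = 0.5  # assumed example value
    for rate in (0.4, 0.5, 0.75, 1.0):
        bounds = [error_lower_bound(n, rate, capacity) for n in (10, 100, 10_000)]
        print(rate, bounds)
```

With these example values, a rate of 0.75 against a capacity of 0.5 gives a bound that approaches 1 - C/R = 1/3 as n grows, while for rates at or below capacity the bound is vacuous (zero), consistent with the converse saying nothing against reliable communication in that regime.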

Channel coding theorem for non-stationary memoryless channel

We assume that the channel is memoryless, but its transition probabilities change with time, in a fashion known at the transmitter as well as the receiver.

Then the channel capacity is given by

$C=\liminf _{n\to \infty }\;\max _{p^{*}(X_{1}),p^{*}(X_{2}),\ldots }{\frac {1}{n}}\sum _{i=1}^{n}I(X_{i};Y_{i})$, where $p^{*}(X_{i})$ is the capacity-achieving distribution for the ith channel. That is, $C=\liminf _{n\to \infty }{\frac {1}{n}}\sum _{i=1}^{n}C_{i}$, where $C_{i}$ is the capacity of the ith channel.

Outline of the proof

The proof runs through in almost the same way as that of the channel coding theorem. Achievability follows from random coding with each symbol chosen randomly from the capacity-achieving distribution for that particular channel. Typicality arguments use the definition of typical sets for non-stationary sources given in the asymptotic equipartition property.

The technicality of $\liminf$ comes into play when ${\frac {1}{n}}\sum _{i=1}^{n}C_{i}$ does not converge.
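To see why the limit inferior is needed, consider a sequence of per-use capacities whose running average oscillates. The example below is a hypothetical construction, not taken from the references: the channel alternates between C_i = 1 and C_i = 0 over blocks of doubling length. A finite prefix can only suggest the limiting behaviour, so the code reports the running averages rather than claiming to compute the limit inferior itself.

```python
from itertools import accumulate


def running_average_capacity(capacities):
    """Return the averages (1/n) * sum_{i=1..n} C_i for n = 1..len(capacities)."""
    return [
        total / n
        for n, total in enumerate(accumulate(capacities), start=1)
    ]


def doubling_block_capacities(num_blocks: int):
    """Hypothetical non-stationary channel: blocks of length 1, 2, 4, ...

    Even-numbered blocks have C_i = 1 and odd-numbered blocks have C_i = 0.
    """
    capacities = []
    for block in range(num_blocks):
        value = 1.0 if block % 2 == 0 else 0.0
        capacities.extend([value] * (2**block))
    return capacities


if __name__ == "__main__":
    caps = doubling_block_capacities(12)
    averages = running_average_capacity(caps)
    # Print the running average at the end of each block.
    end = 0
    for block in range(12):
        end += 2**block
        print(block, end, averages[end - 1])
```

In this construction the running average equals 1/3 at the end of every zero-capacity block and climbs towards 2/3 at the end of every unit-capacity block, so it never converges. The formula above then assigns the capacity according to the lower of the two accumulation points.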

References

C. E. Shannon, The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press, 1949 (reprinted 1998).
David J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003. ISBN 0521642981
Thomas Cover, Joy Thomas, Elements of Information Theory. New York, NY: John Wiley & Sons, Inc., 1991. ISBN 0471062596

External links

On Shannon and Shannon's law
On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay - gives an entertaining and thorough introduction to Shannon theory, including two proofs of the noisy-channel coding theorem. This text also discusses state-of-the-art methods from coding theory, such as low-density parity-check codes, and Turbo codes.
