Noisy-channel coding theorem
====Achievability for discrete memoryless channels====
 
This particular proof of achievability follows the style of proofs that make use of the [[Asymptotic equipartition property]] (AEP). Another style, using [[Error Exponent]]s, can be found in information theory texts.
 
Both types of proofs make use of a random coding argument where the codebook used across a channel is randomly constructed; this serves to reduce computational complexity while still proving the existence of a code satisfying a desired low probability of error at any data rate below the [[Channel capacity]].
 
By an AEP-related argument, given a channel, length n strings of source symbols <math>X_1^{n}</math>, and length n strings of channel outputs <math>Y_1^{n}</math>, we can define a jointly typical set by the following:
<math>A_\epsilon^{(n)} = \{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n :</math>

:<math>2^{-n(H(X)+\epsilon)} \le p(X_1^n) \le 2^{-n(H(X)-\epsilon)}</math>

:<math>2^{-n(H(Y)+\epsilon)} \le p(Y_1^n) \le 2^{-n(H(Y)-\epsilon)}</math>

:<math>2^{-n(H(X,Y)+\epsilon)} \le p(X_1^n, Y_1^n) \le 2^{-n(H(X,Y)-\epsilon)}\}</math>

'''Steps'''
#In the style of the random coding argument, we randomly generate <math> 2^{nR} </math> codewords of length n from a probability distribution Q.
#This code is revealed to the sender and receiver. It is also assumed both know the transition matrix <math>p(y|x)</math> for the channel being used.
#A message W is chosen according to the uniform distribution on the set of codewords. That is, <math>Pr(W = w) = 2^{-nR}, w = 1, 2, ..., 2^{nR}</math>.
#The message W is sent across the channel.
#The receiver receives a sequence according to <math>P(y^n|x^n(w)) = \prod_{i=1}^n p(y_i|x_i(w))</math>.
#Sending these codewords across the channel, we receive <math>Y_1^n</math>, and decode to some source sequence if there exists exactly one codeword that is jointly typical with Y. If there are no jointly typical codewords, or if there is more than one, an error is declared. An error also occurs if a decoded codeword does not match the original codeword. This is called ''typical set decoding''.
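The decoding rule above can be sketched in code. The following is a minimal, illustrative sketch of typical-set decoding for a binary symmetric channel; the choice of a BSC, the helper names, and all parameter values are assumptions made for illustration, not part of the theorem:

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log2_prob_x(x, px1):
    """log2 p(x^n) for i.i.d. bits with P(X=1) = px1."""
    return sum(math.log2(px1 if xi == 1 else 1 - px1) for xi in x)

def log2_prob_y_given_x(x, y, c):
    """log2 p(y^n | x^n) for a BSC with crossover probability c."""
    return sum(math.log2(1 - c if xi == yi else c) for xi, yi in zip(x, y))

def jointly_typical(x, y, px1, c, eps):
    """Check the three conditions defining the jointly typical set A_eps^(n)."""
    n = len(x)
    py1 = px1 * (1 - c) + (1 - px1) * c          # marginal P(Y=1)
    hx, hy = binary_entropy(px1), binary_entropy(py1)
    hxy = hx + binary_entropy(c)                 # H(X,Y) = H(X) + H(Y|X)
    lx = log2_prob_x(x, px1)
    ly = log2_prob_x(y, py1)
    lxy = lx + log2_prob_y_given_x(x, y, c)
    return (abs(-lx / n - hx) <= eps
            and abs(-ly / n - hy) <= eps
            and abs(-lxy / n - hxy) <= eps)

def typical_set_decode(codebook, y, px1, c, eps):
    """Return the index of the unique jointly typical codeword, else None."""
    hits = [i for i, x in enumerate(codebook)
            if jointly_typical(x, y, px1, c, eps)]
    return hits[0] if len(hits) == 1 else None   # None signals a decoding error
```

A received sequence that differs from one codeword in roughly a fraction c of positions, and from every other codeword in far more, is decoded to that codeword; any other outcome is declared an error, exactly as in the scheme above.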
 
 
The probability of error of this scheme is divided into two parts:
 
#First, error can occur if no jointly typical X sequences are found for a received Y sequence
#Second, error can occur if an incorrect X sequence is jointly typical with a received Y sequence.
 
 
By the randomness of the code construction, we can assume that the average probability of error averaged over all codes does not depend on the index sent. Thus, without loss of generality, we can assume W = 1.
 
 
From the AEP, we know that the probability that no jointly typical X exists goes to 0 as n grows large. We can bound this error probability by <math>\epsilon</math>.
 
Also from the AEP, we know the probability that a particular <math>X_1^n(i)</math> and the <math>Y_1^n</math> resulting from W = 1 are jointly typical is <math>\le 2^{-n(I(X;Y) - 3\epsilon)}</math>.
 
 
Define <math>E_i = \{(X_1^n(i), Y_1^n) \in A_\epsilon^{(n)}\},\quad i = 1, 2, \ldots, 2^{nR}</math> as the event that some other message i is jointly typical with the sequence received when message 1 is sent.
 
<math>P(error) = P(error|W=1) \le P(E_1^c) + \sum_{i=2}^{2^{nR}}P(E_i)</math>
 
:::<math>\le \epsilon + 2^{-n(I(X;Y)-R-3\epsilon)}</math>
 
We can observe that as n goes to infinity, as long as R < I(X;Y) for the channel, the probability of error will go to 0.
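As a numerical illustration of this convergence, the bound <math>\epsilon + 2^{-n(I(X;Y)-R-3\epsilon)}</math> can be evaluated for growing n; the values of <math>I(X;Y)</math>, R, and <math>\epsilon</math> below are arbitrary assumptions chosen so that R is below the mutual information:

```python
# Illustrative values only: a channel with I(X;Y) = 0.5 bits per use,
# a rate R = 0.4 below it, and eps = 0.01.
I_xy, R, eps = 0.5, 0.4, 0.01

def achievability_bound(n):
    """Upper bound eps + 2^{-n(I(X;Y) - R - 3*eps)} on the error probability."""
    return eps + 2 ** (-n * (I_xy - R - 3 * eps))

for n in (100, 500, 1000):
    print(n, achievability_bound(n))
```

Because the exponent <math>I(X;Y)-R-3\epsilon</math> is positive, the second term vanishes and the bound approaches <math>\epsilon</math>, which can itself be taken arbitrarily small.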
 
==== Converse for discrete memoryless channels====
 
Suppose a code of <math>2^{nR}</math> codewords. Let W be drawn uniformly over this set as an index. Let <math>X^n</math> and <math>Y^n</math> be the codewords and received codewords, respectively.
 
#<math>nR = H(W) = H(W|Y^n) + I(W;Y^n)\;</math> using identities involving entropy and mutual information
#<math>\le H(W|Y^n) + I(X^n(W);Y^n)</math> since X is a function of W
#<math>\le 1 + P_e^{(n)}nR + I(X^n(W);Y^n)</math> by the use of [[Fano's Inequality]]
#<math>\le 1 + P_e^{(n)}nR + nC</math> by the fact that capacity is the maximized mutual information.
 
The result of these steps is that <math> P_e^{(n)} \ge 1 - \frac{1}{nR} - \frac{C}{R} </math>. As the block length n goes to infinity, we obtain that <math> P_e^{(n)}</math> is bounded away from 0 if R is greater than C; we can only achieve arbitrarily low probabilities of error if R is less than C.
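This lower bound can also be illustrated numerically; the capacity C and rate R below are assumed values chosen so that R exceeds C:

```python
# Illustrative values only: capacity C = 0.5 bits per use and a rate
# R = 0.6 that exceeds it.
C, R = 0.5, 0.6

def converse_bound(n):
    """Lower bound 1 - 1/(nR) - C/R on the error probability when R > C."""
    return 1 - 1 / (n * R) - C / R

for n in (10, 100, 1000):
    print(n, converse_bound(n))
```

As n grows, the bound approaches <math>1 - C/R > 0</math>, so the probability of error stays bounded away from 0 no matter how long the blocks are.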