
Chapter 7: Channel capacity

University of Illinois at Chicago ECE 534, Fall 2009, Natasha Devroye

Chapter 7 outline

Definition and examples of channel capacity


Symmetric channels
Channel capacity properties
Definitions and jointly typical sequences
Channel coding theorem: achievability and converse
Hamming codes
Channels with feedback
Source-channel separation theorem


Generic communication block diagram

[Block diagram: Source → Encoder → Channel (noise added) → Decoder → Destination. In expanded form: Source → Source coder (remove redundancy) → Channel coder (controlled adding of redundancy) → Channel (noise added) → Channel decoder (decode signals, detect/correct errors) → Source decoder (restore source) → Destination.]

Communication system

[Block diagram: a source produces a message, which is encoded, sent over a noisy channel, decoded, and delivered to the destination as an estimate of the message.]

Capacity: key ideas

[Block diagram: message → encoder → channel → decoder → estimate of the message.]

Choose the input set of codewords so that they are non-confusable at the output.
The number of such codewords that we can choose will determine the channel's capacity!
The number that we can choose will depend on the distribution p(y|x), which characterizes the channel.
For now we deal with discrete channels.

Discrete channel capacity


" #
n
!
n ofi capacity ni
Mathematical
description
f (1 f )
Pe =
i
i=m+1
Information channel capacity:

C = max I(X; Y )
p(x)

1
C = log2 (1 + |h|2 P/PN )
2
Highest rate (bits/channel use) that can
1communicate at2 reliably
2 log2 (1 + |h| P/PN )
C = theorem' says: information capacity = operational
( capacity
Channel coding

1
2
Eh 2 log2 (1 + |h| P/PN )
)
)

)
1
)
maxQ:T r(Q)=P 2 log2 IMR + HQH
Operational channel capacity:

C=

max

r(Q)=P
ChannelQ:T
capacity

EH

)
)(
)
)
log2 IMR + HQH
2

'1

Y =Channel:
HX +p(y|x)
N
X
Y
1
X=H U+N

1I(X; Y )
Capacity
C H(H
= max
Y=
p(x) U) + N
=U+N

bits/channel use

hard part, to find the ``capacity achieving input


distribution.
"
#
!
p(x, y)
p(x, y) log
I(X; Y ) =
p(x)p(y)
x,y

= ,

1
C = log2B(1= +
B1 P/N)
+ B2
2
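Finding the maximizing p(x) is in general a numerical optimization. Below is a minimal sketch of the Blahut-Arimoto iteration for computing C = max_{p(x)} I(X; Y) of a discrete memoryless channel; the code and the BSC test matrix are illustrative assumptions, not taken from these slides.

import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (in bits) of a DMC with transition matrix W[x, y] = p(y|x)."""
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)              # start from the uniform input distribution
    for _ in range(max_iter):
        q = p @ W                           # output distribution q(y)
        # D[x] = sum_y p(y|x) log2( p(y|x) / q(y) ), with the convention 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(W > 0, W / q, 1.0)
        D = np.sum(W * np.log2(ratio), axis=1)
        c = 2.0 ** D
        C_low = np.log2(p @ c)              # lower bound on capacity
        C_up = np.log2(c.max())             # upper bound on capacity
        p = p * c / (p @ c)                 # multiplicative update of p(x)
        if C_up - C_low < tol:
            break
    return C_low, p

f = 0.1                                     # illustrative BSC crossover probability
W = np.array([[1 - f, f], [f, 1 - f]])
C, p_star = blahut_arimoto(W)
print(C, p_star)                            # ~0.531 bits (= 1 - H(0.1)), p* ~ [0.5, 0.5]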

Noiseless channel

Capacity? 1 bit/channel use

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [ H(X) - H(X|Y) ]
  = max_{p(x)} H(X) - 0
  = 1

Non-overlapping outputs

[Figure: each input maps to two outputs with probability 1/2 each, but no output can be produced by both inputs.]

Capacity? 1 bit/channel use

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [ H(X) - H(X|Y) ]
  = max_{p(x)} H(X) - 0
  = 1

Noisy typewriter

[Figure: noisy typewriter channel over a 27-letter alphabet; each received letter could have come from one of three input letters.]

Capacity? log2(9) bits/channel use

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [ H(X) - H(X|Y) ]
  = max_{p(x)} H(X) - log2(3)
  = log2(27) - log2(3) = log2(9)

Binary erasure channel

[Figure: inputs 0 and 1 (with input distribution p, 1-p) are received correctly with probability 1-f and erased with probability f.]

Capacity? 1-f bits/channel use
Binary symmetric channel

Capacity? 1-H(f) bits/channel use

Conditional distributions (transition probability matrix):
p(y=0|x=0) = p(y=1|x=1) = 1-f
p(y=1|x=0) = p(y=0|x=1) = f

[Cover & Thomas, p. 187]


Binary Channels

Binary Symmetric Channel: X = {0, 1} and Y = {0, 1}, with transition matrix

p(y|x) = [ 1-f   f  ]
         [  f   1-f ]

Binary Erasure Channel: X = {0, 1} and Y = {0, ?, 1}, with transition matrix

p(y|x) = [ 1-f   f    0  ]
         [  0    f   1-f ]

Z channel: X = {0, 1} and Y = {0, 1}, with transition matrix

p(y|x) = [  1    0  ]
         [  f   1-f ]

[B. Smida (ES250), Channel Capacity, Fall 2008-09]
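The BSC and BEC capacities have closed forms (1-H(f) and 1-f); the Z channel does not, but for any binary-input channel the capacity can be found by maximizing I(X;Y) over the single parameter p = P(X=1). The grid-search helper and the value f = 0.1 below are illustrative assumptions, not from the slides.

import numpy as np

def mutual_information_bits(p1, W):
    """I(X;Y) in bits for P(X=1) = p1 and a 2-row transition matrix W[x, y] = p(y|x)."""
    px = np.array([1 - p1, p1])
    pxy = px[:, None] * W                  # joint distribution p(x, y)
    py = pxy.sum(axis=0)                   # output marginal p(y)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py)[mask])))

def binary_input_capacity(W, grid=20001):
    """Maximize I(X;Y) over p = P(X=1) by a fine grid search."""
    ps = np.linspace(0.0, 1.0, grid)
    vals = [mutual_information_bits(p, W) for p in ps]
    i = int(np.argmax(vals))
    return vals[i], ps[i]

f = 0.1  # illustrative crossover / erasure probability
bsc = np.array([[1 - f, f], [f, 1 - f]])
bec = np.array([[1 - f, f, 0.0], [0.0, f, 1 - f]])
zch = np.array([[1.0, 0.0], [f, 1 - f]])

for name, W in [("BSC", bsc), ("BEC", bec), ("Z", zch)]:
    C, p_star = binary_input_capacity(W)
    print(f"{name}: C = {C:.4f} bits/use, achieved at P(X=1) = {p_star:.3f}")
# BSC: ~0.5310 (= 1 - H(0.1)); BEC: ~0.9000 (= 1 - f); Z: ~0.7629, with optimal P(X=1) < 0.5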

Symmetric channels

(Continuing the binary erasure channel calculation, with erasure probability α and π = Pr(X = 1):)

C = max_π (1 - α) H(π) + H(α) - H(α)      (7.14)
  = max_π (1 - α) H(π)                    (7.15)
  = 1 - α,                                (7.16)

where capacity is achieved by π = 1/2.

The expression for the capacity has some intuitive meaning: Since a proportion α of the bits are lost in the channel, we can recover (at most) a proportion 1 - α of the bits. Hence the capacity is at most 1 - α. It is not immediately obvious that it is possible to achieve this rate. This will follow from Shannon's second theorem.

In many practical channels, the sender receives some feedback from the receiver. If feedback is available for the binary erasure channel, it is very clear what to do: If a bit is lost, retransmit it until it gets through. Since the bits get through with probability 1 - α, the effective rate of transmission is 1 - α. In this way we are easily able to achieve a capacity of 1 - α with feedback. Later in the chapter we prove that the rate 1 - α is the best that can be achieved both with and without feedback. This is one of the consequences of the surprising fact that feedback does not increase the capacity of discrete memoryless channels.

7.2 SYMMETRIC CHANNELS

The capacity of the binary symmetric channel is C = 1 - H(p) bits per transmission, and the capacity of the binary erasure channel is C = 1 - α bits per transmission. Now consider the channel with transition matrix

          [ 0.3  0.2  0.5 ]
p(y|x) =  [ 0.5  0.3  0.2 ]              (7.17)
          [ 0.2  0.5  0.3 ]

Here the entry in the xth row and the yth column denotes the conditional probability p(y|x) that y is received when x is sent. In this channel, all the rows of the probability transition matrix are permutations of each other and so are the columns. Such a channel is said to be symmetric. Another example of a symmetric channel is one of the form

Y = X + Z  (mod c).                       (7.18)

...and C is achieved by a uniform distribution on the input. The transition matrix of the symmetric channel defined above is doubly stochastic. In the computation of the capacity, we used the facts that the rows were permutations of one another and that all the column sums were equal. Considering these properties, we can define a generalization of the concept of a symmetric channel as follows:

Definition. A channel is said to be symmetric if the rows of the channel transition matrix p(y|x) are permutations of each other and the columns are permutations of each other. A channel is said to be weakly symmetric if every row of the transition matrix p(·|x) is a permutation of every other row and all the column sums Σ_x p(y|x) are equal.

For example, the channel with transition matrix

p(y|x) =  [ 1/3  1/6  1/2 ]              (7.24)
          [ 1/3  1/2  1/6 ]

is weakly symmetric but not symmetric.

Capacity of weakly symmetric channels
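For a weakly symmetric channel, Cover & Thomas show that C = log2|Y| - H(r), where r is any row of the transition matrix, and that C is achieved by the uniform input distribution (this is the result the heading above refers to). Below is a minimal numerical check on the two matrices above; the helper function is an illustrative sketch, not from the text.

import numpy as np

def weakly_symmetric_capacity(W):
    """C = log2(|Y|) - H(row), in bits; valid when W[x, y] = p(y|x) is weakly symmetric."""
    row = W[0]
    H_row = -np.sum(row * np.log2(row))    # entropy of one row of the transition matrix
    return np.log2(W.shape[1]) - H_row

symmetric = np.array([[0.3, 0.2, 0.5],
                      [0.5, 0.3, 0.2],
                      [0.2, 0.5, 0.3]])    # Eq. (7.17)
weakly_sym = np.array([[1/3, 1/6, 1/2],
                       [1/3, 1/2, 1/6]])   # Eq. (7.24)

print(weakly_symmetric_capacity(symmetric))   # log2(3) - H(0.3, 0.2, 0.5) ≈ 0.0995 bits
print(weakly_symmetric_capacity(weakly_sym))  # log2(3) - H(1/3, 1/6, 1/2) ≈ 0.126 bits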

" #
n
!
n i
f (1 f )ni
Pe =
Properties of the channel
i capacity
i=m+1
C = max I(X; Y )
p(x)

C=

C=

C=

1
2

1
log2 (1 + |h|2 P/PN )
2
log2 (1 + |h|2 P/PN )

(
Eh 2 log2 (1 + |h| P/PN )
)
)

)
1
)
maxQ:T r(Q)=P 2 log2 IMR + HQH

'1

maxQ:T r(Q)=P EH

)
)(

log2 )IMR + HQH )


2

'1

Preview of the channel coding theorem

Y = HX + N
X = H1 U + N

Y = H(H1 U) + N
=U+N

What happens when we use the channel n times?

C=

1
log2 (1 + P/N)
2

An average input sequence corresponds to about 2^{nH(Y|X)} typical output sequences.
There are a total of 2^{nH(Y)} typical output sequences.
For nearly error-free transmission, we select a number of input sequences whose corresponding sets of output sequences hardly overlap.
The maximum number of distinct sets of output sequences is 2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)}.

[B. Smida (ES250), Channel Capacity, Fall 2008-09]

Let's make this rigorous!
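As a concrete (hypothetical) instance of this counting argument, take a BSC with crossover probability f = 0.1, uniform inputs, and block length n = 1000:

number of typical output sequences         ≈ 2^{nH(Y)} = 2^{1000}
typical outputs produced by one codeword   ≈ 2^{nH(Y|X)} = 2^{nH(0.1)} ≈ 2^{469}
maximum number of non-confusable codewords ≈ 2^{n(H(Y) - H(Y|X))} = 2^{n(1 - H(0.1))} ≈ 2^{531}

i.e. about 0.531 bits per channel use, which is exactly the BSC capacity 1 - H(f).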

Definitions

[Block diagram: message → source → encoder → channel → decoder → destination → estimate of the message.]

What's our goal?
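For reference, the standard definitions that these slides build on (as in Cover & Thomas, Ch. 7):

An (M, n) code for the channel (X, p(y|x), Y) consists of an index set {1, ..., M}, an encoding function X^n : {1, ..., M} → X^n giving codewords x^n(1), ..., x^n(M), and a decoding function g : Y^n → {1, ..., M}.
Conditional probability of error: λ_i = Pr( g(Y^n) ≠ i | X^n = x^n(i) ); maximal error λ^(n) = max_i λ_i; average error P_e^(n) = (1/M) Σ_i λ_i.
Rate: R = (log2 M) / n bits per transmission.
A rate R is achievable if there exists a sequence of (2^{nR}, n) codes with λ^(n) → 0 as n → ∞; the (operational) capacity is the supremum of all achievable rates.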

" #
n
!
n ofi capacity ni
Mathematical
description
f (1 f )
Pe =
i
i=m+1
Information channel capacity:

C = max I(X; Y )
p(x)

1
C = log2 (1 + |h|2 P/PN )
2
Highest rate (bits/channel use) that can
1communicate at2 reliably
2 log2 (1 + |h| P/PN )
C = theorem' says: information capacity = operational
( capacity
Channel coding

1
2
Eh 2 log2 (1 + |h| P/PN )
)
)

1
maxQ:T r(Q)=P 2 log2 )IMR + HQH )
Operational channel capacity:

Recall the definition of typical sequences....

Let's make this 2-D!

Jointly typical sequences

Joint Asymptotic Equipartition Theorem (AEP)

Joint typicality images

Channel Coding Theorem

Proof of Achievability

Proof of Converse

Jointly Typical Diagram

There are about 2^{nH(X)} typical X sequences in all.
Each typical Y is jointly typical with about 2^{nH(X|Y)} of those typical X sequences.

Channel coding theorem

Key ideas behind channel coding theorem


Allow for arbitrarily small but nonzero probability of error
Use channel many times in succession: law of large numbers!
Probability of error calculated over a random choice of codebooks
Joint typicality decoders used for simplicity of proof
NOT constructive! Does NOT tell us how to code to achieve capacity!

Key ideas behind the channel coding theorem

Random codes

[Block diagram: message → encoder → channel → decoder → estimate of the message.]

Transmission

Probability of error

Random codes?

Analogy....

[Figure: the set A_Y of typical output sequences has about 2^{NH(Y)} elements; each typical input maps to about 2^{NH(Y|X)} typical outputs, and each typical output is jointly typical with about 2^{NH(X|Y)} typical inputs.]

10.3 Proof of the noisy-channel coding theorem

Analogy
Imagine that we wish to prove that there is a baby in a class of one hundred babies who weighs less than 10 kg. Individual babies are difficult to catch and weigh. Shannon's method of solving the task is to scoop up all the babies and weigh them all at once on a big weighing machine. If we find that their average weight is smaller than 10 kg, there must exist at least one baby who weighs less than 10 kg; indeed, there must be many! Shannon's method isn't guaranteed to reveal the existence of an underweight child, since it relies on there being a tiny number of elephants in the class. But if we use his method and get a total weight smaller than 1000 kg, then our task is solved.

Figure 10.3. Shannon's method for proving one baby weighs less than 10 kg.

From skinny children to fantastic codes

We wish to show that there exists a code and a decoder having small probability of error. Evaluating the probability of error of any particular coding and decoding system is not easy. Shannon's innovation was this: instead of constructing a good coding and decoding system and evaluating its error probability, Shannon calculated the average probability of block error of all codes, and proved that this average is small. There must then exist individual codes that have small probability of block error.

[MacKay textbook, pg. 164]

Random coding and typical-set decoding

Consider the following encoding/decoding system, whose rate is R'.

1. We fix P(x) and generate the S = 2^{NR'} codewords of a (N, NR') = ...
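Below is a minimal simulation sketch of this random-coding idea over a BSC. It is an illustration rather than MacKay's construction: the parameters are arbitrary, and minimum-Hamming-distance decoding (the ML rule for a BSC with f < 1/2) stands in for typical-set decoding.

import numpy as np

rng = np.random.default_rng(0)

def random_code_error_rate(N, R, f, trials=500):
    """Average block-error rate of one randomly drawn code of rate ~R on a BSC(f)."""
    S = 2 ** max(1, round(N * R))                 # number of codewords, S ≈ 2^{NR}
    code = rng.integers(0, 2, size=(S, N), dtype=np.uint8)   # i.i.d. Bernoulli(1/2) codewords
    errors = 0
    for _ in range(trials):
        s = rng.integers(S)                       # message chosen uniformly at random
        noise = (rng.random(N) < f).astype(np.uint8)          # BSC flips each bit w.p. f
        y = code[s] ^ noise
        d = np.count_nonzero(code ^ y, axis=1)    # Hamming distance to every codeword
        errors += int(np.argmin(d) != s)          # decode to the nearest codeword
    return errors / trials

f = 0.1                                           # BSC(0.1): capacity 1 - H(0.1) ≈ 0.53 bits/use
print(random_code_error_rate(N=20, R=0.25, f=f))  # R < C: most blocks decode correctly
print(random_code_error_rate(N=20, R=0.80, f=f))  # R > C: almost every block is decoded wrongly
# The theorem says the first error rate can be driven to 0 by increasing N at fixed R < C,
# while for R > C no code (random or not) can do the same.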

Channel coding theorem

Converse to the channel coding theorem


Based on Fano's inequality:
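As a reminder of the standard argument (following Cover & Thomas), for a (2^{nR}, n) code with the message W uniform on {1, ..., 2^{nR}} and average error probability P_e^(n):

H(W | Ŵ) ≤ 1 + P_e^(n) · nR                          (Fano's inequality)

nR = H(W) = H(W | Ŵ) + I(W; Ŵ)
   ≤ 1 + P_e^(n) · nR + I(X^n; Y^n)                  (data-processing inequality)
   ≤ 1 + P_e^(n) · nR + nC                           (memoryless channel, used without feedback)

Dividing by n gives R ≤ C + P_e^(n) R + 1/n, so P_e^(n) → 0 forces R ≤ C.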

Converse to the channel coding theorem


Need one more result before we prove the converse:

What does this mean?

Now let's prove the channel coding converse

Converse to the channel coding theorem

Weak versus strong converses


Weak converse:

Strong converse:

Channel capacity is a sharp dividing point: below it, the probability of error can be made to go to zero exponentially fast, and above it, the probability of error goes to one exponentially fast.

Practical coding schemes

[Block diagram: Source → Source coder → Channel coder → Channel (noise added) → Channel decoder → Source decoder → Destination.]

Example: channel coding

[With permission from David J.C. MacKay]

Rate?
R = # source bits / # coded bits

Use a repetition code of rate R = 1/n:  0 → 000...0,  1 → 111...1

Decoder? Majority vote

Probability of error? [n = 2m+1]

Pe = Σ_{i=m+1}^{n} (n choose i) f^i (1 - f)^{n-i}

Need n → ∞ for reliable communication!

[With permission from David J.C. MacKay]
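A quick numerical check of the error-probability formula above; the helper function and the value f = 0.1 are illustrative assumptions.

from math import comb

def repetition_error_prob(n, f):
    """P(majority-vote decoding fails) for a length-n repetition code on a BSC(f), n = 2m+1."""
    m = (n - 1) // 2
    return sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(m + 1, n + 1))

f = 0.1
for n in (1, 3, 5, 11, 61):
    print(n, repetition_error_prob(n, f))
# Pe -> 0 as n grows (~2.8e-2 at n=3, ~8.6e-3 at n=5), but the rate R = 1/n -> 0 as well.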

Channel capacity

Is capacity R = 0?

No! Just need better coding!

Now, we're more interested in determining capacity than in finding codes that achieve it.

Benchmarks

Practical coding schemes

Linear block codes

Properties of linear block codes

Properties of linear block codes

Examples

Hamming codes

We can extend the idea of parity checks to allow for more than one parity check bit and to allow the parity checks to depend on various subsets of the information bits. The Hamming code that we describe below is an example of a parity check code. We describe it using some simple ideas from linear algebra.

To illustrate the principles of Hamming codes, we consider a binary code of block length 7. All operations will be done modulo 2. Consider the set of all nonzero binary vectors of length 3. Arrange them in columns to form a matrix:

     [ 0 0 0 1 1 1 1 ]
H =  [ 0 1 1 0 0 1 1 ]                   (7.117)
     [ 1 0 1 0 1 0 1 ]

# of codewords?

Consider the set of vectors of length 7 in the null space of H (the vectors which when multiplied by H give 000). From the theory of linear spaces, since H has rank 3, we expect the null space of H to have dimension 4. These 2^4 codewords are

0000000  0100101  1000011  1100110
0001111  0101010  1001100  1101001
0010110  0110011  1010101  1110000
0011001  0111100  1011010  1111111

Since the set of codewords is the null space of a matrix, it is linear in the sense that the sum of any two codewords is also a codeword. The set of codewords therefore forms a linear subspace of dimension 4 in the vector space of dimension 7. The first k bits of the codeword represent the message, and the last n - k bits are parity check bits. Such a code is called a systematic code. The code is often identified by its block length n, the number of information bits k and the minimum distance d. For example, the above code is called a (7,4,3) Hamming code (i.e., n = 7, k = 4, and d = 3).
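A minimal sketch of this construction in code (an illustration, not from the text): enumerate the null space of H by brute force, then correct a single bit error by syndrome decoding, using the fact that the syndrome of a single error equals the corresponding column of H.

import numpy as np
from itertools import product

# Parity check matrix: columns are all nonzero binary vectors of length 3, Eq. (7.117)
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

# The code is the null space of H over GF(2): brute-force all 2^7 binary vectors of length 7.
codewords = [np.array(v) for v in product([0, 1], repeat=7)
             if not (H @ np.array(v) % 2).any()]
print(len(codewords))                     # 16 = 2^4 codewords, as expected

def correct_single_error(y):
    """Syndrome decoding: a single bit error in position j gives syndrome H[:, j]."""
    s = H @ y % 2
    if s.any():
        j = next(j for j in range(7) if np.array_equal(H[:, j], s))   # locate the error
        y = y.copy()
        y[j] ^= 1                         # flip the erroneous bit
    return y

c = codewords[5]                          # any codeword
y = c.copy()
y[2] ^= 1                                 # introduce one bit error
print(np.array_equal(correct_single_error(y), c))   # True: the single error is corrected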


An easy way to see how Hamming codes work is by means of a Venn
diagram. Consider the following Venn diagram with three circles and with
four intersection regions as shown in Figure 7.10. To send the information
sequence 1101, we place the 4 information bits in the four intersection
regions as shown in the figure. We then place a parity bit in each of the
three remaining regions so that the parity of each circle is even (i.e., there
are an even number of 1s in each circle). Thus, the parity bits are as
shown in Figure 7.11.
Now assume that one of the bits is changed; for example one of the
information bits is changed from 1 to 0 as shown in Figure 7.12. Then
the parity constraints are violated for two of the circles (highlighted in the
figure), and it is not hard to see that given these violations, the only single
bit error that could have caused it is at the intersection of the two circles
(i.e., the bit that was changed). Similarly working through the other error
cases, it is not hard to see that this code can detect and correct any single
bit error in the received codeword.
We can easily generalize this procedure to construct larger matrices H. In general, if we use l rows in H, the code that we obtain will have block length n = 2^l - 1, k = 2^l - l - 1, and minimum distance 3. All these codes are called Hamming codes and can correct one error.

A curiosity: Venn diagrams + Hamming codes

Hamming codes can correct up to 1 error.

FIGURE 7.10. Venn diagram with information bits.
FIGURE 7.11. Venn diagram with information bits and parity bits with even parity for each circle.
FIGURE 7.12. Venn diagram with one of the information bits changed.

Hamming codes are the simplest examples of linear parity check codes.
They demonstrate the principle that underlies the construction of other
linear codes. But with large block lengths it is likely that there will be
more than one error in the block. In the early 1950s, Reed and Solomon


...the block as a 1; otherwise, we decode it as 0. An error occurs if and only if more than three of the bits are changed. By using longer repetition codes, we can achieve an arbitrarily low probability of error. But the rate of the code also goes to zero with block length, so even though the code is simple, it is really not a very useful code.

Instead of simply repeating the bits, we can combine the bits in some intelligent fashion so that each extra bit checks whether there is an error in some subset of the information bits. A simple example of this is a parity check code. Starting with a block of n - 1 information bits, we choose the nth bit so that the parity of the entire block is 0 (the number of 1s in the block is even). Then if there is an odd number of errors during the transmission, the receiver will notice that the parity has changed and detect the error. This is the simplest example of an error-detecting code. The code does not detect an even number of errors and does not give any information about how to correct the errors that occur.

Achieving capacity

Linear block codes: not good enough...
Convolutional codes: not good enough...
Turbo codes: in 1993, Berrou et al. considered two interleaved convolutional codes with parallel cooperative decoders. Achieved close to capacity!
LDPC codes: Low-Density Parity-Check codes, introduced by Gallager in his 1963 thesis, later kept alive by Michael Tanner (UIC's provost!) in the 80s and then rediscovered in the 90s, when an iterative message-passing decoding algorithm was formulated. They also achieve close to capacity!

An excellent survey (linked to on the course website):
Forney, G.D. and Costello, D.J., "Channel Coding: The Road to Channel Capacity," Proceedings of the IEEE, vol. 95, no. 6, pp. 1150-1177, June 2007. ISSN: 0018-9219. DOI: 10.1109/JPROC.2007.895188.

Feedback capacity

Channel WITHOUT feedback:
[Block diagram: message → encoder → channel → decoder → estimate of the message.]

Channel WITH feedback:
[Block diagram: message → encoder → channel → decoder → estimate of the message, with the channel output fed back to the encoder.]
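The punchline, already stated in the erasure-channel discussion above: feedback does not increase the capacity of a discrete memoryless channel,

C_FB = max_{p(x)} I(X; Y) = C.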

Source-channel separation
When are we allowed to design the source and channel coder separately AND
remain optimal from an end-to-end perspective?
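A compact statement of the source-channel separation theorem (as in Cover & Thomas): a stationary ergodic source V can be transmitted over a discrete memoryless channel with error probability tending to zero if H(V) < C, and cannot if H(V) > C. So nothing is lost by first compressing the source to about H(V) bits per symbol and then protecting those bits with a capacity-approaching channel code.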
[Block diagrams: joint design (Source → Encoder → Channel, with noise, → Decoder → Destination) versus separate design (Source → Source coder → Channel coder → Channel, with noise, → Channel decoder → Source decoder → Destination).]

Source-channel separation

Source-channel separation: achievability

[Block diagram: Source → Encoder → Channel → Decoder → Destination.]

Source-channel separation: converse

Source-channel separation

[Block diagrams as above: joint source-channel coding versus separate source and channel coding.]
