Information Theory: Principles and Applications
Tiago T. V. Vinhoza
March 26, 2010
1. Exploring the inequalities a little bit more
2. Source Coding
   Fixed Length Codes
   Variable Length Codes
3. Asymptotic Equipartition Property
Exploring the inequalities a little bit more
Jensen's Inequality
If f(·) is a convex function and X is a random variable, then
$$E[f(X)] \geq f(E[X])$$
Let us now show that relative entropy and mutual information are nonnegative, along with some other interesting properties of the information measures.
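As a quick sanity check (an illustration added here, not from the slides), the sketch below compares E[f(X)] with f(E[X]) for the convex function f(x) = x² and a small discrete pmf; both the distribution and the function are chosen arbitrarily.

```python
import numpy as np

# Hypothetical discrete random variable: values and probabilities chosen for illustration.
values = np.array([-1.0, 0.0, 2.0, 5.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])

f = lambda x: x ** 2  # a convex function

E_fX = np.sum(probs * f(values))   # E[f(X)]
f_EX = f(np.sum(probs * values))   # f(E[X])

# Jensen's inequality for convex f: E[f(X)] >= f(E[X])
print(f"E[f(X)] = {E_fX:.4f}, f(E[X]) = {f_EX:.4f}")
assert E_fX >= f_EX
```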
Exploring the inequalities a little bit more
Log-Sum Inequality
For n positive numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$:
$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \geq \left(\sum_{i=1}^{n} a_i\right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$$
with equality if and only if $a_i/b_i = c$ for all $i$.
Let us now prove the convexity of the relative entropy and the
concavity of the entropy.
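A small numerical check of the log-sum inequality (an illustration added here, not part of the original slides), using arbitrary positive vectors a and b:

```python
import numpy as np

# Arbitrary positive vectors, chosen only to illustrate the inequality.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

lhs = np.sum(a * np.log2(a / b))                  # sum_i a_i log(a_i / b_i)
rhs = np.sum(a) * np.log2(np.sum(a) / np.sum(b))  # (sum a_i) log(sum a_i / sum b_i)

print(f"LHS = {lhs:.4f}, RHS = {rhs:.4f}")
assert lhs >= rhs  # log-sum inequality
```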
Exploring the inequalities a little bit more
Fano's Inequality
Suppose we know a random variable Y and we wish to guess the value of a correlated random variable X.
Fano's inequality relates the probability of error in guessing X from Y to the conditional entropy H(X|Y).
Let $\hat{X} = g(Y)$. If $P_e = P(\hat{X} \neq X)$, then
$$H(P_e) + P_e \log(|X| - 1) \geq H(X|Y)$$
where $H(P_e)$ is the binary entropy function evaluated at $P_e$.
Source Coding
From the previous lecture: "A source encoder converts the sequence of symbols from the source into a sequence of bits".
Types of source:
Discrete: keyboard characters, bits, . . .
Continuous (time and amplitude): speech
Continuous amplitude, discrete time: sampled signal before quantization
Source Coding
Source Coding: Continuous Sources
For continuous-amplitude sources, there is usually no way to map the source values to a bit sequence such that the map is uniquely decodable.
For example: the set of real numbers between 0 and 1 requires infinitely many binary digits for exact specification.
Quantization is therefore necessary, and it introduces distortion.
Source encoding: trade-off between the bit rate and the level of distortion.
Source Coding
Source Coding: Discrete Memoryless Sources
A discrete memoryless source (DMS) is defined by the following properties:
The source output is an unending sequence $X_1, X_2, X_3, \ldots$ of randomly selected letters from X.
Each source output is selected from X using a common probability measure.
Each source output $X_i$ is statistically independent of the other source outputs $X_j$, $j \neq i$.
Source Coding
Source Coding: Discrete Random Variables
A source code C for a discrete random variable X is a mapping from X, the range of X, to $D^*$, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x and let l(x) denote the length of C(x).
Source Coding Fixed Length Codes
Fixed Length Source Codes
Convert each source letter individually into a fixed-length block of L bits.
There are $2^L$ different combinations.
If the number of letters in the source alphabet X is less than or equal to $2^L$, then a different binary L-tuple may be assigned to each source symbol.
Each source symbol can then be uniquely recovered from its binary block, and the code is uniquely decodable.
Source Coding Fixed Length Codes
Fixed Length Source Codes
Requires $L = \lceil \log |X| \rceil$ bits to encode each source letter.
Hence $\log |X| \leq L < \log |X| + 1$.
For blocks of n symbols, the n-tuple source alphabet is the n-fold Cartesian product $X^n = X \times X \times \ldots \times X$, so $|X^n| = |X|^n$.
Each source n-tuple can be coded into $L = \lceil n \log |X| \rceil$ bits.
Source Coding Fixed Length Codes
Fixed Length Source Codes
Rate $\bar{L}$ of coded bits per source symbol:
$$\bar{L} = \frac{L}{n}$$
Bounds:
$$\log |X| \leq \bar{L} < \log |X| + \frac{1}{n}$$
Letting n become sufficiently large, the average number of coded bits required per source symbol can be made arbitrarily close to $\log |X|$.
This method is nonprobabilistic; it does not take into account whether some symbols occur more frequently than others.
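The following sketch (added for illustration) computes the fixed-length rate L̄ = ⌈n log|X|⌉ / n for a hypothetical 5-letter alphabet and shows it approaching log|X| as the block length n grows:

```python
import math

alphabet_size = 5  # hypothetical |X|
log_X = math.log2(alphabet_size)

for n in (1, 2, 5, 10, 100):
    L = math.ceil(n * log_X)  # bits per block of n symbols
    rate = L / n              # coded bits per source symbol
    print(f"n = {n:3d}: L = {L:4d} bits, rate = {rate:.4f} (log|X| = {log_X:.4f})")
```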
Source Coding Variable Length Codes
Variable Length Source Codes
Intuition: Allocate the shortest codewords to the most probable
outcomes and the longer ones to the least likely outcomes.
Example: Morse code.
Source Coding Variable Length Codes
Variable Length Source Codes
Codewords of a variable-length source code: a continuing sequence of
bits, with no demarcations of codeword boundaries.
The source decoder, given an original starting point, must determine
where the codeword boundaries are (parsing).
Source Coding Variable Length Codes
Classes of Codes
Non-singular code:
$$x_i \neq x_j \Rightarrow C(x_i) \neq C(x_j)$$
Unambiguous for a single symbol.
Example of a non-singular code, for a binary-valued random variable X: $C(x_1) = 0$, $C(x_2) = 1$.
Example of a singular code, for a binary-valued random variable X: $C(x_1) = 0$, $C(x_2) = 0$.
Source Coding Variable Length Codes
Classes of Codes
Definition: Extension of a code: $C: X^n \to D^*$, with
$$C(x_1 x_2 \ldots x_n) = C(x_1) C(x_2) \ldots C(x_n)$$
Example: $C(x_1) = 00$, $C(x_2) = 11$, so $C(x_1 x_2) = 0011$.
A code is uniquely decodable if its extension is non-singular.
Example of a uniquely decodable code: $C(x_1) = 0$, $C(x_2) = 1$.
Example of a non-uniquely decodable code: $C(x_1) = 0$, $C(x_2) = 1$, $C(x_3) = 10$.
Example: $C(x_2 x_1 x_3) = C(x_2 x_1 x_2 x_1) = 1010$.
Source Coding Variable Length Codes
Classes of Codes
Prefix-free Codes: no codeword is a prefix of any other codeword.
They are also called instantaneous codes because each source symbol can be decoded with essentially no delay: as soon as the entire codeword is received at the decoder, it can be recognized as a codeword and decoded without waiting for additional bits.
It is very easy to check whether a code is prefix-free, and therefore uniquely decodable.
Codewords correspond to leaves of the code tree.
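A simple prefix-free check (an illustrative helper, not from the slides): compare every pair of codewords and reject if one is a prefix of another.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of any other codeword."""
    words = list(codewords)
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(u):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))  # True: a prefix-free code
print(is_prefix_free(["0", "1", "10"]))           # False: '1' is a prefix of '10'
```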
Source Coding Variable Length Codes
Classes of Codes
Classes of codes form a nested hierarchy (code-classes diagram): all codes ⊃ non-singular codes ⊃ uniquely decodable codes ⊃ prefix-free codes.
Source Coding Variable Length Codes
Kraft Inequality
It tells us about the possibility of constructing a prefix-free code for a given source with alphabet X with a given set of codeword lengths $l(x_i)$, $x_i \in X$:
$$\sum_{x_i \in X} D^{-l(x_i)} \leq 1$$
For the binary case, D = 2, there exists a full prefix-free code with codeword lengths {1, 2, 2}.
On the other hand, a prefix-free code with codeword lengths {1, 1, 2} does not exist in the binary case.
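The check below (an added illustration) evaluates the Kraft sum for the two length sets mentioned above with D = 2:

```python
def kraft_sum(lengths, D=2):
    """Sum of D^{-l} over the codeword lengths l."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 2]))  # 1.0  -> satisfies the Kraft inequality (a full prefix-free code exists)
print(kraft_sum([1, 1, 2]))  # 1.25 -> violates it, so no binary prefix-free code has these lengths
```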
Source Coding Variable Length Codes
Minimum $\bar{L}$ for prefix-free codes
Kraft Inequality: determines which sets of codeword lengths are possible for prefix-free codes.
What set of codewords can be used to minimize the expected length $\bar{L}$ of a prefix-free code?
Constrained optimization problem:
$$\min \bar{L} \quad \text{s.t. the Kraft inequality holds}$$
Source Coding Variable Length Codes
Minimum $\bar{L}$ for prefix-free codes
Entropy Bounds:
$$H(X) \leq \bar{L}_{\min} < H(X) + 1$$
Source Coding Variable Length Codes
Huffman Codes
Result of an Information Theory class project.
Huffman ignored the Kraft inequality and focused on the code tree to establish properties that an optimum prefix-free code should have.
Source Coding Variable Length Codes
Binary Huffman Codes
Optimum codes have the property that if $p_i > p_j$, then $l(x_i) \leq l(x_j)$.
The code tree is full.
The longest codeword has a sibling that is another longest codeword (siblings differ only in the final bit).
Let X be a random symbol with a pmf satisfying $p_1 \geq p_2 \geq \ldots \geq p_M$.
There is an optimal prefix-free code for X in which the codewords for symbols M-1 and M are siblings and have maximal length within the code.
Source Coding Variable Length Codes
Huffman Codes: An example
Probability distribution: (0.4, 0.2, 0.15, 0.15, 0.1)
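A compact Huffman construction for this distribution (a sketch added for illustration; the tie-breaking and exact codewords may differ from the tree drawn in the lecture, but the codeword lengths and expected length are what matter). It also compares the expected length with the entropy bound H(X) ≤ L̄ < H(X) + 1 from the previous slides.

```python
import heapq
import math

probs = [0.4, 0.2, 0.15, 0.15, 0.1]

# Each heap entry: (probability, tie-breaker, {symbol_index: codeword_so_far})
heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
heapq.heapify(heap)
counter = len(probs)

while len(heap) > 1:
    p0, _, c0 = heapq.heappop(heap)  # two least probable nodes
    p1, _, c1 = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in c0.items()}
    merged.update({s: "1" + w for s, w in c1.items()})
    heapq.heappush(heap, (p0 + p1, counter, merged))
    counter += 1

code = heap[0][2]
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * math.log2(p) for p in probs)

print(code)                                            # e.g. codeword lengths (1, 3, 3, 3, 3)
print(f"L_bar = {avg_len:.2f}, H(X) = {entropy:.3f}")  # about 2.20 vs about 2.15
assert entropy <= avg_len < entropy + 1                # entropy bound on the optimal code
```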
Asymptotic Equipartition Property
In Information Theory, the analog of the law of large numbers is the Asymptotic Equipartition Property (AEP).
The AEP says that, given a very long string of n independent and identically distributed discrete random variables $X_1, \ldots, X_n$, there exists a typical set of sample strings $(x_1, \ldots, x_n)$ whose aggregate probability is almost 1.
There are roughly $2^{nH(X)}$ typical strings of length n, and each has a probability roughly equal to $2^{-nH(X)}$.
Almost all events are almost equally surprising.
First, let's review the weak law of large numbers.
Asymptotic Equipartition Property
Weak Law of Large Numbers.
Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables.
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{(sample average)}$$
Chebyshev inequality: Let X be a random variable with mean $m_X$ and variance $\sigma_X^2$; then $P(|X - m_X| \geq \epsilon) \leq \sigma_X^2 / \epsilon^2$.
Applying this inequality to the sample mean, we have
$$P(|\bar{X} - m_X| \geq \epsilon) \leq \sigma_X^2 / (n \epsilon^2)$$
Remember that $E[\bar{X}] = m_X$ and $\text{var}(\bar{X}) = \sigma_X^2 / n$.
Asymptotic Equipartition Property
Let $X_1, \ldots, X_n$ be a sequence of discrete independent and identically distributed random variables over X.
Note that $w(x) = -\log p_X(x)$ is a real-valued function of $x \in X$.
$W(X_i)$ is a random variable that takes the value $w(x)$ when $X_i = x$.
$W(X_1), \ldots, W(X_n)$ is then a sequence of i.i.d. random variables.
$$E[W(X_i)] = -\sum_{x \in X} p_X(x) \log p_X(x) = H(X)$$
For independent random variables we have
$$w(x_1) + w(x_2) = -\log p_X(x_1) - \log p_X(x_2) = -\log p_{X_1 X_2}(x_1, x_2)$$
Asymptotic Equipartition Property
For a general n:
$$\sum_{i=1}^{n} w(x_i) = -\sum_{i=1}^{n} \log p_X(x_i) = -\log p_{X^n}(x^n),$$
where $X^n = [X_1, \ldots, X_n]$ and $x^n = [x_1, \ldots, x_n]$.
Let's take the sample average of the random variables $W(X_i)$:
$$\bar{W} = \frac{1}{n} \sum_{i=1}^{n} W(X_i) = \frac{-\log p_{X^n}(X^n)}{n}$$
Using Chebyshev's inequality we get
$$P\left( \left| \frac{-\log p_{X^n}(x^n)}{n} - H(X) \right| \geq \epsilon \right) \leq \sigma_W^2 / (n \epsilon^2)$$
Asymptotic Equipartition Property
The typical set $A_\epsilon^{(n)}$ with respect to $p_X(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n) \in X^n$ with the following property:
$$A_\epsilon^{(n)} = \left\{ x^n : \left| \frac{-\log p_{X^n}(x^n)}{n} - H(X) \right| \leq \epsilon \right\}$$
Which can be written as:
$$n(H(X) + \epsilon) \geq -\log p_{X^n}(x^n) \geq n(H(X) - \epsilon)$$
$$2^{-n(H(X)+\epsilon)} \leq p_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$
Asymptotic Equipartition Property
Properties of the typical set:
$$P(X^n \in A_\epsilon^{(n)}) > 1 - \frac{\sigma_W^2}{n \epsilon^2} \quad \text{for n sufficiently large}$$
$$P(X^n \in A_\epsilon^{(n)}) = P\left( \left| \frac{-\log p_{X^n}(X^n)}{n} - H(X) \right| \leq \epsilon \right)$$
$$P(X^n \in A_\epsilon^{(n)}) \geq 1 - \frac{\sigma_W^2}{n \epsilon^2}$$
Asymptotic Equipartition Property
Properties of the typical set:
$$|A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$
$$1 = \sum_{x^n \in X^n} p_{X^n}(x^n) \geq \sum_{x^n \in A_\epsilon^{(n)}} p_{X^n}(x^n) \geq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = 2^{-n(H(X)+\epsilon)} \, |A_\epsilon^{(n)}|$$
Asymptotic Equipartition Property
Properties of the typical set:
$$|A_\epsilon^{(n)}| \geq (1 - \delta) \, 2^{n(H(X)-\epsilon)}, \quad \text{where } \delta = \frac{\sigma_W^2}{n \epsilon^2}$$
$$(1 - \delta) \leq P(X^n \in A_\epsilon^{(n)}) \leq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)} = 2^{-n(H(X)-\epsilon)} \, |A_\epsilon^{(n)}|$$
Asymptotic Equipartition Property
Asymptotic Equipartition Property: Summary
Definition of the typical set:
$$2^{-n(H(X)+\epsilon)} \leq p_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$
Size of the typical set:
$$(1 - \delta) \, 2^{n(H(X)-\epsilon)} \leq |A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$
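For small n these bounds can be checked exhaustively (an added illustration with an arbitrarily chosen source): enumerate all binary strings of length n from a Bernoulli(0.3) source, build A_ε^(n) directly from the definition, and compare its size and probability with the bounds above.

```python
import itertools
import math

p, eps, n = 0.3, 0.2, 12  # small example so all 2^n strings can be enumerated
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

typical, prob_typical = [], 0.0
for x in itertools.product((0, 1), repeat=n):
    ones = sum(x)
    prob = (p ** ones) * ((1 - p) ** (n - ones))
    if abs(-math.log2(prob) / n - H) <= eps:  # definition of A_eps^(n)
        typical.append(x)
        prob_typical += prob

size = len(typical)
print(f"|A| = {size}, P(A) = {prob_typical:.3f}")
print(f"upper bound 2^(n(H+eps)) = {2 ** (n * (H + eps)):.1f}")
assert size <= 2 ** (n * (H + eps))  # the size upper bound always holds
```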
Asymptotic Equipartition Property
Source coding in the light of the AEP
A source coder operating on strings of n source symbols need only provide a codeword for each string $x^n$ in the typical set $A_\epsilon^{(n)}$.
That will be shown in the next class.