Information Theory: Principles and Applications
Tiago T. V. Vinhoza
March 26, 2010
Outline

1. Exploring the inequalities a little bit more
2. Source Coding
   - Fixed Length Codes
   - Variable Length Codes
3. Asymptotic Equipartition Property
Exploring the inequalities a little bit more

Jensen's Inequality

If $f(\cdot)$ is a convex function and $X$ is a random variable, then
$$E[f(X)] \geq f(E[X])$$
Let us now show that relative entropy and mutual information are nonnegative, and establish some other interesting properties of the information measures.
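As a quick numerical sketch (my illustration, not from the lecture), Jensen's inequality can be checked by simulation with the convex function $f(x) = x^2$; the Gaussian source and sample count are arbitrary choices:

```python
import random

# Numerical sketch of Jensen's inequality for the convex function
# f(x) = x**2: for any random variable X, E[f(X)] >= f(E[X]).
random.seed(0)
samples = [random.gauss(1.0, 2.0) for _ in range(100_000)]

mean = sum(samples) / len(samples)                           # estimate of E[X]
mean_of_square = sum(x * x for x in samples) / len(samples)  # estimate of E[f(X)]

print(f"E[f(X)] ~ {mean_of_square:.4f}")  # close to 1 + 4 = 5
print(f"f(E[X]) ~ {mean * mean:.4f}")     # close to 1
assert mean_of_square >= mean * mean      # Jensen's inequality holds
```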
Exploring the inequalities a little bit more

Log-Sum Inequality

For $n$ positive numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$:
$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \geq \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$$
with equality if and only if $a_i/b_i = c$ for all $i$.
Let us now prove the convexity of the relative entropy and the
concavity of the entropy.
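A numeric spot-check of the log-sum inequality (my illustration; the sample values are arbitrary):

```python
import math

# Spot-check of the log-sum inequality for arbitrary positive numbers.
a = [0.5, 1.2, 2.0]
b = [1.0, 0.7, 3.0]

lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * math.log2(sum(a) / sum(b))
print(f"LHS = {lhs:.4f}, RHS = {rhs:.4f}")
assert lhs >= rhs  # log-sum inequality

# Equality case: a_i / b_i equal to a constant c for all i.
c = 2.0
a_eq = [c * bi for bi in b]
lhs_eq = sum(ai * math.log2(ai / bi) for ai, bi in zip(a_eq, b))
rhs_eq = sum(a_eq) * math.log2(sum(a_eq) / sum(b))
assert abs(lhs_eq - rhs_eq) < 1e-9
```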
Exploring the inequalities a little bit more

Fano's Inequality

Suppose we know a random variable $Y$ and we wish to guess the value of a correlated random variable $X$.

Fano's inequality relates the probability of error in guessing $X$ from $Y$ to the conditional entropy $H(X|Y)$.

Let $\hat{X} = g(Y)$. If $P_e = P(\hat{X} \neq X)$, then
$$H(P_e) + P_e \log(|\mathcal{X}| - 1) \geq H(X|Y)$$
where $H(P_e)$ is the binary entropy function evaluated at $P_e$.
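A small sketch evaluating the bound (my illustration; `binary_entropy` and `fano_bound` are hypothetical helper names, and the numbers are chosen for the example):

```python
import math

def binary_entropy(p):
    """Binary entropy function H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_bound(p_e, alphabet_size):
    """Fano upper bound on H(X|Y): H(P_e) + P_e * log2(|X| - 1)."""
    return binary_entropy(p_e) + p_e * math.log2(alphabet_size - 1)

# Example: |X| = 4. Any estimator achieving P_e = 0.1 implies
# H(X|Y) <= H(0.1) + 0.1 * log2(3).
print(f"H(X|Y) <= {fano_bound(0.1, 4):.4f} bits")
```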
Source Coding
From the previous lecture: "A source encoder converts the sequence
of symbols from the source into a sequence of bits".
Types of source:
- Discrete: keyboard characters, bits, ...
- Continuous (in time and amplitude): speech
- Continuous in amplitude, discrete in time: sampled signal before quantization
Source Coding: Continuous Sources
For continuous-amplitude sources, there is usually no way to map the source values to a bit sequence such that the map is uniquely decodable.

For example: the set of real numbers between 0 and 1 requires infinitely many binary digits for exact specification.

Quantization is therefore necessary, and it introduces distortion.

Source encoding: a trade-off between the bit rate and the level of distortion.
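A minimal sketch of this trade-off (my illustration; `uniform_quantize` is a hypothetical helper, and the midpoint-reconstruction rule is an assumption): a uniform scalar quantizer spends $R$ bits per sample, and the error shrinks as $R$ grows:

```python
def uniform_quantize(x, rate_bits):
    """Uniformly quantize x in [0, 1) to one of 2**rate_bits levels,
    returning the reconstruction value (midpoint of the cell)."""
    levels = 2 ** rate_bits
    index = min(int(x * levels), levels - 1)  # cell index: the bits sent
    return (index + 0.5) / levels             # reconstruction value

# More bits per sample -> smaller quantization error (distortion).
x = 0.7316
for r in (1, 2, 4, 8):
    x_hat = uniform_quantize(x, r)
    print(f"R = {r} bits: x_hat = {x_hat:.4f}, |error| = {abs(x - x_hat):.5f}")
```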
Source Coding: Discrete Memoryless Sources

A discrete memoryless source (DMS) is defined by the following properties:
- The source output is an unending sequence $X_1, X_2, X_3, \ldots$ of randomly selected letters from an alphabet $\mathcal{X}$.
- Each source output is selected from $\mathcal{X}$ using a common probability measure.
- Each source output $X_i$ is statistically independent of the other source outputs $X_j$, $j \neq i$.
Source Coding: Discrete Random Variables

A source code $C$ for a discrete random variable $X$ is a mapping from $\mathcal{X}$, the range of $X$, to the set of finite-length strings of symbols from a $D$-ary alphabet.

The codeword lengths $l(x_i)$ of a prefix-free code must satisfy the Kraft inequality:
$$\sum_{x_i \in \mathcal{X}} D^{-l(x_i)} \leq 1$$
For the binary case ($D = 2$), there exists a full prefix-free code with codeword lengths $\{1, 2, 2\}$.

On the other hand, a prefix-free code with codeword lengths $\{1, 1, 2\}$ does not exist in the binary case.
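These two cases can be checked directly against the Kraft inequality (a minimal sketch; `kraft_sum` is my name for the left-hand side):

```python
def kraft_sum(lengths, d=2):
    """Left-hand side of the Kraft inequality for a D-ary alphabet."""
    return sum(d ** (-l) for l in lengths)

# {1, 2, 2}: 1/2 + 1/4 + 1/4 = 1.0 <= 1, so a full prefix-free code
# exists (e.g. codewords 0, 10, 11).
print(kraft_sum([1, 2, 2]))  # 1.0
# {1, 1, 2}: 1/2 + 1/2 + 1/4 = 1.25 > 1, so no prefix-free code exists.
print(kraft_sum([1, 1, 2]))  # 1.25
```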
Source Coding: Variable Length Codes

Minimum $\bar{L}$ for prefix-free codes

Kraft inequality: determines which sets of codeword lengths are possible for prefix-free codes.

What set of codeword lengths minimizes the expected length $\bar{L}$ of a prefix-free code? This is a constrained optimization problem:
$$\min \bar{L} = \sum_{i} p_i \, l(x_i) \quad \text{subject to the Kraft inequality}$$
Source Coding: Variable Length Codes

Minimum $\bar{L}$ for prefix-free codes: Entropy Bounds

$$H(X) \leq \bar{L}_{min} < H(X) + 1$$
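A quick numerical sketch (my illustration, not from the slides): the Shannon code lengths $l_i = \lceil -\log_2 p_i \rceil$ satisfy the Kraft inequality, so a prefix-free code with these lengths exists, and its expected length lands inside the entropy bounds. The pmf and helper names here are assumptions chosen for the example:

```python
import math

def entropy(pmf):
    """Entropy H(X) in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Example pmf (the one used in the Huffman example below).
pmf = [0.4, 0.2, 0.15, 0.15, 0.1]

# Shannon code lengths l_i = ceil(-log2 p_i).
lengths = [math.ceil(-math.log2(p)) for p in pmf]
avg_len = sum(p * l for p, l in zip(pmf, lengths))

h = entropy(pmf)
print(f"H(X) = {h:.4f} bits, expected length = {avg_len:.4f}")
assert h <= avg_len < h + 1  # the entropy bounds
```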
Source Coding: Variable Length Codes

Huffman Codes

Huffman codes were the result of an Information Theory class project.

Huffman ignored the Kraft inequality and focused on the code tree to establish properties that an optimum prefix-free code should have.
Source Coding: Variable Length Codes

Binary Huffman Codes

Optimum codes have the property that if $p_i > p_j$, then $l(x_i) \leq l(x_j)$.

The code tree is full.

The longest codeword has a sibling that is another longest codeword (siblings differ only in the final bit).

Let $X$ be a random symbol with a pmf satisfying $p_1 \geq p_2 \geq \ldots \geq p_M$. There is an optimal prefix-free code for $X$ in which the codewords for symbols $M-1$ and $M$ are siblings and have maximal length within the code.
Source Coding: Variable Length Codes

Huffman Codes: An example

Probability distribution: $(0.4, 0.2, 0.15, 0.15, 0.1)$
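The worked tree is a figure in the original slides; below is a minimal sketch of the standard binary Huffman procedure (repeatedly merging the two least likely nodes) applied to this distribution. The function name `huffman_code` and the heap-based bookkeeping are my choices, not the slide's:

```python
import heapq
import itertools

def huffman_code(pmf):
    """Binary Huffman code for a pmf; returns {symbol index: codeword}."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {i: ""}) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # the two least likely nodes...
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}        # ...become siblings,
        merged.update({s: "1" + w for s, w in c1.items()})  # differing in one bit
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

pmf = [0.4, 0.2, 0.15, 0.15, 0.1]
code = huffman_code(pmf)
for i, p in enumerate(pmf):
    print(f"p = {p:<4}  codeword = {code[i]}")
avg = sum(p * len(code[i]) for i, p in enumerate(pmf))
print(f"Expected length = {avg:.2f} bits/symbol")  # 2.20 here, vs H(X) ~ 2.15
```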
Asymptotic Equipartition Property

In Information Theory, the analog of the law of large numbers is the Asymptotic Equipartition Property (AEP).

The AEP says that, given a very long string of $n$ independent and identically distributed discrete random variables $X_1, \ldots, X_n$, there exists a typical set of sample strings $(x_1, \ldots, x_n)$ whose aggregate probability is almost 1.

There are roughly $2^{nH(X)}$ typical strings of length $n$, and each has a probability roughly equal to $2^{-nH(X)}$.

Almost all events are equally surprising.

First, let's review the weak law of large numbers.
Asymptotic Equipartition Property

Weak Law of Large Numbers

Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables, and let
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
be the sample average.

Chebyshev inequality: let $X$ be a random variable with mean $m_X$ and variance $\sigma_X^2$; then $P(|X - m_X| \geq \epsilon) \leq \sigma_X^2 / \epsilon^2$.

Applying this inequality to the sample average, we have
$$P(|\bar{X} - m_X| \geq \epsilon) \leq \frac{\sigma_X^2}{n \epsilon^2}$$

Remember that $E[\bar{X}] = m_X$ and $\text{var}(\bar{X}) = \sigma_X^2 / n$.
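A small simulation sketch (my illustration; the uniform(0, 1) source, $n$, $\epsilon$, and the trial count are arbitrary choices) comparing the empirical deviation probability of the sample average against the Chebyshev bound $\sigma_X^2/(n\epsilon^2)$:

```python
import random

# Uniform(0, 1) has m_X = 0.5 and sigma_X^2 = 1/12.
random.seed(0)
n, eps, trials = 100, 0.05, 20_000

hits = sum(
    abs(sum(random.random() for _ in range(n)) / n - 0.5) >= eps
    for _ in range(trials)
)
print(f"empirical P(|Xbar - m_X| >= {eps}) ~ {hits / trials:.4f}")
print(f"Chebyshev bound sigma^2/(n*eps^2) = {(1 / 12) / (n * eps ** 2):.4f}")
```

The empirical probability comes out well below the bound, as expected: Chebyshev is loose but holds for any distribution with finite variance.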
Asymptotic Equipartition Property

Let $X_1, \ldots, X_n$ be a sequence of discrete independent and identically distributed random variables over $\mathcal{X}$.

Note that $w(x) = -\log p_X(x)$ is a real-valued function of $x \in \mathcal{X}$, and $W(X_i)$ is a random variable that takes the value $w(x)$ for $X_i = x$.

Then $W(X_1), \ldots, W(X_n)$ is a sequence of i.i.d. random variables, with
$$E[W(X_i)] = -\sum_{x \in \mathcal{X}} p_X(x) \log p_X(x) = H(X)$$

For independent random variables, these quantities add:
$$w(x_1) + w(x_2) = -\log p_X(x_1) - \log p_X(x_2) = -\log p_{X_1 X_2}(x_1, x_2)$$
Asymptotic Equipartition Property

For a general $n$:
$$\sum_{i=1}^{n} w(x_i) = -\sum_{i=1}^{n} \log p_X(x_i) = -\log p_{X^n}(x^n),$$
where $X^n = [X_1, \ldots, X_n]$ and $x^n = [x_1, \ldots, x_n]$.

Let's take the sample average of the random variables $W(X_i)$:
$$\bar{W} = \frac{1}{n} \sum_{i=1}^{n} W(X_i) = \frac{-\log p_{X^n}(X^n)}{n}$$

Using Chebyshev's inequality we get
$$P\left( \left| \frac{-\log p_{X^n}(X^n)}{n} - H(X) \right| \geq \epsilon \right) \leq \frac{\sigma_W^2}{n \epsilon^2}$$
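A simulation sketch of this convergence (my illustration; the Bernoulli(0.2) source is an arbitrary choice): the normalized log-probability of an observed string should approach $H(X)$ as $n$ grows.

```python
import math
import random

# -log2 p(X^n)/n concentrates around H(X). Source: Bernoulli(p), p = 0.2.
p = 0.2
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # H(X) ~ 0.7219 bits

random.seed(0)
for n in (10, 100, 10_000):
    xs = [1 if random.random() < p else 0 for _ in range(n)]
    log_prob = sum(math.log2(p if x else 1 - p) for x in xs)
    print(f"n = {n:>6}: -log2 p(x^n)/n = {-log_prob / n:.4f}  (H(X) = {h:.4f})")
```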
Asymptotic Equipartition Property

The typical set $A_\epsilon^{(n)}$ with respect to $p_X(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the following property:
$$A_\epsilon^{(n)} = \left\{ x^n : \left| \frac{-\log p_{X^n}(x^n)}{n} - H(X) \right| \leq \epsilon \right\}$$

This can be written as:
$$n(H(X) + \epsilon) \geq -\log p_{X^n}(x^n) \geq n(H(X) - \epsilon)$$
$$2^{-n(H(X)+\epsilon)} \leq p_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$
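A brute-force sketch (my illustration; the Bernoulli(0.2) source, $n = 12$, and $\epsilon = 0.2$ are arbitrary choices) that enumerates all $2^n$ strings and keeps the typical ones. At such a small $n$ the aggregate probability is still far from 1, which is a reminder that the AEP is an asymptotic statement:

```python
import itertools
import math

# Enumerate the typical set of a Bernoulli(0.2) source for small n.
p, n, eps = 0.2, 12, 0.2
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log2_prob(xs):
    return sum(math.log2(p if x else 1 - p) for x in xs)

typical = [xs for xs in itertools.product((0, 1), repeat=n)
           if abs(-log2_prob(xs) / n - h) <= eps]

agg = sum(2 ** log2_prob(xs) for xs in typical)  # aggregate probability
print(f"|A| = {len(typical)} of {2 ** n} strings, P(A) = {agg:.4f}")
print(f"upper bound 2^(n(H+eps)) = {2 ** (n * (h + eps)):.1f}")
```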
Asymptotic Equipartition Property

Properties of the typical set:

$P(X^n \in A_\epsilon^{(n)}) > 1 - \frac{\sigma_W^2}{n \epsilon^2}$, which approaches 1 for $n$ sufficiently large, since
$$P(X^n \in A_\epsilon^{(n)}) = P\left( \left| \frac{-\log p_{X^n}(X^n)}{n} - H(X) \right| \leq \epsilon \right) \geq 1 - \frac{\sigma_W^2}{n \epsilon^2}$$
Asymptotic Equipartition Property

Properties of the typical set:

$$|A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$
since
$$1 = \sum_{x^n \in \mathcal{X}^n} p_{X^n}(x^n) \geq \sum_{x^n \in A_\epsilon^{(n)}} p_{X^n}(x^n) \geq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = 2^{-n(H(X)+\epsilon)} \sum_{x^n \in A_\epsilon^{(n)}} 1 = 2^{-n(H(X)+\epsilon)} \, |A_\epsilon^{(n)}|$$
Asymptotic Equipartition Property

Properties of the typical set:

$$|A_\epsilon^{(n)}| \geq (1 - \delta) \, 2^{n(H(X)-\epsilon)}, \quad \text{where } \delta = \frac{\sigma_W^2}{n \epsilon^2}$$
since
$$(1 - \delta) \leq P(X^n \in A_\epsilon^{(n)}) \leq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)} = 2^{-n(H(X)-\epsilon)} \, |A_\epsilon^{(n)}|$$
Asymptotic Equipartition Property: Summary

Definition of the typical set:
$$2^{-n(H(X)+\epsilon)} \leq p_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$

Size of the typical set:
$$(1 - \delta) \, 2^{n(H(X)-\epsilon)} \leq |A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$
Asymptotic Equipartition Property

Source coding in the light of the AEP

A source coder operating on strings of $n$ source symbols need only provide a codeword for each string $x^n$ in the typical set $A_\epsilon^{(n)}$.
That will be shown next class.