Information Theory: Principles and Applications
Tiago T. V. Vinhoza
March 26, 2010
1. Exploring the inequalities a little bit more
2. Source Coding
   Fixed Length Codes
   Variable Length Codes
3. Asymptotic Equipartition Property
Exploring the inequalities a little bit more
Jensen's Inequality
If f(·) is a convex function and X is a random variable, then
$$E[f(X)] \geq f(E[X])$$
Let us now show that relative entropy and mutual information are nonnegative, along with some other interesting properties of the information measures.
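As a quick sanity check (an illustration added here, not from the slides), the sketch below compares E[f(X)] with f(E[X]) for the convex function f(x) = x² and a small discrete pmf; both the distribution and the function are chosen arbitrarily.

```python
import numpy as np

# Hypothetical discrete random variable: values and probabilities chosen for illustration.
values = np.array([-1.0, 0.0, 2.0, 5.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])

f = lambda x: x ** 2  # a convex function

E_fX = np.sum(probs * f(values))   # E[f(X)]
f_EX = f(np.sum(probs * values))   # f(E[X])

# Jensen's inequality for convex f: E[f(X)] >= f(E[X])
print(f"E[f(X)] = {E_fX:.4f}, f(E[X]) = {f_EX:.4f}")
assert E_fX >= f_EX
```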
Exploring the inequalities a little bit more
Log-Sum Inequality
For n positive numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$:
$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \geq \left(\sum_{i=1}^{n} a_i\right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$$
with equality if and only if $a_i/b_i = c$ for all $i$.
Let us now prove the convexity of the relative entropy and the
concavity of the entropy.
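A small numerical check of the log-sum inequality (an illustration added here, not part of the original slides), using arbitrary positive vectors a and b:

```python
import numpy as np

# Arbitrary positive vectors, chosen only to illustrate the inequality.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

lhs = np.sum(a * np.log2(a / b))                  # sum_i a_i log(a_i / b_i)
rhs = np.sum(a) * np.log2(np.sum(a) / np.sum(b))  # (sum a_i) log(sum a_i / sum b_i)

print(f"LHS = {lhs:.4f}, RHS = {rhs:.4f}")
assert lhs >= rhs  # log-sum inequality
```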
Exploring the inequalities a little bit more
Fano's Inequality
Suppose we know a random variable Y and we wish to guess the value of a correlated random variable X.
Fano's inequality relates the probability of error in guessing X from Y to the conditional entropy H(X|Y).
Let $\hat{X} = g(Y)$. If $P_e = P(\hat{X} \neq X)$, then
$$H(P_e) + P_e \log(|X| - 1) \geq H(X|Y)$$
where $H(P_e)$ is the binary entropy function evaluated at $P_e$.
Source Coding
From the previous lecture: "A source encoder converts the sequence of symbols from the source into a sequence of bits".
Types of source:
Discrete: keyboard characters, bits, . . .
Continuous (time and amplitude): speech
Continuous amplitude, discrete time: sampled signal before quantization
Source Coding
Source Coding: Continuous Sources
For continuous-amplitude sources, there is usually no way to map the source values to a bit sequence such that the map is uniquely decodable.
For example: the set of real numbers between 0 and 1 requires infinitely many binary digits for exact specification.
Quantization is therefore necessary, and it introduces distortion.
Source encoding: trade-off between the bit rate and the level of distortion.
Source Coding
Source Coding: Discrete Memoryless Sources
A discrete memoryless source (DMS) is defined by the following properties:
The source output is an unending sequence $X_1, X_2, X_3, \ldots$ of randomly selected letters from X.
Each source output is selected from X using a common probability measure.
Each source output $X_i$ is statistically independent of the other source outputs $X_j$, $j \neq i$.
Source Coding
Source Coding: Discrete Random Variables
A source code C for a discrete random variable X is a mapping from X, the range of X, to $D^*$, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x and let l(x) denote the length of C(x).
Source Coding Fixed Length Codes
Fixed Length Source Codes
Convert each source letter individually into a fixed-length block of L bits.
There are $2^L$ different combinations.
If the number of letters in the source alphabet X is less than or equal to $2^L$, then a different binary L-tuple may be assigned to each source symbol.
Each source symbol can then be uniquely recovered from its binary block, and the code is uniquely decodable.
Source Coding Fixed Length Codes
Fixed Length Source Codes
Requires $L = \lceil \log |X| \rceil$ bits to encode each source letter.
Hence $\log |X| \leq L < \log |X| + 1$.
For blocks of n symbols, the n-tuple source alphabet is the n-fold Cartesian product $X^n = X \times X \times \ldots \times X$, so $|X^n| = |X|^n$.
Each source n-tuple can be coded into $L = \lceil n \log |X| \rceil$ bits.
Source Coding Fixed Length Codes
Fixed Length Source Codes
Rate $\bar{L}$ of coded bits per source symbol:
$$\bar{L} = \frac{L}{n}$$
Bounds:
$$\log |X| \leq \bar{L} < \log |X| + \frac{1}{n}$$
Letting n become sufficiently large, the average number of coded bits required per source symbol can be made arbitrarily close to $\log |X|$.
This method is nonprobabilistic; it does not take into account whether some symbols occur more frequently than others.
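The following sketch (added for illustration) computes the fixed-length rate L̄ = ⌈n log|X|⌉ / n for a hypothetical 5-letter alphabet and shows it approaching log|X| as the block length n grows:

```python
import math

alphabet_size = 5  # hypothetical |X|
log_X = math.log2(alphabet_size)

for n in (1, 2, 5, 10, 100):
    L = math.ceil(n * log_X)  # bits per block of n symbols
    rate = L / n              # coded bits per source symbol
    print(f"n = {n:3d}: L = {L:4d} bits, rate = {rate:.4f} (log|X| = {log_X:.4f})")
```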
Source Coding Variable Length Codes
Variable Length Source Codes
Intuition: Allocate the shortest codewords to the most probable
outcomes and the longer ones to the least likely outcomes.
Example: Morse code.
Source Coding Variable Length Codes
Variable Length Source Codes
Codewords of a variable-length source code: a continuing sequence of
bits, with no demarcations of codeword boundaries.
The source decoder, given an original starting point, must determine
where the codeword boundaries are (parsing).
Source Coding Variable Length Codes
Classes of Codes
Non-singular code:
$$x_i \neq x_j \Rightarrow C(x_i) \neq C(x_j)$$
Unambiguous for a single symbol.
Example of a non-singular code, for a binary-valued random variable X: $C(x_1) = 0$, $C(x_2) = 1$.
Example of a singular code, for a binary-valued random variable X: $C(x_1) = 0$, $C(x_2) = 0$.
Source Coding Variable Length Codes
Classes of Codes
Definition: Extension of a code: $C: X^n \to D^*$, with
$$C(x_1 x_2 \ldots x_n) = C(x_1) C(x_2) \ldots C(x_n)$$
Example: $C(x_1) = 00$, $C(x_2) = 11$, so $C(x_1 x_2) = 0011$.
A code is uniquely decodable if its extension is non-singular.
Example of a uniquely decodable code: $C(x_1) = 0$, $C(x_2) = 1$.
Example of a non-uniquely decodable code: $C(x_1) = 0$, $C(x_2) = 1$, $C(x_3) = 10$.
Example: $C(x_2 x_1 x_3) = C(x_2 x_1 x_2 x_1) = 1010$.
Source Coding Variable Length Codes
Classes of Codes
Prefix-free Codes: no codeword is a prefix of any other codeword.
They are also called instantaneous codes because each source symbol can be decoded with essentially no delay: as soon as the entire codeword is received at the decoder, it can be recognized as a codeword and decoded without waiting for additional bits.
It is very easy to check whether a code is prefix-free, and therefore uniquely decodable.
Codewords correspond to leaves of the code tree.
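A simple prefix-free check (an illustrative helper, not from the slides): compare every pair of codewords and reject if one is a prefix of another.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of any other codeword."""
    words = list(codewords)
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(u):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))  # True: a prefix-free code
print(is_prefix_free(["0", "1", "10"]))           # False: '1' is a prefix of '10'
```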
Source Coding Variable Length Codes
Classes of Codes
Classes of codes form a nested hierarchy (code-classes diagram): all codes ⊃ non-singular codes ⊃ uniquely decodable codes ⊃ prefix-free codes.
Source Coding Variable Length Codes
Kraft Inequality
It tells us about the possibility of constructing a prefix-free code for a given source with alphabet X with a given set of codeword lengths $l(x_i)$, $x_i \in X$:
$$\sum_{x_i \in X} D^{-l(x_i)} \leq 1$$
For the binary case, D = 2, there exists a full prefix-free code with codeword lengths {1, 2, 2}.
On the other hand, a prefix-free code with codeword lengths {1, 1, 2} does not exist in the binary case.
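The check below (an added illustration) evaluates the Kraft sum for the two length sets mentioned above with D = 2:

```python
def kraft_sum(lengths, D=2):
    """Sum of D^{-l} over the codeword lengths l."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 2]))  # 1.0  -> satisfies the Kraft inequality (a full prefix-free code exists)
print(kraft_sum([1, 1, 2]))  # 1.25 -> violates it, so no binary prefix-free code has these lengths
```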
Source Coding Variable Length Codes
Minimum $\bar{L}$ for prefix-free codes
Kraft Inequality: determines which sets of codeword lengths are possible for prefix-free codes.
What set of codewords can be used to minimize the expected length $\bar{L}$ of a prefix-free code?
Constrained optimization problem:
$$\min \bar{L} \quad \text{s.t. the Kraft inequality holds}$$
Source Coding Variable Length Codes
Minimum $\bar{L}$ for prefix-free codes
Entropy Bounds:
$$H(X) \leq \bar{L}_{\min} < H(X) + 1$$
Source Coding Variable Length Codes
Huffman Codes
Result of an Information Theory class project.
Huffman ignored the Kraft inequality and focused on the code tree to establish properties that an optimum prefix-free code should have.
Source Coding Variable Length Codes
Binary Huffman Codes
Optimum codes have the property that if $p_i > p_j$, then $l(x_i) \leq l(x_j)$.
The code tree is full.
The longest codeword has a sibling that is another longest codeword (siblings differ only in the final bit).
Let X be a random symbol with a pmf satisfying $p_1 \geq p_2 \geq \ldots \geq p_M$.
There is an optimal prefix-free code for X in which the codewords for symbols M-1 and M are siblings and have maximal length within the code.
Source Coding Variable Length Codes
Huffman Codes: An example
Probability distribution: (0.4, 0.2, 0.15, 0.15, 0.1)
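A compact Huffman construction for this distribution (a sketch added for illustration; the tie-breaking and exact codewords may differ from the tree drawn in the lecture, but the codeword lengths and expected length are what matter). It also compares the expected length with the entropy bound H(X) ≤ L̄ < H(X) + 1 from the previous slides.

```python
import heapq
import math

probs = [0.4, 0.2, 0.15, 0.15, 0.1]

# Each heap entry: (probability, tie-breaker, {symbol_index: codeword_so_far})
heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
heapq.heapify(heap)
counter = len(probs)

while len(heap) > 1:
    p0, _, c0 = heapq.heappop(heap)  # two least probable nodes
    p1, _, c1 = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in c0.items()}
    merged.update({s: "1" + w for s, w in c1.items()})
    heapq.heappush(heap, (p0 + p1, counter, merged))
    counter += 1

code = heap[0][2]
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * math.log2(p) for p in probs)

print(code)                                            # e.g. codeword lengths (1, 3, 3, 3, 3)
print(f"L_bar = {avg_len:.2f}, H(X) = {entropy:.3f}")  # about 2.20 vs about 2.15
assert entropy <= avg_len < entropy + 1                # entropy bound on the optimal code
```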
Asymptotic Equipartition Property
In Information Theory, the analog of the law of large numbers is the Asymptotic Equipartition Property (AEP).
The AEP says that, given a very long string of n independent and identically distributed discrete random variables $X_1, \ldots, X_n$, there exists a typical set of sample strings $(x_1, \ldots, x_n)$ whose aggregate probability is almost 1.
There are roughly $2^{nH(X)}$ typical strings of length n, and each has a probability roughly equal to $2^{-nH(X)}$.
Almost all events are almost equally surprising.
First, let's review the weak law of large numbers.
Asymptotic Equipartition Property
Weak Law of Large Numbers.
Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables.
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{(sample average)}$$
Chebyshev inequality: Let X be a random variable with mean $m_X$ and variance $\sigma_X^2$; then $P(|X - m_X| \geq \epsilon) \leq \sigma_X^2 / \epsilon^2$.
Applying this inequality to the sample mean, we have
$$P(|\bar{X} - m_X| \geq \epsilon) \leq \sigma_X^2 / (n \epsilon^2)$$
Remember that $E[\bar{X}] = m_X$ and $\text{var}(\bar{X}) = \sigma_X^2 / n$.
Asymptotic Equipartition Property
Let $X_1, \ldots, X_n$ be a sequence of discrete independent and identically distributed random variables over X.
Note that $w(x) = -\log p_X(x)$ is a real-valued function of $x \in X$.
$W(X_i)$ is a random variable that takes the value $w(x)$ when $X_i = x$.
$W(X_1), \ldots, W(X_n)$ is then a sequence of i.i.d. random variables.
$$E[W(X_i)] = -\sum_{x \in X} p_X(x) \log p_X(x) = H(X)$$
For independent random variables we have
$$w(x_1) + w(x_2) = -\log p_X(x_1) - \log p_X(x_2) = -\log p_{X_1 X_2}(x_1, x_2)$$
Asymptotic Equipartition Property
For a general n:
$$\sum_{i=1}^{n} w(x_i) = -\sum_{i=1}^{n} \log p_X(x_i) = -\log p_{X^n}(x^n),$$
where $X^n = [X_1, \ldots, X_n]$ and $x^n = [x_1, \ldots, x_n]$.
Let's take the sample average of the random variables $W(X_i)$:
$$\bar{W} = \frac{1}{n} \sum_{i=1}^{n} W(X_i) = \frac{-\log p_{X^n}(X^n)}{n}$$
Using Chebyshev's inequality we get
$$P\left( \left| \frac{-\log p_{X^n}(x^n)}{n} - H(X) \right| \geq \epsilon \right) \leq \sigma_W^2 / (n \epsilon^2)$$
Asymptotic Equipartition Property
The typical set $A_\epsilon^{(n)}$ with respect to $p_X(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n) \in X^n$ with the following property:
$$A_\epsilon^{(n)} = \left\{ x^n : \left| \frac{-\log p_{X^n}(x^n)}{n} - H(X) \right| \leq \epsilon \right\}$$
Which can be written as:
$$n(H(X) + \epsilon) \geq -\log p_{X^n}(x^n) \geq n(H(X) - \epsilon)$$
$$2^{-n(H(X)+\epsilon)} \leq p_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$
Asymptotic Equipartition Property
Properties of the typical set:
$$P(X^n \in A_\epsilon^{(n)}) > 1 - \frac{\sigma_W^2}{n \epsilon^2} \quad \text{for n sufficiently large}$$
$$P(X^n \in A_\epsilon^{(n)}) = P\left( \left| \frac{-\log p_{X^n}(X^n)}{n} - H(X) \right| \leq \epsilon \right)$$
$$P(X^n \in A_\epsilon^{(n)}) \geq 1 - \frac{\sigma_W^2}{n \epsilon^2}$$
Asymptotic Equipartition Property
Properties of the typical set:
$$|A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$
$$1 = \sum_{x^n \in X^n} p_{X^n}(x^n) \geq \sum_{x^n \in A_\epsilon^{(n)}} p_{X^n}(x^n) \geq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = 2^{-n(H(X)+\epsilon)} \, |A_\epsilon^{(n)}|$$
Asymptotic Equipartition Property
Properties of the typical set:
$$|A_\epsilon^{(n)}| \geq (1 - \delta) \, 2^{n(H(X)-\epsilon)}, \quad \text{where } \delta = \frac{\sigma_W^2}{n \epsilon^2}$$
$$(1 - \delta) \leq P(X^n \in A_\epsilon^{(n)}) \leq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)} = 2^{-n(H(X)-\epsilon)} \, |A_\epsilon^{(n)}|$$
Asymptotic Equipartition Property
Asymptotic Equipartition Property: Summary
Definition of the typical set:
$$2^{-n(H(X)+\epsilon)} \leq p_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$
Size of the typical set:
$$(1 - \delta) \, 2^{n(H(X)-\epsilon)} \leq |A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$
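For small n these bounds can be checked exhaustively (an added illustration with an arbitrarily chosen source): enumerate all binary strings of length n from a Bernoulli(0.3) source, build A_ε^(n) directly from the definition, and compare its size and probability with the bounds above.

```python
import itertools
import math

p, eps, n = 0.3, 0.2, 12  # small example so all 2^n strings can be enumerated
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

typical, prob_typical = [], 0.0
for x in itertools.product((0, 1), repeat=n):
    ones = sum(x)
    prob = (p ** ones) * ((1 - p) ** (n - ones))
    if abs(-math.log2(prob) / n - H) <= eps:  # definition of A_eps^(n)
        typical.append(x)
        prob_typical += prob

size = len(typical)
print(f"|A| = {size}, P(A) = {prob_typical:.3f}")
print(f"upper bound 2^(n(H+eps)) = {2 ** (n * (H + eps)):.1f}")
assert size <= 2 ** (n * (H + eps))  # the size upper bound always holds
```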
Asymptotic Equipartition Property
Source coding in the light of the AEP
A source coder operating on strings of n source symbols need only provide a codeword for each string $x^n$ in the typical set $A_\epsilon^{(n)}$.
That will be shown in the next class.