
Chapter 2: Entropy, Relative Entropy and Mutual Information

Definition: The entropy $H(X)$ of a discrete random variable $X$ with probability mass function $p(x)$ is defined by

$$H(X) = -\sum_{x} p(x) \log p(x),$$

where the logarithm is to base 2, so entropy is measured in bits. If we let $g(X) = \log \frac{1}{p(X)}$, then the entropy can be written as an expectation: $H(X) = E_p\, g(X) = E_p \log \frac{1}{p(X)}$.

Definition: The joint entropy $H(X, Y)$ of a pair of discrete random variables $(X, Y)$ with a joint distribution $p(x, y)$ is defined as

$$H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log p(x, y).$$
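As a quick numerical illustration (not part of the original notes), the following Python sketch computes the entropy of a biased coin and the joint entropy of a small, made-up joint pmf.

```python
import numpy as np

def entropy(p):
    """H = -sum p log2 p, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# Entropy of a biased coin, H(X) for p = (0.9, 0.1)
print(entropy([0.9, 0.1]))          # ~0.469 bits

# Joint entropy H(X, Y) of a made-up 2x2 joint pmf p(x, y)
p_xy = np.array([[0.5, 0.25],
                 [0.125, 0.125]])
print(entropy(p_xy.ravel()))        # H(X, Y) = 1.75 bits
```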

Definition: If $(X, Y) \sim p(x, y)$, then the conditional entropy $H(Y \mid X)$ is defined as

$$H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x) = -\sum_{x} \sum_{y} p(x, y) \log p(y \mid x) = -E \log p(Y \mid X).$$
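A minimal sketch (again with a made-up joint pmf) computing $H(Y \mid X)$ both ways from the definition; it also checks the identity $H(X, Y) = H(X) + H(Y \mid X)$, which appears below as the chain rule.

```python
import numpy as np

# A made-up joint pmf p(x, y) over two binary variables
p_xy = np.array([[0.5, 0.25],
                 [0.125, 0.125]])
p_x = p_xy.sum(axis=1)                      # marginal p(x)
p_y_given_x = p_xy / p_x[:, None]           # conditional p(y | x)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# H(Y|X) from the definition: sum_x p(x) H(Y | X = x)
H_Y_given_X = sum(p_x[i] * H(p_y_given_x[i]) for i in range(len(p_x)))

# The same quantity as -sum_{x,y} p(x,y) log2 p(y|x)
H_Y_given_X_alt = -np.sum(p_xy * np.log2(p_y_given_x))

print(H_Y_given_X, H_Y_given_X_alt)                  # both ~0.939 bits
print(np.isclose(H(p_xy), H(p_x) + H_Y_given_X))     # H(X,Y) = H(X) + H(Y|X): True
```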

Definition: The relative entropy or Kullback-Leibler distance between two probability mass functions $p(x)$ and $q(x)$ is defined as

$$D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} = E_p \log \frac{p(X)}{q(X)},$$

with the conventions $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$.

Relative entropy is not a true distance between distributions, since it is not symmetric and does not satisfy the triangle inequality.

Definition: Consider two random variables $X$ and $Y$ with a joint probability mass function $p(x, y)$ and marginal probability mass functions $p(x)$ and $p(y)$. The mutual information $I(X; Y)$ is the relative entropy between the joint distribution and the product distribution $p(x)p(y)$, i.e.,

$$I(X; Y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = D\big(p(x, y) \,\|\, p(x)\, p(y)\big).$$
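The following sketch (with made-up distributions) illustrates the asymmetry of $D$ and computes $I(X; Y)$ as the relative entropy between the joint pmf and the product of its marginals.

```python
import numpy as np

def kl(p, q):
    """D(p || q) = sum p log2(p/q); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl(p, q), kl(q, p))        # ~0.737 vs ~0.531: D is not symmetric

# Mutual information as D(p(x,y) || p(x) p(y)) for a made-up joint pmf
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
print(kl(p_xy, np.outer(p_x, p_y)))   # I(X; Y) ~ 0.278 bits
```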

Definition: The conditional mutual information of random variables $X$ and $Y$ given $Z$ is defined by

$$I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = E \log \frac{p(X, Y \mid Z)}{p(X \mid Z)\, p(Y \mid Z)}.$$
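A sketch with a made-up joint pmf $p(x, y, z)$, computing $I(X; Y \mid Z)$ both as $H(X \mid Z) - H(X \mid Y, Z)$ and as the expected log-ratio above; the conditional entropies are obtained via the standard identity $H(A \mid B) = H(A, B) - H(B)$, consistent with the chain rule stated next.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                    # a made-up joint pmf p(x, y, z)

def H(p):
    p = np.asarray(p, float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

p_z  = p.sum(axis=(0, 1))       # p(z)
p_xz = p.sum(axis=1)            # p(x, z)
p_yz = p.sum(axis=0)            # p(y, z)

# I(X; Y | Z) = H(X | Z) - H(X | Y, Z)
H_X_given_Z  = H(p_xz) - H(p_z)
H_X_given_YZ = H(p) - H(p_yz)
I_def1 = H_X_given_Z - H_X_given_YZ

# I(X; Y | Z) = E log2 [ p(x, y | z) / (p(x | z) p(y | z)) ]
ratio = p * p_z[None, None, :] / (p_xz[:, None, :] * p_yz[None, :, :])
I_def2 = np.sum(p * np.log2(ratio))

print(np.isclose(I_def1, I_def2))   # the two expressions agree: True
```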

Chain Rule for Entropy: Let $X_1, X_2, \ldots, X_n$ be drawn according to $p(x_1, x_2, \ldots, x_n)$. Then

$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$

Chain Rule for Information:

$$I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1).$$
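A numeric check of the chain rule for entropy on a made-up three-variable joint pmf (not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()                    # a made-up joint pmf p(x1, x2, x3)

def H(p):
    p = np.asarray(p, float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# Conditional entropies via H(A | B) = H(A, B) - H(B)
H1         = H(p.sum(axis=(1, 2)))                     # H(X1)
H2_given1  = H(p.sum(axis=2)) - H(p.sum(axis=(1, 2)))  # H(X2 | X1)
H3_given12 = H(p) - H(p.sum(axis=2))                   # H(X3 | X1, X2)

print(np.isclose(H(p), H1 + H2_given1 + H3_given12))   # chain rule holds: True
```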

Definition: The conditional relative entropy $D\big(p(y \mid x) \,\|\, q(y \mid x)\big)$ is the average of the relative entropies between the conditional probability mass functions $p(y \mid x)$ and $q(y \mid x)$, averaged over the probability mass function $p(x)$:

$$D\big(p(y \mid x) \,\|\, q(y \mid x)\big) = \sum_{x} p(x) \sum_{y} p(y \mid x) \log \frac{p(y \mid x)}{q(y \mid x)}.$$

Chain Rule for Relative Entropy:

$$D\big(p(x, y) \,\|\, q(x, y)\big) = D\big(p(x) \,\|\, q(x)\big) + D\big(p(y \mid x) \,\|\, q(y \mid x)\big).$$
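A numeric check of the chain rule for relative entropy, with two made-up joint pmfs $p(x, y)$ and $q(x, y)$:

```python
import numpy as np

rng = np.random.default_rng(2)
p = rng.random((2, 2)); p /= p.sum()    # made-up joint pmf p(x, y)
q = rng.random((2, 2)); q /= q.sum()    # made-up joint pmf q(x, y)

def kl(a, b):
    a, b = np.asarray(a, float).ravel(), np.asarray(b, float).ravel()
    return np.sum(a * np.log2(a / b))

p_x, q_x = p.sum(axis=1), q.sum(axis=1)
p_ygx, q_ygx = p / p_x[:, None], q / q_x[:, None]   # conditionals p(y|x), q(y|x)

# Conditional relative entropy: sum_x p(x) D(p(.|x) || q(.|x))
cond = sum(p_x[i] * kl(p_ygx[i], q_ygx[i]) for i in range(2))

print(np.isclose(kl(p, q), kl(p_x, q_x) + cond))    # chain rule holds: True
```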

Definition: A function $f(x)$ is said to be convex over an interval $(a, b)$ if for every $x_1, x_2 \in (a, b)$ and $0 \le \lambda \le 1$,

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$

A function $f$ is said to be strictly convex if equality holds only if $\lambda = 0$ or $\lambda = 1$.

Definition: A function $f$ is concave if $-f$ is convex.

A function is convex if it always lies below any chord. A function is concave if it always lies above any chord.

Theorem: If the function $f$ has a second derivative which is non-negative (positive) everywhere, then the function is convex (strictly convex).

Theorem: Jensen's inequality: If $f$ is a convex function and $X$ is a random variable, then

$$E f(X) \ge f(E X).$$
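A quick check of Jensen's inequality for the convex function $f(x) = x^2$, with a made-up discrete random variable:

```python
import numpy as np

# Jensen's inequality E f(X) >= f(E X) for the convex f(x) = x^2,
# using a made-up discrete random variable X.
x = np.array([1.0, 2.0, 5.0])       # values of X
p = np.array([0.5, 0.3, 0.2])       # probabilities

f = lambda t: t ** 2
Ef_X = np.sum(p * f(x))             # E f(X) = 6.7
f_EX = f(np.sum(p * x))             # f(E X) = (2.1)^2 = 4.41
print(Ef_X >= f_EX)                 # True
```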

If $f$ is a convex function on the interval $[a, b]$, then for every set of points $\{x_k\}_{k=1}^{n} \subset [a, b]$ and every set of weights $\{\lambda_k\}_{k=1}^{n}$ with $\lambda_k \ge 0$ one has:

$$f\!\left(\frac{\sum_{k=1}^{n} \lambda_k x_k}{\sum_{k=1}^{n} \lambda_k}\right) \le \frac{\sum_{k=1}^{n} \lambda_k f(x_k)}{\sum_{k=1}^{n} \lambda_k}.$$

A common situation occurs when $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$; in this case, the inequality simplifies to:

$$f\!\left(\sum_{k=1}^{n} \lambda_k x_k\right) \le \sum_{k=1}^{n} \lambda_k f(x_k),$$

where $0 \le \lambda_k \le 1$. If $f$ is a concave function, the inequality is reversed.

Corollary: Non-negativity of mutual information: For any two random variables $X$ and $Y$, $I(X; Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent.

Corollary: $D\big(p(y \mid x) \,\|\, q(y \mid x)\big) \ge 0$, with equality if and only if $p(y \mid x) = q(y \mid x)$ for all $y$ and $x$ with $p(x) > 0$.

Corollary: $I(X; Y \mid Z) \ge 0$, with equality if and only if $X$ and $Y$ are conditionally independent given $Z$.
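A randomized sanity check (not from the original notes) that $D(p \,\|\, q) \ge 0$ for made-up pmf pairs, with equality when the two distributions coincide:

```python
import numpy as np

rng = np.random.default_rng(3)

def kl(p, q):
    return np.sum(p * np.log2(p / q))

vals = []
for _ in range(1000):
    p = rng.random(4); p /= p.sum()
    q = rng.random(4); q /= q.sum()
    vals.append(kl(p, q))

print(min(vals) >= 0)           # D(p || q) is never negative: True
print(np.isclose(kl(p, p), 0))  # equality when the distributions coincide: True
```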

Theorem: $H(X) \le \log |\mathcal{X}|$, where $|\mathcal{X}|$ denotes the number of elements in the range of $X$, with equality if and only if $X$ has a uniform distribution over $\mathcal{X}$.

Theorem: Conditioning reduces entropy: $H(X \mid Y) \le H(X)$, with equality if and only if $X$ and $Y$ are independent.

Theorem: Independence bound on entropy: Let $X_1, X_2, \ldots, X_n$ be drawn according to $p(x_1, x_2, \ldots, x_n)$. Then

$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i),$$

with equality if and only if the $X_i$ are independent.
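A numeric check of these three bounds on a made-up joint pmf (not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(4)
p_xy = rng.random((3, 3)); p_xy /= p_xy.sum()   # made-up joint pmf over 3x3 outcomes

def H(p):
    p = np.asarray(p, float).ravel()
    return -np.sum(p * np.log2(p))

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H_x_given_y = H(p_xy) - H(p_y)          # H(X | Y) = H(X, Y) - H(Y)

print(H(p_x) <= np.log2(3))             # H(X) <= log |X|: True
print(H_x_given_y <= H(p_x))            # conditioning reduces entropy: True
print(H(p_xy) <= H(p_x) + H(p_y))       # independence bound on entropy: True
```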

Theorem: Log sum inequality: For non-negative numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,

$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left(\sum_{i=1}^{n} a_i\right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i},$$

with equality if and only if $\frac{a_i}{b_i}$ is constant.

Theorem: Convexity of relative entropy: $D(p \,\|\, q)$ is convex in the pair $(p, q)$, i.e., if $(p_1, q_1)$ and $(p_2, q_2)$ are two pairs of probability mass functions, then

$$D\big(\lambda p_1 + (1 - \lambda) p_2 \,\|\, \lambda q_1 + (1 - \lambda) q_2\big) \le \lambda D(p_1 \,\|\, q_1) + (1 - \lambda) D(p_2 \,\|\, q_2)$$

for all $0 \le \lambda \le 1$.
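A numeric check of the log sum inequality and of the convexity of $D$, with made-up inputs:

```python
import numpy as np

rng = np.random.default_rng(5)

def kl(p, q):
    return np.sum(p * np.log2(p / q))

# Log sum inequality for made-up non-negative numbers a_i, b_i
a, b = rng.random(5), rng.random(5)
lhs = np.sum(a * np.log2(a / b))
rhs = a.sum() * np.log2(a.sum() / b.sum())
print(lhs >= rhs)                                   # True

# Convexity of D in the pair (p, q) for a made-up mixture weight lam
p1 = rng.random(4); p1 /= p1.sum(); q1 = rng.random(4); q1 /= q1.sum()
p2 = rng.random(4); p2 /= p2.sum(); q2 = rng.random(4); q2 /= q2.sum()
lam = 0.3
mix = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
print(mix <= lam * kl(p1, q1) + (1 - lam) * kl(p2, q2))   # True
```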

Theorem: Concavity of entropy: $H(p)$ is a concave function of $p$. Moreover,

$$H(p) = \log |\mathcal{X}| - D(p \,\|\, u),$$

where $u$ is the uniform distribution over the $|\mathcal{X}|$ outcomes. The concavity of $H$ then follows directly from the convexity of $D$.

Theorem: Let $(X, Y) \sim p(x, y) = p(x)\, p(y \mid x)$. The mutual information $I(X; Y)$ is a concave function of $p(x)$ for fixed $p(y \mid x)$, and a convex function of $p(y \mid x)$ for fixed $p(x)$.
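A numeric check of the identity $H(p) = \log|\mathcal{X}| - D(p \,\|\, u)$ and of the concavity of $H$, with made-up pmfs:

```python
import numpy as np

rng = np.random.default_rng(6)

def H(p):
    return -np.sum(p * np.log2(p))

def kl(p, q):
    return np.sum(p * np.log2(p / q))

n = 4
u = np.full(n, 1.0 / n)                     # uniform distribution
p = rng.random(n); p /= p.sum()             # made-up pmf
print(np.isclose(H(p), np.log2(n) - kl(p, u)))   # H(p) = log|X| - D(p || u): True

# Concavity: H(lam*p1 + (1-lam)*p2) >= lam*H(p1) + (1-lam)*H(p2)
p1 = rng.random(n); p1 /= p1.sum()
p2 = rng.random(n); p2 /= p2.sum()
lam = 0.6
print(H(lam * p1 + (1 - lam) * p2) >= lam * H(p1) + (1 - lam) * H(p2))  # True
```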

Definition: Random variables $X$, $Y$, $Z$ are said to form a Markov chain in the order $X \to Y \to Z$ (denoted $X \to Y \to Z$) if the conditional distribution of $Z$ depends only on $Y$ and is conditionally independent of $X$. Specifically, $X$, $Y$, $Z$ form a Markov chain $X \to Y \to Z$ if the joint probability mass function can be written as

$$p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y).$$

$X \to Y \to Z$ if and only if $X$ and $Z$ are conditionally independent given $Y$.

$X \to Y \to Z$ implies $Z \to Y \to X$.

Theorem: Data Processing inequality: If $X \to Y \to Z$, then

$$I(X; Y) \ge I(X; Z).$$

We have equality if and only if $I(X; Y \mid Z) = 0$, i.e., $X \to Z \to Y$ also forms a Markov chain. Similarly, $I(Y; Z) \ge I(X; Z)$.

Corollary: In particular, if $Z = g(Y)$, then $I(X; Y) \ge I(X; g(Y))$.

Corollary: If $X \to Y \to Z$, then $I(X; Y \mid Z) \le I(X; Y)$.

Note that it is also possible that $I(X; Y \mid Z) > I(X; Y)$ when $X$, $Y$, $Z$ do not form a Markov chain.
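A sketch constructing a made-up Markov chain $X \to Y \to Z$ (a source $p(x)$ followed by two channels $p(y \mid x)$ and $p(z \mid y)$) and checking the data processing inequality $I(X; Y) \ge I(X; Z)$:

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def mutual_information(p_ab):
    """I(A; B) = H(A) + H(B) - H(A, B) for a joint pmf p(a, b)."""
    return H(p_ab.sum(axis=1)) + H(p_ab.sum(axis=0)) - H(p_ab)

# Made-up Markov chain X -> Y -> Z over binary alphabets:
# a source p(x), a channel p(y|x), and a second channel p(z|y).
p_x = np.array([0.6, 0.4])
p_y_given_x = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
p_z_given_y = np.array([[0.7, 0.3],
                        [0.3, 0.7]])

p_xy  = p_x[:, None] * p_y_given_x                   # p(x, y)
p_xyz = p_xy[:, :, None] * p_z_given_y[None, :, :]   # p(x, y, z) = p(x)p(y|x)p(z|y)
p_xz  = p_xyz.sum(axis=1)                            # p(x, z)

print(mutual_information(p_xy) >= mutual_information(p_xz))  # data processing: True
```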
