
LECTURE 3

Joint Entropy
Conditional Entropy
Mutual Information
Relative Entropy


JOINT ENTROPY H(X,Y)

H(X,Y) = −E[log p(X,Y)]

p(X,Y):
        Y=0    Y=1
X=0     1/2    1/4
X=1     0      1/4

p(X,Y):
        Y=0    Y=1
X=0     1/4    1/4
X=1     1/4    1/4
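A minimal sketch (Python with NumPy; not part of the original slides) of how the joint entropy of a table like the ones above can be computed:

import numpy as np

def joint_entropy(p_xy):
    """Joint entropy H(X,Y) = -E[log2 p(X,Y)] of a joint pmf given as a 2-D table."""
    p = np.asarray(p_xy, dtype=float).ravel()
    p = p[p > 0]                       # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

print(joint_entropy([[1/2, 1/4], [0, 1/4]]))      # first table: 1.5 bits
print(joint_entropy([[1/4, 1/4], [1/4, 1/4]]))    # uniform table: 2 bits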

JOINT ENTROPY H(X,Y)


Example:

Y = X², where X ∈ {-1, 1} with pX = [0.1, 0.9]

p(X,Y):
         Y=1
X = -1   0.1
X = 1    0.9
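A short check (again Python/NumPy, illustrative only): when Y is a deterministic function of X, the joint table has a single non-zero entry per row, so H(X,Y) collapses to H(X) = H(0.1) ≈ 0.469 bits.

import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as an array; zero entries are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x  = [0.1, 0.9]
p_xy = [[0.1], [0.9]]                  # single Y=1 column: Y is determined by X
print(entropy(p_x), entropy(p_xy))     # both ≈ 0.469 bits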

CONDITIONAL ENTROPY H(Y|X)

H(Y|X) = −E[log p(Y|X)]

p(Y|X):
        Y=0    Y=1
X=0     2/3    1/3
X=1     0      1

CONDITIONAL ENTROPY H(Y|X)


Example:

Y = X², where X ∈ {-1, 1} with pX = [0.1, 0.9]

p(Y|X):
         Y=1
X = -1   1
X = 1    1

p(X|Y):
         Y=1
X = -1   0.1
X = 1    0.9

INTERPRETATIONS OF CONDITIONAL ENTROPY H(Y|X)

H(Y|X): the average uncertainty/information in Y when you know X.
H(Y|X): the weighted average of row entropies, H(Y|X) = Σ_x p(x) H(Y|X=x).

p(X,Y):
        Y=0    Y=1    H(Y|X=x)   p(x)
X=0     1/2    1/4    H(1/3)     3/4
X=1     0      1/4    H(1)       1/4
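A small sketch (Python/NumPy, not from the slides) of the "weighted average of row entropies" view, applied to the joint table above:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(p_xy):
    """H(Y|X) = sum_x p(x) H(Y|X=x), computed row by row from the joint pmf."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)                                   # marginal p(x)
    return sum(px * entropy(row / px) for row, px in zip(p_xy, p_x) if px > 0)

print(conditional_entropy([[1/2, 1/4], [0, 1/4]]))           # 3/4*H(1/3) + 1/4*H(1) ≈ 0.689 bits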

CHAIN RULES

For two RVs,

H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Proof:

In general,

H(X1, X2, ..., Xn) = Σ_{i=1}^{n} H(Xi | Xi-1, Xi-2, ..., X1)

Proof:
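A sketch (Python/NumPy, illustrative) that checks the two-variable chain rule on the running joint table:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[1/2, 1/4], [0, 1/4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditional entropies computed directly as weighted averages of row/column entropies
H_y_given_x = sum(px * entropy(row / px) for row, px in zip(p_xy,   p_x) if px > 0)
H_x_given_y = sum(py * entropy(col / py) for col, py in zip(p_xy.T, p_y) if py > 0)

print(entropy(p_xy))                  # H(X,Y)        = 1.5
print(entropy(p_x) + H_y_given_x)     # H(X) + H(Y|X) = 1.5
print(entropy(p_y) + H_x_given_y)     # H(Y) + H(X|Y) = 1.5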

GRAPHICAL VIEW OF ENTROPY, JOINT ENTROPY, AND CONDITIONAL ENTROPY

[Venn diagram: H(X,Y) is the union of the circles H(X) and H(Y); H(X|Y) is the part of H(X) outside H(Y), and H(Y|X) is the part of H(Y) outside H(X).]

H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

MUTUAL INFORMATION

The mutual information is the average amount of information that you get about X from observing the value of Y.

I(X;Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X,Y)

[Venn diagram: I(X;Y) is the overlap of the circles H(X) and H(Y); H(X|Y) and H(Y|X) are the non-overlapping parts, and H(X,Y) is the union.]

MUTUAL INFORMATION

The mutual information is symmetric:

I(X;Y) = I(Y;X)

Proof:

MUTUAL INFORMATION EXAMPLE


If you try to guess Y, you have a 50% chance of being correct.
However, what if you know X?

Best guess: choose Y = X.

What is the overall probability of guessing correctly?

p(X,Y):
        Y=0    Y=1
X=0     1/2    1/4
X=1     0      1/4

H(X) = 0.811     H(Y) = 1
H(X|Y) = 0.5     H(Y|X) = 0.689
I(X;Y) = 0.311   H(X,Y) = 1.5
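A compact verification (Python/NumPy, a sketch rather than slide material) of the numbers listed above, computed directly from the joint table:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[1/2, 1/4], [0, 1/4]])
H_xy = entropy(p_xy)
H_x  = entropy(p_xy.sum(axis=1))
H_y  = entropy(p_xy.sum(axis=0))

print("H(X,Y) =", H_xy)                  # 1.5
print("H(X)   =", H_x)                   # 0.811
print("H(Y)   =", H_y)                   # 1.0
print("H(X|Y) =", H_xy - H_y)            # 0.5
print("H(Y|X) =", H_xy - H_x)            # 0.689
print("I(X;Y) =", H_x + H_y - H_xy)      # 0.311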

CONDITIONAL MUTUAL INFORMATION


I(X;Y|Z) = H(X|Z) − H(X|Y,Z) = H(X|Z) + H(Y|Z) − H(X,Y|Z)

Note: the conditioning on Z applies to both X and Y.
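A sketch (Python/NumPy, not from the slides) of I(X;Y|Z) computed from a three-way joint pmf via the identity above; the 2×2×2 table used here (Z = X XOR Y with independent fair bits X, Y) is a made-up example, not one from the lecture:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_mutual_information(p_xyz):
    """I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z), with H(A|Z) = H(A,Z) - H(Z)."""
    p = np.asarray(p_xyz, dtype=float)                 # axes ordered (x, y, z)
    H_z   = entropy(p.sum(axis=(0, 1)))
    H_xz  = entropy(p.sum(axis=1))
    H_yz  = entropy(p.sum(axis=0))
    H_xyz = entropy(p)
    return (H_xz - H_z) + (H_yz - H_z) - (H_xyz - H_z)

# Hypothetical example: X, Y independent fair bits, Z = X XOR Y
p = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p[x, y, x ^ y] = 1/4
print(conditional_mutual_information(p))               # 1.0 bit, even though I(X;Y) = 0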

CHAIN RULE FOR MUTUAL INFORMATION


I(X1, X2, ..., Xn; Y) = Σ_{i=1}^{n} I(Xi; Y | Xi-1, Xi-2, ..., X1)

Proof:

EXAMPLE OF USING CHAIN RULE FOR MUTUAL INFORMATION

[Block diagram: X and Z feed an adder producing Y = X + Z.]

p(X,Z):
        Z=0    Z=1
X=0     1/4    1/4
X=1     1/4    1/4

Find I(X,Z;Y).
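A sketch (Python, illustrative) that computes I(X,Z;Y) directly for Y = X + Z with the uniform p(X,Z) above; because Y is a deterministic function of (X,Z), H(Y|X,Z) = 0 and the answer reduces to H(Y):

import numpy as np
from itertools import product

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint pmf over (x, z, y) with y = x + z and p(x,z) uniform on {0,1}^2
p_xzy = {}
for x, z in product([0, 1], repeat=2):
    p_xzy[(x, z, x + z)] = 1/4

p_y = {}
for (x, z, y), pr in p_xzy.items():
    p_y[y] = p_y.get(y, 0.0) + pr

H_xzy = entropy(list(p_xzy.values()))      # H(X,Z,Y) = 2 (Y adds nothing given X,Z)
H_xz  = entropy([1/4, 1/4, 1/4, 1/4])      # H(X,Z)   = 2
H_y   = entropy(list(p_y.values()))        # H(Y) = H(1/4, 1/2, 1/4) = 1.5

# I(X,Z;Y) = H(Y) - H(Y|X,Z) = H(Y) - (H(X,Z,Y) - H(X,Z))
print(H_y - (H_xzy - H_xz))                # 1.5 bits

The chain rule gives the same value term by term: I(X;Y) + I(Z;Y|X) = 0.5 + 1 = 1.5 bits.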

CONCAVE AND CONVEX FUNCTIONS


f(x) is strictly convex over (a, b) if

f(λu + (1−λ)v) < λ f(u) + (1−λ) f(v)   for all u ≠ v in (a, b) and all 0 < λ < 1

Examples:

Technique to determine the convexity of a function:

JENSEN'S INEQUALITY

f(X) convex:          E[f(X)] ≥ f(E[X])
f(X) strictly convex: E[f(X)] > f(E[X])  (unless X is a constant)

Proof:

JENSEN'S INEQUALITY EXAMPLE

Mnemonic example: f(x) = x², so E[X²] ≥ (E[X])², i.e., Var(X) ≥ 0.
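A quick numerical illustration (Python/NumPy, not slide material) of Jensen's inequality for the strictly convex f(x) = x², using the earlier X ∈ {-1, 1}, pX = [0.1, 0.9] example:

import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=100_000, p=[0.1, 0.9])   # samples of X

f = lambda t: t ** 2
print(np.mean(f(x)))       # E[f(X)]  = 1.0 (X^2 is always 1)
print(f(np.mean(x)))       # f(E[X]) ≈ 0.8^2 = 0.64, strictly smaller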

RELATIVE ENTROPY

Relative entropy, or Kullback-Leibler divergence, between two probability mass functions p and q is defined as:

D(p || q) = Σ_{x∈A} p(x) log( p(x) / q(x) ) = Ep[ log( p(x)/q(x) ) ] = −Ep[ log q(x) ] − H(X)

Property of D(p || q):

RELATIVE ENTROPY EXAMPLE


                               Rain     Cloudy     Sunny
Weather at Seattle    p(x)     1/4      1/2        1/4
Weather at Corvallis  q(x)     1/3      1/3        1/3

D(p || q)

D(q || p)
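A sketch (Python/NumPy, illustrative) computing both divergences for the table above; note that D(p||q) and D(q||p) generally differ:

import numpy as np

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [1/4, 1/2, 1/4]    # Seattle
q = [1/3, 1/3, 1/3]    # Corvallis
print(kl_divergence(p, q))    # ≈ 0.085 bits
print(kl_divergence(q, p))    # ≈ 0.082 bits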

INFORMATION INEQUALITY

D(p || q) ≥ 0

Proof:

INFORMATION INEQUALITY

The uniform distribution has the highest entropy: H(X) ≤ log |A|

Proof:

Mutual information is non-negative: I(X;Y) ≥ 0

Proof:

INFORMATION INEQUALITY

Conditioning reduces entropy: H(X|Y) ≤ H(X)

Proof:

Independence bound: H(X1, X2, ..., Xn) ≤ Σ_{i=1}^{n} H(Xi)

Proof:
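A quick numerical check (Python/NumPy, illustrative) of both bounds on the running 2×2 joint table:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[1/2, 1/4], [0, 1/4]])
H_xy = entropy(p_xy)
H_x  = entropy(p_xy.sum(axis=1))
H_y  = entropy(p_xy.sum(axis=0))

print(H_xy - H_y, "<=", H_x)        # H(X|Y) = 0.5 <= H(X) = 0.811 (conditioning reduces entropy)
print(H_xy, "<=", H_x + H_y)        # H(X,Y) = 1.5 <= H(X) + H(Y) = 1.811 (independence bound)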

INFORMATION INEQUALITY

Conditional independence bound: H(X1, X2, ..., Xn | Y1, Y2, ..., Yn) ≤ Σ_{i=1}^{n} H(Xi | Yi)

Proof:

Mutual information independence bound: if all X1, X2, ..., Xn or all Y1, Y2, ..., Yn are independent, then

I(X1, X2, ..., Xn; Y1, Y2, ..., Yn) ≥ Σ_{i=1}^{n} I(Xi; Yi)

Proof:

SUMMARY

