

Probability Crash Course

CS 6300
Artificial Intelligence
Spring 2018
Tucker Hermans
thermans@cs.utah.edu
Many slides courtesy of
Pieter Abbeel and Dan Klein

Today
Ø Probability
Ø Random Variables
Ø Joint and Marginal Distributions
Ø Conditional Distribution
Ø Inference by Enumeration
Ø Product Rule, Chain Rule, Bayes’ Rule
Ø Independence

Ø Statistics
Ø Expected value

Ø Material needed for the rest of the course


The New AI: Probability and Statistics


• Old-style AI: search, logic, knowledge representation
• Emphasized “high-level” intelligence
• Fragile and didn't work in the real world
• AI received a bad name
• New-style AI: probabilistic and statistical
• Fantastically successful in robotics, etc.
• Emphasizes limitations of sensing, action, computing
• AIMA is encyclopedic, ours is one path through
• In fact, Berkeley AI course uses SB some of the time


Random Variables

Ø A random variable is some aspect of the world about which we (may) have uncertainty
Ø R = Is it raining?
Ø D = How long will it take to drive to work?
Ø L = Where am I?
Ø T = Will Trump be elected?

Ø We denote random variables with capital letters


Ø Values of variables are lower-case letters

Ø Like in a CSP, each random variable has a domain


Ø R in {true, false} (often written as {r, ¬r})
Ø D in [0, ∞)
Ø L in possible locations, maybe {(0,0), (0,1), ...}


Uncertainty
Ø General situation:
Ø Observed variables (evidence):
Agent knows certain things about the
state of the world (e.g., sensor
readings or symptoms)
Ø Unobserved variables:
Agent needs to reason about other
aspects (e.g., where an object is or
what disease is present)
Ø Model:
Agent knows something about how the
known variables relate to the unknown
variables

Ø Probabilistic reasoning gives us a framework for managing our beliefs and knowledge


Probability Distribution
• A probability distribution is an assignment of weights
to outcomes
• Example: traffic on freeway?
• Random variable: T = whether there’s traffic
• Outcomes: T in {none, light, heavy}
• Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
• Some laws of probability:
• Probabilities are always non-negative
• Probabilities over all possible outcomes sum to one
• As we get more evidence, probabilities may change:
• P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60


Probability Distributions
Ø A distribution is a TABLE of probabilities of values

P(T)
T P
hot 0.5
cold 0.5

P(W)
W P
sun 0.6
rain 0.4

Ø A probability (of a lower-case value) is a single number:
P(W = rain) = 0.4, also written P(rain) = 0.4
Ø Must have:

∀x: P(x) ≥ 0    and    ∑_{x∈X} P(x) = 1
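As a minimal sketch (not part of the slides), a distribution over a small domain can be stored as a dictionary and checked against these two laws; the names below are illustrative only:

```python
# Minimal sketch: a distribution as a dict from outcome to probability.
P_W = {"sun": 0.6, "rain": 0.4}

def is_valid_distribution(P, tol=1e-9):
    """Check the two laws: non-negativity and summing to one."""
    return all(p >= 0 for p in P.values()) and abs(sum(P.values()) - 1.0) < tol

assert is_valid_distribution(P_W)
print(P_W["rain"])  # P(W = rain) = 0.4
```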

Joint Distributions
Ø A joint distribution over a set of random variables X_1, ..., X_n specifies a real number for each assignment (or outcome):
P(X_1 = x_1, ..., X_n = x_n), abbreviated P(x_1, ..., x_n)

Ø Size of distribution if n variables with domain size d? (d^n entries)

Ø Must obey: P(x_1, ..., x_n) ≥ 0 and ∑ P(x_1, ..., x_n) = 1

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Ø For all but the smallest distributions, impractical to write out
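A joint table like P(T, W) above can be sketched as a dict keyed by outcome tuples (an assumed representation used for illustration here, not a data structure prescribed by the course):

```python
# Joint distribution P(T, W) as a dict keyed by (t, w) outcome tuples.
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}
# With n variables of domain size d, the full table has d**n entries,
# which is why writing out the joint quickly becomes impractical.
```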



Probabilistic Models
Ø A probabilistic model is a joint distribution over a set of random variables

Ø CSPs:
Ø Variables with domains
Ø Constraints: state whether assignments are possible
Ø Ideally: only certain variables directly interact

Constraint over T,W
T W Possible?
hot sun T
hot rain F
cold sun F
cold rain T

Ø Probabilistic models:
Ø (Random) variables with domains
Ø Assignments are called outcomes
Ø Joint distributions: say whether assignments (outcomes) are likely
Ø Normalized: sum to 1.0
Ø Ideally: only certain variables directly interact

Distribution over T,W
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3


Events
Ø An event is a set E of outcomes

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Ø From a joint distribution, we can calculate the probability of any event

Ø Probability that it’s hot AND sunny?
Ø Probability that it’s hot?
Ø Probability that it’s hot OR sunny?

Ø Typically, the events we care about are partial assignments, like P(T=hot)
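The three questions above are answered by summing the joint entries in each event; here is a minimal sketch (the helper name is mine), whose printed answers follow directly from the table:

```python
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def prob_of_event(joint, event):
    """P(E) = sum of the probabilities of the outcomes in E."""
    return sum(p for outcome, p in joint.items() if event(outcome))

print(prob_of_event(P_TW, lambda o: o == ("hot", "sun")))             # 0.4
print(prob_of_event(P_TW, lambda o: o[0] == "hot"))                   # 0.5
print(prob_of_event(P_TW, lambda o: o[0] == "hot" or o[1] == "sun"))  # 0.7
```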


Quiz: Events
§ P(+x, +y) ?
§ P(+x) ?
§ P(-y OR +x) ?

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1


Marginal Distributions
Ø Marginal distributions are sub-tables which eliminate variables
Ø Marginalization (summing out): Combine collapsed rows by
adding

P(t) = ∑_w P(t, w)        P(w) = ∑_t P(t, w)

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

T P
hot 0.5
cold 0.5

W P
sun 0.6
rain 0.4
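A hedged sketch of marginalization over the dict representation used earlier (the function name and index convention are mine):

```python
from collections import defaultdict

P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def marginalize(joint, keep_index):
    """Sum out every variable except the one at keep_index."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[outcome[keep_index]] += p
    return dict(marginal)

print(marginalize(P_TW, 0))  # {'hot': 0.5, 'cold': 0.5}  -> P(T)
print(marginalize(P_TW, 1))  # {'sun': 0.6, 'rain': 0.4}  -> P(W)
```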


Quiz: Marginal Distributions

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

X P
+x
-x

Y P
+y
-y


Conditional Probabilities
Ø Conditional or posterior probabilities:
Ø E.g., P(cavity | toothache) = 0.8
Ø Given that toothache is all I know…

Ø Notation for conditional distributions:


Ø P(cavity | toothache) = a single number
Ø P(Cavity, Toothache) = 2x2 table summing to 1
Ø P(Cavity | Toothache) = Two 2-element vectors, each summing to 1

Ø If we know more:
Ø P(cavity | toothache, catch) = 0.9
Ø P(cavity | toothache, cavity) = 1

Ø Note: the less specific belief remains valid after more evidence arrives, but
is not always useful

Ø New evidence may be irrelevant, allowing simplification:


Ø P(cavity | toothache, traffic) = P(cavity | toothache) = 0.8

Ø This kind of inference, guided by domain knowledge, is crucial



Example Problems
Ø Suppose a murder occurs in a town of population
10,000 (10,001 before the murder). A suspect is
brought in and DNA tested. The probability that
there is a DNA match given that a person is innocent
is 1/100,000; the probability of a match on a guilty
person is 1. What is the probability he is guilty
given a DNA match?

Ø Doctors have found that people with Kreuzfeld-Jacob disease (KJ) almost invariably ate lots of
hamburgers, thus p(HamburgerEater|KJ) = 0.9. KJ is
a rare disease: about 1 in 100,000 people get it.
Eating hamburgers is widespread:
p(HamburgerEater) = 0.5. What is the probability
that a regular hamburger eater will have KJ disease?
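A sketch of how Bayes’ rule (introduced a few slides later) answers both questions; the numbers come from the problem statements, and the assumption that the suspect is a uniformly random member of the 10,000 remaining townspeople is mine:

```python
# Murder/DNA problem: prior P(guilty) = 1/10000 for a random suspect (assumption),
# P(match | innocent) = 1e-5, P(match | guilty) = 1.
p_guilty = 1.0 / 10000
p_match_given_guilty = 1.0
p_match_given_innocent = 1.0 / 100000
p_match = (p_match_given_guilty * p_guilty
           + p_match_given_innocent * (1 - p_guilty))
print(p_match_given_guilty * p_guilty / p_match)   # ~0.909

# Kreuzfeld-Jacob problem: P(H | KJ) = 0.9, P(KJ) = 1e-5, P(H) = 0.5.
p_kj = 1e-5
p_h_given_kj = 0.9
p_h = 0.5
print(p_h_given_kj * p_kj / p_h)                   # 1.8e-05
```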


Conditional Probabilities in Ghostbusters


Ø A ghost is in the grid somewhere
Ø Sensor readings tell us how
close a square is to the ghost
Ø On the ghost: red
Ø 1 or 2 away: orange
Ø 3 or 4 away: yellow
Ø 5+ away: green

Ø Sensors are noisy, but we know P(Color | Distance)

P(red | 3)  P(orange | 3)  P(yellow | 3)  P(green | 3)
0.05        0.15           0.5            0.3

Conditional Probabilities
Ø A simple relation between joint and conditional probabilities:
P(a | b) = P(a, b) / P(b)
Ø In fact, this is taken as the definition of a conditional probability

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Quiz: Conditional Probabilities


§ P(+x | +y) ?
§ P(-x | +y) ?
§ P(-y | +x) ?

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1


Conditional Distributions
Ø Conditional distributions are probability distributions over
some variables given fixed values of others

Joint Distribution
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Conditional Distributions
P(W | T = hot)
W P
sun 0.8
rain 0.2

P(W | T = cold)
W P
sun 0.4
rain 0.6

Normalization Trick
Ø A trick to get a whole conditional distribution at once:
Ø Select the joint probabilities matching the evidence
Ø Normalize the selection (make it sum to one)

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Select (W = rain):
T P
hot 0.1
cold 0.3

Normalize:
T P
hot 0.25
cold 0.75
Ø Why does this work? Because sum of selection is
P(evidence)! (P(r) here)
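A minimal sketch of the select-then-normalize trick over the dict representation assumed earlier (function and argument names are mine):

```python
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def condition(joint, evidence_index, evidence_value):
    """P(other vars | evidence): select matching rows, then normalize."""
    selected = {o: p for o, p in joint.items() if o[evidence_index] == evidence_value}
    z = sum(selected.values())          # this is P(evidence)
    return {o: p / z for o, p in selected.items()}

print(condition(P_TW, 1, "rain"))
# {('hot', 'rain'): 0.25, ('cold', 'rain'): 0.75}  -> P(T | W=rain)
```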


Quiz: Normalization Trick

§ P(X | Y = -y) ?

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

SELECT the joint probabilities matching the evidence, then NORMALIZE the selection (make it sum to one)


Probabilistic Inference

Ø Probabilistic inference: compute the desired probability from other known probabilities
Ø E.g., a conditional from the joint

Ø We generally compute conditional probabilities


Ø P(on time | no reported accidents) = 0.9
Ø These represent the agent's belief given the evidence.

Ø Probabilities change with new evidence:


Ø P(on time | no accidents, 5 a.m.) = 0.95
Ø P(on time | no accidents, 5 a.m., raining) = 0.80
Ø Observing new evidence causes beliefs to be updated

Inference by Enumeration

Ø P(sun)?
Ø P(sun | winter)?
Ø P(sun | winter, hot)?

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20


Inference by Enumeration
Ø P(sun)?
P(sun) = 0.30 + 0.10 + 0.10 + 0.15 = 0.65

Ø P(sun | winter)?
P(sun | winter) = P(sun, winter) / P(winter) = (0.10 + 0.15) / (0.10 + 0.05 + 0.15 + 0.20) = 0.25 / 0.50 = 0.5

Ø P(sun | winter, hot)?
P(sun | winter, hot) = P(sun, winter, hot) / P(winter, hot) = 0.10 / (0.10 + 0.05) ≈ 0.67

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20


Inference by Enumeration
Ø General case:
Ø Evidence variables: E_1, ..., E_k = e_1, ..., e_k
Ø Query variable(s): Q
Ø Hidden variables: H_1, ..., H_r
(together these are all the variables X_1, ..., X_n)

Ø We want: P(Q | e_1, ..., e_k)

Ø First, select the entries consistent with the evidence

Ø Second, sum out H to get the joint of the query and the evidence:
P(Q, e_1, ..., e_k) = ∑_{h_1, ..., h_r} P(Q, h_1, ..., h_r, e_1, ..., e_k)

Ø Finally, normalize the remaining entries to conditionalize:
P(Q | e_1, ..., e_k) = P(Q, e_1, ..., e_k) / P(e_1, ..., e_k)

Ø Obvious problems:
Ø Worst-case time complexity O(d^n)
Ø Space complexity O(d^n) to store the joint distribution
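A compact sketch of the whole select / sum-out / normalize procedure over a dict-based joint table (a hedged illustration, not the course's reference implementation), using the P(S, T, W) table from the previous slide:

```python
from collections import defaultdict

# Joint P(S, T, W), keyed by (season, temperature, weather).
P_STW = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def enumerate_inference(joint, query_index, evidence):
    """P(Q | evidence): select consistent entries, sum out hidden vars, normalize."""
    numer = defaultdict(float)
    for outcome, p in joint.items():
        if all(outcome[i] == v for i, v in evidence.items()):  # select
            numer[outcome[query_index]] += p                   # sum out hidden vars
    z = sum(numer.values())                                     # P(evidence)
    return {q: p / z for q, p in numer.items()}                 # normalize

print(enumerate_inference(P_STW, 2, {0: "winter"}))            # P(W | winter)
print(enumerate_inference(P_STW, 2, {0: "winter", 1: "hot"}))  # P(W | winter, hot)
```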

Inference by Enumeration Example 2: Model for Ghostbusters
• Reminder: ghost is hidden, sensors are noisy
• T: Top sensor is red
• B: Bottom sensor is red
• G: Ghost is in the top
• Queries
• P(+g) = ??
• P(+g | +t) = ??
• P(+g | +t, -b) = ??

• Problem: joint distribution too large or complex



The Product Rule

Ø The product rule: P(y) P(x | y) = P(x, y)

Ø Sometimes we have conditional distributions (and a marginal) but want the joint

Ø Example:

P(W)
W P
sun 0.8
rain 0.2

P(D | W)
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

P(D, W) = P(D | W) P(W)
D W P
wet sun 0.08
dry sun 0.72
wet rain 0.14
dry rain 0.06
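A hedged sketch that builds the joint table above from the two given tables via the product rule (the variable names are mine; values agree with the tables up to floating-point rounding):

```python
# Build the joint P(D, W) from P(W) and P(D | W) via the product rule.
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

P_DW = {(d, w): P_D_given_W[(d, w)] * P_W[w] for (d, w) in P_D_given_W}
print(P_DW)
# {('wet', 'sun'): 0.08, ('dry', 'sun'): 0.72, ('wet', 'rain'): 0.14, ('dry', 'rain'): 0.06}
```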

The Chain Rule

Ø More generally, we can always write any joint distribution as an incremental product of conditional distributions:

P(x_1, x_2, ..., x_n) = ∏_i P(x_i | x_1, ..., x_{i-1})
e.g. P(x_1, x_2, x_3) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2)

Ø Why is this always true? It is just repeated application of the product rule, so no independence assumptions are needed.

Ø Can now build a joint distribution by only specifying conditionals!


Bayes’ Rule
Ø Two ways to factor a joint distribution over two variables:

P(x, y) = P(x | y) P(y) = P(y | x) P(x)

That’s my rule!

Ø Dividing, we get:

P(x | y) = P(y | x) P(x) / P(y)

Ø Why is this at all helpful?

Ø Lets us build one conditional from its reverse
Ø Often one conditional is tricky but the other one is simple
Ø Foundation of many systems we’ll see later

Ø In the running for most important AI equation!


Inference with Bayes’ Rule


Ø Example: Diagnostic probability from causal probability:

P(cause | effect) = P(effect | cause) P(cause) / P(effect)

Ø Example:
Ø m is meningitis, s is stiff neck

P(m | s) = P(s | m) P(m) / P(s)
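As an illustration only (the slide's actual givens are shown as an image in the original deck and are not reproduced here), the commonly used textbook values for this example give:

```python
# Hypothetical givens (assumed, not taken from this slide): standard textbook values.
p_s_given_m = 0.7      # P(stiff neck | meningitis)
p_m = 1.0 / 50000      # P(meningitis)
p_s = 0.01             # P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0014 -- the posterior is still tiny
```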

Ø Note: posterior probability of meningitis still very small


Ø Note: you should still get stiff necks checked out! Why?


Quiz: Bayes’ Rule

§ Given:

P(W)
W P
sun 0.8
rain 0.2

P(D | W)
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

§ What is P(W | dry) ?


Ghostbusters Revisited

Ø Let’s say we have two distributions:


Ø Prior distribution over ghost locations: P(L)
Ø Say this is uniform
Ø Sensor reading model: P(R | L)
Ø Given: we know what our sensors do
Ø R = reading color measured at (1,1)
Ø E.g., P(R = yellow | L=(1,1)) = 0.1

Ø We can calculate the posterior distribution over ghost locations using Bayes’ rule:

P(l | r) = P(r | l) P(l) / P(r)
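A hedged sketch of this update on a small grid; the grid size, the uniform prior, and the sensor-model table below are illustrative assumptions, not the slide's actual numbers:

```python
# Posterior over ghost locations from one reading at square (1, 1).
locations = [(x, y) for x in range(3) for y in range(3)]
prior = {loc: 1.0 / len(locations) for loc in locations}   # uniform P(L)

def p_reading_given_loc(color, loc):
    """Hypothetical P(R = color | L = loc) for a sensor at (1, 1)."""
    dist = abs(loc[0] - 1) + abs(loc[1] - 1)
    table = {0: {"red": 0.7, "orange": 0.2, "yellow": 0.1, "green": 0.0},
             1: {"red": 0.2, "orange": 0.5, "yellow": 0.2, "green": 0.1},
             2: {"red": 0.05, "orange": 0.15, "yellow": 0.5, "green": 0.3}}
    return table[min(dist, 2)][color]

reading = "yellow"
unnormalized = {loc: p_reading_given_loc(reading, loc) * prior[loc] for loc in locations}
z = sum(unnormalized.values())                               # P(R = yellow)
posterior = {loc: p / z for loc, p in unnormalized.items()}  # P(L | R = yellow)
```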


Independence
Ø Two variables are independent in a joint distribution if:
P(X, Y) = P(X) P(Y), i.e. ∀x, y: P(x, y) = P(x) P(y)
Ø Says the joint distribution factors into a product of two simpler distributions
Ø Usually variables aren’t independent!
Ø Equivalent definition of independence:
∀x, y: P(x | y) = P(x)
Ø We write: X ⊥ Y

Ø Independence is a simplifying modeling assumption


Ø Empirical joint distributions: at best “close” to independent
Ø What could we assume for {Weather, Traffic, Cavity}?

Ø Independence is like something from CSPs: what?



Example: Independence?

Ø Arbitrary joint distributions can be poorly modeled by independent factors

P(T)
T P
hot 0.5
cold 0.5

P(W)
W P
sun 0.6
rain 0.4

True joint P(T, W)
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Product of marginals P(T) P(W)
T W P
hot sun 0.3
hot rain 0.2
cold sun 0.3
cold rain 0.2
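A quick sketch that makes the comparison above concrete by checking whether the joint factors into its marginals:

```python
import itertools

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.4}

independent = all(abs(P_TW[(t, w)] - P_T[t] * P_W[w]) < 1e-9
                  for t, w in itertools.product(P_T, P_W))
print(independent)  # False: e.g. P(hot, sun) = 0.4 but P(hot) P(sun) = 0.3
```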

Example: Independence

Ø N fair, independent coin flips:

P(X_1)   P(X_2)   ...   P(X_n)
H 0.5    H 0.5          H 0.5
T 0.5    T 0.5          T 0.5

P(X_1, ..., X_n) = ∏_i P(X_i): a joint table of size 2^n, specified with only n numbers


Conditional Independence
Ø P(Toothache, Cavity, Catch)

Ø If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
Ø P(catch | toothache, cavity) = P(catch | cavity)

Ø The same independence holds if I don’t have a cavity:
Ø P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

Ø Catch is conditionally independent of Toothache given Cavity:
Ø P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Ø Equivalent statements:
Ø P(Toothache | Catch , Cavity) = P(Toothache | Cavity)
Ø P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)


Conditional Independence

Ø Unconditional (absolute) independence is very rare (why?)

Ø Conditional independence is our most basic and robust form of knowledge about uncertain environments:

X ⊥ Y | Z means:
P(X, Y | Z) = P(X | Z) P(Y | Z)
equivalently, P(X | Z, Y) = P(X | Z)
Ø What about this domain:
Ø Traffic
Ø Umbrella
Ø Raining

Ø What about fire, smoke, alarm?


The Chain Rule Revisited

Trivial decomposition (always true):
P(Traffic, Rain, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic)

With the conditional independence assumption Umbrella ⊥ Traffic | Rain:
P(Traffic, Rain, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain)

Representation size (binary variables): 1 + 2 + 4 numbers versus 1 + 2 + 2 numbers


Ghostbusters Chain Rule


Ø Each sensor depends only on where the ghost is

Ø That means the two sensors are conditionally independent, given the ghost position

Ø T: Top sensor is red


Ø B: Bottom sensor is red
Ø G: Ghost is in the top

Ø Givens:
Ø P(+g) = 0.5
Ø P(+t | +g) = 0.8
Ø P(+t | ¬g) = 0.4
Ø P(+b | +g) = 0.4
Ø P(+b | ¬g) = 0.8

P(T, B, G) = P(G) P(T | G) P(B | G)
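A sketch that builds the full joint from these givens via the chain rule and answers the earlier queries by enumeration (the helper names are mine; complements like P(-t | g) = 1 - P(+t | g) follow because each variable is binary):

```python
import itertools

P_G = {"+g": 0.5, "-g": 0.5}
P_T_given_G = {"+g": 0.8, "-g": 0.4}   # P(+t | G)
P_B_given_G = {"+g": 0.4, "-g": 0.8}   # P(+b | G)

# Build the joint with the chain rule: P(T, B, G) = P(G) P(T | G) P(B | G).
joint = {}
for g, t, b in itertools.product(["+g", "-g"], ["+t", "-t"], ["+b", "-b"]):
    pt = P_T_given_G[g] if t == "+t" else 1 - P_T_given_G[g]
    pb = P_B_given_G[g] if b == "+b" else 1 - P_B_given_G[g]
    joint[(g, t, b)] = P_G[g] * pt * pb

def query(condition):
    """P(+g | condition) by selecting consistent entries and normalizing."""
    consistent = {o: p for o, p in joint.items() if condition(o)}
    z = sum(consistent.values())
    return sum(p for o, p in consistent.items() if o[0] == "+g") / z

print(query(lambda o: True))                           # P(+g)          = 0.5
print(query(lambda o: o[1] == "+t"))                   # P(+g | +t)     ~ 0.667
print(query(lambda o: o[1] == "+t" and o[2] == "-b"))  # P(+g | +t, -b) ~ 0.857
```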

Expectations
Ø The expected value of a function is its average output, weighted
by a given distribution over inputs

Ø Example: How late if I leave 60 min before my flight?


Ø Lateness is a function of traffic:
L(none) = -10, L(light) = -5, L(heavy) = 15
Ø What is my expected lateness?
Ø Need to specify some belief over T to weight the outcomes
Ø Say P(T) = {none: 2/5, light: 2/5, heavy: 1/5}
Ø The expected lateness:
E[L(T)] = (2/5)(-10) + (2/5)(-5) + (1/5)(15) = -4 - 2 + 3 = -3 (i.e., 3 minutes early on average)


Expectations

Ø Real-valued functions of random variables: f: X → ℝ

Ø Expectation of a function of a random variable:
E[f(X)] = ∑_x P(x) f(x)

Ø Example: Expected value of a fair die roll (= 3.5)

X P f
1 1/6 1
2 1/6 2
3 1/6 3
4 1/6 4
5 1/6 5
6 1/6 6
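A one-function sketch of the expectation computed from such a table (function name is mine):

```python
from fractions import Fraction

def expectation(P, f):
    """E[f(X)] = sum over x of P(x) * f(x)."""
    return sum(p * f(x) for x, p in P.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die, lambda x: x))   # 7/2 = 3.5, the expected value of a fair die roll
```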

The Laws of Probability are Not to be Trifled With


Ø There have been attempts to use different methodologies for uncertainty:
Ø Fuzzy logic
Ø Three-valued logic
Ø Dempster-Shafer
Ø Non-monotonic reasoning

Ø But the laws of probability are the only system with this property:
Ø If you gamble using them, you can't be unfairly exploited by an opponent using some other system
Ø Otherwise, an opponent can devise a strategy where you lose all your money
