

Probability Crash Course

CS 6300
Artificial Intelligence
Spring 2018
Tucker Hermans
thermans@cs.utah.edu
Many slides courtesy of
Pieter Abbeel and Dan Klein

Today
Ø Probability
Ø Random Variables
Ø Joint and Marginal Distributions
Ø Conditional Distribution
Ø Inference by Enumeration
Ø Product Rule, Chain Rule, Bayes’ Rule
Ø Independence

Ø Statistics
Ø Expected value

Ø Material needed for the rest of the course


The New AI: Probability and Statistics


• Old-style AI: search, logic, knowledge representation
• Emphasized “high-level” intelligence
• Fragile and didn't work in the real world
• AI received a bad name
• New-style AI: probabilistic and statistical
• Fantastically successful in robotics, etc.
• Emphasizes limitations of sensing, action, computing
• AIMA is encyclopedic, ours is one path through
• In fact, Berkeley AI course uses SB some of the time


Random Variables

Ø A random variable is some aspect of the world about which we (may) have uncertainty
Ø R = Is it raining?
Ø D = How long will it take to drive to work?
Ø L = Where am I?
Ø T = Will Trump be elected?

Ø We denote random variables with capital letters


Ø Values of variables are lower-case letters

Ø Like in a CSP, each random variable has a domain


Ø R in {true, false} (often written as {r, ¬r})
Ø D in [0, ∞)
Ø L in possible locations, maybe {(0,0), (0,1), ...}


Uncertainty
Ø General situation:
Ø Observed variables (evidence):
Agent knows certain things about the
state of the world (e.g., sensor
readings or symptoms)
Ø Unobserved variables:
Agent needs to reason about other
aspects (e.g., where an object is or
what disease is present)
Ø Model:
Agent knows something about how the
known variables relate to the unknown
variables

Ø Probabilistic reasoning gives us a framework for managing our beliefs and knowledge


Probability Distribution
• A probability distribution is an assignment of weights
to outcomes
• Example: traffic on freeway?
• Random variable: T = whether there’s traffic
• Outcomes: T in {none, light, heavy}
• Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
• Some laws of probability:
• Probabilities are always non-negative
• Probabilities over all possible outcomes sum to one
• As we get more evidence, probabilities may change:
• P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60


Probability Distributions
Ø A distribution is a TABLE of probabilities of values

P(T)
T P
hot 0.5
cold 0.5

P(W)
W P
sun 0.6
rain 0.4

Ø A probability (of a lower-case value) is a single number:
P(W = rain) = 0.4, also written P(rain) = 0.4
Ø Must have:

∀x: P(x) ≥ 0    and    ∑_{x∈X} P(x) = 1
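As a minimal sketch (not part of the slides), a distribution over a small domain can be stored as a dictionary and checked against these two laws; the names below are illustrative only:

```python
# Minimal sketch: a distribution as a dict from outcome to probability.
P_W = {"sun": 0.6, "rain": 0.4}

def is_valid_distribution(P, tol=1e-9):
    """Check the two laws: non-negativity and summing to one."""
    return all(p >= 0 for p in P.values()) and abs(sum(P.values()) - 1.0) < tol

assert is_valid_distribution(P_W)
print(P_W["rain"])  # P(W = rain) = 0.4
```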

Joint Distributions
Ø A joint distribution over a set of random variables X_1, ..., X_n specifies a real number for each assignment (or outcome):
P(X_1 = x_1, ..., X_n = x_n), abbreviated P(x_1, ..., x_n)

Ø Size of distribution if n variables with domain size d? (d^n entries)

Ø Must obey: P(x_1, ..., x_n) ≥ 0 and ∑ P(x_1, ..., x_n) = 1

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Ø For all but the smallest distributions, impractical to write out
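A joint table like P(T, W) above can be sketched as a dict keyed by outcome tuples (an assumed representation used for illustration here, not a data structure prescribed by the course):

```python
# Joint distribution P(T, W) as a dict keyed by (t, w) outcome tuples.
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}
# With n variables of domain size d, the full table has d**n entries,
# which is why writing out the joint quickly becomes impractical.
```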



Probabilistic Models
Ø A probabilistic model is a joint distribution over a set of random variables

Ø CSPs:
Ø Variables with domains
Ø Constraints: state whether assignments are possible
Ø Ideally: only certain variables directly interact

Constraint over T,W
T W Possible?
hot sun T
hot rain F
cold sun F
cold rain T

Ø Probabilistic models:
Ø (Random) variables with domains
Ø Assignments are called outcomes
Ø Joint distributions: say whether assignments (outcomes) are likely
Ø Normalized: sum to 1.0
Ø Ideally: only certain variables directly interact

Distribution over T,W
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3


Events
Ø An event is a set E of outcomes

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Ø From a joint distribution, we can calculate the probability of any event

Ø Probability that it’s hot AND sunny?
Ø Probability that it’s hot?
Ø Probability that it’s hot OR sunny?

Ø Typically, the events we care about are partial assignments, like P(T=hot)
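The three questions above are answered by summing the joint entries in each event; here is a minimal sketch (the helper name is mine), whose printed answers follow directly from the table:

```python
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def prob_of_event(joint, event):
    """P(E) = sum of the probabilities of the outcomes in E."""
    return sum(p for outcome, p in joint.items() if event(outcome))

print(prob_of_event(P_TW, lambda o: o == ("hot", "sun")))             # 0.4
print(prob_of_event(P_TW, lambda o: o[0] == "hot"))                   # 0.5
print(prob_of_event(P_TW, lambda o: o[0] == "hot" or o[1] == "sun"))  # 0.7
```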


Quiz: Events
§ P(+x, +y) ?
§ P(+x) ?
§ P(-y OR +x) ?

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1


Marginal Distributions
Ø Marginal distributions are sub-tables which eliminate variables
Ø Marginalization (summing out): Combine collapsed rows by
adding

P(t) = ∑_w P(t, w)        P(w) = ∑_t P(t, w)

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

T P
hot 0.5
cold 0.5

W P
sun 0.6
rain 0.4
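A hedged sketch of marginalization over the dict representation used earlier (the function name and index convention are mine):

```python
from collections import defaultdict

P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def marginalize(joint, keep_index):
    """Sum out every variable except the one at keep_index."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[outcome[keep_index]] += p
    return dict(marginal)

print(marginalize(P_TW, 0))  # {'hot': 0.5, 'cold': 0.5}  -> P(T)
print(marginalize(P_TW, 1))  # {'sun': 0.6, 'rain': 0.4}  -> P(W)
```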


Quiz: Marginal Distributions

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

X P
+x
-x

Y P
+y
-y


Conditional Probabilities
Ø Conditional or posterior probabilities:
Ø E.g., P(cavity | toothache) = 0.8
Ø Given that toothache is all I know…

Ø Notation for conditional distributions:


Ø P(cavity | toothache) = a single number
Ø P(Cavity, Toothache) = 2x2 table summing to 1
Ø P(Cavity | Toothache) = Two 2-element vectors, each summing to 1

Ø If we know more:
Ø P(cavity | toothache, catch) = 0.9
Ø P(cavity | toothache, cavity) = 1

Ø Note: the less specific belief remains valid after more evidence arrives, but
is not always useful

Ø New evidence may be irrelevant, allowing simplification:


Ø P(cavity | toothache, traffic) = P(cavity | toothache) = 0.8

Ø This kind of inference, guided by domain knowledge, is crucial



Example Problems
Ø Suppose a murder occurs in a town of population
10,000 (10,001 before the murder). A suspect is
brought in and DNA tested. The probability that
there is a DNA match given that a person is innocent
is 1/100,000; the probability of a match on a guilty
person is 1. What is the probability he is guilty
given a DNA match?

Ø Doctors have found that people with Kreuzfeld-Jacob disease (KJ) almost invariably ate lots of
hamburgers, thus p(HamburgerEater|KJ) = 0.9. KJ is
a rare disease: about 1 in 100,000 people get it.
Eating hamburgers is widespread:
p(HamburgerEater) = 0.5. What is the probability
that a regular hamburger eater will have KJ disease?
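A sketch of how Bayes’ rule (introduced a few slides later) answers both questions; the numbers come from the problem statements, and the assumption that the suspect is a uniformly random member of the 10,000 remaining townspeople is mine:

```python
# Murder/DNA problem: prior P(guilty) = 1/10000 for a random suspect (assumption),
# P(match | innocent) = 1e-5, P(match | guilty) = 1.
p_guilty = 1.0 / 10000
p_match_given_guilty = 1.0
p_match_given_innocent = 1.0 / 100000
p_match = (p_match_given_guilty * p_guilty
           + p_match_given_innocent * (1 - p_guilty))
print(p_match_given_guilty * p_guilty / p_match)   # ~0.909

# Kreuzfeld-Jacob problem: P(H | KJ) = 0.9, P(KJ) = 1e-5, P(H) = 0.5.
p_kj = 1e-5
p_h_given_kj = 0.9
p_h = 0.5
print(p_h_given_kj * p_kj / p_h)                   # 1.8e-05
```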


Conditional Probabilities in Ghostbusters


Ø A ghost is in the grid somewhere
Ø Sensor readings tell us how
close a square is to the ghost
Ø On the ghost: red
Ø 1 or 2 away: orange
Ø 3 or 4 away: yellow
Ø 5+ away: green

Ø Sensors are noisy, but we know P(Color | Distance)

P(red | 3)  P(orange | 3)  P(yellow | 3)  P(green | 3)
0.05        0.15           0.5            0.3

Conditional Probabilities
Ø A simple relation between joint and conditional probabilities:
P(a | b) = P(a, b) / P(b)
Ø In fact, this is taken as the definition of a conditional probability

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Quiz: Conditional Probabilities


§ P(+x | +y) ?
§ P(-x | +y) ?
§ P(-y | +x) ?

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1


Conditional Distributions
Ø Conditional distributions are probability distributions over
some variables given fixed values of others

Joint Distribution
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Conditional Distributions
P(W | T = hot)
W P
sun 0.8
rain 0.2

P(W | T = cold)
W P
sun 0.4
rain 0.6

Normalization Trick
Ø A trick to get a whole conditional distribution at once:
Ø Select the joint probabilities matching the evidence
Ø Normalize the selection (make it sum to one)

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Select (W = rain):
T P
hot 0.1
cold 0.3

Normalize:
T P
hot 0.25
cold 0.75
Ø Why does this work? Because sum of selection is
P(evidence)! (P(r) here)
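A minimal sketch of the select-then-normalize trick over the dict representation assumed earlier (function and argument names are mine):

```python
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def condition(joint, evidence_index, evidence_value):
    """P(other vars | evidence): select matching rows, then normalize."""
    selected = {o: p for o, p in joint.items() if o[evidence_index] == evidence_value}
    z = sum(selected.values())          # this is P(evidence)
    return {o: p / z for o, p in selected.items()}

print(condition(P_TW, 1, "rain"))
# {('hot', 'rain'): 0.25, ('cold', 'rain'): 0.75}  -> P(T | W=rain)
```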


Quiz: Normalization Trick

§ P(X | Y = -y) ?

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

SELECT the joint probabilities matching the evidence, then NORMALIZE the selection (make it sum to one)


Probabilistic Inference

Ø Probabilistic inference: compute the desired probability from other known probabilities
Ø E.g., a conditional from the joint

Ø We generally compute conditional probabilities


Ø P(on time | no reported accidents) = 0.9
Ø These represent the agent's belief given the evidence.

Ø Probabilities change with new evidence:


Ø P(on time | no accidents, 5 a.m.) = 0.95
Ø P(on time | no accidents, 5 a.m., raining) = 0.80
Ø Observing new evidence causes beliefs to be updated

Inference by Enumeration

Ø P(sun)?
Ø P(sun | winter)?
Ø P(sun | winter, hot)?

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20


Inference by Enumeration
Ø P(sun)?
P(sun) = 0.30 + 0.10 + 0.10 + 0.15 = 0.65

Ø P(sun | winter)?
P(sun | winter) = P(sun, winter) / P(winter) = (0.10 + 0.15) / (0.10 + 0.05 + 0.15 + 0.20) = 0.25 / 0.50 = 0.5

Ø P(sun | winter, hot)?
P(sun | winter, hot) = P(sun, winter, hot) / P(winter, hot) = 0.10 / (0.10 + 0.05) ≈ 0.67

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20


Inference by Enumeration
Ø General case:
Ø Evidence variables: E_1, ..., E_k = e_1, ..., e_k
Ø Query variable(s): Q
Ø Hidden variables: H_1, ..., H_r
(together these are all the variables X_1, ..., X_n)

Ø We want: P(Q | e_1, ..., e_k)

Ø First, select the entries consistent with the evidence

Ø Second, sum out H to get the joint of the query and the evidence:
P(Q, e_1, ..., e_k) = ∑_{h_1, ..., h_r} P(Q, h_1, ..., h_r, e_1, ..., e_k)

Ø Finally, normalize the remaining entries to conditionalize:
P(Q | e_1, ..., e_k) = P(Q, e_1, ..., e_k) / P(e_1, ..., e_k)

Ø Obvious problems:
Ø Worst-case time complexity O(d^n)
Ø Space complexity O(d^n) to store the joint distribution
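A compact sketch of the whole select / sum-out / normalize procedure over a dict-based joint table (a hedged illustration, not the course's reference implementation), using the P(S, T, W) table from the previous slide:

```python
from collections import defaultdict

# Joint P(S, T, W), keyed by (season, temperature, weather).
P_STW = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def enumerate_inference(joint, query_index, evidence):
    """P(Q | evidence): select consistent entries, sum out hidden vars, normalize."""
    numer = defaultdict(float)
    for outcome, p in joint.items():
        if all(outcome[i] == v for i, v in evidence.items()):  # select
            numer[outcome[query_index]] += p                   # sum out hidden vars
    z = sum(numer.values())                                     # P(evidence)
    return {q: p / z for q, p in numer.items()}                 # normalize

print(enumerate_inference(P_STW, 2, {0: "winter"}))            # P(W | winter)
print(enumerate_inference(P_STW, 2, {0: "winter", 1: "hot"}))  # P(W | winter, hot)
```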

Inference by Enumeration Example 2: Model for Ghostbusters
• Reminder: ghost is hidden, sensors are noisy
• T: Top sensor is red
• B: Bottom sensor is red
• G: Ghost is in the top
• Queries
• P(+g) = ??
• P(+g | +t) = ??
• P(+g | +t, -b) = ??

• Problem: joint distribution too large or complex



The Product Rule

Ø The product rule: P(y) P(x | y) = P(x, y)

Ø Sometimes we have conditional distributions (and a marginal) but want the joint

Ø Example:

P(W)
W P
sun 0.8
rain 0.2

P(D | W)
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

P(D, W) = P(D | W) P(W)
D W P
wet sun 0.08
dry sun 0.72
wet rain 0.14
dry rain 0.06
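A hedged sketch that builds the joint table above from the two given tables via the product rule (the variable names are mine; values agree with the tables up to floating-point rounding):

```python
# Build the joint P(D, W) from P(W) and P(D | W) via the product rule.
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

P_DW = {(d, w): P_D_given_W[(d, w)] * P_W[w] for (d, w) in P_D_given_W}
print(P_DW)
# {('wet', 'sun'): 0.08, ('dry', 'sun'): 0.72, ('wet', 'rain'): 0.14, ('dry', 'rain'): 0.06}
```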

The Chain Rule

Ø More generally, we can always write any joint distribution as an incremental product of conditional distributions:

P(x_1, x_2, ..., x_n) = ∏_i P(x_i | x_1, ..., x_{i-1})
e.g. P(x_1, x_2, x_3) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2)

Ø Why is this always true? It is just repeated application of the product rule, so no independence assumptions are needed.

Ø Can now build a joint distribution by only specifying conditionals!


Bayes’ Rule
Ø Two ways to factor a joint distribution over two variables:

P(x, y) = P(x | y) P(y) = P(y | x) P(x)

That’s my rule!

Ø Dividing, we get:

P(x | y) = P(y | x) P(x) / P(y)

Ø Why is this at all helpful?

Ø Lets us build one conditional from its reverse
Ø Often one conditional is tricky but the other one is simple
Ø Foundation of many systems we’ll see later

Ø In the running for most important AI equation!


Inference with Bayes’ Rule


Ø Example: Diagnostic probability from causal probability:

P(cause | effect) = P(effect | cause) P(cause) / P(effect)

Ø Example:
Ø m is meningitis, s is stiff neck

P(m | s) = P(s | m) P(m) / P(s)
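As an illustration only (the slide's actual givens are shown as an image in the original deck and are not reproduced here), the commonly used textbook values for this example give:

```python
# Hypothetical givens (assumed, not taken from this slide): standard textbook values.
p_s_given_m = 0.7      # P(stiff neck | meningitis)
p_m = 1.0 / 50000      # P(meningitis)
p_s = 0.01             # P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0014 -- the posterior is still tiny
```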

Ø Note: posterior probability of meningitis still very small


Ø Note: you should still get stiff necks checked out! Why?


Quiz: Bayes’ Rule

§ Given:

P(W)
W P
sun 0.8
rain 0.2

P(D | W)
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

§ What is P(W | dry) ?


Ghostbusters Revisited

Ø Let’s say we have two distributions:


Ø Prior distribution over ghost locations: P(L)
Ø Say this is uniform
Ø Sensor reading model: P(R | L)
Ø Given: we know what our sensors do
Ø R = reading color measured at (1,1)
Ø E.g., P(R = yellow | L=(1,1)) = 0.1

Ø We can calculate the posterior distribution over ghost locations using Bayes’ rule:

P(l | r) = P(r | l) P(l) / P(r)
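A hedged sketch of this update on a small grid; the grid size, the uniform prior, and the sensor-model table below are illustrative assumptions, not the slide's actual numbers:

```python
# Posterior over ghost locations from one reading at square (1, 1).
locations = [(x, y) for x in range(3) for y in range(3)]
prior = {loc: 1.0 / len(locations) for loc in locations}   # uniform P(L)

def p_reading_given_loc(color, loc):
    """Hypothetical P(R = color | L = loc) for a sensor at (1, 1)."""
    dist = abs(loc[0] - 1) + abs(loc[1] - 1)
    table = {0: {"red": 0.7, "orange": 0.2, "yellow": 0.1, "green": 0.0},
             1: {"red": 0.2, "orange": 0.5, "yellow": 0.2, "green": 0.1},
             2: {"red": 0.05, "orange": 0.15, "yellow": 0.5, "green": 0.3}}
    return table[min(dist, 2)][color]

reading = "yellow"
unnormalized = {loc: p_reading_given_loc(reading, loc) * prior[loc] for loc in locations}
z = sum(unnormalized.values())                               # P(R = yellow)
posterior = {loc: p / z for loc, p in unnormalized.items()}  # P(L | R = yellow)
```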


Independence
Ø Two variables are independent in a joint distribution if:
P(X, Y) = P(X) P(Y), i.e. ∀x, y: P(x, y) = P(x) P(y)
Ø Says the joint distribution factors into a product of two simpler distributions
Ø Usually variables aren’t independent!
Ø Equivalent definition of independence:
∀x, y: P(x | y) = P(x)
Ø We write: X ⊥ Y

Ø Independence is a simplifying modeling assumption


Ø Empirical joint distributions: at best “close” to independent
Ø What could we assume for {Weather, Traffic, Cavity}?

Ø Independence is like something from CSPs: what?



Example: Independence?

Ø Arbitrary joint distributions can be poorly modeled by independent factors

P(T)
T P
hot 0.5
cold 0.5

P(W)
W P
sun 0.6
rain 0.4

True joint P(T, W)
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Product of marginals P(T) P(W)
T W P
hot sun 0.3
hot rain 0.2
cold sun 0.3
cold rain 0.2
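A quick sketch that makes the comparison above concrete by checking whether the joint factors into its marginals:

```python
import itertools

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.4}

independent = all(abs(P_TW[(t, w)] - P_T[t] * P_W[w]) < 1e-9
                  for t, w in itertools.product(P_T, P_W))
print(independent)  # False: e.g. P(hot, sun) = 0.4 but P(hot) P(sun) = 0.3
```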

Example: Independence

Ø N fair, independent coin flips:

P(X_1)   P(X_2)   ...   P(X_n)
H 0.5    H 0.5          H 0.5
T 0.5    T 0.5          T 0.5

P(X_1, ..., X_n) = ∏_i P(X_i): a joint table of size 2^n, specified with only n numbers


Conditional Independence
Ø P(Toothache, Cavity, Catch)

Ø If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
Ø P(catch | toothache, cavity) = P(catch | cavity)

Ø The same independence holds if I don’t have a cavity:
Ø P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

Ø Catch is conditionally independent of Toothache given Cavity:
Ø P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Ø Equivalent statements:
Ø P(Toothache | Catch , Cavity) = P(Toothache | Cavity)
Ø P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)


Conditional Independence

Ø Unconditional (absolute) independence is very rare (why?)

Ø Conditional independence is our most basic and robust form of knowledge about uncertain environments:

X ⊥ Y | Z means:
P(X, Y | Z) = P(X | Z) P(Y | Z)
equivalently, P(X | Z, Y) = P(X | Z)
Ø What about this domain:
Ø Traffic
Ø Umbrella
Ø Raining

Ø What about fire, smoke, alarm?


The Chain Rule Revisited

Trivial decomposition (always true):
P(Traffic, Rain, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic)

With the conditional independence assumption Umbrella ⊥ Traffic | Rain:
P(Traffic, Rain, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain)

Representation size (binary variables): 1 + 2 + 4 numbers versus 1 + 2 + 2 numbers


Ghostbusters Chain Rule


Ø Each sensor depends only on where the ghost is

Ø That means the two sensors are conditionally independent, given the ghost position

Ø T: Top sensor is red


Ø B: Bottom sensor is red
Ø G: Ghost is in the top

Ø Givens:
Ø P(+g) = 0.5
Ø P(+t | +g) = 0.8
Ø P(+t | ¬g) = 0.4
Ø P(+b | +g) = 0.4
Ø P(+b | ¬g) = 0.8

P(T, B, G) = P(G) P(T | G) P(B | G)
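A sketch that builds the full joint from these givens via the chain rule and answers the earlier queries by enumeration (the helper names are mine; complements like P(-t | g) = 1 - P(+t | g) follow because each variable is binary):

```python
import itertools

P_G = {"+g": 0.5, "-g": 0.5}
P_T_given_G = {"+g": 0.8, "-g": 0.4}   # P(+t | G)
P_B_given_G = {"+g": 0.4, "-g": 0.8}   # P(+b | G)

# Build the joint with the chain rule: P(T, B, G) = P(G) P(T | G) P(B | G).
joint = {}
for g, t, b in itertools.product(["+g", "-g"], ["+t", "-t"], ["+b", "-b"]):
    pt = P_T_given_G[g] if t == "+t" else 1 - P_T_given_G[g]
    pb = P_B_given_G[g] if b == "+b" else 1 - P_B_given_G[g]
    joint[(g, t, b)] = P_G[g] * pt * pb

def query(condition):
    """P(+g | condition) by selecting consistent entries and normalizing."""
    consistent = {o: p for o, p in joint.items() if condition(o)}
    z = sum(consistent.values())
    return sum(p for o, p in consistent.items() if o[0] == "+g") / z

print(query(lambda o: True))                           # P(+g)          = 0.5
print(query(lambda o: o[1] == "+t"))                   # P(+g | +t)     ~ 0.667
print(query(lambda o: o[1] == "+t" and o[2] == "-b"))  # P(+g | +t, -b) ~ 0.857
```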

Expectations
Ø The expected value of a function is its average output, weighted
by a given distribution over inputs

Ø Example: How late if I leave 60 min before my flight?


Ø Lateness is a function of traffic:
L(none) = -10, L(light) = -5, L(heavy) = 15
Ø What is my expected lateness?
Ø Need to specify some belief over T to weight the outcomes
Ø Say P(T) = {none: 2/5, light: 2/5, heavy: 1/5}
Ø The expected lateness:
E[L(T)] = (2/5)(-10) + (2/5)(-5) + (1/5)(15) = -4 - 2 + 3 = -3 (i.e., 3 minutes early on average)


Expectations

Ø Real-valued functions of random variables: f: X → ℝ

Ø Expectation of a function of a random variable:
E[f(X)] = ∑_x P(x) f(x)

Ø Example: Expected value of a fair die roll (= 3.5)

X P f
1 1/6 1
2 1/6 2
3 1/6 3
4 1/6 4
5 1/6 5
6 1/6 6
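A one-function sketch of the expectation computed from such a table (function name is mine):

```python
from fractions import Fraction

def expectation(P, f):
    """E[f(X)] = sum over x of P(x) * f(x)."""
    return sum(p * f(x) for x, p in P.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die, lambda x: x))   # 7/2 = 3.5, the expected value of a fair die roll
```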

The Laws of Probability are Not to be Trifled With


Ø There have been attempts to use different methodologies for uncertainty:
Ø Fuzzy logic
Ø Three-valued logic
Ø Dempster-Shafer
Ø Non-monotonic reasoning

Ø But the laws of probability are the only system with this property:
Ø If you gamble using them, you can't be unfairly exploited by an opponent using some other system
Ø Otherwise, an opponent can devise a strategy where you lose all your money
