
18.05 Spring 2005 Lecture Notes

18.05 Lecture 1 February 2, 2005

Required Textbook - DeGroot & Schervish, Probability and Statistics, Third Edition
Recommended Introduction to Probability Text - Feller, Vol. 1

1.2-1.4. Probability, Set Operations.
What is probability?
Classical Interpretation: all outcomes have equal probability (coin, dice).
Subjective Interpretation (nature of the problem): uses a model, randomness involved (such as weather).
ex. a drop of paint falls into a glass of water; a model can describe P(hit bottom before sides).
Or, P(survival after surgery) - subjective, estimated by the doctor.
Frequency Interpretation: probability based on history. P(make a free shot) is based on the history of shots made.
An experiment has a random outcome.
1. Sample Space - set of all possible outcomes. coin: S = {H, T}, die: S = {1, 2, 3, 4, 5, 6},
two dice: S = {(i, j) : i, j = 1, 2, ..., 6}
2. Events - any subset of the sample space: A ⊂ S; 𝒜 - the collection of all events.
3. Probability Distribution - P: 𝒜 → [0, 1]. For an event A ⊂ S, P(A) or Pr(A) is the probability of A.
Properties of Probability:
1. 0 ≤ P(A) ≤ 1
2. P(S) = 1
3. For disjoint (mutually exclusive) events A, B (definition: A ∩ B = ∅):
P(A or B) = P(A) + P(B) - this can be written for any number of events.
For a sequence of events A_1, ..., A_n, ..., all disjoint (A_i ∩ A_j = ∅, i ≠ j):

P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)

which is called countable additivity.

If the sample space is continuous, you can't talk about P(outcome); you need to consider P(set).
Example: S = [0, 1], 0 < a < b < 1.
P([a, b]) = b − a, P({a}) = P({b}) = 0.
Need to group outcomes into sets, not sum up individual points, since they all have P = 0.

1.3 Events, Set Operations

Union of Sets: A ∪ B = {s ∈ S : s ∈ A or s ∈ B}

Intersection: A ∩ B = AB = {s ∈ S : s ∈ A and s ∈ B}

Complement: A^c = {s ∈ S : s ∉ A}

Set Difference: A \ B = A − B = {s ∈ S : s ∈ A and s ∉ B} = A ∩ B^c

Symmetric Difference: A Δ B = {s ∈ S : (s ∈ A and s ∉ B) or (s ∈ B and s ∉ A)} = (A ∩ B^c) ∪ (B ∩ A^c)

Properties of Set Operations:
1. A ∪ B = B ∪ A
2. (A ∪ B) ∪ C = A ∪ (B ∪ C)
Note that 1. and 2. are also valid for intersections.
3. For mixed operations, the distributive law holds:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
Think of union as addition and intersection as multiplication: (A + B)C = AC + BC.
4. (A ∪ B)^c = A^c ∩ B^c - can be proven by Venn diagram: both diagrams give the same shaded area.
5. (A ∩ B)^c = A^c ∪ B^c - prove by looking at a particular point:
s ∈ (A ∩ B)^c ⇒ s ∉ (A ∩ B) ⇒ s ∉ A or s ∉ B ⇒ s ∈ A^c or s ∈ B^c ⇒ s ∈ (A^c ∪ B^c). QED
** End of Lecture 1

18.05 Lecture 2 February 4, 2005

1.5 Properties of Probability.
1. P(A) ∈ [0, 1]
2. P(S) = 1
3. P(∪_i A_i) = Σ_i P(A_i) if disjoint: A_i ∩ A_j = ∅, i ≠ j.
The probability of a union of disjoint events is the sum of their probabilities.
4. P(∅) = 0:
S and ∅ are disjoint by definition, so P(S) = P(S ∪ ∅) = P(S) + P(∅) = 1;
P(S) = 1 by #2, therefore P(∅) = 0.
5. P(A^c) = 1 − P(A)
because A, A^c are disjoint and A ∪ A^c = S: P(A ∪ A^c) = P(S) = 1 = P(A) + P(A^c);
the sum of the probabilities of an event and its complement is 1.
6. If A ⊂ B, then P(A) ≤ P(B)
by definition, B = A ∪ (B \ A), a union of two disjoint sets:
P(B) = P(A) + P(B \ A) ≥ P(A)
7. P(A ∪ B) = P(A) + P(B) − P(AB)
must subtract out the intersection because it would be counted twice, as shown.
Write the union in terms of disjoint pieces to prove it:
P(A) = P(A \ B) + P(AB)
P(B) = P(B \ A) + P(AB)
P(A ∪ B) = P(A \ B) + P(B \ A) + P(AB) = P(A) + P(B) − P(AB)
Example: A doctor knows that P(bacterial infection) = 0.7 and P(viral infection) = 0.4.
What is P(both) if P(bacterial ∪ viral) = 1?
P(both) = P(B ∩ V):
1 = 0.7 + 0.4 − P(BV)
P(BV) = 0.1

Finite Sample Spaces
There are a finite # of outcomes: S = {s_1, ..., s_n}. Define p_i = P(s_i) as the probability function.

p_i ≥ 0, Σ_{i=1}^n p_i = 1

P(A) = Σ_{s ∈ A} P(s)

Classical, simple sample spaces - all outcomes have equal probabilities:
P(A) = #(A)/#(S), by counting methods.
Multiplication rule: #(S_1) = m, #(S_2) = n ⇒ #(S_1 × S_2) = mn
Sampling without replacement: one at a time, order is important.
s_1 ... s_n outcomes, k ≤ n (k chosen from n)
#(outcome vectors (a_1, a_2, ..., a_k)) = n(n−1)···(n−k+1) = P_{n,k}
Example: order the numbers 1, 2, and 3 in groups of 2. (1, 2) and (2, 1) are different.
P_{3,2} = 3 · 2 = 6
P_{n,n} = n(n−1)···1 = n!
P_{n,k} = n!/(n−k)!

Example: Order 6 books on a shelf: 6! permutations.

Sampling with replacement, k out of n:

number of possibilities = n · n · n ··· = n^k
Example: Birthday Problem - In a group of k people,
what is the probability that 2 people will have the same birthday?
Assume n = 365 and that birthdays are equally distributed throughout the year, no twins, etc.
#(different combinations of birthdays) = #(S = all possibilities) = 365^k
#(at least 2 the same) = #(S) − #(all are different) = 365^k − P_{365,k}
P(at least 2 have the same birthday) = 1 − P_{365,k}/365^k
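A quick numerical check of this formula (a minimal Python sketch; the helper name is our own):

```python
from math import prod

def p_shared_birthday(k, n=365):
    """P(at least two of k people share a birthday), n equally likely days."""
    p_all_different = prod((n - i) / n for i in range(k))  # P_{n,k} / n^k
    return 1 - p_all_different

for k in (10, 23, 50):
    print(k, round(p_shared_birthday(k), 4))  # k = 23 already gives > 0.5
```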

Sampling without replacement, k at once:
from s_1 ... s_n, sample a subset of size k, {b_1, ..., b_k}, if we aren't concerned with order.
Each subset can be ordered k! ways, so divide that out of P_{n,k}:

number of subsets = C_{n,k} = n!/(k!(n−k)!)

The C_{n,k} are the binomial coefficients.
Binomial Theorem: (x + y)^n = Σ_{k=0}^n C_{n,k} x^k y^{n−k}
There are C_{n,k} times that each term x^k y^{n−k} will show up in the expansion.

Example: a - red balls, b - black balls.

number of distinguishable ways to order them in a row = C_{a+b, a} = C_{a+b, b}
Example: r_1 + ... + r_k = n; r_i = number of balls in each box; n, k given.
How many ways to split n objects into k sets (boxes)?
Visualize the balls in boxes, in a line - as shown:
fix the outer walls of the first and last boxes; then you can rearrange the balls and the separators.
There are n balls and k − 1 separators (k boxes).
Number of different ways to arrange the balls and separators = C_{n+k−1, n} = C_{n+k−1, k−1}

Example: f(x_1, x_2, ..., x_k), take n partial derivatives, e.g. ∂^n f/(∂x_1^2 ∂x_2 ∂x_3^5 ··· ∂x_k):
the k coordinates play the role of the k boxes, the n partial derivatives play the role of the n balls.
number of different partial derivatives = C_{n+k−1, n} = C_{n+k−1, k−1}

Example: In a deck of 52 cards, 5 cards are chosen.

What is the probability that all 5 cards have different face values?
total number of outcomes = C_{52,5}
total number of face value combinations = C_{13,5}
total number of suit possibilities, with replacement = 4^5
P(all 5 different face values) = C_{13,5} · 4^5 / C_{52,5}
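Checking the count with exact integer arithmetic (sketch):

```python
from math import comb

p = comb(13, 5) * 4**5 / comb(52, 5)
print(round(p, 4))  # ≈ 0.5071
```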

** End of Lecture 2.

18.05 Lecture 3 February 7, 2005


P_{n,k} = n!/(n−k)! - choose k out of n, order counts, without replacement.
n^k - choose k out of n, order counts, with replacement.
C_{n,k} = n!/(k!(n−k)!) - choose k out of n, order doesn't count, without replacement.

1.9 Multinomial Coefficients
These values are used to split objects into groups of various sizes.
s_1, s_2, ..., s_n - n elements, such that n_1 go in group 1, n_2 in group 2, ..., n_k in group k.
n_1 + ... + n_k = n

C_{n, n_1} · C_{n−n_1, n_2} · C_{n−n_1−n_2, n_3} ··· C_{n−n_1−...−n_{k−2}, n_{k−1}} · C_{n_k, n_k}
= [n!/(n_1!(n−n_1)!)] · [(n−n_1)!/(n_2!(n−n_1−n_2)!)] ··· [(n−n_1−...−n_{k−1})!/(n_k! 0!)]
= n!/(n_1! n_2! ··· n_{k−1}! n_k!) = (n choose n_1, n_2, ..., n_k)

These combinations are called multinomial coefficients.

Further explanation: You have n spots in which you have n! ways to place your elements.
However, you can permute the elements within a particular group and the splitting is still the same.
You must therefore divide out these internal permutations.
This is a distinguishable permutations situation.
Example #1 - 20 members of a club need to be split into 3 committees (A, B, C) of 8, 8, and 4 people,
respectively. How many ways are there to split the club into these committees?
ways to split = (20 choose 8, 8, 4) = 20!/(8!8!4!)
Example #2 - When rolling 12 dice, what is the probability that 6 pairs are thrown?
This can be thought of as "each number appears exactly twice."
There are 6^12 possibilities for the dice throws, as each of the 12 dice has 6 possible values.
For the pairs, the only freedom is which dice show each number:
P = (12 choose 2, 2, 2, 2, 2, 2) / 6^12 = 12!/((2!)^6 · 6^12) ≈ 0.0034
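The same number, computed directly (sketch):

```python
from math import factorial

p = factorial(12) / (factorial(2)**6 * 6**12)
print(round(p, 4))  # ≈ 0.0034
```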

Example #3 - Playing Bridge


Players A, B, C, and D each get 13 cards.
P(A gets 6 spades, B gets 4 spades, C gets 2 spades, D gets 1 spade) = ?
P = (choose spades)(choose other cards)/(ways to arrange all cards)
= (13 choose 6, 4, 2, 1)(39 choose 7, 9, 11, 12) / (52 choose 13, 13, 13, 13) ≈ 0.00196
Note - If it didn't matter who got the cards, multiply by 4! to arrange the people around the hands.
Alternate way to solve - just track the locations of the 13 spades:
P = C_{13,6} C_{13,4} C_{13,2} C_{13,1} / C_{52,13}
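The alternate form is easy to verify numerically (sketch):

```python
from math import comb

p = comb(13, 6) * comb(13, 4) * comb(13, 2) * comb(13, 1) / comb(52, 13)
print(round(p, 5))  # ≈ 0.00196
```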

Probabilities of Unions of Events:

P(A ∪ B) = P(A) + P(B) − P(AB)

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(BC) − P(AC) + P(ABC)

1.10 - Calculating a Union of Events - P(union of events)
P(A ∪ B) = P(A) + P(B) − P(AB) (Figure 1)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(BC) − P(AC) + P(ABC) (Figure 2)
Theorem:

P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i A_j) + Σ_{i<j<k} P(A_i A_j A_k) − ... + (−1)^{n+1} P(A_1 A_2 ··· A_n)

Express each disjoint piece, then add them up according to which sets each piece
belongs or doesn't belong to.
A_1 ∪ ... ∪ A_n can be split into a disjoint partition of pieces of the form:

A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_k} ∩ A^c_{i_{k+1}} ∩ ... ∩ A^c_{i_n}

where k = the number of sets the piece belongs to.

P(∪_{i=1}^n A_i) = Σ P(disjoint pieces)

To check that the theorem is correct, count how many times each piece is counted.
In Σ_i P(A_i): a piece belonging to k sets is counted k times.
In Σ_{i<j} P(A_i A_j): it is counted C_{k,2} times
(it needs to contain both A_i and A_j, giving C_{k,2} different intersections).
Example: Consider the piece A ∩ B ∩ C^c, as shown:

This piece is counted: in P(A ∪ B ∪ C): once. In P(A) + P(B) + P(C): counted twice.
In −P(AB) − P(AC) − P(BC): subtracted once.
In +P(ABC): counted zero times.
The sum: 2 − 1 + 0 = 1; the piece is counted exactly once.
Example: Consider the piece A_1 ∩ A_2 ∩ A_3 ∩ A_4^c; here k = 3, n = 4.
In P(A_1) + P(A_2) + P(A_3) + P(A_4): counted k times (3 times).
In −P(A_1A_2) − P(A_1A_3) − P(A_1A_4) − P(A_2A_3) − P(A_2A_4) − P(A_3A_4): subtracted C_{k,2} times (3 times).
In +Σ_{i<j<k}: counted C_{k,3} times (1 time).
In general, total = k − C_{k,2} + C_{k,3} − C_{k,4} + ... + (−1)^{k+1} C_{k,k} = sum of times counted.
To simplify, this is a binomial situation:

0 = (1 − 1)^k = Σ_{i=0}^k C_{k,i} (−1)^i (1)^{k−i} = C_{k,0} − C_{k,1} + C_{k,2} − C_{k,3} + ...

so 0 = 1 − (sum of times counted); therefore, all disjoint pieces are counted exactly once.
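The identity can also be sanity-checked numerically on small random events (a sketch; the 20-point uniform space and the event sizes are our own arbitrary choices):

```python
from fractions import Fraction
from itertools import combinations
import random

random.seed(0)
S = range(20)
events = [set(random.sample(S, random.randint(3, 10))) for _ in range(4)]
prob = lambda A: Fraction(len(A), len(S))  # uniform probability on S

union = prob(set().union(*events))
incl_excl = sum(
    (-1) ** (r + 1) * sum(prob(set.intersection(*c)) for c in combinations(events, r))
    for r in range(1, len(events) + 1)
)
print(union == incl_excl)  # True: the alternating sum reproduces P(union)
```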

** End of Lecture 3


18.05 Lecture 4 February 11, 2005

Union of Events

P(A_1 ∪ ... ∪ A_n) = Σ_i P(A_i) − Σ_{i<j} P(A_i A_j) + Σ_{i<j<k} P(A_i A_j A_k) − ...

It is often easier to calculate P(intersections) than P(unions)


Matching Problem: You have n letters and n envelopes, and randomly stuff the letters into the envelopes.
What is the probability that at least one letter will match its intended envelope?
P(A_1 ∪ ... ∪ A_n), A_i = {i-th position will match}
P(A_i) = (n−1)!/n! = 1/n (permute everyone else if just A_i is in the right place)
P(A_i A_j) = (n−2)!/n! (A_i and A_j are in the right place)
P(A_{i_1} A_{i_2} ··· A_{i_k}) = (n−k)!/n!
P(A_1 ∪ ... ∪ A_n) = n · (1/n) − C_{n,2} (n−2)!/n! + C_{n,3} (n−3)!/n! − ... + (−1)^{n+1} C_{n,n} (n−n)!/n!
general term: C_{n,k} (n−k)!/n! = n!(n−k)!/(k!(n−k)!n!) = 1/k!
SUM = 1 − 1/2! + 1/3! − ... + (−1)^{n+1} 1/n!

Recall: the Taylor series e^x = 1 + x + x^2/2! + x^3/3! + ...
For x = −1: e^{−1} = 1 − 1 + 1/2! − 1/3! + ...
Therefore, SUM = 1 − (partial sum of this series) as n → ∞.
When n is large, the probability converges to 1 − e^{−1} ≈ 0.63.
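Evaluating the alternating sum directly (sketch):

```python
from math import exp, factorial

def p_at_least_one_match(n):
    # SUM = 1 - 1/2! + 1/3! - ... + (-1)^{n+1}/n!
    return sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))

print(p_at_least_one_match(4), p_at_least_one_match(100), 1 - exp(-1))
# already at n = 4 the value is close to the limit 1 - 1/e ≈ 0.632
```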

2.1 - Conditional Probability
Given that B happened, what is the probability that A also happened?
The sample space is narrowed down to the outcomes where B has occurred:
the sample space now only includes the outcomes for which event B happened.
Definition: Conditional probability of Event A given Event B:
P(A|B) = P(AB)/P(B)

Visually, conditional probability is the area shown below (the portion of B occupied by AB).

It is sometimes easier to calculate an intersection given conditional probability: P(AB) = P(A|B)P(B)
Example: Roll 2 dice; the sum (T) is odd. Find P(T < 8). B = {T is odd}, A = {T < 8}.
P(A|B) = P(AB)/P(B), P(B) = 18/36 = 1/2

All possible odd T: 3, 5, 7, 9, 11, which can occur in 2, 4, 6, 4, 2 ways, respectively.
The odd sums below 8 are T = 3, 5, 7:
P(AB) = (2 + 4 + 6)/36 = 12/36 = 1/3; P(A|B) = (1/3)/(1/2) = 2/3

Example: Roll 2 dice until a sum of 7 or 8 results (T = 7 or 8). Find P(T = 7):
A = {T = 7}, B = {T = 7 or 8}
This is the same case as if you roll once.
P(A|B) = P(AB)/P(B) = P(A)/P(B) = (6/36)/((6 + 5)/36) = 6/11

Example: Treatments for a disease, with results after 2 years:

Treatment   Relapse   No Relapse
A           18        22
B           13        25
C           22        16
Placebo     24        10

Example, considering Placebo: B = Placebo, A = Relapse. P(A|B) = 24/(24 + 10) ≈ 0.7
Example, considering treatment B: P(A|B) = 13/(13 + 25) ≈ 0.34
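Both dice answers can be confirmed by brute-force enumeration (sketch):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))

B = [r for r in rolls if sum(r) % 2 == 1]     # T odd
AB = [r for r in B if sum(r) < 8]
print(len(AB) / len(B))                        # 2/3

B2 = [r for r in rolls if sum(r) in (7, 8)]
A2 = [r for r in B2 if sum(r) == 7]
print(len(A2) / len(B2))                       # 6/11 ≈ 0.545
```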

As stated earlier, conditional probability can be used to calculate intersections:


Example: You have r red balls and b black balls in a bin.
Draw 2 without replacement. What is P(1st = red, 2nd = black)?
P(1st = red) = r/(r + b). Given that the 1st is red, there are only r − 1 red balls left and still b black balls:
P(2nd = black | 1st = red) = b/(r + b − 1)
P(AB) = P(2nd = black | 1st = red) P(1st = red) = [b/(r + b − 1)] · [r/(r + b)]

In general, the multiplication rule:
P(A_1 A_2 ··· A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 A_2) ··· P(A_n|A_1 ··· A_{n−1})
= P(A_1) · [P(A_1 A_2)/P(A_1)] · [P(A_1 A_2 A_3)/P(A_1 A_2)] ··· [P(A_1 ··· A_n)/P(A_1 ··· A_{n−1})]
= P(A_1 A_2 ··· A_n) (the product telescopes)

Example, continued: Now, find P(r, b, b, r):

P(r, b, b, r) = [r/(r + b)] · [b/(r + b − 1)] · [(b − 1)/(r + b − 2)] · [(r − 1)/(r + b − 3)]

Example, Casino game - Craps. What's the probability of actually winning?

On the first roll: 7, 11 - win; 2, 3, 12 - lose; any other number (x_1), you continue playing.
If you eventually roll 7 first - lose; if you roll x_1 first - you win!
P(win) = P(x_1 = 7 or 11) + P(x_1 = 4) P(get 4 before 7 | x_1 = 4) + P(x_1 = 5) P(get 5 before 7 | x_1 = 5) + ... ≈ 0.493
The game is almost fair!
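The 0.493 figure can be reproduced exactly with fractions, using the fact that P(roll the point before a 7) compares only the ways to roll the point against the ways to roll 7 (sketch):

```python
from fractions import Fraction

ways = {t: sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == t)
        for t in range(2, 13)}
p = lambda t: Fraction(ways[t], 36)

win = p(7) + p(11)
for point in (4, 5, 6, 8, 9, 10):
    # P(roll the point before a 7) = ways(point) / (ways(point) + ways(7))
    win += p(point) * Fraction(ways[point], ways[point] + ways[7])
print(win, float(win))  # 244/495 ≈ 0.4929
```

** End of Lecture 4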


18.05 Lecture 5 February 14, 2005

2.2 Independence of events.
P(A|B) = P(AB)/P(B); Definition - A and B are independent if P(A|B) = P(A):
P(A|B) = P(AB)/P(B) = P(A) ⇔ P(AB) = P(A)P(B)

Experiments can be physically independent (roll 1 die, then roll another die),
or seem physically related and still be independent.
Example (roll a die): A = {odd}, B = {1, 2, 3, 4}. Related events, but independent.
P(A) = 1/2, P(B) = 2/3, AB = {1, 3}
P(AB) = 1/3 = (1/2)(2/3) = P(A)P(B), therefore independent.
Independence does not imply that the sets do not intersect.

Disjoint ≠ Independent. If A, B are independent, find P(AB^c):


P(AB) = P(A)P(B)
AB^c = A \ AB, as shown:

so, P(AB^c) = P(A) − P(AB)
= P(A) − P(A)P(B)
= P(A)(1 − P(B))
= P(A)P(B^c)
therefore, A and B^c are independent as well.
Similarly, A^c and B^c are independent. See Pset 3 for proof.
Independence allows you to find P(intersection) through simple multiplication.

Example: Toss an unfair coin twice; the two tosses are independent events. P(H) = p, 0 ≤ p ≤ 1. Find P(TH) (tails first, heads second):
P(TH) = P(T)P(H) = (1 − p)p
Since this is an unfair coin, the probability is not just 1/4.
If fair: P(TH) = TH/(HH + HT + TH + TT) = 1/4.
If you have several events A_1, A_2, ..., A_n that you need to prove independent:
it is necessary to show that every subset is independent.
All subsets: A_{i_1}, A_{i_2}, ..., A_{i_k}, 2 ≤ k ≤ n.
Prove: P(A_{i_1} A_{i_2} ··· A_{i_k}) = P(A_{i_1}) P(A_{i_2}) ··· P(A_{i_k})
You could prove that any 2 events are independent, which is called pairwise independence,
but this is not sufficient to prove that all events are independent.
Example of pairwise independence:
Consider a tetrahedral die, equally weighted.
Three of the faces are each colored red, blue, and green,
but the last face is multicolored, containing red, blue and green.
P(red) = 2/4 = 1/2 = P(blue) = P(green)
P(red and blue) = 1/4 = (1/2)(1/2) = P(red)P(blue)
Therefore, the pair {red, blue} is independent.
The same can be proven for {red, green} and {blue, green}.
but, what about all three together?
P(red, blue, and green) = 1/4 ≠ P(red)P(blue)P(green) = 1/8, so the three events are not fully independent.
Example: P(H) = p, P(T) = 1 − p for an unfair coin.
Toss the coin 5 times. P(HTHTT)
= P(H)P(T)P(H)P(T)P(T)
= p(1 − p)p(1 − p)(1 − p) = p^2 (1 − p)^3
Example: Find P(get 2H and 3T, in any order)
= sum of probabilities over orderings
= P(HHTTT) + P(HTHTT) + ...
= p^2(1 − p)^3 + p^2(1 − p)^3 + ...
= C_{5,2} p^2 (1 − p)^3

General Example: Throw a coin n times. P(k heads out of n throws)

= C_{n,k} p^k (1 − p)^{n−k}

Example: Toss a coin until the result is heads; suppose the first head comes on toss n.
P(number of tosses = n) = ?
The sequence must be TTT....TH, with the number of T's = (n − 1):
P(tosses = n) = P(TT...TH) = (1 − p)^{n−1} p

Example: In a criminal case, witnesses give a specific description of the couple seen fleeing the scene.
P(random couple meets description) = 8.3 × 10^{−8} = p
We know at the beginning that 1 such couple exists. Perhaps a better question to ask is:
Given that a couple matching the description exists, what is the probability that another couple fits the same description?
A = {at least 1 couple matches}, B = {at least 2 couples match}; find P(B|A).
P(B|A) = P(BA)/P(A) = P(B)/P(A) (since B ⊂ A)

Out of n couples, P(A) = P(at least 1 couple) = 1 − P(no couples) = 1 − ∏_{i=1}^n (1 − p)

*Each* couple fails to satisfy the description if no couples match;
use the independence property, and multiply:
P(A) = 1 − (1 − p)^n
P(B) = P(at least two) = 1 − P(0 couples) − P(exactly 1 couple)
= 1 − (1 − p)^n − np(1 − p)^{n−1}; keep in mind that P(exactly 1) is the binomial P(k out of n) with k = 1.
P(B|A) = [1 − (1 − p)^n − np(1 − p)^{n−1}] / [1 − (1 − p)^n]

If n = 8 million, P(B|A) ≈ 0.2966, which is within reasonable doubt!
P(2 couples) < P(1 couple), but given that 1 couple exists, the probability that 2 exist is not insignificant.
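Plugging the numbers in directly (sketch; p and n are the values quoted above):

```python
p, n = 8.3e-8, 8_000_000

P_A = 1 - (1 - p) ** n                   # at least one matching couple
P_B = P_A - n * p * (1 - p) ** (n - 1)   # at least two matching couples
print(P_B / P_A)                          # ≈ 0.295, the lecture's 0.2966 up to rounding
```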

In the large sample space, the probability that B occurs when we know that A occurred is significant!
2.3 Bayes's Theorem
It is sometimes useful to separate a sample space S into disjoint partitions:

B_1, ..., B_k - a partition of sample space S:
B_i ∩ B_j = ∅ for i ≠ j, S = ∪_{i=1}^k B_i (disjoint)
Total probability: P(A) = Σ_{i=1}^k P(AB_i) = Σ_{i=1}^k P(A|B_i)P(B_i)
(all AB_i are disjoint, and ∪_{i=1}^k AB_i = A)

** End of Lecture 5


18.05 Lecture 6 February 16, 2005

Solutions to Problem Set #1
1-1 pg. 12 #9
B_n = ∪_{i=n}^∞ A_i, C_n = ∩_{i=n}^∞ A_i
a) B_n ⊃ B_{n+1} ⊃ ...:
B_n = A_n ∪ (∪_{i=n+1}^∞ A_i) = A_n ∪ B_{n+1}, so s ∈ B_{n+1} ⇒ s ∈ B_{n+1} ∪ A_n = B_n.
C_n ⊂ C_{n+1} ⊂ ...:
C_n = A_n ∩ C_{n+1}, so s ∈ C_n ⇒ s ∈ C_{n+1} (and s ∈ A_n).
b) s ∈ ∩_{n=1}^∞ B_n ⇔ s ∈ B_n for all n ⇔ s ∈ ∪_{i=n}^∞ A_i for all n ⇔ s ∈ some A_i with i ≥ n, for all n
⇔ s belongs to infinitely many events A_i ⇔ the events A_i happen infinitely often.
c) s ∈ ∪_{n=1}^∞ C_n ⇔ s ∈ some C_n = ∩_{i=n}^∞ A_i for some n ⇔ s ∈ all A_i for i ≥ n
⇔ all events happen starting at some n.
1-2 pg. 18 #4
P(at least 1 fails) = 1 − P(neither fails) = 1 − 0.4 = 0.6
1-3 pg. 18 #12
A_1, A_2, ...; define B_1 = A_1, B_2 = A_1^c ∩ A_2, ..., B_n = A_1^c ∩ ... ∩ A_{n−1}^c ∩ A_n
P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(B_i): this splits the union into disjoint events, and covers the entire space.
It follows from ∪_{i=1}^n A_i = ∪_{i=1}^n B_i:
take a point s in ∪_{i=1}^n A_i; s belongs to at least one A_i. If s ∈ A_1 = B_1, done;
if not, s ∈ A_1^c, and if s ∈ A_2, then s ∈ A_1^c ∩ A_2 = B_2; if not... etc. At some point, the point belongs to a set:
the sequence stops when s ∈ A_1^c ∩ A_2^c ∩ ... ∩ A_{k−1}^c ∩ A_k = B_k.
So s ∈ ∪_{i=1}^n B_i, and P(∪_{i=1}^n A_i) = P(∪_{i=1}^n B_i) = Σ_{i=1}^n P(B_i) if the B_i's are disjoint.
(Each B_i ⊂ A_i, so a point in B_i also belongs to A_i.)
Need to prove the B_i's disjoint - by construction, for i < j:
B_i = A_1^c ∩ ... ∩ A_{i−1}^c ∩ A_i
B_j = A_1^c ∩ ... ∩ A_i^c ∩ ... ∩ A_{j−1}^c ∩ A_j
s ∈ B_i ⇒ s ∈ A_i, while s′ ∈ B_j ⇒ s′ ∉ A_i; this implies that s ≠ s′.
1-4 pg. 27 #5
#(S) = 6 · 6 · 6 · 6 = 6^4
#(all different) = 6 · 5 · 4 · 3 = P_{6,4}
P(all different) = P_{6,4}/6^4 = 5/18
1-5 pg. 27 #7
12 balls in 20 boxes.
P(no box receives > 1 ball, each box will have 0 or 1 balls)
also means that all balls fall into different boxes.
#(S) = 20^12
#(all different) = 20 · 19 ··· 9 = P_{20,12}

P(...) = P_{20,12}/20^12

1-6 pg. 27 #10


100 balls, r red balls.
A_i = {draw red at step i}
Think of arranging the balls in 100 spots in a row.
a) P(A_1) = r/100
b) P(A_50):
sample space = sequences of length 50.
#(S) = 100 · 99 ··· 51 = P_{100,50}
#(A_50) = r · P_{99,49}: red on spot 50. There are r choices to put a red ball on spot 50, and the other 49 spots are filled from the 99 remaining balls.
P(A_50) = r · P_{99,49}/P_{100,50} = r/100, same as part a.
c) As shown in part b, the particular draw doesn't matter; the probability is the same.
P(A_100) = r/100
1-7 pg. 34 #6
Seat n people in n spots.
#(S) = n!
#(A and B sit together) = ?
Visualize n seats: there are n − 1 choices of adjacent seats for the pair,
and 2(n − 1) ways to seat the pair, because you can switch the two people.
But you need to account for the (n − 2) people remaining:
#(AB) = 2(n − 1) · (n − 2)!
therefore, P = 2(n − 1)!/n! = 2/n
Or, think of the pair as 1 entity: there are (n − 1) entities; permute them in (n − 1)! ways, and multiply by 2 to swap the pair.
1-8 pg. 34 #11
Out of 100, choose 12. #(S) = C_{100,12}
#(A and B are on the committee) = C_{98,10}, choosing the other 10 from the 98 remaining people.
P = C_{98,10}/C_{100,12}
1-9 pg. 34 #16
50 states, 2 senators each.
a) Select 8: #(S) = C_{100,8}
#(group contains a senator from a given state) = C_{2,2} C_{98,6} + C_{2,1} C_{98,7}
or, calculate: 1 − P(neither chosen) = 1 − C_{98,8}/C_{100,8}
b) Select a group of 50: #(S) = C_{100,50}
#(one senator from each state) = 2^50
1-10 pg. 34 #17
In the sample space, only consider the positions of the 4 aces in the hands.
#(S) = C_{52,4}, #(all go to 1 player) = 4 · C_{13,4}
P = 4 C_{13,4}/C_{52,4}
1-11
r balls, n boxes, no box is empty.
First of all, put 1 ball in each box from the beginning.
r − n balls remain to be distributed in n boxes:
C_{n+(r−n)−1, r−n} = C_{r−1, r−n}
1-12
30 people, 12 months.
P(6 months with 3 birthdays each, 6 months with 2 birthdays each)
#(S) = 12^30
Need to choose the 6 months with 3 birthdays, then apply the multinomial coefficient:
#(possibilities) = C_{12,6} · (30 choose 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2)
P = #(possibilities)/12^30
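For 1-12, the count is large but exact with integer arithmetic (sketch):

```python
from math import comb, factorial

ways = comb(12, 6) * factorial(30) // (factorial(3) ** 6 * factorial(2) ** 6)
p = ways / 12 ** 30
print(p)  # ≈ 0.00035
```

** End of Lecture 6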


18.05 Lecture 7 February 18, 2005

Bayes Formula.

Partition B_1, ..., B_k: ∪_{i=1}^k B_i = S, B_i ∩ B_j = ∅ for i ≠ j
P(A) = Σ_{i=1}^k P(AB_i) = Σ_{i=1}^k P(A|B_i)P(B_i) - total probability.

Example: In box 1, there are 60 short bolts and 40 long bolts. In box 2,
there are 10 short bolts and 20 long bolts. Take a box at random, and pick a bolt.
What is the probability that you chose a short bolt?
B1 = choose Box 1.
B2 = choose Box 2.
P(short) = P(short|B_1)P(B_1) + P(short|B_2)P(B_2) = (60/100)(1/2) + (10/30)(1/2) ≈ 0.47
Example:
Partitions: B_1, B_2, ..., B_k, and you know the distribution P(B_i).
Events: A, and you know P(A|B_i) for each B_i.
If you know that A happened, what is the probability that it came from a particular B_i?

P(B_i|A) = P(B_i A)/P(A) = P(A|B_i)P(B_i) / [P(A|B_1)P(B_1) + ... + P(A|B_k)P(B_k)] : Bayes's Formula

Example: Medical detection test, 90% accurate.


Partition - you have the disease (B_1), you don't have the disease (B_2)
The accuracy means, in terms of probability: P(positive|B1 ) = 0.9, P(positive|B2 ) = 0.1
In the general public, the chance of getting the disease is 1 in 10,000.
In terms of probability: P(B1 ) = 0.0001, P(B2) = 0.9999
If the result comes up positive, what is the probability that you actually have the disease, P(B_1|positive)?
P(B_1|positive) = P(positive|B_1)P(B_1) / [P(positive|B_1)P(B_1) + P(positive|B_2)P(B_2)]

= (0.9)(0.0001) / [(0.9)(0.0001) + (0.1)(0.9999)] ≈ 0.0009

The probability is still very small that you actually have the disease.
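The same computation as a small function (sketch; the parameter names are our own):

```python
def posterior(prior, p_pos_given_disease=0.9, p_pos_given_healthy=0.1):
    """P(disease | positive test), by Bayes's formula."""
    num = p_pos_given_disease * prior
    return num / (num + p_pos_given_healthy * (1 - prior))

print(posterior(0.0001))  # ≈ 0.0009
```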


Example: Identify the source of a defective item.


There are 3 machines: M1 , M2 , M3 . P(defective): 0.01, 0.02, 0.03, respectively.
The percent of items made that come from each machine is: 20%, 30%, and 50%, respectively.
Probability that the item comes from a machine: P (M1 ) = 0.2, P (M2 ) = 0.3, P (M3 ) = 0.5
Probability that a machines item is defective: P (D|M1 ) = 0.01, P (D|M2 ) = 0.02, P (D|M3 ) = 0.03
Probability that it came from Machine 1:
P(M_1|D) = P(D|M_1)P(M_1) / [P(D|M_1)P(M_1) + P(D|M_2)P(M_2) + P(D|M_3)P(M_3)]
= (0.01)(0.2) / [(0.01)(0.2) + (0.02)(0.3) + (0.03)(0.5)] ≈ 0.087

Example: A gene has 2 alleles: A, a. The gene exhibits itself through a trait with two versions.
The possible phenotypes are dominant, with genotypes AA or Aa, and recessive, with genotype aa.
Alleles travel independently, each derived from a parent's genotype.
In a population, the probability of having a particular allele: P(A) = 0.5, P(a) = 0.5
Therefore, the probabilities of the genotypes are: P(AA) = 0.25, P(Aa) = 0.5, P(aa) = 0.25
Partitions: genotypes of parents: (AA, AA), (AA, Aa), (AA, aa), (Aa, Aa), (Aa, aa), (aa, aa).
Assume pairs match regardless of genotype.
Parent genotypes   Probability          P(child has dominant phenotype)
(AA, AA)           (1/4)(1/4) = 1/16    1
(AA, Aa)           2(1/4)(1/2) = 1/4    1
(AA, aa)           2(1/4)(1/4) = 1/8    1
(Aa, Aa)           (1/2)(1/2) = 1/4     3/4
(Aa, aa)           2(1/2)(1/4) = 1/4    1/2
(aa, aa)           (1/4)(1/4) = 1/16    0

If you see that a person has dark hair (the dominant phenotype), predict the genotypes of the parents:

P((AA, AA)|dominant) = (1/16)(1) / [(1/16)(1) + (1/4)(1) + (1/8)(1) + (1/4)(3/4) + (1/4)(1/2) + (1/16)(0)] = (1/16)/(3/4) = 1/12

You can do the same computation to find the probabilities of each type of couple.
Bayes's formula gives a prediction about the parents' genotypes, which you aren't able to observe directly.
Example: You have 1 machine.
In good condition: defective items produced only 1% of the time. P(in good condition) = 90%
In broken condition: defective items produced 40% of the time. P(broken) = 10%
Sample 6 items, and find that 2 are defective. Is the machine broken?
This is very similar to the medical example worked earlier in lecture:
P(good | 2 out of 6 are defective) =
= P(2 of 6|good)P(good) / [P(2 of 6|good)P(good) + P(2 of 6|broken)P(broken)]
= C_{6,2}(0.01)^2(0.99)^4(0.9) / [C_{6,2}(0.01)^2(0.99)^4(0.9) + C_{6,2}(0.4)^2(0.6)^4(0.1)] ≈ 0.04
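The arithmetic, spelled out (sketch; binom_pmf is our own helper):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

good = binom_pmf(2, 6, 0.01) * 0.9      # P(2 of 6 | good) P(good)
broken = binom_pmf(2, 6, 0.4) * 0.1     # P(2 of 6 | broken) P(broken)
print(good / (good + broken))           # ≈ 0.04, so the machine is probably broken
```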

** End of Lecture 7


18.05 Lecture 8 February 22, 2005

3.1 - Random Variables and Distributions
A random variable transforms the outcome of an experiment into a number.
Definitions:
Probability Space: (S, 𝒜, P)
S - sample space, 𝒜 - events, P - probability
A random variable is a function on S with values in the real numbers, X: S → R
Examples:
Toss a coin 10 times. Sample Space = {HTH...HT, ...}, all configurations of H & T.
Random Variable X = number of heads, X: S → R;
X: S → {0, 1, ..., 10} for this example.
There are fewer values than outcomes in S; you need to give the distribution of the
random variable in order to get the entire picture. Probabilities are therefore given.
Definition: The distribution of a random variable X: S → R is defined by:
for A ⊂ R, P(A) = P(X ∈ A) = P({s ∈ S : X(s) ∈ A})

The random variable maps outcomes and probabilities to real numbers.

This simplifies the problem, as you only need to define the mapped (R, P), not the original (S, P).
The mapped values describe X, so you don't need to consider the original
complicated probability space.
From the example: P(X = #(heads in 10 tosses) = k) = C_{10,k} (1/2)^k (1/2)^{10−k} = C_{10,k}/2^10
Note: need to distribute the heads among the tosses, and
account for the probability of both the heads and the tails tossed.
This is a specific example of the more general binomial problem:
a random variable X ∈ {0, 1, ..., n},
P(X = k) = C_{n,k} p^k (1 − p)^{n−k}
This distribution is called the binomial distribution B(n, p), which is an example of a discrete distribution.
Discrete Distribution
A random variable X is called discrete if it takes a finite or countable number (sequence) of values: X ∈ {s_1, s_2, s_3, ...}
It is completely described by telling the probability of each outcome.
Distribution defined by: P(X = s_k) = f(s_k), the probability function (p.f.)
A p.f. cannot be negative and should sum to 1 over all outcomes.
P(X ∈ A) = Σ_{s_k ∈ A} f(s_k)

Example: Uniform distribution on a finite number of values {1, 2, 3, ..., n}: each outcome
has equal probability f(s_k) = 1/n, the uniform probability function.
Given a random variable X with distribution P(A) = P(X ∈ A), A ⊂ R,
we can redefine the probability space on the random variable's distribution:
(R, 𝒜, P) as the sample space, with X: R → R the identity map X(x) = x.
Then P(X ∈ A) = P({x : X(x) ∈ A}) = P(x ∈ A) = P(A):
all you need is the outcomes mapped to real numbers and the
probabilities of the mapped outcomes.

Example: Poisson Distribution Π(λ) on {0, 1, 2, 3, ...}, λ = intensity.

probability function:
f(k) = P(X = k) = (λ^k/k!) e^{−λ}, where the parameter λ > 0.

Check that it sums to 1:
Σ_{k≥0} (λ^k/k!) e^{−λ} = e^{−λ} Σ_{k≥0} λ^k/k! = e^{−λ} e^{λ} = e^0 = 1
Very common distribution, will be used later in statistics.


Represents a variety of situations - ex. the distribution of typos on a particular page of a book,
the number of stars in a random spot of the sky, etc.
A good approximation for real-world problems, as P(X > 10) is small for moderate λ.
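A quick check that the p.f. sums to 1 (sketch; λ = 3 is an arbitrary choice):

```python
from math import exp, factorial

lam = 3.0
f = lambda k: lam**k / factorial(k) * exp(-lam)
print(sum(f(k) for k in range(100)))  # ≈ 1.0
print(sum(f(k) for k in range(11)))   # most of the mass sits at small k
```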
Continuous Distribution
Need to consider intervals, not points.
Probability distribution function (p.d.f.): f(x) ≥ 0.
Summation replaced by integral: ∫ f(x)dx = 1
then, P(A) = ∫_A f(x)dx, as shown.

If you were to choose a random point on an interval, the probability of choosing
any particular point is equal to zero.
You can't assign positive probability to any point, as it would add up infinitely on a continuous interval.
It is necessary to take P(point is in a particular sub-interval).
The definition implies that P({a}) = ∫_a^a f(x)dx = 0

Example: Uniform distribution on [a, b], denoted U[a, b]:
p.d.f.: f(x) = 1/(b − a) for x ∈ [a, b]; 0 for x ∉ [a, b]
Example: On an interval [c, d] such that a < c < d < b:
P([c, d]) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a) (probability on a subinterval)
Example: Exponential Distribution
E(λ), λ > 0 parameter
p.d.f.: f(x) = λe^{−λx} if x ≥ 0; 0 if x < 0
Check that it integrates to 1:
∫_0^∞ λe^{−λx} dx = (−e^{−λx})|_0^∞ = 1
Real world: the exponential distribution describes the life span of quality products (electronics).
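A numerical sanity check of the normalization with a midpoint Riemann sum (sketch; λ = 2 is an arbitrary choice):

```python
from math import exp

lam, dx = 2.0, 1e-4
total = sum(lam * exp(-lam * (i + 0.5) * dx) * dx for i in range(int(20 / dx)))
print(total)  # ≈ 1.0 (the tail beyond x = 20 is negligible for lam = 2)
```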

** End of Lecture 8


18.05 Lecture 9 February 23, 2005

Cumulative distribution function (c.d.f.):

F(x) = P(X ≤ x), x ∈ R
Properties:
1. If x_1 ≤ x_2, then {X ≤ x_1} ⊂ {X ≤ x_2}, so P(X ≤ x_1) ≤ P(X ≤ x_2): F is non-decreasing.
2. lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1.
A random variable only takes real numbers; as x → −∞, the event {X ≤ x} becomes empty.

Recall:
Discrete Random Variable - defined by the probability function (p.f.): values {s_1, s_2, ...}, f(s_i) = P(X = s_i).
Continuous - probability distribution function (p.d.f.), also called density function: f(x) ≥ 0, ∫ f(x)dx = 1, P(X ∈ A) = ∫_A f(x)dx.

Example: P(X = 0) = 1/2, P(X = 1) = 1/2.
F(x) = P(X ≤ x) = 0 for x < 0
F(x) = P(X = 0) = 1/2 for x ∈ [0, 1)
F(x) = P(X = 0 or 1) = 1 for x ∈ [1, ∞)

3. Right continuity: lim_{y→x⁺} F(y) = F(x). With F(y) = P(X ≤ y) and any sequence y_n ↓ x:
∩_{n=1}^∞ {X ≤ y_n} = {X ≤ x}, so F(y_n) → P(X ≤ x) = F(x)

Probability of the random variable falling in an interval:
P(x_1 < X ≤ x_2) = P({X ≤ x_2} \ {X ≤ x_1}) = P(X ≤ x_2) − P(X ≤ x_1) = F(x_2) − F(x_1)


(using {X ≤ x_1} ⊂ {X ≤ x_2}). Probability of a point x:
P(X = x) = F(x) − F(x⁻), where F(x⁻) = lim_{y→x⁻} F(y) and F(x⁺) = lim_{y→x⁺} F(y)
If continuous, probability at a point is equal to 0, unless there is a jump,
where the probability is the value of the jump.

P(x_1 ≤ X ≤ x_2) = F(x_2) − F(x_1⁻)
P(A) = P(X ∈ A)
X - random variable with distribution P
When observing a c.d.f:

Discrete: sum of probabilities at all the jumps = 1. Graph is horizontal in between the jumps, meaning that probability = 0 in those intervals.

Continuous: F(x) = P(X ≤ x) = ∫_{−∞}^x f(t)dt; eventually, the graph approaches 1.


If f is continuous, then f(x) = F′(x).
Quantile: for p ∈ [0, 1], the p-quantile = inf{x : F(x) = P(X ≤ x) ≥ p}:
find the smallest point such that the probability up to that point is at least p.
The area underneath the p.d.f. up to this point x is equal to p.
If the 0.25-quantile is at x = 0, then P(X ≤ 0) ≥ 0.25.

Note that in the discrete example above, the 0.25-quantile is at x = 0, but so are the 0.3-, 0.4-, ... quantiles, all the way up to 0.5.
What if you have 2 random variables? Or several?
ex. take a person, measure weight and height. Separate behavior tells you nothing
about the pairing, need to describe the joint distribution.
Consider a pair of random variables (X, Y)
Joint distribution of (X, Y): P((X, Y) ∈ A)
Event: a set A ⊂ R^2

Discrete distribution: (X, Y) ∈ {(s_1^1, s_1^2), (s_2^1, s_2^2), ...}
Joint p.f.: f(s_i^1, s_i^2) = P((X, Y) = (s_i^1, s_i^2)) = P(X = s_i^1, Y = s_i^2)
Often visualized as a table, assigning a probability to each point:

y \ x    1     1.5    3
0        0.1   0      0.2
-1       0     0      0
-2.5     0.2   0      0.4
5        0     0.1    0

Continuous: f(x, y) ≥ 0, ∫∫_{R^2} f(x, y)dxdy = 1
Joint p.d.f. f(x, y): P((X, Y) ∈ A) = ∫∫_A f(x, y)dxdy
Joint c.d.f.: F(x, y) = P(X ≤ x, Y ≤ y)

If you want the c.d.f. only for x:
F(x) = P(X ≤ x) = P(X ≤ x, Y ≤ +∞) = F(x, ∞) = lim_{y→∞} F(x, y)
Same for y.
To find the probability within a rectangle on the (x, y) plane, combine the c.d.f. at the four corners:
P(x_1 < X ≤ x_2, y_1 < Y ≤ y_2) = F(x_2, y_2) − F(x_1, y_2) − F(x_2, y_1) + F(x_1, y_1)

Continuous: F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du. Also, ∂²F/∂x∂y = f(x, y)

** End of Lecture 9


18.05 Lecture 10 February 25, 2005

In the continuous case: F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du.
Marginal Distributions
Given the joint distribution of (X, Y), the individual distributions of X and Y
are the marginal distributions.
Discrete (X, Y): marginal
probability function f_1(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y f(x, y)
In the table from the previous lecture, of probabilities for each point (x, y):
add up all the values with x = 1 to determine P(X = 1).
Continuous (X, Y): joint p.d.f. f(x, y); p.d.f. of X: f_1(x) = ∫ f(x, y)dy
F(x) = P(X ≤ x) = P(X ≤ x, Y ≤ ∞) = ∫_{−∞}^x ∫_{−∞}^∞ f(u, y) dy du

Review of Distribution Types
Discrete distribution for (X, Y): joint p.f. f(x, y) = P(X = x, Y = y)
Continuous: joint p.d.f. f(x, y) ≥ 0, ∫∫_{R^2} f(x, y)dxdy = 1
Joint c.d.f.: F(x, y) = P(X ≤ x, Y ≤ y)
F(x) = P(X ≤ x) = lim_{y→∞} F(x, y)

f_1(x) = ∂F(x)/∂x = ∫ f(x, y)dy. Why not integrate over the line {X = x}?

P({X = x}) = ∫(∫_x^x f(x, y)dx)dy = 0
P(a continuous random variable equals a specific point) = 0.
Example: Joint p.d.f.
f(x, y) = (21/4) x^2 y for x^2 ≤ y ≤ 1 (so −1 ≤ x ≤ 1); 0 otherwise


What is the marginal distribution of X?
p.d.f. f_1(x) = ∫_{x^2}^1 (21/4) x^2 y dy = (21/4) x^2 (y^2/2)|_{x^2}^1 = (21/8) x^2 (1 − x^4), −1 ≤ x ≤ 1
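The integration is easy to double-check symbolically (a sketch using sympy):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.Rational(21, 4) * x**2 * y

f1 = sp.integrate(f, (y, x**2, 1))   # marginal density of X
print(sp.expand(f1))                  # 21*x**2/8 - 21*x**6/8, i.e. (21/8)x^2(1 - x^4)
print(sp.integrate(f1, (x, -1, 1)))   # 1: the joint density is properly normalized
```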

Discrete values for X, Y in tabular form:

x \ y    1     2
1        0.5   0     0.5
2        0     0.5   0.5
         0.5   0.5

Note: If all four entries were 0.25, the marginal distributions would be the same; the marginals do not determine the joint distribution.
Independent X and Y:
Definition: X, Y are independent if P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)
Joint c.d.f.: F(x, y) = P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y) = F_1(x)F_2(y) (intersection of independent events)
The joint c.d.f. can be factored for independent random variables.
Implication: for continuous (X, Y) with joint p.d.f. f(x, y) and marginals f_1(x), f_2(y):
F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du = F_1(x)F_2(y) = (∫_{−∞}^x f_1(u)du)(∫_{−∞}^y f_2(v)dv)

Take ∂²/∂x∂y of both sides: f(x, y) = f_1(x)f_2(y). Independent if the joint density is a product.

Much simpler in the discrete case:


Discrete (X, Y): f(x, y) = P(X = x, Y = y) = P(X = x)P(Y = y) = f_1(x)f_2(y) by definition.
Example: Joint p.d.f.
f(x, y) = k x^2 y^2 for x^2 + y^2 ≤ 1; 0 otherwise
X and Y are not independent variables:
f(x, y) ≠ f_1(x)f_2(y) because of the circle condition (the support is not a product of sets).

Take a small square outside the circle but inside the marginal ranges:
P(square) = 0 ≠ P(X ∈ side) · P(Y ∈ side)
Example: f(x, y) = k x^2 y^2 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise.
Can be written as a product, as X and Y are independent:
f(x, y) = k x^2 y^2 I(0 ≤ x ≤ 1, 0 ≤ y ≤ 1) = k_1 x^2 I(0 ≤ x ≤ 1) · k_2 y^2 I(0 ≤ y ≤ 1)
The conditions on x and y can be separated.
Note: Indicator Notation
I(x ∈ A) = 1 if x ∈ A; 0 if x ∉ A

For the discrete case, given a table of values, you can tell independence:

         b_1    b_2    ...   b_m
a_1      p_11   p_12   ...   p_1m   p_1+
a_2      p_21   p_22   ...   p_2m   p_2+
...      ...    ...    ...   ...    ...
a_n      p_n1   p_n2   ...   p_nm   p_n+
         p_+1   p_+2   ...   p_+m

p_ij = P(X = a_i, Y = b_j) = P(X = a_i)P(Y = b_j)
p_i+ = P(X = a_i) = Σ_{j=1}^m p_ij
p_+j = P(Y = b_j) = Σ_{i=1}^n p_ij
Independence holds iff p_ij = p_i+ · p_+j for every i, j - all points in the table.
** End of Lecture 10

