Fall 2014
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.
Probability theory is the branch of mathematics that deals with uncertainty and the analysis of random
phenomena. The central objects of probability theory are random variables, stochastic processes, and events.
In this class, the basic concepts of probability theory are discussed.
Basic Elements
Sample Space S
All the possible outcomes of an experiment, e.g. tossing a coin has the sample space S = {H, T} (a discrete sample space); selecting a number between 0 and 1 has the sample space S = [0, 1] (a continuous sample space).
Event F
A subset of the sample space which is of interest. Events A ∈ F are subsets of S.
If A and B are two sets,
A ⊂ B ⇔ (x ∈ A ⇒ x ∈ B);
equality: A = B ⇔ A ⊂ B and B ⊂ A.
Operations on Sets
Union
A ∪ B: the set of all elements that are in A or in B or in both A and B.
Intersection
A ∩ B: the set of all elements that are in both A and B.
Complement
A^c: the set of all elements that are not in A.
Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
De Morgan's Law1 : (A ∪ B)^c = A^c ∩ B^c
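These operations can be illustrated with Python's built-in set type; a minimal sketch, where the universe U and the sets A and B are illustrative choices rather than examples from the notes:

```python
# Union, intersection, complement, and De Morgan's law with Python sets.
# U is a small universal set chosen purely for illustration.
U = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {3, 4, 5}

union = A | B            # elements in A, in B, or in both
intersection = A & B     # elements in both A and B
A_c = U - A              # complement of A relative to U

# De Morgan's law: (A ∪ B)^c = A^c ∩ B^c
lhs = U - (A | B)
rhs = (U - A) & (U - B)
assert lhs == rhs
print(union, intersection, A_c, lhs)
```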
Disjoint Sets
Two sets A and B are said to be disjoint if there are no elements common to both A and B; i.e., A ∩ B = ∅. See Fig.2.
1 Augustus De Morgan was a British mathematician and logician. He formulated De Morgan's laws and introduced the term mathematical induction, making its idea rigorous.
Partitioning
Dividing a sample space S into different sets such that there is nothing in common between the sets.
i.e., Ai ∩ Aj = ∅ for all i ≠ j, and A1 ∪ A2 ∪ . . . ∪ An = S. See Fig.3.
Sigma Algebra
A sigma algebra B is a collection of subsets of S which satisfies the following properties:
1. ∅ ∈ B;
2. A ∈ B ⇒ A^c ∈ B;
3. A1 , A2 , . . . ∈ B ⇒ A1 ∪ A2 ∪ . . . ∈ B (closure under countable unions).
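For a finite sample space, the power set of S satisfies all three properties; a minimal sketch checking this for S = {H, T} (the helper `power_set` is an illustrative name, not from the notes):

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, represented as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

S = frozenset({"H", "T"})
B = power_set(S)  # the four subsets: {}, {H}, {T}, {H, T}

# 1. The empty set is in B.
assert frozenset() in B
# 2. B is closed under complement.
assert all(S - A in B for A in B)
# 3. B is closed under (finite) union.
assert all(A1 | A2 in B for A1 in B for A2 in B)
print(len(B))
```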
Probability Function
A probability function P assigns to every event A ∈ B a number P(A) such that P(A) ≥ 0, P(S) = 1, and, for pairwise disjoint events A1 , A2 , . . . , An ∈ B,
P(A1 ∪ A2 ∪ . . . ∪ An ) = P(A1 ) + P(A2 ) + . . . + P(An ).
Example: Tossing of a coin; S = {H, T}; B = {{H}, {T}, {H, T}, ∅} is the domain over which probability is defined.
P(H) = 1/2; P(T) = 1/2; P(H or T) = 1; P(∅) = 0.
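The coin-toss example can be sketched in code; representing P as a dictionary over the events of B is an implementation choice, not something from the notes:

```python
# Probability function for a fair coin on the sigma algebra
# B = {∅, {H}, {T}, {H, T}}, with each event stored as a frozenset.
P = {
    frozenset(): 0.0,
    frozenset({"H"}): 0.5,
    frozenset({"T"}): 0.5,
    frozenset({"H", "T"}): 1.0,
}

# Additivity on the disjoint events {H} and {T}:
assert P[frozenset({"H"})] + P[frozenset({"T"})] == P[frozenset({"H", "T"})]
print(P[frozenset({"H", "T"})])  # P(H or T) = 1.0
```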
Properties of Probability
1. P(A) ≤ 1
2. P(A^c) = 1 − P(A)
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which implies

P(A ∩ B) ≥ P(A) + P(B) − 1.

This is called Bonferroni's2 inequality, which gives a lower bound on the probability of the intersection of two sets. However, when P(A) and P(B) are small, the RHS will be negative, which makes it a trivial bound; in other words, when you have two rare events, this inequality does not help you much. This inequality can also be used to approximate P(A ∩ B) when it is difficult to calculate directly. Generalizing,

P(A1 ∩ A2 ∩ . . . ∩ An ) ≥ P(A1 ) + P(A2 ) + . . . + P(An ) − (n − 1).
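A quick numerical look at the bound, with made-up probability values chosen only for illustration:

```python
# Bonferroni's inequality: P(A ∩ B) >= P(A) + P(B) - 1.
# The probabilities below are illustrative, not from the notes.
p_A, p_B = 0.9, 0.85
lower_bound = p_A + p_B - 1   # any consistent P(A ∩ B) is at least this
# P(A ∩ B) can be at most min(p_A, p_B), so it lies in [0.75, 0.85].

# For rare events the bound is trivial: the RHS goes negative,
# and "P(A ∩ B) >= negative number" carries no information.
trivial = 0.01 + 0.02 - 1
print(lower_bound, trivial)
```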
Conditional Probability
For events A and B with P(B) > 0, the conditional probability of A given B is P(A | B) = P(A ∩ B)/P(B).
Independence
The occurrence of one event has no effect on the occurrence of the other event.
2 Carlo Emilio Bonferroni's first interests were in music, and he studied conducting and the piano at the Music Conservatory of Turin. However, his interests turned towards mathematics, encouraged by his father.
P(A ∩ B) = P(A)P(B);
P(A | B) = P(A ∩ B)/P(B) = P(A).
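Independence can be verified by enumeration; a sketch with two fair dice, an example chosen here for illustration rather than taken from the notes:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs from two fair dice; each outcome has
# probability 1/36 under the uniform model.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a set of outcomes)."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] % 2 == 0}   # first die shows even
B = {o for o in outcomes if o[1] == 6}       # second die shows 6

# A and B are independent: P(A ∩ B) = P(A) P(B)
assert prob(A & B) == prob(A) * prob(B)
# Equivalently, P(A | B) = P(A ∩ B) / P(B) = P(A)
assert prob(A & B) / prob(B) == prob(A)
print(prob(A), prob(B), prob(A & B))
```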
Bayes Theorem
P(A | B) = P(A ∩ B)/P(B)

P(A ∩ B) = P(A|B)P(B)     (1)
P(A ∩ B) = P(B|A)P(A)     (2)

From eqns. (1) and (2),

P(A|B) = P(B|A)P(A)/P(B),
where P(A|B) is called the posterior probability, P(B|A) the likelihood, P(A) the prior, and P(B) the
evidence.
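A sketch of Bayes' theorem in action. The diagnostic-test numbers below are illustrative assumptions, and the evidence P(B) is expanded via the law of total probability:

```python
# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
# Illustrative setup: A = "has condition", B = "test is positive".
p_A = 0.01                 # prior P(A)
p_B_given_A = 0.95         # likelihood P(B|A): true positive rate
p_B_given_not_A = 0.05     # false positive rate (assumed value)

# Evidence by the law of total probability:
# P(B) = P(B|A) P(A) + P(B|A^c) P(A^c)
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

posterior = p_B_given_A * p_A / p_B
print(round(posterior, 4))
```

Despite the accurate test, the posterior stays small because the prior is small, which is exactly the rare-event effect the Bonferroni discussion hints at.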
Random Variables
Example: toss a fair coin three times and let X be the number of heads; then
P(X = 0) = 1/8; P(X = 1) = 3/8; P(X = 2) = 3/8; P(X = 3) = 1/8.
3 Thomas Bayes never published what would eventually become his most famous accomplishment; his notes were edited and
published after his death by Richard Price.
Probability Mass Function
Gives the probability of each numerical value that the random variable can take:
pX (xi ) = P(X = xi ) for every value xi of X.
With respect to the previous 3-coin tossing example,
pX (x) = 1/8, if x = 0 or 3,
         3/8, if x = 1 or 2.
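The pmf above can be recovered by enumerating the eight equally likely outcomes of three fair coin tosses:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# X = number of heads in three fair coin tosses; 8 equally likely outcomes.
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)

pmf = {x: Fraction(n, len(outcomes)) for x, n in counts.items()}
# p(0) = p(3) = 1/8 and p(1) = p(2) = 3/8, matching the piecewise pmf.
print(sorted(pmf.items()))
```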
Probability Density Function
For a continuous random variable X with density fX :
P(X ∈ B) = ∫_B fX (x) dx;
P(a ≤ X ≤ b) = ∫_a^b fX (x) dx.
Point probability is zero in the continuous case (check by setting a = b in the above equation).
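A minimal numerical sketch of these integrals, assuming the uniform density on [0, 1] (an illustrative choice): there P(a ≤ X ≤ b) = b − a, and setting a = b gives zero.

```python
# Numerical integration of a density (uniform on [0, 1]) with the
# midpoint rule; illustrates P(a <= X <= b) and zero point probability.
def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def prob(a, b, n=10_000):
    """Approximate the integral of f from a to b by the midpoint rule."""
    if a >= b:
        return 0.0
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(prob(0.2, 0.7))   # approximately 0.5
print(prob(0.3, 0.3))   # exactly 0: the point probability vanishes
```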
Figure 6: cdf of X
Cumulative Distribution Function
FX (x) = P(X ≤ x) = Σ_{xi ≤ x} P(X = xi ) in the discrete case;
FX (x) = ∫_{−∞}^x fX (t) dt in the continuous case.
For the 3-coin tossing example,

FX (x) = 0,                  x < 0,
         1/8,                0 ≤ x < 1,
         1/8 + 3/8 = 1/2,    1 ≤ x < 2,
         1/2 + 3/8 = 7/8,    2 ≤ x < 3,
         1,                  x ≥ 3.
Fig.6 shows the cdf of X. The cdf has the following properties:
1. Every cumulative distribution function F is non-decreasing and right-continuous;
2. lim_{x→−∞} FX (x) = 0;
3. lim_{x→+∞} FX (x) = 1.
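The cdf of the 3-coin example can be built from its pmf by accumulating probabilities; a minimal sketch:

```python
from fractions import Fraction

# pmf of X = number of heads in three fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
       2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    """F_X(x) = P(X <= x): sum the pmf over all values xi <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

assert cdf(-1) == 0              # below the support: F -> 0
assert cdf(0) == Fraction(1, 8)
assert cdf(1) == Fraction(1, 2)
assert cdf(2) == Fraction(7, 8)
assert cdf(5) == 1               # above the support: F -> 1
print([cdf(x) for x in range(4)])
```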
Expectation
The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take.
Let g(X) be a function of the random variable X. Then the expected value of g(X) is
E[g(X)] = Σ_x g(x) P(X = x) in the discrete case;
E[g(X)] = ∫ g(x) fX (x) dx in the continuous case,
where the sum and the integral run over the possible values x of X.
xX
Example: Toss a biased coin twice. Let the probability of it showing up heads by p = 34 .
17
P(X=x)
1
4
2. 43 . 14
( 34 )2
3
2
18
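The expectation in this example can be computed directly from the pmf; a minimal sketch:

```python
from fractions import Fraction

p = Fraction(3, 4)  # probability of heads
# pmf of X = number of heads in two independent tosses
pmf = {0: (1 - p) ** 2,
       1: 2 * p * (1 - p),
       2: p ** 2}

# E[X] = sum over x of x * P(X = x)
mean = sum(x * px for x, px in pmf.items())
print(mean)  # 3/2
```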
Central Moment
μn := E[(X − μ)^n ]
When n = 2,
Variance = σ^2 = E[(X − μ)^2 ]
              = E[X^2 + μ^2 − 2μX]
              = E[X^2 ] − (E[X])^2
Standard deviation σ = √Variance.
The variance of a discrete random variable X measures the spread, or variability, of the distribution. Variance
is the second central moment.
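The identity Var(X) = E[X^2] − (E[X])^2 can be checked exactly on the two-toss example above (p = 3/4):

```python
from fractions import Fraction

p = Fraction(3, 4)
# pmf of X = number of heads in two tosses of the biased coin
pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

def expect(g):
    """E[g(X)] for the discrete pmf above."""
    return sum(g(x) * px for x, px in pmf.items())

mu = expect(lambda x: x)                       # 3/2
var_def = expect(lambda x: (x - mu) ** 2)      # second central moment
var_id = expect(lambda x: x ** 2) - mu ** 2    # E[X^2] - (E[X])^2

assert var_def == var_id
print(var_def)  # 3/8
```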
Properties of Expectation and Variance
1. E[a · g1 (X) + b · g2 (X)] = a E[g1 (X)] + b E[g2 (X)], where a and b are scalars;
2. Var(a · g(X)) = a^2 Var(g(X));
3. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
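These properties can be sanity-checked exactly; a sketch using two independent fair dice as an illustrative joint distribution (independence makes Cov(X, Y) = 0):

```python
from fractions import Fraction
from itertools import product

# Joint pmf of two independent fair dice: each pair has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def E(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov = E(lambda x, y: (x - mx) * (y - my))

# Property 3: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
m_sum = mx + my
var_sum = E(lambda x, y: (x + y - m_sum) ** 2)
assert var_sum == var_x + var_y + 2 * cov

# Property 2: Var(aX) = a^2 Var(X), with a = 3
a = 3
var_ax = E(lambda x, y: (a * x - a * mx) ** 2)
assert var_ax == a ** 2 * var_x
print(var_x, cov, var_sum)
```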
Next Topic
The next lecture will introduce the different probability distributions.