Intro To Probability Theory

CS5011: Introduction to Machine Learning
Fall 2014
Lecture 1: A - Introduction to Probability Theory

Lecturers: Nalini Deswal & Harini A
Scribes: Dibu John Philip
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.
Probability theory is the branch of mathematics that deals with uncertainty and the analysis of random
phenomena. The central objects of probability theory are random variables, stochastic processes, and events.
This lecture covers the basic concepts of probability theory.
Basic Elements
Sample Space S
The set of all possible outcomes of an experiment. For example, tossing a coin has the discrete sample space $S = \{H, T\}$, while selecting a number between 0 and 1 has the continuous sample space $S = [0, 1]$.
Event F
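A subset of the sample space that is of interest. The collection of events is denoted F, and every element $A \in F$ is a subset of S.
If A and B are two sets:
Containment: $A \subseteq B \iff (x \in A \Rightarrow x \in B)$
Equality: $A = B \iff A \subseteq B \text{ and } B \subseteq A$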
Operations on Sets
Union
$A \cup B$: the set of all elements that are in A, in B, or in both.
Intersection
$A \cap B$: the set of all elements that are in both A and B.
Complement
$A^c$: the set of all elements of S that are not in A.
See Fig.1 for the Venn diagram representation of these operations.
Figure 1: Set operations
Properties of Set Operations
Commutative: $A \cup B = B \cup A$; $A \cap B = B \cap A$
Associative: $A \cup (B \cup C) = (A \cup B) \cup C$; $A \cap (B \cap C) = (A \cap B) \cap C$
Distributive: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$; $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
De Morgan's Laws¹: $(A \cup B)^c = A^c \cap B^c$; $(A \cap B)^c = A^c \cup B^c$
¹ Augustus De Morgan was a British mathematician and logician. He formulated De Morgan's laws and introduced the term "mathematical induction", making its idea rigorous.
Disjoint Sets
Two sets A and B are said to be disjoint if they have no elements in common, i.e., $A \cap B = \emptyset$. See Fig.2.
Figure 2: Disjoint sets
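The set operations, De Morgan's laws, and disjointness can be checked directly with Python's built-in `set` type. This is a minimal sketch: the universal set S and the sets A, B, C are arbitrary illustrative choices, and `complement` is a hypothetical helper that takes complements relative to S.

```python
# S, A, B and C are arbitrary illustrative sets.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {7, 8}

def complement(X, universe):
    """Return X^c: the elements of `universe` that are not in X."""
    return universe - X

union = A | B          # A ∪ B = {1, 2, 3, 4, 5, 6}
intersection = A & B   # A ∩ B = {3, 4}

# De Morgan's laws hold for these sets.
assert complement(A | B, S) == complement(A, S) & complement(B, S)
assert complement(A & B, S) == complement(A, S) | complement(B, S)

# Disjoint sets have an empty intersection.
print(A.isdisjoint(C), A & C == set())  # True True
print(A.isdisjoint(B))                  # False: 3 and 4 are shared
```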
Partitioning
Dividing a sample space S into sets that have nothing in common with each other and together make up S, i.e., $A_i \cap A_j = \emptyset$ for all $i \neq j$, and $\bigcup_{i=1}^{n} A_i = S$. See Fig.3.
Figure 3: Partitioning of Sample Space
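A sketch of the partition test for finite sets: `is_partition` (a hypothetical helper) checks pairwise disjointness and coverage of S. The last example shows why the condition must be pairwise: the three sets have an empty common intersection and cover S, yet they overlap and so are not a partition.

```python
from itertools import combinations

def is_partition(parts, S):
    """Check that the sets in `parts` are pairwise disjoint and cover S."""
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    covers_S = set().union(*parts) == S
    return pairwise_disjoint and covers_S

S = {1, 2, 3, 4, 5, 6}  # outcomes of a die roll
print(is_partition([{1, 3, 5}, {2, 4, 6}], S))     # True: odd / even
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], S))  # False: 3 is shared
print(is_partition([{1, 2}, {3, 4}], S))           # False: 5 and 6 are not covered

# Empty common intersection and full coverage, but pairwise overlaps.
overlapping = [{1, 2}, {2, 3}, {1, 3, 4, 5, 6}]
print(set.intersection(*overlapping) == set())  # True
print(is_partition(overlapping, S))             # False
```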
Sigma Algebra
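A sigma algebra B is a collection of subsets of S that satisfies the following properties:
1. $\emptyset \in B$
2. $A \in B \Rightarrow A^c \in B$
3. $A_1, A_2, \ldots \in B \Rightarrow \bigcup_{i=1}^{\infty} A_i \in B$
Example: $S = \{1, 2\}$, $B = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$ (the power set of S, $\mathcal{P}(S)$).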
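For a finite sample space the sigma algebra properties can be checked by brute force. In this sketch `power_set` and `is_sigma_algebra` are illustrative helpers; because the collections are finite, closure under pairwise unions stands in for closure under countable unions. The second call shows that a sigma algebra need not contain every subset: $\{\emptyset, S\}$ also qualifies.

```python
from itertools import chain, combinations

def power_set(S):
    """All subsets of S, as frozensets (hashable, so they can live in a set)."""
    items = list(S)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

def is_sigma_algebra(B, S):
    """Check the three properties for a finite collection B of subsets of S.

    In a finite collection, any countable union is a finite union, and a
    finite union can be built by repeated pairwise unions, so checking
    pairs is enough here.
    """
    S = frozenset(S)
    B = {frozenset(A) for A in B}
    has_empty = frozenset() in B
    closed_complement = all(S - A in B for A in B)
    closed_union = all(A | C in B for A in B for C in B)
    return has_empty and closed_complement and closed_union

S = {1, 2}
print(is_sigma_algebra(power_set(S), S))          # True
print(is_sigma_algebra([set(), {1, 2}], S))       # True: the trivial sigma algebra
print(is_sigma_algebra([set(), {1}, {1, 2}], S))  # False: {1}^c = {2} is missing
```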
Probability Function
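A probability function $P: F \to \mathbb{R}$ is a function on the events that satisfies the following properties, called the Axioms of Probability:
1. $P(A) \ge 0$ for every event A
2. $P(S) = 1$
3. For pairwise disjoint events $A_1, A_2, \ldots$ (finitely or countably many), $P\left(\bigcup_{i} A_i\right) = \sum_{i} P(A_i)$
Example: Tossing a fair coin; $S = \{H, T\}$, and $B = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}$ is the domain over which the probability is defined.
$P(H) = \frac{1}{2}$; $P(T) = \frac{1}{2}$;
$P(H \text{ or } T) = 1$; $P(\emptyset) = 0$.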
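The coin example can be encoded as a probability function on a finite sample space by assigning a weight to each outcome and summing the weights over an event. A minimal sketch, using `fractions.Fraction` for exact arithmetic; `weights` and `P` are illustrative names, and the additivity check only covers pairs of disjoint events.

```python
from fractions import Fraction
from itertools import chain, combinations

# Outcome probabilities for a fair coin (from the example above).
weights = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
S = frozenset(weights)

def P(event):
    """Probability of an event (a subset of S): the sum of its outcome weights."""
    return sum((weights[s] for s in event), Fraction(0))

# The sigma algebra B: every subset of S.
B = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))]

# Axiom 1: non-negativity.  Axiom 2: P(S) = 1.
assert all(P(A) >= 0 for A in B)
assert P(S) == 1
# Axiom 3 (checked for pairs): additivity over disjoint events.
for A in B:
    for C in B:
        if A.isdisjoint(C):
            assert P(A | C) == P(A) + P(C)

print(P({"H"}), P({"T"}), P(S), P(set()))  # 1/2 1/2 1 0
```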
Properties of Probability
1. $P(A) \le 1$
2. $P(A^c) = 1 - P(A)$
3. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
From (1) and (3), $P(A \cup B) \le 1$, so
$$P(A) + P(B) - P(A \cap B) \le 1$$
$$P(A \cap B) \ge P(A) + P(B) - 1$$
This is called Bonferroni's² inequality; it gives a lower bound on the probability of the intersection of two events. However, when P(A) and P(B) are small enough that $P(A) + P(B) < 1$, the right-hand side is negative, which makes the bound trivial. In other words, for two rare events this inequality does not help much. The inequality is useful for bounding $P(A \cap B)$ when it is difficult to calculate directly. Generalizing to n events,
$$P\left(\bigcap_{i=1}^{n} A_i\right) \ge \sum_{i=1}^{n} P(A_i) - (n - 1)$$
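Property 3 and Bonferroni's inequality can be checked numerically on a fair die. This sketch assumes equally likely outcomes; the events A, B, C, D and the helper `bonferroni_lower_bound` are illustrative choices, not part of the notes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # a fair die

def P(event):
    """Equally likely outcomes: |event ∩ S| / |S|."""
    return Fraction(len(event & S), len(S))

A = {1, 2, 3, 4}  # "at most 4"
B = {2, 4, 6}     # "even"

# Property 3: inclusion-exclusion.
assert P(A | B) == P(A) + P(B) - P(A & B)

# Bonferroni: P(A ∩ B) >= P(A) + P(B) - 1.
print(P(A & B), P(A) + P(B) - 1)  # 1/3 1/6

# Rare events make the bound trivial (negative right-hand side).
C, D = {1}, {2}
print(P(C) + P(D) - 1)  # -2/3

def bonferroni_lower_bound(events):
    """Generalized bound: P(∩ A_i) >= sum_i P(A_i) - (n - 1)."""
    return sum(P(E) for E in events) - (len(events) - 1)

events = [A, B, {1, 2, 3, 4, 5}]
print(P(set.intersection(*events)), bonferroni_lower_bound(events))  # 1/3 0
```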
Conditional Probability
The conditional probability of an event A given an event B with $P(B) > 0$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Fig.4 explains this with Venn diagrams.
Example: Roll of a single die, $S = \{1, 2, 3, 4, 5, 6\}$. Let A be the event of getting 1, and let B be the event that the die shows an odd number. Then $P(A) = P(1) = \frac{1}{6}$. Adding the knowledge that the die shows an odd number increases our confidence that 1 has come up:
$$P(A \mid B) = P(1 \mid \text{odd}) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$$
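The die example can be reproduced by counting outcomes, assuming a fair die so that every outcome has probability 1/6. `P` and `P_given` are hypothetical helper names.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # a fair die

def P(event):
    """Equally likely outcomes: |event ∩ S| / |S|."""
    return Fraction(len(event & S), len(S))

def P_given(A, B):
    """P(A | B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    if P(B) == 0:
        raise ValueError("P(B) must be positive")
    return P(A & B) / P(B)

A = {1}        # the die shows 1
B = {1, 3, 5}  # the die shows an odd number
print(P(A), P_given(A, B))  # 1/6 1/3
```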
Figure 4: Conditional Probability
² Carlo Emilio Bonferroni's first interests were in music, and he studied conducting and the piano at the Music Conservatory of Turin. However, his interests turned towards mathematics, encouraged by his father.
Independence
Two events A and B are independent if the occurrence of one has no effect on the probability of occurrence of the other:
$$P(A \cap B) = P(A)P(B)$$
Equivalently, when $P(B) > 0$,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = P(A)$$
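Independence can be tested by comparing $P(A \cap B)$ with $P(A)P(B)$. A small sketch on a fair die; the events `even` and `low` are illustrative choices, and the second call reuses the conditional-probability example, where the events turn out to be dependent.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # a fair die

def P(event):
    """Equally likely outcomes: |event ∩ S| / |S|."""
    return Fraction(len(event & S), len(S))

def independent(A, B):
    """A and B are independent iff P(A ∩ B) = P(A) P(B)."""
    return P(A & B) == P(A) * P(B)

even = {2, 4, 6}
low = {1, 2}
print(independent(even, low))       # True: 1/6 == 1/2 * 1/3
print(independent({1}, {1, 3, 5}))  # False: knowing "odd" changes P(1)
```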
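Bayes' Theorem³
From the definition of conditional probability,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad\Rightarrow\quad P(A \cap B) = P(A \mid B)P(B) \qquad (1)$$
Also,
$$P(A \cap B) = P(B \mid A)P(A) \qquad (2)$$
From eqns. (1) and (2),
$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$$
where P(A|B) is called the posterior probability, P(B|A) the likelihood, P(A) the prior, and P(B) the evidence.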
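Bayes' theorem can be checked against the die example: with A = "die shows 1" and B = "die shows an odd number", $P(B \mid A) = 1$, $P(A) = 1/6$ and $P(B) = 1/2$, which should give back $P(A \mid B) = 1/3$. The `bayes` helper is a hypothetical name used only for this sketch.

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Posterior P(A | B) = P(B | A) P(A) / P(B)."""
    return likelihood * prior / evidence

# Die example: A = "die shows 1", B = "die shows an odd number".
p_B_given_A = Fraction(1)  # likelihood: a 1 is always odd
p_A = Fraction(1, 6)       # prior
p_B = Fraction(1, 2)       # evidence
print(bayes(p_B_given_A, p_A, p_B))  # 1/3, matching P(A | B) computed directly
```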
³ Thomas Bayes never published what would eventually become his most famous accomplishment; his notes were edited and published after his death by Richard Price.
Random Variables
A random variable is a function $X: S \to \mathbb{R}$ that maps the sample space S to the real numbers. See Fig.5.
Example: Tossing a fair coin three times, $S = \{HHH, HHT, HTT, TTT, TTH, THH, THT, HTH\}$. Let X be the number of heads in the experiment, so X takes the values 0, 1, 2, 3, with
$$P(X = x_i) = P(\{s_j \in S : X(s_j) = x_i\})$$
$P(X = 0) = \frac{1}{8}$; $P(X = 1) = \frac{3}{8}$; $P(X = 2) = \frac{3}{8}$; $P(X = 3) = \frac{1}{8}$.
Figure 5: Random Variable
Another example: $S = [-1, 1]$, where the outcome a is a value between -1 and 1. Let $X(a) = a^2$; then $X: S \to [0, 1] \subset \mathbb{R}$ is a continuous random variable.
Probability Mass Function (pmf): discrete random variable
Gives the probability of each numerical value that the random variable can take:
$$p_X(x_i) = P(X = x_i) \quad \text{for all } x_i$$
With respect to the previous 3-coin tossing example,
$$p_X(x) = \begin{cases} \frac{1}{8}, & \text{if } x = 0 \text{ or } 3 \\ \frac{3}{8}, & \text{if } x = 1 \text{ or } 2 \end{cases}$$
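The pmf of the 3-coin tossing example can be computed by enumerating the sample space and applying the random variable to each outcome. A sketch assuming a fair coin, so all 8 outcomes are equally likely; `X` and `pmf` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three fair-coin tosses: 8 equally likely outcomes.
S = ["".join(t) for t in product("HT", repeat=3)]

def X(outcome):
    """The random variable: maps an outcome to its number of heads."""
    return outcome.count("H")

# pmf: p_X(x) = P({s in S : X(s) = x})
counts = Counter(X(s) for s in S)
pmf = {x: Fraction(c, len(S)) for x, c in sorted(counts.items())}
assert sum(pmf.values()) == 1

# Prints one line per value: 0 1/8, 1 3/8, 2 3/8, 3 1/8
for x, p in pmf.items():
    print(x, p)
```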
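Probability Density Function (pdf): continuous random variable
$f_X(x)$ is a non-negative (usually continuous) function with $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, such that
$$P(X \in B) = \int_B f_X(x)\,dx, \qquad P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
The probability of any single point is zero in the continuous case (check by putting a = b in the above equation).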
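Interval probabilities for a continuous random variable can be approximated by numerical integration. This sketch uses the midpoint rule and a hypothetical density $f_X(x) = 2x$ on [0, 1], chosen only because its integrals are easy to check by hand ($\int_{0.5}^{1} 2x\,dx = 0.75$). With a = b the interval has zero width, so the estimate is exactly 0.

```python
def f_X(x):
    """Hypothetical pdf for illustration: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def prob_between(f, a, b, n=10_000):
    """Approximate P(a <= X <= b) = ∫_a^b f(x) dx with the midpoint rule."""
    h = (b - a) / n  # width of each sub-interval (0 when a == b)
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(round(prob_between(f_X, 0, 1), 6))    # 1.0: total probability
print(round(prob_between(f_X, 0.5, 1), 6))  # 0.75
print(prob_between(f_X, 0.3, 0.3))          # 0.0: point probability
```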
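Cumulative Distribution Function
$$F_X(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i) \quad \text{(discrete)}$$
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt \quad \text{(continuous)}$$
Example: Again from the previous 3-coin tossing example,
$$F_X(x) = \begin{cases}
0, & x < 0 \\
\frac{1}{8}, & 0 \le x < 1 \\
\frac{1}{8} + \frac{3}{8} = \frac{1}{2}, & 1 \le x < 2 \\
\frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}, & 2 \le x < 3 \\
1, & x \ge 3
\end{cases}$$
Figure 6: cdf of X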
Fig.6 shows the cdf of X. The cdf has the following properties:
1. Every cumulative distribution function F is non-decreasing and right-continuous;
2. $\lim_{x \to -\infty} F_X(x) = 0$;
3. $\lim_{x \to \infty} F_X(x) = 1$.
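The step-shaped cdf of the 3-coin example can be computed by summing the pmf. A minimal sketch; `cdf` and `grid` are illustrative names, and the pmf values are the ones derived in the notes. Evaluating exactly at a jump (for example x = 1) returns the upper value, which is what right-continuity means for a step function.

```python
from fractions import Fraction

# pmf of the number of heads in three fair-coin tosses (from the example above)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    """F_X(x) = P(X <= x): sum of p_X(x_i) over all x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

grid = [-1, 0, 0.5, 1, 2, 2.99, 3, 10]
# Non-decreasing on an increasing grid of points.
assert all(cdf(a) <= cdf(b) for a, b in zip(grid, grid[1:]))

print(cdf(0.5), cdf(1), cdf(2.99), cdf(3))  # 1/8 1/2 7/8 1
```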
Functions of Random Variables
Y = g(X) is a function of the random variable X; Y is also a random variable.

e.g. X: temperature in °C; Y = 1.8X + 32: the same temperature in °F.
Expected Value / Mean
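The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take, each value weighted by its probability.
Let X be a random variable taking values in a set $\mathcal{X}$, and let g(X) be a function of X. Then the expected value of g(X) is given as
$$E[g(X)] = \sum_{x \in \mathcal{X}} g(x) P(X = x)$$
For continuous random variables,
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$$
The mean is the special case $g(x) = x$: $\mu = E[X]$.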
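Example: Toss a biased coin twice, where the probability of heads is $p = \frac{3}{4}$. Let X be the number of heads:
$$\begin{array}{c|ccc} x & 0 & 1 & 2 \\ \hline P(X = x) & \left(\frac{1}{4}\right)^2 & 2 \cdot \frac{3}{4} \cdot \frac{1}{4} & \left(\frac{3}{4}\right)^2 \end{array}$$
$$\mu = E[X] = 0 \cdot \left(\frac{1}{4}\right)^2 + 1 \cdot 2 \cdot \frac{3}{4} \cdot \frac{1}{4} + 2 \cdot \left(\frac{3}{4}\right)^2 = 0 + \frac{6}{16} + \frac{18}{16} = \frac{3}{2}$$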
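The expectation in the biased-coin example can be computed exactly with `fractions.Fraction`. In this sketch `expectation` is a hypothetical helper that takes any function g; the second call applies it to the temperature conversion $g(x) = \frac{9}{5}x + 32$ (the same map as 1.8X + 32 above), whose expectation equals $\frac{9}{5}E[X] + 32$.

```python
from fractions import Fraction

p = Fraction(3, 4)  # probability of heads for the biased coin
# pmf of X = number of heads in two tosses
pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

def expectation(g, pmf):
    """E[g(X)] = sum over x of g(x) P(X = x), for a discrete X."""
    return sum(g(x) * px for x, px in pmf.items())

mu = expectation(lambda x: x, pmf)
print(mu)  # 3/2

# Linearity: E[(9/5) X + 32] = (9/5) E[X] + 32
print(expectation(lambda x: Fraction(9, 5) * x + 32, pmf))  # 347/10
```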
Moment of a Random Variable
For an integer n, the nth moment is
$$\mu'_n = E[X^n]$$
When n = 1,
$$\mu'_1 = E[X] = \mu$$
where $\mu$ is the mean or expected value of the random variable. That is, the mean is the first moment.
Central Moment
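The nth central moment is
$$\mu_n = E[(X - \mu)^n]$$
When n = 2,
$$\text{Variance} = \sigma^2 = E[(X - \mu)^2] = E[X^2 + \mu^2 - 2\mu X] = E[X^2] + \mu^2 - 2\mu E[X] = E[X^2] - (E[X])^2$$
using $E[X] = \mu$ in the last step.
Standard deviation $= \sigma = \sqrt{\mathrm{Var}(X)}$
The variance of a random variable X measures the spread, or variability, of its distribution. Variance is the second central moment.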
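Both forms of the variance can be compared on the biased-coin example. A sketch using exact fractions; `E` is an illustrative helper, and both expressions evaluate to 3/8 here.

```python
from fractions import Fraction
from math import sqrt

p = Fraction(3, 4)  # probability of heads
pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}  # X = number of heads

def E(g):
    """E[g(X)] for the discrete pmf above."""
    return sum(g(x) * px for x, px in pmf.items())

mu = E(lambda x: x)
var_definition = E(lambda x: (x - mu) ** 2)   # E[(X - mu)^2]
var_shortcut = E(lambda x: x ** 2) - mu ** 2  # E[X^2] - (E[X])^2
sigma = sqrt(var_definition)                  # standard deviation (float)

print(var_definition, var_shortcut)  # 3/8 3/8
```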
Some Properties of Mean and Variance
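1. $E[a\,g_1(X) + b\,g_2(X)] = a\,E[g_1(X)] + b\,E[g_2(X)]$, where a and b are scalars.
2. $\mathrm{Var}(a\,g(X)) = a^2\,\mathrm{Var}(g(X))$
3. $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$, where $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.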
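The three properties can be verified exactly on the joint sample space of two fair dice. In this sketch X is the first die and Y is the sum of both dice, an illustrative pair of dependent variables chosen so that the covariance term is non-zero; `E`, `var`, `cov` and the scalars a, b are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Joint sample space of two fair dice; each of the 36 outcomes has probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))

def E(h):
    """Expectation of h(d1, d2) over the joint sample space."""
    return sum(h(d1, d2) * prob for d1, d2 in outcomes)

def var(h):
    m = E(h)
    return E(lambda d1, d2: (h(d1, d2) - m) ** 2)

def cov(h1, h2):
    m1, m2 = E(h1), E(h2)
    return E(lambda d1, d2: (h1(d1, d2) - m1) * (h2(d1, d2) - m2))

def X(d1, d2):
    return d1       # value of the first die

def Y(d1, d2):
    return d1 + d2  # sum of both dice (depends on X)

def X_plus_Y(d1, d2):
    return X(d1, d2) + Y(d1, d2)

a, b = 2, 3
# Property 1: linearity of expectation.
assert E(lambda d1, d2: a * X(d1, d2) + b * Y(d1, d2)) == a * E(X) + b * E(Y)
# Property 2: scaling the variable scales the variance by a^2.
assert var(lambda d1, d2: a * X(d1, d2)) == a ** 2 * var(X)
# Property 3: variance of a sum includes the covariance term.
assert var(X_plus_Y) == var(X) + var(Y) + 2 * cov(X, Y)

print(var(X), var(Y), cov(X, Y), var(X_plus_Y))  # 35/12 35/6 35/12 175/12
```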
Next topic
The next section of this lecture will introduce the different probability distributions.
