1.2 Permutations
From the counting principle, we may develop the concept of the permutation. Assume we
wish to know the number of ways we can arrange something in a row. If we have n objects
to arrange, we have n options for the first place and one fewer object in each successive
place. Thus, the number of permutations of n objects is:
∏_{k=0}^{n−1} (n − k) = n!
Example 2 We have a baseball team with 9 players. The number of possible batting
orders is 9!.
Example 3 A class in probability theory consists of 6 men and 4 women. Assuming no two
students get the same score, how many rankings are possible? There are 10 students and thus
10! rankings. Now suppose the men and women are ranked only among themselves; then,
using the permutation formula and the counting principle, we arrive at (6!)(4!) possibilities.
Example 4 Now suppose that someone has 10 books broken down by subject as follows:
4 mathematics books, 3 chemistry books, 2 history books and 1 language book. If the books
are arranged such that each subject is together, how many arrangements are possible? Once
again, this requires both the counting principle and the concept of a permutation. There are
4! ways to arrange the subjects. Thus, the answer is (4!)(4!)(3!)(2!)(1!).
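These counts are easy to reproduce programmatically. The following short Python sketch (standard library only) evaluates the three examples above; the variable names are mine, not from the notes.

```python
from math import factorial

# Example 2: 9 players can bat in 9! different orders.
batting_orders = factorial(9)                      # 362880

# Example 3: 10 students give 10! rankings; ranking the 6 men and
# the 4 women only among themselves gives 6! * 4! possibilities.
all_rankings = factorial(10)                       # 3628800
separate_rankings = factorial(6) * factorial(4)    # 17280

# Example 4: order the 4 subjects, then the books within each subject.
book_arrangements = (factorial(4) * factorial(4) * factorial(3)
                     * factorial(2) * factorial(1))   # 6912

print(batting_orders, all_rankings, separate_rankings, book_arrangements)
```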
1.3 Combinations
We will now introduce the concept of a combination. Unlike a permutation, a combination
is simply a set of objects not endowed with any natural order. The number of combinations
of size r that can be made from a set of size n is denoted \binom{n}{r}. We recall that in general,
when we take order into account, r things out of a group of n can be arranged in:
n! / (n − r)!
Now, since every arrangement of the r items is counted once, and we wish to count each
set once, we can get the combination formula by dividing by r! since that is the number of
ways to arrange r items. Thus, we have the result:
\binom{n}{r} = n! / ((n − r)! r!)
These binomial coefficients are the coefficients that appear in the binomial theorem. The
multinomial coefficient generalizes them (for r = 2 it reduces to the binomial coefficient) and
is defined as follows:
\binom{n}{n_1, n_2, ..., n_r} = n! / (n_1! n_2! ... n_r!)
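As a quick illustration, Python's standard library exposes the binomial coefficient as math.comb, and the multinomial coefficient follows from factorials; the helper below is a minimal sketch, not part of the notes.

```python
from math import comb, factorial

def multinomial(n, group_sizes):
    """Multinomial coefficient n! / (n1! n2! ... nr!); assumes sum(group_sizes) == n."""
    result = factorial(n)
    for size in group_sizes:
        result //= factorial(size)
    return result

# Binomial coefficient: ways to choose 3 objects out of 10.
print(comb(10, 3))                     # 120, i.e. 10! / (7! 3!)

# Multinomial coefficient: split 10 books into groups of sizes 4, 3, 2, 1.
print(multinomial(10, [4, 3, 2, 1]))   # 12600
```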
2 Axioms of Probability
2.1 Set Theory
We begin with the concept of a sample space. The sample space is the set of all possible
outcomes of an experiment. An event is an element of the power set of the sample space, that
is, a subset of the sample space. Since events are sets, they are subject to the usual set-theoretic
laws, a few of which are listed below:
Commutativity: E ∪ F = F ∪ E and EF = F E
Associativity: (E ∪ F ) ∪ G = E ∪ (F ∪ G) and (EF )G = E(F G)
Distributivity: (E ∪ F )G = EG ∪ F G and EF ∪ G = (E ∪ G)(F ∪ G)
Finally, we have De Morgan’s Laws:
(⋃_{i=1}^{n} E_i)^c = ⋂_{i=1}^{n} E_i^c

and

(⋂_{i=1}^{n} E_i)^c = ⋃_{i=1}^{n} E_i^c
Axiom 1
0 ≤ P (E) ≤ 1
Axiom 2
P (S) = 1
Axiom 3
For any sequence of mutually exclusive events E1 , E2 , ...:
P(⋃_{i=1}^{∞} E_i) = Σ_{i=1}^{∞} P (E_i)
Proposition 1 :
P (E^c) = 1 − P (E). This follows from Axioms 2 and 3, since E and E^c are disjoint and
E ∪ E^c = S.
Proposition 2
If E ⊂ F , then P (E) ≤ P (F )
Proof : We may decompose F as F = E ∪ E^c F . These two sets are disjoint. Therefore:
P (F ) = P (E) + P (E^c F ). Since P (E^c F ) ≥ 0, the result follows.
Proposition 3 : P (E ∪ F ) = P (E) + P (F ) − P (EF )
Proof : Since E ∪ F = E ∪ E^c F and E ∩ E^c F = ∅, we appeal to Axiom 3 as follows:
P (E ∪ F ) = P (E) + P (E^c F ). Further, P (F ) = P (EF ) + P (E^c F ), so P (E^c F ) =
P (F ) − P (EF ), completing the proof.
Proposition 4 :
P (E_1 ∪ E_2 ∪ ... ∪ E_n) = Σ_{i=1}^{n} P (E_i) − Σ_{i_1 < i_2} P (E_{i_1} E_{i_2}) + ...
+ (−1)^{r+1} Σ_{i_1 < i_2 < ... < i_r} P (E_{i_1} E_{i_2} ... E_{i_r}) + ... + (−1)^{n+1} P (E_1 E_2 ... E_n)
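This inclusion-exclusion identity can be checked mechanically on a small finite sample space. The sketch below assumes a uniform probability on {0, ..., 19} and three arbitrarily chosen events; none of this setup comes from the notes.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(20))                     # uniform sample space, P({w}) = 1/20
E1 = {w for w in omega if w % 2 == 0}      # even outcomes
E2 = {w for w in omega if w % 3 == 0}      # multiples of 3
E3 = {w for w in omega if w < 7}           # outcomes less than 7

def P(event):
    return Fraction(len(event), len(omega))

events = [E1, E2, E3]
lhs = P(E1 | E2 | E3)

# Right-hand side: sum over nonempty sub-collections with alternating signs.
rhs = Fraction(0)
for r in range(1, len(events) + 1):
    for subset in combinations(events, r):
        rhs += (-1) ** (r + 1) * P(set.intersection(*subset))

print(lhs, rhs, lhs == rhs)                # the two sides agree exactly
```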
Proposition 6.1 : If {En , n ≥ 1} is either an increasing or decreasing sequence of events,
then:
lim_{n→∞} P (E_n) = P (lim_{n→∞} E_n)
Proof : Suppose first that {En } is an increasing sequence, and define events
{Fn } as follows:
F1 = E1
F_n = E_n (⋃_{i=1}^{n−1} E_i)^c = E_n E_{n−1}^c   where n > 1
where we have used the fact that ⋃_{i=1}^{n−1} E_i = E_{n−1} , since the events are increasing. In
other words, Fn consists of those outcomes in En which are not in any of the earlier Ei .
However, it is evident that:
⋃_{i=1}^{∞} F_i = ⋃_{i=1}^{∞} E_i   and   ⋃_{i=1}^{n} F_i = ⋃_{i=1}^{n} E_i

Since the F_i are mutually exclusive, Axiom 3 gives:

P(⋃_{i=1}^{∞} E_i) = P(⋃_{i=1}^{∞} F_i) = Σ_{i=1}^{∞} P (F_i)
= lim_{n→∞} Σ_{i=1}^{n} P (F_i)
= lim_{n→∞} P(⋃_{i=1}^{n} F_i)
= lim_{n→∞} P(⋃_{i=1}^{n} E_i)
= lim_{n→∞} P (E_n)
This proves the result for the increasing case. We now move to the decreasing case. If
{En } is decreasing, then {E_n^c} is increasing. Hence, using the previous equation, we may conclude that
P(⋃_{i=1}^{∞} E_i^c) = lim_{n→∞} P (E_n^c)
Since ⋃_{i=1}^{∞} E_i^c = (⋂_{i=1}^{∞} E_i)^c , it follows that
P((⋂_{i=1}^{∞} E_i)^c) = lim_{n→∞} P (E_n^c)
or equivalently:
1 − P(⋂_{i=1}^{∞} E_i) = lim_{n→∞} [1 − P (E_n)] = 1 − lim_{n→∞} P (E_n)
or:

P(⋂_{i=1}^{∞} E_i) = lim_{n→∞} P (E_n)
3 Conditional Probability and Independence
3.1 Conditional Probabilities
For events E and F with P (F ) > 0, the conditional probability of E given F is defined by
P (E|F ) = P (EF )/P (F ). Multiplying such conditional probabilities telescopes into the
multiplication rule:

P (E_1) · P (E_1E_2)/P (E_1) · P (E_1E_2E_3)/P (E_1E_2) ··· P (E_1E_2...E_n)/P (E_1E_2...E_{n−1}) = P (E_1E_2...E_n)

Conditioning on an event F and its complement gives the law of total probability:

P (E) = P (E|F ) P (F ) + P (E|F^c) P (F^c)
      = P (E|F ) P (F ) + P (E|F^c) (1 − P (F ))
We now define the odds of an event A. The odds of A are defined as P (A)/P (A^c).
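A small numerical illustration of the last two formulas, with assumed values P(F) = 0.3, P(E|F) = 0.9, and P(E|F^c) = 0.2 that are not from the notes:

```python
# Hypothetical inputs: an event F and the conditional probabilities of E.
p_F = 0.3
p_E_given_F = 0.9
p_E_given_Fc = 0.2

# Law of total probability: P(E) = P(E|F)P(F) + P(E|F^c)(1 - P(F)).
p_E = p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)
print(round(p_E, 4))                  # 0.41

# Odds of E: P(E) / P(E^c).
print(round(p_E / (1 - p_E), 4))      # 0.6949
```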
3.2 Independent Events
Two events are said to be independent provided P (EF ) = P (E) P (F ). Otherwise, the
events are dependent. Note that pairwise independence of a collection of events does not imply
that the events are mutually independent; a concrete example is sketched after the following list.
Three events E, F , and G are independent provided the following four conditions hold:
P (EF ) = P (E) P (F )
P (F G) = P (F ) P (G)
P (EG) = P (E) P (G)
P (EF G) = P (E) P (F ) P (G)
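The classic two-coin example makes the warning concrete: flip two fair coins and let E be "first flip is heads", F be "second flip is heads", and G be "the two flips match". The sketch below (my own construction, not from the notes) verifies that every pair multiplies correctly while the triple does not.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips; each outcome has probability 1/4.
omega = list(product("HT", repeat=2))

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def E(w): return w[0] == "H"        # first flip is heads
def F(w): return w[1] == "H"        # second flip is heads
def G(w): return w[0] == w[1]       # the two flips match

print(P(lambda w: E(w) and F(w)) == P(E) * P(F))   # True
print(P(lambda w: E(w) and G(w)) == P(E) * P(G))   # True
print(P(lambda w: F(w) and G(w)) == P(F) * P(G))   # True

# Mutual independence fails: P(EFG) = 1/4, but P(E)P(F)P(G) = 1/8.
print(P(lambda w: E(w) and F(w) and G(w)) == P(E) * P(F) * P(G))   # False
```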
Conditional probabilities themselves satisfy the three axioms of probability. That is, for a
fixed event F with P (F ) > 0:
1. 0 ≤ P (E|F ) ≤ 1
2. P (S|F ) = 1
3. For any sequence of mutually exclusive events E_1, E_2, ...: P(⋃_{i=1}^{∞} E_i |F ) = Σ_{i=1}^{∞} P (E_i |F )
Proof :
For the first part, the left side of the inequality is obvious. The right side holds because
EF ⊂ F implies P (EF ) ≤ P (F ). The second statement holds because P (S|F ) = P (SF )/P (F ) =
P (F )/P (F ) = 1.
The third statement follows from:
P(⋃_{i=1}^{∞} E_i |F ) = P((⋃_{i=1}^{∞} E_i) F ) / P (F )
= P(⋃_{i=1}^{∞} E_i F ) / P (F )
since
(⋃_{i=1}^{∞} E_i) F = ⋃_{i=1}^{∞} E_i F

and the events E_i F are mutually exclusive, so this equals

Σ_{i=1}^{∞} P (E_i F ) / P (F )
= Σ_{i=1}^{∞} P (E_i |F )
4 Random Variables
4.1 Discrete Random Variables
A random variable that takes on at most a countable number of values is a Discrete Random
Variable. For some discrete random variable X, we define the Probability Mass Function by:
p(a) = P (X = a). Since the variable takes on an at most countable number of values, we
may index them and:
Σ_{i=1}^{∞} p(x_i) = 1
The Expected Value of X is defined by E[X] = Σ_i x_i p(x_i). We can use this concept to
calculate the expectations of functions of random variables as well. The trick to finding the
expectation of some g(X) is to find the probability mass function of g(X). Once we have done
this, we may use the definition to determine the expectation, since a function of a random
variable is itself a random variable.
Proposition: If X is a discrete random variable that takes on the values x_i, i ≥ 1, with
respective probabilities p(x_i), then for any real-valued function g:

E[g(X)] = Σ_i g(x_i) p(x_i)
Proof : Group the terms in the sum according to the distinct values y_j taken by g(x_i):

Σ_i g(x_i) p(x_i) = Σ_j Σ_{i: g(x_i)=y_j} g(x_i) p(x_i)
= Σ_j y_j Σ_{i: g(x_i)=y_j} p(x_i)
= Σ_j y_j P [g(X) = y_j]
= E[g(X)]
4.3 Variance
The Variance of a random variable is the expected value of the variable's squared difference
from its own expected value. More succinctly:

Var(X) = E[(X − E[X])^2]
5 Jointly Distributed Random Variables
5.1 Joint Distribution Functions
For two random variables X and Y , we define the joint cumulative distribution function by:

F (a, b) = P (X ≤ a, Y ≤ b)
where a and b are real numbers. From this we can obtain the distribution of X as follows:
FX (a) = P (X ≤ a) = P (X ≤ a, Y < ∞)
= P (lim_{b→∞} {X ≤ a, Y ≤ b})
= lim_{b→∞} F (a, b)
We find the cumulative distribution function of Y similarly. These are sometimes referred
to as the Marginal Distributions of X and Y .
In the discrete case, it is convenient to define the Joint Probability Mass Function:
p(x, y) = P (X = x, Y = y)
We further say that X and Y are jointly continuous if there exists a function f (x, y) such
that:
P ((X, Y ) ∈ C) = ∫∫_{(x,y)∈C} f (x, y) dx dy
Then this function f is called the joint probability density function. It follows upon differentiation
that:

f (a, b) = ∂²F (a, b)/∂a∂b
An alternative interpretation is that f (a, b) measures how likely it is that the random
vector (X, Y ) is near (a, b). If X and Y are jointly continuous, they are individually continuous,
and we may find the probability density function of X by integrating f (x, y) with respect to y
over the range of Y .
The random variables X and Y are said to be independent if, for any two sets of real numbers
A and B:

P (X ∈ A, Y ∈ B) = P (X ∈ A)P (Y ∈ B)
It can be shown that this condition is equivalent to F (a, b) = FX (a)FY (b) for all a, b. In the
jointly continuous case, independence is equivalent to f (x, y) = fX (x)fY (y) for all x, y. If these
random variables are not independent, they are said to be dependent. Independence is a
symmetric relation.
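In the discrete case these definitions translate directly into code. The sketch below builds an assumed joint pmf (two independent rolls of a fair four-sided die, my own example), recovers the marginal pmfs by summing, and checks the factorization p(x, y) = pX(x)pY(y).

```python
from fractions import Fraction
from itertools import product

# Assumed joint pmf: two independent rolls of a fair four-sided die,
# so the joint pmf is uniform over the 16 ordered pairs.
joint = {(x, y): Fraction(1, 16) for x, y in product(range(1, 5), repeat=2)}

def marginal_X(x):
    return sum(p for (a, b), p in joint.items() if a == x)

def marginal_Y(y):
    return sum(p for (a, b), p in joint.items() if b == y)

# Independence: the joint pmf factors into the product of the marginals.
print(all(joint[(x, y)] == marginal_X(x) * marginal_Y(y) for (x, y) in joint))   # True
```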
5.2 Sums of Independent Random Variables
For independent continuous random variables X and Y with densities fX and fY :

F_{X+Y}(a) = P (X + Y ≤ a)
= ∫∫_{x+y≤a} fX (x) fY (y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{a−y} fX (x) fY (y) dx dy
= ∫_{−∞}^{∞} FX (a − y) fY (y) dy
This cumulative distribution function is called the convolution of the distributions FX
and FY . We can arrive at a pdf if we differentiate:
f_{X+Y}(a) = d/da ∫_{−∞}^{∞} FX (a − y) fY (y) dy
= ∫_{−∞}^{∞} d/da FX (a − y) fY (y) dy
= ∫_{−∞}^{∞} fX (a − y) fY (y) dy
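As an illustration, the convolution integral can be approximated numerically. The sketch below assumes X and Y are independent Uniform(0, 1) variables (a choice of mine, not from the notes) and compares the numerical convolution with the known triangular density of their sum.

```python
def f_uniform(x):
    """Density of a Uniform(0, 1) random variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_sum(a, n=4000):
    """Midpoint-rule approximation of f_{X+Y}(a) = integral of f_X(a - y) f_Y(y) dy."""
    dy = 1.0 / n
    return sum(f_uniform(a - (i + 0.5) * dy) * f_uniform((i + 0.5) * dy)
               for i in range(n)) * dy

def triangular(a):
    """Exact density of the sum of two independent Uniform(0, 1) variables."""
    if 0.0 <= a <= 1.0:
        return a
    if 1.0 < a <= 2.0:
        return 2.0 - a
    return 0.0

for a in (0.25, 0.75, 1.0, 1.5):
    print(a, round(f_sum(a), 4), triangular(a))   # the two columns agree
```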
Proposition: If X and Y are independent gamma random variables with parameters (s, λ)
and (t, λ), then X + Y is a gamma random variable with parameters (s + t, λ).
Proposition : If X_i , i = 1, . . . , n, are independent random variables with X_i ∼ N (µ_i , σ_i^2),
then Σ_{i=1}^{n} X_i is normally distributed with mean Σ_{i=1}^{n} µ_i and variance Σ_{i=1}^{n} σ_i^2 .

5.3 Conditional Distributions
Discrete Case:
The Conditional Probability Mass Function:

p_{X|Y}(x | y) = p(x, y) / p_Y (y)
Continuous Case:
The Conditional Probability Density Function:
f_{X|Y}(x | y) = f (x, y) / f_Y (y)
5.4 Exchangeable Random Variables
A set of random variables is said to be exchangeable if their joint distribution does not depend
on the order in which they are observed. That is, permuting the variables does not change
their joint distribution.
6 Properties of Expectation
Proposition: If X and Y have a joint pmf p(x, y), then:
E[g(X, Y )] = Σ_y Σ_x g(x, y) p(x, y)
If X and Y are independent, then for any functions g and h:

E[g(X)h(Y )] = E[g(X)]E[h(Y )]
Definition: The Covariance of X and Y is Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] =
E[XY ] − E[X]E[Y ]. Covariance has the following properties:
1. Cov(X, Y ) = Cov(Y, X)
2. Cov(X, X) = V ar(X)
3. Cov(aX, Y ) = aCov(X, Y )
6.3 Moment Generating Functions
The moment generating function is defined as M (t) = E[e^{tX} ]. We call it this because the
moments of X can be obtained by differentiating M and evaluating at t = 0: M^{(n)}(0) = E[X^n ].
The individual moment generating functions can be obtained from the joint moment generating
function M (t_1 , ..., t_n ) = E[e^{t_1 X_1 + ... + t_n X_n} ] by setting all other t_j values equal to 0.
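A quick numerical check of the moment property for a simple assumed case: if X is a fair six-sided die, M(t) = (1/6) Σ_{k=1}^{6} e^{tk}, and finite-difference derivatives of M at 0 should recover E[X] = 7/2 and E[X²] = 91/6. This is only an illustrative sketch.

```python
from math import exp

def M(t):
    """Moment generating function of a fair six-sided die: E[e^{tX}]."""
    return sum(exp(t * k) for k in range(1, 7)) / 6

h = 1e-4
first_moment = (M(h) - M(-h)) / (2 * h)             # approximates M'(0)  = E[X]
second_moment = (M(h) - 2 * M(0) + M(-h)) / h ** 2  # approximates M''(0) = E[X^2]

print(round(first_moment, 4))    # 3.5
print(round(second_moment, 4))   # 15.1667, i.e. 91/6
```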
6.4 Correlation
ρ(X, Y ) = Cov(X, Y ) / √(Var(X) Var(Y ))
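Covariance and correlation can be computed directly from a joint pmf. The sketch below uses an assumed joint distribution of a dependent pair (X, Y), chosen only for illustration.

```python
from fractions import Fraction
from math import sqrt

# Assumed joint pmf: X and Y each take values 0 and 1, with matching pairs inflated.
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = E(lambda x, y: x)
mean_y = E(lambda x, y: y)
cov = E(lambda x, y: (x - mean_x) * (y - mean_y))
var_x = E(lambda x, y: (x - mean_x) ** 2)
var_y = E(lambda x, y: (y - mean_y) ** 2)

print(cov)                           # 3/20
print(cov / sqrt(var_x * var_y))     # 0.6
```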
7 Limit Theorems
Proposition: Markov’s Inequality
If X is a random variable that takes only nonnegative values, then for all a > 0:

P [X ≥ a] ≤ E[X] / a
Proposition: Chebyshev’s Inequality
If X is a random variable with finite mean µ and variance σ 2 , then, for any value k > 0,
P {|X − µ| ≥ kσ} ≤ 1 / k^2
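An empirical sanity check of Chebyshev's inequality, simulating an assumed Exponential(1) random variable (so µ = σ = 1) and comparing tail frequencies with 1/k²:

```python
import random

random.seed(0)
n = 100_000
mu = sigma = 1.0        # mean and standard deviation of an Exponential(1) variable

samples = [random.expovariate(1.0) for _ in range(n)]

for k in (1.5, 2.0, 3.0):
    tail_freq = sum(1 for x in samples if abs(x - mu) >= k * sigma) / n
    print(k, round(tail_freq, 4), "<=", round(1 / k ** 2, 4))   # bound always holds
```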
Proposition: If V ar(X) = 0, then P (X = E[X]) = 1.
Theorem: The Weak Law of Large Numbers
Let {Xi } be a sequence of independent and identically distributed random variables, each
having finite expectation E[Xi ] = µ. Then, for any ε > 0,

P { |Σ_{i=1}^{n} X_i / n − µ| ≥ ε } → 0
as n → ∞.
Theorem: The Central Limit Theorem
Let X1 , X2 , . . . be a sequence of independent and identically distributed random variables,
each having mean µ and variance σ 2 . Then the distribution of:
(Σ_{k=1}^{n} X_k − nµ) / (σ√n)
tends to the standard normal distribution.
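Both limit theorems are easy to observe by simulation. The sketch below assumes Uniform(0, 1) summands (µ = 1/2, σ² = 1/12): the sample mean settles near µ, and the standardized sum falls in (−1.96, 1.96) roughly 95% of the time, as the standard normal predicts.

```python
import random
from math import sqrt

random.seed(1)
mu, sigma = 0.5, sqrt(1 / 12)      # mean and standard deviation of Uniform(0, 1)
n, trials = 1000, 2000

# Weak law of large numbers: the sample mean concentrates around mu.
sample_mean = sum(random.random() for _ in range(n)) / n
print(round(sample_mean, 3))       # close to 0.5

# Central limit theorem: (sum - n*mu) / (sigma * sqrt(n)) is roughly N(0, 1).
hits = sum(
    1 for _ in range(trials)
    if abs((sum(random.random() for _ in range(n)) - n * mu) / (sigma * sqrt(n))) < 1.96
)
print(hits / trials)               # close to 0.95
```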
7.1 Other Inequalities
Proposition: One-sided Chebyshev Inequality
If X is a random variable with mean 0 and finite variance σ 2 , then, for any a > 0,
P {X ≥ a} ≤ σ^2 / (σ^2 + a^2)
Proposition: Chernoff Bounds
If M (t) = E[e^{tX} ] is the moment generating function of X, then P {X ≥ a} ≤ e^{−ta} M (t)
for all t > 0, and P {X ≤ a} ≤ e^{−ta} M (t) for all t < 0.
8 Probabilistic Convergence
A sequence of random variables {X_n} is said to converge in probability to a random variable
X if, for every ε > 0:

lim_{n→∞} P (|X_n − X| ≥ ε) = 0