1.2 Permutations
From the counting principle, we may develop the concept of the permutation. Assume we
wish to know the number of ways we can arrange something in a row. If we have n objects
to arrange, we have n options for the first place and one fewer object in each successive
place. Thus, the number of permutations of n objects is:
∏_{k=0}^{n−1} (n − k) = n!
Example 2 We have a baseball team with 9 players. The number of possible batting
orders is 9!.
Example 3 A class in probability theory consists of 6 men and 4 women. Assuming no two
students get the same score, how many rankings are possible? There are 10 students and thus
10! rankings. Now suppose the men and women are ranked only among themselves; then,
using the permutation formula and the counting principle, we arrive at (6!)(4!) possibilities.
Example 4 Now suppose that someone has 10 books broken down by subject as follows:
4 mathematics books, 3 chemistry books, 2 history books and 1 language book. If the books
are arranged such that each subject is together, how many arrangements are possible? Once
again, this requires both the counting principle and the concept of a permutation. There are
4! ways to arrange the subjects. Thus, the answer is (4!)(4!)(3!)(2!)(1!).
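These counts are easy to reproduce programmatically. The following short Python sketch (standard library only) evaluates the three examples above; the variable names are mine, not from the notes.

```python
from math import factorial

# Example 2: 9 players can bat in 9! different orders.
batting_orders = factorial(9)                      # 362880

# Example 3: 10 students give 10! rankings; ranking the 6 men and
# the 4 women only among themselves gives 6! * 4! possibilities.
all_rankings = factorial(10)                       # 3628800
separate_rankings = factorial(6) * factorial(4)    # 17280

# Example 4: order the 4 subjects, then the books within each subject.
book_arrangements = (factorial(4) * factorial(4) * factorial(3)
                     * factorial(2) * factorial(1))   # 6912

print(batting_orders, all_rankings, separate_rankings, book_arrangements)
```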
1.3 Combinations
We will now introduce the concept of a combination. Unlike a permutation, a combination
is simply a set of objects not endowed with any natural order. The number of combinations
of size r that can be made from a set of size n is denoted \binom{n}{r}. We recall that in general,
when we take order into account, r things out of a group of n can be arranged in:
n! / (n − r)!
Now, since every arrangement of the r items is counted once, and we wish to count each
set once, we can get the combination formula by dividing by r! since that is the number of
ways to arrange r items. Thus, we have the result:
\binom{n}{r} = n! / ((n − r)! r!)
These binomial coefficients are the coefficients that appear in the binomial theorem. The
multinomial coefficient generalizes them (for r = 2 it reduces to the binomial coefficient) and
is defined as follows:
\binom{n}{n_1, n_2, ..., n_r} = n! / (n_1! n_2! ... n_r!)
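As a quick illustration, Python's standard library exposes the binomial coefficient as math.comb, and the multinomial coefficient follows from factorials; the helper below is a minimal sketch, not part of the notes.

```python
from math import comb, factorial

def multinomial(n, group_sizes):
    """Multinomial coefficient n! / (n1! n2! ... nr!); assumes sum(group_sizes) == n."""
    result = factorial(n)
    for size in group_sizes:
        result //= factorial(size)
    return result

# Binomial coefficient: ways to choose 3 objects out of 10.
print(comb(10, 3))                     # 120, i.e. 10! / (7! 3!)

# Multinomial coefficient: split 10 books into groups of sizes 4, 3, 2, 1.
print(multinomial(10, [4, 3, 2, 1]))   # 12600
```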
2 Axioms of Probability
2.1 Set Theory
We begin with the concept of a sample space. The sample space is the set of all possible
outcomes of an experiment. An event is an element of the power set of the sample space, that
is, a subset of the sample space. Since events are sets, they are subject to the usual set-theoretic
laws, a few of which are listed below:
Commutativity: E ∪ F = F ∪ E and EF = F E
Associativity: (E ∪ F ) ∪ G = E ∪ (F ∪ G) and (EF )G = E(F G)
Distributivity: (E ∪ F )G = EG ∪ F G and EF ∪ G = (E ∪ G)(F ∪ G)
Finally, we have De Morgan’s Laws:
(⋃_{i=1}^{n} E_i)^c = ⋂_{i=1}^{n} E_i^c

and

(⋂_{i=1}^{n} E_i)^c = ⋃_{i=1}^{n} E_i^c
Axiom 1
0 ≤ P (E) ≤ 1
Axiom 2
P (S) = 1
Axiom 3
For any sequence of mutually exclusive events E1 , E2 , ...:
P(⋃_{i=1}^{∞} E_i) = Σ_{i=1}^{∞} P (E_i)
Proposition 1 :
P (E^c) = 1 − P (E). This follows from Axioms 2 and 3, since E and E^c are disjoint and
E ∪ E^c = S.
Proposition 2
If E ⊂ F , then P (E) ≤ P (F )
Proof : We may decompose F as F = E ∪ E^c F . These two sets are disjoint. Therefore:
P (F ) = P (E) + P (E^c F ). Since P (E^c F ) ≥ 0, the result follows.
Proposition 3 : P (E ∪ F ) = P (E) + P (F ) − P (EF )
Proof : Since E ∪ F = E ∪ E^c F and E ∩ E^c F = ∅, we appeal to Axiom 3 as follows:
P (E ∪ F ) = P (E) + P (E^c F ). Further, P (F ) = P (EF ) + P (E^c F ), so P (E^c F ) =
P (F ) − P (EF ), completing the proof.
Proposition 4 :
P (E_1 ∪ E_2 ∪ ... ∪ E_n) = Σ_{i=1}^{n} P (E_i) − Σ_{i_1 < i_2} P (E_{i_1} E_{i_2}) + ...
+ (−1)^{r+1} Σ_{i_1 < i_2 < ... < i_r} P (E_{i_1} E_{i_2} ... E_{i_r}) + ... + (−1)^{n+1} P (E_1 E_2 ... E_n)
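This inclusion-exclusion identity can be checked mechanically on a small finite sample space. The sketch below assumes a uniform probability on {0, ..., 19} and three arbitrarily chosen events; none of this setup comes from the notes.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(20))                     # uniform sample space, P({w}) = 1/20
E1 = {w for w in omega if w % 2 == 0}      # even outcomes
E2 = {w for w in omega if w % 3 == 0}      # multiples of 3
E3 = {w for w in omega if w < 7}           # outcomes less than 7

def P(event):
    return Fraction(len(event), len(omega))

events = [E1, E2, E3]
lhs = P(E1 | E2 | E3)

# Right-hand side: sum over nonempty sub-collections with alternating signs.
rhs = Fraction(0)
for r in range(1, len(events) + 1):
    for subset in combinations(events, r):
        rhs += (-1) ** (r + 1) * P(set.intersection(*subset))

print(lhs, rhs, lhs == rhs)                # the two sides agree exactly
```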
Proposition 6.1 : If {En , n ≥ 1} is either an increasing or decreasing sequence of events,
then:
lim_{n→∞} P (E_n) = P (lim_{n→∞} E_n)
Proof : Suppose first that {En } is an increasing sequence, and define events
{Fn } as follows:
F1 = E1
F_n = E_n (⋃_{i=1}^{n−1} E_i)^c = E_n E_{n−1}^c   where n > 1
where we have used the fact that ⋃_{i=1}^{n−1} E_i = E_{n−1} , since the events are increasing. In
other words, Fn consists of those outcomes in En which are not in any of the earlier Ei .
However, it is evident that:
⋃_{i=1}^{∞} F_i = ⋃_{i=1}^{∞} E_i   and   ⋃_{i=1}^{n} F_i = ⋃_{i=1}^{n} E_i

Since the F_i are mutually exclusive, Axiom 3 gives:

P(⋃_{i=1}^{∞} E_i) = P(⋃_{i=1}^{∞} F_i) = Σ_{i=1}^{∞} P (F_i)
= lim_{n→∞} Σ_{i=1}^{n} P (F_i)
= lim_{n→∞} P(⋃_{i=1}^{n} F_i)
= lim_{n→∞} P(⋃_{i=1}^{n} E_i)
= lim_{n→∞} P (E_n)
This proves the result for the increasing case. We now move to the decreasing case. If
{En } is decreasing, then {E_n^c} is increasing. Hence, using the previous equation, we may conclude that
P(⋃_{i=1}^{∞} E_i^c) = lim_{n→∞} P (E_n^c)
Since ⋃_{i=1}^{∞} E_i^c = (⋂_{i=1}^{∞} E_i)^c , it follows that
P((⋂_{i=1}^{∞} E_i)^c) = lim_{n→∞} P (E_n^c)
or equivalently:
1 − P(⋂_{i=1}^{∞} E_i) = lim_{n→∞} [1 − P (E_n)] = 1 − lim_{n→∞} P (E_n)
or:

P(⋂_{i=1}^{∞} E_i) = lim_{n→∞} P (E_n)
3 Conditional Probability and Independence
3.1 Conditional Probabilities
For events E and F with P (F ) > 0, the conditional probability of E given F is defined by
P (E|F ) = P (EF )/P (F ). Multiplying such conditional probabilities telescopes into the
multiplication rule:

P (E_1) · P (E_1E_2)/P (E_1) · P (E_1E_2E_3)/P (E_1E_2) ··· P (E_1E_2...E_n)/P (E_1E_2...E_{n−1}) = P (E_1E_2...E_n)

Conditioning on an event F and its complement gives the law of total probability:

P (E) = P (E|F ) P (F ) + P (E|F^c) P (F^c)
      = P (E|F ) P (F ) + P (E|F^c) (1 − P (F ))
We now define the odds of an event A. The odds of A are defined as P (A)/P (A^c).
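A small numerical illustration of the last two formulas, with assumed values P(F) = 0.3, P(E|F) = 0.9, and P(E|F^c) = 0.2 that are not from the notes:

```python
# Hypothetical inputs: an event F and the conditional probabilities of E.
p_F = 0.3
p_E_given_F = 0.9
p_E_given_Fc = 0.2

# Law of total probability: P(E) = P(E|F)P(F) + P(E|F^c)(1 - P(F)).
p_E = p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)
print(round(p_E, 4))                  # 0.41

# Odds of E: P(E) / P(E^c).
print(round(p_E / (1 - p_E), 4))      # 0.6949
```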
3.2 Independent Events
Two events are said to be independent provided P (EF ) = P (E) P (F ). Otherwise, the
events are dependent. Note that pairwise independence of a collection of events does not imply
that the events are mutually independent; a concrete example is sketched after the following list.
Three events E, F , and G are independent provided the following four conditions hold:
P (EF ) = P (E) P (F )
P (F G) = P (F ) P (G)
P (EG) = P (E) P (G)
P (EF G) = P (E) P (F ) P (G)
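The classic two-coin example makes the warning concrete: flip two fair coins and let E be "first flip is heads", F be "second flip is heads", and G be "the two flips match". The sketch below (my own construction, not from the notes) verifies that every pair multiplies correctly while the triple does not.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips; each outcome has probability 1/4.
omega = list(product("HT", repeat=2))

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def E(w): return w[0] == "H"        # first flip is heads
def F(w): return w[1] == "H"        # second flip is heads
def G(w): return w[0] == w[1]       # the two flips match

print(P(lambda w: E(w) and F(w)) == P(E) * P(F))   # True
print(P(lambda w: E(w) and G(w)) == P(E) * P(G))   # True
print(P(lambda w: F(w) and G(w)) == P(F) * P(G))   # True

# Mutual independence fails: P(EFG) = 1/4, but P(E)P(F)P(G) = 1/8.
print(P(lambda w: E(w) and F(w) and G(w)) == P(E) * P(F) * P(G))   # False
```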
Conditional probabilities themselves satisfy the three axioms of probability. That is, for a
fixed event F with P (F ) > 0:
1. 0 ≤ P (E|F ) ≤ 1
2. P (S|F ) = 1
3. For any sequence of mutually exclusive events E_1, E_2, ...: P(⋃_{i=1}^{∞} E_i |F ) = Σ_{i=1}^{∞} P (E_i |F )
Proof :
For the first part, the left side of the inequality is obvious. The right side holds because
EF ⊂ F implies P (EF ) ≤ P (F ). The second statement holds because P (S|F ) = P (SF )/P (F ) =
P (F )/P (F ) = 1.
The third statement follows from:
P(⋃_{i=1}^{∞} E_i |F ) = P((⋃_{i=1}^{∞} E_i) F ) / P (F )
= P(⋃_{i=1}^{∞} E_i F ) / P (F )
since
(⋃_{i=1}^{∞} E_i) F = ⋃_{i=1}^{∞} E_i F

and the events E_i F are mutually exclusive, so this equals

Σ_{i=1}^{∞} P (E_i F ) / P (F )
= Σ_{i=1}^{∞} P (E_i |F )
4 Random Variables
4.1 Discrete Random Variables
A random variable that takes on at most a countable number of values is a Discrete Random
Variable. For some discrete random variable X, we define the Probability Mass Function by:
p(a) = P (X = a). Since the variable takes on an at most countable number of values, we
may index them and:
Σ_{i=1}^{∞} p(x_i) = 1
The Expected Value of X is defined by E[X] = Σ_i x_i p(x_i). We can use this concept to
calculate the expectations of functions of random variables as well. The trick to finding the
expectation of some g(X) is to find the probability mass function of g(X). Once we have done
this, we may use the definition to determine the expectation, since a function of a random
variable is itself a random variable.
Proposition: If X is a discrete random variable that takes on the values x_i, i ≥ 1, with
respective probabilities p(x_i), then for any real-valued function g:

E[g(X)] = Σ_i g(x_i) p(x_i)
Proof : Group the terms in the sum according to the distinct values y_j taken by g(x_i):

Σ_i g(x_i) p(x_i) = Σ_j Σ_{i: g(x_i)=y_j} g(x_i) p(x_i)
= Σ_j y_j Σ_{i: g(x_i)=y_j} p(x_i)
= Σ_j y_j P [g(X) = y_j]
= E[g(X)]
4.3 Variance
The Variance of a random variable is the expected value of the variable's squared difference
from its own expected value. More succinctly:

Var(X) = E[(X − E[X])^2]
5 Jointly Distributed Random Variables
5.1 Joint Distribution Functions
For two random variables X and Y , we define the joint cumulative distribution function by:

F (a, b) = P (X ≤ a, Y ≤ b)
where a and b are real numbers. From this we can obtain the distribution of X as follows:
FX (a) = P (X ≤ a) = P (X ≤ a, Y < ∞)
= P (lim_{b→∞} {X ≤ a, Y ≤ b})
= lim_{b→∞} F (a, b)
We find the cumulative distribution function of Y similarly. These are sometimes referred
to as the Marginal Distributions of X and Y .
In the discrete case, it is convenient to define the Joint Probability Mass Function:
p(x, y) = P (X = x, Y = y)
We further say that X and Y are jointly continuous if there exists a function f (x, y) such
that:
P ((X, Y ) ∈ C) = ∫∫_{(x,y)∈C} f (x, y) dx dy
Then this function f is called the joint probability density function. It follows upon differentiation
that:

f (a, b) = ∂²F (a, b)/∂a∂b
An alternative interpretation is that f (a, b) measures how likely it is that the random
vector (X, Y ) is near (a, b). If X and Y are jointly continuous, they are individually continuous,
and we may find the probability density function of X by integrating f (x, y) with respect to y
over the range of Y .
The random variables X and Y are said to be independent if, for any two sets of real numbers
A and B:

P (X ∈ A, Y ∈ B) = P (X ∈ A)P (Y ∈ B)
It can be shown that this condition is equivalent to F (a, b) = FX (a)FY (b) for all a, b. In the
jointly continuous case, independence is equivalent to f (x, y) = fX (x)fY (y) for all x, y. If these
random variables are not independent, they are said to be dependent. Independence is a
symmetric relation.
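In the discrete case these definitions translate directly into code. The sketch below builds an assumed joint pmf (two independent rolls of a fair four-sided die, my own example), recovers the marginal pmfs by summing, and checks the factorization p(x, y) = pX(x)pY(y).

```python
from fractions import Fraction
from itertools import product

# Assumed joint pmf: two independent rolls of a fair four-sided die,
# so the joint pmf is uniform over the 16 ordered pairs.
joint = {(x, y): Fraction(1, 16) for x, y in product(range(1, 5), repeat=2)}

def marginal_X(x):
    return sum(p for (a, b), p in joint.items() if a == x)

def marginal_Y(y):
    return sum(p for (a, b), p in joint.items() if b == y)

# Independence: the joint pmf factors into the product of the marginals.
print(all(joint[(x, y)] == marginal_X(x) * marginal_Y(y) for (x, y) in joint))   # True
```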
5.2 Sums of Independent Random Variables
For independent continuous random variables X and Y with densities fX and fY :

F_{X+Y}(a) = P (X + Y ≤ a)
= ∫∫_{x+y≤a} fX (x) fY (y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{a−y} fX (x) fY (y) dx dy
= ∫_{−∞}^{∞} FX (a − y) fY (y) dy
This cumulative distribution function is called the convolution of the distributions FX
and FY . We can arrive at a pdf if we differentiate:
f_{X+Y}(a) = d/da ∫_{−∞}^{∞} FX (a − y) fY (y) dy
= ∫_{−∞}^{∞} d/da FX (a − y) fY (y) dy
= ∫_{−∞}^{∞} fX (a − y) fY (y) dy
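As an illustration, the convolution integral can be approximated numerically. The sketch below assumes X and Y are independent Uniform(0, 1) variables (a choice of mine, not from the notes) and compares the numerical convolution with the known triangular density of their sum.

```python
def f_uniform(x):
    """Density of a Uniform(0, 1) random variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_sum(a, n=4000):
    """Midpoint-rule approximation of f_{X+Y}(a) = integral of f_X(a - y) f_Y(y) dy."""
    dy = 1.0 / n
    return sum(f_uniform(a - (i + 0.5) * dy) * f_uniform((i + 0.5) * dy)
               for i in range(n)) * dy

def triangular(a):
    """Exact density of the sum of two independent Uniform(0, 1) variables."""
    if 0.0 <= a <= 1.0:
        return a
    if 1.0 < a <= 2.0:
        return 2.0 - a
    return 0.0

for a in (0.25, 0.75, 1.0, 1.5):
    print(a, round(f_sum(a), 4), triangular(a))   # the two columns agree
```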
Proposition: If X and Y are independent gamma random variables with parameters (s, λ)
and (t, λ), then X + Y is a gamma random variable with parameters (s + t, λ).
Proposition : If X_i , i = 1, . . . , n, are independent random variables with X_i ∼ N (µ_i , σ_i^2),
then Σ_{i=1}^{n} X_i is normally distributed with mean Σ_{i=1}^{n} µ_i and variance Σ_{i=1}^{n} σ_i^2 .

5.3 Conditional Distributions
Discrete Case:
The Conditional Probability Mass Function:

p_{X|Y}(x | y) = p(x, y) / p_Y (y)
Continuous Case:
The Conditional Probability Density Function:
f_{X|Y}(x | y) = f (x, y) / f_Y (y)
5.4 Exchangeable Random Variables
A set of random variables is said to be exchangeable if their joint distribution does not depend
on the order in which they are observed. That is, permuting the variables does not change
their joint distribution.
6 Properties of Expectation
Proposition: If X and Y have a joint pmf p(x, y), then:
E[g(X, Y )] = Σ_y Σ_x g(x, y) p(x, y)
If X and Y are independent, then for any functions g and h:

E[g(X)h(Y )] = E[g(X)]E[h(Y )]
Definition: The Covariance of X and Y is Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] =
E[XY ] − E[X]E[Y ]. Covariance has the following properties:
1. Cov(X, Y ) = Cov(Y, X)
2. Cov(X, X) = V ar(X)
3. Cov(aX, Y ) = aCov(X, Y )
6.3 Moment Generating Functions
The moment generating function is defined as M (t) = E[e^{tX} ]. We call it this because the
moments of X can be obtained by differentiating M and evaluating at t = 0: M^{(n)}(0) = E[X^n ].
The individual moment generating functions can be obtained from the joint moment generating
function M (t_1 , ..., t_n ) = E[e^{t_1 X_1 + ... + t_n X_n} ] by setting all other t_j values equal to 0.
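A quick numerical check of the moment property for a simple assumed case: if X is a fair six-sided die, M(t) = (1/6) Σ_{k=1}^{6} e^{tk}, and finite-difference derivatives of M at 0 should recover E[X] = 7/2 and E[X²] = 91/6. This is only an illustrative sketch.

```python
from math import exp

def M(t):
    """Moment generating function of a fair six-sided die: E[e^{tX}]."""
    return sum(exp(t * k) for k in range(1, 7)) / 6

h = 1e-4
first_moment = (M(h) - M(-h)) / (2 * h)             # approximates M'(0)  = E[X]
second_moment = (M(h) - 2 * M(0) + M(-h)) / h ** 2  # approximates M''(0) = E[X^2]

print(round(first_moment, 4))    # 3.5
print(round(second_moment, 4))   # 15.1667, i.e. 91/6
```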
6.4 Correlation
ρ(X, Y ) = Cov(X, Y ) / √(Var(X) Var(Y ))
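Covariance and correlation can be computed directly from a joint pmf. The sketch below uses an assumed joint distribution of a dependent pair (X, Y), chosen only for illustration.

```python
from fractions import Fraction
from math import sqrt

# Assumed joint pmf: X and Y each take values 0 and 1, with matching pairs inflated.
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = E(lambda x, y: x)
mean_y = E(lambda x, y: y)
cov = E(lambda x, y: (x - mean_x) * (y - mean_y))
var_x = E(lambda x, y: (x - mean_x) ** 2)
var_y = E(lambda x, y: (y - mean_y) ** 2)

print(cov)                           # 3/20
print(cov / sqrt(var_x * var_y))     # 0.6
```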
7 Limit Theorems
Proposition: Markov’s Inequality
If X is a random variable that takes only nonnegative values, then for all a > 0:

P [X ≥ a] ≤ E[X] / a
Proposition: Chebyshev’s Inequality
If X is a random variable with finite mean µ and variance σ 2 , then, for any value k > 0,
P {|X − µ| ≥ kσ} ≤ 1 / k^2
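An empirical sanity check of Chebyshev's inequality, simulating an assumed Exponential(1) random variable (so µ = σ = 1) and comparing tail frequencies with 1/k²:

```python
import random

random.seed(0)
n = 100_000
mu = sigma = 1.0        # mean and standard deviation of an Exponential(1) variable

samples = [random.expovariate(1.0) for _ in range(n)]

for k in (1.5, 2.0, 3.0):
    tail_freq = sum(1 for x in samples if abs(x - mu) >= k * sigma) / n
    print(k, round(tail_freq, 4), "<=", round(1 / k ** 2, 4))   # bound always holds
```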
Proposition: If V ar(X) = 0, then P (X = E[X]) = 1.
Theorem: The Weak Law of Large Numbers
Let {Xi } be a sequence of independent and identically distributed random variables, each
having finite expectation E[Xi ] = µ. Then, for any ε > 0,

P { |Σ_{i=1}^{n} X_i / n − µ| ≥ ε } → 0
as n → ∞.
Theorem: The Central Limit Theorem
Let X1 , X2 , . . . be a sequence of independent and identically distributed random variables,
each having mean µ and variance σ 2 . Then the distribution of:
(Σ_{k=1}^{n} X_k − nµ) / (σ√n)
tends to the standard normal distribution.
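Both limit theorems are easy to observe by simulation. The sketch below assumes Uniform(0, 1) summands (µ = 1/2, σ² = 1/12): the sample mean settles near µ, and the standardized sum falls in (−1.96, 1.96) roughly 95% of the time, as the standard normal predicts.

```python
import random
from math import sqrt

random.seed(1)
mu, sigma = 0.5, sqrt(1 / 12)      # mean and standard deviation of Uniform(0, 1)
n, trials = 1000, 2000

# Weak law of large numbers: the sample mean concentrates around mu.
sample_mean = sum(random.random() for _ in range(n)) / n
print(round(sample_mean, 3))       # close to 0.5

# Central limit theorem: (sum - n*mu) / (sigma * sqrt(n)) is roughly N(0, 1).
hits = sum(
    1 for _ in range(trials)
    if abs((sum(random.random() for _ in range(n)) - n * mu) / (sigma * sqrt(n))) < 1.96
)
print(hits / trials)               # close to 0.95
```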
7.1 Other Inequalities
Proposition: One-sided Chebyshev Inequality
If X is a random variable with mean 0 and finite variance σ 2 , then, for any a > 0,
P {X ≥ a} ≤ σ^2 / (σ^2 + a^2)
Proposition: Chernoff Bounds
If M (t) = E[e^{tX} ] is the moment generating function of X, then P {X ≥ a} ≤ e^{−ta} M (t)
for all t > 0, and P {X ≤ a} ≤ e^{−ta} M (t) for all t < 0.
8 Probabilistic Convergence
A sequence of random variables {X_n} is said to converge in probability to a random variable
X if, for every ε > 0:

lim_{n→∞} P (|X_n − X| ≥ ε) = 0