
Probability Theory and Stochastic Processes

İstanbul Teknik Üniversitesi

Mustafa Kamasak, PhD

These slides are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
License: https://creativecommons.org/licenses/by-nc-nd/4.0/
v2018.10.03

1 / 89
Stochastic vs. Deterministic Systems

I Deterministic system
I no randomness
I same output for the same input and other conditions
I Stochastic system
I randomness due to
I limited capabilities of production and measurement,
I various unknown factors (noise, uncertain parameters, etc.)
I different output even for the same input and other conditions
I Only theoretical systems are deterministic. Their
physical implementations and measurements are stochastic.

2 / 89
Stochastic vs. Deterministic Systems

I This course deals with modelling the output (events /
outcomes) of stochastic systems.
I Outcomes are observations of the results of experiments,
trials, etc.
I Events are observations of things that happen without a
human setup.
I A random output instance (at a certain time) can be modelled
with probability theory.
I The output can be
I nominal
I ordinal
I interval (continuous-valued or discrete-valued)
I A time series of outputs can be modelled as a stochastic process.

3 / 89
Sets

I A set is a collection of objects/elements

A = {ζ1 , ζ2 , · · · , ζN }

There are N elements in set A


I Notation:
ζ2 ∈ A means ζ2 is in set A
ζ2 ∉ A means ζ2 is not in set A
I Empty (or null) set Φ contains no elements by definition
I If a set has n elements, it has 2^n subsets including the empty
set and itself
I A ⊇ B means B is a subset of A
I If A ⊇ B and B ⊇ A then A = B

4 / 89
Set Operators

I Complement:

A^c = {x : x ∈ S but x ∉ A}
I Union:

A ∪ B = {x : x ∈ A or x ∈ B}
I Intersection:

A ∩ B = {x : x ∈ A and x ∈ B}
I Symmetric difference:

A △ B = (A^c ∩ B) ∪ (A ∩ B^c)

5 / 89
Properties of Sets

For any subsets A, B, C of S


I Commutative:
A ∪ B = B ∪ A
A ∩ B = B ∩ A
I Associative law:
(A ∪ B) ∪ C = A ∪ (B ∪ C )
(A ∩ B) ∩ C = A ∩ (B ∩ C )
I Distributive law:
(A ∪ B) ∩ C = (A ∩ C ) ∪ (B ∩ C )
(A ∩ B) ∪ C = (A ∪ C ) ∩ (B ∪ C )

6 / 89
Disjoint Sets

I Sets A and B are disjoint (mutually exclusive) if A ∩ B = Φ


I Several sets A1 , A2 , · · · , AN are mutually exclusive if
Ai ∩ Aj = Φ when i ≠ j.
I Ai are called a partition of S if
I Ai are mutually exclusive
I ∪Ai = S

7 / 89
De Morgan Law

I De Morgan law is used to find the complement of complicated
operations on sets
I De Morgan Law
(∪i Ai)^c = ∩i Ai^c        (∩i Ai)^c = ∪i Ai^c
I General form
Replace each set with its complement
Replace union with intersection
Replace intersection with union
Replace Φ with S
Replace S with Φ
I For example:

[A ∩ (B ∪ Φ)]^c = A^c ∪ (B^c ∩ S)

8 / 89
Duality

I If a complicated equality is proven then its dual is also correct.


I General form
Exchange union with intersection
Exchange intersection with union
Exchange Φ with S
Exchange S with Φ
I For example: if S ∩ A = A is proven then its dual Φ ∪ A = A
is also correct

9 / 89
Sample space and empty set
I S: sample space / certain event
It is the set of all possible outcomes/events
I Φ: empty set/ impossible event
I Field:
Let the Ai be subsets of S
If the Ai are finitely many, F = {Ai : Ai ⊆ S, i ≤ N}, and
I Φ∈F
I If Ai ∈ F then Ai^c ∈ F
I If Ai ∈ F for i = 1, 2...N then ∪_{i=1}^{N} Ai ∈ F
then F is called a field.
I Borel field:
If the Ai are infinitely many then F is called a Borel field.
A Borel field is closed under complement and countable union
operations
I Suppose B = {Ai : Ai ⊆ S and i ∈ N} is a Borel field. Any
subset A of S is called an event iff A ∈ B
10 / 89
Axioms of Probability

I Probability assigns a unique number in [0,1] range to each


event
I Axioms of probability (by Kolmogorov in 1933)
I P(S) = 1
I P(A) ≥ 0 for every A ∈ B
I P(∪i Ai ) = Σi P(Ai ) for all Ai ∈ B if the Ai are mutually disjoint
I Axioms are accepted without a proof.

11 / 89
Theorems

Suppose A and B are two events and the Bi form a partition of S,
then
I P(Φ) = 0 and P(A) ≤ 1
I P(A^c ) = 1 − P(A)
I P(B ∩ A^c ) = P(B) − P(B ∩ A)
I P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
I If A ⊆ B then P(A) ≤ P(B)
I P(A) = Σi P(A ∩ Bi )
These theorems can be proven using axioms and other proven
theorems

12 / 89
Conditional Probability

If S is the sample space, B is its Borel field, and A, B ∈ B with
P(B) > 0, then

P(A|B) = P(A ∩ B) / P(B)

13 / 89
Independence

I If P(A|B) = P(A) then events A and B are independent.


I Observing event B has no effect (gives us no extra
information) on observation of event A.
I Events A and B are independent iff P(A ∩ B) = P(A)P(B)
I The following statements are equivalent
I A and B are independent
I A^c and B^c are independent
I A and B^c are independent
I A^c and B are independent
Proof:

P(A^c ∩ B) = P(B) − P(A ∩ B)
           = P(B) − P(A)P(B)
           = P(B)(1 − P(A))
           = P(B)P(A^c )

14 / 89
Mutual Independence

I A collection of events Ai are called mutually independent iff


every subcollection consists of independent events
I It is possible to have pairwise independent events, but the
whole set may not be mutually independent

Example:
Consider tossing a fair coin twice (S = {H, T } on each toss)
A1 : H on the first toss
A2 : H on the second toss
A3 : same outcome on both tosses
Are events A1 , A2 , A3 mutually independent?
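One way to answer this (a sketch, not part of the slides) is to enumerate the four equally likely outcomes of the two tosses:

```python
# Enumerate the four equally likely outcomes of two fair coin tosses
# and check independence numerically.
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, each prob 1/4
p = lambda event: sum(1 for o in outcomes if event(o)) / len(outcomes)

A1 = lambda o: o[0] == "H"            # H on the first toss
A2 = lambda o: o[1] == "H"            # H on the second toss
A3 = lambda o: o[0] == o[1]           # same outcome on both tosses

both = lambda a, b: (lambda o: a(o) and b(o))

# Pairwise: each pair satisfies P(Ai and Aj) = P(Ai)P(Aj) = 1/4
print(p(both(A1, A2)), p(A1) * p(A2))   # 0.25 0.25
print(p(both(A1, A3)), p(A1) * p(A3))   # 0.25 0.25
print(p(both(A2, A3)), p(A2) * p(A3))   # 0.25 0.25

# But the triple fails: P(A1 and A2 and A3) = 1/4, while the product
# is 1/8, so the collection is not mutually independent.
p123 = p(lambda o: A1(o) and A2(o) and A3(o))
print(p123, p(A1) * p(A2) * p(A3))      # 0.25 0.125
```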

15 / 89
Bayes Theorem

Suppose A1 , ..., AN form a partition of the sample space S. For an
arbitrary event B

P(Aj |B) = P(Aj )P(B|Aj ) / Σ_{i=1}^{N} P(Ai )P(B|Ai )

I P(Ai ) is called prior information


I P(B|Ai ) is called likelihood
I P(Aj |B) is called posterior probability

16 / 89
Bayes Theorem – Example 1

I Consider a rare disease that is seen in 1 of every million people.
I Consider a medical test that is 99% accurate. This means if a
person has this disease (case positive), the test will detect it
correctly with probability 0.99; assume the test is also correct
99% of the time on disease-free people.
I When someone takes this test and the test result turns out to
be positive, what is the probability of this person having this
disease?
I Prior information: P(disease+) = 0.000001
Likelihood: P(test + |disease+) = 0.99
Posterior probability: P(disease + |test+) =?

17 / 89
Bayes Theorem – Example 1

I Prior information: P(disease+) = 0.000001


Likelihood: P(test + |disease+) = 0.99
Posterior probability: P(disease + |test+) =?
I Using Bayes theorem:

P(disease + |test+) = P(test + |disease+)P(disease+) / P(test+)

I What is P(test+)?
I The test can come out positive when there is disease (true
positive), or when there is no disease (false positive). Hence

P(test+) = P(test + |disease+)P(disease+)
         + P(test + |disease−)P(disease−)

18 / 89
Bayes Theorem – Example 1
I Prior information: P(disease+) = 0.000001
Likelihood: P(test + |disease+) = 0.99

P(test+) = P(test + |disease+)P(disease+)
         + P(test + |disease−)P(disease−)
         = 0.99 × 0.000001 + 0.01 × 0.999999
         = 0.01000098 ≈ 0.01
I Hence

P(disease + |test+) = P(test + |disease+)P(disease+) / P(test+)
                    = (0.99 × 0.000001) / 0.01000098
                    ≈ 0.000099 < 0.01%

I Although the test is quite accurate, a positive result alone is
close to meaningless for such a rare disease
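As a quick check, the whole computation fits in a few lines of Python (a sketch; the variable names are mine, the numbers are the slide's):

```python
# Bayes theorem for the rare-disease example.
prior = 1e-6                      # P(disease+)
sens  = 0.99                      # P(test+ | disease+)
fpr   = 0.01                      # P(test+ | disease-), assumed 1 - accuracy

p_test_pos = sens * prior + fpr * (1 - prior)   # total probability
posterior  = sens * prior / p_test_pos          # Bayes theorem

print(p_test_pos)   # ~0.01000098
print(posterior)    # ~9.9e-05, i.e. below 0.01%
```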
19 / 89
Bayes Theorem – Example 2

A cab was involved in a hit and run accident at night. Two cab
companies, the Green and the Blue, operate in the city. You are
given the following data: [1]
I 85% of the cabs in the city are Green and 15% are Blue.
I A witness identified the cab as Blue.
I The court tested the reliability of the witness under the same
circumstances that existed on the night of the accident and
concluded that the witness correctly identified each one of the
two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was
Blue rather than Green?

[1] Example taken from A. Tversky, D. Kahneman, "Evidential impact of base
rates", in Judgement under Uncertainty: Heuristics and Biases, D. Kahneman, P.
Slovic, A. Tversky (editors), Cambridge University Press, 1982
20 / 89
Bayes Theorem – Example 2
I A priori probabilities: P(Green) = 0.85 and P(Blue) = 0.15
I Likelihoods: P(Witness = Blue|Blue) = 0.8 and
P(Witness = Blue|Green) = 0.2
I From Bayes theorem

P(Blue|Witness = Blue) = P(Witness = Blue|Blue) × P(Blue) / P(Witness = Blue)

P(Witness = Blue) = P(Witness = Blue|Blue) × P(Blue)
                  + P(Witness = Blue|Green) × P(Green)
                  = 0.8 × 0.15 + 0.2 × 0.85
                  = 0.29

P(Blue|Witness = Blue) = (0.8 × 0.15) / 0.29 ≈ 41%
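The same few lines of Python, with the cab numbers plugged in (again a sketch):

```python
# Bayes theorem for the cab example; numbers taken from the slide.
p_blue, p_green = 0.15, 0.85
p_say_blue_given_blue  = 0.80     # witness is right 80% of the time
p_say_blue_given_green = 0.20     # and wrong 20% of the time

p_say_blue = (p_say_blue_given_blue * p_blue
              + p_say_blue_given_green * p_green)           # 0.29
print(p_say_blue_given_blue * p_blue / p_say_blue)          # ~0.414
```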
21 / 89
Permutation and Combination
In a repeated trial, we want to enumerate the number of possible
outcomes (without repetition of objects)
I Permutation:
The number of possible arrangements of k objects from a
collection of n objects when the ordering is important.

P_k^n = n(n − 1)(n − 2) · · · (n − k + 1) = n! / (n − k)!

I Combination:
The number of possible arrangements of k objects from a
collection of n objects when the ordering is NOT important.

C_k^n = (n choose k) = n(n − 1) · · · (n − k + 1) / k! = n! / (k!(n − k)!)
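If Python is at hand, both counts are available in the standard library (math.perm and math.comb, Python 3.8+); a small sketch with hypothetical n = 10, k = 4:

```python
# Permutations and combinations from the standard library.
import math

n, k = 10, 4
print(math.perm(n, k))                          # 5040 = 10!/6!
print(math.comb(n, k))                          # 210  = 10!/(4!6!)
print(math.perm(n, k) // math.factorial(k))     # 210: C = P / k!
```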
22 / 89
Properties of Combination

 
I C_0^n = (n choose 0) = 1 if n > 0
I C_k^n = C_{n−k}^n
I Binomial theorem: The combinations C_k^n give the binomial
coefficients.

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^{n−k}

23 / 89
Permutation and Combination

In a repeated trial, we want to enumerate the number of possible
outcomes (with repetition of objects)
I Permutation:
The number of possible arrangements of k objects from a
collection of n objects when the ordering is important.

P̃_k^n = n^k

I Combination:
The number of possible arrangements of k objects from a
collection of n objects when the ordering is NOT important.

C̃_k^n = (n + k − 1 choose k)

24 / 89
Permutation Examples
Example: How many different 2-digit numbers can you obtain
using digits {2, 5, 8} without repeating digits?
I Ordering is important as 25 ≠ 52
I For the first digit there are 3 options from {2, 5, 8}, for the
second digit there are 2 options.
I Hence there are 3 × 2 = 6 possibilities:
25, 28, 52, 58, 82, 85
Example: Assume 20 letters are used to form 3-letter license
plates. How many different plates are possible if the letters can be
repeated? (Both counts are verified in the sketch below.)
I License plate ABC ≠ ACB, so ordering is important
I 20 × 20 × 20 = 8000
I If repeating letters is not permitted, then 20 × 19 × 18 = 6840
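A brute-force sketch confirming both counts (the 20-letter alphabet here is hypothetical):

```python
# Count 3-letter plates with and without repeated letters.
from itertools import permutations, product

letters = [chr(ord("A") + i) for i in range(20)]   # 20 symbols

with_repeats    = sum(1 for _ in product(letters, repeat=3))
without_repeats = sum(1 for _ in permutations(letters, 3))

print(with_repeats)     # 8000 = 20**3
print(without_repeats)  # 6840 = 20*19*18
```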

25 / 89
Combination Examples

Example: A thesis committee will be formed with 4 professors out


of 10 professors in a department. How many different committees
can be formed?
I The order of the committee members is not important.
I Hence there are C_4^10 = 210 different committees that can be formed.

26 / 89
Combination Examples

Example: A thesis committee will be formed with 2 professors


from mechanical engineering and 3 professors from computer
engineering. If mechanical engineering department has 10 and
computer engineering department has 8 professors, how many
different committees can be formed?
I The order of the committee members is not important.
I There can be C_2^10 different selections from mechanical eng.
and C_3^8 different possibilities from computer eng.
I Hence, there are C_2^10 × C_3^8 = 2520 different committees
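The count in one line (a sketch using the standard library):

```python
# Committees: choose 2 of 10 mechanical and 3 of 8 computer professors.
import math
print(math.comb(10, 2) * math.comb(8, 3))   # 45 * 56 = 2520
```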

27 / 89
Random Variables
I A random variable X is a mapping from the sample space S
to a subset X of the real line R
X :S →R
I Using a random variable (rv) a real number can be assigned
to an event/outcome
I For example, the experiment of coin flipping can generate
S = {H, T }

X : H → 1, T → −1
or
Y : H → 100, T → 40
Both X and Y are random variables.
28 / 89
Discrete Random Variables

I A discrete rv takes a finite or countably infinite number of


possible values with specific probabilities assigned to each
value.
I If X is a discrete rv it assigns discrete values such as x1 , x2 , ...
to the events/outcomes
I Then pi means probability of X generating the value of xi :
pi = P(X = xi )
I It is possible for X to assign multiple events/outcomes to a
certain value such as xi . Hence
pi = P(X = xi ) = Σ_{s∈S, X (s)=xi} P(s)
I While assigning probabilities
I pi ≥ 0 for all i
I Σi pi = 1

29 / 89
Probability Mass Function (pmf)

I An assignment xi → pi is called discrete distribution or a


discrete probability distribution
I A function f (x) = P(X = x) for x ∈ X is called a probability
mass function (pmf)
I A pmf is discrete in values
I Why do we use “mass” in pmf?

30 / 89
Cumulative Distribution Function (cdf)

I F (x) = P(X ≤ x) = P({s ∈ S : X (s) ≤ x}) is called the
cumulative distribution function (cdf)
I Properties of cdf
I F (x) is a nondecreasing function:
F (x) ≤ F (y ) for all x ≤ y where x, y ∈ R
I limx→−∞ F (x) = 0 and limx→∞ F (x) = 1
I F (x) is right continuous:
For all x ∈ R, limh→0+ F (x + h) = F (x)
(Examples of valid and invalid cdfs come here.)
I Probability of x is: p(x) = F (x) − F (x⁻)

31 / 89
Probability Density Function (pdf)

I For a continuous rv, P(X = x) = 0
I Define f (x) associated with x ∈ R such that
I f (x) ≥ 0 for all x ∈ R
I ∫_R f (x)dx = 1
I For a given pdf f (x)

P(X ∈ A) = ∫_A f (x)dx

which is the area under the pdf for the given region A

32 / 89
Cumulative Distribution Function (cdf)
I cdf is defined as

F (x) = P(X ≤ x) = ∫_{−∞}^{x} f (t)dt

for all x ∈ R
I F (x) should have a finite or countably infinite number of
discontinuities.
I P(X < a) and P(X ≤ a) are the same, which is

F (a) = ∫_{−∞}^{a} f (t)dt

I P(X > b) and P(X ≥ b) are

1 − F (b) = ∫_{b}^{∞} f (t)dt

I P(a < X < b) or P(a ≤ X < b) or P(a < X ≤ b) or
P(a ≤ X ≤ b) is

F (b) − F (a) = ∫_{a}^{b} f (t)dt

33 / 89
Relation of pdf with cdf
For discrete rv
I pdf → cdf
F (x) = Σ_{t≤x} f (t)
I cdf → pdf
f (x) = F (x) − F (x⁻)
For continuous rv
I pdf → cdf
F (x) = ∫_{−∞}^{x} f (t)dt
I cdf → pdf
f (x) = (d/dt) F (t) |_{t=x}
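A small numeric sketch of both directions, using NumPy (cumulative sums for the discrete case, integration and differentiation for a continuous Exp(1) case):

```python
# pmf <-> cdf for a discrete rv, pdf <-> cdf for a continuous rv.
import numpy as np

# Discrete: cumulative-sum the pmf to get the cdf, difference it back.
pmf = np.array([0.2, 0.5, 0.3])
cdf = np.cumsum(pmf)                       # [0.2, 0.7, 1.0]
print(np.diff(cdf, prepend=0.0))           # recovers [0.2, 0.5, 0.3]

# Continuous: for an Exp(1) rv, integrate the pdf / differentiate the cdf.
x = np.linspace(0.0, 10.0, 10001)
f = np.exp(-x)                             # pdf
F = 1.0 - np.exp(-x)                       # cdf
dx = x[1] - x[0]
print(np.sum(f[x < 2.0]) * dx)             # ~F(2) = 1 - e^-2 ~ 0.865
print(np.gradient(F, x)[2000], f[2000])    # both ~ e^-2 at x = 2
```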

34 / 89
Expected Value of a Distribution

I All possible values of a rv X such that f (x) > 0 are called the
support of the distribution of X . The support of X is denoted
by X
I The expected value of a distribution is denoted by E (X )
I For discrete distributions:

E (X ) = Σ_{xi ∈X} xi f (xi )

I For continuous distributions:

E (X ) = ∫_X x f (x)dx

I The expected value is also called the mean of the distribution
and typically denoted by µ

35 / 89
Variance of a Distribution

I The variance of rv X is
I For discrete distributions:

σ_X^2 = Σ_{xi ∈X} (xi − µ)^2 f (xi )

I For continuous distributions:

σ_X^2 = ∫_X (x − µ)^2 f (x)dx

I Variance is a measure of the deviation of a rv from its mean

36 / 89
Standard Deviation of a Distribution

I The square root of the variance is called the standard deviation
of the distribution and it is typically denoted by σ
I For discrete distributions:

σ = √( Σ_{xi ∈X} (xi − µ)^2 f (xi ) )

I For continuous distributions:

σ = √(σ_X^2) = √( ∫_X (x − µ)^2 f (x)dx )

I Both the variance and the standard deviation of a distribution
are nonnegative: σ ≥ 0, σ^2 ≥ 0

37 / 89
What does SD mean?
I For a Gaussian (will cover this later) distributed rv X , the
range
I [µX − σ, µX + σ] contains 68.2%
I [µX − 2σ, µX + 2σ] contains 95.4%
I [µX − 3σ, µX + 3σ] contains 99.7%
of the values of this rv.
I Hence for a continuous rv:

∫_{µ−σ}^{µ+σ} f (x)dx = 0.682
∫_{µ−2σ}^{µ+2σ} f (x)dx = 0.954
∫_{µ−3σ}^{µ+3σ} f (x)dx = 0.997
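These three integrals can be checked with the normal cdf (a sketch; assumes SciPy is available):

```python
# The 68-95-99.7 rule via the normal cdf.
from scipy.stats import norm

mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    mass = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(k, round(mass, 3))   # 1 0.683, 2 0.954, 3 0.997
```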

38 / 89
Expected Value of a Function of rv

I Consider a function of a rv: X → g (X )
I g (X ) is also a rv as it maps events/outcomes to another
subset of R
I The expected value of g (X ) is:
For discrete rv:

E (g (X )) = Σ_{xi ∈X} g (xi )f (xi )

For continuous rv:

E (g (X )) = ∫_X g (x)f (x)dx

39 / 89
Mean and Variance of Translation and Scaling

Consider a rv X and constant values T and S
I Y =X +T
I Z = SX
What are the mean and variance of the rv's Y and Z ? (A quick
numeric check follows below.)
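The standard answer is that a shift T moves the mean and leaves the variance, while a scale S multiplies the mean by S and the variance by S^2; the Monte Carlo sketch below (arbitrary example numbers) illustrates this:

```python
# Check E(Y) = mu + T, var(Y) = var(X); E(Z) = S*mu, var(Z) = S^2*var(X).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 3.0, size=1_000_000)   # mu = 2, sigma = 3
T, S = 5.0, 4.0

Y, Z = X + T, S * X
print(Y.mean(), Y.var())   # ~7.0 and ~9.0   (mean shifts, variance unchanged)
print(Z.mean(), Z.var())   # ~8.0 and ~144.0 (mean scales by S, variance by S^2)
```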

40 / 89
Expectation is a Linear Operator
I Expectation is a linear operator
I It can exchange order with other linear operators such as
summation, integration, etc.
I For example:
Consider a series of functions gi (X ) and constants ai . What is
the expected value of Y = Σ_{i=1}^{N} ai gi (X )?
I The variance was given as:

σ^2 = E ((X − µX )^2 )
    = E (X^2 − 2X µX + µX^2 )
    = E (X^2 ) − 2µX E (X ) + µX^2
    = E (X^2 ) − µX^2

Hence:

E (X^2 ) = σ_X^2 + µ_X^2
41 / 89
Median of a distribution

I xm is the median of f (x) if F (xm ) = 0.5
I For a continuous distribution:

P(X ≥ xm ) = P(X ≤ xm ) = 0.5

I How is it defined for a discrete rv?

42 / 89
Mode of a distribution

I For discrete rv: The mode is the value x at which its pmf
takes its maximum value.
I For a continuous rv: The mode is the value x at which its pdf
has its maximum value

43 / 89
Comparison of Mean, Median and Mode
I Mode is the most likely value of an rv
that has the highest value of pmf/pdf
I Median is the value of an rv that
divides the pmf/pdf in half
I Mean is the value of an rv that is the
center of mass of pmf/pdf

44 / 89
Comparison of Mean, Median and Mode

I Mean, median and mode have very close values for some
distributions
I For other distributions, their values can be quite different.

45 / 89
Comparison of Mean, Median and Mode

I If a random variable has symmetric (no skew) distribution, its


mean and median are the same.
I However, having the same median and mean does not
necessarily imply a symmetric distribution. For example:
Consider a discrete rv with a support of X = {−2, 0, 4}. The
probabilities for these values are P(X = −2) = 1/3,
P(X = 0) = 1/2, P(X = 4) = 1/6. Then:

µX = (−2)(1/3) + (0)(1/2) + (4)(1/6) = 0

and since FX (−2) = 1/3, FX (0) = 5/6, FX (4) = 1, the median
of X is also 0.

46 / 89
Comparison of Mean, Median and Mode

I If a random variable has symmetric and unimodal (single


peak) distribution, its mode, mean and median are the same.
I If it is positively skewed then mode < median < mean
I If it is negatively skewed then mode > median > mean

47 / 89
Standard Probability Distributions

For discrete rv
I Bernoulli
I Binomial
I Poisson
I Geometric
I Uniform
For continuous rv
I Uniform
I Normal (Gaussian)
I Standard normal
I Gamma
I Exponential
I Chi-square
I Lognormal
I Student’s t
I Cauchy
I F
I Beta
I Negative exponential
I Weibull
I Rayleigh

48 / 89
Bernoulli Distribution

I Discrete distribution
I Single parameter p
I Bernoulli(p)

f (x) = P(X = x) = p^x (1 − p)^{1−x}

for x ∈ {0, 1} and 0 ≤ p ≤ 1


I This means:
P(X = 0) = 1 − p
and
P(X = 1) = p

49 / 89
Bernoulli Distribution – Mean and Variance

f (x) = P(X = x) = p^x (1 − p)^{1−x}

Expected value:

E (X ) = µX = (1)p + (0)(1 − p) = p

Variance:

E (X^2 ) = (1^2 )p + (0^2 )(1 − p) = p

Hence:

σ_X^2 = E (X^2 ) − µ_X^2 = p − p^2 = p(1 − p)
50 / 89
Binomial Distribution
I Discrete distribution
I Two parameters (n, p)
I Binomial(n,p)
 
f (x) = C_x^n p^x (1 − p)^{n−x}

I Repeated Bernoulli trials lead to the Binomial distribution.
Example: If a fair coin is tossed 25 times, what is the
probability of getting 6 tails?
The probability of getting 6 tails and 19 heads in a particular
order is 0.5^6 (1 − 0.5)^19. There are C_6^25 different orderings
(order is not important) with exactly 6 tails. (A one-line check
follows below.)
Example: If a fair coin is tossed 25 times, what is the
probability of getting 6 consecutive tails?
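A one-line check of the first question (a sketch):

```python
# P(exactly 6 tails in 25 fair tosses), straight from the formula above.
import math
print(math.comb(25, 6) * 0.5**6 * 0.5**19)   # ~0.00528
```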

51 / 89
Binomial Distribution – Mean and Variance
Mean:

E (X ) = Σ_{x=0}^{n} x C_x^n p^x (1 − p)^{n−x}
       = Σ_{x=1}^{n} n C_{x−1}^{n−1} p^x (1 − p)^{n−x}
       = np Σ_{x=1}^{n} C_{x−1}^{n−1} p^{x−1} (1 − p)^{n−x}
       = np [p + (1 − p)]^{n−1}
       = np

Remember the binomial theorem:

(a + b)^n = Σ_{x=0}^{n} C_x^n a^x b^{n−x}

52 / 89
Binomial Distribution – Mean and Variance
Variance:

E (X (X − 1)) = n(n − 1)p^2 Σ_{x=2}^{n} C_{x−2}^{n−2} p^{x−2} (1 − p)^{n−x}
             = n(n − 1)p^2

Furthermore

E (X^2 ) = E (X (X − 1)) + E (X ) = n(n − 1)p^2 + np

Then

σ_X^2 = E (X^2 ) − µ_X^2
     = n(n − 1)p^2 + np − (np)^2
     = n^2 p^2 − np^2 + np − n^2 p^2
     = np(1 − p)
53 / 89
Binomial Distribution – Example

For the same p value:
I Expected value increases linearly with n (see the shift in the pmf)
I Variance increases linearly with n (see the expansion in the pmf)
I Hence for a fixed p value, the pmf shifts right and expands as
n increases

54 / 89
Poisson Distribution

I Discrete distribution
I Single parameter λ > 0
I Poisson(λ)
f (x) = e^{−λ} λ^x / x!
I Mean: µX = λ (Derivation is left as an exercise)
I Variance: σX2 = λ
I Mean is equal to variance.

55 / 89
Poisson Distribution – Example

I As λ increases the mean increases linearly. Observe the shift
in the pmf.
I As λ increases the variance increases linearly. Observe the
expansion in the pmf.
56 / 89
Geometric Distribution

I Discrete distribution
I Single parameter p
I Geometric(p)
f (x) = p(1 − p)^{x−1}
I x is the number of trials needed for the Bernoulli trials to
produce “1” for the first time. This can also be regarded as
the number of trials before a success.
I Hence, it should produce “0” x − 1 times and “1” in the x-th
trial.
I Mean: µX = 1/p
I Variance: σ_X^2 = (1 − p)/p^2

57 / 89
Uniform Distribution

I Discrete distribution
I Each of the possible K outcomes is equally likely

f (xi ) = 1/K

for i ∈ {0, 1, ..., K − 1}
I Assuming the xi are the integers in [a, b] with b > a
I Mean:
µX = (a + b)/2
I Variance:
σ_X^2 = ((b − a + 1)^2 − 1)/12

58 / 89
Uniform Distribution

I Continuous distribution x ∈ R
I Two parameters: (a, b) with b > a
I Uniform(a,b)
f (x) = 1/(b − a) for a ≤ x ≤ b

59 / 89
Normal Distribution

I Continuous distribution x ∈ R
I Typically referred as Gaussian distribution
I Widely used
I Two parameters (µ, σ)
I N (µ, σ^2 )

f (x) = (1/(√(2π) σ)) exp( −(x − µ)^2 /(2σ^2 ) )

60 / 89
Standard Normal Distribution

I Gaussian distribution with zero mean and unit variance
I N (0, 1)

φ(z) = (1/√(2π)) exp(−z^2 /2)

I It is possible to normalize X from N (µ, σ^2 ) into N (0, 1) with
the following transformation

Z = (X − µ)/σ

I It is possible to transform Z from N (0, 1) back into N (µ, σ^2 )
with the following transformation

X = σZ + µ
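Both directions in code (a sketch; assumes SciPy, with made-up µ = 10, σ = 2):

```python
# Standardize and de-standardize a normal rv.
from scipy.stats import norm

mu, sigma, x = 10.0, 2.0, 13.0
z = (x - mu) / sigma                        # forward: N(mu, sigma^2) -> N(0, 1)
print(norm.cdf(x, mu, sigma), norm.cdf(z))  # identical probabilities
print(sigma * z + mu)                       # inverse: recovers x = 13.0
```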

61 / 89
Gamma Distribution

I Continuous distribution
I Two parameters (α, β) with 0 < α, β < ∞
I Gamma(α, β)

f (x) = (1/(β^α Γ(α))) x^{α−1} exp(−x/β)

where Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx
I For the Γ function
I Γ(α) = (α − 1)Γ(α − 1) for any α > 1. For large values of α,
Γ(α) ≈ √(2π) e^{−α} α^{α−0.5} (Stirling’s approximation)
I Γ(n) = (n − 1)! for any positive integer n. For large values of n,
n! ≈ √(2π) e^{−n} n^{n+0.5}
I Γ(1/2) = √π

62 / 89
Exponential Distribution

I Continuous distribution
I Single parameter β
I Exp(β)

f (x) = (1/β) exp(−x/β)

where 0 < x, β < ∞
I This is a specific case of the Gamma distribution with α = 1
I If β = 1 this distribution is called the standard exponential
distribution
I cdf

F (x) = 0 if x ≤ 0,  F (x) = 1 − e^{−x/β} if x > 0

63 / 89
Chi-square Distribution

I Continuous distribution
I Single parameter ν (this is also called the degrees of freedom)
I Chi(ν) is the chi-square distribution with ν degrees of freedom

f (x) = Gamma(ν/2, 2)

64 / 89
Lognormal Distribution
I Continuous distribution
I Two parameters (µ, σ)
I pdf

f (x) = (1/(√(2π) σx)) exp( −(log(x) − µ)^2 /(2σ^2 ) )

with 0 < x < ∞, −∞ < µ < ∞, and 0 < σ < ∞
I cdf

F (x) = Φ( (log(x) − µ)/σ )

where Φ is the cdf of the standard normal distribution
I Note that

P(log(X ) ≤ x) = P(X ≤ e^x ) = Φ( (x − µ)/σ )

I The logarithm of the rv X has a N (µ, σ^2 ) distribution. Hence, it
is called the lognormal distribution.
65 / 89
Student’s t Distribution

I Continuous distribution
I Single parameter ν (degrees of freedom)
I pdf

f (x) = a(ν) (1 + x^2 /ν)^{−(ν+1)/2}

where

a(ν) = Γ(0.5(ν + 1)) / (√(νπ) Γ(0.5ν))

for integer values of ν
I The distribution is symmetric around x = 0
I Discovered by W.S. Gosset under the pseudonym “Student” in
1908

66 / 89
Cauchy Distribution

I Continuous distribution
I Specific case of Student’s t distribution with ν = 1
I pdf

f (x) = 1/(π(1 + x^2 ))

I cdf

F (x) = (1/π) arctan(x) + 1/2

for x ∈ R

67 / 89
F Distribution

I Continuous distribution
I Two parameters (ν1 , ν2 )
I The order of the parameters is important. Hence f_{ν1,ν2}(x) ≠ f_{ν2,ν1}(x)
I pdf

f (x) = k(ν1 , ν2 ) x^{0.5(ν1 −2)} (1 + (ν1 /ν2 )x)^{−0.5(ν1 +ν2 )}

where

k(ν1 , ν2 ) = (ν1 /ν2 )^{0.5ν1} Γ(0.5(ν1 + ν2 )) / (Γ(0.5ν1 )Γ(0.5ν2 ))

68 / 89
Beta Distribution
I Beta function

b(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α + β)

where α, β are positive real numbers.
I Beta distribution
I Continuous distribution
I Two parameters (α, β)
I Beta(α, β)

f (x) = (1/b(α, β)) x^{α−1} (1 − x)^{β−1}

I Beta(1,1) is equal to the uniform distribution with range [0,1]

69 / 89
Negative Exponential Distribution

I Continuous distribution
I Two parameters (γ, β)
I pdf

f (x) = (1/β) exp(−(x − γ)/β)

where 0 < β < ∞, −∞ < γ < ∞ and γ < x < ∞

70 / 89
Weibull Distribution

I Continuous distribution
I Two parameters (α, β)
I pdf

f (x) = αβ^{−α} x^{α−1} exp(−(x/β)^α )

where 0 < α, β and 0 < x < ∞

71 / 89
Rayleigh Distribution

I Continuous distribution
I Single parameter θ
I pdf

f (x) = (2x/θ) exp(−x^2 /θ)

where 0 < x < ∞ and 0 < θ

72 / 89
Laplace Distribution

I Continuous distribution
I Two parameters (µ, σ)
I pdf

f (x) = (1/(2σ)) exp(−|x − µ|/σ)

73 / 89
Moments

I The r-th moment of a rv X is

nr = E (X^r )

I For r = 1, the first moment of an rv is its mean

74 / 89
Central Moments

I The r-th central moment of a rv X is

cr = E ((X − µ)^r )

I c1 = 0
I c2 = σ^2
I If nr is finite then
I cr is finite
I ns is finite for s ∈ {1, 2, ..., r − 1}
I cs is finite for s ∈ {1, 2, ..., r − 1}

75 / 89
Moment Generating Function

MX (t) = E (e^{tX} )

I Why do we need it?
Remember

e^{tX} = 1 + tX + t^2 X^2 /2! + ...

I Hence the moments of an rv X can be computed from its
moment generating function

nr = (d^r /dt^r ) MX (t) |_{t=0}
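A symbolic sketch with SymPy, using the exponential MGF M(t) = (1 − βt)^{−1} from the next slide:

```python
# Recover moments of the exponential distribution from its MGF.
import sympy as sp

t, beta = sp.symbols("t beta", positive=True)
M = 1 / (1 - beta * t)

n1 = sp.diff(M, t, 1).subs(t, 0)   # first moment: beta (the mean)
n2 = sp.diff(M, t, 2).subs(t, 0)   # second moment: 2*beta**2
print(n1, n2, sp.simplify(n2 - n1**2))   # variance: beta**2
```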

76 / 89
Moment Generating Function of Distributions

The moment generating functions of some of the distributions that
we have covered
I Bernoulli: MX (t) = (1 − p) + pe^t
I Binomial: MX (t) = ((1 − p) + pe^t )^n
I Poisson: MX (t) = e^{−λ+λe^t}
I Normal: MX (t) = e^{tµ+0.5t^2 σ^2}
I Gamma: MX (t) = (1 − βt)^{−α} for all t < 1/β
I Exponential: MX (t) = (1 − βt)^{−1}
I Chi-square: MX (t) = (1 − 2t)^{−0.5ν} for t < 0.5

77 / 89
Functions of Random Variables

I Functions of an rv are also rv


I Consider a function that maps an rv X into another rv Y , i.e.
Y = g (X )
I If the pdf of X is given as fX (x), how do we find the pdf of
Y , fY (y )? How are they related?

78 / 89
Functions of Discrete Random Variables

I The function g (.) may or may not be a one-to-one function.


[Figure: two mapping diagrams. Left, a one-to-one function (injection):
1 → D, 2 → B, 3 → C, with A ∈ Y unused. Right, not one-to-one
(surjection): X = {1, 2, 3, 4} maps onto Y = {D, B, C}, so two values
of X share the same image.]

I If function g (.) is a one-to-one function there will be a single


yi for each xi value. Hence: P(Y = yi ) = P(X = xi ) for
yi = g (xi ). Then
f (yi ) = f (xi )
for yi = g (xi )

79 / 89
Functions of Discrete Random Variables

I If the function g (.) is not a one-to-one function, there may
be many xj that map to the same yi , with yi = g (xj ).
I In this case

P(Y = yi ) = Σ_{j: yi =g (xj )} P(X = xj )

and

f (yi ) = Σ_{j: yi =g (xj )} f (xj )

80 / 89
Functions of Continuous Random Variables
I Continuous functions can also be one-to-one or not.

[Figure: two graphs of y = f (x) with f : X → Y. Left (one-to-one):
each y in the image has a single preimage x. Right (not one-to-one):
a value y can have several preimages x1 , x2 ∈ X.]

I To find fY (y ), first consider

P(y ≤ Y ≤ y + dy ) = fY (y )dy = FY (y + dy ) − FY (y )

81 / 89
Functions of Continuous Random Variables
I If the function is one-to-one then

P(y ≤ Y ≤ y + dy ) = P(x ≤ X ≤ x + dx)

which is
fY (y )dy = fX (x)dx
I If the function is not one-to-one and there are N different
values of x that map to the same y value, y = g (xi ) for
i = 1, 2...N. In this case

P(y ≤ Y ≤ y + dy ) = P(x1 ≤ X ≤ x1 + dx)


+ P(x2 ≤ X ≤ x2 + dx) + · · ·
+ P(xN ≤ X ≤ xN + dx)

which is

fY (y )dy = fX (x1 )dx + fX (x2 )dx + · · · + fX (xN )dx


82 / 89
Functions of Continuous Random Variables

From the graph, dy /dx = g'(x). Hence dx = dy /g'(x) and

fY (y )dy = fX (x1 ) dy /|g'(x1 )| + fX (x2 ) dy /|g'(x2 )| + · · · + fX (xN ) dy /|g'(xN )|

I The absolute value is taken to avoid negativity in the pdf
Finally:

fY (y ) = Σ_{i: y =g (xi )} fX (xi ) / |g'(xi )|

83 / 89
Example for Function of Discrete RV – 1
I One-to-one discrete rv Y = 3X + 2 with X = {1, 2, 3}.

[Figure: top, the line Y = 3X + 2 plotted over X ; bottom left, the
pmf fX (x) on X = {1, 2, 3}; bottom right, the same probabilities
carried over to fY (y ) at y = 5, 8, 11.]

84 / 89
Example for Function of Continuous RV – 1

I Consider a linear transformation Y = g (X ) = aX + b where a
and b are constant real numbers.
I For this function d(aX + b)/dX = a and x = (y − b)/a. Hence

fY (y ) = fX (x)/|a| = fX ((y − b)/a)/|a|

Linear transformation of rv
A linear transformation of a random variable does not change the
type of distribution (i.e. uniform, Gaussian, etc.). It may change the
parameters such as the mean, variance, etc.
I If X has a uniform distribution in the [x1 , x2 ] range, then Y also
has a uniform distribution, in the [ax1 + b, ax2 + b] range, as the
sketch below illustrates.
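A Monte Carlo sketch of this statement (arbitrary a = 3, b = −1, X uniform on [0, 1]):

```python
# A linear map of a Uniform(0, 1) rv is Uniform(b, a + b).
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=1_000_000)
a, b = 3.0, -1.0
Y = a * X + b                       # should be Uniform(-1, 2)

hist, _ = np.histogram(Y, bins=10, range=(-1.0, 2.0), density=True)
print(Y.min(), Y.max())             # ~-1.0 and ~2.0
print(hist)                         # all bins ~1/3, the uniform density
```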

85 / 89
Example for Function of Continuous RV – 2

I Consider Y = g (X ) = 1/X . Find fY (y ) in terms of fX (x).
I This is a one-to-one function with a single value of X such that
X = 1/Y .

g'(X ) = −1/X^2 = −1/(1/Y )^2 = −Y^2

I Hence

fY (y ) = (1/y^2 ) fX (1/y )

86 / 89
Example for Function of Continuous RV – 3
I Consider Y = g (X ) = aX^2 where a > 0 ∈ R is a constant.
Find fY (y ) in terms of fX (x).
I This is not a one-to-one function
I For y < 0 there is no x
I For y > 0 there are two values of X: x1 = √(y /a) and
x2 = −√(y /a)
that satisfy Y = g (X ) = aX^2 .
I g'(X ) = 2aX and
1/|g'(X )| = 1/(2a|X |) = 1/(2a√(y /a)) = 1/(2√(ay )) for both
x1 = √(y /a) and x2 = −√(y /a) when y > 0
I Then

fY (y ) = (1/(2√(ay ))) ( fX (√(y /a)) + fX (−√(y /a)) )  if y > 0
fY (y ) = 0  if y < 0

87 / 89
Example for Function of Continuous RV – 4
I Consider Y = g (X ) = e^X . Find fY (y ) in terms of fX (x).
I This is a one-to-one function with a single solution at
x = ln y
I

1/|g'(x)| = 1/e^x = 1/e^{ln y} = 1/y

I Note that y > 0 for all values of x
I Then

fY (y ) = (1/y ) fX (ln y )  if y > 0
fY (y ) = 0  if y ≤ 0
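For X ~ N(0, 1) this result can be checked against SciPy's lognormal pdf (a sketch; lognorm with s = 1 corresponds to µ = 0, σ = 1):

```python
# f_Y(y) = f_X(ln y)/y matches the lognormal pdf when X ~ N(0, 1).
import numpy as np
from scipy.stats import norm, lognorm

y = np.array([0.5, 1.0, 2.0, 5.0])
formula = norm.pdf(np.log(y)) / y           # derived on this slide
direct  = lognorm.pdf(y, s=1.0)             # scipy's lognormal, sigma = 1
print(np.allclose(formula, direct))         # True
```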

88 / 89
PDF Conversion

I We have seen how to find the pdf of a function of a rv.


I Now a different problem: How can we convert a rv X with
fX (x) to another rv Y with fY (y ) using a function y = g (x)?
I We will use 2 steps (sketched in code below)
1. Convert X into another temporary rv Z that has uniform
distribution in [0, 1]
2. Convert Z into Y
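A sketch of the two steps for continuous rvs (assumes SciPy; here an Exp(1) sample is turned into a standard normal one via Z = F_X(X) and Y = F_Y^{-1}(Z)):

```python
# Inverse-transform conversion of one rv into another.
import numpy as np
from scipy.stats import expon, norm, kstest

rng = np.random.default_rng(2)
X = rng.exponential(1.0, size=100_000)   # rv with known cdf F_X

Z = expon.cdf(X)                         # step 1: uniform in [0, 1]
Y = norm.ppf(Z)                          # step 2: apply inverse target cdf

print(kstest(Y, "norm").pvalue > 0.01)   # typically True: Y looks normal
```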

89 / 89
