These slides are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
License: https://creativecommons.org/licenses/by-nc-nd/4.0/
v2018.10.03
1 / 89
Stochastic vs. Deterministic Systems
I Deterministic system
I no randomness
I same output for the same input and other conditions
I Stochastic system
I randomness due to
I limited capabilities of production and measurement,
I various unknown factors (noise, uncertain parameters, etc.)
I different output even for the same input and other conditions
I Only theoretical systems are deterministic. Their
physical implementations and measurements are stochastic.
2 / 89
Stochastic vs. Deterministic Systems
3 / 89
Sets
A = {ζ1 , ζ2 , · · · , ζN }
4 / 89
Set Operators
I Complement:
Ac = {x : x ∈ S but x ∉ A}
I Union:
A ∪ B = {x : x ∈ A or x ∈ B}
I Intersection:
A ∩ B = {x : x ∈ A and x ∈ B}
I Symmetric difference:
A △ B = (Ac ∩ B) ∪ (A ∩ Bc)
5 / 89
Properties of Sets
6 / 89
Disjoint Sets
7 / 89
De Morgan Law
[A ∩ (B ∪ Φ)]c = Ac ∪ (Bc ∩ S)
8 / 89
Duality
9 / 89
Sample space and empty set
I S: sample space / certain event
It is the set of all possible outcomes/events
I Φ: empty set / impossible event
I Field:
Let the Ai be subsets of S.
If the Ai are finitely many, F = {Ai : Ai ⊆ S, i ≤ N}, and
I Φ ∈ F
I if Ai ∈ F then Aci ∈ F
I if Ai ∈ F for i = 1, 2, ..., N then A1 ∪ A2 ∪ · · · ∪ AN ∈ F
then F is called a field.
I Borel field:
If the Ai are infinitely many, then F is called a Borel field.
A Borel field is closed under complement and countable union
operations.
I Suppose B = {Ai : Ai ⊆ S and i ∈ N} is a Borel field. Any
subset A of S is called an event iff A ∈ B.
10 / 89
Axioms of Probability
11 / 89
Theorems
12 / 89
Conditional Probability
P(A|B) = P(A ∩ B) / P(B)
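
As a quick sanity check, the definition can be verified by enumerating a small sample space in Python
(a minimal sketch); here A = {roll is 2} and B = {roll is even} for a fair six-sided die.

S = {1, 2, 3, 4, 5, 6}                  # sample space of a fair die
A, B = {2}, {2, 4, 6}                   # A: roll is 2, B: roll is even
P = lambda E: len(E & S) / len(S)       # equally likely outcomes
print(P(A & B) / P(B))                  # P(A|B) = P(A ∩ B)/P(B) = (1/6)/(1/2) = 1/3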
13 / 89
Independence
14 / 89
Mutual Independence
Example:
Consider tossing a fair coin (S = {H, T })
A1 : H on the first toss
A2 : H on the second toss
A3 : same outcome on both tosses
Are events A1 , A2 , A3 mutually independent?
15 / 89
Bayes Theorem
P(Aj|B) = P(Aj) P(B|Aj) / Σ_{i=1}^{N} P(Ai) P(B|Ai)
16 / 89
Bayes Theorem – Example 1
17 / 89
Bayes Theorem – Example 1
P(disease+|test+) = P(test+|disease+) P(disease+) / P(test+)
I What is P(test+)?
I The test can come out positive when there is disease (true
positive) or when there is no disease (false positive). Hence
P(test+) = P(test+|disease+) P(disease+) + P(test+|disease−) P(disease−)
18 / 89
Bayes Theorem – Example 1
I Prior information: P(disease+) = 0.000001
I Likelihood: P(test+|disease+) = 0.99
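
The slide does not state the test's false-positive rate, so the posterior cannot be computed from the
given numbers alone. The short Python sketch below assumes a 1% false-positive rate purely for
illustration; it shows how strongly the tiny prior dominates the posterior.

p_d = 1e-6                 # prior P(disease+), from the slide
p_pos_d = 0.99             # likelihood P(test+|disease+), from the slide
p_pos_nd = 0.01            # ASSUMED false-positive rate P(test+|disease-)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)   # total probability
print(p_pos_d * p_d / p_pos)                   # ~9.9e-5, despite the accurate test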
Bayes Theorem – Example 2
A cab was involved in a hit-and-run accident at night. Two cab
companies, the Green and the Blue, operate in the city. You are
given the following data:1
I 85% of the cabs in the city are Green and 15% are Blue.
I A witness identified the cab as Blue.
I The court tested the reliability of the witness under the same
circumstances that existed on the night of the accident and
concluded that the witness correctly identified each one of the
two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was
Blue rather than Green?
1 Example taken from A. Tversky and D. Kahneman, “Evidential impact of base
rates,” in Judgment under Uncertainty: Heuristics and Biases, D. Kahneman, P.
Slovic, and A. Tversky (eds.), Cambridge University Press, 1982.
20 / 89
Bayes Theorem – Example 2
I A priori probabilities: P(Green) = 0.85 and P(Blue) = 0.15
I Likelihoods: P(Witness = Blue|Blue) = 0.8 and P(Witness = Blue|Green) = 0.2
I From Bayes theorem
P(Blue|Witness = Blue) = P(Witness = Blue|Blue) × P(Blue) / P(Witness = Blue)
where P(Witness = Blue) = 0.8 × 0.15 + 0.2 × 0.85 = 0.29, so
P(Blue|Witness = Blue) = (0.8 × 0.15) / 0.29 ≈ 41%
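
The same computation in a short Python sketch (values taken from the slide):

p_blue, p_green = 0.15, 0.85
p_wb_blue, p_wb_green = 0.8, 0.2                  # P(Witness = Blue | cab colour)
p_wb = p_wb_blue * p_blue + p_wb_green * p_green  # = 0.29
print(p_wb_blue * p_blue / p_wb)                  # ~0.4138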
21 / 89
Permutation and Combination
In a repeated trial, we want to enumerate the number of possible
outcomes (without repetition of objects)
I Permutation:
The number of possible arrangements of k objects from a
collection of n objects when the ordering is important.
P_k^n = n(n − 1)(n − 2) · · · (n − k + 1)
      = n! / (n − k)!
I Combination:
The number of possible arrangements of k objects from a
collection of n objects when the ordering is NOT important.
C_k^n = n(n − 1) · · · (n − k + 1) / k!
      = n! / (k! (n − k)!)
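
Both counts are available in Python's standard library (math.perm and math.comb, Python 3.8+); a
small sketch with the arbitrary example values n = 5 and k = 2:

import math
n, k = 5, 2
print(math.perm(n, k), math.factorial(n) // math.factorial(n - k))                        # 20, 20
print(math.comb(n, k), math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))  # 10, 10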
22 / 89
Properties of Combination
I C_0^n = 1 if n > 0
I C_k^n = C_{n−k}^n
23 / 89
Permutation and Combination
I Permutation (with repetition):
The number of possible arrangements of k objects from a
collection of n objects when the ordering is important and
repetition is allowed.
P̃_k^n = n^k
I Combination (with repetition):
The number of possible arrangements of k objects from a
collection of n objects when the ordering is NOT important and
repetition is allowed.
C̃_k^n = C_k^(n+k−1)
24 / 89
Permutation Examples
Example: How many different 2-digit numbers can you obtain
using the digits {2, 5, 8} without repeating digits?
I Ordering is important, as 25 ≠ 52
I For the first digit there are 3 options from {2, 5, 8}; for the
second digit there are 2 options.
I Hence there are 3 × 2 = 6 possibilities:
25, 28, 52, 58, 82, 85
Example: Assume 20 letters are used to form 3-letter license
plates. How many different possibilities are there if the letters can be
repeated?
I License plate ABC ≠ ACB, so ordering is important
I 20 × 20 × 20 = 8000
I If repeated letters are not permitted, then 20 × 19 × 18 = 6840
25 / 89
Combination Examples
26 / 89
Combination Examples
27 / 89
Random Variables
I A random variable X is a mapping from the sample space S
to a subset X of the real line R
X :S →R
I Using a random variable (rv) a real number can be assigned
to an event/outcome
I For example, the experiment of coin flipping can generate
S = {H, T }
X :H → 1
T → −1
or
Y :H → 100
T → 40
Both X and Y are random variables.
28 / 89
Discrete Random Variables
29 / 89
Probability Mass Function (pmf)
30 / 89
Cumulative Distribution Function (cdf)
31 / 89
Probability Density Function (pdf)
32 / 89
Cumulative Distribution Function (cdf)
I The cdf is defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
for all x ∈ R
I F(x) should have a finite or countably infinite number of
discontinuities.
I For a continuous rv, P(X < a) and P(X ≤ a) are the same, which is
F(a) = ∫_{−∞}^{a} f(t) dt
I P(X > b) and P(X ≥ b):
1 − F(b) = ∫_{b}^{∞} f(t) dt
I P(a < X < b) or P(a ≤ X < b) or P(a < X ≤ b) or
P(a ≤ X ≤ b):
F(b) − F(a) = ∫_{a}^{b} f(t) dt
33 / 89
Relation of pdf with cdf
For a discrete rv
I pmf → cdf
F(x) = Σ_{xi ≤ x} f(xi)
I cdf → pmf
f(x) = F(x) − F(x⁻)
(where F(x⁻) is the left limit of F at x)
For a continuous rv
I pdf → cdf
F(x) = ∫_{−∞}^{x} f(t) dt
I cdf → pdf
f(x) = dF(t)/dt evaluated at t = x
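
A numerical sketch of these relations for a continuous rv, assuming a standard normal X and using
SciPy: integrating the pdf over an interval reproduces the corresponding difference of cdf values.

from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 0.5
area, _ = quad(norm.pdf, a, b)           # integral of the pdf over [a, b]
print(area, norm.cdf(b) - norm.cdf(a))   # both ~0.5328, i.e. P(a < X <= b) = F(b) - F(a)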
34 / 89
Expected Value of a Distribution
35 / 89
Variance of a Distribution
I The variance of rv X is
σ_X^2 = E((X − µ_X)^2)
I For discrete distributions:
σ_X^2 = Σ_{xi ∈ X} (xi − µ)^2 f(xi)
36 / 89
Standard Deviation of a Distribution
37 / 89
What does SD mean?
I For a Gaussian (will cover this later) distributed rv X, the
range
I [µX − σ, µX + σ] contains 68.2%
I [µX − 2σ, µX + 2σ] contains 95.4%
I [µX − 3σ, µX + 3σ] contains 99.7%
of the values of this rv.
I Hence, for a Gaussian rv with pdf f(x):
∫_{µ−σ}^{µ+σ} f(x) dx = 0.682
∫_{µ−2σ}^{µ+2σ} f(x) dx = 0.954
∫_{µ−3σ}^{µ+3σ} f(x) dx = 0.997
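
These areas can be checked numerically with SciPy (a sketch; standard normal rv assumed):

from scipy.stats import norm
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))   # ~0.6827, 0.9545, 0.9973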
38 / 89
Expected Value of a Function of rv
39 / 89
Mean an Variance of Translation and Scaling
40 / 89
Expectation is a Linear Operator
I Expectation is a linear operator
I It can exchange order with other linear operators such as
summation, integration, etc.
I For example:
Consider a series of functions gi(X) and constants ai. What is
the expected value of Y = Σ_{i=1}^{N} ai gi(X)?
By linearity, E(Y) = Σ_{i=1}^{N} ai E(gi(X)).
I The variance was given as:
σ^2 = E((X − µ_X)^2)
    = E(X^2 − 2Xµ_X + µ_X^2)
    = E(X^2) − 2µ_X E(X) + µ_X^2
    = E(X^2) − µ_X^2
Hence:
E(X^2) = σ_X^2 + µ_X^2
41 / 89
Median of a distribution
42 / 89
Mode of a distribution
I For discrete rv: The mode is the value x at which its pmf
takes its maximum value.
I For a continuous rv: The mode is the value x at which its pdf
has its maximum value
43 / 89
Comparison of Mean, Median and Mode
I Mode is the most likely value of an rv
that has the highest value of pmf/pdf
I Median is the value of an rv that
divides the pmf/pdf in half
I Mean is the value of an rv that is the
center of mass of pmf/pdf
44 / 89
Comparison of Mean, Median and Mode
I Mean, median, and mode have very close values for some
distributions
I For other distributions, their values can be quite different.
45 / 89
Comparison of Mean, Median and Mode
µ_X = (1/3)(−2) + (1/2)(0) + (1/6)(4) = 0
46 / 89
Comparison of Mean, Median and Mode
47 / 89
Standard Probability Distributions
48 / 89
Bernoulli Distribution
I Discrete distribution
I Single parameter p
I Bernoulli(p)
49 / 89
Bernoulli Distribution – Mean and Variance
Mean:
E(X) = µ_X = (1)p + (0)(1 − p) = p
Variance:
E(X^2) = (1)^2 p + (0)^2 (1 − p) = p
Hence:
σ_X^2 = E(X^2) − µ_X^2 = p − p^2 = p(1 − p)
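
A quick check of these moments against SciPy's Bernoulli implementation (sketch; p = 0.3 is an
arbitrary example value):

from scipy.stats import bernoulli
p = 0.3
print(bernoulli.mean(p), p)              # 0.3, 0.3
print(bernoulli.var(p), p * (1 - p))     # 0.21, 0.21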
50 / 89
Binomial Distribution
I Discrete distribution
I Two parameters (n, p)
I Binomial(n,p)
f(x) = C_x^n p^x (1 − p)^(n−x)
51 / 89
Binomial Distribution – Mean and Variance
Mean:
E(X) = Σ_{x=0}^{n} x C_x^n p^x (1 − p)^(n−x)
     = Σ_{x=1}^{n} n C_{x−1}^{n−1} p^x (1 − p)^(n−x)      (using x C_x^n = n C_{x−1}^{n−1})
     = np Σ_{x=1}^{n} C_{x−1}^{n−1} p^(x−1) (1 − p)^(n−x)
     = np [p + (1 − p)]^(n−1)
     = np
52 / 89
Binomial Distribution – Mean and Variance
Variance:
E(X(X − 1)) = n(n − 1)p^2 Σ_{x=2}^{n} C_{x−2}^{n−2} p^(x−2) (1 − p)^(n−x)
            = n(n − 1)p^2
Furthermore
E(X^2) = E(X(X − 1)) + E(X)
       = n(n − 1)p^2 + np
Then
σ_X^2 = E(X^2) − µ_X^2
      = n(n − 1)p^2 + np − (np)^2
      = n^2 p^2 − np^2 + np − n^2 p^2
      = np(1 − p)
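
A numeric check of np and np(1 − p) against SciPy (sketch; n = 10 and p = 0.4 are arbitrary example
values):

from scipy.stats import binom
n, p = 10, 0.4
mean, var = binom.stats(n, p, moments='mv')
print(mean, n * p)                # 4.0, 4.0
print(var, n * p * (1 - p))       # 2.4, 2.4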
53 / 89
Binomial Distribution – Example
54 / 89
Poisson Distribution
I Discrete distribution
I Single parameter λ > 0
I Poisson(λ)
f(x) = e^(−λ) λ^x / x!
I Mean: µX = λ (Derivation is left as an exercise)
I Variance: σX2 = λ
I Mean is equal to variance.
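
A short check with SciPy that the mean and the variance coincide (sketch; λ = 2.5 is an arbitrary
example value):

from scipy.stats import poisson
lam = 2.5
print(poisson.mean(lam), poisson.var(lam))   # 2.5, 2.5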
55 / 89
Geometric Distribution
I Discrete distribution
I Single parameter p
I Geometric(p)
f(x) = p(1 − p)^(x−1)
I x is the number of trials needed for the Bernoulli trials to
produce “1” for the first time, i.e., the number of trials up to
and including the first success.
I Hence, the trials should produce “0” x − 1 times and “1” on the
x-th trial.
I Mean: µ_X = 1/p
I Variance: σ_X^2 = (1 − p)/p^2
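
SciPy's geometric distribution uses the same convention (x counts the trials up to and including the
first success); a short check with the arbitrary example value p = 0.25:

from scipy.stats import geom
p = 0.25
print(geom.pmf(3, p), p * (1 - p) ** 2)   # P(first success on trial 3)
print(geom.mean(p), 1 / p)                # 4.0, 4.0
print(geom.var(p), (1 - p) / p ** 2)      # 12.0, 12.0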
57 / 89
Uniform Distribution
I Discrete distribution
I Each of the K possible outcomes is equally likely:
f(xi) = 1/K
for i ∈ {0, 1, ..., K − 1}
I Assuming the outcomes are the consecutive integers xi ∈ {a, a + 1, ..., b}
with b > a (so that K = b − a + 1)
I Mean:
µ_X = (a + b)/2
I Variance:
σ_X^2 = ((b − a + 1)^2 − 1)/12
58 / 89
Uniform Distribution
I Continuous distribution, x ∈ R
I Two parameters: (a, b) with b > a
I Uniform(a, b)
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
59 / 89
Normal Distribution
I Continuous distribution, x ∈ R
I Typically referred to as the Gaussian distribution
I Widely used
I Two parameters (µ, σ)
I N(µ, σ^2)
f(x) = (1/(√(2π) σ)) exp(−(x − µ)^2/(2σ^2))
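
The pdf can be evaluated directly from this formula or through SciPy (a sketch; note that
scipy.stats.norm is parameterized by loc = µ and scale = σ):

import math
from scipy.stats import norm
mu, sigma, x = 1.0, 2.0, 0.5
manual = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
print(manual, norm.pdf(x, loc=mu, scale=sigma))   # identical values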
60 / 89
Standard Normal Distribution
X = σZ + µ, where Z ∼ N(0, 1) is the standard normal rv
61 / 89
Gamma Distribution
I Continuous distribution
I Two parameters (α, β) with 0 < α, β < ∞
I Gamma(α, β)
f(x) = (1/(β^α Γ(α))) x^(α−1) exp(−x/β)
where 0 < x < ∞, 0 < α, β < ∞, and Γ(α) = ∫_{0}^{∞} e^(−t) t^(α−1) dt
I For the Γ function
I Γ(α) = (α − 1)Γ(α − 1) for α > 1. For large values of α,
Γ(α) ≈ √(2π) e^(−α) α^(α−0.5) (Stirling’s approximation)
I Γ(n) = (n − 1)! for any positive integer n. For large values of n,
n! ≈ √(2π) e^(−n) n^(n+0.5)
I Γ(1/2) = √π
62 / 89
Exponential Distribution
I Continuous distribution
I Single parameter β
I Exp(β)
f(x) = (1/β) exp(−x/β)
where 0 < x, β < ∞
I This is a special case of the Gamma distribution with α = 1
I If β = 1 this distribution is called the standard exponential
distribution
I cdf
F(x) = 0 if x ≤ 0
F(x) = 1 − e^(−x/β) if x > 0
63 / 89
Chi-square Distribution
I Continuous distribution
I Single parameter υ (this is also called degrees of freedom)
I Chi(υ) is the chi-square distribution with υ degrees of freedom.
Its pdf is that of Gamma(υ/2, 2):
f(x) = (1/(2^(υ/2) Γ(υ/2))) x^(υ/2 − 1) exp(−x/2)
64 / 89
Lognormal Distribution
I Continuous distribution
I Two parameters (µ, σ)
I pdf
f(x) = (1/(√(2π) σ x)) exp(−(log(x) − µ)^2/(2σ^2))
with 0 < x < ∞, −∞ < µ < ∞, and 0 < σ < ∞
I cdf
F(x) = Φ((log(x) − µ)/σ)
where Φ is the cdf of the standard normal distribution
I
P(log(X) ≤ x) = P(X ≤ e^x) = Φ((x − µ)/σ)
I The logarithm of the rv X has the N(µ, σ^2) distribution. Hence, it is
called the lognormal distribution.
65 / 89
Student’s t Distribution
I Continuous distribution
I Single parameter υ (degrees of freedom)
I pdf
f(x) = a(υ) (1 + x^2/υ)^(−(υ+1)/2)
where
a(υ) = Γ((υ + 1)/2) / (√(υπ) Γ(υ/2))
66 / 89
Cauchy Distribution
I Continuous distribution
I Specific case of Student’s t distribution with υ = 1
I pdf
f(x) = 1/(π(1 + x^2))
I cdf
F(x) = (1/π) arctan(x) + 1/2
for x ∈ R
67 / 89
F Distribution
I Continuous distribution
I Two parameters (υ1 , υ2 )
I The order of the parameters is important; hence f_{υ1,υ2}(x) ≠ f_{υ2,υ1}(x)
I pdf
f(x) = k(υ1, υ2) x^((υ1−2)/2) (1 + (υ1/υ2) x)^(−(υ1+υ2)/2)
where
k(υ1, υ2) = (υ1/υ2)^(υ1/2) Γ((υ1 + υ2)/2) / (Γ(υ1/2) Γ(υ2/2))
68 / 89
Beta Distribution
I Beta function
b(α, β) = ∫_{0}^{1} x^(α−1) (1 − x)^(β−1) dx = Γ(α)Γ(β)/Γ(α + β)
I pdf of the Beta(α, β) distribution (for 0 < x < 1):
f(x) = x^(α−1) (1 − x)^(β−1) / b(α, β)
69 / 89
Negative Exponential Distribution
I Continuous distribution
I Two parameters (γ, β)
I pdf
f(x) = (1/β) exp(−(x − γ)/β)
where 0 < β < ∞, −∞ < γ < ∞, and γ < x < ∞
70 / 89
Weibull Distribution
I Continuous distribution
I Two parameters (α, β)
I pdf
f(x) = α β^(−α) x^(α−1) exp(−(x/β)^α)
where 0 < α, β and 0 < x < ∞
71 / 89
Rayleigh Distribution
I Continuous distribution
I Single parameter θ
I pdf
f(x) = (2x/θ) exp(−x^2/θ)
where 0 < x < ∞ and 0 < θ
72 / 89
Laplace Distribution
I Continuous distribution
I Two parameters (µ, σ)
I pdf
f(x) = (1/(2σ)) exp(−|x − µ|/σ)
73 / 89
Moments
I The r-th moment of an rv X is
n_r = E(X^r)
74 / 89
Central Moments
I The r-th central moment of an rv X is
c_r = E((X − µ)^r)
I c_1 = 0
I c_2 = σ^2
I If n_r is finite then
I c_r is finite
I n_s is finite for s ∈ {1, 2, ..., r − 1}
I c_s is finite for s ∈ {1, 2, ..., r − 1}
75 / 89
Moment Generating Function
M_X(t) = E(e^(tX))
n_r = d^r/dt^r M_X(t) evaluated at t = 0
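
A symbolic sketch with SymPy, using the Bernoulli(p) MGF M(t) = 1 − p + p e^t as an example;
differentiating at t = 0 returns the raw moments (both equal to p for this distribution):

import sympy as sp
t, p = sp.symbols('t p')
M = 1 - p + p * sp.exp(t)               # MGF of a Bernoulli(p) rv
n1 = sp.diff(M, t).subs(t, 0)           # first raw moment  -> p
n2 = sp.diff(M, t, 2).subs(t, 0)        # second raw moment -> p
print(n1, n2, sp.expand(n2 - n1**2))    # p, p, p - p**2 (the variance)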
76 / 89
Moment Generating Function of Distributions
77 / 89
Functions of Random Variables
78 / 89
Functions of Discrete Random Variables
79 / 89
Functions of Discrete Random Variables
and
f(yi) = Σ_{j : yi = g(xj)} f(xj)
80 / 89
Functions of Continuous Random Variables
I Continuous functions can also be one-to-one or not.
[Figure: a one-to-one mapping f : X → Y, where each y has a single preimage x, and a many-to-one mapping where two sets X1, X2 map to the same Y.]
P(y ≤ Y ≤ y + dy) = fY(y) dy = FY(y + dy) − FY(y)
81 / 89
Functions of Continuous Random Variables
I If the function is one-to-one then
P(y ≤ Y ≤ y + dy) = P(x ≤ X ≤ x + dx), which is
fY(y) dy = fX(x) dx, i.e., fY(y) = fX(x)/|g′(x)| with x = g^(−1)(y)
I If the function is not one-to-one and there are N different
values of x that map to the same y value, y = g(xi),
i = 1, 2, ..., N, then
fY(y) = fX(x1)/|g′(x1)| + fX(x2)/|g′(x2)| + · · · + fX(xN)/|g′(xN)|
83 / 89
Example for Function of Discrete RV – 1
I A one-to-one function of a discrete rv: Y = 3X + 2 with X ∈ {1, 2, 3}.
[Figure: Y = 3X + 2 plotted against X, with the pmf fX(x) at x = 1, 2, 3 and the resulting pmf fY(y) at y = 5, 8, 11.]
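
A small sketch that builds fY from fX for this example by enumerating the mapping; the pmf values
below are assumed placeholders, since the slide's figure values are not reproduced here.

fX = {1: 0.5, 2: 0.3, 3: 0.2}      # ASSUMED example pmf of X
g = lambda x: 3 * x + 2
fY = {}
for x, px in fX.items():           # f(yi) = sum of f(xj) over all xj with g(xj) = yi
    fY[g(x)] = fY.get(g(x), 0) + px
print(fY)                          # {5: 0.5, 8: 0.3, 11: 0.2}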
84 / 89
Example for Function of Continuous RV – 1
I For the linear transformation Y = aX + b (with a ≠ 0),
fY(y) = fX(x)/|a| = fX((y − b)/a)/|a|
Linear transformation of an rv
A linear transformation of a random variable does not change the
type of distribution (i.e. uniform, Gaussian, etc.). It may change the
parameters such as mean, variance, etc.
I If X has a uniform distribution in the range [x1, x2], then Y also has a
uniform distribution, in the range [ax1 + b, ax2 + b].
85 / 89
Example for Function of Continuous RV – 2
I Consider Y = g(X) = 1/X. Then
g′(X) = −1/X^2 = −1/(1/Y)^2 = −Y^2
I Hence
fY(y) = (1/y^2) fX(1/y)
86 / 89
Example for Function of Continuous RV – 3
I Consider Y = g(X) = aX^2, where a > 0 is a real constant.
Find fY(y) in terms of fX(x).
I This is not a one-to-one function:
I For y < 0 there is no x
I For y > 0 there are two values of x, x1 = √(y/a) and
x2 = −√(y/a),
that satisfy y = g(x) = ax^2.
I g′(X) = 2aX and
1/|g′(x1,2)| = 1/(2a√(y/a)) = 1/(2√(ay)) for both
x1 = √(y/a) and x2 = −√(y/a) when y > 0
I Then
fY(y) = (1/(2√(ay))) (fX(√(y/a)) + fX(−√(y/a))) if y > 0
fY(y) = 0 if y < 0
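
A sketch checking this result in a concrete case: with X standard normal and a = 1, Y = X^2 is
chi-square with one degree of freedom, so the derived density should match scipy.stats.chi2(1).

import math
from scipy.stats import norm, chi2

a, y = 1.0, 1.7                                    # arbitrary test point with y > 0
fy = (norm.pdf(math.sqrt(y / a)) + norm.pdf(-math.sqrt(y / a))) / (2 * math.sqrt(a * y))
print(fy, chi2.pdf(y, df=1))                       # both ~0.131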
87 / 89
Example for Function of Continuous RV – 4
I Consider Y = g(X) = e^X. Find fY(y) in terms of fX(x).
I This is a one-to-one function with a single solution at
x = ln y
I
1/|g′(x)| = 1/e^x = 1/e^(ln y) = 1/y
I Note that y > 0 for all values of x
I Then
fY(y) = (1/y) fX(ln y) if y > 0
fY(y) = 0 if y ≤ 0
88 / 89
PDF Conversion
89 / 89