Data Science
Statistical Models
Hanspeter Pfister, Joe Blitzstein, and Verena Kaynig
[Diagram: model parameter (unobserved) → probability → data y (observed); statistical inference runs in the reverse direction, from the observed data back to the unobserved parameter.]
This Week
source: http://extensionengine.com/drowning-in-data-the-biggest-hurdle-for-mooc-proliferation/
"All models are wrong, but some models are useful." (George Box)
Exponential Distribution
f(x) = e^{-x}, x > 0
[Figure: PDF f(x) = e^{-x} and CDF F(x) = 1 - e^{-x} of the Expo(1) distribution, plotted for x from 0 to 3.]
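Not shown in the slides, but a quick numerical sanity check of the Expo(1) density above: simulate draws and compare the empirical fraction below each cutoff against the CDF F(x) = 1 - e^{-x} (using NumPy; the seed and sample size are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw samples from Expo(1), whose PDF is f(x) = e^{-x} for x > 0
samples = rng.exponential(scale=1.0, size=100_000)

# The CDF is F(x) = 1 - e^{-x}; compare it against the empirical fraction
for x in [0.5, 1.0, 2.0]:
    empirical = np.mean(samples <= x)
    theoretical = 1 - np.exp(-x)
    print(f"x={x}: empirical {empirical:.3f}, theoretical {theoretical:.3f}")
```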
[Diagram: relationships among the named distributions: Bin (Bern), Beta, (Unif), Pois, Normal, Gamma (Expo, Chi-Square), NBin (Geom), and Student-t (Cauchy), connected by limits, conjugacy, conditioning, and the Poisson process.]
Binomial Distribution
Figure 3.6 shows plots of the Binomial PMF for various values of n and p. Note that
the PMF of the Bin(10, 1/2) distribution is symmetric about 5, but when the success
probability is not 1/2, the PMF is skewed. For a fixed number of trials n, X tends to be
larger when the success probability is high and lower when the success probability is low,
as we would expect from the story of the Binomial distribution. Also recall that in any
PMF plot, the sum of the heights of the vertical bars must be 1.
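The claims in the paragraph above can be checked directly: a short sketch (not from the book) that computes the Bin(n, p) PMF from its formula and verifies that the bar heights sum to 1 and that Bin(10, 1/2) is symmetric about 5.

```python
from math import comb

def binom_pmf(n, p):
    """PMF of Bin(n, p) as a list indexed by k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# The heights of the vertical bars in any PMF plot must sum to 1
pmf = binom_pmf(10, 0.5)
print(sum(pmf))

# Symmetric about 5 when p = 1/2: P(X = 3) equals P(X = 7)
print(pmf[3] == pmf[7])
```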
[Figure 3.6: PMF plots of the Bin(10, 1/2), Bin(100, 0.03), and Bin(9, 4/5) distributions; the Bin(10, 1/2) PMF is symmetric about 5, while the others are skewed.]
Binomial Distribution
Story: X ~ Bin(n, p) is the number of successes in n independent Bernoulli(p) trials.
Example: the number of votes for candidate A in an election with n voters, where each voter independently votes for A with probability p.
(Doonesbury cartoon)
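The story above translates directly into a simulation: a sketch (my illustration, not from the slides) that generates a Bin(n, p) draw by counting successes in n independent Bernoulli(p) trials, then checks that the long-run average is near n*p.

```python
import random

random.seed(0)

def binomial_from_story(n, p):
    """One draw of X ~ Bin(n, p): count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if random.random() < p)

# E[X] = n*p, so the average over many draws should be near 10 * 0.5 = 5
draws = [binomial_from_story(10, 0.5) for _ in range(20_000)]
print(sum(draws) / len(draws))
```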
Poisson
Poisson Distribution
X ~ Pois(λ) has PMF P(X = k) = e^{-λ} λ^k / k!, for k = 0, 1, 2, . . . .
This is a valid PMF since Σ_{k=0}^∞ λ^k / k! = e^λ.
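A small check of the Poisson PMF (mine, not the slides'): compute P(X = k) from the formula and confirm the probabilities sum to 1, since the λ^k/k! terms sum to e^λ and cancel the e^{-λ} factor.

```python
from math import exp, factorial

def pois_pmf(lam, k):
    """P(X = k) for X ~ Pois(lam): e^{-lam} * lam^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

# Truncating the infinite sum at k = 50 captures essentially all the mass for lam = 2
total = sum(pois_pmf(2, k) for k in range(50))
print(total)
```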
[FIGURE 4.7: PMF and CDF of the Pois(2) distribution (top row) and the Pois(5) distribution (bottom row), plotted for x from 0 to 10.]
If X is the number of events that occur, where the ith event has probability p_i, and N ~ Pois(λ), with λ = Σ p_i the average number of events that occur, then

|P(X ∈ B) − P(N ∈ B)| ≤ min(1, 1/λ) Σ_{j=1}^n p_j².

This provides an upper bound on how much error is incurred from using the Poisson approximation, not only for approximating the PMF of X, but also for approximating the probability that X is in any set. Also, it makes more precise how small the p_j should be: we want Σ_{j=1}^n p_j² to be very small, or at least very small compared to λ. The result can be shown using an advanced technique known as the Stein-Chen method.

Example: in the matching problem, the probability of no match is approximately 1/e.

The Poisson paradigm is also called the law of rare events. The interpretation is that the p_j are small, not that λ is small. For example, in the email example, the low probability of getting an email from a specific person in a particular hour is offset by the large number of people who could send you an email in that hour.
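The matching-problem claim above is easy to verify by simulation: a sketch (my illustration) that shuffles n cards many times and estimates the probability that no card i lands in position i, which should be close to 1/e ≈ 0.368.

```python
import random

random.seed(1)

def no_match(n):
    """Shuffle positions 0..n-1; return True if no card lands in its own position."""
    perm = list(range(n))
    random.shuffle(perm)
    return all(perm[i] != i for i in range(n))

# P(no match) converges quickly to 1/e ≈ 0.368 as n grows
trials = 50_000
prob = sum(no_match(20) for _ in range(trials)) / trials
print(prob)
```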
[Figure: the empirical distribution Pr(Y_i = y_i) of the number of children, compared with the fitted Poisson model Pr(Y_i = y | θ = 1.83).]
Normal Distribution
symmetry
central limit theorem
characterizations (e.g., via entropy)
68-95-99.7% rule
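The 68-95-99.7% rule can be checked empirically: a sketch (not in the slides) that draws a large standard normal sample and measures the fraction within 1, 2, and 3 standard deviations of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

# Fraction of draws within k standard deviations of the mean,
# which should land near 0.68, 0.95, and 0.997
for k in (1, 2, 3):
    print(f"within {k} sd: {np.mean(np.abs(z) <= k):.4f}")
```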
sources: Wikipedia, http://www.etsy.com/shop/NausicaaDistribution
[Table of bootstrap "reps": repeated resamples drawn with replacement from the original sample values 3.142, 2.718, 1.414, 0.693, 1.618.]
Bootstrap World
source: http://pubs.sciepub.com/ijefm/3/3/2/, based on a diagram in the Efron-Tibshirani book
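The resampling scheme pictured above can be sketched in a few lines (my illustration, using the five sample values from the slide): resample with replacement, recompute the mean each time, and use the spread of the bootstrap means as an estimate of the standard error.

```python
import numpy as np

rng = np.random.default_rng(2024)

# The original sample shown in the slide's resampling table
sample = np.array([3.142, 2.718, 1.414, 0.693, 1.618])

# Bootstrap: resample with replacement, recompute the statistic each time
reps = 10_000
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(reps)
])

# Bootstrap estimate of the standard error of the sample mean
print("sample mean:", sample.mean())
print("bootstrap SE:", boot_means.std(ddof=1))
```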