Data Science
Bayesian Methods
Hanspeter Pfister & Joe Blitzstein
pfister@seas.harvard.edu / blitzstein@stat.harvard.edu
Freq Bayes
FB
This Week
HW3 due next Thursday (Oct 17) at 11:59 pm
start now!
http://nbviewer.ipython.org/urls/raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Prologue/Prologue.ipynb
Full Probability Modeling
The process of Bayesian data analysis can be idealized by
dividing it into the following three steps:
1. Setting up a full probability model: a joint probability
distribution for all observable and unobservable quantities
in a problem...
2. Conditioning on observed data: calculating and
interpreting the appropriate posterior distribution, the
conditional probability distribution of the unobserved
quantities of ultimate interest, given the observed data.
3. Evaluating the fit of the model and the implications of the
resulting posterior distribution...
-- Gelman et al., Bayesian Data Analysis
Conjugate Priors
http://www.johndcook.com/conjugate_prior_diagram.html
Ranking Reddit Comments:
Example from Probabilistic Programming and Bayesian Methods for Hackers
http://nbviewer.ipython.org/urls/raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb
Ranking Reddit Comments: A Simple Model
http://nbviewer.ipython.org/urls/raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb
Ranking Reddit Comments by Posterior Quantiles
http://nbviewer.ipython.org/urls/raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb
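A minimal sketch of ranking by a posterior quantile, assuming a uniform Beta(1, 1) prior on each comment's true upvote ratio, so that the posterior is Beta(1 + upvotes, 1 + downvotes). The function name `posterior_lower_bound`, the 5% quantile, and the vote counts are illustrative assumptions, not taken from the slides; the quantile is estimated by Monte Carlo using only the standard library.

```python
import random

def posterior_lower_bound(upvotes, downvotes, q=0.05, n_samples=20000, seed=0):
    """Monte Carlo estimate of the q-quantile of Beta(1 + upvotes, 1 + downvotes).

    Assumes a uniform prior on the true upvote ratio (an assumption of this sketch).
    """
    rng = random.Random(seed)
    draws = sorted(
        rng.betavariate(1 + upvotes, 1 + downvotes) for _ in range(n_samples)
    )
    return draws[int(q * n_samples)]

# Hypothetical (upvotes, downvotes) pairs for three comments.
comments = {"A": (1, 0), "B": (60, 40), "C": (5, 1)}

# Rank by the lower posterior quantile instead of the raw upvote ratio.
ranking = sorted(
    comments,
    key=lambda name: posterior_lower_bound(*comments[name]),
    reverse=True,
)
print(ranking)
```

Comment A has a perfect raw ratio from a single vote, but its posterior is wide, so its lower quantile is small and it ranks below comments backed by more votes.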
Bayesian Bandits
Example from Probabilistic Programming and Bayesian Methods for Hackers
http://research.microsoft.com/en-us/projects/bandits/
http://nbviewer.ipython.org/urls/raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter6_Priorities/Priors.ipynb
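One common way to play a Bayesian bandit is posterior sampling (often called Thompson sampling): keep a Beta posterior for each arm's payout probability, draw one sample per arm, and pull the arm with the largest draw. The sketch below assumes Bernoulli rewards and Beta(1, 1) priors; the function name and the payout probabilities are hypothetical and are not taken from the slides.

```python
import random

def thompson_sampling(true_probs, n_pulls, seed=0):
    """Simulate a Bernoulli bandit played by sampling from Beta posteriors."""
    rng = random.Random(seed)
    wins = [0] * len(true_probs)
    trials = [0] * len(true_probs)
    for _ in range(n_pulls):
        # One draw from each arm's posterior Beta(1 + wins, 1 + losses).
        draws = [rng.betavariate(1 + w, 1 + t - w) for w, t in zip(wins, trials)]
        arm = draws.index(max(draws))
        # Observe a simulated reward and update that arm's counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        wins[arm] += reward
        trials[arm] += 1
    return wins, trials

# Hypothetical payout probabilities, unknown to the player.
wins, trials = thompson_sampling([0.15, 0.2, 0.8], n_pulls=1000)
print(trials)
```

Arms with little data have wide posteriors and still get pulled occasionally, which is how the method trades exploration against exploitation.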
LASSO and Sparsity
The lasso was introduced by Tibshirani (1996).
In a linear regression model, in place of minimizing the sum of squared
residuals, LASSO says to minimize SSR(β : λ), a modified version of the sum
of squared residuals:
$$\mathrm{SSR}(\beta : \lambda) = \sum_{i=1}^{n} (y_i - \beta^T x_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j| .$$
Bayesian interpretation: posterior mode, with independent
Laplace priors on the parameters.
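A short derivation of why the posterior mode matches the LASSO estimate. It assumes a normal likelihood with known variance $\sigma^2$ and independent Laplace priors with scale $b$; both symbols are introduced here for illustration and do not appear on the slide.

```latex
% Assumed model: y_i | beta ~ N(beta^T x_i, sigma^2),  p(beta_j) = (1/(2b)) exp(-|beta_j| / b)
\begin{align*}
-\log p(\beta \mid y)
  &= \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2
     + \frac{1}{b} \sum_{j=1}^{p} |\beta_j| + \text{const} \\
2\sigma^2 \left[ -\log p(\beta \mid y) \right]
  &= \sum_{i=1}^{n} (y_i - \beta^T x_i)^2
     + \lambda \sum_{j=1}^{p} |\beta_j| + \text{const},
  \qquad \lambda = \frac{2\sigma^2}{b}.
\end{align*}
```

Maximizing the posterior is therefore the same as minimizing $\mathrm{SSR}(\beta : \lambda)$ with $\lambda = 2\sigma^2 / b$: a tighter prior (smaller $b$) gives a larger penalty.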
[Figure: plot of the density f(x) for x from -3 to 3]
Kidney Cancer Example from Bayesian Data Analysis (Gelman et al)
The issue is sample size. Consider a county of population 1000. Kidney cancer is a rare
disease, and, in any ten-year period, a county of 1000 will probably have zero kidney
cancer deaths, so that it will be tied for the lowest rate in the country and will be shaded
in Figure 2.8. However, there is a chance the county will have one kidney cancer
Figure 2.7 The counties of the United States with the highest 10%
age-standardized death rates for cancer of kidney/ureter for U.S. white
males, 1980–1989. Why are most of the shaded counties in the middle
of the country? See Section 2.8 for discussion.
each based on different data but with a common prior distribution. In addition to
illustrating the role of the prior distribution, this example introduces hierarchical
modeling, to which we return in Chapter 5.
A puzzling pattern in a map
Figure 2.7 shows the counties in the United States with the highest kidney cancer death ...
Simple model:
$$y_j \sim \mathrm{Pois}(10\, n_j \theta_j), \qquad \theta_j \sim \mathrm{Gamma}(\alpha, \beta)$$
$$E(\theta_j \mid y_j) = w \, \frac{y_j}{10\, n_j} + (1 - w)\, E(\theta_j)$$
The posterior mean is a weighted combination of the data and the prior mean.
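The weight $w$ is not spelled out on the slide. Under the Gamma-Poisson model, with Gamma parametrized by shape $\alpha$ and rate $\beta$, conjugacy gives the posterior Gamma($\alpha + y_j$, $\beta + 10 n_j$), so $w = 10 n_j / (\beta + 10 n_j)$ and $E(\theta_j) = \alpha / \beta$. A minimal sketch of the resulting shrinkage follows; the hyperparameter values and county sizes are illustrative assumptions.

```python
def posterior_mean_rate(y, n, alpha, beta):
    """Posterior mean of theta_j under y_j ~ Pois(10 * n_j * theta_j),
    theta_j ~ Gamma(alpha, beta), with beta a rate parameter."""
    exposure = 10 * n                 # person-years over the ten-year period
    w = exposure / (beta + exposure)  # weight on the raw rate
    raw_rate = y / exposure
    prior_mean = alpha / beta
    return w * raw_rate + (1 - w) * prior_mean, w

# Hypothetical hyperparameters, chosen only for illustration.
alpha, beta = 20.0, 430000.0

# A small county with one death and a large county with the same raw rate.
small_mean, small_w = posterior_mean_rate(y=1, n=1000, alpha=alpha, beta=beta)
large_mean, large_w = posterior_mean_rate(y=1000, n=1000000, alpha=alpha, beta=beta)
print(small_mean, small_w)
print(large_mean, large_w)
```

Both counties have the same raw rate, but the small county's estimate is pulled almost all the way to the prior mean, while the large county's estimate stays close to its data.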
ay, 319 were 0s, 141 were 1s, 33 were 2s, and 5
Raw data: death rates $y_j/(10\, n_j)$ vs. population size $n_j$. Small
counties account for almost all of the high and low death rates.
[Figure, panel (b): the data replotted on the scale of log10 population to see
the data more clearly. The patterns come from the discreteness of the data
($n_j = 0, 1, 2, \ldots$).]
Figure 1:
The problem was to decode these messages. Marc guessed that the code was a
simple substitution cipher, each symbol standing for a letter, number, punctuation
mark or space. Thus, there is an unknown function f
$$f : \{\text{code space}\} \to \{\text{usual alphabet}\}.$$
PERSI DIACONIS
MCMCryptography
Get a transition matrix M(x, y) for English (the probability of going from
letter x to letter y). In the paper, Marc downloaded a standard text and used
its first-order transitions, the proportions of consecutive symbols, which
gives a matrix M(x, y) of transitions.
Define plausibility
$$\mathrm{Pl}(f) = \prod_i M\big(f(s_i), f(s_{i+1})\big),$$
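A toy sketch of how a Metropolis search over substitution keys can use this plausibility. It proposes swapping two letters of the current key and accepts with probability min(1, Pl(f*)/Pl(f)), working in logs for numerical stability. The alphabet, the very short reference text, the add-one smoothing, and all function names are assumptions made for illustration; a real run would estimate M from a long English text.

```python
import math
import random
from collections import Counter

SYMBOLS = list("abcdefghijklmnopqrstuvwxyz ")

def log_transition_matrix(reference_text):
    """Log of M(x, y): add-one-smoothed first-order transition probabilities."""
    counts = Counter(zip(reference_text, reference_text[1:]))
    log_m = {}
    for x in SYMBOLS:
        row_total = sum(counts[(x, y)] for y in SYMBOLS) + len(SYMBOLS)
        for y in SYMBOLS:
            log_m[(x, y)] = math.log((counts[(x, y)] + 1) / row_total)
    return log_m

def log_plausibility(key, coded, log_m):
    """log Pl(f) = sum_i log M(f(s_i), f(s_{i+1})), with f given by `key`."""
    decoded = [key[s] for s in coded]
    return sum(log_m[pair] for pair in zip(decoded, decoded[1:]))

def metropolis_decode(coded, log_m, n_steps, rng):
    """Random-transposition Metropolis search; returns the best key seen."""
    key = {s: s for s in SYMBOLS}  # start from the identity guess
    current = log_plausibility(key, coded, log_m)
    best_key, best_score = dict(key), current
    for _ in range(n_steps):
        a, b = rng.sample(SYMBOLS, 2)
        key[a], key[b] = key[b], key[a]  # propose f*: swap two values
        proposed = log_plausibility(key, coded, log_m)
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best_score:
                best_key, best_score = dict(key), current
        else:
            key[a], key[b] = key[b], key[a]  # reject: undo the swap
    return best_key, best_score

rng = random.Random(0)
# Toy reference text (an assumption); far too short for real decoding.
reference = ("the text was scrambled at random and the monte carlo algorithm "
             "was run the problem was to decode these messages")
plaintext = "the problem was to decode these messages"
shuffled = SYMBOLS[:]
rng.shuffle(shuffled)
encode = dict(zip(SYMBOLS, shuffled))
coded = "".join(encode[c] for c in plaintext)

log_m = log_transition_matrix(reference)
best_key, best_score = metropolis_decode(coded, log_m, n_steps=2000, rng=rng)
print("".join(best_key[s] for s in coded))
```

With such a short reference text the decoded output may well be wrong; the point is only the accept/reject structure driven by Pl(f).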
The text was scrambled at random and the Monte Carlo algorithm was run.
Figure 3 shows sample output.
Figure 2:
Figure 3:
Figure 4: