Computational Journalism
Columbia Journalism School
Week 6: Drawing Conclusions from Data
October 21, 2016
This class
Interpretation
There may be more than one defensible interpretation
of a data set.
Our goal in this class is to rule out indefensible
interpretations.
Margin of Error
Given:
R = 49%, O = 47%
MOE(R) = MOE(O) = 5.5%
P(Obama ahead)
P(Romney ahead)
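One rough way to turn the poll numbers above into probabilities is sketched below. It rests on assumptions that are not in the slides: the 5.5% margin is read as a 95% interval, each candidate's share is treated as roughly normal, and the two sampling errors are treated as independent. That last assumption is a simplification, since shares in a single poll are negatively correlated. The function name `prob_ahead` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_ahead(share_a, share_b, moe, z=1.96):
    """Approximate P(A's true share > B's true share).

    Assumes both polling errors are independent normals and that
    `moe` is a 95% margin of error (z = 1.96).
    """
    se = moe / z                 # standard error implied by the margin of error
    se_diff = sqrt(2) * se       # standard error of the difference A - B
    lead = NormalDist(mu=share_a - share_b, sigma=se_diff)
    return 1 - lead.cdf(0)       # probability that the true lead is positive

p_romney = prob_ahead(49, 47, 5.5)
p_obama = 1 - p_romney
print(f"P(Romney ahead) ~ {p_romney:.2f}, P(Obama ahead) ~ {p_obama:.2f}")
```

Under these assumptions, a two-point lead inside a 5.5-point margin is well short of certainty for either candidate.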
How likely is it that the temperature won't increase over the next decade?
P-value
p(data at least as extreme as yours | null hypothesis)
What's it good for? What's it bad for?
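A minimal sketch of a p-value calculation, using a hypothetical example that is not from the slides: 60 heads in 100 coin flips, with "the coin is fair" as the null hypothesis. The function name is illustrative.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """P(a result at least as far from flips/2 as the observed one | fair coin)."""
    observed_gap = abs(heads - flips / 2)
    total = 0.0
    for k in range(flips + 1):
        # Add up the probability of every outcome at least as extreme as ours.
        if abs(k - flips / 2) >= observed_gap:
            total += comb(flips, k) * 0.5 ** flips
    return total

p = two_sided_p_value(60, 100)   # hypothetical data: 60 heads in 100 flips
print(f"p-value = {p:.3f}")
```

The number says how surprising the data would be if the null hypothesis were true. It does not say how likely the null hypothesis is given the data.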
Statistical Evidence
Evidence
Information that justifies a belief.
Presented with evidence E for X, we should believe X "more."
In terms of probability, P(X|E) > P(X)
Pr(H|E) = Pr(E|H) * Pr(H) / Pr(E)
Likelihood model, Pr(E|H): probability of seeing E if H is true.
Prior model, Pr(H): how likely was H to begin with?
Evidence model, Pr(E): how commonly do we see E at all?
[Figure: hypotheses H0, H1, H2]
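The three pieces (likelihood, prior, evidence) can be combined for a small set of hypotheses as in the sketch below. The prior and likelihood numbers are hypothetical, chosen only to show the mechanics, and `posterior` is an illustrative name.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem to a discrete set of hypotheses."""
    # Evidence model: Pr(E) = sum over hypotheses of Pr(E|H) * Pr(H)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    # Posterior: Pr(H|E) = Pr(E|H) * Pr(H) / Pr(E)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

priors = {"H0": 0.5, "H1": 0.3, "H2": 0.2}        # hypothetical Pr(H)
likelihoods = {"H0": 0.1, "H1": 0.4, "H2": 0.8}   # hypothetical Pr(E|H)

post = posterior(priors, likelihoods)
for h, p in post.items():
    print(h, round(p, 3))
```

Hypotheses that make the evidence more probable gain belief at the expense of those that make it less probable, which is the sense in which E is evidence for them.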
Parameter Estimation
Computing probability for a continuum of hypotheses
Pr(θ|E) = Pr(E|θ)/Pr(E) * Pr(θ)
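A continuum of hypotheses can be approximated on a grid. The sketch below estimates the bias of a coin from hypothetical data (7 heads in 10 flips) with a flat prior; the data, the flat prior, and the grid size are all assumptions made for illustration.

```python
def grid_posterior(heads, flips, grid_size=101):
    """Posterior over a coin's bias on an evenly spaced grid, with a flat prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size
    # Binomial likelihood; the binomial coefficient is omitted because it is
    # the same for every theta and cancels when we normalize.
    likelihood = [t ** heads * (1 - t) ** (flips - heads) for t in thetas]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)              # plays the role of Pr(E)
    return thetas, [u / evidence for u in unnormalized]

thetas, post = grid_posterior(heads=7, flips=10)
best = max(range(len(thetas)), key=lambda i: post[i])
print("most probable value on the grid:", thetas[best])
```

The result is a full distribution over the parameter rather than a single number, so it also shows how much uncertainty remains after ten flips.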
Strength of Evidence
Can we find a p-value equivalent?
What about the Bayes factor?
Pr(H1|E)/Pr(H2|E)
= [Pr(E|H1)Pr(H1)/Pr(E)] / [Pr(E|H2)Pr(H2)/Pr(E)]
= Pr(E|H1)/Pr(E|H2) * Pr(H1)/Pr(H2)
Bayes factor: Pr(E|H1)/Pr(E|H2), the ratio that multiplies the prior odds Pr(H1)/Pr(H2)
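As a small worked case, the sketch below compares two point hypotheses about a coin, H1 (bias 0.6) and H2 (bias 0.5), given hypothetical data of 60 heads in 100 flips. The data, the two hypotheses, and the even prior odds are assumptions for illustration.

```python
from math import comb

def binomial_likelihood(heads, flips, theta):
    """Pr(E|H) for a coin with bias theta."""
    return comb(flips, heads) * theta ** heads * (1 - theta) ** (flips - heads)

def bayes_factor(heads, flips, theta1, theta2):
    """Pr(E|H1) / Pr(E|H2) for two point hypotheses about the bias."""
    return (binomial_likelihood(heads, flips, theta1)
            / binomial_likelihood(heads, flips, theta2))

bf = bayes_factor(60, 100, theta1=0.6, theta2=0.5)
prior_odds = 1.0                      # assumed: H1 and H2 equally likely beforehand
posterior_odds = bf * prior_odds      # Pr(H1|E) / Pr(H2|E)
print(f"Bayes factor = {bf:.2f}, posterior odds = {posterior_odds:.2f}")
```

Unlike a p-value, the Bayes factor always compares two stated hypotheses, and a reader with different prior odds can multiply in their own.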
Causal Models
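Occupational Group                   Smoking   Mortality
Farmers, foresters, and fishermen         77          84
[occupation missing]                     137         116
[occupation missing]                     117         123
[occupation missing]                      94         128
[occupation missing]                     116         155
[occupation missing]                     102         101
[occupation missing]                     111         118
Woodworkers                               93         113
Leather workers                           88         104
Textile workers                          102          88
Clothing workers                          91         104
[occupation missing]                     104         129
[occupation missing]                     107          86
[occupation missing]                     112          96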
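A first look at the smoking and mortality values listed above is to compute their correlation. The sketch below pairs the numbers in the order they appear and uses Pearson's r, which is one assumed choice of measure among several.

```python
from math import sqrt

# (smoking, mortality) pairs in the order they appear in the table
pairs = [(77, 84), (137, 116), (117, 123), (94, 128), (116, 155),
         (102, 101), (111, 118), (93, 113), (88, 104), (102, 88),
         (91, 104), (104, 129), (107, 86), (112, 96)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

smoking = [s for s, _ in pairs]
mortality = [m for _, m in pairs]
r = pearson(smoking, mortality)
print(f"r = {r:.2f}")
```

A correlation on its own says nothing about why the two columns move together; that is the question a causal model has to answer.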
Y causes X
X causes Y
Z causes X and Y
random chance!
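The "Z causes X and Y" case can be illustrated with a simulation. In the sketch below, X and Y never influence each other, yet they end up correlated because both depend on Z. The linear form, the noise levels, and the sample size are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)                         # fixed seed for repeatability
z = [rng.gauss(0, 1) for _ in range(5000)]     # the common cause
x = [zi + rng.gauss(0, 1) for zi in z]         # X depends only on Z plus noise
y = [zi + rng.gauss(0, 1) for zi in z]         # Y depends only on Z plus noise

r_xy = pearson(x, y)
# Subtract Z's contribution: what is left of X and Y is independent noise.
r_adjusted = pearson([xi - zi for xi, zi in zip(x, z)],
                     [yi - zi for yi, zi in zip(y, z)])
print(f"corr(X, Y) = {r_xy:.2f}, after removing Z = {r_adjusted:.2f}")
```

Once Z is accounted for, the apparent relationship between X and Y largely disappears, which is why an unmeasured common cause is hard to rule out from correlation alone.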
if a woman is beautiful,
1) she'll respond less
2) people will tell her that
Analysis of Competing Hypotheses
Cognitive biases
Availability heuristic: we use examples that come to mind,
instead of statistics.
Preference for earlier information: what we learn first has a much
greater effect on our judgment.
Memory formation: whatever seems important at the time is what
gets remembered.
Confirmation bias: we seek out and give greater importance to
information that confirms our expectations.
Confirmation bias
Comes in many forms.
...unconsciously filtering information that doesn't fit
expectations.
...not looking for contrary information.
...not imagining the alternatives.
A difficult example
NYPD performs ~600,000 street stop and frisks per year.
What sorts of conclusions could we draw from this
data? How?