Computational Journalism
Columbia Journalism School
Week 5: Quantification and Statistics
October 13, 2016
This class
Quantification
Stats 101 Freestyle
Conditional probability
Analyzing discrimination
Quantification
Definition of data?
My Definition of data
a collection of related pieces of recorded information
structured data
unstructured data
Data
Quantification
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}
\]
continuous
countable
bounded?
units of measurement?
Categorical
Likert Scale
Discrete scale, no fixed origin, abstract units, comparative, non-uniform
Issues with quantitative data
Where did the data come from?
- physical measurement
- computer logging
- human recording
measurement error
missing data
ambiguity in human classification
process errors
intentional bias / deception
Intelligence
Academic performance
Gender
Race, ethnicity, nationality
Number of sexual harassment incidents
Income
Political Ideology
...
GDP = C + I + G + (X - M)
Conditional Probability
Taxi Accidents
Imagine you live in a city where one in every ten rides
ends in an accident, and last year there were
- 75 accidents involving yellow cabs
- 25 accidents involving blue cabs
Which taxi company is more dangerous?
[Figure: grid of rides split into Accident / No accident and Blue / Yellow]
Pr(Accident) = 0.10
Pr(Yellow) = 0.6
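The raw accident counts cannot answer the question on their own; they have to be weighed against how many rides each company gives. The sketch below works this through with Bayes' theorem. It rests on two assumptions that the slide does not state outright: that Pr(Yellow) = 0.6 is the share of all rides taken in yellow cabs, and that yellow and blue are the only two companies, so blue has the remaining 0.4. The names `accidents`, `p_company` and `accident_rate` are made up for illustration.

```python
# Accident counts from the slide.
accidents = {"yellow": 75, "blue": 25}

# Assumed readings of the slide's figures (not stated outright in the source):
p_accident = 0.10                          # Pr(Accident): one ride in ten
p_company = {"yellow": 0.6, "blue": 0.4}   # assumed share of all rides per company

total_accidents = sum(accidents.values())


def accident_rate(company):
    """Pr(Accident | company) = Pr(company | Accident) * Pr(Accident) / Pr(company)."""
    p_company_given_accident = accidents[company] / total_accidents
    return p_company_given_accident * p_accident / p_company[company]


rates = {company: accident_rate(company) for company in accidents}
for company, rate in rates.items():
    print(f"Pr(Accident | {company}) = {rate:.4f}")
```

Under these assumptions yellow cabs have the higher per-ride accident rate, but a different ride share could reverse the ranking, which is why the counts alone are not enough.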
[Figure: the same Accident / No accident by Blue / Yellow grid]
Pr(Accident|Blue) = 0.6
Definition
Pr(B|A) = Pr(AB)/Pr(A)
Bayes Theorem
Tells us how to go from Pr(A|B) to Pr(B|A)
Pr(B|A) = Pr(A|B)Pr(B) / Pr(A)
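As a quick sanity check, the definition and Bayes' theorem should give the same Pr(B|A). The sketch below confirms this on a small table of joint counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name `bayes` is likewise just for illustration.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return Pr(B|A) from Pr(A|B), Pr(B) and Pr(A)."""
    return p_a_given_b * p_b / p_a


# Hypothetical joint counts of two events A and B (not from the slides).
counts = {
    ("A", "B"): 10,
    ("A", "not B"): 30,
    ("not A", "B"): 20,
    ("not A", "not B"): 40,
}
n = sum(counts.values())

p_ab = counts[("A", "B")] / n                             # Pr(AB)
p_a = (counts[("A", "B")] + counts[("A", "not B")]) / n   # Pr(A)
p_b = (counts[("A", "B")] + counts[("not A", "B")]) / n   # Pr(B)

# Route 1: the definition, Pr(B|A) = Pr(AB) / Pr(A).
p_b_given_a_direct = p_ab / p_a

# Route 2: Bayes' theorem, starting from Pr(A|B) = Pr(AB) / Pr(B).
p_a_given_b = p_ab / p_b
p_b_given_a_bayes = bayes(p_a_given_b, p_b, p_a)

print(p_b_given_a_direct, p_b_given_a_bayes)
```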
[Figure: grid of patients split into cancer / no cancer and positive / negative test]
Pr(positive|cancer) = 0.75
= N(positive & cancer) / N(cancer)
N(cancer) = 4
N(positive & cancer) = 3
Pr(positive|no cancer) = 0.1
= N(positive & no cancer) / N(no cancer)
N(no cancer) = 1000
N(positive & no cancer) = 100
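Both conditional probabilities can be read straight off the counts: the denominator is always the size of the group being conditioned on. The sketch below is a minimal illustration using the counts from the slides; the helper `conditional` and the variable names `sensitivity` and `false_positive_rate` are labels introduced here, not terms from the source.

```python
# Counts shown on the slides.
n_cancer = 4
n_positive_and_cancer = 3
n_no_cancer = 1000
n_positive_and_no_cancer = 100


def conditional(n_joint, n_condition):
    """Pr(X | C) estimated as N(X & C) / N(C); C is the conditioning group."""
    if n_condition == 0:
        raise ValueError("cannot condition on an empty group")
    return n_joint / n_condition


# Pr(positive | cancer): divide by the number of patients with cancer.
sensitivity = conditional(n_positive_and_cancer, n_cancer)

# Pr(positive | no cancer): divide by the number of patients without cancer.
false_positive_rate = conditional(n_positive_and_no_cancer, n_no_cancer)

print(sensitivity, false_positive_rate)
```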
Pr(cancer) = 0.014
= N(cancer) / N
Pr(cancer|positive)
= 9.6%
Conditional probabilities
Pr(positive|cancer) = 75%
Pr(positive|no cancer) = 10%
What is Pr(cancer|positive)?
Bayesian Mammograms
Pr(cancer|positive) =
Pr(positive|cancer) Pr(cancer) / Pr(positive)
Pr(positive|cancer) = 0.75
Pr(cancer) = 0.014
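Pr(positive) = Pr(positive|cancer) Pr(cancer) + Pr(positive|no cancer) Pr(no cancer)
= (0.75 * 0.014) + (0.10 * 0.986)
= 0.1091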
Bayesian Mammograms
Pr(cancer|positive) =
Pr(positive|cancer) Pr(cancer) / Pr(positive)
= (0.75 * 0.014) / (0.1091)
= 0.0962
= 9.6% chance she has cancer
if mammogram is positive
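The mammogram calculation can be reproduced in a few lines. This is a sketch using only the three probabilities given on the slides; Pr(positive) is obtained from the law of total probability, and the variable names are chosen here for readability.

```python
# Probabilities given on the slides.
p_pos_given_cancer = 0.75      # Pr(positive | cancer)
p_pos_given_no_cancer = 0.10   # Pr(positive | no cancer)
p_cancer = 0.014               # Pr(cancer), the prior

# Law of total probability: a positive test comes either from a patient
# with cancer or from a patient without it.
p_positive = (p_pos_given_cancer * p_cancer
              + p_pos_given_no_cancer * (1 - p_cancer))

# Bayes' theorem.
p_cancer_given_positive = p_pos_given_cancer * p_cancer / p_positive

print(f"Pr(positive)          = {p_positive:.4f}")               # 0.1091
print(f"Pr(cancer | positive) = {p_cancer_given_positive:.4f}")  # 0.0962
```

Even with a fairly accurate test, the low prior keeps the posterior under 10%, because false positives among the large cancer-free group outnumber true positives.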
Pr(H|E) = Pr(E|H) Pr(H) / Pr(E)
Pr(E|H), model of H: probability of seeing E if H is true
Pr(H), prior: how likely was H to begin with?
Pr(E), model of E: how commonly do we see E at all?
Analyzing Discrimination