Computational Journalism
Columbia Journalism School
Week 1: Introduction
September 11, 2015
Lecture 1: Basics
Computer Science and Journalism
Course Structure
Interpreting High Dimensional Data
Computational Journalism: Definitions
Broadly defined, it can involve changing how stories
are discovered, presented, aggregated, monetized,
and archived. Computation can advance journalism
by drawing on innovations in topic detection, video
analysis, personalization, aggregation, visualization,
and sensemaking.
- Cohen, Hamilton, Turner, Computational Journalism, 2011
Computational Journalism: Definitions
Stories will emerge from stacks of financial disclosure
forms, court records, legislative hearings, officials' calendars
or meeting notes, and regulators' email messages that no
one today has time or money to mine. With a suite of
reporting tools, a journalist will be able to scan, transcribe,
analyze, and visualize the patterns in these documents.
- Cohen, Hamilton, Turner, Computational Journalism, 2011
[Diagrams: Data, Reporting, and User, with computer science (CS) marked at different stages: CS for presentation / interaction, CS for data and reporting, CS for filtering]
Examples of filters
http://snap.stanford.edu/nifty
CS in Journalism
[Diagram: Data, Reporting, Filtering, User, and Effects, with CS marked at several stages]
Message Machine
Jeff Larson, Al Shaw, ProPublica, 2012
Computer Science in Journalism
Reporting
Presentation
Filtering
Tracking
Algorithmic accountability
Quantification
Data
Journalism as a cycle
[Diagram: a cycle linking Data, Reporting, Filtering, User, and Effects, with CS marked at several steps]
Computational Journalism: Definitions
the application of computer science to the problems
of public information, knowledge, and belief, by
practitioners who see their mission as outside of both
commerce and government.
- Jonathan Stray, A Computational Journalism Reading List, 2011
Course Structure
Information Retrieval
Visualization
Clustering
Natural Language Processing
Text Analysis
Filter Design
Social Network Analysis
Artificial Intelligence
Sociology
Knowledge Representation
Graph Theory
Drawing Conclusions
Cognitive Science
Statistics
Epistemology
Administration
Assignment after each class
Four assignments require programming, but
your writing counts for more than your code!
Course blog
http://compjournalism.com
Final project
for 6-pt students only
Grading
Dual degree students
Pass/Fail.
Final project: paper, story, or software.
Non-journalism students
80% assignments
20% class participation
Definition of data?
My Definition of data
a collection of related pieces of
recorded information
structured data
unstructured data
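As a minimal sketch of the distinction, assume a hypothetical burglary report (the city, field names, numbers, and sentence below are invented for illustration). Structured data exposes each piece of recorded information under a name; unstructured text has to be parsed before anything in it can be counted.

```python
import re

# Structured: each piece of information sits in a named, typed field.
# (Hypothetical record, for illustration only.)
structured = {"city": "Springfield", "year": 2014, "break_ins": 312}

# Unstructured: the same information recorded as free text.
unstructured = "Police in Springfield recorded 312 break-ins during 2014."


def extract_break_ins(text):
    """Pull a break-in count out of free text; return None if none is found."""
    match = re.search(r"(\d+)\s+break-ins", text)
    return int(match.group(1)) if match else None


count_from_text = extract_break_ins(unstructured)
```

The regular expression only handles this one phrasing, which is the point: the work of turning unstructured data into structured data depends heavily on how the text was written.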
Quantification
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}
\]
Intelligence
Academic performance
Gender
Race, ethnicity, nationality
Number of sexual harassment incidents
Income
Political Ideology
...
continuous
countable
bounded?
units of measurement?
Categorical
measurement error
missing data
ambiguity in human classification
process errors
intentional bias / deception
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}
\]
Examples of features
number of claws
latitude
color {red, yellow, blue}
number of break-ins
1 for bought X, 0 for did not buy X
time, duration, etc.
number of times word Y appears in document
votes cast
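A few of the features listed above can be packed into one numeric vector. The sketch below is only an illustration: the record, its field names, and the helpers `count_word` and `to_feature_vector` are hypothetical, and one-hot encoding is just one common way to turn a category such as color into numbers.

```python
COLORS = ["red", "yellow", "blue"]  # the categorical values from the slide


def count_word(document, word):
    """Count occurrences of `word` in `document`, ignoring case and edge punctuation."""
    target = word.lower()
    tokens = document.lower().split()
    return sum(1 for token in tokens if token.strip(".,;:!?\"'") == target)


def to_feature_vector(record, word):
    """Map a record (a dict with hypothetical field names) to a list of floats."""
    one_hot = [1.0 if record["color"] == c else 0.0 for c in COLORS]
    return [
        float(record["claws"]),              # count
        float(record["latitude"]),           # continuous, bounded
        *one_hot,                            # categorical, one slot per color
        1.0 if record["bought_x"] else 0.0,  # 1 for bought X, 0 for did not
        float(count_word(record["text"], word)),
    ]


record = {
    "claws": 4,
    "latitude": 40.8,
    "color": "yellow",
    "bought_x": True,
    "text": "The crab pinched. The crab fled.",
}
x = to_feature_vector(record, "crab")
```

Every choice here (which fields to keep, how to tokenize, how to encode color) is a decision about how to describe the world in numbers, made before any analysis starts.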
Feature selection
Technical meaning in machine learning etc.:
which variables matter?
We're journalists, so we're interested in an earlier process:
how to describe the world in numbers?
Choosing Features
Journalism
How do we represent the world numerically?
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}
\]
Machine learning
Which variables carry the most information?
\[
\begin{bmatrix} x_{f(1)} \\ x_{f(2)} \\ \vdots \\ x_{f(k)} \end{bmatrix}
\quad \text{where } k < N
\]
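Picking k of the N variables can be sketched in a few lines. The version below ranks columns by variance, which is an assumption made for illustration and not a method the lecture specifies: variance is a crude, scale-dependent stand-in for "carries information", and the function name `select_top_k` and the toy rows are hypothetical. Indices are 0-based.

```python
from statistics import pvariance


def select_top_k(rows, k):
    """Keep the k highest-variance columns.

    Returns (chosen, reduced): `chosen` holds the kept column indices in
    their original order, `reduced` holds the rows restricted to them.
    """
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    variances = [pvariance(col) for col in columns]
    # Rank column indices from highest to lowest variance.
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    chosen = sorted(ranked[:k])
    reduced = [[row[j] for j in chosen] for row in rows]
    return chosen, reduced


# Toy data: N = 4 features, 3 observations. Column 0 never varies.
rows = [
    [1.0, 5.0, 0.0, 10.0],
    [1.0, 7.0, 1.0, 30.0],
    [1.0, 6.0, 0.0, 20.0],
]
chosen, reduced = select_top_k(rows, k=2)
```

The constant column is dropped first because it cannot distinguish one observation from another. A supervised criterion, such as how well a feature predicts an outcome of interest, would generally be a better measure than raw variance.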