
Frontiers of Computational Journalism
Columbia Journalism School
Week 9: Social Network Analysis
November 12, 2012

Week 9: Social Network Analysis


Why Study Social Networks?
Analysis Algorithms
SNA in journalism in practice

Network
A set of people and a set of connections between pairs of them

Types of connections
Social network analysis: only one type of connection between individuals (e.g. "friend")
Link analysis: multiple types of connections (friend, brother, employer, went to university with, sold a car to)

Link analysis is much more relevant to journalism, because it allows representation of much more detail and context.
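The difference between the two representations can be sketched in a few lines. This is a minimal illustration, not a prescribed data model: the people, the relationships, and the helper names `build_link_graph` and `to_social_network` are all hypothetical, and every connection is treated as undirected for simplicity.

```python
from collections import defaultdict

# Hypothetical people and relationships, invented for illustration.
typed_edges = [
    ("Alice", "Bob", "friend"),
    ("Alice", "Bob", "employer"),
    ("Bob", "Carol", "brother"),
    ("Carol", "Dave", "sold a car to"),
]

def build_link_graph(edges):
    """Link analysis view: map each pair of people to the set of connection types."""
    graph = defaultdict(set)
    for a, b, kind in edges:
        graph[frozenset((a, b))].add(kind)
    return graph

def to_social_network(link_graph):
    """Social network view: one untyped edge per connected pair (types discarded)."""
    return set(link_graph)

link_graph = build_link_graph(typed_edges)
social_network = to_social_network(link_graph)
```

Collapsing to the social network view loses the fact that Alice is both a friend and an employer of Bob, which is exactly the kind of context a story may depend on.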

Why study social networks?


Identify people or communities
Understand spread of information and behavior
Illustrate complex stories


We'll look at three tasks


Identify "influential" people
Detect "communities"
Track spread of info/behavior

Degree centrality: highest number of edges
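As a minimal sketch, degree centrality is just a count of neighbors. The small graph and the function name `degree_centrality` are hypothetical, chosen only for illustration.

```python
# Hypothetical undirected graph as an adjacency dict (invented for illustration).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree_centrality(g):
    """Number of edges touching each node."""
    return {node: len(neighbors) for node, neighbors in g.items()}

degrees = degree_centrality(graph)
most_connected = max(degrees, key=degrees.get)
```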

Closeness centrality: lowest average shortest distance to all other nodes
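One way to sketch this is a breadth-first search from every node, averaging the resulting distances. The example assumes an unweighted, connected graph with at least two nodes; the path graph and the function names are hypothetical.

```python
from collections import deque

# Hypothetical path graph A - B - C - D - E (invented for illustration).
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def bfs_distances(g, source):
    """Shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in g[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_distance(g):
    """Average shortest distance from each node to all others (assumes a connected graph)."""
    result = {}
    for node in g:
        dist = bfs_distances(g, node)
        others = [d for n, d in dist.items() if n != node]
        result[node] = sum(others) / len(others)
    return result

avg = average_distance(graph)
closest = min(avg, key=avg.get)  # lowest average distance = most central
```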

Betweenness centrality: highest fraction of shortest paths that pass through node
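A sketch of one standard way to compute this on an unweighted graph is Brandes' algorithm, which is an assumption here rather than something the slides prescribe. The function name and the path graph are hypothetical. The score is left unnormalized: for each node it sums, over all pairs of other nodes, the fraction of shortest paths between the pair that pass through it.

```python
from collections import deque

# Hypothetical path graph A - B - C - D - E (invented for illustration).
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def betweenness(g):
    score = {v: 0.0 for v in g}
    for s in g:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []
        preds = {v: [] for v in g}
        sigma = {v: 0 for v in g}
        dist = {v: -1 for v in g}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path fractions back toward s, farthest nodes first.
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair is visited from both ends, so halve.
    return {v: c / 2 for v, c in score.items()}

bc = betweenness(graph)
```

Dividing each score by the number of node pairs that exclude the node would turn it into the fraction described above.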

Eigenvector centrality: how likely you are to end up at a node on a random walk (akin to PageRank)
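The random-walk idea can be sketched with a PageRank-style power iteration. This is the PageRank variant rather than plain eigenvector centrality. The damping factor of 0.85, the iteration count, the directed example graph, and the function name `pagerank` are all assumptions made for illustration.

```python
# Hypothetical directed graph: who links to whom (invented for illustration).
out_links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=100):
    """Probability of finding a random walker at each node after many steps."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps to a random node.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node with no out-links sends the walker anywhere.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank

rank = pagerank(out_links)
top = max(rank, key=rank.get)
```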

Journalism centrality: how important is this person to this story?

Who is "important"?
What type of person do you want to identify in the network?
Often it's assumed we're after "influential" people.
But sociology says "power" is a complicated thing, difficult to define and measure.
Network analysis has mostly ignored this problem.
I know of no successful use of centrality metrics in journalism; maybe you'll be the first.

Finding Communities
For our purposes, a community is "a group of people who think or act collectively." In social network analysis, that translates into clusters in the graph.

Friends/followers

Co-consumption: Network of political book sales, Orgnet.com

Communications network: Exploring Enron, Jeffrey Heer

Web link structure: Map of Iranian Blogosphere, Berkman Center

Individual time/location trails: CitySense, Sense Networks

Mathematical definitions of "cluster"

You've already seen several!
If you can compute distance between any two items, you can cluster.
But in social networks, not everyone is connected to everyone else...

Modularity

Are there more intra-group edges than we would expect randomly?

Modularity
$n$ = number of vertices
$k_i$ = degree of vertex $i$
$A_{ij}$ = 1 if there is an edge between $i$ and $j$, 0 otherwise
$g_{ij}$ = 1 if $i$ and $j$ are in the same group, 0 otherwise
There are $m = \frac{1}{2} \sum_i k_i$ total edges in the graph.
If they go between random vertices, then the number of edges between $i$ and $j$ is $k_i k_j / 2m$.
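A rough sketch of where the $k_i k_j / 2m$ term comes from is to count edge ends ("stubs") and attach them at random. This is the usual informal argument, not an exact derivation, and it treats the result as an expected value.

```latex
% Each edge has two ends, so the degrees sum to twice the number of edges.
\begin{align*}
\sum_i k_i &= 2m \\
P(\text{a given stub of } i \text{ attaches to } j) &\approx \frac{k_j}{2m} \\
E[\text{edges between } i \text{ and } j] &\approx k_i \cdot \frac{k_j}{2m} = \frac{k_i k_j}{2m}
\end{align*}
```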

Modularity
$n$ = number of vertices
$k_i$ = degree of vertex $i$
$A_{ij}$ = 1 if there is an edge between $i$ and $j$, 0 otherwise
$g_{ij}$ = 1 if $i$ and $j$ are in the same group, 0 otherwise
Modularity $Q = \sum_{ij} \left( A_{ij} - k_i k_j / 2m \right) g_{ij}$
If $Q > 0$ then there are "excess" edges inside the groups (and fewer edges between them).
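As a minimal sketch, the modularity sum can be evaluated directly for a given grouping. The example graph (two triangles joined by one edge), the grouping, and the function name `modularity` are hypothetical. The value is left unnormalized, as in the formula above; dividing by 2m gives the normalized form commonly reported in the literature.

```python
# Hypothetical graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

def modularity(edge_list, group_of):
    """Sum over same-group pairs (i, j) of A_ij - k_i * k_j / 2m."""
    degree = {}
    for a, b in edge_list:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    m = len(edge_list)
    edge_set = {frozenset(e) for e in edge_list}
    q = 0.0
    for i in degree:
        for j in degree:
            if group_of[i] != group_of[j]:
                continue  # g_ij = 0, so this pair contributes nothing
            a_ij = 1 if frozenset((i, j)) in edge_set else 0
            q += a_ij - degree[i] * degree[j] / (2 * m)
    return q

two_groups = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
one_group = {v: "all" for v in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Putting every vertex in one group always gives zero, so a positive value for the two-triangle split indicates more internal edges than random placement would produce.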

Modularity algorithm
Look for a division of nodes into two groups that maximizes Q
Can find this through eigenvector technique
Possible that no division has Q>0, in which case the graph is a single community
If a division with Q>0 found, split
Recursively split sub-graphs
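A single split step can be sketched as follows, assuming the "eigenvector technique" means taking the signs of the leading eigenvector of the matrix $A_{ij} - k_i k_j / 2m$. That reading, the shifted power iteration used to find the eigenvector, the tolerance, and the function name `split_by_leading_eigenvector` are all assumptions. The recursive splitting of sub-graphs is left out.

```python
def split_by_leading_eigenvector(nodes, edges, iterations=1000):
    """Try one two-way split; return (list of groups, Q for that split)."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        A[index[a]][index[b]] = 1.0
        A[index[b]][index[a]] = 1.0
    k = [sum(row) for row in A]
    two_m = sum(k)
    # Matrix of "actual minus expected" edges: B_ij = A_ij - k_i k_j / 2m.
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    # Shift by a bound on the eigenvalue magnitudes so that power iteration
    # converges to the largest eigenvalue rather than the most negative one.
    shift = max(sum(abs(x) for x in row) for row in B)
    x = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        norm = max(abs(v) for v in y)
        x = [v / norm for v in y]
    sign = [1 if v > 0 else -1 for v in x]
    # Q for this split: sum B_ij over pairs that land in the same group.
    q = sum(B[i][j] for i in range(n) for j in range(n) if sign[i] == sign[j])
    group_a = {v for v in nodes if sign[index[v]] > 0}
    group_b = set(nodes) - group_a
    if q <= 1e-9 or not group_a or not group_b:
        return [set(nodes)], 0.0  # no useful division: a single community
    return [group_a, group_b], q

# Hypothetical graph: two triangles joined by a single edge (2, 3).
groups, q = split_by_leading_eigenvector(
    [0, 1, 2, 3, 4, 5],
    [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)],
)
# A lone triangle has no division with Q > 0.
triangle_groups, triangle_q = split_by_leading_eigenvector(
    [0, 1, 2], [(0, 1), (1, 2), (0, 2)]
)
```

The sign split is a heuristic approximation to the maximizing division, not a guarantee of it. A full implementation would also need to account for edges leaving each sub-graph when splitting recursively.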


SNA in journalism
ICIJ human tissue investigation
WSJ "Galleon's Web" insider trading story
SCMP "Who Runs Hong Kong"
Muckety

SNA in journalism
Visualization widely used
I am not aware of successful application of centrality metrics or automated community detection.
This may change as the graphs journalism examines get bigger...
Would it be possible to use community detection to find the "right" audience for a story?
