
Connecting Audiences to News:

Understanding Links Among a Community of Sites

Research by
Rich Gordon & Syndio Social, LLC.

Miami Herald Online news team


plus special guest

A key question by the late ’90s:
How do you build the online audience?
• Data showed clearly that what news sites were
doing wasn’t enough
• Blogs offered some clues
o Individual bloggers generated huge audiences
o News Web sites found that a single link from an “A
list” blogger could drive enormous traffic
• News sites found that half or more of their
traffic arrived from search engines or inbound
links.
© Rich Gordon / Syndio Social 2010

And then …
… the Facebook phenomenon

2007: For MMC and NAA,
I began exploring “new communities”
• Wrote the “Online Community
Cookbook” under contract to
the NAA
• Took on a part-time
assignment as director of new
communities for MMC
• Discovered Linked, by Albert-
Laszlo Barabasi, and the
“groundbreaking science of
networks”

A brief introduction to
the science of networks

The science of networks:
a brief historical overview
• Its roots: 18th century mathematics
• Understanding of networks has exploded in
the past decade – with applications to many
disciplines:
o psychology, sociology, biology, neurology, ecology,
business, marketing, political science and more
o Northwestern researchers: leaders in this field
• The first 50 years of network research focused
on interpersonal networks (or social networks)

Social networks:
What researchers found

[Figure: diagram of a social network linking Alice, Mary, Bob, Ann, Joe, Joan, Phil, Cindy and Mike, with callouts numbered 1, 2 and 3]

What we know
about social networks
1. People cluster together: if I am your friend, and you
are Mike’s friend, there is a very good chance I am
Mike’s friend, too
2. Interpersonal networks are “small worlds” – in
general you can connect any two individuals
through a small number of “hops” or “handshakes”
3. Connectors are the reason that “small worlds” exist.
They are likely to have a much larger group of
friends and acquaintances than most people.
4. These connectors are network hubs that connect
clusters to one another.
o Without connectors, interpersonal networks would have more
than “six degrees of separation”
Source: Albert-Laszlo Barabasi, Linked
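
Points 1 to 4 above can be checked on a toy network. The sketch below is illustrative only: the friendships are invented (they are not the ones in the slide's diagram), and `clustering` and `hops` are hypothetical helper names. It measures how often a person's friends are also friends with each other, and how many "handshakes" separate two people.

```python
from collections import deque
from itertools import combinations

# Hypothetical, made-up friendships (undirected). Bob and Joe act as connectors.
friends = {
    "Alice": {"Mary", "Bob"},
    "Mary": {"Alice", "Bob"},
    "Bob": {"Alice", "Mary", "Joe"},
    "Joe": {"Bob", "Ann", "Joan"},
    "Ann": {"Joe", "Joan"},
    "Joan": {"Joe", "Ann"},
}

def clustering(graph, person):
    """Fraction of a person's pairs of friends who are also friends with each other."""
    nbrs = graph[person]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    closed = sum(1 for a, b in pairs if b in graph[a])
    return closed / len(pairs)

def hops(graph, start, goal):
    """Number of 'handshakes' on the shortest path, found by breadth-first search."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        if person == goal:
            return dist
        for nxt in graph[person]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two people are not connected at all

print(clustering(friends, "Alice"))     # Alice's two friends know each other
print(hops(friends, "Alice", "Joan"))   # path runs through the connectors Bob and Joe
```

In this toy data, removing the single Bob-Joe friendship leaves the two clusters with no path between them, which is the role the slide assigns to connectors.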

The second wave of network science:
The Web as an information network
• Researchers were limited in their ability to
understand networks
o few can be fully mapped and analyzed as data
• The World Wide Web, for the first time, created
a content network that can be captured (with a
Web crawler) and mapped
• In the late 1990s, researchers began analyzing
the Web network and comparing it to
interpersonal networks

What researchers learned
about the World Wide Web
1. Web sites cluster together: if my site links to yours, and your
site links to Mike’s, there is a very good chance my site links
to Mike’s, too
2. The Web is a “small world” – in general you can connect any
two sites through a small number of “hops” or “handshakes”
3. Connectors are the reason that the Web is a “small world.”
These are the Web sites that are most likely to be linked to
other sites – they are “shortcuts” across the Web network
o 80% of Web links go to 15% of Web pages*
4. These connectors are network hubs that connect network
clusters to one another.
o These sites also tend to get a disproportionate share of Web traffic

* Albert-Laszlo Barabasi, Linked

The digital consequences:
Greater concentration of attention
• You might think that the amazing content
choices offered by the Internet would distribute
attention more widely than in traditional media
• But surprisingly, online attention is even more
concentrated than traditional media usage
o The most linked-to sites get a disproportionate share
of links and traffic

In a world of infinite choice,
how can attention be more concentrated?
Share held by the top 10 (versus ranks 11-100):
• Among top 100 radio stations: 20%
• Among top 100 newspapers: 32%
• Among top 100 magazines: 39%
• Among top 100 Web sites: 62%
Source: Matthew Hindman, “A Mile Wide and an Inch Deep: Measuring Media Diversity Online and Offline”

Proposition: Network theory explains …
how online audience attention is aggregated
• Content links between Web pages guide people to
relevant content.
• Search engines rely on links to build their algorithms
to deliver relevant search results.
• Interpersonal networks (online communities, social
networking sites) on the Web:
o guide or alert people to content
o strengthen bonds between people, nurturing common
interests
o build “buzz” about content, products, services
• Network science helps explain why “the rich get
richer”: why some sites have huge popularity
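
One common way to illustrate "the rich get richer" is a preferential-attachment simulation, in which each new site links to an existing site with probability proportional to the links that site already has. The sketch below is a simplified toy model under that assumption, not the analysis behind these slides; `grow_network` and `top_share` are hypothetical names.

```python
import random

def grow_network(n_sites, seed=0):
    """Grow a network one site at a time; each newcomer links to one existing site,
    chosen with probability proportional to the links that site already has."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    links = [1, 1]                   # two starting sites joined by one link
    endpoints = [0, 1]               # one entry per link endpoint
    for new_site in range(2, n_sites):
        chosen = rng.choice(endpoints)   # uniform over endpoints = proportional to links
        links[chosen] += 1
        links.append(1)                  # the newcomer starts with its single link
        endpoints.extend([chosen, new_site])
    return links

def top_share(links, fraction):
    """Share of all link endpoints held by the most-linked `fraction` of sites."""
    ranked = sorted(links, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

links = grow_network(2000)
print(f"Top 10% of sites hold {top_share(links, 0.10):.0%} of links")
```

If links were spread evenly, the top 10% of sites would hold about 10% of them; in this model early, well-linked sites keep attracting more, so the share is much higher.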

How digital content is different:
the network perspective
• Content networks (links) are:
o Transparent (they are easily visible to others)
o Persistent (they remain live for an extended
period – perhaps forever)
• Interpersonal networks now can also be
transparent and persistent
o Bloggers who frequently cite and comment on
one another
o Content sharing on social networking sites like
Twitter, Facebook, etc.

Building digital audience, attention
by applying the science of networks
• Think of content and
consumers as network nodes
• Links are established:
o When consumers pay attention
to, discuss content
o When people refer content to
others
o When people connect with
others through the publication
• To gain audience, media (and
journalists) should build
connections – and become
hubs


In the mass media age (1950-1995?),
newspapers & TV had it pretty easy
• Because of technology constraints …
o Cost of presses
o Limited broadcast spectrum
• … media choice was limited …
• … mass media were the hubs …
• … it was relatively easy to capture attention, large audiences, handsome profits

The top 100 news and media Web sites:
a graphical look

Source: Matthew Hindman, matthewhindman.com


Why attention is more concentrated:
the mathematics of networks

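[Figure: two distributions side by side.
Left: Height, a linear distribution; vertical axis from 1’10” to 8’4”, horizontal axis “Rank among people in the world” from 1 to 6.7 billion.
Right: Links in a network, a power law; vertical axis from “Few links” to “Many links”, horizontal axis “Rank among Web sites” from 1 to ∞.]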

Mapping the hyperlinks
of Chicago’s “new news ecosystem”

Research objectives
Discover the network of links connecting
the news websites in Chicago’s news ecosystem

Build a virtual representation of this network

Diagnose the overall health of the network

Reveal patterns and trends in the hyperlinks

Identify key players

Collecting hyperlink data:
Web crawlers
• Web crawlers “crawl” a list of seed sites and the sites with which they link
• Web crawlers expose the community of sites surrounding the seeds
• They follow a set of rules to find communities of websites

Choosing our seed sites:
two primary sources

• 121 sites that responded to CCT’s New News Chicago 2010 survey

• 247 Chicago websites found through FWIX, a website that compiles news feeds by location
http://fwix.com/chicago/browse/sources

In total, we crawled a comprehensive list of 368 seed sites as a starting point for

Crawl demonstration

[Figure sequence: seed sites S1, S2 and S3 and the sites they link to]

1. Start with a set of seed sites.
2. Crawl these sites and record their outlinks to other sites.
3. Retain only those sites that receive at least 2 links from the original seeds, to ensure relevance and community.
4. These sites are the new ‘core’ of important sites, relative to the initial seed sites.
5. The new core acts as a second list of seed sites; it is crawled again to identify the next level of the community. After 3 iterations, the final network is exposed.
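
The crawl demonstration can be sketched in a few lines. This is a simplified illustration, not the IssueCrawler implementation: a hand-written dictionary of outlinks stands in for real crawling, the site names are invented, and `colink_crawl` is a hypothetical function name. It applies the rule stated above (keep sites that receive at least 2 links from the current list) and repeats it for 3 iterations.

```python
from collections import Counter

def colink_crawl(outlinks, seeds, min_inlinks=2, iterations=3):
    """Repeatedly replace the site list with the sites that receive at least
    `min_inlinks` links from the current list. `outlinks` maps each site to the
    set of sites it links to (a stand-in for actually crawling pages)."""
    core = set(seeds)
    for _ in range(iterations):
        counts = Counter()
        for site in core:
            for target in outlinks.get(site, ()):
                if target != site:          # ignore a site's links to itself
                    counts[target] += 1
        new_core = {site for site, n in counts.items() if n >= min_inlinks}
        if not new_core:                    # nothing qualifies: keep the last core
            break
        core = new_core
    return core

# Hypothetical link data: three seeds (s1, s2, s3) plus two sites they point to.
outlinks = {
    "s1": {"a", "b", "s3"},
    "s2": {"a", "s1"},
    "s3": {"a", "b", "s1"},
    "a": {"b", "s1"},
    "b": {"a", "s1"},
}

core = colink_crawl(outlinks, ["s1", "s2", "s3"])
print(sorted(core))
```

In this toy data, seeds s2 and s3 drop out because fewer than two sites in the list link to them, while the non-seed sites a and b join the core.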

IssueCrawler findings

• Each iteration of the crawl produces an even more specific set of sites: sites must receive links from sites which in turn received links from the previous crawl
• This final network (277 sites) is our best approximation of the core of the Chicago news ecosystem
• Remember: This includes sites that we would clearly categorize as news (chicagotribune.com), but also a variety of other kinds of sites that are part of the ecosystem.

General network overview

Key statistics
• Total Nodes: 277
• Total Links: 24,598
• Density: 0.1273
• Total Site Relationships: 1,232

Site categorizations
To simplify analysis, sites were coded by Category:
• Legacy: Web publications corresponding to a mainstream or traditional media brand [Ex: ChicagoTribune.com, CBS2Chicago.com]
• Legacy-Affiliated: Web publications/brands owned by a mainstream or traditional media brand [Ex: Chicagonow.com, Vocalo.org]
• Micropublisher: Web-only/Web-first publishers focused on a particular topic, audience or geographic area [Ex: GapersBlock.com]
• Organization/Institution: Organizations, companies, institutions or non-profits that historically would have needed media intermediaries but now publish online [Ex: FieldMuseum.org, CityofChicago.org]
• National Brand: Websites of national scope with local presence [Ex: CitySearch.com, HuffingtonPost.com, SBNation.com]
• Service: Websites that provide services to Web publishers [Ex: WordPress.com, Twitter.com, Quantcast.com]

Site categorizations
The Legacy, Legacy-affiliated, Micropublisher, and
Organization/Institution categories were further coded by Scope:
• Geo-Publisher: Web publications focused on one or more specific geographic areas within the Chicago region (Ex: DailyHerald.com, EvanstonNow.com, AdentroDePilsen.org)
• Niche Publisher: Web publications focusing on a topic or audience segment (Ex: ChicagoBusiness.com, TheExpiredMeter.com, BleedCubbieBlue.com, NewCity.com, Catalyst-Chicago.org)
• Mass Media: Websites branded with a major mass media outlet (Ex: ChicagoTribune.com, NBCChicago.com, Newshour.org)

Degree centrality: authorities & hubs

• A site’s degree is the total number of links it has to or from other sites – it is a way of measuring “popularity”
• Authorities: Sites that many other sites link to
• Hubs: Sites that link to many other sites

[Figure: example link diagrams with nodes A, D, B (authorities) and F, H, G (hubs)]
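
As a minimal sketch of these definitions, the snippet below counts links in and links out for a made-up set of directed links. The node letters echo the diagram, but the links themselves are invented for illustration and `degree_counts` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical directed links (source site, target site); not data from the study.
links = [("A", "D"), ("B", "D"), ("H", "D"), ("H", "F"), ("H", "G")]

def degree_counts(links):
    """Return (links in, links out) per site for a list of directed links."""
    inlinks, outlinks = Counter(), Counter()
    for source, target in links:
        outlinks[source] += 1   # the source site links out
        inlinks[target] += 1    # the target site is linked to
    return inlinks, outlinks

inlinks, outlinks = degree_counts(links)
top_authority = inlinks.most_common(1)[0]   # site with the most links in
top_hub = outlinks.most_common(1)[0]        # site with the most links out
degree = inlinks + outlinks                 # total links to or from each site
print(top_authority, top_hub, dict(degree))
```

Here D is the authority (three sites link to it) and H is the hub (it links to three sites), even though both have the same total degree.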

Organizations/Institutions and
a mix of other sites are top authorities

Top 5 Authorities
Transitchicago.com
Chicagotribune.com
Gapersblock.com
Mcachicago.org
Metrarail.com

Larger circles = more links in

Micropublishers, Organizations/Institutions
are the hubs

Top 5 Hubs
Gapersblock.com
Badatsports.com
Saic.edu
Uchicago.edu
Macfound.org

Larger circles = more links out

Betweenness:
Intermediaries and switchboards
• Flow betweenness measures the number of ‘paths’ passing through each site; betweenness measures the number of ‘shortest paths’ passing through each site
• Intermediaries: sites that are deeply embedded between otherwise unconnected sites (high flow betweenness)
• Switchboards: sites that connect readers in the fastest possible way to otherwise unconnected communities (high betweenness)

[Figure: example network with nodes A, B, D, F, G, H, I, J, K, L, M, N, O]
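
The shortest-path version of betweenness can be sketched as follows. This uses Brandes' algorithm on an unweighted directed graph as one standard way to compute it; the slides do not say which software or method the study used, the toy links are invented, and flow betweenness (which counts all paths, not only the shortest) is not covered here.

```python
from collections import deque

def betweenness(graph):
    """Shortest-path betweenness for an unweighted directed graph.
    `graph` maps every site to the list of sites it links to."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                       # sites in order of distance from s
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):        # accumulate from the farthest sites back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical links: A and B reach F and G only by passing through D and then H.
graph = {"A": ["D"], "B": ["D"], "D": ["H"], "H": ["F", "G"], "F": [], "G": []}
score = betweenness(graph)
print(score)
```

In this toy graph D and H sit on every shortest path between the left-hand and right-hand sites, so they score highest, while the endpoints score zero.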

One Service, Organizations/Institutions are
the most prominent intermediaries

Top 5 Intermediaries
Addthis.com
Windycitizen.com
Ravinia.org
Chicagoartistsresource.org
Cityofchicago.org

Gapersblock.com, Windycitizen.com,
Organizations/Institutions are switchboards

Top 5 Switchboards
Gapersblock.com
Transitchicago.com
Windycitizen.com
Saic.edu
Chicagoartistsresource.org

Eigenvector centrality:
the key ingredient in search algorithms

“… Generally, highly linked pages are more ‘important’ than pages with few links … backlinks provide a kind of peer review.” (1998 Stanford technical paper)

Eigenvector centrality
is a measure of prestige
• A site’s centrality is a function of the centralities of the sites that link to it
• Sites with a high eigenvector centrality tend to be considered “prestigious”
• These sites are linked to by the “most linked-to sites” in the network

[Figure: example network with nodes A, B, C, D, F]
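
A minimal sketch of the idea, assuming the standard power-iteration approach: every site starts with the same score, and each round a site's score becomes its old score plus the scores of the sites linking to it, then all scores are rescaled. The toy links are invented and `eigenvector_centrality` is a hypothetical helper; the slides do not describe how the study computed this measure.

```python
def eigenvector_centrality(inlinks, iterations=100):
    """Power iteration. `inlinks` maps each site to the list of sites that link to it.
    Adding the old score each round does not change the final ranking but keeps
    the iteration from oscillating."""
    score = {site: 1.0 for site in inlinks}
    for _ in range(iterations):
        new = {
            site: score[site] + sum(score[src] for src in sources)
            for site, sources in inlinks.items()
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {site: v / norm for site, v in new.items()}   # rescale to unit length
    return score

# Hypothetical links: B, C and D all link to A; A links to B and C; C links to D.
inlinks = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["C"]}
score = eigenvector_centrality(inlinks)
print(sorted(score, key=score.get, reverse=True))
```

B and D each receive exactly one link, yet B ends up with the higher score because its one link comes from A, the best-connected site. That is the sense in which this measure captures prestige rather than raw link counts.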

Organization/Institutions
tend to be the most prestigious

Top 5 by eigenvector centrality
Transitchicago.com
Metrarail.com
Newcity.com
Rtachicago.com
Nictd.com

When we set a ‘link volume’ threshold, we
see clustering by content and affiliation

[Figure: network map with labeled clusters: NewCityChicago, Sports, Tribune Co., Music, Micropublisher core]
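
One way to read the 'link volume' threshold is sketched below, under the assumption that it means keeping only pairs of sites joined by at least a given number of hyperlinks and then looking at which sites remain connected. The site names and counts are invented and `clusters_above_threshold` is a hypothetical helper.

```python
def clusters_above_threshold(link_counts, threshold):
    """Group sites into clusters using only relationships with at least
    `threshold` hyperlinks. `link_counts` maps (site_a, site_b) to a link count."""
    neighbors = {}
    for (a, b), count in link_counts.items():
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if count >= threshold:          # keep only high-volume relationships
            neighbors[a].add(b)
            neighbors[b].add(a)
    clusters, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                    # depth-first walk over kept relationships
            site = stack.pop()
            if site in group:
                continue
            group.add(site)
            stack.extend(neighbors[site] - group)
        seen |= group
        clusters.append(group)
    return clusters

# Hypothetical link counts between four made-up sites.
link_counts = {
    ("sports-a.example", "sports-b.example"): 40,
    ("music-a.example", "music-b.example"): 25,
    ("sports-a.example", "music-a.example"): 2,
}
print(clusters_above_threshold(link_counts, threshold=10))
```

With a threshold of 1 all four sites form a single group; raising it to 10 drops the weak cross-topic relationship and leaves a sports cluster and a music cluster.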

A handful of sites bridge otherwise disconnected
regions of the niche-publisher community

[Figure: network map with labels NewCityChicago, Periphery, Niche publishers central cluster, Periphery]

Thank you!!
Rich Gordon
richgor@northwestern.edu

