HORIZON CRISIS: A SOCIAL NETWORK ANALYSIS

NetworkThink Team A
Kaitlin Donohue, Yilin Wei, Yunjing Yao, Subramanian Vellaiyan, Jetson Guy
1 Introduction

On April 20, 2010, the BP-operated Mobile Offshore Drilling Unit (MODU) Deepwater Horizon experienced a loss of well control. The events that followed led to 11 deaths, fires, explosions and ultimately the sinking of the unit. As a result of this disaster, large volumes of liquid and gaseous hydrocarbons leaked into the Gulf of Mexico until the well was finally capped on July 15, 2010.[1] During the roughly three months before the well was closed, the Deepwater Horizon Crisis, as it has come to be known, is estimated to have discharged 4.9 million barrels of oil into the waters of the Gulf.[2]

The event, including its environmental, financial, and political repercussions, was the topic of much debate on the micro-blogging social network Twitter. To complete the task put to us by NetworkThink, we examined the set of roughly 70,000 tweets related to the crisis from April 2010 to July 2010. We have broken our analyses into segments that examine the relationships within and between the various stakeholders in the Deepwater Horizon Twitter network. The results of our analyses can be found below.
[1] Republic of the Marshall Islands Office of the Maritime Administrator (2011). Deepwater Horizon Marine Casualty Investigation Report. Accessed 12/11/2014 at <http://www.register-iri.com/forms/upload/Republic_of_the_Marshall_Islands_DEEPWATER_HORIZON_Marine_Casualty_Investigation_Report-Low_Resolution.pdf>
[2] On Scene Coordinator Report: Deepwater Horizon Oil Spill (2011). Accessed 12/11/2014 at <http://www.uscg.mil/foia/docs/dwh/fosc_dwh_report.pdf>
2 Mention Network

2.1 Create the mention network

The pictures above represent the mention network. Vertex color represents closeness centrality. Vertex shape represents eigenvector centrality: a vertex whose eigenvector centrality is above 0.22 (the average eigenvector centrality) is drawn as a solid triangle; otherwise it is drawn as a solid diamond. Vertex size represents out-degree, so the tweeters who mentioned others the most appear largest.
The first picture represents the tweeters who mentioned others the most:

seachele420
whodat35
winterthur
oceanshaman
Endrunlv
Zbleumoon

The second picture represents the tweeters who were mentioned the most:

nwf
ibrrc
whodat35
bpamerica
therightblue
gohsep
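The two rankings above correspond to out-degree (mentions made) and in-degree (mentions received) in the mention network. The following is a minimal sketch of how such rankings could be computed from a list of mention pairs. The function name `degree_rankings` and the sample edges are hypothetical and are not taken from the dataset; repeated mentions between the same pair are counted each time.

```python
from collections import Counter


def degree_rankings(mentions, top_n=3):
    """Rank tweeters by out-degree (mentions made) and in-degree (mentions received).

    `mentions` is an iterable of (source, target) pairs, meaning that
    `source` mentioned `target` in a tweet.
    """
    out_degree = Counter()
    in_degree = Counter()
    for source, target in mentions:
        out_degree[source] += 1
        in_degree[target] += 1
    return out_degree.most_common(top_n), in_degree.most_common(top_n)


# Hypothetical edges for illustration only.
sample_mentions = [
    ("seachele420", "nwf"),
    ("seachele420", "ibrrc"),
    ("seachele420", "bpamerica"),
    ("whodat35", "nwf"),
    ("whodat35", "ibrrc"),
    ("winterthur", "nwf"),
]

top_out, top_in = degree_rankings(sample_mentions)
print("Mentioned others the most:", top_out)
print("Were mentioned the most:", top_in)
```

In a visualization tool, the out-degree computed this way would drive vertex size as described in section 2.1.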
The picture above represents the most influential tweeters across all the groups. Vertex shape encodes betweenness centrality: a vertex whose betweenness centrality is above the average is drawn as a solid diamond; otherwise it is drawn as a solid square. Similarly, vertex color encodes eigenvector centrality: a vertex above the average is greenish; otherwise it is orange. The most influential tweeters are:

TWEETER        GROUP
whodat35       GRASSRT
seachele420    SOCMOV
Nwf            CELEB
Winterthur     GRASSRT
digiphilE      MEDIA
Ibrrc          GOV
Bpamerica      CORP
humidcity      OTHER
oil_leaks      SOCMOV

The CELEB group has the highest aggregate of influential tweeters.
[Figure: Degree Distribution histogram. x-axis: Degree; y-axis: Number (Frequency).]
There is evidence of a power-law degree distribution, as the histogram above shows.
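One rough way to check a claim like this is to plot frequency against degree on log-log axes, where a power law appears as a straight line. The sketch below estimates the slope of that line with ordinary least squares. It is a quick diagnostic under stated assumptions, not a rigorous statistical test, and the degree sequence is hypothetical (constructed so that frequency is proportional to degree to the power of -2).

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Map each degree value to the number of vertices that have it."""
    return Counter(degrees)


def loglog_slope(histogram):
    """Least-squares slope of log(frequency) against log(degree).

    Degrees of zero are skipped because log(0) is undefined.
    """
    points = [
        (math.log(degree), math.log(count))
        for degree, count in histogram.items()
        if degree > 0 and count > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two distinct positive degrees")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical degree sequence: 400 vertices of degree 1, 100 of degree 2, ...
sample_degrees = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16
hist = degree_histogram(sample_degrees)
slope = loglog_slope(hist)
print(f"log-log slope: {slope:.2f}")
```

A clearly negative slope with points close to the fitted line is consistent with a heavy-tailed distribution; on real data the tail is usually noisy, so the estimate should be read with caution.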
The following table lists the shape used for each group in the picture above.

GROUP      SHAPE
GRASSRT    Solid Diamond
OTHER      Solid Square
SOCMOV     Sphere
MEDIA      Solid Triangle
GOV        Disk
CELEB      Diamond
CORP       Circle
3 Hashtag Network

3.1 Create the Hashtag Network

Network 1
Network 2

Nodes: hashtags
Edges: when two hashtags appear in the same tweet, there is an edge between them
Node size: the number of times the hashtag was mentioned
Node color: modularity class
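The node and edge definitions above describe a hashtag co-occurrence network. A minimal sketch of that construction follows, assuming the hashtags of each tweet are already available as a list. The function name and sample tweets are hypothetical, and lowercasing hashtags before counting is an assumption of this sketch rather than something stated in the report.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(tweet_hashtags):
    """Build node sizes and weighted edges from per-tweet hashtag lists.

    Returns (node_counts, edge_weights). A hashtag is counted once per tweet.
    Edge keys are alphabetically sorted pairs, so (#a, #b) and (#b, #a)
    are treated as the same undirected edge.
    """
    node_counts = Counter()
    edge_weights = Counter()
    for tags in tweet_hashtags:
        unique_tags = sorted({tag.lower() for tag in tags})
        node_counts.update(unique_tags)
        edge_weights.update(combinations(unique_tags, 2))
    return node_counts, edge_weights


# Hypothetical tweets for illustration only.
sample_tweets = [
    ["#bp", "#oilspill"],
    ["#BP", "#oilspill", "#gulf"],
    ["#gulf"],
]

nodes, edges = cooccurrence_network(sample_tweets)
print(nodes)
print(edges)
```

The resulting edge weights could be exported as an edge list for a tool such as Gephi, where modularity classes can then be computed.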
This shows the network after grouping. The pink group accounts for 50.27% of the nodes in this graph.
Hashtag       Mentions   Category
#enbridge     9013       Industry
#tarsands     8914       News
#chevron      7852       Industry
#wildlife     7621       Environment
#corexit      7366       Measures
#louisiana    7310       News
This picture shows the popular hashtags in the first half of the period that were also mentioned more than 7,000 times in total. Many of them are familiar hashtags that we discussed in 3.2.

In the second half of the period, we saw only #enbridge. All of the popular environmental hashtags disappeared.

The two pictures above also substantiate that there was a shift in hashtag usage. It appears that the hashtags grew increasingly negative the longer the spill went on.
4 Affiliation Network

4.1 Choices for nodes, edges and their attributes

    -- Pair every hashtag with the type of the tweeter who used it.
    SELECT h.hashtag, e.type
    FROM hashtag AS h
    JOIN tweet AS t ON h.tweetid = t.tweetid
    JOIN tweeter AS e ON t.tweeter = e.tweeter;

We used this code to link the hashtag table and the tweeter table through the tweet table. The nodes are hashtags and types. There are seven different types (CELEB, CORP, GOV, GRASSRT, MEDIA, OTHER, SOCMOV) and 4,139 hashtags in total.

    -- Count how often each hashtag was used within each type.
    -- `joint` is assumed to hold the result of the previous query.
    SELECT type, hashtag, COUNT(hashtag)
    FROM joint
    GROUP BY type, hashtag;
We used this code to count every hashtag within every type. Edges are the relationships between hashtags and tweeters: an edge indicates that a hashtag was used by Twitter users. Edges can be weighted by the number of times each hashtag was used, and edge width can be used to represent that weight.

The attributes of a type are the name, size, location and functions of the different organizations. The attributes of a hashtag are its information, combination, and the key words used when searching for information.
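The join and the per-type count described in this section can be tried end to end on a small in-memory database. The sketch below uses Python's built-in `sqlite3` module and folds both steps into one query. The table and column names follow the queries in this section, while the sample rows (`alice`, `bob`) are hypothetical and the original database engine is not stated in the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweeter (tweeter TEXT PRIMARY KEY, type TEXT);
CREATE TABLE tweet   (tweetid INTEGER PRIMARY KEY, tweeter TEXT);
CREATE TABLE hashtag (tweetid INTEGER, hashtag TEXT);

-- Hypothetical rows for illustration only.
INSERT INTO tweeter VALUES ('alice', 'MEDIA'), ('bob', 'GOV');
INSERT INTO tweet   VALUES (1, 'alice'), (2, 'alice'), (3, 'bob');
INSERT INTO hashtag VALUES (1, '#bp'), (2, '#bp'), (2, '#gulf'), (3, '#gulf');
""")

# One row per (type, hashtag) edge; the count is the edge weight.
edges = conn.execute("""
    SELECT e.type, h.hashtag, COUNT(*) AS weight
    FROM hashtag AS h
    JOIN tweet   AS t ON h.tweetid = t.tweetid
    JOIN tweeter AS e ON t.tweeter = e.tweeter
    GROUP BY e.type, h.hashtag
    ORDER BY e.type, h.hashtag
""").fetchall()
conn.close()

for tweeter_type, tag, weight in edges:
    print(f"{tweeter_type} -- {tag} (weight {weight})")
```

Each returned row is one weighted edge of the affiliation network, with the type on one side and the hashtag on the other.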
According to the data, #bp is the most-used hashtag in the CELEB, CORP, SOCMOV, GRASSRT and MEDIA groups, and the second most-used hashtag in the GOV group. #gulf also appears many times in the CELEB, GOV, SOCMOV and OTHER types.

BP is a British multinational oil and gas company that operates in over 80 countries. It reportedly produces some 3.4 million barrels of oil equivalent per day.
4.3 Visualization

Legend: Other: yellow. Hashtags are black dots.

From the visualization above, we can see that some hashtags form clusters and appear frequently across different types.
5 Sentiment Analysis

5.1 Computation method

Export the TWEET table as an Excel file and save the CONTENT column of the TWEET table as contentwincsv.csv. Use the following code to calculate the polarity and print lists:

A programming error pointed to some unorganized data in the dataset. Find the unorganized data in Excel and clear it:

The program can then run smoothly. The output is as follows:

Copy the program output into Excel as the ORIGINAL LIST column shown in the following picture, and use formulae to extract CONTENT and POLARITY from ORIGINAL LIST:

The polarity of each tweet is as follows:
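The scoring code itself is not shown in the text, so the following is only a sketch of one common approach: summing per-word scores from a sentiment word list, then averaging per tweeter. The word list `LEXICON`, the function names and the sample rows are all hypothetical, and the team's actual scoring method may differ.

```python
import re
from collections import defaultdict

# Hypothetical word scores for illustration; a real run would load a full lexicon.
LEXICON = {"disaster": -3, "spill": -2, "angry": -2, "clean": 2, "hope": 2, "thanks": 2}


def tweet_polarity(content, lexicon=LEXICON):
    """Sum the lexicon scores of the words in one tweet (0 if none match)."""
    words = re.findall(r"[a-z']+", content.lower())
    return sum(lexicon.get(word, 0) for word in words)


def average_polarity_by_tweeter(rows):
    """Average tweet polarity per tweeter from (tweeter, content) pairs."""
    totals = defaultdict(lambda: [0, 0])  # tweeter -> [polarity sum, tweet count]
    for tweeter, content in rows:
        totals[tweeter][0] += tweet_polarity(content)
        totals[tweeter][1] += 1
    return {tweeter: total / count for tweeter, (total, count) in totals.items()}


# Hypothetical tweets for illustration only.
sample_rows = [
    ("user_a", "The spill is a disaster"),
    ("user_a", "Thanks to the clean up crews"),
    ("user_b", "Hope the well is capped soon"),
]

averages = average_polarity_by_tweeter(sample_rows)
for tweeter, polarity in sorted(averages.items(), key=lambda item: item[1]):
    print(tweeter, polarity)
```

Sorting the averages in ascending order yields a ranking of the most negative tweeters, comparable in form to the tables in this section.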
Top 10 negative sentiments by tweeter:

Tweeter            Type      Polarity
cnneditorchuck     OTHER     -4
cnnireport         OTHER     -4
forbesintellect    OTHER     -4
greenprogress      OTHER     -4
cnygreg            OTHER     -3
joenbc             OTHER     -3
sfkarenmc          OTHER     -3
politicolnews      MEDIA     -2.5
datelinenbc        OTHER     -2.5
wcpblog            SOCMOV    -2.322580645
According to TYPE:

Top 3 negative sentiments by type:

Type     Polarity
CORP     0.292035398
GOV      0.203401843
MEDIA    -0.104506232
We used the following SQL to select the desired dataset and export it to Excel. We then sorted the data and obtained the hashtags associated with the most negative and most positive sentiments, as follows:

Hashtag           Polarity
#3g               -9
#att              -9
#mtr              -9
#lives            -8
#p2#hcr           -8
#wayofliving      -8

Hashtag               Polarity
#failedeconomy        8
#okaloosaisland       8
#pain                 8
#random               8
#hebrewnational       7.5
#goodheartandsmart    7
#jimmybuffett         7
#paulwatson           7
The following picture presents the popular hashtags.

In this picture, the red nodes represent positive sentiment (high polarity), the blue nodes represent negative sentiment (low polarity), and the yellow nodes represent mild sentiment. Based on this graph, we find that people's sentiment is likely to be affected by their neighbors. An individual who receives positive information is likely to be positive, or neither too positive nor too negative. On the other hand, someone who receives both positive and negative reviews is likely to hold a neutral attitude towards this event.
We used SQL to generate the dataset, and we created a TIME INTERVAL column that runs from the date the content was posted to ten days later. We then imported this dataset into Gephi as follows. Because we only wanted to observe the change over time, we did not add any edges to this dataset.
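The TIME INTERVAL column described above can be derived from each posting date with simple date arithmetic. A minimal sketch, assuming posting dates are available as calendar dates; the function name is hypothetical, and the exact text format Gephi expects for interval columns is not shown here.

```python
from datetime import date, timedelta


def time_interval(posted_on, days=10):
    """Return (start, end) as ISO date strings: the posting date and `days` days later."""
    end = posted_on + timedelta(days=days)
    return posted_on.isoformat(), end.isoformat()


start, end = time_interval(date(2010, 4, 20))
print(start, end)  # prints 2010-04-20 2010-04-30
```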
We again use blue to represent negative sentiment and red to represent positive sentiment. The whole picture is as follows:

We then use the TIMELINE function in Gephi to observe the change, as in the following picture. At the beginning there are only 4 nodes, which means that only a few tweets covered the event. The number of tweets about the event then increased (video record at https://www.youtube.com/watch?v=G69nA6_HwpM&feature=youtube_gdata_player).
We found that the sentiment did not simply change from negative to positive; it fluctuated. We therefore used SQL to calculate the average daily polarity and made a chart in Excel as follows:

The polarity fluctuates between -1 and 0.6. This substantiates our observation in Gephi.
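The daily averaging step can also be sketched outside SQL. The example below groups tweet polarities by posting date, averages each day, and picks out the lowest day. The function name and the sample values are hypothetical and do not come from the dataset.

```python
from collections import defaultdict
from datetime import date


def average_daily_polarity(rows):
    """Average polarity per calendar day from (posted_on, polarity) pairs, sorted by date."""
    by_day = defaultdict(list)
    for posted_on, polarity in rows:
        by_day[posted_on].append(polarity)
    return [(day, sum(values) / len(values)) for day, values in sorted(by_day.items())]


# Hypothetical polarity values for illustration only.
sample_polarities = [
    (date(2010, 6, 10), -2.0),
    (date(2010, 6, 10), 0.0),
    (date(2010, 6, 9), 1.0),
]

daily = average_daily_polarity(sample_polarities)
lowest_day, lowest_value = min(daily, key=lambda item: item[1])
print(daily)
print("Lowest average polarity:", lowest_day, lowest_value)
```

Plotting the resulting (day, average) pairs gives the same kind of line chart that the report produced in Excel.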
It is easy to observe that the polarity reached its lowest point after June 10th, 2010. We believe the reason is that, during the observed period, TIME reported "Oil-spill estimate upped again" on June 10th, and The Guardian reported "Obama compares the BP oil spill to 9/11" on June 14th. These influential people and media companies affected people's sentiment towards the event.

We then used SQL to export the polarity of the different types of tweeters on different dates, and used Excel to draw the following line charts.
The graph above shows that celebrities have higher polarity when the average polarity is positive, and lower polarity when the average polarity is negative.

This graph shows that, most of the time, the corporations' polarity was higher than the average polarity.

This picture indicates that in the first half of the period, government held weaker polarity than other groups, such as celebrities. During this period, the trend of the government's polarity ran opposite to the average polarity. In the second half of the period, government held a strongly positive attitude most of the time.
The polarity of the grassroots group fluctuates between -1.2 and 0.8, a narrower range than the polarity of the other groups.

Most of the time, the average trend is very close to the media trend. It is possible that media opinion leads common opinion. On April 28 and June 18, the media's polarity differs from the average polarity. On April 28, it was reported that the flow of oil was five times larger than first estimated. On June 17 and 18, Hayward faced accusations, and Moody's downgraded BP's credit rating.[3]

Most of the time, the polarity of the social movement group was lower than the average polarity.

[3] The Guardian: BP oil spill timeline
Our examination of the Deepwater Horizon dataset allowed us to gain additional insight into the crisis that we might not have been able to glean otherwise. For instance, while the majority of the press surrounding the crisis focused on the federal and corporate response to the event, these two groups had a rather small social media footprint on Twitter.

Another lesson learned came from our sentiment analysis of the dataset. Using negative/positive connotations to examine tweets chronologically gave us a distinctly different view of the overall network, one which we had not seen before. By examining this information over time, we were also able to see how individual tweeters were affected by the sentiment of their neighbors. For example, if an individual was connected to others who held equally strong opposing views (e.g., one very negative and one very positive), that individual tended to maintain a more neutral position.

Using sentiment analysis over a small but distinct period of time allowed us to see how individual users on a social network can be swayed by the information that they receive from others. Interestingly enough, some of the most polarizing accounts tended to belong to heavily followed users (i.e., those with significant influence), such as media correspondents and corporations. Information produced by these Twitter accounts reached a very large audience, giving them the opportunity to influence many within their network.
One lesson learned, which made us question the validity and accuracy of our analyses, involved our implicit trust in the organization of the dataset. Upon closer examination, we realized that much of the labeling of the dataset was incorrect. For example, Twitter handles which should have belonged to the Media category were labeled Other and vice versa. Below is a table that shows just a handful of these instances:

Twitter Account     Original Label    Revised Label
@CBSRadioNews       Other             Media
@HuffPostHill       Other             Media
@NBCNightlyNews     Other             Media
@CDCEmergency       Other             Gov
Given these discrepancies, we felt it important to caveat our analyses by stating that we did not scrub the data to correct any labeling inaccuracies. Instead, we completed our work based on the data provided by Topsy.

That being said, based on our overall analyses of the dataset, we used betweenness as a measure for determining the most influential actors in the Deepwater Horizon network. We did this in two different ways:

1. We affixed the type from the tweeter table to determine the maximum betweenness centrality for each group.
2. We separated the mention network into different groups and then measured betweenness centrality to determine which group was most influential.
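The first of the two methods above can be sketched as follows: compute betweenness centrality on the mention network, then keep the highest-scoring tweeter in each group. The report does not say which tool computed betweenness, so this sketch swaps in a plain implementation of the Brandes (2001) shortest-path counting approach for a directed, unweighted graph. The handles, group labels and edges are hypothetical.

```python
from collections import defaultdict, deque


def betweenness_centrality(edges):
    """Unnormalized betweenness centrality for a directed, unweighted graph.

    `edges` is an iterable of (source, target) pairs; duplicate pairs are ignored.
    """
    neighbors = defaultdict(set)
    nodes = set()
    for source, target in edges:
        neighbors[source].add(target)
        nodes.update((source, target))

    centrality = dict.fromkeys(nodes, 0.0)
    for start in nodes:
        # Breadth-first search that counts shortest paths from `start`.
        order = []
        predecessors = defaultdict(list)
        path_count = dict.fromkeys(nodes, 0)
        distance = dict.fromkeys(nodes, -1)
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in neighbors[node]:
                if distance[nxt] < 0:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
                if distance[nxt] == distance[node] + 1:
                    path_count[nxt] += path_count[node]
                    predecessors[nxt].append(node)
        # Accumulate dependencies from the farthest vertices back towards `start`.
        dependency = dict.fromkeys(nodes, 0.0)
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += path_count[pred] / path_count[node] * (1 + dependency[node])
            if node != start:
                centrality[node] += dependency[node]
    return centrality


def top_tweeter_per_group(centrality, group_of):
    """Return {group: (tweeter, score)} with the highest betweenness in each group."""
    best = {}
    for tweeter, score in centrality.items():
        group = group_of[tweeter]
        if group not in best or score > best[group][1]:
            best[group] = (tweeter, score)
    return best


# Hypothetical mention edges and group labels for illustration only.
sample_edges = [("user_a", "user_b"), ("user_d", "user_b"), ("user_b", "user_c")]
sample_groups = {"user_a": "OTHER", "user_b": "MEDIA", "user_c": "GOV", "user_d": "OTHER"}

scores = betweenness_centrality(sample_edges)
leaders = top_tweeter_per_group(scores, sample_groups)
print(scores)
print(leaders)
```

In this small example `user_b` sits on both shortest paths that pass through an intermediate vertex, so it receives a score of 2.0 and leads its group. The second method would instead run `betweenness_centrality` separately on the edges within each group.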
Based on these two methods, we found that @nwf is the most influential tweeter in the SOCMOV group. Overall, the most influential tweeters were:

whodat35
seachele420
Nwf
Winterthur
digiphilE
Ibrrc
Bpamerica
humidcity
oil_leaks
In addition to the knowledge we gained about the dataset, our analyses also taught us a great deal about Twitter and its utility. As part of our analysis, we examined whether Twitter can be considered a medium for companies to disseminate information or whether it is a platform for the masses to express their ideas. Of the 675 unique tweeters in this network, 131 (approximately 19%) represent Media outlets, and the remaining 544 represent Celebrities (3), Grassroot Organizations (180), Social Movement Organizations (74), Corporations (4), Government (12) and Other (268), which is composed mainly of average Twitter users.
[Figure: Tweeter Groups pie chart. Media 19%, Celebrities 0%, Grassroot 27%, Social Movement 11%, Corporations 1%, Government 2%, Other 40%.]
Figure: An overall breakdown of the individual groups within the Deepwater Horizon Twitter Network.
However, while this breakdown suggests that the average Twitter user may be the dominant user of this social media service, a closer examination of the Tweeter table, as discussed earlier, shows that there may be some error in the classification scheme used to organize these data. As a result, it would be difficult to base our answer on these categories. Instead, we use the overall number of tweets vs. retweets as a very general breakdown of information dissemination (retweets) vs. expression of new ideas (tweets), an idea put forth by researchers Macskassy & Michelson.[4]
In the Deepwater Horizon Twitter network, there were 29,888 original tweets and 42,828 retweets. This breakdown, which shows roughly 1.4 times as many retweets as original tweets, suggests that Twitter is mainly being used as a means of information dissemination as opposed to the expression of individual ideas.
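The shares behind this comparison follow directly from the two counts given above. A short worked example, using only those two figures; the helper name `share` is hypothetical.

```python
def share(part, total):
    """Percentage of `total` represented by `part`, rounded to a whole number."""
    return round(100 * part / total)


original_tweets = 29_888
retweets = 42_828
total_tweets = original_tweets + retweets

tweet_share = share(original_tweets, total_tweets)
retweet_share = share(retweets, total_tweets)
retweet_ratio = round(retweets / original_tweets, 2)

print(f"total: {total_tweets}")
print(f"tweets: {tweet_share}%, retweets: {retweet_share}%")
print(f"retweets per original tweet: {retweet_ratio}")
```

The rounded shares are 41% original tweets and 59% retweets, with about 1.43 retweets for every original tweet.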
[4] Macskassy, S. & Michelson, M. (2011). Why Do People Retweet? Anti-Homophily Wins the Day! Association for the Advancement of Artificial Intelligence. Accessed at: <http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2790/3291>
[Figure: pie chart. Tweet 41%, Retweet 59%.]
Figure: Using retweets as a metric for information dissemination, we can see that the majority of the messages from the Deepwater Horizon network accomplish this purpose.
Using a similar strategy, we sought to determine whether Social Movement Organizations (SOCMOV) were benefitting from Twitter as a call to action. As the chart below shows, the majority of the SOCMOV accounts in the Deepwater Horizon network relied on information dissemination. This suggests that Social Movements were not using Twitter as their main platform for expressing new ideas, but rather to ensure that a larger target audience could be exposed to their existing ideologies and practices.
[Figure: pie chart. Tweets 42%, Retweets 58%.]
Figure: Using retweets as a metric for information dissemination, we can once again see that SOCMOV are relying on Twitter as a means of perpetuating their message.
In addition to SOCMOVs, one of the other big players in the Deepwater Horizon crisis was the Government. Although this group did not have many accounts, we used the existing group members and their respective tweets to determine which political party was most active. Within the GOV group, we picked out the following three Twitter accounts, since they were the only accounts present that represented an individual political figure:

David Vitter: junior US Senator from Louisiana and a member of the Republican Party. 64 unique tweets and 21 retweets.
Bob Menendez: senior US Senator from New Jersey and a member of the Democratic Party. 7 unique tweets and 4 retweets.
Bernie Sanders: junior US Senator from Vermont and an Independent. 9 unique tweets (the account is run by staff and not the senator) and 4 retweets.
While party representation is evenly split (one third each) among the individual actors, other accounts such as the Senate_GOPS handle were very active during the oil spill. This account, which provides news updates from Senators and their staff, had 43 distinct tweets and 17 retweets during the timeframe in question. In comparison, the White House (led by President Obama of the Democratic Party) had only 19 total tweets, of which 14 were retweets.
[Figure: bar chart. x-axis: Account Name; y-axis: Number of Tweets; series: Tweets, Retweets.]
Figure: Twitter activity in terms of tweets and retweets for the most active GOV Twitter accounts.
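The per-account counts reported above can be totaled by party with a few lines of arithmetic. In the sketch below, attributing the Senate_GOPS handle to the Republican Party is an assumption based on the handle's name, and the White House figure of 5 original tweets is derived from the stated 19 total tweets minus 14 retweets.

```python
from collections import defaultdict

# (account, party, original tweets, retweets) as reported in the text.
# Assumption: Senate_GOPS is counted as Republican.
# The White House row is derived from "19 total tweets, of which 14 were retweets".
ACCOUNTS = [
    ("David Vitter", "Republican", 64, 21),
    ("Senate_GOPS", "Republican", 43, 17),
    ("Bob Menendez", "Democratic", 7, 4),
    ("White House", "Democratic", 5, 14),
    ("Bernie Sanders", "Independent", 9, 4),
]


def totals_by_party(accounts):
    """Sum original tweets and retweets for each party."""
    totals = defaultdict(lambda: {"tweets": 0, "retweets": 0})
    for _, party, tweets, retweets in accounts:
        totals[party]["tweets"] += tweets
        totals[party]["retweets"] += retweets
    return dict(totals)


party_totals = totals_by_party(ACCOUNTS)
for party, counts in party_totals.items():
    print(party, counts, "total:", counts["tweets"] + counts["retweets"])
```

Under these assumptions the Republican accounts total 145 messages, the Democratic accounts 30 and the Independent account 13.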
As a result of this breakdown, we can see that the Republican Party seemed to be the most active. However, the next step was to examine whether this activity was being used as a means of engaging in debate or simply relaying the party's pre-existing frames.
For the most part, tweets from GOV handles were largely retweets. These retweets were used to announce TV show appearances or to publicize articles that featured quotes from Senators and other members of each party. However, two accounts in particular, David Vitter and Senate_GOPS, were both very active in producing original tweets that challenged the response from both the President and BP.

One possible reason for David Vitter's large social media footprint could have been his close geographic relationship to the spill. As a junior Senator from Louisiana, one of the regions most affected by the oil spill, he was very vocal in assuring his constituents that he was working to enact an appropriate and timely cleanup procedure. Similarly, one possible reason for the activity on the Senate_GOPS handle could have been critique of the White House, which was led by a Democratic president.
In conclusion, this project allowed us to learn a great deal about the use and impact of Twitter in documenting an event such as an environmental disaster. However, to ensure that the results of future analyses of social media data are accurate, it is important to be able to trust the data being used. This means that multiple rounds of data cleaning and peer review should occur before an analysis can be considered useful for reporting purposes. Once this has occurred, data from platforms such as Twitter provide a real-time snapshot of public sentiment and can be incredibly beneficial in disseminating information.