
DEEPWATER HORIZON CRISIS:
A SOCIAL NETWORK ANALYSIS

NetworkThink Team A
Kaitlin Donohue
Yilin Wei
Yunjing Yao
Subramanian Vellaiyan
Jetson Guy

1 Introduction


On April 20, 2010, the BP-operated Mobile Offshore Drilling Unit (MODU)
Deepwater Horizon experienced a loss of well control. The events that followed
led to 11 deaths, fires, explosions, and ultimately the sinking of the unit. As a
result of this disaster, high levels of liquid and gaseous hydrocarbons leaked into
the Gulf of Mexico until the well was finally capped on July 15, 2010. [1] During
the three months prior to closure, the Deepwater Horizon Crisis, as it has come to
be known, is estimated to have discharged 4.9 million barrels of oil into the
waters of the Gulf. [2] The event, including its environmental, financial, and
political repercussions, was the topic of much debate on the micro-blogging social
network Twitter.

To complete the task put forth to us by NetworkThink, we examined the set of
70,000 tweets related to the crisis from April 2010 to July 2010. We have broken
our analyses into segments that examine the relationships within and between the
various stakeholders in the Deepwater Horizon Twitter network. The results of our
analyses can be found below.


[1] Republic of the Marshall Islands Office of the Maritime Administrator (2011)
Deepwater Horizon Marine Casualty Investigation Report. Accessed 12/11/2014 at
<http://www.register-iri.com/forms/upload/Republic_of_the_Marshall_Islands_DEEPWATER_HORIZON_Marine_Casualty_Investigation_Report-Low_Resolution.pdf>
[2] On Scene Coordinator Report Deepwater Horizon Oil Spill (2011). Accessed
12/11/2014 at <http://www.uscg.mil/foia/docs/dwh/fosc_dwh_report.pdf>

2 Mention Network
2.1 Create the mention network


The pictures above represent the mention network. Vertex color represents
closeness centrality, and vertex shape represents eigenvector centrality: a vertex
whose eigenvector centrality is above 0.22 (the average eigenvector centrality) is
drawn as a solid triangle; otherwise it is drawn as a solid diamond. Vertex size
represents out-degree, so the tweeters who mentioned others the most appear
largest.

The first picture highlights the tweeters who mentioned others the most:

seachele420
whodat35
winterthur
oceanshaman
Endrunlv
Zbleumoon

The second picture highlights the tweeters who were mentioned the most:

nwf
ibrrc
whodat35
bpamerica
therightblue
gohsep
2.3 Most influential tweeters




The picture above represents the most influential tweeters across all groups. A
vertex whose betweenness centrality is above the average is drawn as a solid
diamond; otherwise it is drawn as a solid square. Similarly, a vertex whose
eigenvector centrality is above the average is greenish; otherwise it is orange.
The most influential tweeters are:

TWEETER        GROUP
whodat35       GRASSRT
seachele420    SOCMOV
nwf            CELEB
winterthur     GRASSRT
digiphilE      MEDIA
ibrrc          GOV
bpamerica      CORP
humidcity      OTHER
oil_leaks      SOCMOV

The CELEB group has the highest aggregate of influential tweeters.
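
To make the "above average" rule concrete, here is a minimal sketch that computes unnormalized betweenness centrality with Brandes' algorithm on a small directed graph and then assigns the two shapes. The toy graph and the function names are hypothetical; the report's values were presumably produced by a network analysis tool rather than by code like this.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted directed graph given as {node: [successors]}.
    Every node must appear as a key, even if it has no successors."""
    centrality = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                               # nodes in BFS visiting order
        preds = {node: [] for node in adj}       # shortest-path predecessors
        paths = dict.fromkeys(adj, 0)            # number of shortest paths
        dist = dict.fromkeys(adj, -1)
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(adj, 0.0)
        for w in reversed(order):                # accumulate from the farthest node back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    return centrality


def shapes_by_average(scores, above="solid diamond", below="solid square"):
    """Assign one shape to nodes above the average score and another to the rest."""
    average = sum(scores.values()) / len(scores)
    return {node: above if score > average else below
            for node, score in scores.items()}


graph = {"a": ["b"], "b": ["c", "d"], "c": [], "d": []}
scores = betweenness(graph)
shapes = shapes_by_average(scores)
```

In the toy graph only "b" sits on shortest paths between other nodes, so it is the only vertex drawn as a solid diamond.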

[Histogram: Degree Distribution. x-axis: Degree; y-axis: Frequency]

The histogram above shows evidence of a power-law degree distribution: most
vertices have a very low degree, while only a few have a very high degree.
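
One rough way to check such a claim is to fit a straight line to the degree histogram on log-log axes; a power law appears as a line with a negative slope. The sketch below is illustrative only: the histogram values are made up, the function name is hypothetical, and a good linear fit is suggestive rather than conclusive evidence of a power law.

```python
import math


def loglog_slope(histogram):
    """Least-squares slope of log(frequency) against log(degree).

    `histogram` maps degree -> number of vertices with that degree.
    Entries with a zero degree or zero count are skipped (log is undefined).
    """
    points = [(math.log(degree), math.log(count))
              for degree, count in histogram.items()
              if degree > 0 and count > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up histogram that follows frequency = 1024 * degree ** -2 exactly.
histogram = {1: 1024, 2: 256, 4: 64, 8: 16}
slope = loglog_slope(histogram)  # estimated power-law exponent (negative)
```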

Communication is more concentrated across the groups.



The following table lists the shape used for each group in the picture above.

GROUP      SHAPE
GRASSRT    Solid Diamond
OTHER      Solid Square
SOCMOV     Sphere
MEDIA      Solid Triangle
GOV        Disk
CELEB      Diamond
CORP       Circle


3 Hashtag Network
3.1 Create the Hashtag Network
Network 1

Network 2


Nodes: hashtags
Edges: when two hashtags appear in the same tweet, there is an edge between
them.
Node size: the number of times the hashtag was mentioned
Node color: modularity class
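
The node and edge definitions above can be sketched as follows. The sample tweets and the function name are hypothetical, and counting each hashtag at most once per tweet is an assumption, not something the report specifies.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: the hashtags found in each of four tweets.
tweets = [
    ["#bp", "#gulf", "#oil"],
    ["#bp", "#gulf"],
    ["#boycottbp", "#bp"],
    ["#oil"],
]


def cooccurrence_network(hashtag_lists):
    """Return (node_sizes, edge_weights) for a hashtag co-occurrence network."""
    node_sizes = Counter()    # hashtag -> number of tweets that contain it
    edge_weights = Counter()  # (hashtag_a, hashtag_b) -> number of shared tweets
    for tags in hashtag_lists:
        unique = sorted(set(tags))  # sorting keeps each pair in one fixed order
        node_sizes.update(unique)
        edge_weights.update(combinations(unique, 2))
    return node_sizes, edge_weights


node_sizes, edge_weights = cooccurrence_network(tweets)
```

The resulting edge list can be loaded into a tool such as Gephi, where modularity classes are then computed on the weighted graph.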


This shows the network after grouping. The pink group accounts for 50.27% of the
nodes in this graph.

3.2 Most frequent hashtags


We used the following SQL to count the number of times each hashtag was mentioned:
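
A minimal sketch of what such a count could look like, run through Python's sqlite3 module so that it is self-contained. The hashtag table and its tweetid and hashtag columns are taken from the queries in section 4; the sample rows and the small threshold are illustrative assumptions, and this is not necessarily the query used for the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashtag (tweetid INTEGER, hashtag TEXT)")
conn.executemany(
    "INSERT INTO hashtag VALUES (?, ?)",
    [(1, "#bp"), (1, "#gulf"), (2, "#bp"), (3, "#bp"), (3, "#oil")],
)

THRESHOLD = 1  # illustrative; the report uses 7000 on the full dataset

# Count the rows per hashtag and keep only those above the threshold.
rows = conn.execute(
    """
    SELECT hashtag, COUNT(*) AS times
    FROM hashtag
    GROUP BY hashtag
    HAVING COUNT(*) > ?
    ORDER BY times DESC
    """,
    (THRESHOLD,),
).fetchall()
```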

The hashtags mentioned more than 7,000 times are as follows:

Hashtag         Times     Themes
#bp             122249    Industry
#gulf           62614     Environment
#oil            30866     Environment
#p2             21839     Opinion
#environment    21552     Environment
#blacktide      19467     Environment
#boycottbp      17139     Opinion
#eco            16958     Environment
#tcot           16908     Industry
#news           15331     News
#ocean          13080     Environment
#cdnpoli        13006     Opinion
#oceans         12438     Environment
#green          9832      Environment
#bigoil         9685      Environment
#bpoilspill     9472      News
#enbridge       9013      Industry
#tarsands       8914      News
#chevron        7852      Industry
#wildlife       7621      Environment
#corexit        7366      Measures
#louisiana      7310      News

3.3 Different hashtags by time


In the following two graphs, the size and color of each label show the number of
times the hashtag was mentioned.

This picture shows the hashtags that were popular in the first half of the period
and were also mentioned more than 7,000 times in total. We can see many familiar
hashtags that we discussed in 3.2.


In the second half of the period, we saw only #enbridge. All of the popular
environmental hashtags disappeared.

The two pictures above also substantiate that there was a shift in hashtag usage.
It appears that the hashtags grew increasingly negative the longer the spill went
on.

4 Affiliation Network
4.1 Choices for nodes, edges and their attributes

-- Pair each hashtag with the type of the tweeter who used it, via the tweet table.
SELECT h.hashtag, e.type
FROM hashtag AS h
JOIN tweet AS t ON h.tweetid = t.tweetid
JOIN tweeter AS e ON t.tweeter = e.tweeter;

We used this code to link the hashtag table and the tweeter table through the
tweet table. The nodes are hashtags and tweeter types.
There are seven types (CELEB, CORP, GOV, GRASSRT, MEDIA, OTHER, and SOCMOV) and a
total of 4,139 hashtags.

-- Count how many times each hashtag occurs within each tweeter type.
SELECT type, hashtag, COUNT(hashtag)
FROM joint
GROUP BY type, hashtag;

We used this code to count every hashtag within every type.
Edges are the relationships between hashtags and tweeters: an edge indicates that
a hashtag was used by Twitter users of a given type. Edges can be weighted by the
number of times each hashtag was used, and edge width can represent that weight.

Attributes of a type are the name, size, location, and functions of the different
organizations. Attributes of a hashtag are its information, combination, and the
keywords used when searching for information.
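
The weighted type-to-hashtag edges described above could be produced in one step, as in the following minimal sketch. It uses Python's sqlite3 module with the table and column names from the queries above; the sample rows are hypothetical, and the width formula at the end is only one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweeter (tweeter TEXT, type TEXT);
    CREATE TABLE tweet (tweetid INTEGER, tweeter TEXT);
    CREATE TABLE hashtag (tweetid INTEGER, hashtag TEXT);
    INSERT INTO tweeter VALUES ('bpamerica', 'CORP'), ('nwf', 'CELEB');
    INSERT INTO tweet VALUES (1, 'bpamerica'), (2, 'nwf'), (3, 'nwf');
    INSERT INTO hashtag VALUES (1, '#bp'), (2, '#bp'), (2, '#gulf'), (3, '#bp');
""")

# One row per (type, hashtag) edge; the weight is the number of uses.
edges = conn.execute("""
    SELECT e.type, h.hashtag, COUNT(*) AS weight
    FROM hashtag AS h
    JOIN tweet AS t ON h.tweetid = t.tweetid
    JOIN tweeter AS e ON t.tweeter = e.tweeter
    GROUP BY e.type, h.hashtag
    ORDER BY e.type, weight DESC
""").fetchall()

# Hypothetical mapping from edge weight to a drawing width.
widths = {(kind, tag): 1 + weight for kind, tag, weight in edges}
```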

4.2 Insights from the affiliation network


-- Same count as before, sorted by type and then by usage count within each type.
SELECT type, hashtag, COUNT(hashtag)
FROM joint
GROUP BY type, hashtag
ORDER BY type, COUNT(hashtag);

We used this code to list, in order, the number of times each hashtag was used
within each tweeter group.
Below are some of the hashtags used most often by each tweeter group:

CELEB: #bp 29, #gulf 21, #p2 11, #ocean 7, #cnnhelpgulf 7
CORP: #bp 179, #39top 5, #oilspill2010 4
GOV: #louisiana 78, #bp 67, #gulf 33, #nola 29, #uk 23, #weather 19,
#usfws 19, #nationalguard 19, #hurricane 19
GRASSRT: #bp 3250, #p2 2073, #tcot 1691, #gulf 685, #blacktide 587,
#bpoilspill 484, #teaparty 395, #ocra 341, #eco 304, #tlot 248, #ocean 232,
#topprog 231, #boycottbp 221, #news 219, #oilpocalypse 204, #oil 202,
#gop 166, #wildife 159, #environment 159
SOCMOV: #bp 957, #ocean 675, #gulf 272, #environment 209,
#oilpocalypse 188, #p2 169, #blacktide 146, #peakoil 117
MEDIA: #bp 898, #nola 187, #news 157, #p2 140, #cnn 113, #tcot 110
OTHER: #bp 562, #p2 224, #blacktide 150, #gulf 149, #tcot 126,
#bpoilspill 118


According to the data, #bp is the most used hashtag in the CELEB, CORP, SOCMOV,
GRASSRT, MEDIA, and OTHER groups, and the second most used hashtag in the GOV
group. #gulf also appears many times in the CELEB, GOV, SOCMOV, and OTHER
groups.

BP is a British multinational oil and gas company that operates in over 80
countries. It reportedly produces some 3.4 million barrels of oil equivalent per day.

4.3 Visualization

The labels show the seven types:

CELEB: orange
CORP: gray
GOV: pink
GRASSRT: red
MEDIA: green
SOCMOV: blue
OTHER: yellow

Hashtags are black dots.
From the visualization above, we can see that some hashtags form clusters; these
are hashtags that appear frequently across different types.

5 Sentiment Analysis
5.1 Computation method
Export the TWEET table as an Excel file and save the CONTENT column of the TWEET
table as contentwincsv.csv.
Use the following code to calculate the polarity and print the lists:
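
A minimal sketch of one way such a polarity computation could look, assuming a simple word-score lexicon. The lexicon, its scores, and the function names are hypothetical and are not the script or sentiment tool used for this report; the sketch also skips blank rows, since malformed rows are a common cause of errors when reading exported CSV files.

```python
import csv
import io

# Hypothetical word scores; a real analysis would load a full sentiment lexicon.
LEXICON = {"disaster": -2, "spill": -1, "worst": -3,
           "hope": 2, "clean": 1, "thanks": 2}


def polarity(text):
    """Sum the lexicon scores of the words in a tweet (unknown words score 0)."""
    words = (word.strip(".,!?#@:;\"'").lower() for word in text.split())
    return sum(LEXICON.get(word, 0) for word in words)


def score_csv(handle):
    """Return (content, polarity) pairs for a one-column CSV of tweet text."""
    results = []
    for row in csv.reader(handle):
        if not row or not row[0].strip():
            continue  # skip blank or malformed rows instead of failing
        results.append((row[0], polarity(row[0])))
    return results


# In-memory stand-in for the exported CSV file, including one blank row.
sample = io.StringIO(
    '"Worst disaster ever, #spill"\n\n"Thanks for the clean up, hope it works"\n'
)
scored = score_csv(sample)
```

With a real export, the same function could be called on an open file, for example `score_csv(open("contentwincsv.csv", newline=""))`.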


A programming error pointed to some malformed rows in the dataset.


We found these rows in Excel and cleared them:


The program then ran smoothly. The output is as follows:


We copied the program output into Excel as the ORIGINAL LIST column shown in the
following picture, and used formulae to extract CONTENT and POLARITY from
ORIGINAL LIST:


The polarity of each tweet is as follows:

5.2 Stakeholders expressing negative sentiments


According to TWEETER:


Top 10 negative sentiments by tweeter:

Tweeter            Type      Polarity
cnneditorchuck     OTHER     -4
cnnireport         OTHER     -4
forbesintellect    OTHER     -4
greenprogress      OTHER     -4
cnygreg            OTHER     -3
joenbc             OTHER     -3
sfkarenmc          OTHER     -3
politicolnews      MEDIA     -2.5
datelinenbc        OTHER     -2.5
wcpblog            SOCMOV    -2.322580645


According to TYPE:


Top 3 negative sentiments by type:

Type     Polarity
CORP     0.292035398
GOV      0.203401843
MEDIA    -0.104506232
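
Rankings like the two above amount to averaging polarity per tweeter or per type and sorting the averages in ascending order. A minimal sketch, using made-up (type, polarity) records and hypothetical function names:

```python
from collections import defaultdict


def average_by_key(records):
    """Average the values of (key, value) records for each key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in records:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def most_negative(averages, n):
    """Return the n keys with the lowest average, most negative first."""
    return sorted(averages.items(), key=lambda item: item[1])[:n]


# Made-up per-tweet polarity scores, tagged with the tweeter's type.
records = [("MEDIA", -2), ("MEDIA", 1), ("CORP", 1), ("CORP", 0),
           ("GOV", 0), ("GOV", 1), ("GOV", -1)]
averages = average_by_key(records)
ranking = most_negative(averages, 2)
```

The same two functions work per tweeter if the tweeter name is used as the key instead of the type.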

5.3 Hashtags associated with negative/positive sentiments

We used the following SQL to select the desired dataset and export it to Excel.


We then sorted the data and obtained the hashtags associated with the most
negative and most positive sentiments, as follows:

Hashtag         Polarity
#3g             -9
#att            -9
#mtr            -9
#lives          -8
#p2#hcr         -8
#wayofliving    -8

Hashtag               Polarity
#failedeconomy        8
#okaloosaisland       8
#pain                 8
#random               8
#hebrewnational       7.5
#goodheartandsmart    7
#jimmybuffett         7
#paulwatson           7


The following picture presents the popular hashtags.

5.4 Visualize mention network


Use SQL to create two CSV files: one contains all edges (TWEET.TWEETER and
MENTION.MENTION); the other contains the nodes (TWEET.TWEETER and
POLARITY.POLARITY).
In the TWEET table, several tweets are likely to share the same tweeter name, but
we only need to know which tweeter connects with which tweeter, so we replace
MENTION.TWEETID with TWEET.TWEETER.
We then use Gephi to visualize this dataset as follows:
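
A minimal sketch of how the two files could be generated, using Python's sqlite3 and csv modules. The column layouts of the tweet, mention, and polarity tables, the sample rows, and the choice to average polarity per tweeter are all assumptions made for illustration; the Source, Target, and Id headers follow the column names Gephi's spreadsheet import conventionally looks for.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweet (tweetid INTEGER, tweeter TEXT);
    CREATE TABLE mention (tweetid INTEGER, mention TEXT);
    CREATE TABLE polarity (tweetid INTEGER, polarity REAL);
    INSERT INTO tweet VALUES (1, 'whodat35'), (2, 'whodat35'), (3, 'nwf');
    INSERT INTO mention VALUES (1, 'nwf'), (2, 'bpamerica'), (3, 'bpamerica');
    INSERT INTO polarity VALUES (1, -2), (2, 0), (3, 3);
""")

# Edges: replace the tweet id on each mention with the tweet's author.
edges = conn.execute("""
    SELECT t.tweeter, m.mention
    FROM mention AS m
    JOIN tweet AS t ON m.tweetid = t.tweetid
    ORDER BY t.tweeter, m.mention
""").fetchall()

# Nodes: one row per tweeter with the average polarity of their tweets.
nodes = conn.execute("""
    SELECT t.tweeter, AVG(p.polarity)
    FROM tweet AS t
    JOIN polarity AS p ON p.tweetid = t.tweetid
    GROUP BY t.tweeter
    ORDER BY t.tweeter
""").fetchall()


def to_csv(header, rows):
    """Serialize a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


edges_csv = to_csv(["Source", "Target"], edges)
nodes_csv = to_csv(["Id", "Polarity"], nodes)
```

Accounts that are mentioned but never tweet (bpamerica in the sample) get no row in the node file under this sketch, so they would carry no polarity value.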

In this picture, the red nodes represent positive sentiment (high polarity), the
blue nodes represent negative sentiment (low polarity), and the yellow nodes
represent mild sentiment.

Based on this graph, we find that people's sentiment is likely to be affected by
their neighbors. An individual who receives positive information is likely to be
positive, or neither too positive nor too negative. On the other hand, someone who
receives both positive and negative reviews is likely to hold a neutral attitude
towards this event.

5.5 Change of sentiments

We used SQL to generate the dataset, and we created a TIME INTERVAL column that
runs from the date the content was posted to ten days later. We then imported this
dataset into Gephi as follows. Because we only wanted to observe the change over
time, we did not add any edges to this dataset.


We again use blue to represent negative sentiment and red to represent positive
sentiment. The whole picture is as follows:


We then used the TIMELINE function in Gephi to observe the change, as shown in the
following picture. At the beginning, there are just 4 nodes, which means that only
a few tweets covered this event. The number of tweets about the event then
increased (video record at
https://www.youtube.com/watch?v=G69nA6_HwpM&feature=youtube_gdata_player).


We found that the sentiment was not simply changing from negative to positive;
it fluctuated. So we used SQL to calculate the average daily polarity and made a
chart in Excel as follows:
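
The two computations described above, the ten-day interval and the average daily polarity, can be sketched as follows. The report used SQL for these steps; this Python version with made-up records and hypothetical function names is only meant to show the logic.

```python
from collections import defaultdict
from datetime import date, timedelta


def ten_day_interval(posted):
    """Return the (start, end) dates of the interval beginning on `posted`."""
    return posted, posted + timedelta(days=10)


def daily_average(records):
    """Average polarity per day from (date, polarity) records, in date order."""
    by_day = defaultdict(list)
    for posted, polarity in records:
        by_day[posted].append(polarity)
    return {day: sum(values) / len(values)
            for day, values in sorted(by_day.items())}


# Made-up per-tweet polarity scores with their posting dates.
records = [
    (date(2010, 6, 9), 1),
    (date(2010, 6, 10), -2),
    (date(2010, 6, 10), 0),
    (date(2010, 6, 9), 0),
]
averages = daily_average(records)
lowest_day = min(averages, key=averages.get)  # day with the lowest average
interval = ten_day_interval(date(2010, 4, 20))
```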




The polarity fluctuates between -1 and 0.6, which substantiates our observation in
Gephi. The polarity reached its lowest point after June 10th, 2010. We believe the
reason is that, during the observed time period, TIME reported "Oil-spill estimate
upped again" on June 10th, and The Guardian reported "Obama compares the BP oil
spill to 9/11" on June 14th. These influential people and media companies affected
people's sentiment towards this event.

We then used SQL to export the polarity of the different types of tweeters on
different dates, and used Excel to draw the following line charts.

The graph above shows that celebrities have higher polarity when the average
polarity is positive, and lower polarity when the average polarity is negative.

This graph shows that, most of the time, corporations' polarity was higher than
the average polarity.


This picture indicates that, in the first half of the period, the government held
weaker polarity than other groups, such as celebrities. During this period, the
trend of the government's polarity ran opposite to the average polarity. In the
second half of the period, the government held a strongly positive attitude most
of the time.

The polarity of the grassroots group fluctuates between -1.2 and 0.8, a narrower
range than those of the other groups' polarities.


Most of the time, the average trend is very close to the media trend. It is
possible that media opinion leads common opinion. On April 28 and June 18, the
media's polarity differed from the average polarity. On April 28, it was reported
that the flow of oil was five times larger than first estimated. On June 17 and
18, BP chief executive Tony Hayward faced accusations, and Moody's downgraded BP's
credit rating. [3]

Most of the time, the polarity of the social movement group was lower than the
average polarity.

[3] The Guardian: BP oil spill timeline

There is no special connection between the polarity of the Other group and the
average polarity.



To summarize, the polarities of the different types of tweeters were consistently
related to the average polarity, but the relationships were not alike.

6 Implications & Conclusion


Our examination of the Deepwater Horizon dataset allowed us to gain additional
insight into the crisis that we may not have been able to glean otherwise. For
instance, while the majority of the press surrounding the crisis focused on the
federal and corporate response to the event, these two groups had a rather small
social media footprint on Twitter.

Another lesson learned came from our sentiment analysis of the dataset. Using
negative and positive connotations to examine tweets chronologically gave us a
view of the overall network that we had not seen before. By examining this
information over time, we were also able to see how individual tweeters were
affected by the sentiment of their neighbors. For example, if an individual was
connected to others who held equally strong opposing views (e.g., one very
negative and one very positive), that individual tended to maintain a more neutral
position. Using sentiment analysis over a small but distinct period of time
allowed us to see how individual users on a social network can be swayed by the
information they receive from others. Interestingly enough, some of the most
polarizing accounts tended to belong to heavily followed individuals (i.e., those
with significant influence), such as media correspondents and corporations.
Information produced by these Twitter accounts reached a very large audience,
giving them the opportunity to influence many within their network.

One lesson learned, which made us question the validity and accuracy of our
analyses, involved our implicit trust in the organization of the dataset. Upon
closer examination, we realized that much of the labeling of the dataset was
incorrect. For example, Twitter handles that should have belonged to the Media
category were labeled Other, and vice versa. Below is a table that shows just a
handful of these instances:

Twitter Account      Original Label    Revised Label
@CBSRadioNews        Other             Media
@HuffPostHill        Other             Media
@NBCNightlyNews      Other             Media
@CDCEmergency        Other             Gov

Based on these discrepancies, we felt it important to caveat our analyses by
stating that we did not scrub the data to correct any labeling inaccuracies.
Instead, we completed our work based on the data provided by Topsy.

That being said, based on our overall analyses of the dataset, we used betweenness
as the measure for determining the most influential actors in the Deepwater
Horizon network. We did this in two different ways:

1. We affixed the type from the tweeter table to determine the maximum
betweenness centrality for each group.
2. We separated the mention network into different groups and then measured
betweenness centrality to determine which group was most influential.

Based on these two methods, we found that @nwf is the most influential tweeter in
the SOCMOV group. Overall, the most influential tweeters were:

whodat35
seachele420
nwf
winterthur
digiphilE
ibrrc
bpamerica
humidcity
oil_leaks


In addition to the knowledge we gained about the dataset, our analyses also
allowed us to learn a great deal about Twitter and its utility. As part of our
analysis, we examined whether Twitter can be considered a medium for companies to
disseminate information or whether it is a platform for the masses to express
their ideas. Of the 675 unique tweeters in this network, 131 (approximately 19%)
represent Media outlets, and the remaining 544 represent Celebrities (3),
Grassroot Organizations (180), Social Movement Organizations (74), Corporations
(4), Government (12), and Other (268), which is composed mainly of average Twitter
users.

[Pie chart: Tweeter Groups. Media 19%, Celebrities 0%, Grassroot 27%, Social
Movement 11%, Corporations 1%, Government 2%, Other 40%]

Figure: An overall breakdown of the individual groups within the Deepwater
Horizon Twitter Network.

However, while this breakdown suggests that the average Twitter user may be the
dominant user of this social media service, as discussed earlier, a closer
examination of the Tweeter table shows that there may be some error in the
classification scheme used to organize these data. As a result, it would be
difficult to base our answer on these categories. Instead, we use the overall
number of tweets vs. retweets as a very general breakdown of information
dissemination (retweets) vs. expression of new ideas (tweets), an idea put forth
by researchers Macskassy & Michelson. [4]

In the Deepwater Horizon Twitter network, there were 29,888 instances of original
tweets and 42,828 instances of retweets. This breakdown, which shows about 1.4
times as many retweets as original tweets, suggests that Twitter is mainly being
used as a means of information dissemination as opposed to the expression of
individual ideas.
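
The shares behind this comparison follow directly from the two counts above. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


counts = {"tweets": 29888, "retweets": 42828}
percentages = shares(counts)                   # share of all messages
ratio = counts["retweets"] / counts["tweets"]  # retweets per original tweet
```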


[4] Macskassy, S. & Michelson, M. (2011) Why Do People Retweet? Anti-Homophily
Wins the Day! Association for the Advancement of Artificial Intelligence. Accessed
at:
<http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2790/3291>

[Pie chart: Tweets vs. Retweets. Tweet 41%, Retweet 59%]

Figure: Using retweets as a metric for information dissemination, we can see that
the majority of the messages from the Deepwater Horizon network accomplish this
purpose.

Using a similar strategy, we sought to determine whether Social Movement
Organizations (SOCMOV) were benefitting from Twitter as a call to action. As the
chart below shows, the majority of the activity from SOCMOV accounts in the
Deepwater Network consisted of information dissemination. This suggests that
Social Movements were not using Twitter as their main platform for expressing new
ideas, but rather to ensure that a larger target audience could be exposed to
their existing ideologies and practices.

[Pie chart: Social Movement Organization Twitter Activity. Tweets 42%, Retweets
58%]

Figure: Using retweets as a metric for information dissemination, we can once
again see that SOCMOV are relying on Twitter as a means of perpetuating their
message.

In addition to SOCMOVs, one of the other big players in the Deepwater Horizon
crisis was the Government. Although this group did not have many accounts, we used
the existing group members and their respective tweets to determine which
political party was most active.

Within the GOV group, we picked out the following three Twitter accounts, since
they were the only accounts present that represented an individual political
figure:

David Vitter: a Junior US Senator from Louisiana and a member of the
Republican Party. 64 unique tweets and 21 retweets.

Senator Bob Menendez: Senior US Senator from New Jersey and a
member of the Democratic Party. 7 unique tweets and 4 retweets.

Senator Bernie Sanders: a Junior US Senator from Vermont and an
Independent. 9 unique tweets (the account is run by staff and not the
senator) and 4 retweets.

While there is an even split in party representation (1/3 each) amongst the
individual actors, other accounts, such as the Senate_GOPS handle, were very
active during the oil spill. This account, which provides news updates from
Senators and their staff, had 43 distinct tweets and 17 retweets during the
timeframe in question. In comparison, the White House (led by President Obama of
the Democratic Party) had only 19 total tweets, of which 14 were retweets.

[Bar chart: GOV Account Activity on Twitter. x-axis: Account Name; y-axis: Number
of Tweets (0 to 70); series: Tweets, Retweets]

Figure: Twitter activity in terms of tweets and retweets per most active GOV
Twitter accounts.

From this breakdown, we can see that the Republican Party seemed to be the most
active. However, the next step was to examine whether this activity was being
used as a means of engaging in debate or simply relaying the party's pre-existing
frames.


For the most part, tweets from GOV handles were largely retweets. These retweets
were used to announce TV show appearances or to publicize articles that featured
quotes from Senators and other members of each party. However, two accounts in
particular, David Vitter and Senate_GOPS, were both very active in producing
original tweets that challenged the response from both the President and BP.
One possible reason for David Vitter's large social media footprint could have
been his close geographic relationship to the spill. As a junior Senator from
Louisiana, one of the regions most affected by the oil spill, he was very vocal in
assuring his constituents that he was working to enact an appropriate and timely
cleanup procedure. Similarly, one possible reason for the activity on the
Senate_GOPS handle could have been critique of the White House, which was led by a
Democratic president.

In conclusion, this project allowed us to learn a great deal about the use and
impact of Twitter in documenting an event such as an environmental disaster.
However, in the future, in order to ensure that the results of data analyses
pulled from a social media network are accurate, it is important to be able to
trust the data being used. This means that multiple rounds of data cleaning and
peer review should occur before an analysis can be considered useful for reporting
purposes. Once this has occurred, data from platforms such as Twitter provide a
real-time snapshot of public sentiment and can be incredibly beneficial in
disseminating information.
