
Data Analytics using Twitter Social Media

Author(s):
Saikat Chatterjee

Document: Techathon Solution Overview Template
Owner: IBM Status: Draft
Page 1 of 8

Contents
1 High Level Overview
1.1 Introduction
1.2 Solution Overview
2 Detailed Description
2.1 Architecture Overview
2.2 Macro design
3 Environment Needs

1. High Level Overview
1.1 Introduction
Twitter is a massive social networking site tuned towards fast communication. More than 140
million active users publish over 400 million 140-character Tweets every day. Twitter's speed
and ease of publication have made it an important communication medium for people from all
walks of life. Twitter has played a prominent role in socio-political events, such as the Arab
Spring and the Occupy Wall Street movement. Twitter has also been used to post damage
reports and disaster preparedness information during large natural disasters, such as
Hurricane Sandy.
This document provides a very high level overview of the proposed solution and the
software/hardware requirements necessary for building it.
1.2 Solution Overview
This application showcases some of the data analysis that can be achieved using the
Twitter REST-based API:
- Collecting, storing, and analyzing Twitter data
- Storing this data in a tangible way for use in real-time applications
- Focusing on common measures and algorithms that are used to analyze social media data
- Visual analytics, an approach which helps humans inspect the data through intuitive
  visualizations

2. Detailed Description
Collecting, storing, and analyzing Twitter data
Users on Twitter generate over 400 million Tweets every day. Some of these Tweets are
available to researchers and practitioners through public APIs at no cost. In this section we
describe how to extract the following types of information from Twitter:
- Information about a user,
- A user's network consisting of his connections,
- Tweets published by a user, and
- Search results on Twitter.
APIs to access Twitter data can be classified into two types based on their design and access
method:
- REST APIs are based on the REST architecture, now popularly used for designing web
  APIs. These APIs use the pull strategy for data retrieval. To collect information a user
  must explicitly request it.
- Streaming APIs provide a continuous stream of public information from Twitter. These
  APIs use the push strategy for data retrieval. Once a request for information is made, the
  Streaming APIs provide a continuous stream of updates with no further input from the
  user. They have different capabilities and limitations with respect to what and how much
  information can be retrieved. The Streaming API has three types of endpoints:
  - Public streams: These are streams containing the public Tweets on Twitter.
  - User streams: These are single-user streams, with access to all the Tweets of a user.
  - Site streams: These are multi-user streams and intended for applications which
    access Tweets from multiple users.
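The difference between the pull and push strategies can be illustrated with a minimal Python sketch. It does not call Twitter at all: `fetch_page`, `fake_stream`, `FAKE_FOLLOWERS` and `PAGE_SIZE` are hypothetical in-memory stand-ins, and the cursor-style paging is only an assumption about how a paged REST response might look.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical in-memory stand-in for a remote service; not a Twitter client.
FAKE_FOLLOWERS = ["u1", "u2", "u3", "u4", "u5"]
PAGE_SIZE = 2


def fetch_page(cursor: int) -> Dict:
    """Pretend REST endpoint: return one page plus the cursor of the next page."""
    page = FAKE_FOLLOWERS[cursor:cursor + PAGE_SIZE]
    next_cursor: Optional[int] = cursor + PAGE_SIZE
    if next_cursor >= len(FAKE_FOLLOWERS):
        next_cursor = None  # no more pages to request
    return {"ids": page, "next_cursor": next_cursor}


def pull_all(fetch: Callable[[int], Dict]) -> List[str]:
    """Pull strategy: the client issues one explicit request per page."""
    results: List[str] = []
    cursor: Optional[int] = 0
    while cursor is not None:
        response = fetch(cursor)
        results.extend(response["ids"])
        cursor = response["next_cursor"]
    return results


def fake_stream() -> Iterator[Dict]:
    """Pretend streaming endpoint: after one request, items keep arriving."""
    for i, text in enumerate(["first tweet", "second tweet", "third tweet"]):
        yield {"id": i, "text": text}


def consume(stream: Iterator[Dict], handle: Callable[[Dict], None]) -> int:
    """Push strategy: the client only reacts to each item as it arrives."""
    count = 0
    for item in stream:
        handle(item)
        count += 1
    return count
```

With the pull strategy the loop in `pull_all` decides when to ask for more data. With the push strategy `consume` makes no further requests and simply handles whatever the stream delivers.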
Storing Twitter Data
There has been an explosion in the size of data generated on social media. This data explosion
calls for a new data storage paradigm. At the forefront of this movement is NoSQL, which
promises to store big data in a more accessible way than the traditional, relational model. There
are several NoSQL implementations. In this solution, we choose MongoDB as an example NoSQL
implementation. We choose it for its adherence to the following principles:
- Document-Oriented Storage. MongoDB stores its data in JSON-style objects. This makes
  it very easy to store raw documents from Twitter's APIs.
- Index Support. MongoDB allows for indexes on any field, which makes it easy to create
  indexes optimized for your application.
- Straightforward Queries. MongoDB's queries, while syntactically much different from
  SQL, are semantically very similar. In addition, MongoDB supports MapReduce, which
  allows for easy lookups in the data.
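A minimal Python sketch of these three principles follows. It assumes `collection` is a pymongo `Collection` (or any object offering the same `create_index`, `insert_many` and `find` methods), and that each stored tweet carries the author's handle under `user.screen_name` as in Twitter's JSON payloads. The function names are illustrative and not part of the proposed solution.

```python
from typing import Any, Dict, Iterable, List


def store_tweets(collection: Any, tweets: Iterable[Dict]) -> None:
    """Store raw tweet documents unchanged and index the author's screen name."""
    # Dotted keys address nested fields; 1 requests an ascending index.
    collection.create_index([("user.screen_name", 1)])
    documents = list(tweets)
    if documents:  # insert_many rejects an empty list, so skip it in that case
        collection.insert_many(documents)


def tweets_by_user(collection: Any, screen_name: str) -> List[Dict]:
    """Roughly the SQL idea: SELECT * FROM tweets WHERE screen_name = ?"""
    return list(collection.find({"user.screen_name": screen_name}))
```

With pymongo installed and a local server running, `collection` could be obtained with something like `MongoClient()["twitter"]["tweets"]`, where the database and collection names are placeholders.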

Analyzing Twitter Data
Many of the questions that we ask of our Twitter data can be answered through network analysis.
Questions such as "who is important?", "who talks to whom?", and "what is important?" can all
be answered through a network. Using proper network measures, we can find these important
actors or topics in a network.
- Centrality - Who is important?
  - Degree Centrality - Who gets the most retweets?
  - Eigenvector Centrality - Who is the most influential?
  - Betweenness Centrality - Who controls the flow of information?
- Finding Related Information with Networks
- Finding Topics in the Text
  - LDA (Latent Dirichlet allocation) Calculation with MALLET
- Sentiment Analysis
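As an illustration of the first two centrality measures, the sketch below works on a small hand-made retweet network. The edge list, the `(retweeter, original_author)` direction, and the choice to run the eigenvector computation on the undirected graph with a plain power iteration are all assumptions made for this example, not the solution's actual implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (retweeter, original_author); the direction is an assumption


def retweet_in_degree(edges: Iterable[Edge]) -> Counter:
    """Degree centrality here: the number of retweets each author received."""
    return Counter(author for _retweeter, author in edges)


def eigenvector_centrality(edges: Iterable[Edge], iterations: int = 100) -> Dict[str, float]:
    """Power iteration on the undirected graph, using A + I to avoid oscillation."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbours.
        updated = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in neighbours}
        norm = sum(value * value for value in updated.values()) ** 0.5
        score = {n: value / norm for n, value in updated.items()}
    return score


# Hypothetical sample data: three users retweet alice, one retweets bob.
edges = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("erin", "bob")]
print(retweet_in_degree(edges).most_common(1))  # [('alice', 3)]
ranking = eigenvector_centrality(edges)
print(max(ranking, key=ranking.get))  # alice
```

Degree centrality only counts direct retweets, while eigenvector centrality also rewards being connected to well-connected users, which is why the two can rank users differently on larger networks.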
Visualizing Twitter Data
When users interact on Twitter, network information is generated, and when they publish
Tweets, textual information is generated. Tweets themselves have other embedded information,
such as location information. In addition, users have profiles where they describe themselves
through fields, such as their name and website. Visualization techniques can help us efficiently
analyze and understand how and why users interact on Twitter. In this section, we discuss
techniques to create visualizations for the four types of information:
network, temporal, geo-spatial, and textual information. While discussing the techniques, we
follow the visualization mantra: overview first, then zoom and filter, details on demand. We
will focus our discussion on two types of networks:
- Information flow networks, and
- Friend-Follower networks.
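For the temporal view, an "overview first" chart typically needs Tweet counts per time bucket before any plotting happens. The sketch below shows one possible way to prepare that data in Python. It assumes each tweet dictionary carries the `created_at` string in the layout used by Twitter's REST API v1.1 payloads, and the function name and hourly bucket size are illustrative choices.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Timestamp layout of the "created_at" field, e.g. "Mon Oct 29 21:05:00 +0000 2012".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(tweets: Iterable[Dict]) -> Counter:
    """Count tweets per hour; the result can feed a time-series overview chart."""
    counts: Counter = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        counts[created.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

A coarser or finer bucket (day, minute) only requires changing the `strftime` pattern, which is one simple way to support the zoom step of the mantra.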
2.1 Architecture Overview


2.2 Macro design

3. Environment Needs
IDE: NetBeans #
Language and Tools: JDK '(#, %) '(*, Apache Ant latest version, Git '(.(/(0
Web Server: Apache Tomcat .
Development OS: Windows # (Internet must be accessible as our application calls Twitter's REST API),
Administrator access level
Deployment OS: Linux (Internet must be accessible as our application calls Twitter's REST API),
Administrator access level
Database: MongoDB latest version