By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.
CHAPTER 11
Even with the barriers of data volume and cost somewhat eliminated, SMB entities still face significant obstacles to leveraging Big Data. Those obstacles include the purity of the data, analytical knowledge, and consumer surplus. Business leaders in every sector are going to have to deal with the implications of Big Data, either directly or indirectly.
Furthermore, the increasing volume and detail of information acquired by businesses and government agencies, paired with the rise of multimedia, social media, instant messaging, e-mail, and other Internet-enabled technologies, will fuel exponential growth in data for the foreseeable future. Some of that growth can be attributed to increased compliance requirements, but a key factor in the increase in data volumes is the increasingly sensor-enabled and instrumented world. Examples include RFID tags, vehicles equipped with GPS sensors, low-cost remote sensing devices, instrumented business processes, and instrumented website interactions.
The question may soon arise of whether Big Data is too big, making it harder to determine where the value lies. That question will likely evolve into an argument for data quality over data quantity. Nevertheless, businesses that do not prepare to tackle data management head-on will find it almost impossible to cope with ever-growing data sources.
Before 2010, managing data was a relatively simple chore: online transaction processing systems supported the enterprise's business processes, operational data stores accumulated the business transactions to support operational reporting, and enterprise data warehouses accumulated and transformed business transactions to support both operational and strategic decision making.
The typical enterprise now experiences a data growth rate of 40 to 60 percent annually, which in turn increases financial burdens and data management complexity. This situation might seem to imply that the data themselves are becoming less valuable and more of a liability for many businesses, a low-value commodity.
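Annual growth of 40 to 60 percent compounds quickly. The short sketch below turns those rates into a doubling time and a five-year projection. The 100 TB starting size is a hypothetical figure chosen for illustration, and the growth rate is assumed to stay constant.

```python
import math


def projected_volume(initial_tb: float, annual_growth: float, years: int) -> float:
    """Data volume after `years` of compound growth (annual_growth=0.40 means 40%)."""
    return initial_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years for the volume to double, assuming a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100  # HYPOTHETICAL starting data estate, in terabytes
    for rate in (0.40, 0.60):
        print(
            f"{rate:.0%} growth: doubles every {doubling_time_years(rate):.2f} years, "
            f"{projected_volume(start_tb, rate, 5):,.0f} TB after 5 years"
        )
```

At 40 percent the volume doubles roughly every 2.06 years, and at 60 percent roughly every 1.47 years. Under these assumptions, a hypothetical 100 TB estate grows to about 538 TB or about 1,049 TB within five years.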
Nothing could be further from the truth. More data mean more value, and countless companies have demonstrated as much with Big Data analytics. To see that value, one need look no further than how vertical markets are leveraging Big Data analytics, which is leading to disruptive change.
For example, smaller retailers are collecting click-stream data from website interactions and loyalty card data from traditional retailing operations. Retailers have traditionally used this point-of-sale information for shopping basket analysis and stock replenishment, but many are now going one step further and mining the data for customer buying analysis. Those retailers then share the data (after normalization and identity scrubbing) with suppliers and warehouses to bring added efficiency to the supply chain.
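The text does not say how retailers normalize and scrub the data before sharing it. The sketch below shows one plausible approach, under stated assumptions:

- Loyalty-card IDs are replaced with a keyed hash (HMAC-SHA256). A secret key matters because an unkeyed hash of a small ID space can be reversed by brute force.
- Product codes are normalized to one spelling.
- Item pairs are counted per basket. This is the support-counting step of shopping basket (association) analysis.

The key, the record layout, and the function names are all hypothetical. Note that this is pseudonymization, which is weaker than full anonymization.

```python
import hashlib
import hmac
from collections import Counter
from itertools import combinations

# HYPOTHETICAL secret held only by the retailer; suppliers never receive it.
SCRUB_KEY = b"replace-with-a-secret-key"


def scrub_customer_id(customer_id: str, key: bytes = SCRUB_KEY) -> str:
    """Replace a loyalty-card ID with a keyed, non-reversible pseudonym."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def normalize_item(sku: str) -> str:
    """Normalize product codes so 'bread' and 'Bread ' count as one item."""
    return sku.strip().upper()


def pair_counts(baskets) -> Counter:
    """Count how often each pair of distinct items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        items = sorted({normalize_item(s) for s in basket})
        counts.update(combinations(items, 2))
    return counts


if __name__ == "__main__":
    # HYPOTHETICAL point-of-sale records: (loyalty card ID, items in the basket).
    transactions = [
        ("card-1001", ["bread", "milk", "eggs"]),
        ("card-1002", ["Bread", "milk"]),
        ("card-1001", ["milk", "eggs"]),
    ]
    # Scrub identities before anything leaves the retailer.
    shared = [(scrub_customer_id(cid), items) for cid, items in transactions]
    print(pair_counts(items for _, items in shared).most_common(3))
```

With these sample records, bread and milk appear together twice, as do eggs and milk. A supplier receiving the scrubbed records can still link repeat visits by the same pseudonym without learning the card number.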
Another example of finding value comes from the world of science, where large-scale experiments create massive amounts of data for analysis. Big science is now paired with Big Data, and the pairing has far-reaching implications: it is helping to redefine how data are stored, mined, and analyzed. Large-scale experiments are generating more data than can be held at a lab's data center (e.g., the Large Hadron Collider at CERN generates over 15 petabytes of data per year), which in turn requires that the data be immediately transferred to other laboratories for processing: a true model of distributed analysis and processing.
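A quick back-of-the-envelope calculation shows why the data must be shipped elsewhere continuously rather than in occasional batches. It computes the average transfer rate needed just to keep up with 15 petabytes per year. It assumes decimal units (1 PB = 10^15 bytes) and a non-leap year, and it ignores peaks, retransmission, and replication to multiple sites.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def sustained_rate(bytes_per_year: float) -> tuple[float, float]:
    """Average rate needed to move a yearly volume, as (MB/s, Gbit/s), decimal units."""
    bytes_per_second = bytes_per_year / SECONDS_PER_YEAR
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e9


if __name__ == "__main__":
    mb_s, gbit_s = sustained_rate(15e15)  # 15 PB/year, the LHC figure quoted above
    print(f"{mb_s:.0f} MB/s, or about {gbit_s:.1f} Gbit/s, averaged over the year")
```

The result is roughly 476 MB/s, or about 3.8 Gbit/s, around the clock. That is an average; real transfer links have to be sized well above it.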
Other scientific quests are prime examples of Big Data in action, fueling disruptive change in how experiments are performed and data interpreted. Thanks to Big Data methodologies, continental-scale experiments have become both politically and technologically feasible (e.g., the Ocean Observatories Initiative, the National Ecological Observatory Network, and USArray, a continental-scale seismic observatory). Much of the disruption is fed by improved instrument and sensor technology; for instance, the Large Synoptic Survey Telescope has a 3.2-gigapixel camera and generates over 6 petabytes of image data per year. Big Data platforms are what make such lofty goals attainable.
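A camera's size is quoted in pixels, while the yearly output is quoted in bytes. The arithmetic below connects the two. The per-pixel storage size is a HYPOTHETICAL 2 bytes (16 bits), uncompressed, and is not taken from the telescope's specification. The daily figure is a simple calendar average, even though a telescope only observes at night.

```python
def image_size_gb(gigapixels: float, bytes_per_pixel: float) -> float:
    """Uncompressed size of one exposure in decimal gigabytes."""
    # 1 gigapixel = 1e9 pixels and 1 GB = 1e9 bytes, so the 1e9 factors cancel.
    return gigapixels * bytes_per_pixel


def average_daily_tb(petabytes_per_year: float, days: int = 365) -> float:
    """Average daily output in decimal terabytes (1 PB = 1,000 TB)."""
    return petabytes_per_year * 1_000 / days


if __name__ == "__main__":
    # HYPOTHETICAL: 2 bytes stored per pixel, no compression.
    print(f"one 3.2-gigapixel frame is about {image_size_gb(3.2, 2):.1f} GB")
    print(f"6 PB/year averages about {average_daily_tb(6):.1f} TB/day")
```

Under these assumptions, a single frame is about 6.4 GB, and 6 PB a year averages about 16.4 TB a day.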
Advances in science also validate Big Data analytics. The biomedical corporation Bioinformatics recently announced that it has reduced the time it takes to sequence a genome from years to days and has cut the cost enough that sequencing an individual's genome for $1,000 will become feasible, paving the way for improved diagnostics and personalized medicine.
The financial sector has seen how Big Data and its associated analytics can have a disruptive impact on business. Financial services
THE FUTURE IS NOW
New developments for processing unstructured data are arriving on the scene almost daily, with one of the latest and most significant coming from the social networking site Twitter. Making sense of its massive database of unstructured data was a huge problem, so huge, in fact, that Twitter purchased another company just to help it find the value in its data store. The success of Twitter revolves around how well the company can leverage the data that its users generate. This amounts to a great deal of unstructured information from the more than 200 million accounts the site hosts, which generate 230 million Twitter messages a day.
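Daily totals hide what a stream processor actually has to handle every second. The sketch below converts the figures quoted above into an average per-second rate and a per-account ratio. It also estimates raw daily volume using a HYPOTHETICAL average of 1 KB per message, including metadata; the text gives no message size.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events_per_day: float) -> float:
    """Average events per second for a given daily total."""
    return events_per_day / SECONDS_PER_DAY


def daily_bytes(events_per_day: float, bytes_per_event: float) -> float:
    """Raw daily volume, given an assumed average size per event."""
    return events_per_day * bytes_per_event


if __name__ == "__main__":
    messages_per_day = 230e6  # figure quoted above
    accounts = 200e6          # "more than 200 million", so the ratio below is an upper bound
    print(f"{per_second(messages_per_day):,.0f} messages/s on average")
    print(f"{messages_per_day / accounts:.2f} messages per account per day")
    # HYPOTHETICAL: 1,000 bytes per message including metadata.
    print(f"{daily_bytes(messages_per_day, 1_000) / 1e9:,.0f} GB/day")
```

The average works out to about 2,662 messages per second, and peaks during major events are higher. Under the assumed message size, the raw stream is about 230 GB a day.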
To address the problem, the social networking giant purchased BackType, the developer of Storm, a software product that can parse live data streams such as those created by the millions of Twitter feeds. Twitter has released Storm's source code, making it available to anyone who wants to pursue the technology; the company is not interested in commercializing it.
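Storm structures a computation as a topology. Spouts emit streams of tuples, and bolts consume and transform them. The sketch below mimics that shape with plain Python generators to show how trending topics might be spotted in a live stream. It is NOT Storm's API, and every name in it is hypothetical. The window is count-based for simplicity, whereas a production system would more likely use time-based windows. Hashtag extraction is also deliberately naive: it splits on whitespace only.

```python
from collections import Counter, deque
from typing import Iterable, Iterator


def tweet_spout(messages: Iterable[str]) -> Iterator[str]:
    """Plays the role of a spout: emits raw messages one at a time."""
    yield from messages


def hashtag_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Plays the role of a bolt: emits each hashtag in a message, lowercased."""
    for text in stream:
        for word in text.split():
            if word.startswith("#") and len(word) > 1:
                yield word.lower()


class RollingCountBolt:
    """Counts hashtags over the most recent `capacity` hashtags (sliding window)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.recent: deque[str] = deque()
        self.counts: Counter[str] = Counter()

    def process(self, tag: str) -> None:
        self.recent.append(tag)
        self.counts[tag] += 1
        # Evict the oldest tag once the window is full, so stale topics fade out.
        if len(self.recent) > self.capacity:
            oldest = self.recent.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def top(self, n: int) -> list[tuple[str, int]]:
        return self.counts.most_common(n)


if __name__ == "__main__":
    # HYPOTHETICAL sample messages.
    feed = ["#storm is live", "loving #BigData", "#storm #bigdata demo", "#storm again"]
    trends = RollingCountBolt(capacity=100)
    for tag in hashtag_bolt(tweet_spout(feed)):
        trends.process(tag)
    print(trends.top(2))
```

Each stage handles one item at a time as it arrives. This per-item, incremental processing is what lets a stream processor report emerging topics while they are still developing, instead of after a batch job finishes.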
Storm has proved its value for Twitter, which can now perform analytics in real time and identify trends and emerging topics as they develop. For example, Twitter uses the software to calculate, in real time, how widely Web addresses are shared by multiple Twitter users.
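How widely a Web address spreads is often measured as its reach. Here reach is taken to mean the number of distinct users who could have seen the address: the union of the followers of everyone who shared it. The sketch below computes reach in plain Python, not with Storm's API. It splits the sharers into partitions to show the parallelizable structure: each partition's followers could be fetched by a separate worker, and the partial sets are merged at the end. The data and function names are hypothetical.

```python
from typing import Iterable


def partial_followers(tweeters: Iterable[str], followers_by_user: dict[str, set[str]]) -> set[str]:
    """Followers of one partition of sharers (the per-worker step)."""
    exposed: set[str] = set()
    for user in tweeters:
        exposed |= followers_by_user.get(user, set())
    return exposed


def url_reach(
    url: str,
    tweeters_by_url: dict[str, set[str]],
    followers_by_user: dict[str, set[str]],
    partitions: int = 4,
) -> int:
    """Distinct users exposed to `url`: union of the followers of everyone who shared it."""
    tweeters = sorted(tweeters_by_url.get(url, set()))
    # Round-robin split; in a distributed system each chunk would go to a different worker.
    chunks = [tweeters[i::partitions] for i in range(partitions)]
    # Merge the partial results; the union drops users who follow more than one sharer.
    exposed = set().union(*(partial_followers(c, followers_by_user) for c in chunks))
    return len(exposed)


if __name__ == "__main__":
    # HYPOTHETICAL data.
    tweeters_by_url = {"https://example.com/a": {"alice", "bob"}}
    followers_by_user = {"alice": {"carol", "dave"}, "bob": {"dave", "erin"}}
    print(url_reach("https://example.com/a", tweeters_by_url, followers_by_user))
```

Here dave follows both sharers but is counted once, so the reach is 3. The expensive part at Twitter scale is fetching and de-duplicating millions of follower sets. That is why the work is split across machines rather than run on one.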
With the capabilities offered by Storm, a company can process Big Data in real time and garner knowledge that leads to a competitive advantage. For example, calculating the reach of a Web address could take up to 10 minutes using a single machine. However, with a Storm