By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.
CHAPTER
Assembling a Big Data solution is sort of like putting together an
erector set. There are various pieces and elements that must be put
together in the proper fashion to make sure everything works
adequately, and there are almost endless combinations of configurations
that can be made with the components at hand.
With Big Data, the components include platform pieces, servers,
virtualization solutions, storage arrays, applications, sensors, and
routing equipment. The right pieces must be picked and integrated in a
fashion that offers the best performance, high efficiency, affordability,
ease of management and use, and scalability.
Big Data consists of data sets that are too large to be acquired, handled,
analyzed, or stored in an appropriate time frame using the traditional
infrastructures. Big is a term relative to the size of the organization
and, more important, to the scope of the IT infrastructure that's in
place. The scale of Big Data directly affects the storage platform that
must be put in place, and those deploying storage solutions have to
understand that Big Data uses storage resources differently than the
typical enterprise application does.
Most of the existing platforms for processing data were designed for
handling transactional Web applications and have little support
for business analytics applications. That situation has driven
Hadoop to become the de facto standard for handling batch
processing. However, real-time analytics is altogether different,
requiring something more than Hadoop can offer. An event-processing
framework needs to be in place as well. Fortunately,
several technologies and processing alternatives exist on the
market that can bring real-time analytics into Big Data platforms,
and many major vendors, such as Oracle, HP, and IBM,
are offering the hardware and software to bring real-time
processing to the forefront. However, for the smaller business, that
may not be a viable option because of the cost. For now, real-time
processing remains a function that is provided as a service
via the cloud for smaller businesses.
Transforming Big Data application
development into something more mainstream may be the best
way to leverage what is offered by Big Data. This means creating
a built-in stack that integrates with Big Data databases from
the NoSQL world and creating MapReduce frameworks such as
Hadoop and distributed processing. Development should account
for the existing transaction-processing and event-processing
semantics that come with the handling of the real-time analytics
that fit into the Big Data world.
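The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample records are assumptions made only for illustration. A framework such as Hadoop distributes these same three steps across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for a much larger data set.
records = ["big data needs storage", "big data needs analytics"]
counts = map_reduce(records)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread over many nodes, which is what makes the model suited to batch processing.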
In its native format, a large pile of unstructured data has little value. It
is burdensome in the typical enterprise, especially one that has not
adopted Big Data practices to extract the value.
However, extracting value can be akin to finding a needle in a
haystack, and if that haystack is spread across several farms and the
needle is in pieces, it becomes even more difficult. One of the primary
jobs of Big Data analytics is to piece that needle back together and
organize the haystack into a single entity to speed up the search. That
can be a tall order with unstructured data, a type of data that is growing
in volume and size as well as complexity.
Unstructured (or uncatalogued) data can take many forms, such
as historical photograph collections, audio clips, research notes,
genealogy materials, and other riches hidden in various data libraries.
The Big Data movement has driven methodologies to create dynamic
and meaningful links among these currently unstructured information
sources.
For the most part, that has resulted in the creation of metadata and
methods to bring structure to unstructured data. Currently, two dominant
technical and structural approaches have emerged: (1) a reliance
on search technologies, and (2) a trend toward automated data
categorization. Many data categorization techniques are being applied
across the landscape, including taxonomies, semantics, natural language
recognition, auto-categorization, "what's related" functionality,
data visualization, and personalization. The idea is to provide the
information that is needed to process an analytics function.
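As a rough illustration of how a taxonomy can drive auto-categorization and a simple "what's related" lookup, the sketch below tags free-text items with categories and stores the result as metadata. The taxonomy, keywords, document texts, and function names are all hypothetical; production systems typically rely on much richer semantic and natural language techniques than keyword matching.

```python
import re

# Hypothetical taxonomy: category name -> keywords that signal it.
TAXONOMY = {
    "photography": {"photo", "photograph", "portrait"},
    "audio": {"audio", "recording", "clip"},
    "genealogy": {"census", "marriage", "ancestor"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the sorted categories whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name, keywords in taxonomy.items() if tokens & keywords)


def build_metadata(doc_id, text):
    """Attach structure (a metadata record) to one unstructured item."""
    return {"id": doc_id, "categories": categorize(text)}


def related(record, catalog):
    """List ids of other records that share at least one category."""
    wanted = set(record["categories"])
    return [
        other["id"]
        for other in catalog
        if other["id"] != record["id"] and wanted & set(other["categories"])
    ]


# Hypothetical unstructured items.
documents = {
    "doc-1": "Census record listing a marriage, with one portrait photograph.",
    "doc-2": "Audio clip of an interview about the family portrait.",
    "doc-3": "Research notes on soil samples.",
}
catalog = [build_metadata(doc_id, text) for doc_id, text in documents.items()]
print(catalog)
print(related(catalog[0], catalog))
```

Items that match no keyword, such as `doc-3` here, end up with an empty category list, which is where search technologies or more capable categorization methods would have to take over.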