

Data is flowing into organizations at unprecedented rates. With the growth of e-business, wireless
communications, RFID and other technologies, many large enterprises collect hundreds of gigabytes
or even terabytes of detailed data every few weeks. Fast, comprehensive analysis of this data can
provide vital information about customers, products and operations, helping increase profitability and
market share.
But data volumes are so huge, and the need for business-critical information so sweeping, that this
creates challenges for organizations across many industries.
For the major wireless telecommunications carrier, it means running CRM analyses while collecting 250
million call detail records daily in order to proactively market to at-risk customers before they churn.
For the healthcare provider committed to providing the best and most efficient care, it's the need
to conduct sophisticated data mining of terabytes of operational and patient data, even as data
grows exponentially.
For the large online retailer, recording every click of millions of customers' shopping habits, it's the
need to thoroughly analyze billions of rows of data in order to develop targeted promotions.
For the grocery retailer, it means being able to perform complex market basket analyses against
detailed line-item level transactional data to understand customer behavior and develop optimal
merchandising and promotional strategies.
However, none of these goals can be met successfully when queries for information stored in the
enterprise data warehouse take hours or even days to complete. Current data warehouse systems are
based on older architectures that weren't designed to handle today's demands for querying enormous
amounts of data. As a result, many business intelligence solutions are abandoned or severely
underutilized by the users that they were intended to help.
Netezza overcomes the limitations of traditional architectures adapted for data warehousing with a
unique approach that delivers orders of magnitude improvements in performance, affordability and
ease of use. The Netezza Performance Server family of data warehouse appliances is designed
specifically for powering complex ad-hoc analysis of terabytes of dynamic, detailed data. In delivering
10-100 times the query speed at half the cost of competing solutions, Netezza allows companies to
do the types of detailed analysis that previously would have been impossible or cost prohibitive.
This paper compares the architecture of the Netezza Performance Server (NPS) appliances to other data
warehouse technologies used today. It addresses the inherent limitations of traditional architectures that
result in bottlenecks when processing complex queries on enormous quantities of data. It then
examines the methods used by some well-known vendors, which are all variations of the same
inherently limited design. Finally, it shows how Netezza's approach is fundamentally different, from the
affordability of its data warehouse appliance to the lightning speed of Intelligent Query Streaming™
technology. Netezza customers gain real-time intelligence based on almost unimaginable amounts of
data - fundamentally changing the way they leverage their data warehouse to make better decisions.

A data warehouse consists of three main elements - server, storage and database software -
interacting with external systems to acquire raw data, receive query instructions and return results. In
traditional systems, these core elements are a patchwork of general-purpose products from multiple
suppliers, configured to function together as a data warehouse. These solutions are built from systems
that were originally developed for on-line transaction processing (OLTP). They were not designed to
handle large and complex Business Intelligence (BI) analysis, and have inherent constraints that result
in limited performance and high costs of acquisition and ownership.
• Servers: These are the same computers used in data centers as web servers,
email servers or application servers. They use architectures that originated in the eighties and
nineties for OLTP applications, and are designed for efficient, RAM-based operations on individual
data elements (such as the contents of a field). They are not designed to run quickly or efficiently as
part of a data warehouse solution, where processing can involve extremely large sets of data, and
query requirements are quite different.
• Storage: Most general-purpose storage arrays require time-consuming, careful
synchronization of loaders and data striping mechanisms to ensure that data is distributed so that
it can be accessed efficiently by business intelligence users. Finding the specialized expertise to
properly configure the storage system usually means engaging a costly professional services firm.
(Technical services are a lucrative "cash cow" for suppliers of traditional data warehouse systems.)
• Database software: Obtaining maximum performance from a data warehouse requires a
close marriage between its software and hardware architectures. The full power of general-purpose
database management systems (DBMS) such as DB2 or Oracle is lost when they are simply
embedded within general-purpose hardware and used for data warehousing. The software is not
designed to extract optimal performance out of even the most advanced servers and storage.

The sheer inefficiency of patchwork solutions creates cost, complexity and waste.
• Configuring storage devices from EMC (for example),
servers from HP or IBM, and database management software from Oracle for the demands of
tera-scale query processing requires a great deal of time from system and database administrators.
Regardless of administrators' efforts, delivering acceptable performance becomes a losing battle as
scalability limits are reached.
• Patchwork data warehouse solutions invariably mean more hardware, less
reliability, higher power requirements and wasted floor space. Set-up often takes weeks, involving
testing, debugging and fine-tuning of system parameters.
• Patchwork solutions become increasingly difficult to manage as
core products evolve, especially as vendors upgrade their offerings at different times.
• Query processing on a general-purpose system is extremely cumbersome,
requiring the shuttling of huge quantities of data from storage to memory - multiple times for
complex queries. This topic is explored in more detail in the "Traditional Data Flow" section.

There are a number of dimensions of system scaling in an enterprise data warehouse. Each of these
dimensions is growing faster than "Moore's law," popularly summarized as processor speed doubling
every 18 months. This means that general-purpose hardware and software solutions that advance at the
rate of Moore's law will be unable to keep pace with growing user and data demands. The key factors
affecting system scaling in a data warehouse setting include:
• Data volumes in many organizations double in less than a year, and sometimes
much sooner - far exceeding the pace of Moore's law. The implications can be seen in companies
that spend millions of dollars in upgrades, only to be immediately swamped as data volumes
continue to soar.
• The use of enterprise data warehousing is growing continually more
sophisticated - moving from a historical view in the form of reports, to analysis of up-to-the-minute
data and finally to strategic predictive analytics. Increasingly, this requires analysis of detailed,
granular data, where richness of the data may provide key information glossed over in data
aggregations or summaries. This additional complexity in analysis increases the processing burden
on the data warehouse.
• Increasingly, the need for access to and analysis of data is toward real time
or near real time ("right time"). Driven by the exploding use of sub-transactional data for everything
from financial trading to credit card users' purchasing patterns to telecommunications call data
records, the need to analyze and detect timely patterns of business opportunities or fraudulent
behavior is critical.
• Enterprises continue to allow growing numbers of users (both internal employees
and external partners) to query the data warehouse for vital business information.
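
To make the scaling gap concrete, the sketch below compares two compounding rates: hardware capacity that doubles every 18 months (the Moore's law figure quoted above) and data that doubles every 12 months. The 12-month period is an assumed upper bound for "less than a year", and the function name and three-year horizon are illustrative choices rather than figures from this paper.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return the multiplicative growth after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period_years)


years = 3.0
hardware = growth_factor(years, 1.5)  # doubles every 18 months
data = growth_factor(years, 1.0)      # assumption: doubles every 12 months
gap = data / hardware                 # how far data has outrun the hardware

print(f"after {years:.0f} years: hardware x{hardware:.0f}, data x{data:.0f}, gap x{gap:.1f}")
```

Under these assumptions the data has grown twice as much as the hardware after three years, and the gap itself keeps doubling every three years after that.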



Traditional database platforms operate by reading data off disk, bringing it across an I/O
interconnection and loading it into memory for processing. Data normally flows smoothly for routine
OLTP applications (e.g. processing invoices or looking up patient records) that are characterized by
random I/O operations on individual data elements. However, moving data across multiple backplanes
and I/O channels works poorly when the amount of data to be queried is enormous, when the query
involves complex joins requiring multi-phase processing and when data is changing rapidly.
Unlike OLTP, data warehousing is all about data shuffling: moving large quantities of data through the
system's analysis and processing engine as efficiently as possible, with a minimum of internal
thrashing. Where OLTP systems might be optimized to reduce data latency, data warehouse systems
are typically much more concerned with data throughput. As a result, bringing OLTP-optimized system
designs to build a data warehouse results in greatly reduced performance.
For example, a complex join or other complex query may require a number of processing steps.
Consider the sheer inefficiency of delivering multiple enormous tables (billions of rows) off disk, across
the network and into memory for processing by the DBMS - all to perform one step. The partial results
then have to be moved ("materialized") back to disk in a temporary storage location prior to bringing
in another huge bundle of data for the next step. These massive flows of data overwhelm shared and
limited resources including disks, I/O buses (LAN, SAN, gigabit Ethernet, Fibre Channel, etc.) and
especially backplane interconnects.
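
The cost of materializing partial results can be estimated with a rough model. The sketch below is illustrative only: the `Step` type, the function name and every size in `plan` are hypothetical, and it assumes each partial result crosses the interconnect twice (once when written to temporary storage, once when read back by the next step).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    table_gb: float   # fresh table data read off disk for this step
    result_gb: float  # size of the partial result this step produces


def interconnect_traffic_gb(steps: List[Step]) -> float:
    """Total GB moved between storage and memory for a multi-step query."""
    total = 0.0
    for index, step in enumerate(steps):
        total += step.table_gb
        is_last = index == len(steps) - 1
        if not is_last:
            # The partial result is written to temporary storage and then
            # read back by the following step: two trips per intermediate.
            total += 2 * step.result_gb
    return total


# Hypothetical three-step plan; the sizes are made up for illustration.
plan = [Step(1000.0, 200.0), Step(500.0, 50.0), Step(100.0, 1.0)]
traffic = interconnect_traffic_gb(plan)
print(f"moved {traffic:.0f} GB to produce a {plan[-1].result_gb:.0f} GB answer")
```

In this made-up plan, 2,100 GB cross the interconnect to deliver a 1 GB answer, which is the kind of imbalance the paragraph above describes.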
All large-scale legacy data warehouse systems used today (including those from IBM, Oracle, Teradata,
and SQL Server) are burdened with these fundamental limitations to performance. Accordingly, as
traditional data warehouses grow, they become "victims of their own success" - with more users, data
and queries, response slows to unacceptable levels and user frustration becomes inevitable. To
squeeze out more performance, DBMS systems typically employ complex schemas, indices,
aggregates and advanced partitions to attempt to limit the amount of data required for movement and
analysis within the architecture.

Configuring general-purpose processors, storage and database software for a purpose for which they
were not designed involves compromises and optimization challenges. How long are users willing to wait
for a query to complete? How much granularity can be sacrificed through aggregation or averaging in
order to return results within a reasonable time? How does the IT department's high-availability strategy
affect query response? Managing these conflicting requirements can consume a great deal of database
administration time, yet reaching an acceptable compromise is often impossible.
Data warehouse systems built with general-purpose components usually require careful modeling of
the database schema to accommodate performance constraints. The result is that system users don't
get the level of information they need. Forcing users into aggregated data is one example, improving
response time but at the expense of data depth on which analysis is based. This compromise approach
has actually influenced "best practices" in data warehousing, which call for a multi-level structure,
with an enterprise warehouse of fine-grained data feeding smaller, summarized schema and data
marts. Another common method involves extensive use of indexing or cubes to reduce the working set
of data. However, this requires ongoing tuning to keep indices current as data evolves and grows.
An enterprise data warehouse will use some form of multiprocessing architecture, as there is just too
much information for a single processor and system backplane to handle. The two main forms,
Symmetrical Multiprocessing (SMP) and Massively Parallel Processing (MPP), were originally developed
as competing architectures in the eighties and nineties, and have served as models for generations of
high-performance computing systems ever since.
SMP systems consist of several processors, each with its own memory cache. The processors
constitute a pool of computation resources, on which threads of code are automatically distributed by
the operating system for execution. Load is balanced across the processors, so one doesn't sit idle
while another is overloaded. Resources such as memory and the I/O system are shared by and are
equally accessible to each of the processors. The single coherent bank of memory is useful for
efficiently sharing data among tasks. The strength of SMP lies in its processing power; however, the
architecture is limited in its ability to move large amounts of data as required in data warehousing and
business intelligence applications.
In general, MPP systems consist of very large numbers of processors that are loosely coupled. Each
processor has its own memory, backplane and storage, and runs its own operating system. The
no-shared-resources approach of pure MPP systems allows nearly linear scalability, to the extent that the
software can take advantage of it and is parallelizable. High availability is another advantage - when
one node fails, another can take over (again, if accommodated by the software architecture).
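
The caveat "to the extent that the software ... is parallelizable" can be quantified with Amdahl's law, a standard model that this paper does not itself cite. The sketch below is a hedged illustration: the function name and the 95% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Speedup predicted by Amdahl's law for a workload on `nodes` processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


# Assumed workload: 95% of the work can be spread across nodes.
for nodes in (10, 100, 1000):
    print(f"{nodes:>4} nodes: speedup {amdahl_speedup(0.95, nodes):.1f}x")
```

Even a small serial portion caps the benefit: with 5% of the work serialized, the speedup can never exceed 20x no matter how many nodes are added.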
Pure MPP systems are rare in practice due to the costs of additional memory and I/O components, as
well as the administrative challenges in setting up and managing many semi-independent systems.
Typical MPP systems are implemented virtually in clusters of SMPs, often with some sharing of I/O
resources. The intent is to preserve some of the performance and scalability advantages of MPP while
reducing costs and administration time.
Traditional large data warehouse systems are typically based on one of the following variations of the
SMP and MPP forms. Both methods were developed to deliver high performance for general-purpose
computing, but have major drawbacks when used to process queries requiring massive data movement.

A small SMP system consisting of a few processors is not capable of handling large-scale query
processing. However, larger SMP systems with additional processors and shared memory are available
that deliver much higher computing power, and large-scale SMP systems are frequently deployed for
data warehousing.
As shown in the diagram, dozens of processors are sharing memory and storage. When data volumes
are huge and growing quickly, systems based on SMP architectures tend to outgrow their memory,
backplane and I/O resources. As processors take turns accessing massive amounts of data in memory,
the memory bus becomes a bottleneck that results in poor performance.
Because memory bus bandwidth is limited, increasing the number of processors and RAM to handle
the workload becomes futile as the bus becomes saturated. I/O bus bandwidth is also limited, and can
become congested as the amount of data sent from the storage area network to process a query
increases. Hence, a traditional disadvantage of SMP systems is less-than-linear scalability and a
progressive decline in performance as the system grows.
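
A simple saturation model shows why adding processors stops helping once the shared bus is full. This is a sketch under stated assumptions: the per-CPU scan rate, the bus bandwidth and the function name are hypothetical, and real systems degrade more gradually than a hard `min` suggests.

```python
def smp_scan_rate(cpus: int, per_cpu_gb_s: float, bus_gb_s: float) -> float:
    """Aggregate scan rate when every CPU pulls data over one shared bus."""
    return min(cpus * per_cpu_gb_s, bus_gb_s)


PER_CPU_GB_S = 0.5  # assumed rate one processor can consume data
BUS_GB_S = 4.0      # assumed shared memory/I/O bus bandwidth

for cpus in (4, 8, 16, 32):
    rate = smp_scan_rate(cpus, PER_CPU_GB_S, BUS_GB_S)
    efficiency = rate / (cpus * PER_CPU_GB_S)
    print(f"{cpus:>2} CPUs: {rate:.1f} GB/s, {efficiency:.0%} of linear scaling")
```

With these numbers the bus saturates at 8 processors; going to 16 or 32 leaves throughput flat while scaling efficiency falls to 50% and then 25%.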


The example in the diagram consists of small SMP clusters operating in parallel while sharing a
storage area network and management structure. Each CPU within an SMP node shares RAM with its
neighbors, as well as access to the storage network over a shared I/O bus. A number of vendors offer
database solutions using this approach, including Teradata and the IBM DB2 Integrated Cluster
Environment (ICE).
The resource-sharing built into this approach imposes a bottleneck that limits performance and
scalability. As the diagram shows, SMP nodes are contending for access to storage over a common I/O
bus. This architecture is intended for traditional database applications - not for scenarios where
enormous amounts of data are pushed through a shared pipeline. Even a purely parallel architecture
with separate I/O paths is not designed to handle terabytes of data flowing from storage to an SMP
cluster for processing.
This section examines three architectures used today by well-known data warehouse solutions. All
three are based on a hybrid combination of MPP on SMP clusters, but vary in their approach to sharing
data and storage resources between MPP nodes. The solutions also vary in their degree of integration,
with storage and in some cases servers provided by third parties. All three share inherent limitations
in performance, resulting in heavy dependence on complex and costly administration and tuning.

In a "shared-nothing" architecture, processor-disk pairs operating in parallel divide the workload to
execute queries over large sets of data. Teradata follows this approach. Each processor communicates
with its associated disk drive to get raw data and perform calculations. One processor is assigned to collect
the intermediate results and assemble the query response for delivery back to the requesting application.
With no contention for resources between MPP nodes, this architecture does allow for scalability to
tera-scale database sizes. A major weakness of this architecture, however, is that it requires significant
movement of data from disks to processors for BI queries. While the processor-disk pairs operate
independently, they typically share a proprietary common interconnect which becomes clogged with
traffic, adversely affecting response time. A typical scenario involves moving a 64K block of data, of
which only about 1K is required to respond to the SQL statement. The balance is overhead - unrelated
data, project columns, join columns and other extraneous material that is wrapped around the relevant
data and has to be filtered out by the processor.
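
The block-overhead scenario reduces to simple arithmetic. The sketch below reads the block size in the scenario as 64 KB with about 1 KB of useful data; the function and constant names are illustrative.

```python
BLOCK_KB = 64   # block moved across the interconnect per read
USEFUL_KB = 1   # portion actually needed to answer the SQL statement


def useful_and_overhead(block_kb: float, useful_kb: float) -> tuple:
    """Return (useful fraction, overhead fraction) of each block moved."""
    useful = useful_kb / block_kb
    return useful, 1.0 - useful


useful, overhead = useful_and_overhead(BLOCK_KB, USEFUL_KB)
print(f"useful: {useful:.1%}, overhead: {overhead:.1%}")
```

On these figures roughly 98% of the bytes crossing the shared interconnect are discarded by the processor after they arrive.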
This high overhead has consequences because of a classical problem that occurs when MPP
architectures are used for large-scale query processing: the internal backplanes, buses and I/O
connections between processor and storage cannot handle the amount of traffic. The inability of data
transfer speeds to keep pace with growing data volumes creates a performance bottleneck that inhibits
scalability. Teradata has admitted publicly that this trend of diminishing returns makes it difficult to take
advantage of abundant storage capacity while maintaining acceptable performance levels.¹
Performance limitations are compounded when systems are based on legacy components. For example,
in developing its data warehousing offering, Teradata opted to use its own generic servers, with storage
provided by third parties. In order to squeeze performance out of this older system architecture, the
solution relies heavily on system tuning through complex choices of primary and secondary indexes and
table denormalization as well as space allocation. However, the relationship between complex indexing
and query speed is difficult to optimize, and highly dependent on the data and query. As a result, indices
are often misconfigured, increasing query response time instead of decreasing it. (As an indication of
the complexity and expected time commitment, Teradata devotes a four-day training course and nearly
800 pages of documentation specifically to indexing and tuning of the system.)


In this approach, processors operating in parallel share the same storage media, which is partitioned
so that processors don't contend for the same data. Systems sharing common storage are less costly
than a shared-nothing architecture, where each processor has its own dedicated storage device.
However, performance suffers from the classical disadvantage of MPP architectures: system resources
are overwhelmed as query data is transferred from disk to the processors. In addition, the common
storage leads to scalability issues as data volume grows.
IBM has developed a data warehousing solution based on this design using its DB2 database
management system. This is a hardware-independent solution, consisting of a core database
supported by a variety of server and storage options. The independence comes at a cost, with the
range of products supported creating a complex mix of choices when designing a solution. For
example, storage options include SANs or direct attached storage, requiring customers to understand
the I/O dependency of their warehouse on completely different types of storage architectures.
A mature RDBMS such as DB2 comes with elaborate tools to manage the database and its network of
storage and processors. These tools are used extensively to tune and optimize the disparate system
elements for acceptable query performance. Heavy administrative workload can be expected, with
complexity growing as SMP cluster nodes are added.
Another way to compensate for performance constraints is to use indexing to limit the amount of data
examined in a query. While DB2 provides a number of indexing strategies for warehouse applications,
the combination of partitioning and indexing across distributed nodes makes configuration and loading
considerably more complex. The range of tuning and management challenges often proves too
daunting for most customers, requiring assistance from a high-priced professional services firm and
dramatically altering the cost-benefit equation of the system.


In this design, multiple processors operating in parallel access shared data residing on a common
storage system. A lock manager is used to prevent simultaneous access to the same data by multiple
query processes. Access to shared data is coordinated via messaging between processes and the lock
manager. Oracle has built a solution based on this approach using its 9i or 10g Real Application Clusters
(RAC) relational database management system.
In theory, the shared data architecture means that DBAs do not have to worry about partitioning
strategies that may affect query performance. However, like the two previous designs, this approach
requires the transfer of massive amounts of data from storage to processors. The problem is
exacerbated by the shared data architecture, where contention issues further limit performance and
scalability. For example, the locking and caching mechanisms used by Oracle to prevent processor
contention effectively put a ceiling on the data scalability of its RAC.
Oracle themselves recommend that users deploy partitions and indices to help improve query
performance, thus eliminating the simplicity that the shared data architecture was meant to achieve.
These indexing schemes and their interactions with partitioning greatly increase set-up and
maintenance complexity.
As with DB2, the Oracle 9i and 10g RAC solutions are hardware and operating system-independent,
capable of running on a variety of HP-UX, AIX, Linux and Windows servers. The number of hardware
platforms and OS choices can result in multi-week installation times, requiring assembly, testing,
debugging and fine-tuning of system parameters.
Blade servers provide a new level of high-density computing, packing an enormous amount of
computing power into a compact frame. Each blade has its own collection of processors, memory and
I/O capability - in short, "a server on a card." Dozens of blades are installed in a single chassis sharing
storage, network and power resources. The result is an integrated, consolidated infrastructure for
high-performance computing, with a common management framework providing control as a single
virtual system.

A number of vendors offer data warehouse solutions based on blade technology. In a typical scenario,
each blade functions as an SMP cluster of processors and shared RAM, within a matrix of blades
operating in parallel. This amounts to a tightly consolidated version of the MPP on clustered SMP
architecture described earlier.
However, the blade architecture contains elements that work to its disadvantage for the specialized
requirements of BI query processing. In the example shown in the diagram, each blade communicates
over the system midplane to a storage area network - a route shared with all the other blades in the
rack. Accordingly, blades suffer from the same problem as traditional architectures configured for a
data warehouse - massive amounts of data have to be delivered from storage to processors over a
common I/O pathway. With these traffic volumes, bottlenecks occur as individual blades contend for
access to shared resources.
It is also critical that software used in data warehousing be written to exploit the benefits of the
hardware architecture on which it operates. In the case of blade servers, simply taking legacy
OLTP-optimized software and running it on a blade processing architecture will generally not result in added
performance. In fact, the case could be made that because blade servers consolidate processing into
a single shelf or rack with shared backplanes, I/O channels and the like, deploying legacy software
may exacerbate the bottleneck issues seen previously.
In short, today's first generation blade servers typically provide a general-purpose computing platform
with better form-factor and cost profiles than legacy SMP and clustered-SMP/MPP implementations, but
still suffer from the same inefficiencies and complexity of traditional data warehouse solutions. To truly
harness the processing power of blade technology, blade architectures must evolve to become
optimized for specific applications. An example in a different industry segment is the "Google Search
Appliance," a custom-designed blade server developed to enable ultra-high speed content searches for
the enterprise. Similarly, the next section will discuss how the innovative intelligent storage node
architecture developed by Netezza enables dramatic improvements in BI performance and cost of
ownership.

In developing its Netezza Performance Server (NPS) system, Netezza took a fresh look at the challenges
of tera-scale data warehousing and created an architecture that eliminates the barriers to performance
of traditional systems. The NPS system is a data warehouse appliance - a fully integrated device built for
a single purpose: to enable real-time business intelligence and analytics on terabytes of data.
The NPS systems combine server, storage and database in a single scalable platform based on open
standards and commodity components. The architecture, rather than expensive, proprietary
components, provides the dramatic performance advantage - ten to one hundred times faster than
other data warehousing systems. The NPS system leverages commodity components throughout,
delivering a huge cost advantage - half the cost of competitive systems.
The simplicity of the Netezza approach also eliminates the high operating costs of general-purpose
systems adapted for data warehousing. Its "load and go" implementation process takes hours, not
weeks; and there's no need for intensive database administration and system management.


The architecture of the NPS appliance is built upon two guiding principles:
• Performance and scalability goals can be met using elements of both SMP and MPP, applying each
method where it is best suited to meet the specific needs of BI applications operating on terabytes of data.
Netezza has named this architectural approach Asymmetric Massively Parallel Processing™ (AMPP™).
• Moving processing intelligence to a record stream adjacent to storage produces much better
performance and scalability than the traditional approach of moving sets of records to a processor.
This Netezza innovation is called Intelligent Query Streaming™ technology.

Putting these two principles into practice yields tremendous real-time performance and
scalability at a fraction of the cost of other systems on the market.
Netezza's AMPP architecture is a two-tiered system designed to handle very large queries from
multiple users. The first tier is a high-performance Linux SMP host. (A second host is available for
fully redundant, dual-host configurations.) The host compiles queries received from BI applications, and
generates query execution plans. It then divides a query into a sequence of sub-tasks, or snippets, that
can be executed in parallel, and distributes the snippets to the second tier for execution. The host
returns the final results to the requesting application.
The second tier consists of dozens to hundreds or thousands of Snippet Processing Units (SPUs)
operating in parallel. Each SPU is an intelligent query processing and storage node, and consists of a
powerful commodity processor, dedicated memory, a disk drive and a field-programmable disk
controller with hard-wired logic to manage data flows and process queries at the disk level. The
massively parallel, shared-nothing SPU blades provide the performance advantage of MPP.
Nearly all query processing is done at the SPU level, with each SPU operating on its portion of the
database. All operations that lend themselves easily to parallel processing (sometimes referred to as
"embarrassingly parallel") including record operations, parsing, filtering, projecting, interlocking and
logging, are performed by the SPU nodes, significantly reducing the amount of data required to be
moved within the system. Operations on sets of intermediate results, such as sorts, joins and
aggregates, are executed primarily on the SPUs, but can also be done on the host, depending on the
processing cost of that operation.
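
The two-tier flow described above (the host distributes snippets, each SPU works on its own slice, the host assembles the result) resembles a scatter/gather pattern. The sketch below is an analogy in ordinary Python, not Netezza's implementation: `run_snippet`, `host_execute` and the sample rows are hypothetical, and threads stand in for independent SPUs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[str, int]  # (region, amount)


def run_snippet(partition: List[Row], min_amount: int) -> Tuple[int, int]:
    """Hypothetical snippet: filter one SPU's slice and partially aggregate it."""
    selected = [amount for (_region, amount) in partition if amount >= min_amount]
    return sum(selected), len(selected)


def host_execute(partitions: List[List[Row]], min_amount: int) -> Tuple[int, int]:
    """Hypothetical host: scatter the snippet, then combine the small partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: run_snippet(p, min_amount), partitions))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count


# Each inner list stands for the rows stored on one SPU's disk.
partitions = [
    [("east", 10), ("west", 250)],
    [("east", 300), ("north", 5)],
    [("south", 120), ("west", 80)],
]
print(host_execute(partitions, 100))  # (670, 3)
```

Only one small `(sum, count)` pair per partition travels back to the host, which mirrors the point that most of the data never has to leave the node where it is stored.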
The real power of the Netezza solution lies in the strength of its software to optimize performance and
throughput. While the SPUs respond to requests from the host, they are highly autonomous, performing
their own scheduling, storage management, transaction management, concurrency control and
replication. This significant degree of autonomy reduces the coordination requirements on the host. It also
relieves DBAs from low-level but time-consuming maintenance tasks.

A second key approach in the NPS system architecture is Intelligent Query Streaming technology,
which greatly reduces the data traffic among SPU nodes, and between SPU nodes and the SMP host.
The design streamlines the flow of information by placing silicon processors right next to the storage
device. Rather than moving data into memory or across the network for processing, the technology
intelligently filters records as they stream off the disk, delivering only the relevant information for each
query. By performing this first level processing right at the disk, Netezza is at least ten times faster
than conventional systems, with disk access speed providing the only limiting factor.
Intelligent Query Streaming is performed on each SPU by a Field-Programmable Gate Array (FPGA) chip
that functions as the disk controller, and is also capable of basic processing as data is read off the
disk. The system is able to run critical database query functions such as parsing, filtering and
projecting at full disk reading speed, while maintaining full ACID (Atomicity, Consistency, Isolation, and
Durability) transactional operations of the database. Data flows from disk to memory in a single
laminar stream, rather than as a series of disjointed steps that require materializing partial results.
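
The idea of filtering and projecting records as they stream off the disk can be illustrated in software with generators. This is only a software analogy for what the paper says happens in FPGA hardware; the function names, record layout and predicate are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def stream_off_disk(blocks: Iterable[List[Record]]) -> Iterator[Record]:
    """Yield records one at a time, as if they were streaming off a disk."""
    for block in blocks:
        yield from block


def filter_and_project(
    records: Iterable[Record],
    predicate: Callable[[Record], bool],
    columns: List[str],
) -> Iterator[Record]:
    """Drop non-matching records and unneeded columns while streaming."""
    for record in records:
        if predicate(record):
            yield {column: record[column] for column in columns}


# Hypothetical on-disk blocks; "notes" stands for columns the query ignores.
blocks = [
    [{"id": 1, "amount": 40, "notes": "x" * 50}, {"id": 2, "amount": 900, "notes": "y" * 50}],
    [{"id": 3, "amount": 700, "notes": "z" * 50}],
]
relevant = list(
    filter_and_project(stream_off_disk(blocks), lambda r: r["amount"] > 500, ["id", "amount"])
)
print(relevant)  # [{'id': 2, 'amount': 900}, {'id': 3, 'amount': 700}]
```

Because both stages are generators, no intermediate list of all records is ever built, which is the software counterpart of avoiding materialized partial results.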
With the Netezza approach, the pathways used by traditional architectures to deliver data to the host
are streamlined and shortened. Because the SQL is "understood" by the disk drive in a Netezza
system, there is far less reliance on CPUs, data modeling or bandwidth for performance.
The storage iutercouuect, a oottleueck ou traditioual s]stems, is elimiuated o] direct attached
storage - data streams off the 8PU disk aud straight iuto the FP0A for iuitial quer] filteriug.
lutermediate quer] tasks are performed iu parallel ou the 8PUs, where streamiug processiug sharpl]
reduces 0PU workload.
The gigaoit Etheruet uetwork couuectiug 8PUs to the host aud each other is used oul] for
trausmittiug iutermediate results, rather thau massive amouuts of raw data. hetwork traffic is
reduced o] approximatel] two orders of maguitude.
The l/0 ous aud memor] ous ou the host computer are used oul] for assemoliug fiual results,
elimiuatiug previous cougestiou.
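
The filtering and projecting described above can be pictured with a small software analogy. The Python sketch below is illustrative only: in the NPS system this work is done in FPGA hardware, and the names `stream_filter_project` and `read_from_disk`, along with the sample call records, are hypothetical and not part of any Netezza interface. It shows records being examined one at a time as they arrive, with only the qualifying rows and requested columns passed downstream, so no partial result set is materialized.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence

Record = Dict[str, Any]


def stream_filter_project(
    records: Iterable[Record],
    predicate: Callable[[Record], bool],
    columns: Sequence[str],
) -> Iterator[Record]:
    """Yield only the requested columns of records that satisfy the predicate.

    Records are handled one at a time as they arrive, so nothing is
    buffered or materialized between the source and the consumer.
    """
    for record in records:
        if predicate(record):
            yield {name: record[name] for name in columns}


def read_from_disk() -> Iterator[Record]:
    """Hypothetical stand-in for records streaming off a disk."""
    yield {"caller": "A", "minutes": 12, "region": "east"}
    yield {"caller": "B", "minutes": 3, "region": "west"}
    yield {"caller": "C", "minutes": 45, "region": "east"}


# Keep only long calls in the east region, and only two of the three columns.
relevant = list(
    stream_filter_project(
        read_from_disk(),
        predicate=lambda r: r["region"] == "east" and r["minutes"] > 10,
        columns=("caller", "minutes"),
    )
)
print(relevant)
```

Because the function is a generator, the consumer pulls rows through the filter as they are read, which mirrors the single-stream flow described in the text rather than a sequence of separate load, filter and copy steps.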

The Netezza Performance Server appliance offers several fundamental advantages over traditional
data warehouse architectures.

Netezza's AMPP architecture applies elements of SMP and MPP to deliver high
performance for enterprise-scale BI applications. Most processing is handled by the massively
parallel snippet processing units, as early in the data flow as possible. This approach of "bringing
the query to the data" eliminates extraneous traffic and resulting delays.

By contrast, SMP and MPP architectures developed for general-purpose systems
(including blade servers) are based on moving data from storage to the processors ("bringing the
data to the query"). When performing BI queries of massive databases, the flood of data creates
bottlenecks that result in slow (and often unacceptable) response times.
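
The contrast between "bringing the query to the data" and "bringing the data to the query" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Netezza's implementation: the row layout, the three data slices, and the function names `spu_partial_totals` and `host_merge` are all hypothetical. Each simulated processing unit aggregates its own slice locally, and the host combines only the small partial results.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (customer, amount): a hypothetical schema


def spu_partial_totals(local_rows: List[Row]) -> Counter:
    """Aggregate one unit's local slice; only this summary leaves the unit."""
    totals: Counter = Counter()
    for customer, amount in local_rows:
        totals[customer] += amount
    return totals


def host_merge(partials: List[Counter]) -> Dict[str, int]:
    """Combine the per-unit summaries into the final answer on the host."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


# Three hypothetical slices, one per simulated processing unit.
spu_slices = [
    [("alice", 10), ("bob", 5), ("alice", 7)],
    [("bob", 20), ("carol", 3)],
    [("alice", 1), ("carol", 9), ("carol", 2)],
]

partials = [spu_partial_totals(rows) for rows in spu_slices]
totals = host_merge(partials)

rows_scanned = sum(len(rows) for rows in spu_slices)
rows_sent_to_host = sum(len(partial) for partial in partials)
print(totals, rows_scanned, rows_sent_to_host)
```

In this toy data set the saving is small, but the number of rows sent to the host depends on the number of distinct groups per slice rather than on the number of rows scanned, which is the property the architecture relies on.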

Netezza's Intelligent Query Streaming technology filters out unnecessary
information as data streams off the disk, greatly reducing the processing burden downstream. There
are no storage interconnects in the traditional sense: the disk controller handling the initial
processing is hard-wired to the disk drive. System performance is limited only by disk speed (the
NPS system runs at "physics speed").

In traditional systems, storage system interconnections simply function as a conduit to deliver data
from storage to its associated processor. System performance is limited by the capacity of the I/O bus.

Server, storage and DBMS are integrated in a compact, efficient unit designed
specifically for data warehousing. The system installs in hours, not weeks, and deploys quickly with
no need for indexing, tuning, physical modeling or other time-consuming tasks. There is one vendor
to manage, and none of the unnecessary components, awkward cabling or conflicting parameters
that traditionally cause problems with patchwork solutions.

By contrast, patchwork solutions based on general-purpose products mean a myriad of
headaches, including multiple vendors to manage, lengthy and difficult implementations, complex
tuning, lower reliability, higher power requirements and extra floor space.

By "bringing the query to the data," the NPS appliance delivers at least an order of magnitude
performance improvement for BI applications analyzing terabytes of data. Traditional delays are
eliminated: analyses that previously took hours now take just seconds. Even "queries from hell" to
uncover deeply buried patterns are handled with ease.

Because it is a purpose-built appliance, the NPS system has a purchase price significantly lower than
that of competing systems. Cost savings are even more attractive over the long term. While the care and
feeding of traditional systems often requires several highly paid DBAs or system administrators, an
NPS system supporting tens of terabytes is usually managed by a part-time administrator. Instead of
partitioning table spaces, designing indices and performing all the other optimization tasks previously
required, DBAs can devote their time to developing business-critical analyses that help their
companies succeed.


While I/O bottlenecks are commonplace as general-purpose systems scale to accommodate complex
queries, additional arrays of snippet processing units can be added to the NPS system without
impacting performance. This is because query processing using the NPS architecture involves a
minute fraction of the data traffic associated with traditional systems, and because storage and
processing are tightly coupled into a single unit. The autonomy of the SPUs creates further conditions
for a highly scalable system, allowing SPUs to be added without worrying about coordination with
other units. As a result, growing data volumes can be planned for and accommodated without the
sudden, unexpected need for costly purchases.

General-purpose architectures developed for online transaction processing were not designed for
detailed analysis of terabytes of data. Users of traditional systems continue to pay the price in poor
performance, limited scalability and complex administration. Ultimately, the highest price comes from
limited and delayed business intelligence: off-target forecasts, lost revenues, missed opportunities.
By providing an architecture built for the specific challenges of tera-scale analytics, the Netezza
Performance Server appliance delivers the performance, value and ease of use that business users
demand and expect. For the growing number of Netezza customers, the benefits are dramatic.
For a wireless carrier, accelerating the analysis of 120 days of CDR records from six hours to less
than 80 minutes, it means capturing millions of dollars through improved billing and more profitable
network utilization.
For a healthcare provider, reducing query time of its two-billion-row patient database from five hours
to just over a minute, it means the ability to identify the most effective treatments from a cost/benefit
perspective for hospitals and patients.
For an online retailer, reducing query time against 5.4 billion rows from 50 hours to 21 minutes,
it means more effective analysis of web site visits in order to adjust promotions.
For a leading grocery retailer, reducing the query time of a complex market basket analysis report
from over three days with many manual processes to a one-step process that completes in less than
four hours, it means empowering business users to understand customer purchasing behavior for
improved operational efficiency and larger average purchases.
Companies across many different industries benefit from the tremendous speed of Netezza's data
warehouse appliance. The exceptional performance of the NPS system is matched only by its
remarkable simplicity and ease of use. For users accustomed to the performance constraints and
administrative burden of general-purpose systems, there's no going back.
1 "With capacity growing more quickly than disk bandwidth, the bandwidth per GB of storage capacity has actually decreased by 50.
Given this trend, it is challenging to take advantage of abundant storage capacity while maintaining required performance levels." - Ron
Yellin, Director of Storage Product Management, Teradata ( , Vol. 4, No. 1, 2004).
Netezza 2006. All rights reserved. All other company, brand and product names contained herein may be trademarks or registered trademarks of their respective holders. COMPWP.6.05
Netezza Corporation . 200 Crossing Boulevard . Framingham, MA . 01702-4480
+1 508 665 6800 tel . +1 508 665 6811 fax . www.netezza.com
