
Methods of Incremental Loading in Data Warehouse

Written by DWBI Concepts Team

Last Updated: 18 June 2014

Incremental loading, a.k.a. delta loading, is a widely used method to load data into data warehouses from the respective source systems. This technique is employed to perform faster loads in less time while using fewer system resources. In this tutorial we will understand the basic methods of incremental loading.

What is Incremental Loading and why is it required

In almost all data warehousing scenarios, we extract data from one or more source systems and keep storing it in the data warehouse for future analysis. The source systems are generally OLTP systems which store everyday transactional data. Now when it comes to loading this transactional data into the data warehouse, we have 2 ways to accomplish this: Full Load or Incremental Load.

To understand these two loads better, consider a simple scenario. Let's say my source system is an RDBMS (that is, a database) and I have 2 tables, Customer and Sales.

In the customer table I have details of all my customers in this format:

CustomerID  CustomerName  Type        Entry Date
1           John          Individual  22-Mar-2012
2           Ryan          Individual  22-Mar-2012
3           Bakers'       Corporate   23-Mar-2012

In the sales table, I have the details of products sold to customers. This is how the sales table looks:

ID  CustomerID  ProductDescription  Qty  Revenue  Sales Date
1   1           White sheet (A4)    100  4.00     22-Mar-2012
2   1           James Clip (Box)    1    2.50     22-Mar-2012
3   2           Whiteboard Marker   1    2.00     22-Mar-2012
4   3           Letter Envelop      200  75.00    23-Mar-2012
5   1           Paper Clip          12   4.00     23-Mar-2012

As you can see, the above tables store data for 2 consecutive days, 22 Mar and 23 Mar. On 22 Mar, I had only 2 customers (John and Ryan), who made 3 transactions in the sales table. The next day, I got one more customer (Bakers') and recorded 2 transactions: one from Bakers' and one from my old customer John.

Also imagine we have a data warehouse which is loaded every night with the data from this system.

FULL LOAD METHOD FOR LOADING DATA WAREHOUSE

In case we opt for the full load method, we will read the 2 source tables (Customer and Sales) in full every day. So,

On 22 Mar 2012: We will read 2 records from Customer and 3 records from Sales and load all of them into the target.

On 23 Mar 2012: We will read 3 records from Customer (including the 2 older records) and 5 records from Sales (including 3 old records) and will load or update them in the target data warehouse.

As you can clearly guess, this method of loading unnecessarily reads old records that we need not read, as we have already processed them before. Hence we need to implement a smarter way of loading.
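To make the cost of a full load concrete, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for the warehouse. The target table name dw_customer, the ISO text dates and the INSERT OR REPLACE upsert are assumptions made for illustration only; the tutorial itself does not prescribe a target schema.

```python
import sqlite3

# Source rows from the tutorial's Customer table (dates as ISO text, an assumption).
CUSTOMERS = [
    (1, "John", "Individual", "2012-03-22"),
    (2, "Ryan", "Individual", "2012-03-22"),
    (3, "Bakers'", "Corporate", "2012-03-23"),
]

def full_load(source_rows, target):
    """Read every source row and insert-or-replace it in the target table."""
    target.executemany(
        "INSERT OR REPLACE INTO dw_customer VALUES (?, ?, ?, ?)", source_rows
    )
    return len(source_rows)  # number of rows read from the source

# Hypothetical warehouse table; an in-memory SQLite database stands in for it.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, "
    "customer_name TEXT, type TEXT, entry_date TEXT)"
)

# Night of 22 Mar: the source holds only the rows entered so far.
rows_day1 = full_load([r for r in CUSTOMERS if r[3] <= "2012-03-22"], target)
# Night of 23 Mar: the whole table is read again, old rows included.
rows_day2 = full_load(CUSTOMERS, target)

print(rows_day1, rows_day2)  # 2 3
```

The second night reads 3 rows even though only 1 of them is new, and that gap keeps growing as the source table grows.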

INCREMENTAL LOAD METHOD FOR LOADING DATA WAREHOUSE

In case of incremental loading, we will only read those records that have not already been read and loaded into our target system (data warehouse). That is, on 22 March we will read 2 records from customer and 3 records from sales; on 23 March, however, we will read only 1 record from customer and 2 records from sales.

But how do we ensure that we "only" read those records that are not "already" read? How do we know which records have already been read and which have not?

This is a tricky question but the answer is, fortunately, easy!

We can make use of the "entry date" field in the customer table and the "sales date" field in the sales table to keep track of this. After each load we will "store" the date until which the loading has been performed in some data warehouse table, and the next day we only extract those records that have a date greater than our stored date. Let's create a new table to store this date. We will call this table "Batch".

Batch

Batch_ID  Loaded_Until  Status
1         22-Mar-2012   Success
2         23-Mar-2012   Success

Once we have done this, all we have to do to perform incremental or delta loading is to write our data extraction SQL queries in this format:

Customer Table Extraction SQL

-- Extract only customers entered after the last successful load
SELECT t.*
FROM Customer t
WHERE t.entry_date > (
    SELECT NVL(MAX(b.loaded_until), TO_DATE('01-01-1900', 'MM-DD-YYYY'))
    FROM batch b
    WHERE b.status = 'Success'
);

Sales Table Extraction SQL

-- Extract only sales recorded after the last successful load
SELECT t.*
FROM Sales t
WHERE t.sales_date > (
    SELECT NVL(MAX(b.loaded_until), TO_DATE('01-01-1900', 'MM-DD-YYYY'))
    FROM batch b
    WHERE b.status = 'Success'
);
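The extraction query can be tried out without an Oracle instance. The sketch below is only an approximation: it uses Python's built-in sqlite3 module, stores dates as ISO text (so that string comparison orders them correctly), and swaps Oracle's NVL and TO_DATE for SQLite's COALESCE and a plain '1900-01-01' literal. The column names are assumptions based on the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, customer_name TEXT,
                       type TEXT, entry_date TEXT);
CREATE TABLE batch (batch_id INTEGER PRIMARY KEY,
                    loaded_until TEXT, status TEXT);
INSERT INTO customer VALUES
  (1, 'John',     'Individual', '2012-03-22'),
  (2, 'Ryan',     'Individual', '2012-03-22'),
  (3, 'Bakers''', 'Corporate',  '2012-03-23');
""")

# SQLite rendering of the customer extraction query (COALESCE replaces NVL).
EXTRACT_CUSTOMERS = """
SELECT t.customer_id
FROM customer t
WHERE t.entry_date > (
    SELECT COALESCE(MAX(b.loaded_until), '1900-01-01')
    FROM batch b
    WHERE b.status = 'Success'
)
ORDER BY t.customer_id
"""

def extract_customer_ids(conn):
    """Return the IDs of customers not yet covered by a successful batch."""
    return [row[0] for row in conn.execute(EXTRACT_CUSTOMERS)]

before = extract_customer_ids(conn)  # batch table is still empty
conn.execute("INSERT INTO batch VALUES (1, '2012-03-22', 'Success')")
after = extract_customer_ids(conn)   # one successful batch recorded

print(before, after)  # [1, 2, 3] [3]
```

With an empty batch table the query falls back to the 1900 default and returns every customer; once the 22 Mar batch is recorded, only Bakers' qualifies.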

Okay, now at this point you may wonder what happens on the first day (22 Mar). There won't be any record in our batch table since we have not loaded any batch yet, so "SELECT max(b.loaded_until)" will return NULL. That is why we have put an NVL() function in the query to replace the NULL with a very old historical date, 01 Jan 1900 in this case.

So on the first day, we are asking the select query to extract all the data having an entry date (or sales date) greater than 01-Jan-1900. This will essentially extract everything from the table. Once the 22 Mar loading is complete, we will make one entry in the batch table (entry 1) to mark the successful extraction of records.

Second Day (23 Mar):

The next day, the query "SELECT max(b.loaded_until)" will return 22-Mar-2012. So in effect, the above queries will reduce to this:

Customer Table Extraction SQL

SELECT t.* FROM Customer t WHERE t.entry_date > '22-Mar-2012';

Sales Table Extraction SQL

SELECT t.* FROM Sales t WHERE t.sales_date > '22-Mar-2012';

As you can understand, this will ensure that only 23-Mar records are extracted from the tables, thereby performing a successful incremental load. After this load completes successfully, we will make one more entry in the batch table (entry number 2).
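The two-day walkthrough above can be sketched end to end as a small nightly job. This is one possible arrangement and not the authors' implementation: the function name nightly_load, the target table dw_sales, the ISO text dates and the use of SQLite (with COALESCE in place of NVL) are all assumptions for illustration.

```python
import sqlite3

# Sales rows from the tutorial, grouped by the day they arrive in the source.
SALES_BY_DAY = {
    "2012-03-22": [(1, 1, "White sheet (A4)", 100, 4.00),
                   (2, 1, "James Clip (Box)", 1, 2.50),
                   (3, 2, "Whiteboard Marker", 1, 2.00)],
    "2012-03-23": [(4, 3, "Letter Envelop", 200, 75.00),
                   (5, 1, "Paper Clip", 12, 4.00)],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, customer_id INTEGER, product_description TEXT,
                    qty INTEGER, revenue REAL, sales_date TEXT);
CREATE TABLE dw_sales (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       product_description TEXT, qty INTEGER, revenue REAL,
                       sales_date TEXT);
CREATE TABLE batch (batch_id INTEGER PRIMARY KEY AUTOINCREMENT,
                    loaded_until TEXT, status TEXT);
""")

def nightly_load(conn, load_date):
    """Extract rows newer than the stored date, load them, record the batch."""
    new_rows = conn.execute("""
        SELECT t.* FROM sales t
        WHERE t.sales_date > (
            SELECT COALESCE(MAX(b.loaded_until), '1900-01-01')
            FROM batch b WHERE b.status = 'Success')
    """).fetchall()
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?, ?, ?, ?)", new_rows)
    # Mark the load as successful only after the target insert has gone through.
    conn.execute(
        "INSERT INTO batch (loaded_until, status) VALUES (?, 'Success')",
        (load_date,),
    )
    conn.commit()
    return len(new_rows)

rows_read = []
for day, rows in sorted(SALES_BY_DAY.items()):
    # Simulate the day's transactions landing in the source system.
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
                     [r + (day,) for r in rows])
    rows_read.append(nightly_load(conn, day))

print(rows_read)  # [3, 2]
```

The first night reads all 3 sales rows and the second night reads only the 2 new ones, matching the counts given earlier for incremental loading.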

Why is MAX() used in the above query?

When we load the 23 Mar data, there is only one entry in the batch table (that of the 22nd). But when we go on to load the 24th data, or any data after that, there will be multiple entries in the batch table. We must take the max of these entries.

Why is a status field created in the batch table?

This is because it might so happen that the 23rd load has failed. So when we start loading again on the 24th, we must take into consideration both the 23rd data and the 24th data.

Batch_ID  Loaded_Until  Status
1         22-Mar-2012   Success
2         23-Mar-2012   Fail
3         24-Mar-2012   Success

In the above case, the 23rd batch load was a failure. That is why the next day we have selected all the data after 22-Mar (including 23rd and 24th Mar).
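The effect of the status filter can be checked with a short sketch. As before, this uses Python's sqlite3 with ISO text dates and COALESCE as stand-ins for the Oracle setup. The helper names and the sales row with ID 6 dated 24 Mar are hypothetical, added only so that there is something to load on the 24th.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batch (batch_id INTEGER PRIMARY KEY,
                    loaded_until TEXT, status TEXT);
INSERT INTO batch VALUES (1, '2012-03-22', 'Success'),
                         (2, '2012-03-23', 'Fail');
CREATE TABLE sales (id INTEGER, sales_date TEXT);
-- Row 6 is a made-up 24 Mar transaction, not part of the tutorial data.
INSERT INTO sales VALUES (3, '2012-03-22'), (4, '2012-03-23'), (6, '2012-03-24');
""")

def last_successful_load(conn):
    """Latest loaded_until among successful batches, or a very old default."""
    return conn.execute(
        "SELECT COALESCE(MAX(loaded_until), '1900-01-01') "
        "FROM batch WHERE status = 'Success'"
    ).fetchone()[0]

def pending_sales_ids(conn):
    """IDs of sales rows dated after the last successful load."""
    rows = conn.execute(
        "SELECT id FROM sales WHERE sales_date > ? ORDER BY id",
        (last_successful_load(conn),),
    )
    return [r[0] for r in rows]

watermark = last_successful_load(conn)  # the failed 23 Mar batch is ignored
retry_ids = pending_sales_ids(conn)     # rows from both 23 Mar and 24 Mar

# Once the 24 Mar load succeeds, nothing is left to pick up.
conn.execute("INSERT INTO batch VALUES (3, '2012-03-24', 'Success')")
remaining_ids = pending_sales_ids(conn)

print(watermark, retry_ids, remaining_ids)  # 2012-03-22 [4, 6] []
```

Because the failed batch is filtered out before MAX() is applied, the stored date stays at 22 Mar and the 24 Mar run picks up the missed 23 Mar rows along with the new ones.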


Except where otherwise noted, contents of DWBI.ORG by Intellip LLP (http://intellip.com) are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
