
The Shortcut Guide To


Untangling the Differences Between High Availability and Disaster Recovery


Richard Siddaway

The Shortcut Guide to Untangling the Differences Between High Availability and Disaster Recovery

Introduction to Realtime Publishers
by Don Jones, Series Editor

For several years now, Realtime has produced dozens and dozens of high-quality books that just happen to be delivered in electronic format at no cost to you, the reader. We've made this unique publishing model work through the generous support and cooperation of our sponsors, who agree to bear each book's production expenses for the benefit of our readers.

Although we've always offered our publications to you for free, don't think for a moment that quality is anything less than our top priority. My job is to make sure that our books are as good as, and in most cases better than, any printed book that would cost you $40 or more. Our electronic publishing model offers several advantages over printed books: You receive chapters literally as fast as our authors produce them (hence the "realtime" aspect of our model), and we can update chapters to reflect the latest changes in technology.

I want to point out that our books are by no means paid advertisements or white papers. We're an independent publishing company, and an important aspect of my job is to make sure that our authors are free to voice their expertise and opinions without reservation or restriction. We maintain complete editorial control of our publications, and I'm proud that we've produced so many quality books over the past years.

I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if you've received this publication from a friend or colleague. We have a wide variety of additional books on a range of topics, and you're sure to find something that's of interest to you, and it won't cost you a thing. We hope you'll continue to come to Realtime for your educational needs far into the future.

Until then, enjoy.

Don Jones


Introduction to Realtime Publishers
Chapter 1: What Is High Availability?
  What Is High Availability?
    What It Isn't
    Defining Availability
    Capacity
    Service vs. Data
    Do You Still Need to Back Up
  Business Requirements
    Ask, Ask, and Ask Again
    Business Processes
    Peak Periods
    Critical Periods
  The Cost of High Availability
    The Cost of Doing Nothing
    Does Your Business Depend on IT?
    Degrees of High Availability
    Pay for What You Get
  Delivering High Availability
    Consider Whole Infrastructure
    Native Solutions
    Data-Based Solutions
    Application-Level High Availability
    Monitoring
  Summary: Making High Availability Work for You


Copyright Statement
© 2010 Realtime Publishers. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtime Publishers (the "Materials") and this site and any such Materials are protected by international copyright and trademark laws. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtime Publishers or its web site sponsors. In no event shall Realtime Publishers or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials. The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, noncommercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice. The Materials may contain trademarks, services marks and logos that are the property of third parties. You are not permitted to use these trademarks, services marks or logos without prior written consent of such third parties. Realtime Publishers and the Realtime Publishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners.
If you have any questions about these terms, or if you would like information about licensing materials from Realtime Publishers, please contact us via e-mail at info@realtimepublishers.com.


Chapter 1: What Is High Availability?
High availability and disaster recovery are two topics that are often tangled in thought and action. This guide will untangle the differences as well as explain the similarities and where the two areas converge. The guide starts with this chapter explaining high availability: what it is and what needs to be considered when implementing a highly available infrastructure.

Chapter 2 introduces disaster recovery, explaining the concept and comparing it with high availability; planning, implementation, and testing are discussed. The chapter closes with a look at how the technologies enabling high availability and disaster recovery are producing a convergence in how the two are implemented.

Chapter 3 returns to high availability and examines how you can configure your environment to be highly available. The chapter examines the reasons systems become unavailable and looks at traditional high availability solutions such as clustering. We explore high availability from applications such as Microsoft Exchange Server and Microsoft SQL Server and discuss how virtualization brings its own availability challenges and solutions to the mix.

High availability is not created by technology alone. We also need to consider the people and the process they operate, which we explore in Chapter 4. This chapter considers the causes of downtime and how you can eliminate the largest cause of unplanned downtime; it also discusses the impact of people and processes on high availability. A consideration of how you can apply the techniques and concepts of high availability into the disaster recovery arena closes the chapter.

The first area we need to look at is high availability.

What Is High Availability?
High availability is often defined as systems that are designed, and implemented, to provide a designated level of operational availability during a given period. Availability is often defined as the ability of the users to access, and utilize, their systems.


This definition often results in a narrow view that doesn't cover the whole spectrum you need to consider. The first paragraph in this chapter closed on the phrase "highly available infrastructure." This phrase gets to the heart of your requirements. A highly available infrastructure includes, among others:
• Server and storage hardware
• Data
• Network
• Maintenance
• Monitoring
• Configurations

We'll see how these areas relate as we proceed through the chapter.

What It Isn't
High availability is many things to many people, but there are a number of things that it isn't. The first thing to remember is that high availability is not a silver bullet. Creating a cluster and using it for your email or database system will not solve your availability problems if they are due to network failures or poor administrative procedures.

The other important point to remember is that there isn't a single answer that will solve your availability problems. Assume for a moment that you have an application for which you need to ensure availability. There are a number of components that all have to be available:
• One or more servers
• Databases
• Applications
• Storage
• Network links

Each of these components will require its own availability solution. The individual availability solutions will have to be integrated to ensure the whole system is configured for high availability.


High availability is not a technology solution. We will come back to this point later, but an environment can only be considered as having high availability when three things are in place:
• The correct technologies
• The correct people in terms of knowledge, skills, and mindset
• The correct processes to ensure that administrative actions contribute to high availability rather than negate it

These three items form the high availability framework that is required in every situation. We will also see that this framework carries across into our disaster recovery discussion.

Defining Availability
Availability can be defined in a number of ways and by a number of people. A common measurement is to use the Mean Time Between Failures (MTBF) and the Mean Time to Repair (MTTR) to define availability.
\[
\text{Availability} = \frac{\text{Mean Time Between Failures}}{\text{Mean Time Between Failures} + \text{Mean Time To Repair}} \times 100
\]
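
As a minimal sketch of the formula above, the following Python function turns a Mean Time Between Failures and a Mean Time to Repair into an availability percentage. The function name and the sample figures (10,000 hours between failures, 8 hours to repair) are illustrative assumptions, not measurements from any real component.

```python
def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability (%) = MTBF / (MTBF + MTTR) * 100."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("times must not be negative")
    if mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# Hypothetical disk drive: fails every 10,000 hours on average,
# and takes 8 hours to replace and rebuild.
print(f"{availability_percent(10_000, 8):.3f}% available")
```

Both inputs must use the same unit (hours here); the result is only as good as the failure and repair figures fed into it.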

When dealing with individual components such as a disk drive, this measure is acceptable, but the formula can be difficult to calculate when there are a number of items forming a chain of availability. All the links in the chain must be available for the service to be available. As a subjective measure, consider this chain:
1. In the data center, the server is up and running, the service hosting the application has loaded, and the storage systems are online.
2. The network administrators report that the network links between the data center and the user's location are up and have sufficient bandwidth.
3. The user's PC has started and loaded the client application.

Everything seems to be working but the user cannot access the application. As far as the user is concerned, the application has failed. That failure could be anywhere in the chain; for example, it could occur:
• Between the links and areas of responsibility outlined earlier
• At the firewall on the server because of a misconfiguration
• At the user's PC because of an error in the way it is set up or even that the network cable has been knocked out

The user just wants to access the application. She doesn't care that the server is up or the network links are not reporting a problem. As far as she is concerned, her application is down. If one user reports a problem, you might be OK, but if a lot of users cannot access the application, the downtime causes a significant problem for the organization.
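
One way to put a number on a chain of availability is to multiply the availability of each link. The sketch below assumes that the links fail independently of each other and that the user needs every link at the same time; the link names and figures are hypothetical.

```python
from math import prod


def chain_availability(link_availabilities):
    """Availability of a chain in which every link must be up.

    Each value is a fraction between 0 and 1. Assumes independent failures.
    """
    links = list(link_availabilities)
    if not links or any(not 0 <= a <= 1 for a in links):
        raise ValueError("need at least one availability between 0 and 1")
    return prod(links)


# Hypothetical figures for the three links described in the text.
links = {"data center": 0.999, "network": 0.999, "user PC": 0.995}
overall = chain_availability(links.values())
print(f"End-to-end availability: {overall * 100:.2f}%")
```

Under these assumptions the chain can never be more available than its weakest link, which is why the user's experience is often worse than any single component report suggests.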


In remote locations and branch offices, the problem becomes worse. There may not be anyone to help resolve the local parts of the problem.

Cloud computing is a rising trend in the supply of IT services. The application is provided by a third party hosted from their data center. In theory, all the user needs is an Internet connection. The number of links in the chain grows very rapidly in this scenario with the users, the provider, and one or more ISPs involved.

The final judge on availability is the user. If she cannot use her application to perform her working tasks, then it is not available. It does not matter how many statistics can be generated proving that this part or that part of the chain is available if the user cannot access and use the application.

Service Level Agreements (SLAs) are used to define an agreed level of service. That can be equated to availability when thinking about applications. SLAs are often talked about, but are they worth the paper they are written on? Unfortunately, as with so much in IT, the only possible answer is "it depends." It depends on a number of factors:
• Is availability being reported from the user's perspective?
• What penalties are in place for breaching the SLA, and are the users prepared to invoke them?
• Do the users trust the reports?

In these cases, a better measurement for availability is based on the service availability.

\[
\text{Availability} = \frac{\text{Agreed Service Time} - \text{Down Time}}{\text{Agreed Service Time}} \times 100
\]
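
The service-based measurement can be sketched in the same way. The function below is an illustration only: its name, the 200-hour agreed service time, and the 2 hours of downtime are assumptions chosen to make the arithmetic easy to follow.

```python
def service_availability_percent(agreed_service_hours: float,
                                 downtime_hours: float) -> float:
    """Availability (%) = (agreed service time - downtime) / agreed service time * 100."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


# Hypothetical month: the service is agreed to run for 200 hours
# and users could not work for 2 of them.
print(f"{service_availability_percent(200, 2):.1f}% available")
```

Only downtime that falls inside the agreed service time counts here, so the figure reflects what the users were promised rather than what each component reported.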

These issues are compounded by the fact that downtime is not just caused by failure.

Capacity
Every computer system has a finite capacity. The hardware capacity can be measured by:
• CPU cycles
• Network bandwidth
• Memory
• Disk space

This relationship is illustrated in Figure 1.1. If any of these components reaches capacity (that is, it is fully consumed), the system can fail. Even approaching capacity can put enough strain on the system that it appears to be unavailable because it has become unresponsive.


Figure 1.1: Hardware capacity constraints.

The correct design of the system at the outset should alleviate these issues. Design sufficient capacity into the system to manage the foreseeable capacity requirements. Specify the correct number and size of components. Proper planning at this stage can prevent major problems in the future.

Capacity issues are not just design issues. You can design and implement the perfect system, but over time, its capacity will degrade. The various components must be monitored to ensure that a capacity issue does not make your system unavailable. Trend analysis can identify when you will reach capacity for a given component. This data gives you time to arrange for additional capacity or other remedial work to ensure your system remains available.

When we talk about availability, it is usually in terms of the application. However, we need to consider the application, or service, and the data to obtain a complete picture.
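
The trend analysis mentioned in this section can be as simple as fitting a straight line to monitored usage and seeing where it crosses the capacity limit. The sketch below assumes roughly linear growth, which is a simplification; the function name and the weekly disk figures are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]],
                    capacity: float) -> Optional[float]:
    """Estimate days from the last sample until usage reaches capacity.

    samples holds (day, amount_used) pairs. A least-squares line is fitted
    to them. Returns None when usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover at least two different days")
    slope = sum((day - mean_day) * (used - mean_used)
                for day, used in samples) / spread
    if slope <= 0:
        return None  # no growth, so the fitted line never reaches capacity
    intercept = mean_used - slope * mean_day
    day_full = (capacity - intercept) / slope
    last_day = max(day for day, _ in samples)
    return max(0.0, day_full - last_day)


# Hypothetical weekly readings of disk space used (GB) on a 1000 GB volume.
weekly_disk_usage_gb = [(0, 500), (7, 535), (14, 570), (21, 605)]
remaining = days_until_full(weekly_disk_usage_gb, capacity=1000)
print(f"Estimated days until the disk is full: {remaining:.0f}")
```

The same calculation applies to any of the capacity measures listed earlier, provided the monitoring system keeps a history of readings.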

Service vs. Data
Many discussions about availability concentrate on the availability of the service. Failover clustering is a technology designed to protect the services running on the cluster. It does nothing to protect the data.

Data is important, as it is the data that drives the business processes. Consider a system to process purchase orders. The data will include:
• Customers: Who ordered something
• Orders: What the customers ordered
• Delivery instructions: Where and when the items are to be delivered
• Invoices: How much the customer has to pay and whether they have paid


If this data is lost or corrupted, the financial impact on the organization could be huge. The regulatory requirements surrounding data, especially personal data, are becoming more and more onerous with stiff financial penalties for data loss or even incorrect data.

Ask the business what they want you to protect and the immediate answer will be "everything." You can apply technologies to protect the service and you can apply technologies to protect the data. You should protect both. The ideal high availability solution would protect both, be simple to set up and administer, and be cost effective. In many cases, you have to use a mixture of techniques. One proven technology that you need to consider is the venerable backup.

Do You Still Need to Back Up
I will answer this question and then explain why. The answer is simple and consists of one word. YES.

Backups will be a fact of the administrator's life for a long time to come. High availability does not negate the need for a backup. Consider Active Directory (AD). It could be argued that because you have multiple domain controllers in your environment that all contain a copy of the data, you don't need to back up. That can be taken as correct for day-to-day operations but, and it is a huge but, what are you going to do if your AD domain is wrecked?

Suppose a corruption of the database on one domain controller replicates to every domain controller in the domain. Your AD has gone and so has access to every other aspect of your Windows environment.

The only way you can recover from this situation is from a backup. A slightly less traumatic occurrence is if a significant number of objects are deleted and you need to perform an authoritative restore. You need a backup to do that.

Single Domain Controller

A question that comes up on various forums with distressing regularity takes the form "I have one domain controller in my production domain and don't do backups. The machine has failed. What can I do?" The only real advice that can be given is to start applying for a new job.

In this case, the technology has not been used properly; the people did not have the skills and knowledge to understand the technology, and the correct processes were not applied. A business-critical technology that can be configured in a way that supplies high availability was allowed to fail because of a complete breakdown of all three parts of the high availability framework.

Backup is the last resort for recovery. In some cases, it may be the only way a system can be recovered. It may not be financially viable to have standby systems for all the servers in your organization. In that case, recovery from backup is the only option.


The other purpose for backup is to provide a long-term archiving solution. In many business sectors, there are regulatory requirements regarding data retention. Backups may be one method of satisfying these requirements. I know of organizations that have to keep data for as long as 100 years. Backup tape is not going to solve that issue, but it should still be viable for the 3-to-7-year time span.

Backups will be with us for a long time and have a recognized place in high availability and disaster recovery strategies. Those strategies must be based on solid business requirements. The next task is to look at how you can discover those requirements.

Business Requirements
A prime reason for the failure of IT-related projects is that the requirements were not captured correctly. This usually comes back to the business requirements; that is, what is the business trying to achieve? High availability and disaster recovery projects are expensive propositions where it is imperative that the requirements are correctly understood.

Ask, Ask, and Ask Again
The only way to get this correct is to ask and keep asking until the full set of requirements has been captured. This will involve talking to multiple levels of the organization. The requirements at various managerial levels may well be different from those expressed by people actually using the systems. They all need to be captured and analyzed.

Think carefully about the answers you receive. They will all reflect the actual need but may not capture the whole requirement. Extrapolate failure scenarios. If you have developers, talk to them about use cases that describe how an application will be used. Apply the concept to failures to determine all the ways the system can fail, then work out how to stop it happening or how to recover from the failure.

When there is a failure of a critical system that hasn't been adequately protected, there is often a knee-jerk reaction that "something must be done." If at all possible, avoid rushing into a solution. Follow the correct thought process:
• Analyze
• Design and plan
• Implement

Throughout this process, ensure that you
• Define the problem
• Refine the answer
• Double-check you are solving the right problem

Above all else, ensure that the business processes are protected.


Business Processes
There are a number of drivers that will affect all organizations. The main two that affect our discussion of high availability and disaster recovery are financial and legal.

All organizations have financial drivers. A commercial organization will be interested in boosting revenues and reducing costs so that profits are maximized. Non-profit-making organizations will also be looking to reduce costs so that their donations can be used to maximum benefit. Changes in the financial environment force the introduction or change of business processes.

The regulatory requirements organizations must satisfy form a constantly evolving landscape. These changes can impact business processes to drive change in the applications and infrastructure an organization requires.

The requirements of the business to satisfy their business drivers lead us to the hierarchy that Figure 1.2 shows. At the top is the business process. This is what the business does: how it generates revenue or makes efficiencies to cut costs. In many cases, this will require an application. It may be created by the organization or purchased from a software vendor. The application will usually have a requirement to store data. The store may be a database or an unstructured store such as the file system.

The application and data both require infrastructure for support. High availability is concerned with protecting the application, data, and infrastructure. Business continuity, of which disaster recovery is a part, is concerned with protecting the whole hierarchy.

Figure 1.2: Relationship between business process and infrastructure.


Peak Periods
Every organization has peak periods; for instance, retail organizations have peaks at Christmas and financial services organizations have peaks at the end of the financial year. The applications supporting the business processes suffer an extra level of load at these times. This load can overpower the applications, causing downtime, or the applications become so unresponsive that the users assume they are down. In either case, the users do not have access to their applications.

There are reports of organizations suffering downtime during peak periods every year. Why should this happen? Why is it allowed to happen? Peak periods are predictable. They are part of the business cycle and need to be treated as such. Even an event that falls outside the normal business cycle, such as a special offer or the release of new software for download, can be predicted to cause extra demand. The planning for these events should include extra computing capacity to handle the load. Notice I do not say expected load. In many cases, the load is predicted but the predictions prove to be wrong.

Virtualization can help in this situation. Need to bring another set of Web servers online? It is a quick and easy proposition to create and start up more virtual machines. The load on the Web farm should be monitored, and when it reaches a predefined level, one or more extra machines are brought online to spread the load.

Plan to prevent downtime during peak periods. A similar situation occurs with critical periods.

Critical Periods
Critical periods are where the organization has to perform a task with a tightly constrained time frame. I spent a number of years working in a financial services company, and every year we had to produce a report for the regulatory authorities proving our ability to meet our liabilities incurred by the sale of financial products. If this report was not delivered by the specified time, we could be stopped from selling further products. That is about as critical as it gets.

These reporting deadlines may be legislative or imposed by a parent company. In either case, the deadlines must be met. The business processes required to meet these deadlines must remain available during these periods.

The cost of not having high availability can be quite fearsome, but what is the cost of providing that availability?

The Cost of High Availability
Providing and running IT systems creates associated costs. The more sophisticated the systems, the higher those costs rise. High availability adds sophistication and therefore cost to the system, although the costs have been dropping in recent years as the technology has advanced.


The Cost of Doing Nothing
One option is doing nothing. You can assume that failures will never happen to you and that if you have a failure, the system can be brought back online very quickly. This assumes that the people, processes, and technology are in place to bring a system back online.

Is that assumption correct? Relying on this approach can, and will, fail for several reasons:
• The right people are unavailable. What are you going to do if the only person that really understands the system is on vacation and not reachable?
• The processes are unavailable. Have you tried to restore this system in a test environment? Is the recovery process documented?
• The technology is unavailable. Have you tested your backups? This is not a good time to discover you cannot read the tapes. Do you have a spare server or capacity on your virtualization hosts?

How much will it cost your organization while that system is unavailable? That cost could come from a number of sources:
• Lost orders
• Paying staff to do nothing
• Revenue collection delays

That is just the short-term financial cost. What about the long-term impact to your business reputation because you could not deliver as agreed?

Challenge these assumptions. The cost of failure is high and ultimately the price may be the collapse of the organization.
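
A rough estimate of the short-term financial cost can be built from the three sources listed earlier. The sketch below is deliberately simple: the function name, the way each source is modeled (a flat figure for collection delays, for instance), and every number in the example are hypothetical. It does not attempt to price the long-term damage to reputation.

```python
def downtime_cost(hours_down: float,
                  lost_orders_per_hour: float,
                  idle_staff: int,
                  staff_cost_per_hour: float,
                  collection_delay_cost: float = 0.0) -> dict:
    """Break an outage's short-term cost down by source and add a total."""
    inputs = (hours_down, lost_orders_per_hour, idle_staff,
              staff_cost_per_hour, collection_delay_cost)
    if min(inputs) < 0:
        raise ValueError("inputs must not be negative")
    breakdown = {
        "lost_orders": hours_down * lost_orders_per_hour,
        "idle_staff": hours_down * idle_staff * staff_cost_per_hour,
        "collection_delays": collection_delay_cost,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical 4-hour outage of an order processing system.
cost = downtime_cost(hours_down=4, lost_orders_per_hour=2500,
                     idle_staff=40, staff_cost_per_hour=30,
                     collection_delay_cost=1000)
for source, amount in cost.items():
    print(f"{source:>18}: {amount:>10,.2f}")
```

Even a crude figure like this gives the business something concrete to weigh against the cost of protecting the system.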

Does Your Business Depend on IT?
There are businesses that obviously depend on IT. Internet retailers are an obvious example. However, most businesses rely on IT to a much greater extent than they realize.

Do you use email at work? Research in Europe has shown that more than 60% of person-to-person communication is performed by email. This is both within the organization and outside the organization to customers, suppliers, regulators, and so on.

What happens if your email system fails? It does not have to be the whole system, just the mailboxes for those people who need to communicate externally. If you cannot reach a company, do you assume that they have gone out of business or just don't care about doing business with you? A prolonged outage leads to the situation that Figure 1.3 shows.


Figure 1.3: Impact of communication failure.

Every organization has data. It makes the organization work. Figure 1.2 showed the relationship between data, applications, and infrastructure. If you lose the data, you lose the application and the business process. This can lead to a loss of revenue, which is not a healthy position for the enterprise. In some cases, that revenue may not be recoverable. If the record of an invoice is lost, that revenue may never be received.

A Web presence is a must for most organizations. Some organizations derive all of their business from the Internet. If their Web presence fails, competitors are just a click away. Brand loyalty may be a disappearing concept, but the converse is alive and well: "I'm not going to use XXX again because their Web site is never available."

The Web presence does not have to provide commercial services. If an information-only site is not available, it affects your reputation. A reputation for unavailable systems is a definite way to lose customers.

An organization as a whole may not depend on certain systems but those systems can still be critical. In some cases, these systems may be critical to people's lives:
• Healthcare has specific needs: Patient monitoring systems need to be constantly available, downtime on X-ray storage and processing systems can affect patient care, access to patient records can be required on a 24×7 basis
• Manufacturing organizations have critical systems: Chemical plant monitoring and control, print system control, and automated manufacturing systems

In addition, you can consider systems as diverse as
• Stock exchange trading systems
• Air traffic control
• Traffic management
• Automated railway systems
• Door access and other security systems

In all these examples, you need your applications and data to be available. The cost of them not being available can scale from the purely financial to the loss of human life. Organizations need systems to be available, but do they need all their systems all the time?


Degrees of High Availability
Talk to the business, and they will tell you that all their applications are business critical and have to be available 24×7. In many cases, this is simply not true. Many, if not most, organizations have a number of applications for which they require high availability, but it is rare that an organization requires everything to fall into this category.

There are several actions you can perform to boost the availability of your systems even if you are not supplying full high availability. Starting with the hardware you use for your servers, you can increase the availability by ensuring you build resiliency into the server by specifying:
• Fault-tolerant memory
• Redundant power supplies and fans
• Disk resiliency by using RAID or a SAN
• Multiple disk controllers
• Multiple network cards configured for fault tolerance

You can ensure that operating systems (OSs) are installed and configured to best practice. Using an automated build system ensures a repeatable and consistent configuration. Applications should be installed and configured correctly.

Basic good practice will deliver a system that will perform well and provide a reasonable level of availability. How can you measure that availability? One measure that is often used is the percentage of time available as Table 1.1 shows. This can also be expressed as a number of nines (for example, four nines means a system is 99.99 percent available).

% Availability    Downtime per year
90.0              36.5 days
95.0              18.25 days
99.0              3.65 days
99.9              8.76 hours
99.95             4.38 hours
99.99             52.6 minutes
99.999            5.26 minutes

Table 1.1: Percentage of time available.

The one thing that really leaps out of this table is how little downtime is allowed at the higher end of availability. Five and a quarter minutes is hardly enough time to reboot a server!
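
The downtime figures in Table 1.1 follow directly from the availability percentage. As a small sketch, assuming a 365-day year (which is what the table's figures imply) and a function name chosen for illustration:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year allowed by a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


for target in (90.0, 95.0, 99.0, 99.9, 99.95, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    if hours >= 24:
        allowed = f"{hours / 24:.2f} days"
    elif hours >= 1:
        allowed = f"{hours:.2f} hours"
    else:
        allowed = f"{hours * 60:.2f} minutes"
    print(f"{target:>7}%  {allowed}")
```

Each extra nine cuts the allowance by a factor of ten, which is why the cost of availability climbs so steeply at the top of the table.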


These figures should apply to unplanned downtime. A well-written SLA should include maintenance windows; otherwise, maintenance gets deferred and the system will become unstable. If you have an availability target of 99.99%, that allows 52.6 minutes of downtime per year. Can you perform the monthly patching cycle in that time? Not if you have to reboot once a month. Applying service packs is an even bigger outage, especially on applications such as SQL Server or Exchange. Arrange the downtime to ensure that the system is maintained. Make sure that planned downtime is communicated so that the user population is aware of what is happening and when.
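
The patching question can be checked with simple arithmetic. In the sketch below, the 10 minutes allowed for each monthly reboot is an assumption for illustration, as are the function and variable names; real patching outages are often longer.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def fits_downtime_budget(availability_percent: float,
                         outages_per_year: int,
                         minutes_per_outage: float):
    """Compare planned outages with the downtime an availability target allows.

    Returns (fits, budget_minutes, planned_minutes).
    """
    budget = (100 - availability_percent) / 100 * MINUTES_PER_YEAR
    planned = outages_per_year * minutes_per_outage
    return planned <= budget, budget, planned


# Hypothetical: one 10-minute reboot per month against a 99.99% target.
fits, budget, planned = fits_downtime_budget(99.99, 12, 10)
print(f"Budget {budget:.1f} min, planned {planned:.0f} min, fits: {fits}")
```

When the planned outages do not fit, that is the argument for agreeing maintenance windows that sit outside the availability figure.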

Pay for What You Get
High availability implies quality in a number of areas. This is especially true in the quality of the equipment that is purchased. I once worked in an organization where the network equipment was so old that our network administrators were scouring computer fairs for second- (third- or fourth-) hand parts to keep it working. Did we have a high availability environment? In a word, No!

Buy from recognized vendors who are known to build good servers, network switches, or whatever it is you need. Some high availability options such as clustering in Windows Server 2003 require that hardware comes from a restricted list where the combinations of server and storage have been tested together.

Make sure that the quality aspects extend to the small parts such as cables when specifying, and buying, equipment. Spending thousands of dollars on servers and storage only to see the system fail because an attempt was made to save a few dollars on cabling doesn't make sense. One area to avoid skimping on quality is backup media. I mentioned earlier that a good backup is your ultimate recovery option. This option will not be reliable with poor-quality media that cannot be relied upon when attempting to restore.

Having considered why you need high availability, you must determine how you are actually going to deliver it.

Delivering High Availability
This topic is considered in more detail in Chapter 3. At this stage, we need an overview of the options before we turn our discussion to disaster recovery.

Consider Whole Infrastructure
High availability is not a bolt-on piece of technology. You cannot take your existing infrastructure, add a cluster to support an application, and claim that your environment provides high availability. What you have created is a clustered application. If you want true high availability, you need to consider the whole environment: network, applications, data, infrastructure, and servers.

You need to design high availability into the environment. As an example, consider the configuration that Figure 1.4 shows for a Web-based application. The Internet-based user connects through a firewall to the Web server in the DMZ. A connection through the internal firewall enables the Web server to talk to the database server.

Figure 1.4: Standard infrastructure. Network links removed for clarity.

This configuration contains a number of single points of failure: Internet links, firewalls, network routers and switches, and a database server. If you remove all the single points of failure, you end up with an infrastructure that looks like Figure 1.5.

Figure 1.5: Environment designed for high availability. Network links removed for clarity.

There is redundancy at all levels of the environment, including:
• Dual data centers
• Dual ISP links
• Redundant firewalls
• Redundant network routes and equipment
• Clusters and application-based high-availability techniques to protect the application and the data
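Hunting for single points of failure can start as a simple inventory check. The sketch below is illustrative only: the component names and instance counts are hypothetical stand-ins for the two figures, and counting instances is a crude first pass, not a substitute for a proper design review.

```python
from typing import List, Mapping


def single_points_of_failure(instances_per_component: Mapping[str, int]) -> List[str]:
    """Return, in sorted order, the components that have fewer than two instances."""
    return sorted(
        name for name, count in instances_per_component.items() if count < 2
    )


# Hypothetical inventory for a standard infrastructure: one of everything.
standard = {
    "data center": 1,
    "ISP link": 1,
    "firewall": 1,
    "network route": 1,
    "database server": 1,
}

# Hypothetical inventory after adding redundancy at every level.
redundant = {name: 2 for name in standard}

if __name__ == "__main__":
    print("Standard:", single_points_of_failure(standard))
    print("Redundant:", single_points_of_failure(redundant))
```

An empty result for the redundant inventory only says that every listed component has a partner; it says nothing about whether failover between the partners has been tested.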

Creating an environment configured in this way will cost a lot of money. It is also easier to implement in a greenfield site. However, if an application is important enough to the business, an existing infrastructure can be converted to this level of high availability. This is not the type of project to which a big-bang approach should be applied. Approach it a step at a time and test each change thoroughly.

What sort of technologies can you use to meet your high-availability needs? We will look at these areas in greater detail in Chapter 3, but a quick overview is in order before we move on to disaster recovery.

Native Solutions
The two high-availability solutions that are native to Windows are Clustering and Network Load Balancing (NLB). Clustering provides high availability only, whereas NLB also provides a method of balancing the workload across multiple machines. The obvious question is, "Why can't we use NLB all the time and forget clustering?" NLB only balances workloads that are TCP-based, so, for instance, you can balance HTTP traffic to a Web farm but you cannot balance traffic to a database on SQL Server.

Clustering involves two or more machines configured with access to shared storage. A workload such as SQL Server, Exchange, or even file and print is installed on the cluster. The workload is hosted on one node of the cluster and, in the event of a failure, it will fail over to the alternative node.

Most organizations configure two-node clusters. However, this is an expensive option, as one of the nodes is always in a passive mode waiting for the failover to occur. A better option is to use multi-node clusters with only one or two nodes available for failover. One system I worked on implemented four-node clusters for Exchange. Three nodes were active and the fourth was passive. We implemented two of those clusters. Instead of the 12 servers that a traditional set of two-node clusters would have needed, we used only eight. That's a significant cost saving in hardware, licenses, and administration.

NLB is often used for Web-based applications. It makes a number of servers look like one, as the NLB cluster has a single IP address to which traffic is addressed. The cluster then manages the distribution of workload across its members. If one server fails, the others can redistribute the work. These systems protect the service but don't necessarily protect the data.
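The server counts in the Exchange example can be checked with a little arithmetic. This is a minimal sketch under one stated assumption: each active node hosts exactly one workload, so two clusters with three active nodes each carry six workloads. The function and parameter names are invented for illustration.

```python
import math


def servers_needed(active_workloads: int, active_per_cluster: int,
                   passive_per_cluster: int = 1) -> int:
    """Servers required to host the workloads on clusters of a fixed shape.

    Assumption: one workload per active node.
    """
    if active_workloads < 0 or active_per_cluster < 1 or passive_per_cluster < 0:
        raise ValueError("counts must be non-negative, with at least one active node")
    clusters = math.ceil(active_workloads / active_per_cluster)
    return clusters * (active_per_cluster + passive_per_cluster)


if __name__ == "__main__":
    workloads = 6  # two clusters x three active nodes
    print("Two-node clusters: ", servers_needed(workloads, active_per_cluster=1))
    print("Four-node clusters:", servers_needed(workloads, active_per_cluster=3))
```

The saving comes entirely from sharing passive nodes: six passive servers shrink to two, at the price of each passive node covering three active ones.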

Data-Based Solutions
Data is essential to business processes, as we saw earlier. There is no point in ensuring that your applications are available if the underlying data is unavailable. The basic methods to protect the data involve some type of replication or mirroring. You can replicate the data between storage systems. This can be performed by:
• The storage system, usually at the disk block level
• The application, usually at the database level

Storage-based replication will produce a second copy of the data on another set of disks. The replication target is usually in another data center so that high availability and disaster recovery needs can be satisfied with a single solution. These techniques can add significant extra cost to the storage solution. If storage-based replication is used to protect databases, test the transaction consistency to ensure that the data doesn't end up being corrupted if failover occurs during a transaction.

It is possible to replicate data via the database. This involves configuring a system to copy the data at a transaction, or even database, level. The target database may not be available while replication is occurring. Failover is not automatic, but it is a simple process. Make sure the failover is practiced before it is needed.

Mirroring is similar in that there is a copy of the data on another system. The mirroring process can be configured to write to both copies of the data simultaneously, though this may affect the time taken to complete a transaction. Failover between the systems can be automatic, with the change being transparent to client applications.

Some applications supply their own level of high availability.

Application-Level High Availability
Some applications, such as AD and DNS, have an inherent level of high availability due to their distributed nature. Installing at least two domain controllers at each location protects the authentication, authorization, and name resolution services. If one of the on-site domain controllers should fail, the other will absorb the load. Even if both fail, the services will still be available because of the way AD will direct users to other domain controllers. Data is automatically protected because it is replicated between domain controllers.

The ultimate in protecting the application and data is to totally synchronize two systems. This technique involves two computers, together with their applications and data, being synchronized so that they are kept in lockstep. The machines are presented to the clients as a single system and all changes occur on both. In the event of one system failing, the other still serves the applications to the clients. It is possible for the two systems to be in separate data centers to supply a disaster recovery solution as well as high availability.

Monitoring
We have quickly surveyed a number of techniques for supplying high availability for your systems. Whichever techniques are applied, it is not possible to implement the solution and forget about it. Systems have to be monitored. It is essential that faults are noted and rectified. Consider a two-node cluster as an example. If one node fails, the application will fail over onto the second node. However, you have now lost your high availability. If the systems aren't monitored, you might not discover the failure until the second node fails. That is an embarrassment you can definitely live without.

The second use of monitoring is to determine capacity trends. Disks fill up, memory becomes a bottleneck, and CPUs become overloaded. All of these scenarios lead to slower response times and unhappy users, and give the system a reputation for unavailability. Monitor the trends in resource usage so that extra resources can be made available or upgrades can occur before the situation becomes critical.

Summary: Making High Availability Work for You
Creating environments with the correct high availability is hard work and can be expensive. There is a four-step process to creating and maintaining a high-availability environment:
• Identify needs. Make sure they are based on protecting business processes. The data and the service need to be protected.
• Design a solution. We have briefly covered the techniques available to you. Choose the most appropriate one to meet your organization's needs. Don't forget that virtualized servers need protecting as well.
• Implement the solution. This includes training for the people administering the system and the adoption of processes that augment the high-availability technologies.
• Monitor the systems. Early detection of failures or potential capacity issues means they can be dealt with before the system becomes unavailable.

High availability is not a luxury in the modern business world. It is a business imperative and the time to start is now.

Having gained an insight into high availability, it is now time to look at disaster recovery. The next chapter will concentrate on untangling the differences between high availability and disaster recovery, and will spend time considering where the two activities are converging. In some cases, your high-availability solution also supplies a disaster recovery capability that it would be foolish to ignore.
