Data Guard - Architecture

5/8/2015 DataGuardArchitecture
Architecture
Data is one, if not the main, asset of a company, so protection of data is vital to a DBA; this includes backing up and restoring in a timely manner. I am not going to lie, cost does play a big part in deciding if you can implement a particular solution. Oracle Data Guard may be overkill for most companies, as having an up-to-the-minute solution is not going to come cheap; many smaller companies I have worked for were quite happy having a simple copy of the redo logs on a standalone machine, then scripting a solution to apply those logs. Maybe not the greatest solution, but one that works. This does have its drawbacks, as it is a bespoke solution and generally not easy to manage or hand over to other DBAs. Oracle's Data Guard is a well-tested and supported solution; it does not come cheap, but it is practically the same across all companies and one that all DBAs should be able to grasp very quickly when starting at a new company.
In this series I cover the architecture, installation and configuration of Oracle Data Guard 11g. I will also cover performance tuning, how to set up a solution that guarantees zero data loss, and how to make use of standby databases that would otherwise sit idle, thus getting the most out of your systems. I will also cover the new Enterprise Manager and how it integrates into Data Guard. More and more administration applications are using GUI interfaces and there is no getting away from it. My approach is to learn both: a GUI will always run the command-line option in the background, and a good understanding of the command line will help you track down and resolve difficult problems.
My own personal advice is to have lots of practice at doing something. The more practice, the better you will become at doing it, and it will become second nature. In today's virtual world there are many virtual solutions (VMware, Hyper-V, etc.) with which you can set up and practise whatever you want, and some of these virtual solutions are even free (VMware ESXi). All you need is a spare server or PC (see my virtual solution) and some additional RAM and disk space. I can set up a solution similar to one that I have at a company's site, test and learn how that solution works, and get a feel for what I need to do if I have to fail over. I can practise and document the procedures before it happens. In some cases I have been with a company and never had to fail over, but I am always glad that I did practise and had the documentation ready to hand.
OK, that's enough of me rambling on; let's make a start on learning Oracle Data Guard.
Data Guard Overview
Data Guard is basically "ship redo, then apply redo"; as you know, redo is the information needed to recover a database transaction. A production database, referred to as the primary database, transmits redo to one or more independent replicas referred to as standby databases. Standby databases are in a continuous state of recovery, validating and applying redo to maintain synchronization with the primary database. A standby database will also automatically resynchronize if it becomes temporarily disconnected from the primary due to power outages, network problems, etc.
The diagram below shows an overview of Data Guard. Firstly, redo transport services transmit redo data from the primary to the standby as it is generated. Secondly, apply services apply the redo data and update the standby database files. Thirdly, independently of Data Guard, the database writer process updates the primary database files. Lastly, Data Guard will automatically resynchronize the standby database following power or network outages, using redo data that has been archived at the primary.
If you have not read my Oracle series regarding redo, here is a recap. A redo record (also known as a redo entry) is made up of a group of change vectors, each of which is a description of a change made to a single block in the database. Redo records contain all the information needed to reconstruct changes made to a database. During recovery the database will read the change vectors in the redo records and apply the changes to the relevant blocks.
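To make the recap concrete, here is a minimal Python sketch of the idea that a redo record is a group of change vectors, each describing a change to one block, and that recovery replays them against the relevant blocks. The `ChangeVector` and `RedoRecord` classes, the dictionary standing in for a datafile and the (offset, bytes) change format are all hypothetical simplifications. Real redo is a binary format that this does not attempt to model.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeVector:
    """Hypothetical model: a description of a change made to a single block."""
    block_id: int
    offset: int
    new_bytes: bytes


@dataclass
class RedoRecord:
    """Hypothetical model: a redo record (redo entry) is a group of change vectors."""
    scn: int
    vectors: list = field(default_factory=list)


def recover(blocks: dict, redo: list) -> dict:
    """Replay redo records in SCN order, applying each change vector to its block."""
    for record in sorted(redo, key=lambda r: r.scn):
        for cv in record.vectors:
            block = bytearray(blocks.get(cv.block_id, bytes(8)))
            block[cv.offset:cv.offset + len(cv.new_bytes)] = cv.new_bytes
            blocks[cv.block_id] = bytes(block)
    return blocks


# One change touching two blocks produces one redo record with two change vectors.
datafile = {1: bytes(8), 2: bytes(8)}
redo_stream = [
    RedoRecord(scn=101, vectors=[ChangeVector(1, 0, b"AB"), ChangeVector(2, 4, b"CD")]),
    RedoRecord(scn=102, vectors=[ChangeVector(1, 1, b"Z")]),
]
recover(datafile, redo_stream)
```

Replaying the second record after the first overwrites one byte of block 1. This is why the sketch applies records in SCN order.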
Redo records are buffered in a circular fashion in the redo log buffer of the SGA; the log writer process (LGWR) is the background process that handles redo log buffer management. At specific times the LGWR writes redo log entries into a sequential file (the online redo log file) to free space in the buffer. The LGWR writes in the following cases:
A commit record: whenever a transaction is committed, the LGWR writes the transaction's redo records from the buffer to the log file and assigns a system change number (SCN); only when this process is complete is the transaction said to be committed.
Redo log buffers: if the redo log buffer becomes a third full, or if 3 seconds have passed since the last time the LGWR wrote to the log file, all redo entries in the buffer will be written to the log file. This means that redo records can be written to the log file before the transaction has been committed, and if necessary media recovery will roll back these changes using the undo that is also part of the redo entry.
Remember that the LGWR can write to the log file using "group" commits: basically, the entire list of redo entries of waiting transactions (not yet committed) can be written to disk in one operation, thus reducing I/O. Even though the data buffer cache has not been written to disk, Oracle guarantees that no committed transaction will be lost, because the redo log has successfully saved its changes.
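The LGWR write triggers described above can be expressed as a small decision function. This Python sketch models only the rules stated here: a commit, a buffer that is one third full, or 3 seconds since the last write. It also models a group commit that flushes every buffered entry in one write. The `LogWriter` class and its fields are hypothetical, and the real LGWR has further triggers that this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class LogWriter:
    """Hypothetical model of LGWR flushing the redo log buffer to the online redo log."""
    buffer_size: int                                   # redo log buffer capacity in bytes
    buffer: list = field(default_factory=list)         # pending (txn_id, nbytes) entries
    online_redo_log: list = field(default_factory=list)
    last_write_time: float = 0.0
    writes: int = 0                                    # number of physical write operations

    def used(self) -> int:
        return sum(nbytes for _, nbytes in self.buffer)

    def should_flush(self, now: float, commit: bool) -> bool:
        return (
            commit                                     # a commit record forces a write
            or self.used() * 3 >= self.buffer_size     # buffer is one third full
            or now - self.last_write_time >= 3.0       # 3 seconds since the last write
        )

    def add_redo(self, txn_id: str, nbytes: int, now: float, commit: bool = False) -> None:
        self.buffer.append((txn_id, nbytes))
        if self.should_flush(now, commit):
            self.flush(now)

    def flush(self, now: float) -> None:
        # Group commit: every buffered entry, committed or not, goes out in one write.
        self.online_redo_log.extend(self.buffer)
        self.buffer.clear()
        self.last_write_time = now
        self.writes += 1


lgwr = LogWriter(buffer_size=3000)
lgwr.add_redo("T1", 200, now=0.5)                 # no trigger yet
lgwr.add_redo("T2", 300, now=1.0)                 # no trigger yet
lgwr.add_redo("T1", 100, now=1.2, commit=True)    # T1 commits; uncommitted T2 redo goes too
```

The commit of T1 produces a single write that also carries T2's uncommitted redo. That is the I/O saving group commits provide.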
Data Guard Redo Transport Services coordinate the transmission of redo from the primary database to the standby database. At the same time as the LGWR is processing redo, a separate Data Guard process called the Log Network Server (LNS) is reading from the redo buffer in the SGA and passing redo to Oracle Net Services for transmission to a standby database. It is possible to direct the redo data to nine standby databases. You can also use Oracle RAC, and the standby databases don't all need to be a RAC setup. The Remote File Server (RFS) process receives the redo from the LNS and writes it to a sequential file called a standby redo log file (SRL). The LNS process supports two modes: synchronous and asynchronous.
Synchronous
http://www.datadisk.co.uk/html_docs/oracle_dg/architecture.htm 1/4
Synchronous transport (SYNC) is also referred to as the "zero data loss" method because the
LGWR is not allowed to acknowledge that a commit has succeeded until the LNS can confirm
that the redo needed to recover the transaction has been written at the standby site.
In the diagram to the right, the phases of a transaction are:
1. The user commits a transaction, creating a redo record in the SGA; the LGWR reads the redo record from the log buffer, writes it to the online redo log file and waits for confirmation from the LNS
2. The LNS reads the same redo record from the buffer and transmits it to the standby database using Oracle Net Services; the RFS receives the redo at the standby database and writes it to the SRL
3. When the RFS receives a write complete from the disk, it transmits an acknowledgment back to the LNS process on the primary database, which in turn notifies the LGWR that the transmission is complete; the LGWR then sends a commit acknowledgment to the user
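The three SYNC phases can be traced with a small Python model in which a thread stands in for the standby's RFS process. The user's commit is only acknowledged after that thread has written the redo to its "SRL" and sent an acknowledgment back. The function names, the queues standing in for Oracle Net and the lists standing in for the ORL and SRL are hypothetical. Nothing here talks to a real database.

```python
import queue
import threading


def rfs(network: queue.Queue, srl: list, acks: queue.Queue) -> None:
    """Standby side (RFS stand-in): receive redo, write it to the SRL, then acknowledge."""
    redo = network.get()
    srl.append(redo)          # the SRL write completes before the ack is sent
    acks.put(redo)


def sync_commit(redo: str, orl: list, srl: list, timeout: float = 5.0) -> str:
    """One SYNC commit: the user is only acknowledged after the standby acks (hypothetical model)."""
    network, acks = queue.Queue(), queue.Queue()
    standby = threading.Thread(target=rfs, args=(network, srl, acks))
    standby.start()

    orl.append(redo)          # phase 1: LGWR writes the redo to the online redo log
    network.put(redo)         # phase 2: LNS ships the same redo to the standby
    acked = acks.get(timeout=timeout)   # LGWR blocks here; raises queue.Empty on timeout
    standby.join()
    return f"commit acknowledged for {acked}"   # phase 3: acknowledgment to the user


orl, srl = [], []
result = sync_commit("T1", orl, srl)
```

The blocking `acks.get` call is where network latency shows up as commit time on the primary.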
This setup really does depend on network performance and can have a dramatic impact
on the primary database: network latency will have a big impact on response
times. The impact can be seen in the wait event "LNS wait on SENDREQ", found in the
v$system_event dynamic performance view.
There is also a timeout value that can be adjusted in the event of a network failure; we
will discuss this in more detail in the installation section.
Asynchronous
Asynchronous transport (ASYNC) is different from SYNC in that it eliminates the
requirement that the LGWR waits for an acknowledgment from the LNS, creating "near
zero" performance impact on the primary database regardless of the distance between the primary
and the standby locations. The LGWR will continue to acknowledge commit success even
if the bandwidth prevents the redo of previous transactions from being sent to the
standby database immediately. If the LNS is unable to keep pace and the log buffer is
recycled before the redo is sent to the standby, the LNS automatically transitions to
reading and sending from the log file instead of the log buffer in the SGA. Once the LNS
has caught up, it switches back to reading directly from the buffer in the SGA.
The log buffer hit ratio is tracked via the view X$LOGBUF_READHIST; a low hit ratio indicates
that the LNS is reading from the log file instead of the log buffer. If this happens, try
increasing the log buffer size.
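The ASYNC fallback can be sketched as a circular log buffer that LGWR overwrites without waiting. The LNS reads from the buffer while the redo it needs is still there, and from the online log file once that redo has been recycled. The `AsyncTransport` class, its fields and the record-count "bandwidth" are hypothetical. The hit ratio it computes is only an analogue of what X$LOGBUF_READHIST tracks, not that view's definition.

```python
from collections import deque


class AsyncTransport:
    """Hypothetical model of LNS reading redo from the log buffer, or the ORL once recycled."""

    def __init__(self, buffer_capacity: int):
        self.log_buffer = deque(maxlen=buffer_capacity)  # circular: oldest redo is recycled
        self.orl = []            # online redo log: every redo record LGWR has written
        self.next_to_send = 0    # sequence number of the next redo record LNS must ship
        self.sent = []
        self.buffer_reads = 0
        self.file_reads = 0

    def lgwr_write(self, redo: str) -> None:
        # In ASYNC mode the LGWR never waits for the LNS or the standby.
        self.orl.append(redo)
        self.log_buffer.append((len(self.orl) - 1, redo))

    def lns_send(self, max_records: int) -> None:
        """Ship up to max_records (a stand-in for available network bandwidth)."""
        for _ in range(max_records):
            if self.next_to_send >= len(self.orl):
                return                                   # caught up with the primary
            oldest_in_buffer = self.log_buffer[0][0]
            if self.next_to_send >= oldest_in_buffer:
                redo = self.log_buffer[self.next_to_send - oldest_in_buffer][1]
                self.buffer_reads += 1
            else:
                redo = self.orl[self.next_to_send]       # buffer recycled: read the log file
                self.file_reads += 1
            self.sent.append(redo)
            self.next_to_send += 1

    def buffer_hit_ratio(self) -> float:
        total = self.buffer_reads + self.file_reads
        return self.buffer_reads / total if total else 1.0


t = AsyncTransport(buffer_capacity=4)
for i in range(3):
    t.lgwr_write(f"redo{i}")
t.lns_send(max_records=10)       # LNS keeps pace: all three come from the buffer
for i in range(3, 10):
    t.lgwr_write(f"redo{i}")     # a burst of 7 records overflows the 4-slot buffer
t.lns_send(max_records=10)       # 3 records from the log file, then back to the buffer
```

In this run the LNS reads redo3 to redo5 from the log file and switches back to the buffer for redo6 onwards. That gives a 0.7 hit ratio. A larger `buffer_capacity` would avoid the file reads entirely, mirroring the advice to increase the log buffer size.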
The drawback with ASYNC is the increased potential for data loss: if a failure destroys the
primary database before the transport lag is reduced to zero, any committed transactions
that are part of the transport lag are lost. So again, make sure that the network
bandwidth is adequate and that you get the lowest latency possible.
Oracle recently released the Advanced Compression option. This new product contains several compression features, one of which is redo transport compression for Data Guard, and it supports both SYNC and ASYNC. Like all compression tools it does have an impact on CPU resources, but it will lower network bandwidth utilization.
A log file gap occurs whenever a primary database continues to commit transactions while the LNS process has ceased transmitting redo to the standby database (for example, because of network issues). The primary database continues writing to the current log file, fills it, and then switches to a new log file; then archiving kicks in and archives the file. Before you know it, there are a number of archive and log files that need to be processed by the LNS, basically creating a large log file gap. Data Guard uses an ARCH process on the primary database to continuously ping the standby database during the outage. When the standby database eventually comes back, the ARCH process queries the standby control file (via the RFS process) to determine the last complete log file that the standby received from the primary. The ARCH process will then transmit the missing files to the standby database using additional ARCH processes. At the very next log switch the LNS will attempt, and succeed in, making a connection to the standby database and will begin transmitting the current redo while the ARCH processes resolve the gap in the background. Once the standby apply process is able to catch up to the current redo logs, the apply process automatically transitions out of reading the archived redo logs and into reading the current SRL. The whole process can be seen in the diagram below.
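The gap resolution steps can be sketched as a simple planning function. It compares the log sequences archived on the primary with the last complete sequence the standby reports. The missing ones go to background ARCH processes while the LNS resumes with the current log, and apply then works through the archived logs before moving on to the SRL. The function names, the integer log sequence numbers and the returned dictionary are hypothetical illustrations, not Data Guard internals.

```python
def resolve_gap(primary_archived: list, standby_last_received: int, current_seq: int) -> dict:
    """Plan gap resolution once the standby is reachable again (hypothetical model).

    primary_archived: log sequence numbers archived on the primary during the outage
    standby_last_received: last complete log sequence the standby received, as ARCH
        learns it by querying the standby control file via RFS
    current_seq: the log the primary writes after the next log switch
    """
    missing = sorted(seq for seq in primary_archived if seq > standby_last_received)
    return {
        "arch_ships": missing,     # additional ARCH processes fill the gap in the background
        "lns_ships": current_seq,  # the LNS reconnects and ships the current redo
    }


def apply_sequence(plan: dict) -> list:
    # Apply catches up from the archived logs first, then moves on to the current SRL.
    return [f"archived {seq}" for seq in plan["arch_ships"]] + [f"SRL {plan['lns_ships']}"]


plan = resolve_gap([100, 101, 102, 103, 104], standby_last_received=101, current_seq=105)
order = apply_sequence(plan)
```

Here the standby last received sequence 101, so 102 to 104 are shipped by ARCH while the LNS sends 105.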
Apply Services
There are two methods in which to apply redo: Redo Apply (physical standby) and SQL Apply (logical standby). They both have the following features in common:
Both keep the standby database synchronized with the primary database
Both can prevent modifications to the data
Both provide a high degree of isolation between the primary and the standby database
Both can quickly transition the standby database into the primary database
Both offer a productive use of the standby database which will have no impact on the primary database
Redo Apply (physical standby)
Redo apply is basically a block-by-block physical replica of the primary database. Redo apply uses media recovery to read records from the SRL into memory and apply change vectors directly to the standby database. Media recovery does parallel recovery for very high performance; it comprises a media recovery coordinator (MRP0) and multiple parallel apply processes (PR0?). The coordinator manages the recovery session, merges the redo by SCN from multiple instances (if in a RAC environment) and parses redo into change mappings partitioned by apply process. The apply processes read data blocks, assemble redo changes from mappings and then apply the redo changes to the data blocks.
This method allows you to use the standby database in a read-only fashion. Active Data Guard solves the read consistency problem of previous releases by the use of a "query" SCN. The media recovery process on the standby database advances the query SCN after all dependent changes in a transaction have been fully applied. The query SCN is exposed to the user via the current_scn column of the v$database view. Read-only users will only be able to see data up to the query SCN, so the standby database can be open in read-only mode while media recovery is active, which makes it an ideal reporting database.
You can use SYNC or ASYNC, and redo apply is isolated from physical I/O corruptions. Corruption-detection checks occur at the following key interfaces:
On the primary during redo transport (LGWR, LNS, ARCH): the DB_ULTRA_SAFE parameter
On the standby during redo apply (RFS, ARCH, MRP, DBWR): the DB_BLOCK_CHECKSUM and DB_LOST_WRITE_PROTECT parameters
If Data Guard detects any corruption it will automatically fetch new copies of the data from the primary using the gap resolution process, in the hope that the original data is free of corruption.
The key features of this solution are:
Complete application and data transparency: no data type or other restrictions
Very high performance, least management complexity and fewest moving parts
End-to-end validation before applying, including corruptions due to lost writes
Able to be utilized for up-to-date read-only queries and reporting while providing DR
Able to execute rolling database upgrades beginning with Oracle Database 11g
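The query SCN idea can be sketched as a watermark. Parallel apply may finish transactions out of order, but the query SCN only advances past a commit SCN once every transaction at or below it has been fully applied, and readers see only rows written at or below the query SCN. The `StandbyApply` class is a hypothetical simplification. Real media recovery tracks dependent changes at a finer grain than whole transactions.

```python
class StandbyApply:
    """Hypothetical model of a query SCN advancing during parallel redo apply."""

    def __init__(self, commit_scns: list):
        self.pending = sorted(commit_scns)   # commit SCNs of transactions not yet published
        self.applied = set()
        self.query_scn = 0

    def transaction_applied(self, commit_scn: int) -> None:
        self.applied.add(commit_scn)
        # Advance only across transactions whose changes are all applied, in SCN order,
        # so a reader never sees a later transaction without the earlier ones.
        while self.pending and self.pending[0] in self.applied:
            self.query_scn = self.pending.pop(0)

    def visible(self, rows: dict) -> dict:
        """rows maps a row key to the commit SCN that wrote it."""
        return {key: scn for key, scn in rows.items() if scn <= self.query_scn}


standby = StandbyApply([10, 20, 30])
standby.transaction_applied(20)          # finished first, but SCN 10 is still in flight
scn_after_first = standby.query_scn      # stays at 0
standby.transaction_applied(10)          # now both 10 and 20 are complete
```

Once both 10 and 20 are applied, the query SCN jumps to 20, and a read-only query sees rows from SCNs 10 and 20 but not 30.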
SQL Apply (Logical Standby)
SQL apply uses the logical standby process (LSP) to coordinate the apply of changes to the standby database. SQL apply requires more processing than redo apply. The processes that make up SQL apply read the SRL and "mine" the redo by converting it to logical change records, then build SQL transactions and apply the SQL to the standby database. Because there are more moving parts, it requires more CPU, memory and I/O than redo apply.
SQL apply does not support all data types, such as XML in object-relational format and Oracle-supplied types such as Oracle Spatial, Oracle interMedia and Oracle Text.
The benefit of SQL apply is that the database is open read-write while apply is active. While you cannot make any changes to the replica data, you can insert, modify and delete data in local tables and schemas that have been added to the database, and you can even create materialized views and local indexes. This makes it ideal for reporting tools and the like.
The key features of this solution are:
A standby database that is open read-write while SQL apply is active
A guard setting that prevents the modification of data that is being maintained by SQL apply
Able to execute rolling database upgrades beginning with Oracle Database 11g using the KEEP IDENTITY clause
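To illustrate "mining" redo into logical change records and turning them back into SQL, here is a minimal Python sketch. It also includes a guard check that blocks user changes to replicated tables but allows changes to local ones. The `LogicalChangeRecord` class, the `id`/`val` column names and the string-built SQL are hypothetical illustrations only. Real SQL Apply does not generate SQL text this way, and production code should never build SQL by string formatting.

```python
from dataclasses import dataclass


@dataclass
class LogicalChangeRecord:
    """Hypothetical, simplified logical change record mined from redo."""
    operation: str   # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: int
    value: str = ""


def lcr_to_sql(lcr: LogicalChangeRecord) -> str:
    """Turn one mined change back into a SQL statement (illustration only, no bind variables)."""
    if lcr.operation == "INSERT":
        return f"INSERT INTO {lcr.table} (id, val) VALUES ({lcr.key}, '{lcr.value}')"
    if lcr.operation == "UPDATE":
        return f"UPDATE {lcr.table} SET val = '{lcr.value}' WHERE id = {lcr.key}"
    if lcr.operation == "DELETE":
        return f"DELETE FROM {lcr.table} WHERE id = {lcr.key}"
    raise ValueError(f"unsupported operation {lcr.operation}")


def guard_allows(user_dml_table: str, maintained_tables: set) -> bool:
    """Guard stand-in: users may change local tables, not ones maintained by SQL apply."""
    return user_dml_table not in maintained_tables


# One mined transaction becomes an ordered list of SQL statements.
transaction = [
    LogicalChangeRecord("INSERT", "orders", 1, "new"),
    LogicalChangeRecord("UPDATE", "orders", 1, "shipped"),
]
statements = [lcr_to_sql(lcr) for lcr in transaction]
maintained = {"orders"}   # replicated by SQL apply; a local reporting table is not in here
```

The extra mining, transaction-building and SQL-execution steps are the "more moving parts" that make SQL apply heavier than redo apply.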
Note that if you have multiple standby databases, you could use both solutions.
Protection Modes
There are a number of protection modes available, depending on what you want to achieve; it's even possible to hang your primary database if the standby database is uncontactable and you want maximum protection. Data Guard protection modes implement rules that govern how the configuration will respond to failures, enabling you to achieve your specific objectives for data protection, availability and performance. Data Guard can support multiple standby databases in a single configuration, and they may or may not have the same protection mode settings, depending on your requirements.
The protection modes are:
Maximum Performance
This mode uses ASYNC redo transport so that the LGWR process never waits for acknowledgment from the standby database. Also note that Oracle no longer recommends the ARCH transport method that was used for maximum performance in previous releases.
Note that you will probably lose data if the primary fails and full synchronization has not occurred; the amount of data loss is dependent on how far behind the standby is in processing the redo.
This is the default mode.
Maximum Availability
Its first priority is to be available and its second priority is zero data loss protection, so it requires SYNC redo transport. In the event that the standby server is unavailable, the primary will wait the time specified in the NET_TIMEOUT parameter before giving up on the standby server and allowing the primary to continue processing. Once the connection has been re-established, the primary will automatically resynchronize the standby database.
When the NET_TIMEOUT expires, the LGWR process disconnects from the LNS process, acknowledges the commit and proceeds without the standby. Processing continues until the current online redo log (ORL) is complete and the LGWR cycles into a new ORL. A new LNS process is then started and an attempt to connect to the standby server is made. If it succeeds, the new ORL is sent as normal; if not, the LGWR disconnects again until the next log switch. The whole process keeps repeating at every log switch, and hopefully the standby database will become available at some point. Also remember that, in the background, if any archive logs have been created during this time, the ARCH process will continually ping the standby database, waiting until it comes online.
You might have noticed there is a potential loss of data if the primary goes down while the standby database has also been down for a period of time and there has been no resynchronization. This is similar to Maximum Performance, but you do give the standby server a chance to respond using the timeout.
Maximum Protection
The priority for this mode is data protection, even to the point that it will affect the primary database. This mode uses SYNC redo transport, and the primary will not issue a commit acknowledgment to the application unless it receives an acknowledgment from at least one standby database; basically, the primary will stall and eventually abort, preventing any unprotected commits from occurring. This guarantees complete data protection. In this setup it is advised to have two separate standby databases at different locations with no Single Points Of Failure (SPOFs); they should not use the same network infrastructure, as this would be a SPOF.
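The three protection modes boil down to different rules for when the primary may acknowledge a commit. This Python sketch encodes those rules as described above. The `Mode` enum, the `commit_outcome` function and the outcome strings are hypothetical. `net_timeout` is only a stand-in for the NET_TIMEOUT setting, not its actual implementation.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    MAX_PERFORMANCE = "maximum performance"
    MAX_AVAILABILITY = "maximum availability"
    MAX_PROTECTION = "maximum protection"


def commit_outcome(mode: Mode, ack_delay: Optional[float], net_timeout: float) -> str:
    """Decide what the primary does with one commit (hypothetical model).

    ack_delay: seconds until a standby acknowledges the redo, or None if it never does
    net_timeout: stand-in for the NET_TIMEOUT value used in Maximum Availability
    """
    if mode is Mode.MAX_PERFORMANCE:
        # ASYNC: the LGWR never waits; redo still in the transport lag is at risk.
        return "committed, standby may lag"
    acked = ack_delay is not None
    if mode is Mode.MAX_AVAILABILITY:
        if acked and ack_delay <= net_timeout:
            return "committed, protected"
        # Give up on the standby after the timeout and keep the primary running.
        return "committed, unprotected until resync"
    # Maximum Protection: never acknowledge a commit no standby has confirmed.
    if acked:
        return "committed, protected"
    return "primary stalls, then aborts"
```

For example, `commit_outcome(Mode.MAX_AVAILABILITY, None, 30)` returns "committed, unprotected until resync", while the same lost standby under `Mode.MAX_PROTECTION` returns "primary stalls, then aborts".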
Switchover and Failover
Data Guard uses two terms when cutting over to the standby server: switchover, which is a planned event, and failover, which is an unplanned event.
Switchover
Switchover is a planned event. It is ideal when you want to upgrade the primary database or change the storage/hardware configuration (add memory, CPU, networking); you may even want to upgrade the configuration to Oracle RAC.
What happens during a switchover is the following:
1. Notify the primary database that a switchover is about to occur
2. Disconnect all users from the primary database
3. Generate a special redo record that signals the End of Redo (EOR)
4. Convert the primary database into a standby database
5. Once the standby database applies the final EOR record, guaranteeing that no data has been lost, convert the standby database into the primary database
The new standby database (old primary) starts to receive the redo records and continues processing until we switch back again. It is important to remember that both databases receive the EOR record, so both databases know the next redo that will be received.
Although you can have users still connected to the primary database while the switchover occurs (which generally takes about 60 seconds), I personally have a small outage just to be on the safe side, in case things don't go as smoothly as I hoped.
You can even switch over from a Linux database to a Windows database, or from a 64-bit to a 32-bit database, which is great if you want to migrate to a different O/S or 32/64-bit architecture. Your rollback option is also very easy: simply switch back if it did not work.
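The five switchover steps can be walked through with a toy model of two databases. The key point is that the standby is only promoted after it has received and applied the End of Redo record, so both sides end with the same last redo. The `Database` class, its fields and the `switchover` function are hypothetical and are not the Data Guard Broker or SQL interface.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical model of one database in the configuration."""
    name: str
    role: str                                   # "PRIMARY" or "STANDBY"
    redo: list = field(default_factory=list)    # redo applied so far, in order
    sessions: int = 0


def switchover(primary: Database, standby: Database) -> list:
    """Walk through the five switchover steps (hypothetical model, not Data Guard's API)."""
    steps = ["1. primary notified of switchover"]
    primary.sessions = 0
    steps.append("2. users disconnected")
    primary.redo.append("EOR")
    steps.append("3. End of Redo generated")
    primary.role = "STANDBY"
    steps.append("4. old primary is now a standby")
    # Ship and apply whatever redo the standby is missing, including the EOR record.
    standby.redo.extend(primary.redo[len(standby.redo):])
    if standby.redo[-1] != "EOR":
        raise RuntimeError("standby has not applied the EOR record; not converting")
    standby.role = "PRIMARY"
    steps.append("5. EOR applied, standby is now the primary")
    return steps


prod = Database("prod", "PRIMARY", redo=["r1", "r2"], sessions=12)
dr = Database("dr", "STANDBY", redo=["r1"])
log = switchover(prod, dr)
```

After the run, both `prod.redo` and `dr.redo` end with "EOR". This mirrors why both databases know the next redo that will be received, and why switching back is simply the same call with the arguments reversed.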
Failover (manual or automatic)
Failover is an unplanned event: this is where the EOR was never written by the primary database. The standby database processes what redo it has and then waits; data loss now depends on the protection mode in effect (see above for protection modes):
Maximum Performance: possible chance of data loss
Maximum Availability: possible chance of data loss
Maximum Protection: no data loss
You have the option to fail over manually or to make the whole process automatic. Manual gives you, the DBA, maximum control over the whole process; obviously, the length of the outage then depends on getting the DBA out of bed to fail over. Otherwise, the Oracle Data Guard Fast-Start Failover feature can automatically detect a problem and fail over for you. The failover process should take between 15 and 25 seconds.
One point to mention is the split-brain scenario, where the primary and standby both think that they are the primary database. With Data Guard Fast-Start Failover, a failed primary cannot open without first receiving permission from the Data Guard observer process. The observer will know that a failover has occurred and will refuse to allow the original primary to open. The observer will automatically reinstate the failed primary as a standby for the new primary database, making it impossible to have a split-brain condition.
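The observer's role in preventing split brain can be sketched as a single source of truth about which database is the primary. After a failover it denies the old primary permission to open as primary and reinstates it as a standby instead. The `Observer` class, its methods and the database names "boston" and "london" are hypothetical. This is not the Fast-Start Failover implementation or the DGMGRL interface.

```python
class Observer:
    """Hypothetical model of the Fast-Start Failover observer preventing split brain."""

    def __init__(self, primary: str, standby: str):
        self.primary = primary
        self.standby = standby
        self.failed_over = False

    def primary_lost(self) -> None:
        # Automatic failover: the standby becomes the new primary.
        self.primary, self.standby = self.standby, self.primary
        self.failed_over = True

    def may_open_as_primary(self, database: str) -> bool:
        # A database may only open as primary if the observer agrees it is the primary.
        return database == self.primary

    def reinstate(self, database: str) -> str:
        if self.failed_over and database == self.standby:
            return f"{database} reinstated as standby of {self.primary}"
        return f"{database} needs no reinstatement"


obs = Observer(primary="boston", standby="london")
obs.primary_lost()       # "boston" fails and "london" takes over
```

When "boston" comes back, `obs.may_open_as_primary("boston")` is False, so it cannot open as a second primary. `obs.reinstate("boston")` turns it into a standby of "london".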
Data Guard Management
You have three options with which to manage Data Guard:
SQL*Plus
Data Guard Broker: a distributed management tool that centralizes management, using the DGMGRL command line
Enterprise Manager: provides a GUI to the Data Guard Broker, replacing DGMGRL
In order to use Enterprise Manager you must have the Data Guard Broker installed. The broker maintains the configuration files that include profiles for all databases, and changes can be propagated to all databases within the configuration. The broker also includes commands to start an observer, the process that monitors the status of a Data Guard configuration and executes an automatic failover. You might be thinking that the Data Guard Broker is a single point of failure, but this is incorrect: broker processes are background processes that exist on each database in the configuration and communicate with each other. If the system to which you are attached fails, you simply attach to another database within the configuration and resume management from there.
