
RAC configuration with points of failure:

The figure below illustrates the various areas of the system (operating system, hardware, and
database) that could fail. The various failure scenarios in a two-node configuration, as illustrated in the
figure, are:
1. Interconnect failure
2. Node failure
3. Instance failure
4. Media failure
5. GCS/GES failure
Instance recovery is complete when Oracle has performed the following steps:
1. Replaying the online redo log files of the failed instance, called cache recovery.
2. Rolling back all uncommitted transactions of the failed instance, called transaction recovery.
How does Oracle know that recovery is required for a given data file:
1. Start SCN:
When a database checkpoints, an SCN (called the checkpoint SCN) is written to the data file
headers.
2. Stop SCN:
There is also an SCN value in the control file for every data file, which is called the stop SCN. The
stop SCN is set to infinity while the database is open and running.
3. Checkpoint counter:
There is another data structure, called the checkpoint counter, in each data file header
and also in the control file for each data file entry.
The checkpoint counter increments every time a checkpoint happens on a data file and
the start SCN value is updated.
When a data file is in hot backup mode, the checkpoint information in the file header is
frozen but the checkpoint counter still gets updated.
If the start SCN of a specific data file does not match the stop SCN value in the
control file, then at least a crash recovery is required. This can happen when
the database is shut down with the SHUTDOWN ABORT statement or if the
instance crashes.
Oracle performs the second check on the data files by checking the checkpoint
counters. If the checkpoint counter check fails, then Oracle knows that the
data file has been replaced with a backup copy (while the instance was down)
and therefore media recovery is required.
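The decision that follows from these two checks can be sketched as a small routine (an illustrative Python sketch only; the field names are invented for the example and are not Oracle's actual structures):

```python
from dataclasses import dataclass

@dataclass
class DataFileHeader:
    start_scn: int            # checkpoint SCN written at each checkpoint
    checkpoint_counter: int

@dataclass
class ControlFileEntry:
    stop_scn: int             # set to a sentinel "infinity" while the database is open
    checkpoint_counter: int

def recovery_needed(hdr: DataFileHeader, ctl: ControlFileEntry) -> str:
    """Mirror the two startup checks described above (illustrative only)."""
    # Second check: a checkpoint-counter mismatch means the file was
    # replaced with a backup copy, so media recovery is required.
    if hdr.checkpoint_counter != ctl.checkpoint_counter:
        return "media recovery"
    # First check: the start SCN in the header must equal the stop SCN
    # in the control file; otherwise (SHUTDOWN ABORT, instance crash)
    # at least a crash recovery is required.
    if hdr.start_scn != ctl.stop_scn:
        return "crash recovery"
    return "no recovery"
```

After a clean shutdown both checks pass; after a SHUTDOWN ABORT the SCNs differ and crash recovery is chosen.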
An Oracle-provided CM is used, and the heartbeat mechanism is wrapped with another
process, called the watchdog process, provided by Oracle to give this functionality.
The watchdog process is only present in environments where Oracle has provided the CM
layer of the product, for example Linux.
The heartbeat interval parameter is normally specified in seconds.
Setting a very small value could cause performance problems:
though the overhead of this process is insignificant, on very busy systems
its frequent running could turn out to be expensive.
Setting this parameter to an ideal value is important and is achieved by constant monitoring of
the activities on the system and the amount of overhead this particular process is causing.
The heartbeat timeout interval, like the heartbeat interval parameter, should not be set
low.
Unlike the heartbeat interval parameter, for the timeout interval the issue is not a
performance concern but the potential for false failure detections: the cluster
might incorrectly determine that a node is failing due to transient delays if the timeout interval
is set too low.
False failure detections can occur on busy systems, where the node is processing tasks
that are highly CPU intensive.
While the system should reserve a percentage of its resources for these kinds of activities,
occasionally when a system is running at high CPU utilization the response to
the heartbeat function could be delayed; if the heartbeat timeout is set very low, this
could cause the CM to assume that the node is not available when actually it is up and
running.
When the database is shut down gracefully, with the SHUTDOWN NORMAL or SHUTDOWN IMMEDIATE
command, Oracle performs a checkpoint and copies the start SCN value of each data file to its
corresponding stop SCN value in the control file before the actual shutdown of the database.
When the database is started, Oracle performs two checks (among other consistency checks):
1. To see if the start SCN value in every data file header matches its corresponding stop SCN
value in the control file.
2. To see if the checkpoint counter values match.
If both these checks are successful, then Oracle determines that no recovery is required for that data file.
These two checks are done for all data files that are online.
1. Instance fails:
This is the first stage in the process, when an instance fails and recovery becomes a necessity.
2. Failure detected:
The CM of the clustered operating system detects the node failure or
instance failure.
The CM is able to accomplish this with the help of certain parameters, such as
1. the heartbeat interval and
2. the heartbeat timeout parameter.
The heartbeat interval parameter invokes a watchdog process that wakes up at a
stipulated time interval and checks for the existence of the other members in the cluster.
When the instance fails, the watchdog process or the heartbeat validation interval does
not get a response from the other instance within the time stipulated in the heartbeat
timeout parameter, and the CM declares that the instance is down. From the first
time that the CM does not get a response from the heartbeat check, to the time that the
CM declares that the node has failed, repeated checks are done to ensure that the initial
message was not a false alarm.
3. Cluster reconfiguration: When a failure is detected, the cluster reorganization occurs.
During this process, Oracle alters the node's cluster membership status. This involves
Oracle taking care of the fact that a node has left the cluster.
The GCS and GES provide the CM interfaces to the software and expose the cluster
membership map to the Oracle instances when nodes are added to or deleted from the
cluster.
The LMON process performs this exposure of the information to the remaining Oracle
instances.
LMON performs this task by continually sending messages from the node it runs on and
often writing to the shared disk.
When such write activity does not happen for a prolonged period of time, it provides
evidence to the surviving nodes that the node is no longer a member of the cluster.
Such a failure causes a change in a node's membership status within the cluster, and
LMON initiates the recovery actions, which include remastering of GCS and GES
resources and instance recovery.
The cluster reconfiguration process, along with other activities performed by Oracle processes, is
recorded in the respective background process trace files and in the instance-specific alert log
files.
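The detection logic in stage 2 above, probing at the heartbeat interval, waiting up to the timeout, and confirming repeatedly before declaring failure, can be sketched as follows (a simplified illustration; the probe function and the confirmation count are assumptions, not Oracle's actual implementation):

```python
import time

def watchdog(peer_alive, heartbeat_interval=1.0, heartbeat_timeout=5.0,
             confirmations=3):
    """Declare a peer down only after several consecutive failed probes.

    peer_alive(timeout) stands in for the heartbeat check against the
    other cluster member; it returns True if a response arrives within
    `timeout` seconds. Repeated checks guard against declaring failure
    on a single transient miss.
    """
    misses = 0
    while True:
        if peer_alive(heartbeat_timeout):
            misses = 0                      # any response resets the count
        else:
            misses += 1
            if misses >= confirmations:
                return "node declared down"
        time.sleep(heartbeat_interval)      # wake up at the stipulated interval
```

Raising `confirmations` or `heartbeat_timeout` trades slower detection for fewer false failure detections, which is exactly the tuning trade-off described above.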
Thread recovery:
A thread is a stream of redo, for example, all redo log files for a given instance. In a single stand-alone
configuration there is usually only one thread, although it is possible to specify more under certain
circumstances.
An instance has one thread associated with it, and recovery in this situation would be like any stand-
alone configuration. What is the difference in a RAC environment? In a RAC environment, multiple threads
are usually seen; there is generally one thread per instance, and the thread applicable to a specific
instance is defined in the server parameter file or init<SID>.ora file.
In a crash recovery, redo is applied one thread at a time, because only one instance at a time can dirty a
block in cache; in between block modifications the block is written to disk. Therefore a block in a current
online file can need redo from at most one thread. This assumption cannot be made in media recovery, as
more than one instance may have made changes to a block, so changes must be applied to blocks in
ascending SCN order, switching between threads where necessary.
In a RAC environment, where instances can be added to or taken off the cluster dynamically, when an
instance is added to the cluster a thread enable record is written and a new thread of redo is created.
Similarly, a thread is disabled when an instance is taken offline through a shutdown operation. The
shutdown operation places an end of thread (EOT) flag on the log header.
The figure below illustrates the thread recovery scenario. In this scenario there are three instances,
RAC1, RAC2, and RAC3, that form the RAC configuration. Each instance has a set of redo log files and is
assigned thread 1, thread 2, and thread 3 respectively.
As discussed above, if multiple instances fail, or during a crash recovery, all instances have to synchronize
the redo log files by SCN during the recovery operation. For example, in the figure, SCN #1 was
applied to the database from thread 2, which belongs to instance RAC2, followed by SCN #2 from
thread 3, which belongs to instance RAC3, and SCN #3 also from thread 3, before applying SCN #4 from
thread 1, which is assigned to instance RAC1.
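The SCN-ordered merge across threads that the example describes can be illustrated in a few lines of Python (the tuples are a stand-in for redo records; this is not Oracle code):

```python
import heapq

def merge_redo_threads(*threads):
    """Merge per-instance redo threads into one stream in ascending SCN
    order, switching between threads where necessary. Each thread is an
    iterable of (scn, thread_no) records already sorted by SCN, like the
    redo in one instance's log files.
    """
    return list(heapq.merge(*threads, key=lambda record: record[0]))

# The example from the figure: SCN #1 from thread 2 (RAC2), SCN #2 and
# SCN #3 from thread 3 (RAC3), then SCN #4 from thread 1 (RAC1).
thread1 = [(4, 1)]
thread2 = [(1, 2)]
thread3 = [(2, 3), (3, 3)]
merged = merge_redo_threads(thread1, thread2, thread3)
# merged is [(1, 2), (2, 3), (3, 3), (4, 1)]
```

Because each input thread is already SCN-ordered, a k-way heap merge reproduces the global apply order without sorting the combined stream.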
kjxgfipccb is the callback on the completion of a CGS message send. If the delivery of a
message fails, a log entry is generated. Associated with the log entry are the message buffer pointer,
recovery state object, message type, and others.
Enqueue reconfiguration: Enqueue resources are reconfigured among the available
instances.
Contents of LMON trace file:
Analyzing the LMON trace:
In general, the LMON trace file listed above contains the recovery and reconfiguration information on
locks, resources, and states of its instance group.
The substates in the Cluster Group Service (CGS) that are listed in the trace file are:
1. State 0: Waiting for the instance reconfiguration.
2. State 1: Received the instance reconfiguration event.
3. State 2: Agreed on the instance membership.
4. States 3, 4, 5: CGS name service recovery.
5. State 6: GES/GCS (lock/resource) recovery.
Each state is identified as a pair of an incarnation number and its current substate. 'Setting state to 7 6'
means that the instance is currently at incarnation 7 and substate 6.
*** 2002-11-16 23:48:02.753
kjxgmpoll reconfig bitmap: 1
*** 2002-11-16 23:48:02.753
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 6 0.
*** 2002-11-16 23:48:02.880
Name Service frozen
kjxgmcs: Setting state to 6 1.
kjxgfipccb: msg 0x1038c6a88, mbo 0x1038c6a80, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [3744,204]
kjxgfipccb: msg 0x1038c6938, mbo 0x1038c6930, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [3416,204]
kjxgfipccb: msg 0x1038c67e8, mbo 0x1038c67e0, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [3088,204]
kjxgfipccb: msg 0x1038c6bd8, mbo 0x1038c6bd0, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [2760,204]
kjxgfipccb: msg 0x1038c7118, mbo 0x1038c7110, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [2432,204]
kjxgfipccb: msg 0x1038c6fc8, mbo 0x1038c6fc0, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [2104,204]
kjxgfipccb: msg 0x1038c7268, mbo 0x1038c7260, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [1776,204]
kjxgfipccb: msg 0x1038c6e78, mbo 0x1038c6e70, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [1448,204]
kjxgfipccb: msg 0x1038c6d28, mbo 0x1038c6d20, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [1120,204]
kjxgfipccb: msg 0x1038c7a48, mbo 0x1038c7a40, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [792,204]
kjxgfipccb: msg 0x1038c7508, mbo 0x1038c7500, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [464,204]
kjxgfipccb: msg 0x1038c73b8, mbo 0x1038c73b0, type 22, ack 0, ref 0, stat 6
kjxgfipccb: Send cancelled, stat 6 inst 0, type 22, tkt [136,204]
*** 2002-11-16 23:48:03.104
The synchronization timeout interval is the timeout value for GES to signal an abort on its recovery
process. Since the recovery process is distributed, at each step each instance waits for the others to
complete the corresponding step before moving to the next one. This value has a minimum of 10
minutes and is computed according to the number of resources.
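As a rough model only (the scaling constant below is an assumption for illustration; Oracle's actual formula is not given here), the computation could look like this:

```python
def sync_timeout_seconds(resource_count, per_resource_ms=1, minimum=600):
    """Illustrative only: a floor of 10 minutes (600 s), grown with the
    number of resources to be recovered. The per-resource cost is an
    assumed constant, not a real Oracle parameter.
    """
    return max(minimum, resource_count * per_resource_ms // 1000)
```

Under these assumed constants, for example, 660,000 resources would give max(600, 660) = 660 seconds; any resemblance to the 660-second value in the trace below is purely illustrative.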
Resource reconfiguration:
This phase of the recovery is important in a RAC environment, where the GCS commences
recovery and remastering of the block resources, which involves rebuilding lost resource masters
on the surviving instances.
Remastering of resources is exhaustive in itself because of the various scenarios under which
remastering of resources takes place.
Obtained update lock for sequence 6, seq 6
*** 2002-11-16 23:48:04.611
Voting results, upd 1, seq 7, bitmap: 1
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 7 2.
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 7 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 7 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 7 5.
Name Service normal
Name Service recovery done
*** 2002-11-16 23:48:04.612
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 7 6.
kjfmact: call ksimdic on instance [0]
*** 2002-11-16 23:48:04.613
*** 2002-11-16 23:48:04.614
Reconfiguration started
Synchronization timeout interval: 660 sec
GRD freeze: The first step in the cluster reconfiguration process, before beginning the actual
recovery process, is for the CM to ensure that the GRD is not distributed, and hence it freezes
activity on the GRD so that no further writes or updates happen to the GRD on the node that is
currently performing the recovery. This step is also recorded in the alert logs. Since the GRD is
maintained by the GCS and GES processes, all GCS and GES resources and also the write requests
are frozen. During this temporary freeze, Oracle takes control of the situation and
balances the resources among the available instances.
Sat Nov 16 23:48:04 2002
Reconfiguration started
List of nodes: 1,
Global Resource Directory frozen
one node partition
Communication channels reestablished
"n#ueue tha+: 0fter the reconfiguration of resources among the available instances, +racle ma-es the
en3ueue resources available. 0t this point the process for-s to perform two tas-s in parallel, resource
reconfiguration and pass 1 recovery.
(esource release: +nce the remastering of resources is completed, the ne1t step is to complete
processing of pending activities. +nce this is completed, all resources that were loc-ed during the
recovery process are released or the loc-s are downgraded (converted to a lower level).
4i"t of no#e": 1$
5lobal e"o!rce 2irector) fro(en
no#e 1
* kj"1a"1cfg: 78m t1e onl) no#e in t1e cl!"ter *no#e 1+
9cti'e %en#back :1re"1ol# ; 50<
6omm!nication c1annel" ree"tabli"1e#
3a"ter broa#ca"te# re"o!rce 1a"1 'al!e bitmap"
&on-local 0roce"" block" cleane# o!t
e"o!rce" an# en.!e!e" cleane# o!t
e"o!rce" rema"tere# 2413
35334 56% "1a#o=" tra'er"e#$ 0 cancelle#$ 1151 clo"e#
17,68 56% re"o!rce" tra'er"e#$ 0 cancelle#
20107 56% re"o!rce" on freeli"t$ 37877 on arra)$ 37877 allocate#
"et ma"ter no#e info
%!bmitte# all remote-en.!e!e re.!e"t"
>p#ate r#omain 'ariable"
2=n-c't" repla)e#$ /94?4@" #!bio!"
9ll grantable en.!e!e" grante#
*** 2002-11-16 23:48:05.412
35334 56% "1a#o=" tra'er"e#$ 0 repla)e#$ 1151 !nopene#
%!bmitte# all 56% cac1e re.!e"t"
0 =rite re.!e"t" i""!e# in 34183 56% re"o!rce"
2, 07" marke# "!"pect$ 0 fl!"1 07 m"g"
*** 2002-11-16 23:48:06.007
econfig!ration complete
0o"t %3-& to "tart 1"t pa"" 7
*** 2002-11-16 23:52:28.376
kjxgmpoll reconfig bitmap: 0 1
*** 2002-11-16 23:52:28.376
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 7 0.
*** 2002-11-16 23:52:28.474
Name Service frozen
kjxgmcs: Setting state to 7 1.
*** 2002-11-16 23:52:28.881
Obtained update lock for sequence 7, seq 7
*** 2002-11-16 23:52:28.887
Voting results, upd 1, seq 8, bitmap: 0 1
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 8 2.
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 8 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 8 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 8 5.
Name Service normal
Name Service recovery done
*** 2002-11-16 23:52:28.896
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 8 6.
*** 2002-11-16 23:52:29.116
*** 2002-11-16 23:52:29.116
Reconfiguration started
Synchronization timeout interval: 660 sec
List of nodes: 0,1,
Pass 1 recovery:
This step of the recovery process is performed in parallel with steps 7 and 8.
SMON merges the redo threads ordered by SCN to ensure that changes are written in an
orderly fashion.
SMON also finds BWRs (block written records) in the redo stream and removes entries that are
no longer needed for recovery, because they are PIs (past images) of blocks already written to disk.
A recovery set is produced that contains only blocks modified by the failed instance with no
subsequent BWR to indicate that the blocks were later written.
Each entry in the recovery list is ordered by first-dirty SCN to specify the order in which to acquire
instance recovery locks.
Reading the log files and identifying the blocks that need to be recovered completes the first pass
of the recovery process.
0o"t %3-& to "tart 1"t pa"" 7
%at &o' 16 23:48:06 2002
7n"tance reco'er): looking for #ea# t1rea#"
%at &o' 16 23:48:06 2002
?eginning in"tance reco'er) of 1 t1rea#"
%at &o' 16 23:48:06 2002
%tarte# fir"t pa"" "can
%at &o' 16 23:48:06 2002
6omplete# fir"t pa"" "can
5101 re#o block" rea#$ 4,0 #ata block" nee# reco'er)
%at &o' 16 23:48:07 2002
%tarte# reco'er) at
:1rea# 1: log"e. 2,$ block 2$ "cn 0.1157,5034
eco'er) of -nline e#o 4og: :1rea# 1 5ro!p 1 %e. 2, ea#ing mem 0
3emA 0 err" 0: B#e'B'xBr#"kBorarac#gBpartition15C31
3emA 1 err" 0: B#e'B'xBr#"kBorarac#gBpartition15C21
%at &o' 16 23:48:08 2002
6omplete# re#o application
%at &o' 16 23:48:08 2002
Dn#e# reco'er) at
:1rea# 1: log"e. 2,$ block 5103$ "cn 0.115820072
420 #ata block" rea#$ 500 #ata block" =ritten$ 5101 re#o block" rea#
Dn#ing in"tance reco'er) of 1 t1rea#"
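The first-pass scan described above, merging the redo by SCN, discarding entries invalidated by a later BWR, and ordering the survivors by first-dirty SCN, can be sketched like this (an illustrative model, not Oracle's data structures):

```python
def first_pass_scan(redo_stream):
    """Build the pass 1 recovery set.

    redo_stream holds (scn, kind, block) tuples, where kind is "change"
    for a block modification and "bwr" for a block written record. A
    block belongs in the recovery set only if no BWR follows its last
    change; the result is ordered by first-dirty SCN, the order in which
    instance recovery locks are acquired.
    """
    first_dirty = {}                       # block -> SCN that first dirtied it
    for scn, kind, block in sorted(redo_stream):
        if kind == "change":
            first_dirty.setdefault(block, scn)
        else:                              # a BWR: the block reached disk,
            first_dirty.pop(block, None)   # so earlier redo is not needed
    return sorted(first_dirty, key=first_dirty.get)
```

A block dirtied again after a BWR re-enters the set with a new first-dirty SCN, which matches the rule that only changes with no subsequent write record need recovery.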
Block resource claimed for recovery:
Once pass 1 of the recovery process completes and the GCS reconfiguration has completed, the recovery
process continues by:
1. Obtaining buffer space for the recovery set, possibly by performing write operations to make
room.
2. Claiming resources on the blocks identified during pass 1.
3. Obtaining a source buffer, either from an instance's buffer cache or by a disk read.
During this phase, the recovering SMON process informs each lock element's master node, for each
block in the recovery list, that it will be taking ownership of the block and lock for recovery. Blocks become
available as they are recovered. The lock recovery is based on the ownership of the lock element.
This depends on one of the various scenarios of lock conditions:
Scenario 1: Let us assume that all instances in the cluster are holding a lock status of NL0.
SMON acquires the lock element in XL0 mode, reads the block from disk and applies redo
changes, and subsequently writes out the recovery buffer when complete.
Scenario 2: In this situation, the SMON process of the recovering instance has a lock mode of
NL0 and the second instance has a lock status of XL0; the failed instance has a status
similar to the recovery node, i.e., NL0. In this case, no recovery is required because the current
copy of the buffer already exists on another instance.
Scenario 3: In this situation, let us assume that the recovering instance has a lock status of
NL0 and the second instance has a lock status of XG0.
The failed instance has a status similar to the recovery node. In this case also, no
recovery is required because a current copy of the buffer already exists on another instance.
SMON removes the block entry from the recovery set and the recovery buffer is released. The
recovery instance has a lock status of NG1; the second instance that originally had an
XG0 status now holds NL0 status after writing the block to disk.
Scenario 4: Now, what if the recovering instance has a lock status of NL0 and the second instance has
a lock status of NG1, while the failed instance has a status similar to the recovery node? In
this case the consistent read image of the latest PI is obtained, based on SCN. The redo changes
are applied and the recovery buffer is written when complete. The recovery instance ends with a lock
element of XG0 and the second instance continues to retain the NG1 status on the block.
Scenario 5: The recovering instance has a lock status of SL0 or XL0 and the other instance has
no lock being held. In this case, no recovery is needed because a current copy of the buffer
is already available. SMON removes the block from the recovery set. The lock
status does not change.
Scenario 6: The recovery instance holds a lock status of XG0 and the second instance has a lock
status of NG1. SMON initiates the write of the current block. No recovery is performed by the
recovery instance. The recovery buffer is released and the PI count is decremented when the
block write has completed.
Scenario 7: The recovery instance holds a lock status of NG1 and the second instance holds a
lock with status of XG0. In this case, SMON initiates a write of the current block on the second
instance. No recovery is performed by the recovery instance. The recovery buffer is released and
the PI count is decremented when the block write has completed.
Scenario 8: The recovering instance holds a lock status of NG1, and the second instance holds a
lock status of NG0. In this case a consistent read copy of the block is obtained from the highest
PI based on SCN. Redo changes are applied and the recovery buffer is written when complete.
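The eight scenarios can be condensed into a decision table keyed by the lock statuses of the recovering and second instances (a summary of the text above, with paraphrased action strings):

```python
# Decision table for lock recovery, keyed by (recovering instance,
# second instance) lock status; "none" means no lock held. The actions
# paraphrase scenarios 1-8 above.
LOCK_RECOVERY_ACTIONS = {
    ("NL0", "NL0"): "read block from disk, apply redo, write recovery buffer",
    ("NL0", "XL0"): "no recovery: current copy on another instance",
    ("NL0", "XG0"): "no recovery: current copy on another instance",
    ("NL0", "NG1"): "apply redo to the latest PI (CR image by SCN)",
    ("SL0", "none"): "no recovery: current copy already available",
    ("XL0", "none"): "no recovery: current copy already available",
    ("XG0", "NG1"): "SMON writes current block; no redo applied",
    ("NG1", "XG0"): "second instance writes current block; no redo applied",
    ("NG1", "NG0"): "apply redo to CR copy of the highest PI by SCN",
}

def recovery_action(recovering: str, second: str) -> str:
    """Look up the recovery action for a lock-status pair."""
    return LOCK_RECOVERY_ACTIONS.get((recovering, second),
                                     "combination not covered above")
```

The table makes the pattern easy to see: redo is applied only when no instance holds a current copy of the block, and otherwise recovery degenerates into a write (or nothing at all).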
GRD unfrozen (reconfiguration complete):
After the necessary resources are obtained, and the recovering instance has all the resources it
needs to complete pass 2 with no further intervention, the block cache space in the GRD is
unfrozen.
At this stage, the recovery process splits into two parallel phases while certain areas of the
system are being made partially available; the second phase of the recovery begins.
1. Partial availability: At this stage of the recovery process the system is partially available for
use. The blocks not in recovery can be operated on as before. Blocks being recovered are blocked
by the resource held in the recovering instance.
2. Pass 2 recovery: The second phase of recovery continues, taking care of all the blocks identified
during pass 1, recovering and writing each block, then releasing the recovery resources. During the
second phase the redo threads of the failed instances are once again merged by SCN, and instead
of performing a block-level recovery in memory, during this phase the redo is applied to the data
files.
3. Block availability: Since the second pass of recovery recovers individual blocks, these blocks
are made available for user access as they are recovered, a block at a time.
4. Recovery enqueue release: When all the blocks have been recovered and written and the
recovery resources released, the system is completely available and the recovery enqueue is
released.
During a normal instance recovery operation, there is a potential that one or more of the other instances
(including the recovering instance) could also encounter failure. If this happens, Oracle has to handle the
situation appropriately, based on the type of failure:
If recovery fails without the death of the recovering instance, instance recovery is restarted.
If during the process of recovery the recovering process dies, one of the surviving instances will
acquire the instance recovery enqueue and start the recovery process.
If during the recovery process another non-recovering instance fails, SMON will abort the
recovery, release the instance recovery (IR) enqueue, and reattempt instance recovery.
During the recovery process, if I/O errors are encountered, the related files are taken offline and
the recovery is restarted.
If one of the blocks that SMON is trying to recover is corrupted during redo application, Oracle
performs online block recovery to clean up the block in order for instance recovery to continue.
Steps of media recovery:
Oracle has to perform several steps during a media recovery, from validating the first data file up to the
recovery of the last data file. Determining whether archive logs have to be applied is also carried out
during this process. All database operations are sequenced using the database SCN. Similarly,
during a recovery operation the SCN plays an even more important role, because data has to be recovered
in the order in which it was created.
1. The first step during the media recovery process is to determine the lowest data file header
checkpoint SCN of all the data files being recovered. This information is stored in every data file
header record.
The output below is from a data file header dump and indicates the various markers validated
during a media recovery process.
DATA FILE #1:
(name #33) /dev/vx/rdsk/oraracdg/partition1g_3
creation size=0 block size=8192 status=0xe head=33 tail=33 dup=1
tablespace 0, index=1 krfil=1 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:139 scn: 0x0000.06ffc050 11/20/02 02:38:14
Stop scn: 0xffff.ffffffff 11/16/02 19:01:17
Creation Checkpointed at scn: 0x0000.00000006 08/12/02 17:04:05
thread:0 rba:(0x0.0.0)
enabled threads: 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
Offline scn: 0x0000.065acf1d prev_range: 0
Online Checkpointed at scn: 0x0000.065acf1e 10/19/02 09:43:15
thread:1 rba:(0x12.2.0)
enabled threads: 01000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
Hot Backup end marker scn: 0x0000.00000000
aux_file is NOT DEFINED
FILE HEADER:
Software vsn=153092096=0x9200000, Compatibility
Vsn=134217728=0x8000000
Db ID=3598885999=0xd682a46f, Db Name='PRODDB'
Activation ID=0=0x0
Control Seq=2182=0x886, File size=11500=0x11c00
File Number=1, Blksiz=8192, File Type=3 DATA
Tablespace #0 - SYSTEM rel_fn:1
Creation at scn: 0x0000.00000006 08/12/02 17:04:05
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
reset logs count:0x1c5a1a33 scn: 0x0000.065acf1e recovered at 11/16/
02 19:02:50
status:0x4 root dba:0x004000b3 chkpt cnt: 139 ctl cnt:138
begin-hot-backup file size: 0
Checkpointed at scn: 0x0000.06ffc050 11/20/02 02:38:14
thread:2 rba:(0x15.31aa6.10)
enabled threads: 01100000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
Backup Checkpointed at scn: 0x0000.00000000
thread:0 rba:(0x0.0.0)
enabled threads: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
External cache id: 0x0 0x0 0x0 0x0
Absolute fuzzy scn: 0x0000.00000000
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp scn: 0x0000.00000000 01/01/1988 00:00:00
If a data file's checkpoint is in its offline range, then the offline-end checkpoint is used, instead of
the data file header checkpoint, as its media-recovery-start SCN.
Like the start SCN, Oracle uses the stop SCN on all data files to determine the highest SCN, to
allow recovery to terminate. This prevents a needless search beyond the SCN that actually needs
to be applied.
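Taken together, the start-SCN and stop-SCN rules define the SCN window that a media recovery pass must cover; a small sketch (illustrative field names and simplified offline-range handling, not Oracle internals):

```python
def media_recovery_window(datafiles):
    """Return (start_scn, stop_scn) for a media recovery pass.

    Each data file dict carries its header checkpoint SCN, its stop SCN
    from the control file, and optionally the end SCN of an offline
    range containing the checkpoint. Recovery starts at the lowest
    start SCN of the files being recovered and need not search past the
    highest stop SCN.
    """
    starts, stops = [], []
    for df in datafiles:
        offline_end = df.get("offline_end_scn")
        if offline_end is not None and df["checkpoint_scn"] <= offline_end:
            starts.append(offline_end)     # use the offline-end checkpoint
        else:
            starts.append(df["checkpoint_scn"])
        stops.append(df["stop_scn"])
    return min(starts), max(stops)
```

A file whose checkpoint sits inside an offline range contributes its offline-end SCN as the start marker, exactly as described above.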
During the media recovery process, Oracle automatically opens any enabled thread of redo, and if
the required redo records are not found in the current set of redo log files, the database
administrator is prompted for the archived redo log files.
2. Oracle places an exclusive MR (media recovery) lock on the files undergoing recovery. This
prevents two or more processes from starting a media recovery operation simultaneously. The lock
is acquired by the session that started the operation and is then held in a shared mode so that no
other session can acquire the lock in exclusive mode.
3. The MR fuzzy bit is set to prevent the files from being opened in an inconsistent state.
4. The redo records from the various redo threads are merged to ensure that the redo records are
applied in the right order, using the ascending SCN.
5. During the media recovery operation, checkpointing occurs as normal, updating the checkpoint
SCN in the data file headers. This helps if there is a failure during the recovery process, because it
can be restarted from this SCN.
6. This process continues until a stop SCN is encountered for a file, which means that the file was
taken offline or made read-only at this SCN and has no redo beyond this point. With the
database open, taking a data file offline produces a finite stop SCN for that data file; if this is not
done, there is no way for Oracle to determine when to stop the recovery process for a data file.
7. Similarly, the recovery process continues until the current logs in all threads have been applied.
The end of thread (EOT) flag that is part of the header of the last redo log file guarantees that
this has been accomplished.
The following output from a redo log header shows the EOT marker found in the
redo log file:
LOG FILE #6:
(name #42) /dev/vx/rdsk/oracledg/partition1g_100
(name #43) /dev/vx/rdsk/oracledg/partition1g_400
Thread 2 redo log links: forward: 0 backward: 5
siz: 0x190000 seq: 0x00000015 hws: 0x4 bsz: 512 nab:
0xffffffff flg: 0x8 dup: 2
Archive links: fwrd: 3 back: 0 Prev scn: 0x0000.06e63c4e
Low scn: 0x0000.06e6e496 11/16/02 02:32:19
Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
FILE HEADER:
Software vsn=153092096=0x9200000, Compatibility
Vsn=153092096=0x9200000
Db ID=3598885999=0xd682a46f, Db Name='PRODDB'
Activation ID=3604082283=0xd6d1ee6b
Control Seq=2181=0x885, File size=1638400=0x190000
File Number=6, Blksiz=512, File Type=2 LOG
descrip:"Thread 0002, Seq# 000000021, SCN
0x000006e6e496-0xffffffffffff"
thread: 2 nab: 0xffffffff seq: 0x00000015 hws: 0x4 eot: 2 dis: 0
reset logs count: 0x1c5a1a33 scn: 0x0000.065acf1e
Low scn: 0x0000.06e6e496 11/16/02 02:32:19
Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
Enabled scn: 0x0000.065acf2a 10/19/02 09:46:06
Thread closed scn: 0x0000.06ffc03e 11/20/02 02:37:52
Log format vsn: 0x8000000 Disk cksum: 0xb8f Calc cksum: 0xb8f
Terminal Recovery Stamp scn: 0x0000.00000000 01/01/1988 00:00:00
Most recent redo scn: 0x0000.00000000
Largest LWN: 0 blocks
Miscellaneous flags: 0x0
GCS and GES failure:
The GCS and GES services, which comprise the LMS, LMD, and GRD processes, provide the communication of
requests over the cluster interconnect. These processes are also prone to failures. This could potentially
happen when one or more of the processes participating in this configuration fails, or fails to respond
within a predefined amount of time. Failures such as these could be the result of the failure of any of the
related processes, a memory fault, or some other cause. The LMON on one of the surviving nodes should
detect the problem and start the reconfiguration process. While this is occurring, no lock activity can take
place, and some users will be forced to wait to obtain required PCM locks or other resources.
The recovery that occurs as a result of the GCS or GES process dying is termed online block recovery. This
is another kind of recovery that is unique to the RAC implementation. Online block recovery occurs when a
data buffer becomes corrupt in an instance's cache. Block recovery could also occur if either a foreground
process dies while applying changes or an error is generated during redo application. If the block
recovery is to be performed as a result of the foreground process dying, then PMON initiates online block
recovery. However, if this is not the case, then the foreground process attempts to make an online
recovery of the block.
Under normal circumstances, this involves finding the block's predecessor and applying redo records to
this predecessor from the online logs of the local instance. However, under the cache fusion architecture,
copies of blocks are available in the caches of other instances, and therefore the predecessor is the most
recent PI for the buffer that exists in the cache of another instance. If, under certain circumstances, there
is no PI for the corrupted buffer, the block image from disk is used as the predecessor image
before changes from the online redo logs are applied.
Instance hang or false failure:
Under very unusual circumstances, probably due to an exception at the Oracle kernel level, an instance
could encounter a hang condition, in which case the instance is up and running but no activity against this
instance is possible. Users or processes that access this instance could encounter a hung connection with
no response received. In such a situation, the instance is neither down nor available for access. The
other surviving instance may not receive a response from the hung instance; however, it cannot declare
that the instance is not available, because the required activity of the LMON process, such as writing to the
shared disk, did not complete. Since the surviving instance did not receive any failure signal, it attempts
to shut down the non-responding instance and is unable to, for the reasons stated above. In these
situations the only option is to force a hard failure of either the entire node holding the hung
instance or the instance itself. In either case human intervention is required.
In the case of forcing a hard failure of the node holding the hung instance, the system administrator will
have to bounce the node; when the node is back up, the instance can be started.
In the case where an instance shutdown is preferred, no graceful shutdown is possible; instead, an
operating-system-level intervention, such as killing one of the critical background processes (for example
SMON), will cause an instance crash.
Recovery in both these scenarios is an instance recovery. All the steps discussed in the section on instance
failures apply to this type of failure.
