Understanding Instance Recovery in RAC

Crash recovery: all instances have failed
Instance recovery: one instance has failed
In both cases the redo threads from the failed instances need to be merged. In instance recovery SMON performs the recovery, whereas in crash recovery a foreground process performs it.
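As a rough mental model, merging the threads means interleaving the redo records of every failed instance's thread in SCN order so they can be applied as a single stream. The Python sketch below assumes a hypothetical `(scn, thread, change)` tuple per record. It illustrates the ordering idea only. It does not reflect Oracle's actual redo format or recovery code.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical redo record for illustration: (SCN, redo thread number, change description)
RedoRecord = Tuple[int, int, str]


def merge_threads(threads: Iterable[List[RedoRecord]]) -> List[RedoRecord]:
    """Merge redo threads (each already sorted by SCN) into one SCN-ordered stream."""
    return list(heapq.merge(*threads, key=lambda record: record[0]))


if __name__ == "__main__":
    thread_1 = [(100, 1, "update block 7"), (130, 1, "commit")]
    thread_2 = [(110, 2, "update block 9"), (120, 2, "update block 7")]
    for scn, thread, change in merge_threads([thread_1, thread_2]):
        print(scn, thread, change)
```

`heapq.merge` only produces a correctly ordered result when each input is already sorted, which fits the assumption that every thread's records are in SCN order.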
The main advantages of Cache Fusion recovery are:
Recovery cost is proportional to the number of failures, not the total number of nodes
It eliminates disk reads of blocks that are present in a surviving instance's cache
It prunes the recovery set based on the global resource lock state
The cluster is available after an initial log scan, even before recovery reads are complete
With Cache Fusion, the starting point for recovery of a block is its most current PI (past image) version. This could be located on any of the surviving instances, and multiple PIs of a particular buffer can exist.
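A minimal sketch of choosing that starting point, assuming a hypothetical `PastImage` record that pairs a surviving instance with the SCN of the block version it holds. If any survivor has a PI, the highest-SCN PI wins and no disk read is needed; otherwise recovery starts from the on-disk block. Redo would then be applied from that SCN onward.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PastImage:
    instance: str  # surviving instance that holds this PI (hypothetical model)
    scn: int       # SCN of the block version captured in the PI


def recovery_start_point(pis: List[PastImage], disk_scn: int) -> Tuple[str, int]:
    """Return (source, scn) from which redo must be applied to one block.

    The most current PI (highest SCN) on any surviving instance is preferred,
    which avoids a disk read; with no PIs the block is read from disk.
    """
    if pis:
        best = max(pis, key=lambda pi: pi.scn)
        return best.instance, best.scn
    return "disk", disk_scn


if __name__ == "__main__":
    pis = [PastImage("A", 987_654), PastImage("C", 987_660)]
    print(recovery_start_point(pis, disk_scn=987_600))  # ('C', 987660)
    print(recovery_start_point([], disk_scn=987_600))   # ('disk', 987600)
```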
Remastering is the operation whereby a node attempting recovery takes ownership (mastership) of the resources that were mastered by another instance before the failure. When one instance leaves the cluster, that instance's portion of the GRD needs to be redistributed to the surviving nodes. RAC uses an algorithm called lazy remastering to remaster only a minimal number of resources during a reconfiguration. The entire Parallel Cache Management (PCM) lock space remains invalid while the DLM and SMON complete the following steps:
1. The IDLM master node discards locks that are held by dead instances; the space reclaimed by this operation is used to remaster locks held by the surviving instances for which a dead instance was the master.
2. SMON issues a message saying that it has acquired the necessary buffer locks to perform recovery.
Let's look at an example of what happens during remastering. Assume the following:
Instance A masters resources 1, 3, 5 and 7
Instance B masters resources 2, 4, 6 and 8
Instance C masters resources 9, 10, 11 and 12
When instance B is removed from the cluster, only the resources mastered by instance B are evenly remastered across the surviving nodes; no resources mastered by instances A and C are affected. This reduces the amount of work RAC has to perform. Likewise, when an instance joins a cluster, only a minimal number of resources are remastered to the new instance.
(Figure: resource mastership before and after remastering)
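The example above can be modelled as a resource-to-master map in which only the departed instance's entries change. This is a sketch under a stated assumption: the departed instance's resources are handed out round-robin to the survivors in alphabetical order. Oracle's real placement is decided internally, and `remaster_on_leave` is a hypothetical name.

```python
from itertools import cycle
from typing import Dict


def remaster_on_leave(masters: Dict[int, str], departed: str) -> Dict[int, str]:
    """Return a new resource -> master map after `departed` leaves the cluster.

    Only resources mastered by the departed instance move (lazy remastering);
    they are spread round-robin across the survivors (assumed policy).
    """
    survivors = sorted({m for m in masters.values() if m != departed})
    if not survivors:
        raise ValueError("no surviving instance to remaster to")
    targets = cycle(survivors)
    new_masters = dict(masters)
    for resource in sorted(r for r, m in masters.items() if m == departed):
        new_masters[resource] = next(targets)
    return new_masters


if __name__ == "__main__":
    before = {r: "A" for r in (1, 3, 5, 7)}
    before.update({r: "B" for r in (2, 4, 6, 8)})
    before.update({r: "C" for r in (9, 10, 11, 12)})
    after = remaster_on_leave(before, "B")
    moved = sorted(r for r in before if before[r] != after[r])
    print(moved)                          # [2, 4, 6, 8]
    print({r: after[r] for r in moved})   # {2: 'A', 4: 'C', 6: 'A', 8: 'C'}
```

Resources 1, 3, 5, 7 and 9 to 12 keep their masters, which is the point of the lazy approach: the work is proportional to what the failed instance mastered, not to the cluster size.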
You can control the remastering process with a number of parameters:
_gcs_fast_config: enables fast reconfiguration for GCS locks (true|false)
_lm_master_weight: controls which instance will hold or (re)master more resources than others
_gcs_resources: controls the number of resources an instance will master at a time
You can also force dynamic remastering (DRM) of an object using oradebug:

## Obtain the OBJECT_ID from the following view
SQL> select * from v$gcspfmaster_info;

## Determine who masters it
SQL> oradebug setmypid
SQL> oradebug lkdebug -a <OBJECT_ID>

## Now remaster the resource
SQL> oradebug setmypid
SQL> oradebug lkdebug -m pkey <OBJECT_ID>
The steps of a GRD reconfiguration are as follows:
Instance death is detected by the cluster manager
Requests for PCM locks are frozen
Enqueues are reconfigured and made available
DLM recovery
GCS (PCM lock) is remastered
Pending writes and notifications are processed
Pass I recovery
    The instance recovery (IR) lock is acquired by SMON
    The recovery set is prepared and built; memory space is allocated in the SMON PGA
    SMON acquires locks on buffers that need recovery
Pass II recovery
    Pass II recovery is initiated; the database is partially available
    Blocks are made available as they are recovered
    The IR lock is released by SMON; recovery is then complete
    The system is available
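The "partially available" part of pass II can be pictured with a toy model. Blocks outside the recovery set are usable as soon as pass II starts, and each block in the recovery set becomes usable once it has been recovered. The sketch assumes blocks are recovered one at a time in list order. `pass_two` is a hypothetical name and not an Oracle interface.

```python
from typing import Iterator, List, Set


def pass_two(all_blocks: Set[int], recovery_set: List[int]) -> Iterator[Set[int]]:
    """Yield the accessible blocks at the start of pass II and after each recovered block."""
    available = set(all_blocks) - set(recovery_set)
    yield set(available)            # pass II initiated: database partially available
    for block in recovery_set:      # assumed: SMON recovers blocks one at a time
        available.add(block)        # block is released as soon as it is recovered
        yield set(available)
    # Every block is now accessible: SMON releases the IR lock, recovery is complete


if __name__ == "__main__":
    for step, accessible in enumerate(pass_two(set(range(1, 7)), [2, 5])):
        print(step, sorted(accessible))
    # 0 [1, 3, 4, 6]
    # 1 [1, 2, 3, 4, 6]
    # 2 [1, 2, 3, 4, 5, 6]
```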
Cache Fusion in Operation

A quick recap of GCS: a GCS resource can be local or global. If it is local, it can be acted upon without consulting other instances; if it is global, it cannot be acted upon without consulting or informing remote instances. GCS acts as a messaging agent to coordinate manipulation of a global resource. By default all resources are in null mode (null mode is used when converting from one mode to another, shared or exclusive).
The table below shows the different states of a resource:

Mode/Role        Local   Global
Null (N)         NL      NG
Shared (S)       SL      SG
Exclusive (X)    XL      XG
States
SL: the instance can serve a copy of the block to other instances and can read the block from disk. Since the block is not modified, there is no need to write it to disk.
XL: the instance has sole ownership of and interest in the resource and the exclusive right to modify the block. All changes to the block are in the local buffer cache, and the instance can write the block to disk. If another instance wants the block, it has to come via the GCS.
NL: used to protect a consistent read block. If another instance wants the block in X mode, the current instance sends the block to the requesting instance and downgrades its role to NL.
SG: the block is present in one or more instances; an instance can read the block from disk and serve it to other instances.
XG: the block can have one or more PIs. The instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to disk. GCS can ask the instance to write the block and serve it to other instances.
NG: after discarding PIs when instructed to by GCS, the block is kept in the buffer cache with the NG role; it serves only as the CR copy of the block.
Below are a number of common scenarios to help understand the following:
reading from disk
reading from cache
getting the block from cache for update
performing an update on a block
performing an update on the same block
reading a block that was globally dirty
performing a rollback on a previously updated block
reading the block after commit
We will assume the following:
A four-instance RAC environment (instances A, B, C and D)
Instance D is the master of the lock resource for the data block BL
We will use only one block, and it resides at SCN 987654
We will use a three-letter code for the lock states:
    The first letter indicates the lock mode: N = Null, S = Shared, X = Exclusive
    The second letter indicates the lock role: G = Global, L = Local
    The third letter indicates the PIs: 0 = no PIs, 1 = a PI of the block
For example, a code of SL0 means a shared lock in the local role with no past images (PIs).
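A small decoder makes the three-letter convention concrete. The mode and role letters follow the definitions just given; `decode` and `LockState` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

MODES = {"N": "Null", "S": "Shared", "X": "Exclusive"}
ROLES = {"L": "Local", "G": "Global"}


class LockState(NamedTuple):
    mode: str
    role: str
    has_pi: bool  # True when the third letter is '1' (a PI of the block exists)


def decode(code: str) -> LockState:
    """Decode a three-letter lock code such as 'SL0', 'XL0' or 'NG1'."""
    if len(code) != 3 or code[0] not in MODES or code[1] not in ROLES or code[2] not in ("0", "1"):
        raise ValueError(f"invalid lock code: {code!r}")
    return LockState(MODES[code[0]], ROLES[code[1]], code[2] == "1")


if __name__ == "__main__":
    print(decode("SL0"))  # LockState(mode='Shared', role='Local', has_pi=False)
    print(decode("NG1"))  # LockState(mode='Null', role='Global', has_pi=True)
```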
Reading a block from disk
Instance C wants to read the block, so it requests a lock in shared mode from the master instance.
1. Instance C requests the block by sending a shared lock request to master D.
2. The block has never been read into the buffer cache of any instance and it is not locked. Master D grants the lock to instance C. The lock granted is SL0 (see above to work out the three-letter code).
3. Instance C reads the block from the shared disk into its buffer cache.
4. Instance C has the block in shared mode, and the lock manager updates the resource directory.
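The master's decision in this scenario can be sketched as follows, assuming a hypothetical dictionary that stands in for master D's resource directory entry for block BL (instance name to three-letter lock code). With no holders, the master grants SL0 and the requester reads from disk. When a holder exists, this sketch only reports which instance would be pinged; that path is the next scenario.

```python
from typing import Dict, Tuple

# Hypothetical model of master D's resource directory for block BL:
# instance name -> three-letter lock code (e.g. "SL0")
Directory = Dict[str, str]


def grant_shared(directory: Directory, requester: str) -> Tuple[str, str]:
    """Handle a shared-lock request at the master (simplified).

    Returns ("SL0", "disk") when nobody holds the block, otherwise
    ("ping", holder) naming the instance the master would contact.
    """
    holders = [inst for inst, code in directory.items() if code[0] != "N"]
    if holders:
        return "ping", holders[0]
    directory[requester] = "SL0"   # first reader: local shared lock, no PIs
    return "SL0", "disk"


if __name__ == "__main__":
    directory: Directory = {}
    print(grant_shared(directory, "C"))  # ('SL0', 'disk')
    print(directory)                     # {'C': 'SL0'}
```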
Reading a block from the cache
Carrying on from the above example, instance B wants to read the same block that is cached in instance C's buffer cache.
1. Instance B sends a shared lock request to master instance D.
2. The lock master knows that the block may be available at instance C and sends a ping message to instance C.
3. Instance C sends the block to instance B via the interconnect. Along with the block, instance C indicates that instance B should take the current lock mode and role from instance C; instance C keeps a copy of the block.
4. Instance B sends a message to instance D that it has assumed the SL lock for the block. This message is not critical for the lock manager, so it is sent asynchronously.
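A sketch of the transfer under simplifying assumptions. Buffer states use the SCUR and CR statuses that appear in this section, and the asynchronous "assume" message is modelled as an entry appended to a hypothetical outbound queue rather than a call that waits for the master. `ship_current_block` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# Hypothetical message to the master: (message type, sender, lock assumed)
Message = Tuple[str, str, str]


def ship_current_block(buffers: Dict[str, str], holder: str, requester: str,
                       to_master: Deque[Message]) -> None:
    """Model `holder` serving its current shared block to `requester` (simplified).

    The requester takes over the current copy and lock, the holder keeps a
    CR copy, and the requester queues an 'assume' message for the master
    without waiting for a reply.
    """
    if buffers.get(holder) != "SCUR":
        raise ValueError(f"{holder} does not hold the current shared copy")
    buffers[requester] = "SCUR"   # requester assumes the current mode and role
    buffers[holder] = "CR"        # holder keeps a copy for consistent reads
    to_master.append(("assume", requester, "SL"))


if __name__ == "__main__":
    buffers = {"C": "SCUR"}
    outbox: Deque[Message] = deque()
    ship_current_block(buffers, "C", "B", outbox)
    print(buffers)       # {'C': 'CR', 'B': 'SCUR'}
    print(list(outbox))  # [('assume', 'B', 'SL')]
```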
Getting a (cached) clean block for update
Carrying on from the above example, instance A wants to modify the same block, which is already cached in instances B and C (the block is at SCN 987654).
1. Instance A sends an exclusive lock request to master D.
2. The lock master knows that the block may be available at instance B in SCUR mode and at instance C in CR mode. It also sends a ping message to the shared lock holders. The most recent access was at instance B, so instance D sends a BAST (blocking asynchronous system trap) message to instance B.
3. Instance B sends the block to instance A via the interconnect and closes its shared lock. The block may still be in its buffer to be used as a CR copy, but all locks are released.
4. Instance A now has the exclusive lock on the block and sends an assume message to instance D; the lock is XL0.
5. Instance A modifies the block in its buffer cache. The changes are not committed and the block has not been written to disk, so the SCN remains at 987654.
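The outcome of this scenario can be sketched as a state change on two hypothetical maps: `buffers` (instance to buffer status) and `locks` (instance to lock code). The function returns a trace of the messages described in the steps. It is a simplified model; for example, it collapses the pings to all shared holders into releasing every other lock.

```python
from typing import Dict, List


def grant_exclusive_clean(buffers: Dict[str, str], locks: Dict[str, str],
                          requester: str, recent_holder: str) -> List[str]:
    """Model an exclusive request for a clean block cached in shared mode (simplified).

    The master sends a BAST to the most recent shared holder, which ships the
    block and closes its shared lock. All other locks are released, and the old
    copies may linger as CR buffers. The requester ends with XL0 and XCUR.
    """
    trace = [f"master -> {recent_holder}: BAST",
             f"{recent_holder} -> {requester}: block via interconnect"]
    for instance in list(locks):
        if instance != requester:
            del locks[instance]        # shared lock closed
            buffers[instance] = "CR"   # buffer may remain as a CR copy
    locks[requester] = "XL0"           # exclusive, local role, no PIs yet
    buffers[requester] = "XCUR"
    trace.append(f"{requester} -> master: assume XL0")
    return trace


if __name__ == "__main__":
    buffers = {"B": "SCUR", "C": "CR"}
    locks = {"B": "SL0"}
    for line in grant_exclusive_clean(buffers, locks, "A", "B"):
        print(line)
    print(buffers, locks)
```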
Getting a (cached) modified block for update and commit
Carrying on from the above example, instance C now wants to modify the block. If it tries to modify the same row, it will have to wait until instance A either commits or rolls back. In this case, however, instance C wants to modify a different row in the same block.
1. Instance C sends an exclusive lock request to master D.
2. The lock master knows that instance A holds an exclusive lock on the block and hence sends a ping message to instance A.
3. Instance A sends the dirty buffer to instance C via the interconnect. It downgrades the lock from XCUR to null, keeps a PI version of the block and disowns any lock on that buffer. Before shipping the block, instance A has to create the PI and flush any pending redo for the block change. The block mode on instance A is now NG1.
4. Instance C sends a message to instance D indicating it has the block in exclusive mode. The G role indicates that the block is in global mode, and if instance C needs to write the block to disk, it must coordinate with the other instances that hold past images (PIs) of that block. Instance C modifies the block and issues a commit; the SCN is now 987660.
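The same hypothetical `buffers`/`locks` maps can show what changes when a dirty block moves. The holder flushes redo, keeps a PI and drops to NG1. The requester ends in exclusive mode with the global role. The code XG0 for the requester is an inference from step 4 (exclusive, global, no PI of its own) rather than something the text states directly.

```python
from typing import Dict, List


def transfer_dirty_block(buffers: Dict[str, str], locks: Dict[str, str],
                         holder: str, requester: str, redo_flushed: List[str]) -> None:
    """Model shipping a modified block from its exclusive holder (simplified).

    Before shipping, the holder flushes pending redo and keeps a past image,
    so its lock becomes NG1. Because a PI now exists elsewhere, the requester
    holds the block exclusively in the global role (assumed code XG0).
    """
    if not locks.get(holder, "").startswith("X"):
        raise ValueError(f"{holder} does not hold the block exclusively")
    redo_flushed.append(holder)                         # redo flushed before shipping
    buffers[holder], locks[holder] = "PI", "NG1"        # past image kept by the old holder
    buffers[requester], locks[requester] = "XCUR", "XG0"


if __name__ == "__main__":
    buffers = {"A": "XCUR", "B": "CR", "C": "CR"}
    locks = {"A": "XL0"}
    flushed: List[str] = []
    transfer_dirty_block(buffers, locks, "A", "C", flushed)
    print(buffers)  # {'A': 'PI', 'B': 'CR', 'C': 'XCUR'}
    print(locks)    # {'A': 'NG1', 'C': 'XG0'}
```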
Commit the previously modified block and select the data
Carrying on from the above example, instance A now issues a commit to release the row-level locks held by the transaction and flush the redo information to the redo logs.
1. Instance A wants to commit the changes. Commit operations do not require any synchronous modifications to the block.
2. The lock status remains the same as in the previous state, and the change vectors for the commit are written to the redo logs.
Write the dirty buffers to disk due to a checkpoint
Carrying on from the above example, instance B writes the dirty blocks from the buffer cache due to a checkpoint (this is where it gets interesting and very clever).
1. Instance B sends a write request to master D with the necessary SCN.
2. The master knows that the most recent copy of the block may be available at instance C and hence sends a message to instance C asking it to write the block.
3. Instance C initiates a disk write and writes a BWR (block written record) into the redo log file.
4. Instance C gets the notification that the write is complete.
5. Instance C notifies the master that the write is complete.
6. On receipt of the notification, instance D tells all PI holders to discard their PIs, because instance C has written the modified block to disk.
7. All instances that have previously modified this block will also have to write a BWR. The write request by instance B has now been satisfied, and instance B can now proceed with its checkpoint as usual.
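The coordination steps can be sketched as a trace produced by a hypothetical master-side function. The instance holding the current copy (XCUR) writes the block and its BWR. Each PI holder then discards its PI and, per step 7, writes a BWR as well, keeping the buffer only as a CR copy in the NG role (code NG0, following the three-letter convention). The message wording in the trace is illustrative only.

```python
from typing import Dict, List


def coordinate_checkpoint_write(requester: str, scn: int,
                                buffers: Dict[str, str], locks: Dict[str, str]) -> List[str]:
    """Model the master coordinating a checkpoint write of a globally dirty block (simplified)."""
    writer = next((inst for inst, status in buffers.items() if status == "XCUR"), None)
    if writer is None:
        raise ValueError("no instance holds the current copy of the block")
    trace = [f"{requester} -> master: write request at SCN {scn}",
             f"master -> {writer}: write the block",
             f"{writer}: block written to disk, BWR written to redo",
             f"{writer} -> master: write complete"]
    for inst, status in list(buffers.items()):
        if status == "PI":
            buffers[inst], locks[inst] = "CR", "NG0"   # PI discarded, CR copy kept
            trace.append(f"master -> {inst}: discard PI (BWR written)")
    trace.append(f"{requester}: continue checkpoint")
    return trace


if __name__ == "__main__":
    buffers = {"A": "PI", "B": "CR", "C": "XCUR"}
    locks = {"A": "NG1", "C": "XG0"}
    for line in coordinate_checkpoint_write("B", 987660, buffers, locks):
        print(line)
```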
Master instance crashes
Carrying on from the above example:
1. The master instance D crashes.
2. The Global Resource Directory is frozen momentarily, and the resources held by master instance D are distributed equally among the surviving nodes; this is known as remastering (see the remastering section above for more details).
Select the rows from instance A
Carrying on from the above example, instance A now queries the rows from that table to get the most recent data.
1. Instance A sends a shared lock request to instance C, which is now the new master.
2. Master C knows that the most recent copy of the block may be in instance C and asks the holder to ship the CR block to instance A.
3. Instance C ships the CR block to instance A via the interconnect.
The above sequence of events can be seen in the table below:

                                           Buffer status
Example  Node  Operation                   A     B     C     D
1        C     read block from disk                    SCUR
2        B     read the block from cache         SCUR  CR
3        A     update the block            XCUR  CR    CR
4        C     update the same block       PI    CR    XCUR
5        A     commit the changes          PI    CR    XCUR
6        B     trigger checkpoint          CR          XCUR
7        D     instance crash
8        A     select the rows             CR          XCUR
