Redo Log File Management

PURPOSE
~~~~~~~~~~~~~~~~~~~
This discussion is the result of numerous customer requests made to Oracle Support Services regarding the
management of the redo log files when using RMAN to automate recovery.
Recovery is automated by auto-inspecting all destinations of the files that belong to the
database, identifying the files that are missing, and choosing the recovery path
accordingly.
This document is intended to show when, and in which situations, RMAN is able to do this
auto-inspection by itself, without manual intervention. It also discusses current limitations.
The discussion attempts to clarify the manual intervention that is required of the DBA
before recovery, i.e. it shows the amount of work left to the DBA, as well as what
the DBA needs to check before starting recovery.
It also attempts to explain the errors resulting from this issue, so that customers can
handle them without Oracle Support assistance.
It should give DBAs a better understanding of the way RMAN works and help them in the
process of automating recovery with RMAN.
It is the task of the DBA to pre-process RMAN recovery by writing customized OS shell scripts
that auto-inspect the destinations of the database files after a media failure, and
dynamically create RMAN recovery scripts.
TEST ENVIRONMENT
~~~~~~~~~~~~~~~~
All tests were performed with 8.1.6 on Windows NT.
The use of an RMAN catalog is a prerequisite.
SCOPE & APPLICATION
~~~~~~~~~~~~~~~~~~~
This document is intended to provide an understanding of the way RMAN manages the redo log files
when testing recovery concepts.
This information can also be used by customers who intend to automate the recovery process.
Please note that this article does not currently discuss backup/recovery concepts and does not
supply DBAs with shell scripts or SQL scripts for automatic backup and recovery operations.
It is intended to clarify some typical error situations encountered during recovery, and to
help DBAs decide how far they can go in the attempt to automate this operation.
This article concentrates primarily on the way the log files (archived and online redo logs) are
managed with RMAN, the related errors during recovery, and the manual intervention needed to
handle these common errors.
It is assumed that the reader is familiar with RMAN and has sound recovery knowledge.
This article is laid out as follows:
Part I Explanation for the need for manual intervention on recover
Part II Case studies and error explanation
The following scenarios and related errors are analyzed and explained in this article:
Case 1: Some archived log files are NOT BACKED UP, NOT CATALOGED, but are ON DISK
RMAN-03013: command type: recover
RMAN-20000: abnormal termination of job step
RMAN-06054: media recovery requesting unknown log: thread 1 scn 822898
Case 2: Some archived log files are NOT BACKED UP, are CATALOGED, but are NOT ON DISK
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 6 scn 843036 found to restore
Case 3: Online redo logs that are not current are lost, but CATALOGED archived logs with the same
seq# are ON DISK.
Case 4: Only the current redo log is lost.
Part III Sample RMAN scripts for backup and recover used in the tests
Part IV 9i enhancements related to automatic recovery with RMAN
Part I Explanation for the need for manual intervention on recover

How RMAN manages the archived log files and the implications for recovery
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section also explains the RMAN commands: RESYNC and CATALOG ARCHIVELOG.
During recovery, RMAN (up to version 8.1.7) does not scan the disk, as might be expected, to
automatically search for unknown archived log files. The destination of the archived log files
is scanned only to identify the known archived log files. We will try to explain this mechanism:
when archived log files are 'known' and when they are not.
RMAN relies on two information sources (repositories), which are used in the following order:
1. the information recorded in the current controlfile
2. the information recorded in the RMAN catalog at the time of media failure
The most up-to-date information about files belonging to the database is recorded automatically
only in the CURRENT CONTROLFILE, immediately after log sequence X is archived to disk.
After completion, a new record is added to the controlfile to log this action
(we could say the controlfile 'knows' immediately about the archived log file).
At the time the log is archived, the RMAN catalog has no 'knowledge' of this file, nor
of any other files archived after it and available on disk.
The information is transferred from the CURRENT CONTROLFILE to the RMAN CATALOG only
when the DBA starts RMAN, connects to the database and to the RMAN catalog, and runs the RMAN
command 'RESYNC' (or any other RMAN command that does an implicit 'RESYNC').
There is no process implemented in the database that does this automatically. The command for
doing a manual complete RESYNC is:
eg: RMAN> resync catalog;

(for more information about the RESYNC command please see documentation)
eg:                       before next RESYNC        after RESYNC
arch seq#               : 3  4  5  6  7  8  9       3  4  5  6  7  8  9
on disk                 : |-----------------|       |-----------------|

recorded in controlfile : |-----------|-----|       |-----------------|
                                         |     information from controlfile
                                         V                   |
                             NOT CATALOGED ('unknown')       V
                                                  transferred to catalog
recorded in catalog     : |--------|                |-----------------|
                              |                              |
                          CATALOGED                 CATALOGED ('known')
NOTE: Archived log files that are recorded in the catalog are called
CATALOGED archived log files; only CATALOGED files are 'known' to RMAN
That means there is only one scenario in which the CATALOG has the most up-to-date
information about the log files that were archived on disk: when RMAN is started and a
RESYNC is done after every log switch. In all other situations, only the current database
controlfile has the current information about the log sequences that are archived on disk.
This has some implications on the recovery process when the current controlfile is lost.
RMAN limitations regarding automatic recovery of archived log files

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a crash happens between two RESYNC operations, some archived log files available on disk
are NOT CATALOGED. If the CURRENT CONTROLFILE is also lost in the crash, the possibility
of transferring the information about the archived logs available on disk into the CATALOG using
RESYNC (manual or implicit) is gone. A BACKUP CONTROLFILE does not have this current
information. As already explained, during recovery RMAN (before 9.0.0.0) does not scan the
archive log destinations to search for 'unknown' archived log files and to 'catalog' the ones
found automatically.
In this case, recovery will fail, and manual intervention is needed. In addition to
RESYNC there is another RMAN command that records the information about the archived log
files available on disk in the RMAN CATALOG. This command is 'catalog archivelog'.
DBAs need to run this command manually for each UNCATALOGED archived log file that is available
on disk and needed for recovery.
We can illustrate this with the following example:
RMAN> catalog archivelog 'D:\ARC00007.001';

(for more information about the CATALOG command please see documentation)
                          situation on disk after     after manual intervention
                          crash:                      RMAN> CATALOG archivelog ...
arch seq#               : 3  4  5  6  7  8  9         3  4  5  6  7  8  9
on disk                 : |-----------|-----|         |-----------------|
recorded in catalog     : |--------|     |            |-----------------|
                              |          |                     |
                          CATALOGED  NOT CATALOGED    CATALOGED ARCH FILES
RMAN RECOVERY stops
at seq# 6 and errors ------------->|
after manual 'catalog archivelog'
RMAN RECOVERY can apply all archived log ------------------------------>|
files available on disk if needed for recovery
Manual intervention is not needed when recovery is started using the
CURRENT CONTROLFILE, because an implicit RESYNC is done on recover. This way
all 'unknown' archived log files available on disk are automatically cataloged.
It is important to understand that RMAN basically works only with CATALOGED ('known')
log files. This differs slightly from the way recovery is done with Server Manager, which
is why this issue needs to be explained.
NOTE: Always catalog all archived logs available on disk before starting recovery using
a backup controlfile.
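The rule in the NOTE above can be sketched as a small pre-processing step. The following Python sketch is an illustration only, not an Oracle-supplied tool. It assumes that archived log names follow the ARCnnnnn.001 pattern seen in the example above and that the list of cataloged sequence numbers has already been read from the RMAN catalog. The function names seq_from_name and uncataloged_files are hypothetical.

```python
import re

# Assumed naming pattern, taken from the example above: ARC00007.001 -> seq# 7.
ARC_NAME = re.compile(r"^ARC(\d+)\.\d+$", re.IGNORECASE)


def seq_from_name(filename):
    """Return the log sequence number encoded in an archived log name, or None."""
    match = ARC_NAME.match(filename)
    return int(match.group(1)) if match else None


def uncataloged_files(files_on_disk, cataloged_seqs):
    """Return the on-disk archived logs whose seq# is not in the catalog, oldest first."""
    known = set(cataloged_seqs)
    unknown = []
    for name in files_on_disk:
        seq = seq_from_name(name)
        if seq is not None and seq not in known:
            unknown.append((seq, name))
    return [name for _, name in sorted(unknown)]


if __name__ == "__main__":
    # seq# 3..9 are on disk, but only 3..6 were cataloged before the crash.
    on_disk = ["ARC%05d.001" % n for n in range(3, 10)]
    for name in uncataloged_files(on_disk, cataloged_seqs=[3, 4, 5, 6]):
        print(name)
```

Each file name returned is a candidate for a manual 'catalog archivelog' command before recovery with a backup controlfile is started.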
How RMAN manages the ONLINE log files and the implications for recovery
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Regarding the management of the online redo logs on recovery, RMAN can automate the process in some
situations, and can do more than Server Manager.
The location and the names of the online redo logs are known to RMAN. They are recorded in
the catalog in the RC_REDO_LOG view. As the logs are reused in circular order, their sequence numbers change
after each log switch. From the RMAN point of view, the seq# of the online redo logs needs to be
known only in recovery situations. A RESYNC is needed ONLY if a new redo log member is
added to the database.
Thus, in recovery situations, you need to catalog all online redo logs ONLY if a failure occurs
after a new online redo log was added and no RESYNC was done in between. In this case the new redo
log is completely 'unknown' to RMAN. Hence, manual intervention is needed in
the same way as for the 'unknown' archived redo logs: we need to catalog the 'unknown' online redo
logs needed for recovery.
This is an exceptional case, but is very important.
In some recovery situations, RMAN searches for the 'known' online redo logs in the log destination on
disk and records the seq# of all redo logs found. In this way, RMAN 'knows' the seq# of the
redo logs available on disk after a system failure, and can pass them to the recovery process when the
related sequence is needed.
In other words, RMAN can catalog the available online redo logs automatically by auto-inspecting the
log destination during recovery.
NOTE: This is not the way RMAN handles the archived redo logs at the moment.
Furthermore, if the online redo log searched for cannot be found on disk, but the archived redo log
with the same seq# is cataloged and available on disk, RMAN is able to apply the archived log file instead of
the missing redo log for the requested seq# (see illustrations below).
On the other hand, if an online redo log is not found on disk and no archived log file for this
seq# exists, RMAN will report it as UNKNOWN. So a cataloged, known online redo log
can become 'unknown' during recovery if it cannot be found on disk and no archived log that could
replace it exists. This mainly happens when the CURRENT redo log file is lost and is requested
for recovery.
To show the way RMAN handles the online redo logs on recovery, we need to analyze two
situations: (1) recovery using the current controlfile and (2) recovery using a backup controlfile.
Because online redo logs are mainly applied during complete recovery, we will illustrate
this situation.
We assume that all ARCHIVED REDO LOGS are CATALOGED and available on disk.
situation on disk after crash
NOTE: For seq# 8 and 9 there are two versions of logs on disk:
the online log and the archived log for each seq#
log seq# : 3 4 5 6 7 8 9 10 11 12
ARCHIVED seq# on disk : |--------------------------|

ONLINE redo logs on disk : |-------------|
V
CURRENT LOG:seq#10
logs recorded in catalog : |--------------------------|
| |-------------|
V V
CATALOGED: ARCHIVED LOGS ONLINE LOGS
RMAN limitations regarding automatic recovery of online redo log files

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The limitations in this area depend on the following situations:
COMPLETE RECOVERY using CURRENT CONTROLFILE

3 4 5 6 7 8 9 10 11 seq# archived
applies all CATALOGED ARCH LOGS up to the last ------------>\
seq before the oldest ONLINE log, here seq# 8 \ 9 10 12 seq# online
then it requests all ONLINE logs \-------------->|
|
V
IMPLICATION of this behaviour: online redo seq# 11 is lost
so, recovery stops with an ERROR
In the case above, RMAN does not auto-inspect the log destination to search for the online
redo logs and catalog their seq#. The recovery process will always request the online
redo logs and not the archived log with the same seq#.
This behaviour results in recovery being aborted when one of the online redo logs
is not available, even if the archived version of this log exists and is 'known' to RMAN.
COMPLETE RECOVERY using BACKUP CONTROLFILE

3 4 5 6 7 8 9 10 11 seq# archived
applies all CATALOGED ARCH LOGS up to the last ------------>\ /--\
sequence before the oldest ONLINE log FOUND ON DISK, \-------/ \->|
then switches to the available log with the 9 10 12 seq# online
sequence choosing between online and archived logs |
V
IMPLICATION of this behaviour: online redo seq# 11 is lost
arch log seq# 11 is applied instead
recovery completes successfully
In this case, RMAN auto-inspects the log destination to search for the online
redo logs and automatically catalogs the seq# of the logs found.
If the online redo log searched for cannot be found on disk, but the archived redo log with
the same seq# is cataloged and available on disk, RMAN is able to apply the archived
log file instead of the missing redo log for the requested seq#.
Here, RMAN automates recovery as far as possible and is more capable than Server
Manager.
NOTE: Be aware that you can use RMAN's automatic handling of the online log files
on recover only if you start recovery using a backup controlfile.
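The difference between the two situations can be modelled in a few lines. The Python sketch below is a simplified model of the behaviour described in this section, not RMAN code. The function name plan_recovery and its parameters are hypothetical, and the sequence numbers are taken from the illustrations above (archived logs 3 to 11 on disk, online redo seq# 11 lost).

```python
def plan_recovery(needed, archived, online_known, online_on_disk, controlfile):
    """Return (seq#, source) steps; the last step is (seq#, 'ERROR') if recovery stops.

    needed         -- seq# to apply, in ascending order
    archived       -- seq# of CATALOGED archived logs available on disk
    online_known   -- seq# the online redo logs had before the failure
    online_on_disk -- seq# of the online redo logs actually found on disk
    controlfile    -- 'current' or 'backup'
    """
    if controlfile not in ("current", "backup"):
        raise ValueError("controlfile must be 'current' or 'backup'")
    archived = set(archived)
    on_disk = set(online_on_disk)
    oldest_online = min(online_known)
    steps = []
    for seq in needed:
        if controlfile == "current":
            # Archived logs are used only below the oldest online log;
            # from there on the online logs are always requested.
            if seq < oldest_online:
                source = "archived" if seq in archived else None
            else:
                source = "online" if seq in on_disk else None
        else:
            # Backup controlfile: use the online log if it is found on disk,
            # otherwise fall back to the archived log with the same seq#.
            if seq in on_disk:
                source = "online"
            elif seq in archived:
                source = "archived"
            else:
                source = None
        if source is None:
            steps.append((seq, "ERROR"))
            break
        steps.append((seq, source))
    return steps


if __name__ == "__main__":
    for mode in ("current", "backup"):
        plan = plan_recovery(range(3, 13), range(3, 12), [9, 10, 11, 12], [9, 10, 12], mode)
        print(mode, plan[-1])
```

With these inputs the model stops at seq# 11 when the current controlfile is used, and completes when a backup controlfile is used by taking the archived copy of seq# 11.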
Situations that need to be handled and what to check to identify gaps in the log sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before starting recovery we need to evaluate the situation of the log files available on disk
after the system failure.
We can encounter the following general situations and related common RMAN errors.
NOTE: We do not need to be concerned with the backed up files. They can be restored from the backup.
eg: 1.
log seq# : 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
ON DISK: ARCH : |------------------------------------------|
ONLINE logs: |-----------|
BACKED UP : |-----|
CATALOGED : |---------------------| |-----------|
|<---------------------->|
|
NOT BACKED UP,NOT CATALOGED, ON DISK
|
V
These are 'unknown' logs and RMAN cannot recover them, fails with:
RMAN-06054: media recovery requesting unknown log
SOLUTION: CATALOG archived logs from seq# 11 to seq# 16

Start COMPLETE RECOVERY
eg: 2.
log seq# : 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
ON DISK: ARCH : |-----| GAP on disk |----------------------------|

ONLINE logs : |-----------|
BACKED UP : |-----|
CATALOGED : |------------------------------------| |-----------|
|<------>|
|
NOT BACKED UP,CATALOGED, NOT ON DISK
|
V
These logs are 'missing'. RMAN searches for a backup of them and,
as none can be found, recovery will fail with:
RMAN-06053: unable to perform media recovery because of missing log
NOTE: RMAN can identify gaps in the sequence of the 'known' archived log files
SOLUTION: In this case we have a gap in the seq# of the archived log files.
The state of the archived log files after that first gap is irrelevant, because
we cannot recover past it.
Start INCOMPLETE RECOVERY UNTIL SEQUENCE 9
If you want to automate recovery, you first need to evaluate the situation on disk. You need to
search for gaps in the sequence of the archived log files and online log files
before starting recovery.
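The gap search itself is a simple set operation once the sequence numbers have been collected. The Python sketch below is a minimal illustration under the assumption that the three lists of sequence numbers (backed up, archived on disk, online on disk) are already available. The function name first_gap and the sample numbers are hypothetical.

```python
def first_gap(first_needed, last_needed, backed_up, archived_on_disk, online_on_disk):
    """Return the first seq# in [first_needed, last_needed] for which no copy exists.

    A sequence counts as available if it is in a backup, archived on disk,
    or present as an online redo log. Returns None when there is no gap.
    """
    available = set(backed_up) | set(archived_on_disk) | set(online_on_disk)
    for seq in range(first_needed, last_needed + 1):
        if seq not in available:
            return seq
    return None


if __name__ == "__main__":
    # Illustrative numbers only: seq# 9 and 10 exist nowhere.
    gap = first_gap(3, 17,
                    backed_up=[3, 4, 5],
                    archived_on_disk=[6, 7, 8, 11, 12, 13, 14],
                    online_on_disk=[15, 16, 17])
    print("first gap:", gap)
```

If a gap is returned, only incomplete recovery up to that sequence is possible; if None is returned, complete recovery can be attempted.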
eg: 3.
archived logs : in backup on disk

|-----| |---------------------------------------------|
log seq# : 3 4 5 6 7 8 9 10 11 12 14 15 17
13 14 16
|------------------------------|
online redo logs V
current
|----------------------| |------------------| |
| | |
V V V
Search for gaps: gap here? gaps in this area? gap in this area?
you need to check if |
at least one version ????? |
arch log file or redo log file |
for each seq# exists |
if there is one sequence where |
both are lost, you have a gap |
Gap found: |
1. catalog all archived logs up to the gap |
2. start incomplete recovery with or without backup controlfile |
|
at seq 11 --- recover until seq 11 ----->| |
at seq#15 --- recover until seq 15 ----------------------------->| V
at seq#17 --- recover until seq 17 ----------------------------------->| all current logs lost
No gap found, but some online logs are lost

1. first catalog all archived logs on disk
2. start complete recovery using backup controlfile -------------------------------->|
^^^^^|^^^^^^
V
As discussed above, RMAN takes the archived log
if no online log found for the needed seq#
Doing so, you can use the RMAN automatism in this situation
NOTE:The current controlfile needs to be saved before
Identifying the gaps means you need to find the sequence numbers of the archived logs in
the backup, the sequence numbers of the archived logs on disk, and the sequence numbers of
the redo log files that were current before the media failure occurred.
The sequence numbers of the BACKED UP archived log files can be found in the RMAN catalog.
The sequence numbers of the CATALOGED archived log files can also be found in the RMAN catalog.
The sequence numbers of the archived log files on disk are retrieved by inspecting all archived
log destinations.
The available online redo log files are retrieved by inspecting all log file destinations,
and the related sequences can be found by querying the controlfile views or from the alert
log. If the current controlfile is available you can mount it and join V$LOG and V$LOGFILE
to find out the sequences of the online log files.
If you have lost the current controlfile you cannot query the database before recovery.
The only way to get this information is to scan the alert log from the bottom up
to find the last group of log switches (in order to see the last completed log switch) for all members
of the redo log groups you have.
eg: 4.
example of entries if you have 3 redo log groups (one member in each group)
Thread 1 opened at log sequence 20
Current log# 1 seq# 20 mem# 0: D:\816\ORADATA\ORA816\REDO03.LOG
... ---> oldest online redo log :seq# 20
Tue Mar 27 20:35:59 2001
Thread 1 advanced to log sequence 21
... ---> next online redo log :seq# 21
Tue Mar 27 20:36:15 2001
Thread 1 advanced to log sequence 22
---> CURRENT redo log: seq# 22
Tue Mar 27 20:36:15 2001
ARCH: Beginning to archive log# 2 seq# 21
ARCH: Completed archiving log# 2 seq# 21
---> last ARCHIVED log: seq# 21
ARCHIVED logs : 15 16 17 18 19 20 21
seq# --------------------------------|
ONLINE redo logs: 20 21 22
|-----------|
V | V
oldest redo<=REDO03.LOG | REDO01.LOG=>CURRENT
V
next redo<=REDO02.LOG
Another possibility is to scan the redo log file headers, but this is not intended for customers.
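The alert log scan described above can also be scripted. The following Python sketch is an illustration only: it assumes a single redo thread and alert log messages worded exactly as in the sample entries of eg: 4. The function name summarize_alert_log is hypothetical. Reading the file forward and keeping the last match gives the same result as scanning from the bottom up.

```python
import re

# Message formats assumed from the sample alert log entries shown above.
ADVANCED = re.compile(r"Thread \d+ advanced to log sequence (\d+)")
OPENED = re.compile(r"Thread \d+ opened at log sequence (\d+)")
ARCHIVED = re.compile(r"ARCH: Completed archiving log# \d+ seq# (\d+)")


def summarize_alert_log(lines):
    """Return (current_seq, last_archived_seq) from alert log lines; None if not found."""
    current_seq = None
    last_archived_seq = None
    for line in lines:
        match = ADVANCED.search(line) or OPENED.search(line)
        if match:
            current_seq = int(match.group(1))
            continue
        match = ARCHIVED.search(line)
        if match:
            last_archived_seq = int(match.group(1))
    return current_seq, last_archived_seq


if __name__ == "__main__":
    sample = [
        "Thread 1 opened at log sequence 20",
        "Thread 1 advanced to log sequence 21",
        "Thread 1 advanced to log sequence 22",
        "ARCH: Beginning to archive log# 2 seq# 21",
        "ARCH: Completed archiving log# 2 seq# 21",
    ]
    print(summarize_alert_log(sample))
```

For the sample entries this yields seq# 22 as the CURRENT log and seq# 21 as the last ARCHIVED log, matching the manual reading above.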
Part II Case analysis and error explanation

This section reproduces the error situations described in Part I, using worked examples.
We use simple queries on the RMAN catalog and inspect the log file destinations using OS commands
to evaluate the gaps in the sequence numbers. All steps are performed manually.
The steps performed in the analysis of each case are:
1. Collect the information you need about the archived and online redo log files.
1.1 Find the database ID and the current database INCARNATION (needed to scan the catalog)
1.2 Find the seq# of the CATALOGED archived log files
1.3 Find the seq# of the BACKED UP archived log files
1.4 Find the seq# of the archived log files available on disk
1.5 Find the seq# of the online redo logs available on disk
2. Evaluate the collected information
3. Explain the errors on recovery and interpret the RMAN errorstack
4. Handle the reproduced error accordingly
5. Present the solutions for the analyzed case
Case 1
======
Some archived log files are NOT BACKED UP, NOT CATALOGED, but are ON DISK
Case 2
======
Some archived log files are NOT BACKED UP, are CATALOGED, but are NOT ON DISK
Case 3
======
Online redo logs that are not current are lost, but CATALOGED archived logs with the same seq# are ON
DISK.
Case 4
======
Only the current redo log is lost.
Case 1
======
Some archived log files are NOT BACKED UP, are NOT CATALOGED, but are available ON DISK
This situation occurs primarily when you lose the current controlfile.
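During recovery using a backup controlfile the following errors can be raised:
RMAN-03013: command type: recover
RMAN-20000: abnormal termination of job step
RMAN-06054: media recovery requesting unknown log: thread 1 scn 822898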
Below is the worked example that explains the situation for this error, how to evaluate and solve it.
1. Collect the information you need to evaluate the situation

1.1 Find the database ID and the current database INCARNATION
Query RMAN catalog:

svrmgrl>select * from rc_database_incarnation;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_ RESETLOGS CUR PARENT_DBI
---------- ---------- ---------- -------- ---------- --------- --- ----------
1 1519956463 12 UNKNOWN 782197 14-DEC-00 NO
1 1519956463 2 ORA816 782306 14-DEC-00 YES
We only have one database registered in the RMAN catalog.

The current incarnation is DBINC_KEY = 2
1.2 Find the seq# of the CATALOGED archived log files
Query the RMAN catalog:

svrmgrl> select i.DBID,a.DB_KEY,a.DBINC_KEY,a.DB_NAME,SEQUENCE#,a.FIRST_CHANGE#,
a.NEXT_CHANGE#,a.COMPLETION_TIME,a.STATUS
from RC_ARCHIVED_LOG a, rc_database_incarnation i
where a.DBINC_KEY = i.DBINC_KEY and i.CURRENT_INCARNATION='YES' and i.DBID=1519956463
order by SEQUENCE#;
DB_KEY DBINC_KEY DB_NAME SEQUENCE# FIRST_CHAN NEXT_CHANG COMPLETIO
---------- ---------- -------- ---------- ---------- ---------- ---------
1 2 ORA816 13 802821 802825 27-MAR-01
1 2 ORA816 14 802825 802828 27-MAR-01
1 2 ORA816 15 802828 802831 27-MAR-01
1 2 ORA816 16 802831 802834 27-MAR-01
1 2 ORA816 17 802834 802874 27-MAR-01
1 2 ORA816 18 802874 822898 27-MAR-01
The last cataloged archivelog has seq# 18

1.3 Find the seq# of the BACKED UP archived log files
(we assume that all archived log files needed to make the last backup consistent were backed up)
Query the RMAN catalog:

svrmgrl> select i.DBID,b.DB_KEY,b.DBINC_KEY,b.DB_NAME,SEQUENCE#,b.FIRST_CHANGE#,
b.NEXT_CHANGE#,b.COMPLETION_TIME,b.STATUS
from RC_BACKUP_REDOLOG b, rc_database_incarnation i
where b.DBINC_KEY = i.DBINC_KEY and i.CURRENT_INCARNATION='YES' and i.DBID=1519956463
order by SEQUENCE#;
DB_KEY DBINC_KEY DB_NAME SEQUENCE# FIRST_CHAN NEXT_CHANG COMPLETIO
---------- ---------- -------- ---------- ---------- ---------- ---------
1 2 ORA816 1 782306 802355 26-FEB-01
1 2 ORA816 2 802355 802429 26-FEB-01
1 2 ORA816 3 802429 802431 26-FEB-01
1 2 ORA816 4 802431 802483 26-FEB-01
1 2 ORA816 5 802483 802488 26-FEB-01
1 2 ORA816 6 802488 802787 26-MAR-01
1 2 ORA816 7 802787 802789 26-MAR-01
1 2 ORA816 8 802789 802794 26-MAR-01
1 2 ORA816 9 802794 802812 27-MAR-01
1 2 ORA816 10 802812 802817 27-MAR-01
1 2 ORA816 11 802817 802819 27-MAR-01
1 2 ORA816 12 802819 802821 27-MAR-01
1 2 ORA816 13 802821 802825 27-MAR-01
1 2 ORA816 14 802825 802828 27-MAR-01
1 2 ORA816 15 802828 802831 27-MAR-01
The last backed up archivelog has seq# 15
1.4 Find the seq# of the archived log files available on disk
Inspect all archived log destinations and search for possible gaps in the sequence numbers
D:\816\ORADATA\ora816\archive>ls -lrt
-rw-rw-rw- 1 user group 1024 Mar 27 19:26 ARC00013.001 --> first seq# on disk
-rw-rw-rw- 1 user group 1024 Mar 27 19:26 ARC00014.001
-rw-rw-rw- 1 user group 1024 Mar 27 19:26 ARC00015.001 --> last seq# backed up
==> the first seq# on disk is seq# 13
==> the last seq# on disk is seq# 21
==> there are no gaps in the sequence numbers on disk
==> last backed up seq# was 15
==> there are no gaps in the seq# between last backed up seq# and first seq# on disk
1.5 Find the seq# of the online redo logs available on disk
Inspect all online redo log destinations

D:\816\ORADATA\ora816>ls -l|grep REDO
-rw-rw-rw- 1 user group 1049088 Mar 27 20:36 REDO01.LOG
We have 3 redo log groups (each having one member)

==> all redo log members can be found on disk
==> if the last sequence number of the archived logs available on disk was the last one
successfully archived before the crash, then the seq# for the current logfile
should be seq# 22
==> we check the alert log file as described in Part I
REDO03.LOG -> seq# 20
==> COMPLETE RECOVERY CAN BE DONE!!!
2. Evaluate this information to be able to understand the recovery errors
Above we found the following situation on disk after the crash:
* last backed up seq# was 15

* last cataloged archivelog was seq# 18
* last archived log on disk was seq# 21
* no gaps found
* evaluation:
* seq# 13 14 15 -> in backupset (RC_BACKUP_REDOLOG)
* seq# 13 14 15 16 17 18 -> CATALOGED archived log (RC_ARCHIVED_LOG)
* seq# 13 14 15 16 17 18 19 20 21 -> archived on disk
* seq# 20 21 22 -> CATALOGED online redo logs on disk (RC_REDO_LOG)
* ^^ sequence 19 is UNKNOWN to RMAN
* ^^ sequence 19 is on disk, but not CATALOGED and NOT BACKED UP
* ^^ sequence 19 is recorded only in the current controlfile
In this scenario, a complete recovery is possible because all data needed is available on disk or
can be restored from the backup. But with RMAN this can be done only if recovery is started using
the current controlfile. If the current controlfile was lost in the crash, you need to restore
a backup controlfile. If you start complete recovery using a backup controlfile, RMAN will
fail with the following error:
RMAN>run {
allocate channel d1 type disk;
restore controlfile;
restore database;
sql 'alter database mount';
recover database;
sql 'alter database open resetlogs';
}
**** interpreting the RMAN errorstack (read from bottom up)
**** interpreting the related message before the stack and the real error in the stack:
****
RMAN-08060: unable to find archivelog
RMAN-08510: archivelog thread=1 sequence=19
^^^^^^^^^^^
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
^^^^^^^
* ^^ sequence 19 is UNKNOWN to RMAN
* ^^ sequence 19 is on disk, but not CATALOGED and NOT BACKED UP
* ^^ sequence 19 was recorded only in the current controlfile
RMAN recovered up to 'known' log seq# 18 and then errors. Check the alert log to see
how far recovery was done.
As described in Part I, manual intervention is required if you use a backup controlfile.
You need to catalog the 'unknown' archived log files available on disk:
RMAN> alter database mount;

RMAN> catalog archivelog 'D:\816\ORADATA\ORA816\ARCHIVE\ARC00019.001';
NOTE: The backup controlfile needs to be mounted to be able to catalog the archived log files.
Remember that every action on any file needs to be first recorded in the controlfile.
After this operation you can restart the recovery step.
5. Solutions for Case 1.
The solution depends on these two situations:
CURRENT CONTROLFILE USED:
==> RMAN can handle this situation automatically
Start complete recovery using current controlfile
BACKUP CONTROLFILE USED:
==> RMAN up to 8.1.7 cannot handle this situation automatically - see Part IV
==> The DBA has to automate the process by pre-processing RMAN recovery
Start a customized complete recovery using backup controlfile
run {
restore database;
catalog archivelog 'D:\816\ORADATA\ORA816\ARCHIVE\ARC00019.001'; # generated dynamically
recover database;
}
This script could be created dynamically via an OS shell script.
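As an illustration of such pre-processing, the Python sketch below builds the text of the run block shown above from a list of uncataloged archived log files. It is a sketch only: the function name build_recover_script is hypothetical, the list of files is assumed to have been determined beforehand, and paths containing a single quote are not handled.

```python
def build_recover_script(uncataloged_paths):
    """Return an RMAN run block that catalogs the given files and then recovers.

    The block mirrors the customized script shown above: one
    'catalog archivelog' command is emitted per uncataloged file.
    """
    lines = ["run {", "restore database;"]
    for path in uncataloged_paths:
        lines.append("catalog archivelog '%s';" % path)
    lines.append("recover database;")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    archive_dir = "D:\\816\\ORADATA\\ORA816\\ARCHIVE\\"
    files = [archive_dir + "ARC%05d.001" % seq for seq in (19, 20, 21)]
    print(build_recover_script(files))
```

The generated text can be written to a command file and passed to RMAN. How RMAN is invoked is left to the OS script and is not shown here.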
Case 2
======
Some archived log files are NOT BACKED UP, are CATALOGED, but are NOT ON DISK
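We can get the following errors regardless of whether we use a backup controlfile
or the current controlfile on recover:
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 6 scn 843036 found to restore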
Below is the worked example that explains the situation for this error, how to evaluate and resolve it.

Query RMAN catalog:

svrmgrl>select * from rc_database_incarnation;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_ RESETLOGS CUR PARENT_DBI
---------- ---------- ---------- -------- ---------- --------- --- ----------
1 1519956463 2 ORA816 782306 14-DEC-00 NO
1 1519956463 12 UNKNOWN 782197 14-DEC-00 NO
1 1519956463 258 ORA816 842983 27-MAR-01 YES 2
We only have one database registered in the RMAN catalog

Query RMAN catalog:

svrmgrl> select i.DBID,a.DB_KEY,a.DBINC_KEY,a.DB_NAME,SEQUENCE#,a.FIRST_CHANGE#,
a.NEXT_CHANGE#,a.COMPLETION_TIME,a.STATUS
from RC_ARCHIVED_LOG a, rc_database_incarnation i
where a.DBINC_KEY = i.DBINC_KEY and i.CURRENT_INCARNATION='YES' and i.DBID=1519956463
order by SEQUENCE#;
DBID DB_KEY DBINC_KEY DB_NAME SEQUENCE# FIRST_CHAN NEXT_CHANG COMPLETIO S
---------- ---------- ---------- -------- ---------- ---------- ---------- --------- -
1519956463 1 258 ORA816 1 842983 843024 28-MAR-01 A
1519956463 1 258 ORA816 2 843024 843027 28-MAR-01 A
1519956463 1 258 ORA816 3 843027 843032 28-MAR-01 A
1519956463 1 258 ORA816 4 843032 843034 28-MAR-01 A
1519956463 1 258 ORA816 5 843034 843036 28-MAR-01 A
1519956463 1 258 ORA816 6 843036 843038 28-MAR-01 A
1519956463 1 258 ORA816 7 843038 843040 28-MAR-01 A
1519956463 1 258 ORA816 8 843040 843042 28-MAR-01 A
The last cataloged archivelog has seq# 8

Query RMAN catalog:

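svrmgrl> select i.DBID,b.DB_KEY,b.DBINC_KEY,b.DB_NAME,SEQUENCE#,b.FIRST_CHANGE#,
b.NEXT_CHANGE#,b.COMPLETION_TIME,b.STATUS
from RC_BACKUP_REDOLOG b, rc_database_incarnation i
where b.DBINC_KEY = i.DBINC_KEY and i.CURRENT_INCARNATION='YES' and i.DBID=1519956463
order by SEQUENCE#;
DBID DB_KEY DBINC_KEY DB_NAME SEQUENCE# FIRST_CHAN NEXT_CHANG COMPLETIO S
---------- ---------- ---------- -------- ---------- ---------- ---------- --------- -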
1519956463 1 258 ORA816 1 842983 843024 27-MAR-01 A
1519956463 1 258 ORA816 2 843024 843027 27-MAR-01 A
The last backed up archivelog has seq# 2
Inspect all archived log destinations and search for possible gaps in the sequence numbers
D:\816\ORADATA\ora816\archive>ls -lrt|grep ARC
->ARC00006.001 lost
In this example:
==> the first seq# missing on disk is seq# 6
==> there are gaps in the sequence numbers on disk
==> there are no gaps in the seq# between last backed up seq# and first seq# on disk
==> the missing seq# 6 is not in the backup
==> to see if there is a real gap that cannot be skipped on recover we need to first
check the seq# of the oldest online redo log:
if the seq# of the oldest online redo log is <= seq# 6, then this is not a real gap,
because the online redo logs can be applied, if available!
==> we still need to do the next step

Here, we have 3 redo log groups (each having one member)

==> all redo log members can be found on disk
==> we check the alert log file as described in Part I, and find the following
REDO01.LOG -> seq# 9 --> current
==> Now we compare the seq# of the missing archived log with the seq# of the oldest
redo log found on disk:
seq# of the oldest online redo log is 7
seq# of the lost archived log is 6
==> seq# of the oldest online redo log is greater than that of the missing log
Only now can we say that we found a real gap that cannot be skipped on recover.
==> COMPLETE RECOVERY CANNOT BE DONE!!!
==> the last seq# that can be applied is seq# 5
Above, we found the following situation on disk after the crash:

* last cataloged archivelog was seq# 8
* oldest online redo log on disk was 7
* gap found
* evaluation:
* seq# 1 2 -> in backupset (RC_BACKUP_REDOLOG)
* seq# 1 2 3 4 5 6 7 8 -> CATALOGED archived log (RC_ARCHIVED_LOG)
* seq# 3 4 5 7 8 -> archived on disk
* seq# | 7 8 9 -> CATALOGED online redo logs on disk (RC_REDO_LOG)
* V
* real gap
* seq# 6 -> archived, CATALOGED, BUT MISSING (lost)
* ^^ sequence 6 is KNOWN to RMAN
* ^^ sequence 6 was archived on disk, was CATALOGED but was NOT BACKED UP
* ^^ sequence 6 IS LOST and is the GAP in the sequence numbers
A complete recovery is NOT possible, and the reason is not related to
RMAN limitations. The situation is handled in the same way whether or not the current
controlfile is lost.
If you start complete recovery, RMAN will identify the gap
and fail with the following error. We reproduce this using a backup controlfile but, as mentioned
before, the same error is raised using the current controlfile.
RMAN>run {
restore database;
recover database;
}

****
RMAN-03013: command type: recover(4)
^^^^^^^^^^
**** ^^^^^^^
**** ^^ sequence 6 is known to RMAN, this is why RMAN can identify the gap
**** and reports the file as 'missing'
**** ^^ sequence 6 was archived on disk, was CATALOGED, but NOT BACKED UP
**** ^^ RMAN searches for a backup of seq# 6 but cannot find one, and errors
We check the alert log:

ORA-279 signalled during: alter database recover if needed
start using back...
Wed Mar 28 10:52:26 2001
alter database recover cancel
Recovery was cancelled. RMAN inspected the disk to see whether the known archived logs are
available, identified the gap at this step, searched the backups to see whether the missing
log could be restored and, because no backup was found for this sequence, aborted recovery.
The only way to handle this is to start INCOMPLETE recovery until the missing sequence
number. We do not need to 'catalog archived logs' because all archived log files up to
the missing sequence are already known to RMAN.
The solution is the same regardless of the controlfile used (current or backup controlfile).
The following example uses a backup controlfile and, with the 'set until' clause, directs RMAN
to start incomplete recovery. The last log seq# applied will be 5:
run {
SET UNTIL logseq = 6 thread = 1; # value coded dynamically
restore database;
recover database;
}
This script could be created dynamically via an OS shell script.
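As a hedged illustration of such dynamic generation, the sketch below builds the same run block from the missing sequence number. It is written in Python rather than as an OS shell script, and the function name and parameters are hypothetical; only the RMAN commands themselves are taken from the example above.

```python
def build_incomplete_recovery_script(missing_seq, thread=1):
    """Build an RMAN run block that recovers up to, but not including, missing_seq."""
    if missing_seq < 1:
        raise ValueError("log sequence numbers start at 1")
    lines = [
        "run {",
        # SET UNTIL logseq = N stops recovery before sequence N is applied.
        "SET UNTIL logseq = %d thread = %d;" % (missing_seq, thread),
        "restore database;",
        "recover database;",
        "}",
    ]
    return "\n".join(lines) + "\n"


# seq# 6 is the lost archived log in this case, so seq# 5 is the last one applied.
script = build_incomplete_recovery_script(6)
print(script)
```

The generated text can be written to a command file and passed to RMAN; how the missing sequence number is determined is left to the pre-processing described in this note.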
Case 3
======
Online redo logs that are not current are lost, but CATALOGED archived logs with the same seq#
are ON DISK.
We can get the following errors only if we use the current controlfile:
RMAN-11001: Oracle Error: ORA-00283: recovery session canceled due to errors

ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: 'D:\816\ORADATA\ORA816\REDO01.LOG'
ORA-27041: unable to open file
OSD-04002: unable to open file
The worked example below explains the situation that leads to this error and how to evaluate and resolve it.

Query the RMAN catalog:
SVRMGR> select * from rc_database_incarnation;
---------- ---------- ---------- -------- ---------- --------- --- ----------
         1 1519956463          2 ORA816       782306 14-DEC-00 NO
         1 1519956463         12 UNKNOWN      782197 14-DEC-00 NO
         1 1519956463        258 ORA816       842983 27-MAR-01 NO           2
         1 1519956463        364 ORA816       843037 28-MAR-01 NO         258
         1 1519956463        423 ORA816       843119 30-MAR-01 NO         364
         1 1519956463        474 ORA816       843170 30-MAR-01 NO         423
         1 1519956463        535 ORA816       863254 30-MAR-01 NO         474
         1 1519956463        565 ORA816       883341 30-MAR-01 YES        535
We only have one database registered in the RMAN catalog.
1.2 Find the seq# of the CATALOGED archived log files
This step is not needed because we assume that all archived logs on disk are cataloged.
We should always catalog all archived logs on disk up to the gap (if any) to simplify
automation. The errors raised when this is not done are described in
Case 1.

1.3 Find the seq# of the BACKED UP archived log files
Query the RMAN catalog:
where b.DBINC_KEY = i.DBINC_KEY and b.DBINC_KEY=565 and i.DBID=1519956463
order by SEQUENCE#;
---------- ---------- ---------- -------- ---------- ---------- ---------- --------- -
1519956463          1        565 ORA816            1     883341     883379 31-MAR-01 A
1519956463          1        565 ORA816            2     883379     883399 31-MAR-01 A
1519956463          1        565 ORA816            3     883399     883401 31-MAR-01 A
1519956463          1        565 ORA816            4     883401     883403 31-MAR-01 A
1519956463          1        565 ORA816            5     883403     883405 31-MAR-01 A
1519956463          1        565 ORA816            6     883405     883407 31-MAR-01 A
1519956463          1        565 ORA816            7     883407     883409 31-MAR-01 A
1519956463          1        565 ORA816            8     883409     903411 31-MAR-01 A
1519956463          1        565 ORA816            9     903411     903450 31-MAR-01 A
The last backed up archivelog has seq# 9.
Inspect all archived log destinations, search for possible gaps in the sequence numbers
D:\816\ORADATA\ora816\archive>ls -lrt|grep ARC
In this example:
==> there are no gaps in the sequence numbers on disk
==> there are no gaps in the seq# between the last backed up seq# and the first seq# on disk
==> we need to check if online redo logs are missing
1.5 Find the seq# of the online redo logs available on disk

We have 3 redo log groups (each having one member)

==> REDO02.LOG is missing
==> we check the alert log as described in Part I, and find the seq# for the redo logs
REDO02.LOG -> seq# 13 --> oldest redo log cannot be found on disk
REDO01.LOG -> seq# 15 --> the current log is on disk
==> we check if we have gaps between the oldest available redo log and the last archived log
the seq# of the last archived log is 14 and greater than the seq# of the oldest online redo log
==> we have no gaps here
==> for each missing online redo log we check if an archived log with the same seq# exists
for seq# 13 there is no redo log available on disk, but the archived log with
the same seq# is available on disk
==> the current log is also found on disk
==> we have no real gaps in the seq#
==> COMPLETE RECOVERY CAN BE DONE!!!

In the example listed above, we found the following situation on disk after the crash:

* oldest online redo log on disk was 13

ARCHIVED logs   : ... 9 10 11 12 13 14
seq#              --------------------|
ONLINE redo logs:                   14 15
                                 |-----|
                                 |     V
                                 V     CURRENT
                      lost REDO02.LOG seq# 13
We found NO GAPS, but one REDO LOG that is not current and IS MISSING.
There is an archived log file with the same seq# available on disk.
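The check performed for each missing online redo log can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function and parameter names are hypothetical, the mapping of log file names to seq# would come from the alert log as described in Part I, and the seq# 14 given for REDO03.LOG is assumed for the example.

```python
def check_lost_online_logs(online_logs, logs_on_disk, archived_on_disk, current_seq):
    """Classify lost online redo logs.

    online_logs      -- mapping of online log file name to its seq#
    logs_on_disk     -- names of the online redo log files found on disk
    archived_on_disk -- seq# of the archived logs found on disk
    current_seq      -- seq# of the current online redo log
    Returns (complete_recovery_possible, uncovered), where uncovered lists
    every lost seq# for which no archived log is available.
    """
    uncovered = []
    for name, seq in sorted(online_logs.items(), key=lambda item: item[1]):
        if name in logs_on_disk:
            continue
        # The current log has no archived version, so losing it is a real gap.
        if seq == current_seq or seq not in archived_on_disk:
            uncovered.append(seq)
    return (not uncovered, uncovered)


# Case 3: REDO02.LOG (seq# 13) is lost, archived logs 9..14 are on disk.
online_logs = {"REDO01.LOG": 15, "REDO02.LOG": 13, "REDO03.LOG": 14}  # seq# 14 is assumed
result = check_lost_online_logs(
    online_logs,
    logs_on_disk={"REDO01.LOG", "REDO03.LOG"},
    archived_on_disk={9, 10, 11, 12, 13, 14},
    current_seq=15,
)
print(result)  # prints: (True, [])
```

An empty list of uncovered sequences corresponds to the statement above that no real gap exists, although only a non-current online redo log is missing.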
A complete recovery is possible because all data needed is available on disk or can be
restored from the backup. But recovery will complete successfully only if it is started using
the backup controlfile.
RMAN behaves differently depending on whether a backup controlfile or the current controlfile is used:
When USING BACKUP CONTROLFILE
RMAN auto-inspects the online log destinations, searching for the available online redo logs,
and registers the sequences of all the logs found on disk.
They then become 'known' to RMAN.
RMAN also searches for all cataloged archived logs on disk and checks whether they are available.
This can be seen in the RMAN logfile:
RMAN-03022: compiling command: recover(4)

RMAN-06050: archivelog thread 1 sequence 11 is already on disk as file
D:\816\ORADATA\ORA816\ARCHIVE\ARC00011.001
^^^^ archived version of REDO02.log found
D:\816\ORADATA\ORA816\REDO03.LOG
^^^^ online redo for this seq found
D:\816\ORADATA\ORA816\REDO01.LOG
^^^^ online redo for this seq found
RMAN-03023: executing command: recover(4)
This way RMAN has a list of available files for each sequence, and when the recovery
process prompts for the next seq#, RMAN supplies one of the available log files: the online redo
log or, if this is missing, the archived log file.
Recovery completes successfully.
This behaviour is more capable than recovery from server manager and demonstrates the
automation possibilities of RMAN.
When USING CURRENT CONTROLFILE
RMAN does not auto-inspect the online log destinations.
Only the cataloged archived logs are checked.
This can be seen in the RMAN logfile:
RMAN-03022: compiling command: recover(4)

^^^^ archived version of REDO02.log found
RMAN-03023: executing command: recover(4)
If you start complete recovery using the current controlfile, the following errors are raised:
RMAN> run {
restore database;
sql 'alter database mount'; # mount the current controlfile
recover database;
}
RMAN-00571: ===========================================================
RMAN-03002: failure during compilation of command
RMAN-03006: non-retryable error occurred during execution of command: recover(4)
RMAN-07004: unhandled exception during command execution on channel default
RMAN-10032: unhandled exception during execution of job step 1: ORA-00283: recovery session canceled
due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover
logfile 'D:\816\ORADATA\ORA816\ARCHIVE\ARC00012.001'
RMAN does not auto-inspect the log destinations to search for the online
redo logs and catalog their seq#. The recovery process always requests the online
redo log and not the archived log with the same seq#.
The last archived log applied is ARC00012.001. You can check this by looking in the alert log.
If you get this error, first check whether an archived log for seq# 13 is
available on disk. The error does not mean that you have a real gap in the sequences.
If you find the archived log, you can restart recovery using server manager and manually
apply this archived log. Another option is to restart RMAN recovery using a backup
controlfile, as explained before.
RMAN can handle this automatically if you start complete recovery using the backup controlfile.

The solution depends on which of the following two situations applies:
CURRENT CONTROLFILE USED:
==> RMAN does not handle this situation automatically, and neither does server manager.
You need to restart recovery from server manager and manually apply the archived log file
instead of the missing online redo log.
BACKUP CONTROLFILE USED:
==> RMAN can handle this situation automatically
Start complete recovery using the backup controlfile. RMAN will automatically apply the
available archived log files in place of the missing online redo log files.
Case 4
======
Only the current redo log is lost.
We do not repeat all the steps performed in the previous cases because they do not change.
We assume that, after all checks were done, we find the following situation on disk:
==> the only log file missing is the current log file with seq# 15
==> all logs up to this seq# are archived on disk and cataloged
==> this is the only gap in the log sequence
We need to do incomplete recovery to handle this situation.
The only reason we discuss this case is to explain the errors that would be raised
if you started complete recovery by mistake.
Again, the errors differ depending on whether the current or a backup controlfile is used.
When starting complete recovery USING BACKUP CONTROLFILE you get the following error stack:
RMAN-08060: unable to find archivelog

RMAN-08510: archivelog thread=1 sequence=15
^^^^^^^^^^^
RMAN-03026: error recovery releasing channel resources
RMAN-08031: released channel: d1
RMAN-00571: ===========================================================
RMAN-00571: ===========================================================
^^^^^^
NOTE: this is the same error you get when an archived log file that is NOT BACKED UP
and NOT CATALOGED is needed for recovery.
RMAN searches for the online redo logs, but the current log file is not found on disk,
so RMAN does not know about its existence.
No archived log file for seq# 15 ever existed, so this seq# is completely
unknown to RMAN. This is the reason for this error.
You can get this error in two situations:

- when an archived log file needed for recovery was not cataloged
- when you do complete recovery using a backup controlfile and the current log is lost.
When starting complete recovery USING CURRENT CONTROLFILE you get the following error stack:
RMAN-00571: ===========================================================
RMAN-00571: ===========================================================
RMAN-10032: unhandled exception during execution of job step 3: ORA-00283: recovery session canceled
due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover
logfile 'D:\816\ORADATA\ORA816\ARCHIVE\ARC00003.001'
NOTE: This is the same error you get when you do complete recovery using current
controlfile and one of the online redo logs that was not current was lost.
When the current controlfile is used the online redo logs will be applied.
The recovery can complete ONLY if the current redo log is applied.
There is no archived version of this log.
Recovery takes another code path when the current controlfile is used. This is the reason for the
different errors.
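Because the same error stack can have more than one cause, a pre-processing script may want to map a stack to the situations described in this case. The Python sketch below is an illustration only: the function names are hypothetical, and the mapping covers just the two stacks discussed here, not the full set of RMAN errors.

```python
import re

# Matches codes such as RMAN-08060, ORA-00283 or OSD-04002.
ERROR_CODE = re.compile(r"\b(?:RMAN|ORA|OSD)-\d{3,5}\b")


def extract_error_codes(stack):
    """Return the distinct error codes of a stack, in order of appearance."""
    seen = []
    for code in ERROR_CODE.findall(stack):
        if code not in seen:
            seen.append(code)
    return seen


def possible_causes(stack):
    """Map an error stack to the causes described in this note (Case 3 and Case 4)."""
    codes = set(extract_error_codes(stack))
    causes = []
    if "RMAN-08060" in codes:
        causes.append("an archived log needed for recovery was not cataloged")
        causes.append("backup controlfile used and the current online redo log is lost")
    elif "ORA-00283" in codes and "RMAN-11003" in codes:
        causes.append("current controlfile used and an online redo log is lost")
    return causes


backup_cf_stack = (
    "RMAN-08060: unable to find archivelog\n"
    "RMAN-08510: archivelog thread=1 sequence=15\n"
)
for cause in possible_causes(backup_cf_stack):
    print(cause)
```

The result is a list of candidates; the disk and catalog checks described earlier are still needed to decide which of them applies.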
Start incomplete recovery until the seq# of the missing current online redo log file
(seq# 15 in this example).
Part III Sample RMAN scripts for backup and recover used in the tests
NOTE: in version 8.0.x the 'restore controlfile' command needs to be followed by the
'replicate controlfile' command. In 8.1.x the controlfile is implicitly
replicated with 'restore controlfile'.
INCOMPLETE RECOVERY script using backup controlfile (8.1.x):

run {
SET UNTIL logseq = 6 thread = 1; # step 0 ask for incomplete recovery last seq applied 5
restore controlfile; # step 1 restore a backup controlfile
restore database; # step 2 restore the datafiles
sql 'alter database mount'; # step 3 mount the backup controlfile and recover
recover database;
sql 'alter database open resetlogs';# step 4 when using backup controlfile or set until
}
NOTE: This script starts the recovery process in the background with the following command
(pasted from the alert log file):
Wed Mar 28 20:57:18 2001
alter database recover if needed
start until cancel using backup controlfile
^^^^^^^^^^^^ ^^^^^^^^
COMPLETE RECOVERY script using backup controlfile (8.1.x):

run {
restore controlfile; # step 1 restore a backup controlfile
restore database; # step 2 restore the datafiles
sql 'alter database mount'; # step 3 mount the backup controlfile and recover
recover database;
sql 'alter database open resetlogs';# step 4 when using backup controlfile or set until
}
Wed Mar 28 20:57:18 2001
start using backup controlfile
^^^^^ ^^^^^^
COMPLETE RECOVERY script using current controlfile (same for 8.1.x and 8.0.x):
run {
restore database; # step 1 restore the datafiles
sql 'alter database mount'; # step 2 mount the current controlfile and recover
recover database;
}
Tue Mar 27 22:05:25 2001
start
^^^^^^ This is similar to svrmgrl> recover database;
BACKUP script for full database backup:
The backup order is very important:

run {
allocate channel ch1 type disk;
backup full database format 'D:\backup_%p_%s_%u.%d'; # step 1 includes current controlfile
sql 'alter system archive log current'; # step 2 back up the archived log files
                                        # needed to make the above backup consistent
backup archivelog all delete input format 'D:\backup\al_backup_%p_%s_%u.%d';
}
Part IV 9i enhancements related to automatic recovery with RMAN
In 9i, on complete recovery RMAN auto-inspects all known archived log destinations, catalogs
the archived logs found and continues with recovery. This applies whether a backup or the current
controlfile is used.
If an archived log is not found in the controlfile, RMAN auto-inspects the destinations and catalogs it.
This enhancement request was reported in [BUG:1456351].
The auto-inspection eliminates manual intervention for uncataloged archived log files that are
available on disk. It is not needed when the current controlfile
is used (because on recover an implicit resync can be done that catalogs all the archived logs).
Furthermore, when using a backup controlfile, RMAN is able to auto-inspect the online
log destinations to look for missing online redo logs, and can apply the archived logs instead.
So it seems that complete recovery using a backup controlfile is 'fully automated' in 9i.
This enhancement request was reported in new [BUG:1749049].
RELATED DOCUMENTS
~~~~~~~~~~~~~~~~~
[NOTE:110160.1] RMAN-06026 RMAN-06025 restore archivelogs seperately
[NOTE:94213.1] RMAN-6025 RMAN-6026 During Restoration of Archive Logs
[NOTE:100565.1] RMAN-6026 RMAN-6023 during restore
[NOTE:108883.1] RMAN-6023 when duplicating a database

