Вы находитесь на странице: 1из 2

Hands-on Oracle: Backup & Recovery Games DatabaseJournal.

com

1 of 2

http://www.databasejournal.com/features/oracle/article.php/3732711/...

Featured Database Articles

Oracle
Posted Mar 10, 2008

Hands-on Oracle: Backup & Recovery Games


By Steve Callan
Games in Oracle? No, the games reference here is like the games in Olympic Games. The athletes competing in games at that level are
serious about what they do, and you can pretty much bet the farm they are passionate about their sport or event as well. As an Oracle DBA,
how serious and passionate are you about your job? Is job even the right word here? For serious practitioners, profession is probably a
better choice. The games or events were going to play, more specifically, what we learn from them, in this article are practical and necessary
skills a DBA should have at his or her disposal.

Loss of one or more special files


Special files in our world are the following: control files, online redo logs, archived redo logs, and datafiles. Lets say you have several control
files, identified (of course) in the initialization parameter file (Ill use a pfile for this article for quicker editing). Youve read about it a million
times: if you lose a control file, the recovery or fix consists of copying the (or a) good one into the errant files location, or remove the
reference to the file from the parameter file. After a control file is lost (for whatever reason), what happens to the database?
1. Operations can still take place, having other control files online means you can survive the loss of one or more control files
2. You (the DBA) have to shutdown, restore the file (or remove it from the init.ora file) and startup
3. Oracle shuts down/the instance is terminated (i.e., you have no control over that; its happening whether or not you like it)
The reason I ask is this: what is the difference between control files and online redo logs in terms of multiplexing them? For instance, what
happens to the database (or instance) if a member of the active group is lost? Does Oracle keep going, or do things come to a screeching halt?
In the following experiment, Ill place control03.ctl onto a flash drive, bounce the instance, and then pull the flash drive out of the USB port to
simulate media failure.
Current listing of control files

*.control_files='D:\oracle\product\10.2.0/oradata/db10/\control01.ctl',
'D:\oracle\product\10.2.0/oradata/db10/\control02.ctl',
'D:\oracle\product\10.2.0/oradata/db10/\control03.ctl'

New listing of control files

*.control_files='D:\oracle\product\10.2.0/oradata/db10/\control01.ctl',
'D:\oracle\product\10.2.0/oradata/db10/\control02.ctl',
'F:\oracle\control03.ctl'

Confirmation of the new control file:


SQL> startup
ORACLE instance started.
Total System Global Area 289406976 bytes
Fixed Size
1290184 bytes
Variable Size
104857656 bytes
Database Buffers
176160768 bytes
Redo Buffers
7098368 bytes
Database mounted.
Database opened.
SQL> col name for a52
SQL> select name from v$controlfile;
NAME
---------------------------------------------------D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\CONTROL01.CTL
D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\CONTROL02.CTL
F:\ORACLE\CONTROL03.CTL

Now Im ready to disconnect the flash drive. With no direct warning (nothing raised to the user interface until an operation is attempted), the
instance crashes.
SQL> select name from v$controlfile;
select name from v$controlfile
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

The alert log shows the following (plus much more; this is the relevant part):
Thu Mar 06 12:41:15 2008
Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_ckpt_2756.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: 'F:\ORACLE\CONTROL03.CTL'
ORA-27072: File I/O error
OSD-04008: WriteFile() failure, unable to write to file
O/S-Error: (OS 1006) The volume for a file has been externally altered so that the opened file is no longer valid.
some more messages
Thu Mar 06 12:41:29 2008
Instance terminated by CKPT, pid = 2756

The fix part is easy, well known, and well documented elsewhere, but were you absolutely certain what actually happens when a control file is
lost?
Fix the problem, reconnect and startup

SQL> prompt Remove the reference to control03


SQL> startup
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
SQL> conn sys/oracle as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 289406976 bytes
Fixed Size
1290184 bytes
Variable Size
104857656 bytes
Database Buffers
176160768 bytes
Redo Buffers
7098368 bytes
Database mounted.
Database opened.
SQL>

Coming back to online redo log files, is the loss of a member survivable, that is, does the instance crash like it did for loss of a control file? The
answer is: it depends. A group can have one or more members. Lets look at #members = 1. Like the previous example, Ill create a group with
a member on a flash drive, and make that group the active group (what is the meaning of STATUS for the values of ACTIVE, CURRENT,
INACTIVE and UNUSED?). Once the group is active, Ill pull the plug. More than one way to do this, but what is the current redo log group?
From the alert log:
Extract from the alert log

Thu Mar 06 13:22:11 2008


Thread 1 advanced to log sequence 211

29-12-2015 17:34

Hands-on Oracle: Backup & Recovery Games DatabaseJournal.com

2 of 2

http://www.databasejournal.com/features/oracle/article.php/3732711/...

Extract from the alert log

Thu Mar 06 13:22:11 2008


Thread 1 advanced to log sequence 211
Current log# 10 seq# 211 mem# 0: F:\ORACLE\REDO10.LOG

Just like the loss of a control file, the instance is terminated.


SQL> select * from v$logfile;
GROUP# STATUS TYPE
MEMBER
---------- ------- ------- -----------------------------------------------7
ONLINE D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO07.LOG
8
ONLINE D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO08.LOG
10
ONLINE F:\ORACLE\REDO10.LOG
9
ONLINE D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO09.LOG
SQL> select * from v$logfile;
select * from v$logfile
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

In this case, it was LGWR that was responsible for terminating the instance. Note that it was CKPT in the other example. What gets recorded
in the control file? The SCN. Which process stamps the control file with the SCN? The checkpoint process.
If trying this at home, running the database is noarchivelog mode, and using only one member in the current group, what will it take to be able
to open the database again?
Had that group been multiplexed, and the same file was pulled from the system, what happens? Life goes on, and the alert log is written to
with a message about the missing member. Lets add a second member to group 10, placing it on a different (i.e., permanent) drive, and then
pull the flash drive again (that should be a hint about what it takes to open the database).
Here is what the alert log shows:
Thu Mar 06 14:51:45 2008
Thread 1 advanced to log sequence 216
Current log# 10 seq# 216 mem# 0: F:\ORACLE\REDO10.LOG
Current log# 10 seq# 216 mem# 1: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO11.LOG
Thu Mar 06 14:53:36 2008
Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2224.trc:
ORA-00345: redo log write error block 23 count 2
ORA-00312: online log 10 thread 1: 'F:\ORACLE\REDO10.LOG'
ORA-27072: File I/O error
OSD-04008: WriteFile() failure, unable to write to file
O/S-Error: (OS 1006) The volume for a file has been externally altered so that the opened file is no longer valid.
Thu Mar 06 14:53:36 2008
Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2224.trc:
ORA-00343: too many errors, log member closed
ORA-00346: log member marked as STALE
ORA-00312: online log 10 thread 1: 'F:\ORACLE\REDO10.LOG'

The best part about the messages is the lack of two words: Instance terminated. After a shutdown and startup, the alert log will still complain
about the missing file, but the bottom line is that the database will be open.
Thu Mar 06 14:56:51 2008
ALTER DATABASE OPEN
Thu Mar 06 14:56:52 2008
Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2288.trc:
ORA-00313: open failed for members of log group 10 of thread 1
Thu Mar 06 14:56:52 2008
Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2288.trc:
ORA-00313: open failed for members of log group 10 of thread 1
Thu Mar 06 14:56:52 2008
Thread 1 opened at log sequence 216
Current log# 10 seq# 216 mem# 1: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO11.LOG
Successful open of redo thread 1
some other messages
Thu Mar 06 14:56:59 2008
Completed: ALTER DATABASE OPEN

Loss of datafiles
So far, recovering from loss of a control file or redo log file is pretty close to losing no data. The single member redo log group example is a
case where data loss can take place. With respect to datafiles, there are two kinds: the ones that matter to you (your data) and the ones that
matter to Oracle (its data). To clarify this categorization, here are two questions:
1. What happens if one your tablespaces loses a datafile?
2. What happens if the SYSTEM tablespace, as an example of an owned-by-Oracle tablespace, loses a datafile?
If you answered those questions as is, what was your frame of reference with respect to archiving? If you didnt consider archiving when
formulating your answers, then you stand a 50-50% chance of having been wrong.
If operating in NOARCHIVELOG mode, and you lose a datafile any datafile its game over, instance terminates. If operating in
ARCHIVELOG mode, then the answers to the two questions above are different: loss of your datafile means the instance keeps going, loss of
an Oracle specific datafile means the instance is terminated. The section titled Recovering After the Loss of Datafiles: Scenarios in the
Backup and Recovery Advanced Users Guide spells this out quite nicely.

In Closing
Pretty much all of the backup and recovery scenarios related to loss of special files can be replicated on a PC using a very inexpensive flash
drive. Whether on an AIX 5300L gazillion CPU machine with a gazillion megabytes of RAM or on your personal desktop/laptop that just
barely runs Oracle, the principles are virtually the same. Give it a try and see for yourself. These are drills best practiced without the pressure of
having to do these things when your instance is crashed and time is money.

29-12-2015 17:34

Вам также может понравиться