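Plenty of fixes for this error are suggested online, but unfortunately none of them worked here. In this situation the database usually has to be forced open, and the first step is an incomplete recovery, as follows: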
SQL> recover database
ORA-00279: change 236912204 generated at 09/29/2015 12:49:13 needed for thread 1
ORA-00289: suggestion : /xxxx/1_5112_877094801.dbf
ORA-00280: change 236912204 for thread 1 is in sequence #5112
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log '/xxxx/1_5112_877094801.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
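After these operations I backed up the current control file information, so that any later problem would be easier to handle.
While forcing the database open, the following error was reported: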
Sat Oct 3 11:49:31 2015
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Sat Oct 3 11:49:33 2015
Errors in file /oracle/oracle/admin/cwdb/udump/cwdb1_ora_6029586.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [1], [18446744072394632417], [1],
[18446744072392296306], [], [], []
Sat Oct 3 11:49:34 2015
Errors in file /oracle/oracle/admin/xxxx/udump/xxxx1_ora_6029586.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [1], [18446744072394632417], [1],
[18446744072392296306], [], [], []
Sat Oct 3 11:49:34 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 6029586
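I have handled this error many times. A quick search on Baidu turns up many articles about it, and even the Oracle MOS note explains that it is caused by an overly large SCN on a temp block and can be bypassed by dropping the tempfile. In practice, that does nothing in this situation. Either way, the problem is clearly related to the block SCN. Since it is SCN-related, it is not hard to handle: advancing the SCN is enough.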
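The arguments of the ORA-00600 [kclchkblk_4] line in the alert log above can be read as two wrap/base SCN pairs. The following is a minimal sketch, not an official decoding: it assumes that the large values are sign-extended 32-bit SCN bases, and that the first pair belongs to the block and the second to the database. The helper name scn_from_args is made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def scn_from_args(wrap: int, base: int) -> int:
    """Combine a wrap/base pair into a single SCN (wrap * 2**32 + base).

    Assumption: 'base' may arrive sign-extended to 64 bits, so only the
    low 32 bits are kept.
    """
    return (wrap << 32) | (base & MASK32)


# Arguments copied from the ORA-00600 [kclchkblk_4] line in the alert log.
block_scn = scn_from_args(1, 18446744072394632417)
db_scn = scn_from_args(1, 18446744072392296306)

print("first pair :", hex(block_scn >> 32), hex(block_scn & MASK32))
print("second pair:", hex(db_scn >> 32), hex(db_scn & MASK32))
print("difference :", block_scn - db_scn)
```

Under these assumptions the first pair is 2336111 ahead of the second, which would be consistent with a block SCN that is higher than the database SCN and with the observation that advancing the SCN gets past the error.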
After advancing the SCN, open resetlogs succeeded and the database opened. Unfortunately the alert log reported a pile of errors, as shown below:
Sat Oct 3 13:10:34 2015
Errors in file /oracle/oracle/admin/xxxx/bdump/xxxx1_smon_10420246.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 40
Sat Oct 3 13:10:35 2015
ORACLE Instance xxxx1 (pid = 25) - Error 600 encountered while recovering transaction (23, 85).
Sat Oct 3 13:10:35 2015
Errors in file /oracle/oracle/admin/xxxx/bdump/xxxx1_smon_10420246.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sat Oct 3 13:10:35 2015
Trace dumping is performing id=[cdmp_20151003131035]
Sat Oct 3 13:10:35 2015
replication_dependency_tracking turned off (no async multimaster replication found)
Sat Oct 3 13:10:36 2015
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Sat Oct 3 13:10:36 2015
Errors in file /oracle/oracle/admin/xxxx/bdump/xxxx1_smon_10420246.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sat Oct 3 13:10:37 2015
Starting background process QMNC
Sat Oct 3 13:10:37 2015
ORACLE Instance xxxx1 (pid = 25) - Error 600 encountered while recovering transaction (23, 85).
Sat Oct 3 13:10:37 2015
Errors in file /oracle/oracle/admin/xxxx/bdump/xxxx1_smon_10420246.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
QMNC started with pid=53, OS id=7536816
Sat Oct 3 13:10:41 2015
LOGSTDBY: Validating controlfile with logical metadata
Sat Oct 3 13:10:41 2015
LOGSTDBY: Validation complete
Sat Oct 3 13:10:46 2015
Errors in file /oracle/oracle/admin/xxxx/bdump/xxxx1_mmon_9110004.trc:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
Sat Oct 3 13:10:48 2015
Errors in file /oracle/oracle/admin/xxxx/udump/xxxx1_ora_6619434.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Completed: alter database open resetlogs
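On the surface the whole recovery looked simple and smooth, but the real difficulty was only just beginning.
It was the last, seemingly trivial error, ORA-00600 [kdsgrp1], that caused us the most trouble. First, let's look at which objects were involved when the error was raised: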
Validate domain 0
Validated domain 0, flags = 0x0
kwqmnich: current time:: 13: 31: 34
kwqmnich: instance no 0 check_only flag 1
kwqmnich: initialized job cache structure
row 0041edda.2e continuation at
file# 1 block# 126426 slot 47 not found
**************************************************
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 47 ..... nrows: 175
**************************************************
*** 2015-10-03 13:31:40.864
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT OWNER,NAME,TYPE,COUNT(*) FROM DBA_SOURCE WHERE
SUBSTR(OWNER,1,4)='FMIS' GROUP BY OWNER,NAME,TYPE HAVING COUNT(*)>1000
----- PL/SQL Call Stack -----
Object id on Block? Y
seg/obj: 0x12 csc: 0x01.b1957474 itc: 3 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0009.00e.0003968d 0x0c07f50f.1ea7.5d --U- 1 fsc 0x0053.b1957475
0x02 0x0017.005.0006755c 0x0c0774de.1be3.38 C--- 0 scn 0x0001.b1957451
0x03 0x0002.042.00010d39 0x0080b268.0cf1.22 C--- 0 scn 0x0001.b1957429
data_block_dump,data header at 0x110f46074
===============
tsiz: 0x3f88
hsiz: 0x170
pbl: 0x110f46074
bdba: 0x0041edda
76543210
flag=--------
ntab=1
nrow=175
frre=0
fsbo=0x170
fseo=0x783
avsp=0x3cbe
tosp=0x3d13
0xe:pti[0] nrow=175 offs=0
0x12:pri[0] sfll=1
0x14:pri[1] sfll=2
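We can see that, apart from the other non-core objects, obj#=18 is also involved, that is, obj$, a core data dictionary table. The indexes on this table, i_obj1, i_obj2 and i_obj3, all have an object_id below 57 and belong to the core data dictionary objects in bootstrap$. Even though they are only indexes, they cannot be rebuilt with rebuild, with event 38003, or in upgrade mode.
That does not mean the data dictionary table above cannot be rebuilt at all; a later article explains how to rebuild it.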
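The identifiers in the trace above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming the usual smallfile layout of a 32-bit relative data block address (upper 10 bits relative file#, lower 22 bits block#); decode_rdba is a hypothetical helper name, not an Oracle API.

```python
def decode_rdba(rdba: int) -> tuple:
    """Split a 32-bit relative data block address into (file#, block#).

    Assumption: smallfile tablespace layout, 10 bits of relative file
    number followed by 22 bits of block number.
    """
    return rdba >> 22, rdba & 0x3FFFFF


# Values copied from the trace: "bdba: 0x0041edda" and "seg/obj: 0x12".
file_no, block_no = decode_rdba(0x0041EDDA)
obj_no = int("0x12", 16)

print(file_no, block_no, obj_no)  # prints: 1 126426 18
```

The result matches the "file# 1 block# 126426" line of the trace, and seg/obj 0x12 is the obj#=18 discussed above.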
During the analysis I found that the first two indexes were both damaged, as follows:
SQL> analyze table obj$ validate structure;
Table analyzed.
SQL> select index_name from dba_indexes where table_name='OBJ$';
INDEX_NAME
------------------------------
I_OBJ1
I_OBJ2
I_OBJ3
SQL> analyze index I_OBJ1 validate structure;
analyze index I_OBJ1 validate structure
*
ERROR at line 1:
ORA-08100: index is not valid - see trace file for diagnostics
SQL> analyze index I_OBJ2 validate structure;
analyze index I_OBJ2 validate structure
*
ERROR at line 1:
ORA-08100: index is not valid - see trace file for diagnostics
SQL> analyze index I_OBJ3 validate structure;
Index analyzed.
DESCRIPTION:
This error was introduced in 10g with the fix for Bug 2442351. It provides an extra health check
on a block and is raised when a null row header is detected; see Note:2442351.9 for more information.
Error may be caused by:
Case 1. A row referenced in an index that does not exist in the table.
Case 2. A non-existent rowid pointed to by a chained row.
Trace Examples:
Case 2. A row points to another rowid which does not exist (Chained row does not exist).
========================================================================
Trace file has:
It means that row with rdba 0x1186b11a continues in file# 70 block# 441621 slot 1.
But the information in file# 70 block# 441621 slot 1 does not exist. It is:
tab 0, row 16, @0xd7f ---> This is the slot with the problem.
tl: 29 fb: -------- lb: 0x0 cc: 11
nrid: 0x1186bd15.1 ---> It points to rdba=0x1186bd15 slot 1
(file# 70 block# 441621 slot 1) but that row does not exist in that block.
In this case ANALYZE TABLE .. VALIDATE STRUCTURE does not detect this logical
corruption.
Reference: Bug 6858313
Run an export (exp) or Full Table Scan to identify if there is a permanent invalid chained row.
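The nrid value quoted in the Case 2 example can be decoded to confirm the file#, block# and slot given in the note. This is a minimal sketch, assuming the smallfile rdba layout (10-bit file#, 22-bit block#) and that the part after the dot is a hexadecimal slot number; parse_nrid is a hypothetical helper name.

```python
def parse_nrid(nrid: str) -> tuple:
    """Parse an nrid such as '0x1186bd15.1' into (file#, block#, slot).

    Assumptions: the part before the dot is a 32-bit rdba in hex
    (10-bit file#, 22-bit block#); the part after the dot is the slot,
    also read as hex.
    """
    rdba_hex, slot_hex = nrid.split(".")
    rdba = int(rdba_hex, 16)
    return rdba >> 22, rdba & 0x3FFFFF, int(slot_hex, 16)


print(parse_nrid("0x1186bd15.1"))  # prints: (70, 441621, 1)
```

This agrees with the "file# 70 block# 441621 slot 1" location named in the example.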
FUNCTIONALITY:
Kernel Data layer Seek/Scan
IMPACT:
PROCESS FAILURE
POSSIBLE PHYSICAL CORRUPTION
@?/rdbms/admin/catalog.sql
@?/rdbms/admin/catproc.sql
@?/rdbms/admin/utlirp.sql
@?/rdbms/admin/utlrp.sql
@?/rdbms/admin/catupgrd.sql
Using dbms_repair to repair this. Checklist (sources are at the bottom):
1. Connected as SYS, create the repair table 'REPAIR_TABLE' and the view 'DBA_REPAIR_TABLE':
BEGIN
  -- Creates the repair table in the USERS tablespace.
  DBMS_REPAIR.ADMIN_TABLES (
    TABLE_NAME => 'REPAIR_TABLE',
    TABLE_TYPE => dbms_repair.repair_table,
    ACTION     => dbms_repair.create_action,
    TABLESPACE => 'USERS');
END;
/
6. Fix it by marking the blocks as corrupt (afterwards a full table scan raises ORA-01578):
SET SERVEROUTPUT ON
DECLARE
  num_fix INT;
BEGIN
  num_fix := 0;
  -- Marks the blocks recorded in REPAIR_TABLE as software corrupt.
  DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
    SCHEMA_NAME       => 'DWH_TEST',
    OBJECT_NAME       => 'DWH',
    OBJECT_TYPE       => dbms_repair.table_object,
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    FIX_COUNT         => num_fix);
  DBMS_OUTPUT.PUT_LINE('num fix: ' || TO_CHAR(num_fix));
END;
/
8. Find index entries pointing to the corrupt blocks (example of two indexes):
SET SERVEROUTPUT ON
DECLARE
  num_orphans INT;
BEGIN
  num_orphans := 0;
  -- Writes index keys that point into corrupt blocks to ORPHAN_KEY_TABLE.
  DBMS_REPAIR.DUMP_ORPHAN_KEYS (
    SCHEMA_NAME       => 'DWH_TEST',
    OBJECT_NAME       => 'IND_DWH_01',
    OBJECT_TYPE       => dbms_repair.index_object,
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    ORPHAN_TABLE_NAME => 'ORPHAN_KEY_TABLE',
    KEY_COUNT         => num_orphans);
  DBMS_OUTPUT.PUT_LINE('orphan key count: ' || TO_CHAR(num_orphans));
END;
/
BEGIN
  -- Rebuilds the freelists of the repaired table.
  dbms_repair.rebuild_freelists (
    schema_name => 'DWH_TEST',
    object_name => 'DWH',
    object_type => dbms_repair.table_object);
END;
/
Sources: