
RAC Troubleshooting

Julian Dyke
Independent Consultant


Web Version - May 2008

2008 Julian Dyke

juliandyke.com

Agenda

Installation and Configuration
Oracle Clusterware
ASM and RDBMS


Installation and Configuration



Cluster Verification Utility Overview


Introduced in Oracle 10.2
Checks cluster configuration
  stages - verifies all steps for specified stage have been completed
  components - verifies specified component has been correctly installed
Supplied with Oracle Clusterware
Can be downloaded from OTN (Linux and Windows)
Also works with 10.1 (specify -10gR1 option)
For earlier versions see Metalink Note 135714.1 Script to Collect RAC Diagnostic Information (racdiag.sql)


Cluster Verification Utility CVUQDISK Package

On the Red Hat 4 and Enterprise Linux platforms, the following additional RPM is required for CLUVFY:
cvuqdisk-1.0.1-1.rpm

This package is supplied in the clusterware/cluvfy/rpm directory on the clusterware CD-ROM
It can also be downloaded from OTN
On each node, as the root user, install the RPM using:
rpm -ivh cvuqdisk-1.0.1-1.rpm


Cluster Verification Utility Stages

CLUVFY stages include:

-post hwos      post-check for hardware and operating system
-pre cfs        pre-check for CFS setup
-post cfs       post-check for CFS setup
-pre crsinst    pre-check for Oracle Clusterware installation
-post crsinst   post-check for Oracle Clusterware installation
-pre dbinst     pre-check for database installation
-pre dbcfg      pre-check for database configuration


Cluster Verification Utility Components

CLUVFY components include:

nodereach   Checks reachability between nodes
nodecon     Checks node connectivity
cfs         Checks CFS integrity
ssa         Checks shared storage accessibility
space       Checks space availability
sys         Checks minimum system requirements
clu         Checks cluster integrity
clumgr      Checks cluster manager integrity
ocr         Checks OCR integrity
crs         Checks Oracle Clusterware (CRS) integrity
nodeapp     Checks node applications exist
admprv      Checks administrative privileges
peer        Compares properties with peers


Cluster Verification Utility Example

For example, to check the configuration before installing Oracle Clusterware on nodes london1 and london2 use:

sh runcluvfy.sh stage -pre crsinst -n london1,london2

Checks:
  node reachability
  user equivalence
  administrative privileges
  node connectivity
  shared storage accessibility
If any checks fail, append -verbose to display more information


Cluster Verification Utility Trace & Diagnostics

To enable trace in CLUVFY use:

export SRVM_TRACE=true

Trace files are written to the $CV_HOME/cv/log directory
By default this directory is removed immediately after CLUVFY has executed
On Linux/Unix comment out the following line in runcluvfy.sh:
# $RM -rf $CV_HOME

Pathname of CV_HOME directory is based on the operating system process ID e.g.:

/tmp/18124

It can be useful to echo the value of CV_HOME in runcluvfy.sh:

echo CV_HOME=$CV_HOME


Oracle Universal Installer (OUI) Trace & Diagnostics

On Unix/Linux to launch the OUI with tracing enabled use:

./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2

Log files will be written to $ORACLE_BASE/oraInventory/logs
To trace root.sh execute it using:

sh -x root.sh

Note that it may be necessary to clean up the CRS installation before executing root.sh again


DBCA Trace & Diagnostics

To enable trace for the DBCA in Oracle 9.0.1 and above

Edit $ORACLE_HOME/bin/dbca and change

# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m \
  -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

to

# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m \
  -DTRACING.ENABLED=true -DTRACING.LEVEL=2 \
  -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

Redirect standard output to a file e.g.

$ dbca > dbca.out &


Oracle Clusterware


Oracle Clusterware Overview

Provides
  Node membership services (CSS)
  Resource management services (CRS)
  Event management services (EVM)
In Oracle 10.1 and above resources include
  Node applications
  ASM Instances
  Database Instances
  Services
Node applications include:
  Virtual IP (VIP)
  Listeners
  Oracle Notification Service (ONS)
  Global Services Daemon (GSD)


Oracle Clusterware Virtual IP (VIP)

Node application introduced in Oracle 10.1
Allows Virtual IP address to be defined for each node
All applications connect using Virtual IP addresses
If node fails Virtual IP address is automatically relocated to another node
Only applies to newly connecting sessions


Oracle Clusterware VIP (Virtual IP) Node Application


[Figure: Before and After diagrams of a two-node cluster. Before: Node 1 hosts VIP1, Listener1 and Instance1; Node 2 hosts VIP2, Listener2 and Instance2. After: VIP1 appears alongside VIP2 and Listener2 on Node 2.]


Oracle Clusterware VIP (Virtual IP) Node Application

On Linux during normal operation, each node will have one VIP address. For example:

[root@server3]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:11:D8:58:05:99
          inet addr:192.168.2.103  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::211:d8ff:fe58:599/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10326 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:684579 (668.5 KiB)  TX bytes:1449071 (1.3 MiB)
          Interrupt:217 Base address:0x8800

eth0:1    Link encap:Ethernet  HWaddr 00:11:D8:58:05:99
          inet addr:192.168.2.203  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:217 Base address:0x8800

The resource for VIP address 192.168.2.203 is initially running on server3


Oracle Clusterware VIP (Virtual IP) Node Application

If Oracle Clusterware on server3 is shut down, the VIP resource is transferred to another node (in this case server11)

[root@server11]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
          inet addr:192.168.2.111  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21d:7dff:fea3:a55/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2792 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4097 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:329891 (322.1 KiB)  TX bytes:593615 (579.7 KiB)
          Interrupt:177 Base address:0x2000

eth0:1    Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
          inet addr:192.168.2.211  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x2000

eth0:2    Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
          inet addr:192.168.2.203  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x2000


Oracle Clusterware VIP Failover


VIP addresses can occasionally be failed over incorrectly. For example:

HA Resource          Target        State
-----------          ------        -----
ora.server11.vip     application   ONLINE on server11
ora.server12.vip     application   ONLINE on server12
ora.server3.vip      application   ONLINE on server11
ora.server4.vip      application   ONLINE on server4

[root@server3]# ./crs_relocate ora.server3.vip -c server3
Attempting to stop `ora.server3.vip` on member `server11`
Stop of `ora.server3.vip` on member `server11` succeeded.
Attempting to start `ora.server3.vip` on member `server3`
Start of `ora.server3.vip` on member `server3` succeeded.

HA Resource          Target        State
-----------          ------        -----
ora.server11.vip     application   ONLINE on server11
ora.server12.vip     application   ONLINE on server12
ora.server3.vip      application   ONLINE on server3
ora.server4.vip      application   ONLINE on server4


Oracle Clusterware Logging

In Oracle 10.2, Oracle Clusterware log files are created in the $CRS_HOME/log directory
  can be located on shared storage
$CRS_HOME/log directory contains a subdirectory for each node e.g. $CRS_HOME/log/server6
$CRS_HOME/log/<node> directory contains:
  Oracle Clusterware alert log e.g. alertserver6.log
  client - logfiles for OCR applications including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP and OIFCFG
  crsd - logfiles for CRS daemon including crsd.log
  cssd - logfiles for CSS daemon including ocssd.log
  evmd - logfiles for EVM daemon including evmd.log
  racg - logfiles for node applications including VIP and ONS


Oracle Clusterware Log Files

Log File locations in $ORA_CRS_HOME

$ORA_CRS_HOME
  log
    <nodename>
      alert<nodename>.log
      client
      crsd
      cssd
      evmd
      racg
        racgeut
        racgimon
        racgmain


Oracle Clusterware Log Files

Log File locations in $ORACLE_HOME (RDBMS and ASM)

$ORACLE_HOME
  log
    <nodename>
      client
      racg
        racgeut
        racgimon
        racgmain
        racgmdb


Oracle Clusterware Troubleshooting


If OCR or voting disk are not available, error files may be created in /tmp e.g. /tmp/crsctl.4038
For example, if OCR cannot be found:

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage
Operating System error [No such file or directory] [2]

OCR is inaccessible - no CRS daemons will start
No errors written to log files

If Voting Disk has incorrect ownership

clsscfg_vhinit: unable(1) to open disk (/dev/raw/raw2)
Internal Error Information:
  Category: 1234
  Operation: scls_block_open
  Location: statfs
  Other: statfs failed /dev/raw/raw2
  Dep: 2
Failure 1 checking the Cluster Synchronization Services voting disk '/dev/raw/raw2'.
Not able to read adequate number of voting disks


Oracle Clusterware racgwrap

Script called on each node by SRVCTL to control resources
Copy of script in each Oracle home
  $ORA_CRS_HOME/bin/racgwrap
  $ORA_ASM_HOME/bin/racgwrap
  $ORACLE_HOME/bin/racgwrap
Sets environment variables
Invokes racgmain executable
Generated from racgwrap.sbs
Differs in each home
  Sets $ORACLE_HOME and $ORACLE_BASE environment variables for racgmain
  Also sets $LD_LIBRARY_PATH
Enable trace by setting _USR_ORA_DEBUG to 1



Oracle Clusterware racgwrap


In Unix systems the Oracle SGA is located in one or more operating system shared memory segments
Each segment is identified by a shared memory key
  Shared memory key is generated by the application
Each shared memory key maps to a shared memory ID
  Shared memory ID is generated by operating system
Shared memory segments can be displayed using ipcs -m

[root@server3] # ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x8a48ff44 131072     oracle     640        94371840   20
0x17d04568 163841     oracle     660        2099249152 246

Oracle generates the shared memory key from the values of
  $ORACLE_HOME
  $ORACLE_SID


Oracle Clusterware racgwrap

If instance is currently running e.g.

[oracle@server3]$ ps -ef | grep pmon_PROD1
oracle    8653     1  0 16:13 ?        00:00:00 ora_pmon_PROD1

But SQL*Plus cannot connect to the instance

[oracle@server3]$ export ORACLE_SID=PROD1
[oracle@server3]$ sqlplus / as sysdba
...
Connected to idle instance

Compare $ORACLE_HOME environment variable to ORACLE_HOME variable in $ORACLE_HOME/bin/racgwrap

[oracle@server3]$ echo $ORACLE_HOME
/u01/app/oracle/product/10.2.0/db_1
[oracle@server3]$ grep "^ORACLE_HOME" $ORACLE_HOME/bin/racgwrap
ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1/

In this example the value in racgwrap has a trailing / that the environment variable lacks, so the two values generate different shared memory keys
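The comparison above can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function name check_racgwrap_home is hypothetical, and it assumes the wrapper script contains a line beginning with ORACLE_HOME= as in the grep output shown on this slide. The comparison is an exact string match, so a trailing slash counts as a difference.

```shell
# Hypothetical helper (not part of Oracle Clusterware).
# $1 = ORACLE_HOME value from the environment
# $2 = path to a racgwrap-style script containing an ORACLE_HOME= line
check_racgwrap_home() {
    env_home=$1
    wrapper=$2
    # Take the first line starting with ORACLE_HOME= and keep the value only
    wrap_home=$(grep '^ORACLE_HOME=' "$wrapper" | head -n 1 | cut -d= -f2-)
    if [ "$env_home" = "$wrap_home" ]; then
        echo "MATCH"
    else
        echo "MISMATCH: env='$env_home' racgwrap='$wrap_home'"
    fi
}

# Example call (requires a real Oracle home):
#   check_racgwrap_home "$ORACLE_HOME" "$ORACLE_HOME/bin/racgwrap"
```

If the function reports MISMATCH, set $ORACLE_HOME in the session to exactly the value used by racgwrap before connecting.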


Oracle Clusterware Process Monitor (OPROCD)

Process Monitor Daemon
Provides Cluster I/O Fencing

Implemented on Unix systems
  Not required with third-party clusterware
  Implemented in Linux in 10.2.0.4 and above
  In 10.2.0.3 and below hangcheck timer module is used
Provides hangcheck timer functionality to maintain cluster integrity
  Behaviour similar to hangcheck timer
Runs as root
Locked in memory
Failure causes reboot of system
  See /etc/init.d/init.cssd for operating system reboot commands


Oracle Clusterware Process Monitor (OPROCD)

OPROCD takes two parameters
  -t - Timeout value
    Length of time between executions (milliseconds)
    Normally defaults to 1000
  -m - Margin
    Acceptable margin before rebooting (milliseconds)
    Normally defaults to 500
Parameters are specified in /etc/init.d/init.cssd
  OPROCD_DEFAULT_TIMEOUT=1000
  OPROCD_DEFAULT_MARGIN=500
Contact Oracle Support before changing these values


Oracle Clusterware Process Monitor (OPROCD)

/etc/init.d/init.cssd can increase OPROCD_DEFAULT_MARGIN based on two CSS variables
  reboottime (mandatory)
  diagwait (optional)
Values for these can be obtained using
[root@server3]# crsctl get css reboottime
[root@server3]# crsctl get css diagwait

Both values are reported in seconds
The algorithm is

If diagwait > reboottime then OPROCD_DEFAULT_MARGIN := (diagwait - reboottime) * 1000

Therefore increasing diagwait will reduce frequency of reboots e.g.

[root@server3]# crsctl set css diagwait 13
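To make the arithmetic concrete, here is a small sketch of the margin rule stated above. It is not the code from init.cssd: the function name oprocd_margin is hypothetical, and falling back to a default of 500 ms when diagwait does not exceed reboottime is an assumption based on the default quoted on the previous slide. The reboottime of 3 seconds in the example call is also only an illustrative value.

```shell
# Sketch of the rule: if diagwait > reboottime,
#   margin (ms) = (diagwait - reboottime) * 1000
# $1 = diagwait (seconds), $2 = reboottime (seconds)
# $3 = default margin in ms (optional, assumed to be 500)
oprocd_margin() {
    diagwait=$1
    reboottime=$2
    default_margin=${3:-500}
    if [ "$diagwait" -gt "$reboottime" ]; then
        echo $(( (diagwait - reboottime) * 1000 ))
    else
        # Assumption: margin is left at its default otherwise
        echo "$default_margin"
    fi
}

oprocd_margin 13 3   # prints 10000
```

With diagwait set to 13 and an illustrative reboottime of 3, the margin becomes 10000 milliseconds instead of 500, which is why OPROCD reboots become less frequent.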


Oracle Clusterware Heartbeats

CSS maintains two heartbeats
  Network heartbeat across interconnect
  Disk heartbeat to voting device

Disk heartbeat has an internal I/O timeout (in seconds)
  Varies between releases
In Oracle 10.2.0.2 and above disk heartbeat timeout can be specified by CSS disktimeout parameter
  Maximum time allowed for a voting file I/O to complete
  If exceeded file is marked offline
  Defaults to 200 seconds
crsctl get css disktimeout
crsctl set css disktimeout <value>


Oracle Clusterware Heartbeats


Network heartbeat timeout can be specified by CSS misscount parameter
Default values (Oracle Clusterware 10.1 and 10.2) are:

Linux     60 seconds
Unix      30 seconds
Windows   30 seconds

Default value for vendor clusterware is 600 seconds
crsctl get css misscount
crsctl set css misscount <value>


Oracle Clusterware Heartbeats

Relationship between internal I/O timeout (IOT), MISSCOUNT and DISKTIMEOUT varies between releases

Version    Description
10.1.0.3   IOT = MISSCOUNT - 15 seconds
10.1.0.4   IOT = MISSCOUNT - 15 seconds
10.1.0.5   IOT = MISSCOUNT - 3 seconds
10.1.0.6   IOT = DISKTIMEOUT during normal operations
           IOT = MISSCOUNT during initial cluster formation or reconfiguration
10.2.0.1   IOT = MISSCOUNT - 3 seconds
10.2.0.2   IOT = DISKTIMEOUT during normal operations
           IOT = MISSCOUNT during initial cluster formation or reconfiguration
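The table can be read as a lookup. The sketch below encodes it as a shell function for illustration only; css_iot is a hypothetical name, and the phase argument ("normal" versus anything else, meaning cluster formation or reconfiguration) is a simplification introduced here. Only the six versions listed in the table are handled.

```shell
# Internal I/O timeout (seconds) according to the table above.
# $1 = version, $2 = misscount, $3 = disktimeout
# $4 = phase: "normal", or any other value for formation/reconfiguration
css_iot() {
    case "$1" in
        10.1.0.3|10.1.0.4) echo $(( $2 - 15 )) ;;
        10.1.0.5|10.2.0.1) echo $(( $2 - 3 )) ;;
        10.1.0.6|10.2.0.2)
            if [ "$4" = "normal" ]; then echo "$3"; else echo "$2"; fi ;;
        *) echo "version not in table: $1" >&2; return 1 ;;
    esac
}

css_iot 10.2.0.1 60 200 normal     # prints 57
css_iot 10.2.0.2 60 200 normal     # prints 200
css_iot 10.2.0.2 60 200 reconfig   # prints 60
```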


Oracle Clusterware Heartbeats

If disktimeout is supported, CSS will not evict a node from the cluster when I/O to the voting disk takes more than MISSCOUNT seconds, unless this occurs
  during initial cluster formation
  slightly before reconfiguration
Nodes will not be evicted as long as voting disk operations are completed within DISKTIMEOUT seconds

Network Heartbeat                    Disk Heartbeat                         Reboot
Completes within MISSCOUNT seconds   Completes within DISKTIMEOUT seconds   No
Completes within MISSCOUNT seconds   Takes more than DISKTIMEOUT seconds    Yes
Takes more than MISSCOUNT seconds    Completes within MISSCOUNT seconds     Yes


Oracle Clusterware CRSCTL

CRSCTL can also be used to enable and disable Oracle Clusterware
To enable Clusterware use:

# crsctl enable crs

To disable Clusterware use:

# crsctl disable crs

These commands update the following file:
/etc/oracle/scls_scr/<node>/root/crsstart


Oracle Clusterware CRSCTL


In Oracle 10.2, CRSCTL can be used to check the current state of Oracle Clusterware daemons
To check the current state of all Oracle Clusterware daemons

# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

To check the current state of individual Oracle Clusterware daemons

# crsctl check cssd
CSS appears healthy
# crsctl check crsd
CRS appears healthy
# crsctl check evmd
EVM appears healthy


Oracle Clusterware CRSCTL


CRSCTL can be used to manage the CSS voting disk
To check the current location of the voting disk use:

# crsctl query css votedisk
 0.     0    /dev/raw/raw3
 1.     0    /dev/raw/raw4
 2.     0    /dev/raw/raw5

To add a new voting disk use:

# crsctl add css votedisk <path_name>

To delete an existing voting disk use:

# crsctl delete css votedisk <path_name>


Oracle Clusterware Debugging

In Oracle 10.2 and above Oracle Clusterware debugging can be enabled and disabled for
  CRS
  CSS
  EVM
  Resources
  Subcomponents
Debugging can be controlled
  statically using environment variables
  dynamically using CRSCTL
Debug settings can be persisted in OCR for use in subsequent restarts


Oracle Clusterware Debugging

To list modules available for debugging use:

# crsctl lsmodules crs
# crsctl lsmodules css
# crsctl lsmodules evm

In Oracle 11.1 modules include:

Module     Listed for
CLSVER     CRS
CLUCLS     CRS,EVM
COMMCRS    CRS,CSS,EVM
COMMNS     CRS,CSS,EVM
CRSAPP     CRS
CRSCOMM    CRS
CRSD       CRS
CRSEVT     CRS
CRSMAIN    CRS
CRSOCR     CRS,EVM
CRSPLACE   CRS
CRSRES     CRS
CRSRTI     CRS
CRSTIMER   CRS
CRSUI      CRS
CSSCLNT    CRS,EVM
CSSD       CSS
EVMAGENT   EVM
EVMAPP     EVM
EVMCOMM    EVM
EVMD       EVM
EVMDMAIN   EVM
EVMEVT     EVM


Oracle Clusterware Debugging

To debug individual modules use:

# crsctl debug log crs <module>:<level>[,<module>:<level>]

For example:
# crsctl debug log crs "CRSCOMM:2,COMMCRS:2,COMMNS:2"
Set CRSD Debug Module: CRSCOMM Level: 2
Set CRSD Debug Module: COMMCRS Level: 2
Set CRSD Debug Module: COMMNS Level: 2

Values only apply for current node
Stored within OCR in SYSTEM.crs.debug.<node>.<module>
For example:
# ocrdump -stdout -keyname SYSTEM.crs.debug.vm1.CRSCOMM

Log will be written to:
$ORA_CRS_HOME/log/<node>/crsd/crsd.log


Oracle Clusterware Debugging

To debug an individual resource use:

# crsctl debug log res <resource>:<level>

For example:
# crsctl debug log res ora.vm1.vip:5
Set Resource Debug Module: ora.vm1.vip Level: 5

To disable debugging again set level 0 e.g.:

# crsctl debug log res ora.vm1.vip:0
Set Resource Debug Module: ora.vm1.vip Level: 0

OCR debug value is stored in USR_ORA_DEBUG
To check current debug value set in OCR for ora.vm1.vip use:
# ocrdump -stdout -keyname \
    CRS.CUR.ora\!vm1\!vip.USR_ORA_DEBUG

Log will be written to $ORA_CRS_HOME/log/<node>/racg/<resource>.log


Oracle Clusterware Debugging


Debugging for CRSD and EVMD can also be configured using environment variables
To enable tracing for all modules use ORA_CRSDEBUG_ALL
For example:
# export ORA_CRSDEBUG_ALL=5

To enable tracing for individual modules use ORA_CRSDEBUG_<module>
For example:

# export ORA_CRSDEBUG_CRSOCR=5

Note that these environment variables have not been implemented in OCSSD or OPROCD


Oracle Clusterware Debugging


In Oracle 10.1 and above debugging can also be configured in $ORA_CRS_HOME/srvm/admin/ocrlog.ini
By default this file contains:

# "mesg_logging_level" is the only supported parameter currently.
# level 0 means minimum logging. Only error conditions are logged
mesg_logging_level = 0

# The last appearance of a parameter will override the previous value.
# For example, log level will become 3 when the following value is uncommented.
# Change to log level 3 for detailed logging from Oracle Cluster Registry
# mesg_logging_level = 3

# Component log and trace level specification template
#comploglvl="comp1:3;comp2:4"
#comptrclvl="comp1:2;comp2:1"

Component level logging can be configured in this file e.g.:

comploglvl="OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5"


Oracle Clusterware Debugging


Component level logging can also be configured in the OCR
For example (quoted, and comma-separated as in the <module>:<level>[,<module>:<level>] syntax, so that the shell passes a single argument):
# crsctl debug log crs "OCRAPI:5,OCRCLI:5,OCRSRV:5,OCRMAS:5,OCRCAC:5"

Components include:
  OCRAPI - OCR Abstraction Component
  OCRCAC - OCR Cache Component
  OCRCLI - OCR Client Component
  OCRMAS - OCR Master Thread Component
  OCRMSG - OCR Message Component
  OCRSRV - OCR Server Component
  OCRUTL - OCR Util Component


Oracle Clusterware Debugging

CRSCTL can also generate state dumps

crsctl debug statedump crs
crsctl debug statedump css
crsctl debug statedump evm

CSS dump is written to $ORA_CRS_HOME/log/<node>/cssd/ocssd.log
Dump contents can be made more readable e.g.:
cut -c58- < ocssd.log > ocssd.dmp


Oracle Clusterware OLSNODES


The olsnodes utility lists all nodes currently running on the cluster
With no arguments olsnodes lists the nodes e.g.
$ olsnodes
london1
london2

In Oracle 10.2 and above, with -p argument olsnodes lists node names and private interconnect
$ olsnodes -p
london1 london1-priv
london2 london2-priv

In Oracle 10.2 and above, with -i argument olsnodes lists node names and VIP address
$ olsnodes -i
london1 london1-vip
london2 london2-vip


Oracle Clusterware OCRCONFIG

In Oracle 10.1 and above the OCRCONFIG utility performs various administrative operations on the OCR including:
  displaying backup history
  configuring backup location
  restoring OCR from backup
  exporting OCR
  importing OCR
  upgrading OCR
  downgrading OCR
In Oracle 10.2 and above OCRCONFIG can also
  manage OCR mirrors
  overwrite OCR files
  repair OCR files


Oracle Clusterware OCRCONFIG

Options include

Option        Description                                         Version
-help         Display help message                                10.1+
-showbackup   Display automatic OCR physical backup history       10.1+
-backuploc    Change OCR physical backup location                 10.1+
-restore      Restore OCR from automatic physical backup          10.1+
-export       Export contents of OCR to operating system file     10.1+
-import       Import contents of OCR from operating system file   10.1+
-upgrade      Upgrade OCR from a previous version                 10.1+
-downgrade    Downgrade OCR to a previous version                 10.1+
-replace      Add/replace/remove OCR file or mirror               10.2+
-overwrite    Overwrite OCR configuration on disk                 10.2+
-repair       Repair local OCR configuration                      10.2+


Oracle Clusterware OCRCONFIG

In Oracle 10.1 and above OCR is automatically backed up every four hours
  Previous three backup copies are retained
  Backup copy retained from end of previous day
  Backup copy retained from end of previous week
Check node, times and location of previous backups using the showbackup option of OCRCONFIG e.g.
# ocrconfig -showbackup
london1  2005/08/04 11:15:29  /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1  2005/08/03 22:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1  2005/08/03 18:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1  2005/08/02 18:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1  2005/07/31 18:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs

ENSURE THAT YOU COPY THE PHYSICAL BACKUPS TO TAPE AND/OR REDUNDANT STORAGE
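One way to follow the advice above is a small copy script run from cron. This is only a sketch under stated assumptions: copy_ocr_backups is a hypothetical helper, the destination directory in the example call is invented, and it assumes the physical backups are the *.ocr files in the directory reported by ocrconfig -showbackup.

```shell
# Hypothetical helper: copy OCR physical backups (*.ocr) to a second location.
# $1 = backup directory (as reported by ocrconfig -showbackup)
# $2 = destination directory on separate storage
copy_ocr_backups() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*.ocr; do
        [ -e "$f" ] || continue          # glob matched nothing
        cp -p "$f" "$dest"/ || return 1  # -p preserves timestamps
        echo "copied $(basename "$f")"
    done
}

# Example call (destination path is an assumption):
#   copy_ocr_backups /u01/app/oracle/product/10.2.0/crs/cdata/crs /backup/ocr
```

Preserving timestamps with cp -p keeps the copies comparable with the times listed by ocrconfig -showbackup.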


Oracle Clusterware OCRCONFIG

In Oracle 11.1 and above OCR can be backed up manually using:


# ocrconfig -manualbackup

Backups will be written to the location specified by:


# ocrconfig -backuploc <directory_name>

Manual backups can be listed using:


# ocrconfig -showbackup manual

Automatic backups can be listed using:


# ocrconfig -showbackup auto


Oracle Clusterware OCRCONFIG

To restore the OCR from a physical backup copy
Check you have a suitable backup using:
# ocrconfig -showbackup

Stop Oracle Clusterware on each node using:

# crsctl stop crs

Restore the backup file using

# ocrconfig -restore <filename>

For example:

# ocrconfig -restore $ORA_CRS_HOME/cdata/crs/backup00.ocr

Start Oracle Clusterware on each node using:

# crsctl start crs
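The order of the steps above matters: Clusterware must be stopped on every node before the restore and started again only afterwards. The sketch below is a dry run that prints the sequence for a given backup file and node list; it executes nothing. The function name ocr_restore_plan and the [node] prefix notation are inventions for this illustration.

```shell
# Dry run only: prints the restore steps in order, runs no Oracle command.
# $1 = backup file, remaining arguments = node names
ocr_restore_plan() {
    backup=$1
    shift
    echo "ocrconfig -showbackup"
    for node in "$@"; do
        echo "[$node] crsctl stop crs"
    done
    echo "ocrconfig -restore $backup"
    for node in "$@"; do
        echo "[$node] crsctl start crs"
    done
}

ocr_restore_plan backup00.ocr london1 london2
```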



Oracle Clusterware OCRCHECK

In Oracle 10.1 and above, you can verify the configuration of the OCR using the OCRCHECK utility
# ocrcheck
Status of Oracle Cluster Registry is as follows :
    Version                  :          2
    Total space (kbytes)     :     262144
    Used space (kbytes)      :       7752
    Available space (kbytes) :     254392
    ID                       : 1093363319
    Device/File Name         : /dev/raw/raw1
                               Device/File integrity check succeeded
                               /dev/raw/raw2
                               Device/File integrity check succeeded
    Cluster registry integrity check succeeded

In Oracle 10.1 this utility does not print the ID and Device/File Name information


Oracle Clusterware OCRDUMP


In Oracle 10.1 and above, you can dump the contents of the OCR using the OCRDUMP utility
For example:
# ocrdump

This command writes its output to a file called OCRDUMPFILE in the current working directory
You can specify an output file name using:
# ocrdump <dump_file_name>

For example:
# ocrdump ocr_cluster1


Oracle Clusterware OCRDUMP


In Oracle 10.2 and above, you can write OCRDUMP output to stdout
For example:
# ocrdump -stdout

In Oracle 10.2 and above, you can optionally restrict output by specifying a key with -keyname
For example:
# ocrdump -stdout -keyname SYSTEM
# ocrdump -stdout -keyname SYSTEM.css
# ocrdump -stdout -keyname SYSTEM.css.misscount

In Oracle 10.2 and above, you can optionally format output in XML.
For example:
# ocrdump -stdout -keyname SYSTEM.css.misscount -xml


Oracle Clusterware CRS_STAT

The CRS_STAT utility reports the current status of resources managed by Oracle Clusterware
Resources include:
  databases
  instances
  services
  ASM instances
  node applications
    gsd
    ons
    vip
    listeners


Oracle Clusterware CRS_STAT

With no arguments CRS_STAT lists all resources currently configured e.g.:

$ crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.RAC2.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london2

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
etc...

If a node has failed, the STATE field will show which node the applications have failed over to
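Because crs_stat prints NAME, TARGET and STATE as key=value lines, its output is easy to filter. The following sketch lists resources whose TARGET is ONLINE but whose STATE does not begin with ONLINE. The function name crs_not_running is hypothetical, and it assumes the attribute order shown in the example above (NAME, then TARGET, then STATE).

```shell
# Read crs_stat output on stdin and print the names of resources
# that should be online (TARGET=ONLINE) but are not (STATE not ONLINE...).
crs_not_running() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  { if (target == "ONLINE" && $2 !~ /^ONLINE/) print name }
    '
}

# Example call on a cluster node:
#   crs_stat | crs_not_running
```

A resource with TARGET=OFFLINE, such as the service in the example output, is deliberately stopped and is therefore not reported.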


Oracle Clusterware CRS_STAT

With the -t option, crs_stat lists resources together with their state and the current node

Name           Type           Target    State     Host
------------------------------------------------------------
ora....T1.inst application    ONLINE    ONLINE    server3
ora....T2.inst application    ONLINE    ONLINE    server4
ora....T3.inst application    ONLINE    ONLINE    server11
ora....T4.inst application    ONLINE    ONLINE    server12
ora.TEST.db    application    ONLINE    ONLINE    server3
ora....SM3.asm application    ONLINE    ONLINE    server11
ora....11.lsnr application    ONLINE    ONLINE    server11
ora....r11.gsd application    ONLINE    ONLINE    server11
ora....r11.ons application    ONLINE    ONLINE    server11
ora....r11.vip application    ONLINE    ONLINE    server11
ora....SM4.asm application    ONLINE    ONLINE    server12
ora....12.lsnr application    ONLINE    ONLINE    server12
ora....r12.gsd application    ONLINE    ONLINE    server12
ora....r12.ons application    ONLINE    ONLINE    server12
ora....r12.vip application    ONLINE    ONLINE    server12


Oracle Clusterware CRS_STAT

With the -ls option, crs_stat lists resources together with their owner, group and permissions.

Name           Owner          Primary PrivGrp          Permission
-----------------------------------------------------------------
ora....T1.inst oracle         oinstall                 rwxrwxr--
ora....T2.inst oracle         oinstall                 rwxrwxr--
ora....T3.inst oracle         oinstall                 rwxrwxr--
ora....T4.inst oracle         oinstall                 rwxrwxr--
ora.TEST.db    oracle         oinstall                 rwxrwxr--
ora....SM3.asm oracle         oinstall                 rwxrwxr--
ora....11.lsnr oracle         oinstall                 rwxrwxr--
ora....r11.gsd oracle         oinstall                 rwxr-xr--
ora....r11.ons oracle         oinstall                 rwxr-xr--
ora....r11.vip root           oinstall                 rwxr-xr--
ora....SM4.asm oracle         oinstall                 rwxrwxr--
ora....12.lsnr oracle         oinstall                 rwxrwxr--
ora....r12.gsd oracle         oinstall                 rwxr-xr--
ora....r12.ons oracle         oinstall                 rwxr-xr--
ora....r12.vip root           oinstall                 rwxr-xr--


Oracle Clusterware CRS_STAT


CRS_STAT abbreviates resource names
Oracle provides an AWK script that includes complete resource names
Metalink Note 259301.1 CRS and 10g RAC

#!/bin/bash
# Optional first argument: pattern used to filter resource names
RSC_KEY=$1
QSTAT=-u
AWK=/usr/bin/awk

# Print the table header
$AWK \
  'BEGIN {printf "%-45s %-10s %-18s\n","HA Resource", "Target", "State";
          printf "%-45s %-10s %-18s\n","-----------", "------", "-----";}'

# Print one line per resource using the full (unabbreviated) name
$ORA_CRS_HOME/bin/crs_stat $QSTAT | $AWK \
  'BEGIN { FS="="; state = 0; }
   $1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1};
   state == 0 {next;}
   $1~/TARGET/ && state == 1 {apptarget = $2; state=2;}
   $1~/STATE/ && state == 2 {appstate = $2; state=3;}
   state == 3 {printf "%-45s %-10s %-18s\n", appname,apptarget,appstate;state = 0;}'


Oracle Clusterware CRS_STAT


#!/usr/bin/perl
$s = ".";
if ($#ARGV >= 0) { $s = $ARGV[0]; chomp $s;}
printf ("%-45s %-12s %-18s\n","HA Resource","Target","State");
printf ("%-45s %-12s %-18s\n","-----------","------","-----");
open (CRS_STAT,"crs_stat -u|");
while ($line = <CRS_STAT>)
{
  if ($line =~ m/=/)
  {
    chomp $line;
    ($n,$v) = split (/=/,$line);
    if ($n eq "NAME") { $name = $v; }
    elsif ($n eq "TYPE") { $type = $v; }
    elsif ($n eq "STATE")
    {
      $state = $v;
      if ($name =~ m/$s/)
      {
        printf ("%-45s %-12s %-18s\n",$name,$type,$state);
      }
    }
  }
}


Oracle Clusterware Permissions


The CRS_GETPERM and CRS_SETPERM utilities can be used to check and modify Oracle Clusterware permissions
For example to change the owner of an instance to oracle and group to oinstall

Check the current permissions

[root@server11]# crs_getperm ora.TEST.TEST3.inst
Name: ora.TEST.TEST3.inst
owner:root:rwx,pgrp:root:r-x,other::r--,

Set the new permissions

[root@server11]# crs_setperm ora.TEST.TEST3.inst -o oracle
[root@server11]# crs_setperm ora.TEST.TEST3.inst -g oinstall

Check the new permissions

[root@server11]# crs_getperm ora.TEST.TEST3.inst
Name: ora.TEST.TEST3.inst
owner:oracle:rwx,pgrp:oinstall:r-x,other::r--,


Oracle Clusterware OCR Corruptions

Oracle Cluster Registry
  Vulnerable to corruption
Versions experiencing OCR corruptions have included:
  10.1.0.3
  10.2.0.2
  10.2.0.3
  11.1.0.6
Also experienced by
  many Oracle employees
  about 20% of UKOUG RAC & HA SIG delegates
Typical symptom is "placement error"
May be related to configuration of services
  Corruption may occur at an earlier date
  May occur when service is configured on non-master node


Oracle Clusterware OCR Corruptions

Recovery of corrupt OCR:

If mirror is configured:
  Restore from mirror using ocrconfig -overwrite
  See Administration and Deployment Guide for details
If backup is available:
  Restore from backup using ocrconfig -restore
If no backup is available:
  Rebuild OCR using procedure described in Metalink Note 399482.1 - How to recreate OCR/Voting disk accidentally deleted


Oracle Clusterware OCR Corruptions

Rebuild procedure (adapted from Note 399482.1):
On each node shut down Oracle Clusterware
[root@server3]# crsctl stop crs

Check that all Clusterware processes have stopped
On each node execute rootdelete.sh
[root@server3]# $ORA_CRS_HOME/install/rootdelete.sh

On first node execute rootdeinstall.sh

[root@server3]# $ORA_CRS_HOME/install/rootdeinstall.sh

Note that for a corrupt OCR it may be necessary to zero the OCR. For example:
[root@server3]# dd if=/dev/zero of=/dev/ocr bs=1M


Oracle Clusterware OCR Corruptions

Rebuild procedure (adapted from Note 399482.1) continued:
On first node execute root.sh
[root@server3]# $ORA_CRS_HOME/root.sh

On remaining nodes execute root.sh

[root@server4]# $ORA_CRS_HOME/root.sh

Use srvctl to add
  ASM instances
  Database
  Instance
  Services
Use netca to add listener
Execute cluvfy to verify CRS configuration
[oracle@server4]$ cluvfy stage -post crsinst -n node1,node2

ASM and RDBMS


Trace and Diagnostics Modules and Actions

In Oracle 8.0 and above it is possible to specify a module and action for any session
Modules and actions allow inefficient SQL statements to be identified and isolated more easily
Modules and actions are reported in
  STATSPACK / AWR / ASH reports
  V$SESSION
  V$SQL
  V$ACTIVE_SESSION_HISTORY
Current module and action for a session is reported in
  V$SESSION.MODULE
  V$SESSION.ACTION

Trace and Diagnostics DBMS_MONITOR

To specify a module and action use:

DBMS_APPLICATION_INFO.SET_MODULE
(
  MODULE_NAME => 'MODULE1',
  ACTION_NAME => 'ACTION1'
);

To specify a new action within the current module use:

DBMS_APPLICATION_INFO.SET_ACTION
(
  ACTION_NAME => 'ACTION2'
);

Modules and actions can also be specified using OCI calls

Trace and Diagnostics DBMS_MONITOR


Introduced in Oracle 10.1
Contains the following subroutines
  SESSION_TRACE_ENABLE
  SESSION_TRACE_DISABLE
  DATABASE_TRACE_ENABLE
  DATABASE_TRACE_DISABLE
  CLIENT_ID_TRACE_ENABLE
  CLIENT_ID_TRACE_DISABLE
  CLIENT_ID_STAT_ENABLE
  CLIENT_ID_STAT_DISABLE
  SERV_MOD_ACT_TRACE_ENABLE
  SERV_MOD_ACT_TRACE_DISABLE
  SERV_MOD_ACT_STAT_ENABLE
  SERV_MOD_ACT_STAT_DISABLE

Trace and Diagnostics DBMS_MONITOR

Trace is enabled using the following subroutines:
  SESSION_TRACE_ENABLE
  DATABASE_TRACE_ENABLE
  CLIENT_ID_TRACE_ENABLE
  SERV_MOD_ACT_TRACE_ENABLE
By default event 10046 level 8 trace will be enabled
  Includes wait events
In Oracle 11.1 these subroutines have an additional PLAN_STAT parameter which specifies when row source statistics are dumped. Possible values are:
  NEVER
  FIRST_EXECUTION (default)
  ALL_EXECUTIONS
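
The default of level 8 follows from the WAITS and BINDS parameters of the trace-enabling subroutines, which default to TRUE and FALSE respectively. As a small illustration, the mapping from those two flags to the equivalent event 10046 level can be sketched as follows (the function name trace_level is hypothetical):

```shell
# Map the WAITS and BINDS flags to the equivalent event 10046 level.
# Arguments are the strings TRUE or FALSE, in the order WAITS, BINDS.
#   level 1  = basic SQL trace
#   level 4  = basic trace plus bind variable values
#   level 8  = basic trace plus wait events
#   level 12 = basic trace plus binds and waits
trace_level() (
    level=0
    if [ "$1" = TRUE ]; then level=$((level + 8)); fi
    if [ "$2" = TRUE ]; then level=$((level + 4)); fi
    # Neither flag set: plain SQL trace
    if [ "$level" -eq 0 ]; then level=1; fi
    echo "$level"
)
```

With the defaults, trace_level TRUE FALSE prints 8, which matches the level stated above.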

Trace and Diagnostics DBMS_MONITOR


Introduced in Oracle 10.1
To enable trace in the current session use:


EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE;

To disable trace in the current session use:


EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE;

To enable trace in session 42 use:


EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE (SESSION_ID => 42);

To disable trace in session 42 use:


EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE (SESSION_ID => 42);

Trace and Diagnostics DBMS_MONITOR


Introduced in Oracle 10.2
To enable trace for the entire database use:
EXECUTE DBMS_MONITOR.DATABASE_TRACE_ENABLE;

To disable trace for the entire database use:


EXECUTE DBMS_MONITOR.DATABASE_TRACE_DISABLE;

To enable trace for instance RAC1 use:


EXECUTE DBMS_MONITOR.DATABASE_TRACE_ENABLE (INSTANCE_NAME => 'RAC1');

To disable trace for instance RAC1 use:


EXECUTE DBMS_MONITOR.DATABASE_TRACE_DISABLE (INSTANCE_NAME => 'RAC1');

Trace and Diagnostics DBMS_MONITOR

Trace can be enabled using client identifiers
  Useful when many sessions connect using the same Oracle user
  Useful with connection caches
To set a client identifier use DBMS_SESSION.SET_IDENTIFIER
For example:
BEGIN DBMS_SESSION.SET_IDENTIFIER ('CLIENT42'); END;

The client identifier for a specific session is reported in V$SESSION.CLIENT_IDENTIFIER

Trace and Diagnostics DBMS_MONITOR

To enable trace for CLIENT42 use:


BEGIN DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE (CLIENT_ID => 'CLIENT42'); END;

To enable statistics collection for CLIENT42 use:


BEGIN DBMS_MONITOR.CLIENT_ID_STAT_ENABLE (CLIENT_ID => 'CLIENT42'); END;

Client statistics are reported in V$CLIENT_STATS

Trace and Diagnostics DBMS_MONITOR

Trace can be enabled for a specific
  service
  service and module
  service, module and action
To enable trace for SERVICE1 use:
BEGIN DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE (SERVICE_NAME => 'SERVICE1'); END;

To disable trace for SERVICE1 use:


BEGIN DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE (SERVICE_NAME => 'SERVICE1'); END;

Trace and Diagnostics DBMS_MONITOR

To enable trace for MODULE1 use:


BEGIN DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE ( SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1' ); END;

To enable trace for ACTION1 use:


BEGIN DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE ( SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1', ACTION_NAME => 'ACTION1' ); END;
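
The three scopes (service; service and module; service, module and action) differ only in which named parameters are supplied. A minimal sketch of a helper that prints the matching anonymous block is shown below. The function name serv_mod_act_trace_sql is hypothetical, and the names passed in are assumed not to contain single quotes.

```shell
# Print an anonymous PL/SQL block that enables trace for a service,
# optionally narrowed to a module and then to an action.
# Usage: serv_mod_act_trace_sql <service> [module] [action]
# The output is text only; nothing is sent to the database.
serv_mod_act_trace_sql() (
    args="SERVICE_NAME => '$1'"
    # Add the module only when one was supplied
    if [ -n "${2:-}" ]; then args="$args, MODULE_NAME => '$2'"; fi
    # Add the action only when one was supplied
    if [ -n "${3:-}" ]; then args="$args, ACTION_NAME => '$3'"; fi
    # The trailing "/" line makes SQL*Plus execute the block
    printf 'BEGIN DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE (%s); END;\n/\n' "$args"
)
```

The printed block can be reviewed and then pasted or piped into a SQL*Plus session connected with sufficient privileges.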

Trace and Diagnostics DBMS_MONITOR

To enable statistics collection for MODULE1 use:


BEGIN DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE ( SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1' ); END;

To enable statistics collection for ACTION1 use:


BEGIN DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE ( SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1', ACTION_NAME => 'ACTION1' ); END;

Statistics are externalized in V$SERV_MOD_ACT_STATS

Trace & Diagnostics DBMS_MONITOR

In Oracle 10.1 and above, current trace configuration is reported in DBA_ENABLED_TRACES

TRACE_TYPE column can be
  CLIENT_ID
  SERVICE
  SERVICE_MODULE
  SERVICE_MODULE_ACTION
  DATABASE
Currently enabled statistics aggregations are reported in DBA_ENABLED_AGGREGATIONS

Trace and Diagnostics Automatic Diagnostic Repository

In Oracle 11.1 and above the diagnostics area has been redesigned
Diagnostics area is located in $ORACLE_BASE/diag and includes the following top-level directories:
  asm
  clients
  crs
  diagtool
  lsnrctl
  netcman
  ofm
  rdbms
  tnslsnr

Trace and Diagnostics Automatic Diagnostic Repository

Trace directory includes
  server (foreground) process trace files
  background process trace files
  alert log (text)
All trace files and alert log are written to
  $ORACLE_BASE/diag/rdbms/<database>/<instance>/trace
For example for database TEST
  $ORACLE_BASE/diag/rdbms/test/TEST1/trace
BACKGROUND_DUMP_DEST and USER_DUMP_DEST both specify same trace directory by default
  Deprecated in Oracle 11.1
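
The directory layout above can be turned into a small path helper. This is a sketch under assumptions: the function names adr_trace_dir and alert_log_path are hypothetical, the database name is lower-cased as in the TEST example, and the text alert log is assumed to be named alert_<instance>.log.

```shell
# Build the ADR trace directory for a database instance.
# Usage: adr_trace_dir <oracle_base> <database> <instance>
adr_trace_dir() (
    # The database component of the path is lower case in the example above
    db_lower=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    echo "$1/diag/rdbms/$db_lower/$3/trace"
)

# Build the path of the text alert log in the same trace directory.
# The alert_<instance>.log file name is an assumption of this sketch.
alert_log_path() (
    echo "$(adr_trace_dir "$1" "$2" "$3")/alert_$3.log"
)
```

For example, adr_trace_dir "$ORACLE_BASE" TEST TEST1 reproduces the example path, and tail -f "$(alert_log_path "$ORACLE_BASE" TEST TEST1)" would follow the alert log for instance TEST1.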

Trace and Diagnostics Automatic Diagnostic Repository


V$DIAG_INFO dynamic performance view
Introduced in Oracle 11.1
Returns values for the following diagnostics

Name                   Example Value
ADR Base               /u01/app/oracle
ADR Home               /u01/app/oracle/diag/rdbms/test/TEST
Active Incident Count  2
Active Problem Count   1
Default Trace File     /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_14003.trc
Diag Alert             /u01/app/oracle/diag/rdbms/test/TEST/alert
Diag Cdump             /u01/app/oracle/diag/rdbms/test/TEST/cdump
Diag Enabled           TRUE
Diag Incident          /u01/app/oracle/diag/rdbms/test/TEST/incident
Diag Trace             /u01/app/oracle/diag/rdbms/test/TEST/trace
Health Monitor         /u01/app/oracle/diag/rdbms/test/TEST/hm

Trace & Diagnostics SRVCTL

In Oracle 10.1 and above, to enable trace in SRVCTL use:

export SRVM_TRACE=true

By default trace is written to standard output
In Oracle 10.1 and above, the same environment variable can be used to trace:
  NETCA
  VIPCA
  SRVCONFIG
  GSDCTL
  CLUVFY
  CLUUTIL
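
Because the trace goes to standard output, it is often convenient to set SRVM_TRACE for a single command and capture everything in a file. A minimal sketch, assuming a hypothetical wrapper named with_srvm_trace:

```shell
# Run one command with SRVM_TRACE=true and save its output, including
# the trace, in a log file. The variable is set inside a subshell, so
# the calling shell's environment is left unchanged.
# Usage: with_srvm_trace <log_file> <command> [args...]
with_srvm_trace() (
    log_file=$1
    shift
    SRVM_TRACE=true
    export SRVM_TRACE
    # Capture both standard output and standard error
    "$@" > "$log_file" 2>&1
)
```

For example, with_srvm_trace srvctl_trace.log srvctl status database -d TEST would write the traced output of that SRVCTL call to srvctl_trace.log.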

References

Metalink Notes

265769.1 - Troubleshooting CRS Reboots
240001.1 - Troubleshooting CRS root.sh problems
341214.1 - How to cleanup after a failed (or successful) Oracle Clusterware installation
294430.1 - MISSCOUNT Definition and Default Values
357808.1 - CRS Diagnostics
272331.1 - CRS 10g Diagnostic Guide
330358.1 - CRS 10g R2 Diagnostic Collection Guide
331168.1 - Clusterware consolidated logging in 10gR2
357808.1 - Diagnosability for CRS/EVM/RACG
289690.1 - Data Gathering for Troubleshooting RAC and CRS Issues
284752.1 - Increasing CSS Misscount, Reboottime and Disktimeout
462616.1 - Reconfiguring the CSS disktimeout of 10gR2 Clusterware for proper LUN failover
317628.1 - How to replace a corrupt OCR mirror file
279793.1 - How to restore a lost voting disk in 10g

Thank you for your interest
