Hierarchy of the logs, traces
Proper Tools to verify the Exadata components health check
Written by Syed Jaffar Hussain
The previous part demonstrated, with hands-on examples, how to apply a patch at various layers of Exadata. This part of the series focuses on
cell health check verification and on collecting the right information from various log, trace, and dump files for troubleshooting
cell and InfiniBand issues. You will also learn about the automated file deletion policy on the Cell server.
Hierarchy of the logs, traces
Oracle records useful information in various log files and dumps critical information into trace or dump files.
Reviewing these files from time to time is strongly recommended, as they give a glimpse of the current state of the Cell, database,
RAC, and so on. This section takes you through the hierarchy of logs on the Exadata cell server and explains their importance.
Every cell has a /var/log/oracle file system where these logs are kept.
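
As a starting point for periodic review, the following sketch lists the files under a log directory that changed within the last few days. The function name recent_logs and its defaults are illustrative assumptions and not part of the Exadata tooling; it relies only on standard find options.

```shell
# Hypothetical helper: list files modified within the last N days.
# $1 = directory to scan (defaults to /var/log/oracle)
# $2 = age window in days (defaults to 1)
recent_logs() {
    dir=${1:-/var/log/oracle}
    days=${2:-1}
    # -mtime -N matches files modified less than N*24 hours ago
    find "$dir" -type f -mtime "-$days" 2>/dev/null | sort
}

# Example usage on a cell:
#   recent_logs /var/log/oracle 2
```

Sorting the output keeps the listing stable between runs, which makes it easier to compare with an earlier review.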

Verifying network topology:

To verify the spine/leaf switch status, topology, and errors, use the following command:
/opt/oracle.SupportTools/ibdiagtools/verify-topology

InfiniBand Link details

Run the iblinkinfo command to review the InfiniBand Link details on the cell:
CA: uso17 S, HCA-1:
0x0010e00001495101 5 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 10[ ] "SUN DCS 36P QDR uso27" ( )
0x0010e00001495102 6 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 1 10[ ] "SUN DCS 36P QDR uso28" ( )
Switch: 0x0010e04071e5a0a0 SUN DCS 36P QDR uso28
1 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 2[ ] "uso19 C, HCA-1" ( )
1 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 4 2[ ] "uso18 C, HCA-1" ( )
1 3[ ] ==( Down/ Polling)==> [ ] "" ( )
1 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 10 2[ ] "uso20 C, HCA-1" ( )
1 5[ ] ==( Down/ Polling)==> [ ] "" ( )
1 6[ ] ==( Down/ Polling)==> [ ] "" ( )
1 7[ ] ==( Down/ Polling)==> [ ] "" ( )
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 8 2[ ] "uso26 S, HCA-1" ( )
1 9[ ] ==( Down/ Polling)==> [ ] "" ( )
1 10[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 6 2[ ] "uso17 S, HCA-1" ( )
1 11[ ] ==( Down/Disabled)==> [ ] "" ( )
1 12[ ] ==( Down/ Polling)==> [ ] "" ( )
1 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 14[ ] "SUN DCS 36P QDR uso27" ( )
1 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 13[ ] "SUN DCS 36P QDR uso27" ( )
1 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 16[ ] "SUN DCS 36P QDR uso27" ( )
1 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 15[ ] "SUN DCS 36P QDR uso27" ( )
1 17[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 18[ ] "SUN DCS 36P QDR uso27" ( )
1 18[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 17[ ] "SUN DCS 36P QDR uso27" ( )
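
A long iblinkinfo listing is easier to scan once the link states are counted. The sketch below is a hypothetical helper (summarize_ib_links is not an Oracle-supplied command) that reads iblinkinfo-style text on standard input and counts the three states seen in the listing above.

```shell
# Hypothetical helper: count link states in iblinkinfo-style output (stdin).
summarize_ib_links() {
    awk '
        /Active\/ *LinkUp/ { up++ }        # healthy links
        /Down\/ *Polling/  { polling++ }   # port enabled, nothing connected
        /Down\/ *Disabled/ { disabled++ }  # port administratively disabled
        END {
            printf "up=%d polling=%d disabled=%d\n", up, polling, disabled
        }
    '
}

# Example usage on a cell (assumes iblinkinfo is in PATH):
#   iblinkinfo | summarize_ib_links
```

For the listing shown above, this would report up=13 polling=6 disabled=1. A change in these counts between two runs is a quick hint that a cable or port needs attention.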


Review the IB status and speed details using the ibstatus command.
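
The ibstatus output can be condensed to one line per port. The sketch below is a hypothetical filter (ib_port_summary is not part of the InfiniBand tools) and assumes the usual ibstatus layout: a header line beginning with "Infiniband device", followed by indented "state:", "phys state:", and "rate:" lines. Adjust the patterns if the output on your system differs.

```shell
# Hypothetical filter: condense ibstatus-style output (stdin) to one line per port.
# Assumes each port block lists state, phys state, and rate in that order.
ib_port_summary() {
    awk '
        /^Infiniband device/ { dev = $3 " port " $5 }
        /^[ \t]*state:/      { state = $NF }
        /^[ \t]*phys state:/ { phys = $NF }
        /^[ \t]*rate:/ {
            sub(/^[ \t]*rate:[ \t]*/, "")   # keep only the rate value
            print dev ": " state "/" phys ", " $0
        }
    '
}

# Example usage (assumes ibstatus is in PATH):
#   ibstatus | ib_port_summary
```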

Diagnostic collection

Collecting the right information is always important when troubleshooting or diagnosing an issue. However, when the information must be
gathered from dozens of files on different servers, such as Cell and DB servers, it becomes time-consuming. Oracle provides
a couple of utilities to gather diagnostic information from logs and traces across all Cell and DB servers at one time. The
tools that can do the job are shown below:

The sundiag.sh script is available under /opt/oracle.SupportTools on each cell. It collects information from the
Cell server and DB server and must be run as the root user.
root> ./sundiag.sh

Oracle Exadata Database Machine - Diagnostics Collection Tool

Gathering Linux information

Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.

Generating diagnostics tarball and removing temp directory

Done. The report files are bzip2 compressed in /tmp/sundiag_usdwilo11_1418NML055_2016_02_07_13_53.tar.bz2

The *.tar.bz2 file contains several files, including alert.log and celldisk details.
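
Before uploading or unpacking the bundle, it can help to list what it contains. The sketch below wraps the standard tar options for a bzip2-compressed archive; the function name list_sundiag_bundle is an illustrative assumption, and the path in the usage comment is the one reported in the sample run above.

```shell
# Hypothetical helper: list the contents of a sundiag bundle without extracting it.
list_sundiag_bundle() {
    bundle=$1
    if [ ! -f "$bundle" ]; then
        echo "no such file: $bundle" >&2
        return 1
    fi
    # -t lists members, -j selects bzip2 decompression, -f names the archive
    tar -tjf "$bundle"
}

# Example usage:
#   list_sundiag_bundle /tmp/sundiag_usdwilo11_1418NML055_2016_02_07_13_53.tar.bz2
```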

Automated Cell File Management

Like the automated Cluster file management deletion policy, automated cell maintenance applies a file deletion policy
based on the date. The feature has the following characteristics:
The Management Server (MS) service is responsible for running the file deletion policy.
The retention period for ADR files is 7 days.
Metric history files older than 7 days are deleted.
The alert.log file is renamed once it reaches 10 MB.
MS also triggers the deletion policy when file system utilization becomes high.
If utilization of the root (/) file system or the /var/log/oracle directory reaches 80%, the automatic deletion policy is applied.
The automatic deletion policy is applied to the /opt/oracle file system when its utilization reaches 90%.
Files over 5 MB or more than one day old under the / file system, /tmp, /var/crash, and /var/spool are deleted.
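
To see how close a cell is to the utilization thresholds listed above, a small check along the following lines can be used. This is a minimal sketch and not how MS implements the policy: the function names are hypothetical, the root file system is taken to be /, and the percentages are the ones quoted in this article.

```shell
# Hypothetical helper: threshold (percent) at which the deletion policy applies,
# using the values quoted in the article.
policy_threshold() {
    case "$1" in
        /|/var/log/oracle) echo 80 ;;
        /opt/oracle)       echo 90 ;;
        *)                 return 1 ;;   # no threshold listed for this path
    esac
}

# Hypothetical helper: compare current utilization (from df) with the threshold.
check_fs() {
    fs=$1
    limit=$(policy_threshold "$fs") || { echo "$fs: no threshold listed"; return 0; }
    # df -P prints one line per file system; column 5 is the capacity, e.g. 42%
    used=$(df -P "$fs" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    case "$used" in
        ''|*[!0-9]*) echo "$fs: utilization unavailable"; return 0 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "$fs: ${used}% used, at or above ${limit}%"
    else
        echo "$fs: ${used}% used, below ${limit}%"
    fi
}

# Example usage on a cell:
#   for fs in / /var/log/oracle /opt/oracle; do check_fs "$fs"; done
```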

This part explained the hierarchy of the log and trace files on the Cell server and the important tools that can be used to view the
status of various Exadata components, such as Cell, InfiniBand, and disks. In the next part, you will learn the best approach to
Exadata migration.
First published by
Syed Jaffar Hussain
8 Feb 2016 11:11 PM

Last revision by
Syed Jaffar Hussain
10 Mar 2016 3:12 AM


