Managing & Troubleshooting Exadata – Part 2

Contents:
Hierarchy of the logs, traces
Proper tools for verifying Exadata component health
Diagnostic collection
Automated Cell File Management
Conclusion

Written by Syed Jaffar Hussain

Part 1 talked about the significance of Exadata patching and explained the cell and DB node patching concepts, patching tools, and analysis, demonstrated with hands-on examples of applying a patch at the various layers of Exadata. This part of the series focuses on cell health-check verification and on collecting the right information from the various log, trace, and dump files when troubleshooting cell and InfiniBand issues. Additionally, you will also learn about the automated file deletion policy on the cell server.
Hierarchy of the logs, traces

Oracle keeps track of useful information in various log files and dumps critical information into trace or dump files. Reviewing these files from time to time is strongly recommended, as they provide a glimpse of the current state of the cell, the database, RAC, and so on. This segment takes you through the hierarchy of logs on the Exadata cell server and explains the importance of each file.

Every cell has a /var/log/oracle file system.
You will find the following sub-directories underneath /var/log/oracle:

diag
cellos
crashfiles
deploy

Cell alert.log
Like the database and Oracle Clusterware, each cell maintains its own alert.log file, where it keeps track of cell start/stop events, services information, and other important details. Whenever there is any issue with the Exadata services, this is the first file to review for useful information.

Location: /opt/oracle/cell/log/diag/asm/cell/{cellname}/trace
Name: alert.log
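
As a quick example, the file can be reviewed with standard tools; the path below is the location given above, with {cellname} left as a placeholder for your cell's actual name:

```shell
# Review the most recent entries in the cell alert.log
# ({cellname} is a placeholder for the actual cell name)
tail -100 /opt/oracle/cell/log/diag/asm/cell/{cellname}/trace/alert.log
```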

MS logfile

Review the log below whenever you encounter issues with the Management Server (MS) service:

Location: /opt/oracle/cell/log/diag/asm/cell/{cellname}/trace
Name: ms-odl.log

Crash and Core files

By default, crash core files are dumped at the following location on an Exadata cell:

/var/log/oracle/crashfiles

To modify the crash core file location, edit the following configuration file on the cell:

/etc/kdump.conf – change the path to the new location.
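
For illustration, the relevant kdump.conf directive is `path`; a minimal sketch, assuming a hypothetical new dump location of /u01/crashfiles:

```
# /etc/kdump.conf – send kernel crash dumps to the new location
path /u01/crashfiles
```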

Cell patching log files

For any cell patching related issue, review the log files under the following location:

/var/log/oracle/cellos
OS log file

All OS-related messages can be reviewed in the following file:

/var/log/messages

(Image: the patching tool used to patch each layer of the Exadata stack)

Disk controller firmware logs
Battery capacity and feature properties of the disk controller can be viewed with the following command:

/opt/MegaRAID/MegaCli/MegaCli64 -fwtermlog -dsply -a0
alerthistory & cell details

The alerthistory is another powerful command, giving significantly useful information about the cell. It is strongly recommended to run through the alerthistory on each cell from time to time.
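
The alert history is listed through CellCLI; for example, run on the cell as root or celladmin:

```shell
# List every alert recorded on this cell
cellcli -e list alerthistory

# Restrict the listing to critical alerts, with full details
cellcli -e "list alerthistory where severity = 'critical' detail"
```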

Another powerful command on the cell determines the health state of the cell by listing the cell details.

To ensure the stability of the cell, verify its health status: confirm that the fanStatus, powerStatus, overall cell status, and the CellSrv/MS/RS services are all up and running.
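
The command listing itself did not survive in this copy of the article; assuming the standard CellCLI interface, the listing that reports these statuses is:

```shell
# Report the cell attributes, including fanStatus, powerStatus,
# status, cellsrvStatus, msStatus and rsStatus
cellcli -e list cell detail
```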
Proper tools for verifying Exadata component health

It is essential to know the proper tools on Exadata to verify the health status of the cell components. The following are a few important tools that can be used to verify the status of different components, such as the cell boot location/files, InfiniBand status, and so on.

Imageinfo

Imageinfo provides crucial information about the cell software, the possibility of rolling back to the previous image, and the location/file for the CELL boot USB partition. It is especially useful before and after patching on the cell servers.
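
A typical invocation, run as root on the cell server:

```shell
# Show the active image version, kernel, and whether rollback
# to the inactive image is possible
imageinfo

# imagehistory shows every image that has been applied to this node
imagehistory
```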

Verifying network topology

To verify spine/leaf switch status, topology, and errors, use the following command:

/opt/oracle.SupportTools/ibdiagtools/verify-topology

InfiniBand Link details

Run the iblinkinfo command to review the InfiniBand Link details on the cell:
CA: uso17 S 192.168.2.112,192.168.2.113 HCA-1:
0x0010e00001495101 5 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 10[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )
0x0010e00001495102 6 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 1 10[ ] "SUN DCS 36P QDR uso28 10.0.9.92" ( )
Switch: 0x0010e04071e5a0a0 SUN DCS 36P QDR uso28 10.0.9.92:
1 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 2[ ] "uso19 C 192.168.2.116,192.168.2.117 HCA-1" ( )
1 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 4 2[ ] "uso18 C 192.168.2.114,192.168.2.115 HCA-1" ( )
1 3[ ] ==( Down/ Polling)==> [ ] "" ( )
1 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 10 2[ ] "uso20 C 192.168.2.118,192.168.2.119 HCA-1" ( )
1 5[ ] ==( Down/ Polling)==> [ ] "" ( )
1 6[ ] ==( Down/ Polling)==> [ ] "" ( )
1 7[ ] ==( Down/ Polling)==> [ ] "" ( )
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 8 2[ ] "uso26 S 192.168.2.110,192.168.2.111 HCA-1" ( )
1 9[ ] ==( Down/ Polling)==> [ ] "" ( )
1 10[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 6 2[ ] "uso17 S 192.168.2.112,192.168.2.113 HCA-1" ( )
1 11[ ] ==( Down/Disabled)==> [ ] "" ( )
1 12[ ] ==( Down/ Polling)==> [ ] "" ( )
1 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 14[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )
1 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 13[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )
1 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 16[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )
1 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 15[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )
1 17[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 18[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )
1 18[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 17[ ] "SUN DCS 36P QDR uso27 10.0.9.91" ( )

Ibstatus

Review the InfiniBand status and speed details using the ibstatus command:
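
ibstatus takes no arguments in its simplest form; run it on the cell to see the state and rate of each InfiniBand port:

```shell
# Report link state, physical state and rate for each IB port
ibstatus
```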

Diagnostic collection

Collecting the right information is always important when troubleshooting or diagnosing an issue. However, when the information needs to be gathered from dozens of different files across different servers, such as the cells and DB nodes, it becomes time consuming. Oracle provides a couple of utilities to gather diagnostic information from the logs/traces across all cell/DB servers at one time. Below are the tools that can do the job:

sundiag.sh
The sundiag.sh script is available under /opt/oracle.SupportTools on each cell. The tool collects diagnostic information from the cell and DB servers; the script needs to be run as the root user.
root> ./sundiag.sh

Oracle Exadata Database Machine - Diagnostics Collection Tool

Gathering Linux information

Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.

/tmp/sundiag_usdwilo11_1418NML055_2016_02_07_13_53
Generating diagnostics tarball and removing temp directory

==============================================================================
Done. The report files are bzip2 compressed in /tmp/sundiag_usdwilo11_1418NML055_2016_02_07_13_53.tar.bz2
==============================================================================

The *.tar.bz2 file contains several files, including the alert.log, cell disk details, etc.
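
To gather the collection from every cell in one pass, sundiag.sh can be driven through dcli; this assumes a cell_group file listing the cell host names, one per line:

```shell
# Run sundiag.sh on every cell listed in cell_group, as root
dcli -g cell_group -l root /opt/oracle.SupportTools/sundiag.sh
```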

Automated Cell File Management


Like the automated Cluster file management deletion policy, there is automated cell maintenance that performs file deletion based on date. The feature has the following characteristics:

The Management Server (MS) service is responsible for running the file deletion policy.
The retention period for the ADR is 7 days.
Metric history files older than 7 days will be deleted.
The alert.log file will be renamed once it reaches 10MB.
MS also triggers the deletion policy when file system utilization becomes high.
If utilization of the root (/) and /var/log/oracle file systems reaches 80%, the automatic deletion policy is applied.
The automatic deletion policy is applied to the /opt/oracle file system when utilization reaches 90%.
Files over 5MB or more than one day old under /, /tmp, /var/crash, and /var/spool will be deleted.
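
The date/size test that the policy applies can be illustrated with a plain find expression. This is only a sketch against a scratch directory, assuming GNU find and coreutils; on a real cell the deletion is performed by the MS service, not by a script:

```shell
# Sketch of the deletion-candidate test: files over 5MB or more
# than one day old. Uses a throwaway directory, not real cell paths.
demo=$(mktemp -d)
touch -d "3 days ago" "$demo/old_metric.hist"                 # older than one day
touch "$demo/fresh.trc"                                        # recent, survives
dd if=/dev/zero of="$demo/big.trm" bs=1M count=6 status=none   # over 5MB

# Files matching either condition would be deletion candidates:
find "$demo" -type f \( -mtime +1 -o -size +5M \) -printf '%f\n' | sort
```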

Conclusion
This part has explained the hierarchy of the log/trace files on the cell server and the important tools that can be used to view the status of various Exadata components, such as the cells, InfiniBand, disks, etc. In the next part, you will learn the best approach to Exadata migration.
Exadata, Oracle


First published by
Syed Jaffar Hussain
8 Feb 2016 11:11 PM

Last revision by
Syed Jaffar Hussain
10 Mar 2016 3:12 AM

