Oracle Exadata Database Machine Monitoring

Oracle, Inc.
www.oracle.com

Document Version: 1.2.0


Last Updated:
18 March 2011

Copyrights
Oracle Exadata Database Machine Monitoring
Copyright 2011, Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are
protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use,
copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form,
or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is
prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors,
please report them to us in writing.
If this software or related documentation is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government,
the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S.
Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal
Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and
adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent
applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software
License (December 2007). Oracle USA, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software is developed for general use in a variety of information management applications. It is not developed or intended for use
in any inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software in
dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to
ensure the safe use of this software. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this
software in dangerous applications.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
This software and documentation may provide access to or information on content, products, and services from third parties. Oracle
Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party
content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred
due to your access to or use of third-party content, products, or services.

Table of Contents
Overview
Monitoring Exadata Storage Servers
  Metrics
  Alerts
  What to Monitor, How to Monitor
    Third Party Monitoring Tools
Monitoring Database Servers
  Hardware
    Metrics
    Alerts
  Operating System
    Metrics
    Alerts
  Oracle Grid Infrastructure
    Metrics
    Alerts
    What to Monitor, How to Monitor
  Oracle Database
    Metrics
    Alerts
    What to Monitor, How to Monitor
  Enterprise Manager Agent
    Metrics
    Alerts
Monitoring the InfiniBand Network
  InfiniBand Switches
    Sun InfiniBand Switch software version 1.0.1
    Sun InfiniBand Switch software version 1.1.3
  InfiniBand Ports
    Exadata Storage Servers
    Database Servers
  Enterprise Manager User-Defined Metric NETwork InterFace State (emudm_netif_state.sh)
    Installation Instructions
  InfiniBand Fabric
Monitoring the Cisco Catalyst Ethernet Switch
  Metrics
  Alerts
  How to Monitor, What to Monitor
Monitoring the Sun Power Distribution Units (PDUs)
Monitoring the Avocent MergePoint Unity KVM

Oracle Exadata Database Machine Monitoring


This document discusses how to monitor all components of an Oracle Exadata Database Machine, including both
hardware and software.

Overview
Oracle Exadata Database Machine consists of the following components:

Exadata Storage Servers

Database servers

InfiniBand network

Cisco Catalyst Ethernet switch

Sun Power Distribution Units

Avocent MergePoint Unity KVM (only on Oracle Exadata Database Machine X2-2)

Some of the components are separated into multiple monitoring categories. The following table summarizes how
each component is monitored, divided into multiple categories, if applicable. Details of monitoring each component
are provided in the sections below, including definitions of the abbreviations used in the table.

Component                       Category                    Monitor Summary
Exadata Storage Servers         Hardware                    Monitored by ILOM and MS. Alert notification by EMGC
                                Operating system            Monitored by MS. Alert notification by EMGC
                                InfiniBand server ports     Monitored by MS. Alert notification by EMGC
                                Exadata software            Monitored by MS. Alert notification by EMGC
Database servers                Hardware                    Monitored by EM Agent. Alert notification by EMGC
                                Operating system            Monitored by EM Agent. Alert notification by EMGC
                                InfiniBand server ports     Monitored by EM Agent. Alert notification by EMGC
                                Oracle Grid Infrastructure  Monitored by EM Agent. Alert notification by EMGC
                                Oracle Database             Monitored by EM Agent. Alert notification by EMGC
InfiniBand network              Sun InfiniBand switches     Monitored by EM Agent. Alert notification by EMGC
                                InfiniBand fabric           Monitored directly
Cisco Catalyst Ethernet switch  All                         Monitored by EM Agent. Alert notification by EMGC
Sun Power Distribution Units    All                         Monitored by EM Agent. Alert notification by EMGC
Avocent MergePoint Unity KVM    All                         Monitored by EM Agent. Alert notification by EMGC

This document describes how each component of Oracle Exadata Database Machine is monitored, and explains
what should be monitored for each component. Refer to the EM Exadata Launchpad Deployment document for
instructions about configuring Oracle Enterprise Manager Grid Control (EMGC) to monitor components of Oracle
Exadata Database Machine.

Monitoring Exadata Storage Servers


An Exadata Storage Server is monitored as a single target in Enterprise Manager Grid Control (EMGC), which
covers the following components:

Hardware

Operating system

Exadata Storage Server software

Exadata Storage Servers are independent units that are each identified as separate targets in EMGC. However,
storage servers are grouped together under the system dashboard for an Oracle Exadata Database Machine so they are
monitored together as a group.

Metrics
There are two different types of related metrics for Exadata Storage Server: storage server metrics and, when
monitored with Exadata Storage Server Plug-In, Enterprise Manager (EM) metrics. In most cases there is a one-to-one
mapping between the two. Exadata Storage Server Management Server (MS) collects, computes, and manages
storage server metrics. These storage server metrics are then gathered by Exadata Storage Server Plug-In from a
storage server and presented to the user in EMGC as EM metrics.

Alerts
All Exadata Storage Server alerts are delivered by the storage server to EMGC using Simple Network Management
Protocol (SNMP). The communication between the Exadata Storage Server and EMGC is done through the Exadata
Storage Server Plug-In.
There are two types of server alerts that come from Exadata Storage Server:

For Integrated Lights Out Manager (ILOM)-monitored hardware components, ILOM reports a failure
or threshold exceeded condition as an SNMP trap, which is received by MS. MS processes the trap, creates
an alert for the storage server, and delivers the alert via SNMP to EMGC through the Exadata Storage
Server Plug-In.

For MS-monitored hardware and software components, MS processes a failure or threshold exceeded
condition for these components, creates an alert, and delivers the alert via SNMP to EMGC through the
Exadata Storage Server Plug-In.

From an end-user perspective there is no difference between these two kinds of alerts. An alert message contains
corrective action to perform to resolve the alert. For example, the circled area in the following screen shot indicates
the action to take for this specific alert.

The fix for bug 8814019 should be installed to resolve incorrect text in the Action portion of Exadata alerts.
Alerts may also be delivered directly (i.e. not through EMGC) via email or SNMP to other SNMP managers with the
proper configuration using the CELLCLI ALTER CELL command. See Exadata Storage Server Software User's
Guide for details.

What to Monitor, How to Monitor


The table below contains a list of what should be monitored from an administrator perspective, and how it should be
monitored.
Monitoring summary table

What to monitor: Hardware failure and sensor state
How to monitor: Monitored automatically by ILOM and MS
Comment: The hardware of an Exadata Storage Server is monitored collectively by Sun Integrated Lights Out
Management (ILOM) and the Exadata Storage Server software component Management Server (MS). Together they
provide full hardware monitoring and alerting.
ILOM monitors availability and sensor state using preset thresholds for hardware components of Exadata Storage
Server, such as system motherboard, processors, memory, power supplies, fans, and network interface controllers.
MS monitors other hardware components directly, including the following: disk controller, hard disk drives, flash
accelerator cards, and InfiniBand host channel adapter (HCA).

What to monitor: Exadata Storage Server availability
How to monitor: Monitored automatically by Exadata Storage Server Management Plug-In
Comment: Built in to the EMGC Oracle Exadata Storage Server Management Plug-In

What to monitor: Undelivered alerts in cell ALERTHISTORY
How to monitor: Periodically run the following command in CellCLI:
list alerthistory where notificationState like '[023]' and severity like '[warning|critical]' and examinedBy = NULL;
Comment: If communication is disrupted between a storage server and the EM agent where the Exadata Storage Server
Management Plug-In is deployed, alerts processed by MS may not be delivered to EMGC.
When an alert condition has been reviewed

What to monitor: Disk I/O errors
How to monitor: In CellCLI, create a warning threshold with the following command:
create threshold CD_IO_ERRS_MIN warning=1, comparison='>=', occurrences=1, observation=1;
Comment: The purpose of the warning threshold for this metric is to identify disk I/O errors. Disk I/O errors are
typically automatically handled by Oracle software. No action is required if this is the only alert produced.
However, many warnings for this metric for a single drive or a single cell may indicate a precursor to a problem.
If this warning is reported and a critical alert is also generated by MS for a disk component, follow the action
specified in the alert.

What to monitor: Network errors
How to monitor: In Exadata Storage Server Plug-In, for metrics
Host MB Dropped Per Sec
Host RDMA MB Dropped Per Sec
Set the following:
Warning threshold to 1
Collection Schedule to Repeat Every 5 Minutes
Upload Interval 3 Collections.
Comment: The purpose of the warning threshold for this metric is to identify network errors. Network errors are
typically automatically handled by Oracle software. No action is required if this is the only alert produced.
However, many warnings for this metric for a single cell may indicate a precursor to a problem.
If a critical alert is also generated by MS for a network component, follow the action specified in the alert.

What to monitor: File system free space
How to monitor: Monitored automatically by MS
Comment: Do not set metric threshold for file system free space in either CellCLI or in Exadata Storage Server
Plug-In. MS monitors file system free space, generates critical alerts when free space becomes low, and takes
corrective action to free used space.

What to monitor: Metrics across storage servers
CPU and memory utilization
I/O latency and throughput
How to monitor: In Exadata Storage Server Plug-In, view Realm Performance report to identify a storage server with
higher resource utilization or I/O latency than others.
Comment: Database files should be spread evenly across all storage servers, therefore work performed by storage
servers to satisfy database requests should be evenly spread across all storage servers. Significant imbalance in
CPU or memory utilization, or I/O performance for a single storage server compared to the others warrants
additional investigation.

What to monitor: Metrics within a storage server
CPU and memory utilization
I/O latency and throughput
How to monitor: User defined
Useful for comparison against a previously captured metric baseline.
Comment: CPU, memory, and I/O metric values on Exadata Storage Server can be higher than a typical database or
general-purpose server, yet still be in proper operating range. For example, CPU may be 100% utilized while
performing offloaded queries of EHCC compressed data that is being read from both disk drives and flash cache. CPU
being 100% utilized in this case is not problematic and no alert should be generated based on a default threshold.
Threshold values for resource utilization within a storage server are only useful and relevant if baseline values
are captured during normal workload and used to compare against current metric values.
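
The comparison across storage servers described in the table above can be sketched in a few lines of Python. This is only an illustration, assuming that one metric value per storage server has already been exported from the plug-in; the function name find_imbalanced_cells, the cell names, and the tolerance of 1.5 times the median are hypothetical choices and are not Oracle-recommended values.

```python
from statistics import median


def find_imbalanced_cells(values_by_cell, tolerance=1.5):
    """Return the cells whose metric exceeds `tolerance` times the median.

    values_by_cell maps a storage server name to a single metric value,
    for example CPU utilization in percent. The tolerance is a
    hypothetical starting point and should be tuned per site.
    """
    if len(values_by_cell) < 2:
        return []  # a single server has nothing to be compared against
    mid = median(values_by_cell.values())
    return sorted(
        cell for cell, value in values_by_cell.items() if value > tolerance * mid
    )


# Hypothetical CPU utilization samples (%), one per storage server.
cpu_by_cell = {"dm01cel01": 35.0, "dm01cel02": 38.0, "dm01cel03": 82.0}
print(find_imbalanced_cells(cpu_by_cell))
```

A flagged server is a prompt for further investigation, not an alert condition by itself, since high utilization can be normal on a storage server.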

Third Party Monitoring Tools


It is not permissible to install any additional software, including third party monitoring agents, on Exadata Storage
Server.

Monitoring Database Servers


Database server monitoring is divided into the following sections:

Hardware

Operating system

Oracle Grid Infrastructure

Oracle Database

Enterprise Manager Agent

Hardware
ILOM monitors availability and sensor state using preset thresholds for hardware components of the database server,
such as the system motherboard, processors, memory, power supplies, fans, and network interface controllers. The
availability and sensor state can be monitored using the Oracle Exadata Database Server Integrated Lights Out
Management (ILOM) plug-in. Refer to System Monitoring Plug-in Installation Guide for Oracle Exadata
ILOM for instructions on installing and configuring the plug-in.
Metrics
There are no Exadata-specific thresholds to set for database server hardware monitoring. Failure conditions and
threshold settings for hardware sensor readings for the components monitored by ILOM are preset in ILOM and are
sufficient for the level of monitoring necessary for Exadata.
To view current sensor readings, log in to Enterprise Manager and navigate to All Metrics from the plug-in home
page.

To view current component status, including those that have a Faulted status, expand the Sensor Alert section under
All Metrics and review the metrics for each sensor. Components in Faulted status will generate an alert. Any active
alert will be visible on the home page of the target plug-in.

Alerts
To view the history of alerts that have been generated by ILOM, navigate to the target home page, expand the
Sensor Alert section under All Metrics, and review the metrics for each sensor. A history of each sensor state is
available for up to 31 days.

Alerts generated by database server ILOM and captured by the plug-in may be delivered via Enterprise Manager in
the same fashion as an alert generated by any Enterprise Manager target. Refer to Enterprise Manager Connector
Integrators Guide for details on configuring and forwarding alerts.

Operating System
The database server operating system, Oracle Enterprise Linux, is viewed in EMGC as a Host target.
Metrics
There are no Exadata-specific monitoring requirements. The metrics and default thresholds provided by EMGC are
sufficient for the level of monitoring necessary for Exadata, except for the cases noted below. Thresholds may be
changed or set in the Metric and Policy Settings page in EM to handle site-specific requirements. The list of metrics
and default thresholds in EMGC for a Host target is available in the Oracle Enterprise Manager Framework, Host,
and Services Metric Reference Manual.
Database disk I/O is done to Exadata Storage Servers through the InfiniBand network over iDB protocol. Therefore,
monitoring metric thresholds on database servers relating to disk I/O (e.g. CPU in I/O Wait (%), Disk Device Busy
(%), or Average Disk I/O Service Time (ms)) will provide no value for monitoring database performance. Disk I/O
must be monitored at Exadata Storage Servers through the Exadata Storage Server Plug-In.
Alerts
Database server operating system alerts are generated directly by EMGC based on default metric thresholds set for
the Host target.

Oracle Grid Infrastructure


Oracle Automatic Storage Management, Oracle Listener, and Oracle Clusterware are viewed in EMGC as Automatic
Storage Management, Listener, and Cluster targets, respectively.
Metrics
The metrics and default thresholds provided by EMGC are sufficient for the level of monitoring necessary for
Exadata. Thresholds may be changed or set in the Metric and Policy Settings page in EM to handle site-specific
requirements. There are additional Exadata-specific monitoring requirements, as indicated below. The list of
metrics and default thresholds in EMGC for Automatic Storage Management, Listener, and Cluster targets is
available in the Oracle Enterprise Manager Oracle Database and Database-Related Metric Reference Manual.
Alerts
Oracle Grid Infrastructure alerts are generated directly by EMGC based on default metric thresholds set for
the Automatic Storage Management, Listener, and Cluster targets.
What to Monitor, How to Monitor
Currently, monitoring for the following items is not integrated with EMGC. In the future, these may be supplied as
user-defined metrics.
What to monitor: ASM: Connectivity issues to cells
How to monitor: ASM instance alert.log contains connect: ossnet: connection failed to server <ipaddr>, result=5
(login: sosstcpreadtry failed)
Comment: Check every 5 minutes.
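
Because this check is not integrated with EMGC, it has to be scripted. The following is a minimal Python sketch of the text match only, assuming the alert.log contents have already been read into a string; the function name find_cell_connect_failures and the sample IP address are hypothetical, and the pattern is derived solely from the message text quoted above.

```python
import re

# Pattern for the connectivity failure message quoted above. Whitespace is
# matched loosely because the message may wrap across lines in the log.
OSSNET_FAILURE = re.compile(
    r"ossnet:\s*connection\s+failed\s+to\s+server\s+(\S+?),\s*result=(\d+)"
)


def find_cell_connect_failures(log_text):
    """Return a (server, result_code) tuple for each failure in log_text."""
    return [
        (match.group(1), int(match.group(2)))
        for match in OSSNET_FAILURE.finditer(log_text)
    ]


# Hypothetical log excerpt; the address is made up for illustration.
sample = (
    "connect: ossnet: connection failed to\n"
    "server 192.168.10.5, result=5 (login:\n"
    "sosstcpreadtry failed)\n"
)
print(find_cell_connect_failures(sample))
```

To honor the 5-minute interval, a scheduler would need to pass only the log text written since the previous run, which this sketch leaves to the caller.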

Oracle Database
Oracle Databases are viewed in EMGC as Cluster Database and Database Instance targets.
Metrics
The metrics and default thresholds provided by EMGC are sufficient for the level of monitoring necessary for
Exadata, except for the cases noted below. Thresholds may be changed or set in the Metric and Policy Settings page
in EM to handle site-specific requirements. The list of metrics and default thresholds in EMGC for Cluster Database
and Database Instance targets is available in the Oracle Enterprise Manager Oracle Database and Database-Related
Metric Reference Manual.
Alerts
Database alerts are generated directly by EMGC based on default metric thresholds set for the Cluster Database and
Database Instance targets.
What to Monitor, How to Monitor
What to monitor: Connectivity issues to cells
How to monitor: Alert.log contains connect: ossnet: connection failed to server <ipaddr>, result=5
(login: sosstcpreadtry failed)
Comment: Check every 5 minutes.

Enterprise Manager Agent


EM agents are viewed in EMGC as Agent targets.
Metrics
There are no Exadata-specific monitoring requirements. The metrics and default thresholds provided by EMGC are
sufficient for the level of monitoring necessary for Exadata. Thresholds may be changed or set in the Metric and
Policy Settings page in EM to handle site-specific requirements. The list of metrics and default thresholds in EMGC
for an Agent target is available in the Oracle Enterprise Manager Framework, Host, and Services Metric Reference
Manual.
Alerts
Enterprise Manager Agent alerts are generated directly by EMGC based on default metric thresholds set for
the Agent targets.

Monitoring the InfiniBand Network


InfiniBand network monitoring is divided into three areas for monitoring purposes: InfiniBand switches, InfiniBand
ports, and InfiniBand fabric.
Currently monitoring for the InfiniBand network is not integrated with EMGC.

InfiniBand Switches
Monitoring the Sun Datacenter InfiniBand Switch 36 switches provided with Oracle Exadata Database
Machine requires checking for failed hardware components and sensors that have exceeded preset thresholds on the
switch, and checking for port errors that have occurred on switch ports. The table below shows what to monitor on
InfiniBand switches. These checks should be run approximately every 60 to 120 seconds.
What to monitor: Hardware failure and sensor state
How to monitor:
showunhealthy
checkpower
Comment: See Sun InfiniBand Switch software version 1.0.1 section below for details.

What to monitor: Switch port errors
How to monitor:
ibqueryerrors.pl -s RcvSwRelayErrors,RcvRemotePhysErrors,XmtDiscards,XmtConstraintErrors,RcvConstraintErrors,ExcBufOverrunErrors,VL15Dropped
Comment: SymbolErrors or RcvErrors or LinkIntegrityErrors should not increase without LinkDowned increasing.
A single invocation of this command will report on all switch ports on all switches.
Run this check from a database server or a switch.

Sun InfiniBand Switch software version 1.0.1


Sun IB switches running switch software version 1.0.1 do not have SNMP support. Switch status can be obtained by
logging into the switch as root user and running the showunhealthy and checkpower commands.
# ssh root@dm01sw-ib2
root@dm01sw-ib2's password:
[root@dm01sw-ib2 ~]# showunhealthy
OK - No unhealthy sensors

The showunhealthy command should produce the output OK - No unhealthy sensors. If the output differs, run
the command env_test to get detailed status information for all switch sensors. Note that showunhealthy
will indicate OK - No unhealthy sensors even if a power supply is offline, so power supplies must be checked
separately.

[root@dm01sw-ib2 ~]# checkpower
PSU 0 present status: OK
PSU 1 present status: OK

The checkpower command should indicate that both power supplies have status OK.
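
When these two checks are scripted, both outputs have to be evaluated together, because showunhealthy alone does not cover the power supplies. The following Python sketch shows one way to do that, assuming the command output has already been captured as text (for example over ssh) and has exactly the form shown above; the function name switch_is_healthy is hypothetical, and other switch software versions may format the output differently.

```python
def switch_is_healthy(showunhealthy_output, checkpower_output):
    """Return True only if both outputs match the healthy form shown above."""
    sensors_ok = showunhealthy_output.strip() == "OK - No unhealthy sensors"
    psu_lines = [
        line.strip()
        for line in checkpower_output.splitlines()
        if line.strip().startswith("PSU")
    ]
    # Both power supplies must be listed, and each line must end in "status: OK".
    psus_ok = len(psu_lines) >= 2 and all(
        line.endswith("status: OK") for line in psu_lines
    )
    return sensors_ok and psus_ok


print(
    switch_is_healthy(
        "OK - No unhealthy sensors\n",
        "PSU 0 present status: OK\nPSU 1 present status: OK\n",
    )
)
```

Requiring at least two PSU lines means that a power supply missing from the output is treated as a failure rather than silently ignored.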
Sun InfiniBand Switch software version 1.1.3
Sun IB switch software version 1.1.3 supports SNMP. To leverage SNMP support, an EM Sun InfiniBand
Switch Management Plug-In is planned so that switch status can be monitored within EMGC. If it is not possible to
monitor the switch using the plug-in, monitor the switch using the instructions for switch software version 1.0.1.
The plug-in will show two pieces of information on the home page: a ping response time graph and a response
metric that is determined by polling an aggregate of sensor information from the switch. If at any point one of the
sensors on the switch that is part of the aggregate asserts or shows an error, the overall response metric will
change to down. At this point, run the showunhealthy command as documented above in the version 1.0.1 section and
contact Oracle support for further instructions. A screen shot of the plug-in is shown below.

Additional information for Sun Datacenter InfiniBand Switch 36 is available at http://docs.sun.com/source/8350784-04/index.html.

InfiniBand Ports
InfiniBand port monitoring checks the health of InfiniBand network ports and interfaces on database servers and
Exadata Storage Servers.
Exadata Storage Servers
InfiniBand port monitoring on storage servers is performed by MS. No additional IB monitoring is required on
storage servers. If an IB port is not functioning correctly, MS creates an alert, and delivers the alert via SNMP to
EMGC through the Exadata Storage Server Plug-In. Alert messages contain corrective action to perform to resolve
the alert. Refer to the Monitoring Exadata Storage Servers section for additional details.
Database Servers
Database servers have no built in IB monitoring. The table below shows what to monitor on each database server.
These checks should be run approximately every 60 to 120 seconds.
What to monitor: Port state, physical port state, and rate (ib0, ib1)
How to monitor: /usr/sbin/ibstatus
Comment: Expected output for each port:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)

What to monitor: Port errors (ib0, ib1)
How to monitor: /usr/sbin/perfquery
Comment: SymbolErrors or RcvErrors or LinkIntegrityErrors should not increase without LinkDowned increasing.

What to monitor: Interface state (bond0, ib0, ib1)
How to monitor:
/sbin/ifconfig bond0
/sbin/ifconfig ib0
/sbin/ifconfig ib1
Comment: All interfaces should be UP.

What to monitor: Connectivity to other hosts
How to monitor:
/bin/ping <remoteIBhost>
/usr/bin/rds-ping <remoteIBhost>
Comment: A database server should have connectivity via ping and rds-ping to all storage servers and database
servers over the IB network.
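
The expected ibstatus values listed above lend themselves to an automated comparison. The following Python sketch checks captured ibstatus output against those three values. It assumes each port section begins with a header line starting with "Infiniband device", as ibstatus typically prints; the function name check_ibstatus and the device name in the sample are hypothetical.

```python
# Expected values for each port, taken from the table above.
EXPECTED = {
    "state": "4: ACTIVE",
    "phys state": "5: LinkUp",
    "rate": "40 Gb/sec (4X QDR)",
}


def check_ibstatus(output):
    """Return (port_header, field, actual_value) for every field that differs."""
    problems = []
    port = "unknown port"
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Infiniband device"):
            port = line.rstrip(":")  # remember which port the fields belong to
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in EXPECTED:
            actual = " ".join(value.split())  # collapse tabs and repeated spaces
            if actual != EXPECTED[key]:
                problems.append((port, key, actual))
    return problems


# Hypothetical captured output for one port that is down.
sample = (
    "Infiniband device 'mlx4_0' port 1 status:\n"
    "\tstate:\t\t 1: DOWN\n"
    "\tphys state:\t 5: LinkUp\n"
    "\trate:\t\t 40 Gb/sec (4X QDR)\n"
)
print(check_ibstatus(sample))
```

An empty list means every reported field matched. The sketch does not detect a port that is missing from the output entirely, so the number of ports found should be checked separately.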

These checks can be automated using User Defined Metrics (UDMs) in Grid Control. Two UDM scripts are provided for
database servers. These scripts can be downloaded from MOS note 1110675.1.
Enterprise Manager User-Defined Metric InfiniBand net CONNECTivity check (emudm_ibconnect.sh)
This UDM script will monitor connectivity over the IB network to other database servers and storage servers. The
list of database servers is built from the ocrdump SYSTEM.crs.e2eport key. The list of cells is built from
cellip.ora. This script will not validate connectivity to additional devices on the IB network such as media servers.
Note that ibhosts is not used because it requires root permissions. The approach used in this script is preferred
because its scope is limited to servers that are relevant to the Exadata devices, which may not be the whole IB
network.

Enterprise Manager User-Defined Metric NETwork InterFace State (emudm_netif_state.sh)


This UDM script will monitor the network interface state (Ethernet or InfiniBand). If the interface is InfiniBand,
it will check the underlying IB port, state, rate, and three error counters as defined above. If the interface is
the master for Linux bonding, it will verify the state of the slave interfaces.
The error counters monitored are RcvErrors, SymbolErrors, and LinkIntegrityErrors. They are reported only when
they have increased since the last invocation of the script and only when LinkDowned has not increased. Previous
values are cached in temporary file /var/tmp/ibCounterCache_<device><port>.
$1 - interface to check state (e.g. bond0, eth0)
There will be one instance of this UDM per network port to be monitored.
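
The reporting rule for the error counters can be illustrated with a short Python sketch. It is not the code of emudm_netif_state.sh, only a restatement of the rule described above, assuming the counter values from the previous and current invocation are available as dictionaries; the function name counters_to_report is hypothetical.

```python
# The three error counters named in the description above.
WATCHED = ("RcvErrors", "SymbolErrors", "LinkIntegrityErrors")


def counters_to_report(previous, current):
    """Return {counter: increase} for watched counters that grew.

    previous and current map counter names to integer values from two
    consecutive invocations. Nothing is reported when LinkDowned has
    increased, because the errors are then attributed to the link event.
    """
    if current.get("LinkDowned", 0) > previous.get("LinkDowned", 0):
        return {}
    report = {}
    for name in WATCHED:
        delta = current.get(name, 0) - previous.get(name, 0)
        if delta > 0:
            report[name] = delta
    return report


# Hypothetical counter snapshots from two consecutive runs.
before = {"RcvErrors": 0, "SymbolErrors": 10, "LinkIntegrityErrors": 0, "LinkDowned": 1}
after = {"RcvErrors": 0, "SymbolErrors": 14, "LinkIntegrityErrors": 0, "LinkDowned": 1}
print(counters_to_report(before, after))
```

Comparing against cached values rather than absolute thresholds means that only new errors are reported, which matches the guidance that these counters should not increase without LinkDowned increasing.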
Installation Instructions
On each database server, perform the following actions to install the supplied user-defined metrics (UDMs).
1. Place the UDM scripts in /u01/app/oracle/product/11.1.0/udm_scripts (this is a default location; the scripts are not location dependent).
2. Set the proper ownership and permissions:
chown oracle emudm_ibconnect.sh emudm_netif_state.sh
chmod u+x emudm_ibconnect.sh emudm_netif_state.sh
3. Log in to Enterprise Manager.
4. Navigate to the Host target screen of the database server.
5. In the bottom right of the screen, click User-Defined Metrics.
6. Click Create to create UDMs based on the guidelines below.

Create one UDM for InfiniBand network connectivity. Use the following values for the fields in the
Create User-Defined Metric screen:

Metric name: InfiniBand network connectivity
Metric Type: String
Command Line: /u01/app/oracle/product/11.1.0/udm_scripts/emudm_ibconnect.sh
User Host Name: (host user name)
Host User Password: (host user password)
Comparison Operator: MATCH
Warning: WARNING
Critical: CRITICAL
Schedule
Repeat every 15 minutes

Create one UDM for each network interface to be monitored. For example, if interfaces bond0, eth0,
and bond1 are configured, then create 3 UDMs. Use the following values for the fields in the Create
User-Defined Metric screen, replacing bond0 with the proper interface name:
Metric name: bond0 network interface status
Metric Type: String
Command Line: /u01/app/oracle/product/11.1.0/udm_scripts/emudm_netif_state.sh bond0
User Name: oracle
Password: welcome
Comparison Operator: MATCH
Warning: WARNING
Critical: CRITICAL
Schedule
Repeat every 5 minutes
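
Because the metric type is String and the thresholds match on WARNING and CRITICAL, a UDM script only has to print a result line containing one of those words. The following sketch shows that contract for an interface state check. It is not the supplied emudm_netif_state.sh: the function name netif_result is hypothetical, the check relies only on the standard Linux sysfs files operstate and bonding/slaves, and it does not examine InfiniBand ports or counters.

```shell
#!/bin/bash
# Sketch only: NOT the emudm_netif_state.sh script from MOS note 1110675.1.

# Print an em_result line for one network interface.
# $1 - interface name (e.g. bond0, eth0)
# $2 - sysfs net directory; defaults to /sys/class/net and is a parameter
#      only so the function can be exercised against a test directory
netif_result() {
    local iface="$1" root="${2:-/sys/class/net}" state slave bad=""
    if [ ! -d "$root/$iface" ]; then
        echo "em_result=CRITICAL"
        echo "em_message=$iface not found"
        return
    fi
    state=$(cat "$root/$iface/operstate" 2>/dev/null)
    if [ "$state" != "up" ]; then
        echo "em_result=CRITICAL"
        echo "em_message=$iface is ${state:-unknown}"
        return
    fi
    # For a bonding master, also check each slave interface.
    if [ -r "$root/$iface/bonding/slaves" ]; then
        for slave in $(cat "$root/$iface/bonding/slaves"); do
            [ "$(cat "$root/$slave/operstate" 2>/dev/null)" = "up" ] || bad="$bad $slave"
        done
    fi
    if [ -n "$bad" ]; then
        echo "em_result=WARNING"
        echo "em_message=$iface slave(s) not up:$bad"
    else
        echo "em_result=OK"
    fi
}
```

A script built around this function would end with a call such as netif_result "$1". Note that some drivers report an operstate of "unknown" for a working interface, which this sketch treats as a failure.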

InfiniBand Fabric
The table below shows what to monitor for the InfiniBand fabric. Run these checks from only one database server.
What to monitor: Subnet manager (SM) master location
How to monitor: /usr/sbin/sminfo reports that the SM master is running on an IB switch returned by /usr/sbin/ibswitches.
Comment: The following example shows the SM master is running on a switch:
# sminfo
sminfo: sm lid 1 sm guid 0x21283a8516a0a0, activity count 933330 priority 5 state 3 SMINFO_MASTER
# ibswitches
Switch : 0x0021283a8983a0a0
Switch : 0x0021283a8516a0a0
Switch : 0x0021283a89bda0a0
In a full rack or multi-rack configuration that has one or more spine switches, the SM master should run on a spine switch. A spine switch is a switch that has only other switches connected to it (i.e. no servers connected directly). The following command identifies spine switches:
# ibnetdiscover -p | awk '
/^SW +[0-9]+ +[0-9]+ +0x[0-9a-f]+ +[0-9]+x .DR - (SW|CA).*/ {
    if (spine[$4] == "") spine[$4] = "yes"
    if ($8 == "CA") spine[$4] = "no"
}
END {
    for (val in spine)
        if (spine[val] == "yes")
            print val
}'
Run this check once per day.

What to monitor: Network topology
How to monitor: /opt/oracle.SupportTools/ibdiagtools/verify-topology
Comment: verify-topology should return SUCCESS for all checks performed. Run this check once per day.

What to monitor: Link status
How to monitor: /usr/sbin/iblinkinfo.pl -Rl
Comment: Current iblinkinfo.pl output should match previously saved output taken when the IB network was in a known good state. Run this check every 1 to 5 minutes.
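
The SM master location check can be automated by comparing the GUID reported by sminfo with the GUIDs listed by ibswitches. The sketch below is one way to do that and is not part of the supplied UDM scripts. The function names norm_guid and sm_master_on_switch are hypothetical, and the parsing assumes output shaped like the sminfo and ibswitches examples in this document. Note that in those examples sminfo prints the GUID without leading zeros while ibswitches pads it, so both are normalized before comparison.

```shell
#!/bin/bash
# Sketch only: assumes sminfo and ibswitches output shaped like the
# examples in this document.

# Normalize a GUID: lower-case it, drop the 0x prefix and leading zeros.
norm_guid() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | sed -e 's/^0[xX]//' -e 's/^0*//'
}

# $1 - output of sminfo
# $2 - output of ibswitches
# Returns 0 if the SM master GUID belongs to a listed switch, 1 if it
# does not, and 2 if no SM GUID could be found in the sminfo output.
sm_master_on_switch() {
    local sm_guid sw
    sm_guid=$(printf '%s' "$1" | tr '\n' ' ' |
        sed -n 's/.*sm guid *\(0[xX][0-9a-fA-F]*\).*/\1/p')
    [ -n "$sm_guid" ] || return 2
    sm_guid=$(norm_guid "$sm_guid")
    for sw in $(printf '%s\n' "$2" |
        sed -n 's/^Switch[[:space:]]*:[[:space:]]*\(0[xX][0-9a-fA-F]*\).*/\1/p'); do
        [ "$(norm_guid "$sw")" = "$sm_guid" ] && return 0
    done
    return 1
}
```

On a database server the function could be called as sm_master_on_switch "$(/usr/sbin/sminfo)" "$(/usr/sbin/ibswitches)", with the return code mapped to the result string expected by the monitoring tool.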

Monitoring the Cisco Catalyst Ethernet Switch


The Cisco Ethernet switch is not in the critical data path. It does not participate in client connectivity to the database
or database connectivity to the storage servers. However, monitoring and administrative traffic does depend on the
availability of the Cisco Ethernet switch. The primary goal of monitoring the Cisco Ethernet switch is to identify
hardware component failure and environmental conditions that can lead to switch malfunction. The switch monitors
availability and sensor thresholds for its hardware components.

Metrics
Cisco Ethernet switch metrics are provided by the switch to the EM Cisco Switch Management Plug-In. Instructions
for installing the plug-in are contained in the System Monitoring Plug-in Installation Guide for Oracle Exadata
Cisco Switch documentation. The metrics and default thresholds provided by the plug-in are sufficient for the level
of monitoring necessary to ensure switch availability. Thresholds may be changed or set in the Metric and Policy
Settings page in EM to handle site-specific requirements.

Alerts
The Cisco Ethernet switch reports an alert condition as an SNMP trap, which is received by Enterprise Manager
Grid Control (EMGC) through the EM Cisco Switch Management Plug-In.

How to Monitor, What to Monitor


The conditions monitored are determined by the Cisco switch notification types enabled. The following switch
notification types are recommended:
Switch notification type to enable: envmon
Description: Environmental monitor notification that sends alerts for fan, shutdown, power supply, and temperature conditions.

Additional information about the Cisco Ethernet switch is available at
http://www.cisco.com/en/US/products/hw/switches/ps4324/products_installation_and_configuration_guides_list.html.


Monitoring the Sun Power Distribution Units (PDUs)


The PDU metering unit enables you to monitor the current being used by equipment connected in the Oracle
Exadata Database Machine rack. You can monitor the current in person by viewing the LCD screen on the PDU
itself or remotely from a system on the network.
To remotely monitor PDUs, connect each PDU to the management network. PDUs should be connected with an
Ethernet network cable directly to a data center switch, not to the Cisco Ethernet switch on the rack. Network
configuration steps and instructions for monitoring PDUs are available at http://docs.sun.com/source/820-476015/toc.html.
Access to the PDU can be monitored using Enterprise Manager and the Exadata Power Distribution Unit Plug-in.
Installation instructions are in the Oracle Enterprise Manager System Monitoring Plug-In
Installation Guide for Exadata Power Distribution Unit documentation.

The PDU plug-in requires the PDU to be using version 1.0.2 or greater of the PDU firmware. Instructions for
determining the firmware version are available in the firmware documentation. See Oracle Database Machine Monitoring
Best Practices (Doc ID 1110675.1) on My Oracle Support for information on the latest version of the firmware.
Once the plug-in is installed, it needs to be configured with the appropriate monitoring thresholds. Those thresholds
depend on several factors (the size of the Exadata configuration, the power and voltage type). Determine the values
for those factors and then update the PDU monitoring information in the plug-in metrics using the values and
procedures documented in the My Oracle Support note PDU Threshold Settings for Oracle Exadata Database
Machine (Doc ID 1299851.1).


Monitoring the Avocent MergePoint Unity KVM


The KVM provides console access to the Exadata Storage Servers and database servers in the rack. Its
primary purpose is to provide keyboard, video, and mouse control of the servers for administrators
working at the Oracle Exadata Database Machine in the data center.
Access to the KVM can be monitored by installing the Exadata Avocent MergePoint Unity Switch plug-in. The
plug-in enables Enterprise Manager Grid Control to monitor KVM (keyboard, video or
visual display unit, mouse) targets. It provides the status of the KVM and reports event occurrences such as Factory
Defaults Set, Fan Failure, Aggregated TargetDevice Status, Power Supply Failure, Power Supply Restored, Reboot
Started, and Temperature Out of Range on the KVM target.
The instructions for installing the plug-in are documented in the Oracle Enterprise Manager System Monitoring
Plug-In Installation Guide for Exadata Avocent MergePoint Unity Switch documentation.

Refer to http://pcs.mktg.avocent.com/@@content/manual/590883501c.pdf for additional details.
