
IBM Tivoli Netcool/OMNIbus 7.3.1 Key Performance Indicators

Presented by Christina Grimes, clgrimes@us.ibm.com

2010 IBM Corporation

Agenda
- Introduction
- Object Server Key Performance Indicators
- Probe and Gateway Key Performance Indicators
- Q&A


Introduction
To check that OMNIbus is performing well, you can monitor several key performance indicators (KPIs). Before relying on them, first establish a baseline while the system is under normal load and operation. When changes are made to the environment, compare the KPI measurements taken after the change against that baseline to measure the effect on performance.


Object Server Key Performance Indicators


- Object Server Granularity
- Profiler Report
- Trigger Stats
- Number of rows in alerts.status, alerts.journal and alerts.details
- Number of inserts in the alerts.status table
- Number of Connections
- CPU usage of nco_objserv
- Memory usage of nco_objserv
- Memstore usage


Object Server Granularity


The Object Server property Granularity specifies the update interval for IDUC (Insert, Delete, Update, Control) notifications to clients.
- The default Granularity period is 60 seconds. Changing the default is not recommended.
- The Object Server should service all client and trigger requests within the Granularity period.
- Client and trigger requests may temporarily take more time than the Granularity period.
- If client and trigger requests consistently take more time than the Granularity period, Object Server performance is degraded, resulting in:
  - Client disconnections
  - Delays or errors in logging into the Object Server
  - Delays in events being propagated between multiple Object Servers
  - An Object Server that eventually becomes unresponsive


Object Server Profiler Report


- Profile statistics are also logged to a profile log file, $NCHOME/omnibus/log/ObjectServername_profiler_report.log#.
- The profiler report shows a breakdown of the time spent for each client connection, and the total time spent by client type, for each granularity period.
- You can use the profile log file to analyze how the ObjectServer time is spent during each granularity period and to calculate the percentage of time used. For example, if the granularity period is set to 60 seconds and the total time spent for all the connections during a particular period was 30 seconds, the ObjectServer spent 50% of its available time running SQL commands from client connections.
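The percentage calculation described above can be sketched in Python as follows. The function name and its arguments are illustrative only and are not part of OMNIbus.

```python
def utilization_percent(busy_seconds: float, granularity_seconds: float = 60.0) -> float:
    """Return the share of one granularity period spent on work, as a percentage."""
    if granularity_seconds <= 0:
        raise ValueError("granularity_seconds must be positive")
    return 100.0 * busy_seconds / granularity_seconds


# The example from the text: 30 seconds of client time in a 60-second period.
print(utilization_percent(30.0, 60.0))  # 50.0
```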


Example Profiler Report


Mon Oct 12 17:39:46 2009: Individual user profiles:
Mon Oct 12 17:39:46 2009: 'Administrator' (uid = 0) time on adminhost: 0.000000s
Mon Oct 12 17:39:46 2009: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s
Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on probehost.ibm.com: 5.010000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s
Mon Oct 12 17:39:46 2009: 'c@xxxxx@xxxxx:11.0' (uid = 45) time on omnihost1.ibm.com: 0.000000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s
Mon Oct 12 17:39:46 2009: 'c@xxxxx@xxxxx:11.0' (uid = 55) time on omnihost1.ibm.com: 0.000000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 55) time on omnihost1.ibm.com: 6.020000s
Mon Oct 12 17:39:46 2009: 'GATEWAY' (uid = 0) time on omnihost1.ibm.com: 0.270000s
Mon Oct 12 17:39:46 2009: 'GATEWAY' (uid = 0) time on omnihost1.ibm.com: 0.000000s
Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on omnihost1.ibm.com: 3.010000s
Mon Oct 12 17:39:46 2009: Grouped user profiles:
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'PROBE': 8.020000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'GATEWAY': 0.270000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'c@xxxxx@xxxxx:11.0': 0.000000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'e@c0B4D@c0142:11.0': 25.93000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'isql': 3.77000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'Administrator': 0.000000s
Mon Oct 12 17:39:46 2009: Total time in the report period (59.275782s): 29.980000s


Analyzing the Profiler Report


Goal: Profiler Report Total time in the report period + Trigger Statistics Time for all triggers < Object Server Granularity period of 60 seconds.
- The current Total time in the report period of 29.980000s indicates that ~50% of the Object Server Granularity period is spent servicing client connections.
- Identify the highest clients in the report period:
  Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s
  Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s
- High probe times can indicate an event flood or a large number of status or details inserts.
- High event list or WebGUI client times can indicate inefficient filters.
- High Object Server gateway client times can indicate the transfer of many events or a resync.
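Identifying the highest clients can be automated with a small script. The following is a minimal sketch, assuming that the individual profile lines keep the layout shown in the example report; the regular expression, the ClientTime type, and the top_clients function are illustrative names, not OMNIbus tooling.

```python
import re
from typing import List, NamedTuple


class ClientTime(NamedTuple):
    app: str
    uid: int
    host: str
    seconds: float


# Matches lines such as: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s
_CLIENT_RE = re.compile(r"'([^']*)' \(uid = (\d+)\) time on (\S+): ([\d.]+)s")


def top_clients(report_text: str, limit: int = 3) -> List[ClientTime]:
    """Return the client connections that used the most time, highest first."""
    clients = [
        ClientTime(m.group(1), int(m.group(2)), m.group(3), float(m.group(4)))
        for m in _CLIENT_RE.finditer(report_text)
    ]
    return sorted(clients, key=lambda c: c.seconds, reverse=True)[:limit]


sample = """\
Mon Oct 12 17:39:46 2009: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s
Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on probehost.ibm.com: 5.010000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s
"""

for client in top_clients(sample, limit=2):
    print(f"{client.seconds:9.6f}s  {client.app} (uid {client.uid}) on {client.host}")
```

Grouped profile lines ("Execution time for all connections ...") do not match the pattern, so only individual connections are ranked.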


Object Server Trigger Statistics


Trigger statistics are also logged to the file $NCHOME/omnibus/log/ObjectServername_trigger_stats.log#. The trigger statistics log file shows the amount of time that each trigger has used in the last profiling period. You can use this log file for automation debugging, and to determine which triggers are slow due to slow-running SQL queries.


Example Trigger Statistics


Mon Oct 12 18:03:56 2009: Trigger Profile Report
Mon Oct 12 18:03:56 2009: Trigger Group 'primary_only'
Mon Oct 12 18:03:56 2009: Trigger time for 'generic_clear': 5.879707s
Mon Oct 12 18:03:56 2009: Trigger time for 'expire': 0.008233s
Mon Oct 12 18:03:56 2009: Trigger time for 'delete_clears': 0.007219s
Mon Oct 12 18:03:56 2009: Trigger time for 'enrich_and_correlate': 23.007219s
Mon Oct 12 18:03:56 2009: Trigger Group 'iduc_triggers'
Mon Oct 12 18:03:56 2009: Trigger time for 'disconnect_iduc_missed': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'iduc_stats_update': 0.000949s
Mon Oct 12 18:03:56 2009: Trigger time for 'iduc_messages_tblclean': 0.000089s
Mon Oct 12 18:03:56 2009: Trigger time for 'deduplicate_iduc_stats': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'iduc_stats_insert': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger Group 'automatic_backup_system'
Mon Oct 12 18:03:56 2009: Trigger time for 'backup_succeeded': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'backup_failed': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'backup_state_integrity': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger Group 'gateway_triggers'
Mon Oct 12 18:03:56 2009: Trigger time for 'resync_finished': 0.000000s
Mon Oct 12 18:03:56 2009: Time for all triggers in report period (60s): 29.789663s


Analyzing Trigger Statistics


Goal: Profiler Report Total time in the report period + Trigger Statistics Time for all triggers < Granularity period of 60 seconds.
- The current Profiler Report Total time in the report period of 29.980000s plus the Trigger Statistics Time for all triggers of 29.789663s (59.769663s in total) indicates that the Object Server is at nearly 100% utilization of the 60-second Granularity period.
- Identify the highest triggers:
  Mon Oct 12 18:03:56 2009: Trigger time for 'enrich_and_correlate': 23.007219s
  Mon Oct 12 18:03:56 2009: Trigger time for 'generic_clear': 5.879707s
- High generic_clear or deduplication trigger times can indicate high event throughput or a high number of resident events.
- Ensure best practices are used in creating custom triggers: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_bestpracticestriggers.html
- Keep trigger execution time to a minimum, because no other writes can be performed in the Object Server while a trigger is executing.
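The goal check and the search for the slowest triggers can be sketched as follows. This is a minimal example, assuming that trigger lines keep the "Trigger time for '<name>': <seconds>s" layout from the example report; trigger_times and combined_utilization are hypothetical helper names.

```python
import re
from typing import Dict

# Matches entries such as: Trigger time for 'generic_clear': 5.879707s
_TRIGGER_RE = re.compile(r"Trigger time for '([^']+)': ([\d.]+)s")


def trigger_times(stats_text: str) -> Dict[str, float]:
    """Map each trigger name found in the text to its reported time in seconds."""
    return {m.group(1): float(m.group(2)) for m in _TRIGGER_RE.finditer(stats_text)}


def combined_utilization(client_seconds: float, trigger_seconds: float,
                         granularity_seconds: float = 60.0) -> float:
    """Percentage of the granularity period used by clients and triggers together."""
    return 100.0 * (client_seconds + trigger_seconds) / granularity_seconds


sample = (
    "Trigger time for 'generic_clear': 5.879707s "
    "Trigger time for 'expire': 0.008233s "
    "Trigger time for 'delete_clears': 0.007219s "
    "Trigger time for 'enrich_and_correlate': 23.007219s"
)

times = trigger_times(sample)
slowest = sorted(times, key=times.get, reverse=True)[:2]
print(slowest)  # ['enrich_and_correlate', 'generic_clear']

# Totals taken from the two example reports: 29.980000s and 29.789663s.
print(round(combined_utilization(29.98, 29.789663), 1))  # 99.6
```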


Number of Rows
- alerts.status and alerts.journal
- alerts.details
  - Use the alerts.details table only when alerts.status cannot hold enough information for a specific alarm, or during rules file development.
  - On production systems, it is suggested to keep the alerts.details table below 5,000 rows. A large number of rows in alerts.details can degrade ObjectServer performance.
  - details statements in probe rules files generate records in the alerts.details table. details($*) records each token as one row in alerts.details, so with details($*) enabled in your rules file, each event in the alerts.status table might have 10 to 50 rows in the alerts.details table.
  - To disable details, comment out every details($*) statement in all probe rules files, restart all probes, and clear the current records in the details table (with "delete from alerts.details").
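Before commenting out details($*) statements, it helps to find where they are still active. The following is a minimal sketch, assuming that a rules file comment starts with # and ignoring the corner case of a # inside a quoted string; the function name and the sample rules text are illustrative.

```python
import re
from typing import List, Tuple

# Matches details($*) with optional whitespace inside the call.
_DETAILS_ALL_RE = re.compile(r"details\s*\(\s*\$\*\s*\)")


def active_details_statements(rules_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for each details($*) that is not commented out."""
    hits = []
    for number, line in enumerate(rules_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop a whole-line or trailing comment
        if _DETAILS_ALL_RE.search(code):
            hits.append((number, line.strip()))
    return hits


# Hypothetical rules file fragment, for illustration only.
sample = """\
if (match($severity, "critical"))
{
    @Severity = 5
    details($*)
}
# details($*)
"""

print(active_details_statements(sample))  # [(4, 'details($*)')]
```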


Manage volume of alerts.details


- Throughput of events on the alerts.status table
- Deduplication triggers prevent the number of rows from increasing, but still take processing time in the Object Server.


Object Server Connections


- The maximum permitted number of connections is determined by the ObjectServer Connections property, which defaults to 30 and can be set as high as 1024.
- The Connections property of the Object Server should always be increased in a production environment.
- Only a finite number of connections can be made to the ObjectServer. When the maximum number of connections is reached, new connections are refused.
- A refused connection might result in the temporary loss of access to data, or the loss of input from probes or gateways.
- An error is logged in the Object Server log when the connection limit is exceeded.
- Connections can be monitored in the master.stats table, which is populated by the stats_triggers trigger group.

Monitoring Row Count, Volume and Connections


Enable the stats_triggers trigger group.
- In the default ObjectServer configuration, the stats_triggers group uses triggers to gather several statistics and metrics. You can enable the trigger group in a production environment.
- The triggers in the stats_triggers trigger group count:
  - the current event counts on alerts.status, alerts.details and alerts.journal
  - the number of inserts to alerts.status, alerts.details and alerts.journal
  - the number of deduplications and new inserts to alerts.status
  - the number of client connections


Stats triggers
- The data gathered by this trigger group and automation is written periodically to the master.stats table.
- The default write interval is 300 seconds; this value is configurable in the statistics_gather trigger.
- By default, 5 days of data is retained.
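Once samples from master.stats are available, they can be checked against simple thresholds. The following is a rough sketch only: the column names DetailCount and NumClients are assumptions about the master.stats schema and should be verified against your ObjectServer, the 80% connection warning ratio is an arbitrary illustrative choice, and how the row is fetched (for example with nco_sql) is left out.

```python
from typing import Dict, List

DETAILS_ROW_LIMIT = 5000  # suggested ceiling for alerts.details on production systems


def review_stats_sample(sample: Dict[str, int], max_connections: int,
                        connection_warn_ratio: float = 0.8) -> List[str]:
    """Return warnings for one master.stats row passed in as a dict.

    The keys "DetailCount" and "NumClients" are assumed column names.
    """
    warnings = []
    detail_count = sample["DetailCount"]
    if detail_count > DETAILS_ROW_LIMIT:
        warnings.append(
            f"alerts.details has {detail_count} rows (suggested limit {DETAILS_ROW_LIMIT})"
        )
    num_clients = sample["NumClients"]
    if num_clients >= connection_warn_ratio * max_connections:
        warnings.append(f"{num_clients} of {max_connections} connections in use")
    return warnings


# Hypothetical sample values, with the default Connections setting of 30.
row = {"NumClients": 27, "DetailCount": 6400}
for message in review_stats_sample(row, max_connections=30):
    print(message)
```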


CPU usage
- Monitor the CPU usage of the nco_objserv process.
- If the Object Server is under heavy load, this will be reflected in CPU usage.
- The profiler report and trigger statistics logs will show the source of the heavy load.
- Sizing considerations: https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Tivoli%20Netcool%20OMNIbus/page/OMNIbus%20Sizing%20Guide


Memory Usage
- Monitor the memory usage of the nco_objserv process.
- The memory usage of the process increases in proportion to the number of rows in the alerts.status, alerts.details, and alerts.journal tables (or any additional tables you have defined), the number of connections, and the level of client usage.
- The memory usage should remain stable over time, and any increases should correspond to increases in the number of table rows or additional clients.
- Sizing considerations: https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Tivoli%20Netcool%20OMNIbus/page/OMNIbus%20Sizing%20Guide


Object Server Memstore Usage


- Memstores are containers maintained by the ObjectServer that hold ObjectServer data and tables in memory.
- Memstores have a finite size and, when full, do not permit any further data to be inserted. Consequently, you must ensure that the memstores do not become full.
- To monitor the memstores, inspect the content of the catalog.memstores table. For each row, compare the value of the UsedBytes column with the values of the SoftLimit and HardLimit columns.
- Once a memstore soft limit is exceeded, an error is logged to the Object Server log file. Once the hard limit is exceeded, the Object Server shuts down.
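The comparison of UsedBytes against SoftLimit and HardLimit can be sketched as follows. The rows are assumed to have been read from catalog.memstores already and are passed in as dictionaries; the StoreName key, the store names, and the byte values are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def memstore_report(rows: List[Dict[str, object]]) -> List[Tuple[str, str, float]]:
    """Return (store name, status, percent of hard limit used) for each row."""
    report = []
    for row in rows:
        used = int(row["UsedBytes"])
        soft = int(row["SoftLimit"])
        hard = int(row["HardLimit"])
        if used > hard:
            status = "hard limit exceeded"
        elif used > soft:
            status = "soft limit exceeded"
        else:
            status = "ok"
        report.append((str(row["StoreName"]), status, round(100.0 * used / hard, 1)))
    return report


# Hypothetical rows, for illustration only.
rows = [
    {"StoreName": "store_a", "UsedBytes": 900_000_000,
     "SoftLimit": 800_000_000, "HardLimit": 1_000_000_000},
    {"StoreName": "store_b", "UsedBytes": 50_000_000,
     "SoftLimit": 800_000_000, "HardLimit": 1_000_000_000},
]

for name, status, percent in memstore_report(rows):
    print(f"{name}: {status} ({percent}% of hard limit)")
```

Alerting while a store is still below its soft limit leaves time to act before the Object Server logs errors or shuts down.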


Summary Monitoring Object Server KPI


- Object Server Profiler report and trigger statistics logs
  - Object Server Granularity
  - Profiler Report
  - Trigger Stats
- Enable Stats triggers
  - Number of rows in alerts.status, alerts.journal and alerts.details
  - Number of inserts in the alerts.status table
  - Number of Connections
- System monitors
  - CPU usage of nco_objserv
  - Memory usage of nco_objserv
- Object Server catalog.memstores table
  - Memstore usage


Probe and Gateway KPI


- Probe Key Performance Indicators
  - CPU Usage
  - Memory Usage
  - Number of events received
  - Average time spent processing rules
- Gateway Key Performance Indicators
  - CPU Usage
  - Memory Usage


Probe Self Monitoring


Probe self monitoring can log the following information in the probe log and/or at the Object Server:
- CPU time
- Rules file processing time
- Memory used
- Number of events received
- Number of events discarded from rules
- Number of events generated by self monitoring
http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/probegtwy/concept/omn_prb_enablingroi.htm


Probe Self Monitoring example


Probe log file, Info MessageLevel:
--- ProbeStats report at Heartbeat ---
Timestamp: 1294869063
Probe: glf
Host: hostname
PID: 29976
ProbeID: glf@hostname
NumberEventsProcessed: 8785
NumberEventsDiscarded: 0
NumberEventsGenerated: 0
DiscardPercentage: 0.000000
TotalRulesTime: 0.785492 sec
TotalCPUTime: 43.089477 sec
ProbeUpTime: -1294869003 sec
AverageEventsPerSecond: 8785
AverageRulesFileTime: 89 usec
--- ProbeStats report end ---
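A ProbeStats report can be turned into key/value pairs for trending. The following is a minimal sketch, assuming the "Name: value" layout of the example above; the function names are illustrative. It also recomputes the average rules file time from TotalRulesTime and NumberEventsProcessed, which agrees with the 89 usec reported in the example.

```python
import re
from typing import Dict

# Matches "Name: value" pairs such as "TotalRulesTime: 0.785492".
_FIELD_RE = re.compile(r"(\w+): (\S+)")


def parse_probe_stats(report_text: str) -> Dict[str, str]:
    """Collect the Name: value pairs of one ProbeStats report as strings."""
    return {m.group(1): m.group(2) for m in _FIELD_RE.finditer(report_text)}


def average_rules_time_usec(stats: Dict[str, str]) -> float:
    """Average rules file processing time per event, in microseconds."""
    events = int(stats["NumberEventsProcessed"])
    if events == 0:
        return 0.0
    return float(stats["TotalRulesTime"]) / events * 1_000_000


sample = """\
--- ProbeStats report at Heartbeat ---
Probe: glf
ProbeID: glf@hostname
NumberEventsProcessed: 8785
NumberEventsDiscarded: 0
TotalRulesTime: 0.785492 sec
TotalCPUTime: 43.089477 sec
--- ProbeStats report end ---
"""

stats = parse_probe_stats(sample)
print(stats["ProbeID"])                           # glf@hostname
print(round(average_rules_time_usec(stats), 1))   # 89.4
```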


Summary Monitoring Probe and Gateway KPI


- Probe Key Performance Indicators
  - System Monitors and/or Probe Self Monitoring
    - CPU Usage
    - Memory Usage
  - Probe Self Monitoring
    - Number of events received
    - Average time spent processing rules
- Gateway Key Performance Indicators
  - System Monitors
    - CPU Usage
    - Memory Usage


Summary
- Object Server Key Performance Indicators
  - Object Server Granularity
  - Profiler Report
  - Trigger Stats
  - Number of rows in alerts.status, alerts.journal and alerts.details
  - Number of inserts in the alerts.status table
  - Number of Connections
  - CPU usage of nco_objserv
  - Memory usage of nco_objserv
  - Memstore usage
- Probe Key Performance Indicators
  - Number of events received
  - CPU Usage
  - Memory Usage
  - Average time spent processing rules
- Gateway Key Performance Indicators
  - CPU Usage
  - Memory Usage

Omnibus Best Practice References


- Omnibus 7.3.1 Best Practices Guide: https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=cdd16df5-7bb8-4ef1-bcb9cefb1dd40581#fullpageWidgetId=W05de62601548_4e85_8940_81bb58657a85&file=b7d8ad85-715247e8-926a-802246204e8c
- Sizing Guide: https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Tivoli%20Netcool%20OMNIbus/page/OMNIbus%20Sizing%20Guide
- Multitier Architecture: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/install/concept/omn_esf_configuringdeploymultitieredarch.html
- Best Practices for Performance Tuning: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_bestpracticespertuning.html
- SQL Query Guidelines: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_sqlqueryguidelines.html
- Best Practices for Creating Triggers: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_bestpracticestriggers.html

Questions?
