
INDEX

1 Index

2 EMS Solution Architecture

3 Consolidate

4 Reports

5 Report Samples

6 Wintel Parameters

7 Exchange 2007 Parameters

8 Exchange 2003 Parameters

9 Linux Solaris AIX HPUX Parameters

10 Oracle Parameters

11 SQL Server Parameters

12 MySQL

13 PostgreSQL

14 Security

15 Network

16 Backup Monitoring Parameters

17 Storage Monitoring Parameters

18 Middleware Monitoring Parameters

19 Lotus Monitoring Parameters

20 Sun Messaging Monitoring Parameters


EMS SOLUTION ARCHITECTURE

Signature
EMS Engineer
Name:
Signature:
Date:

GIS Engineer
Name:
Signature:
Date:
Index/Summary

SL.No Domain / Technology # Servers


1 Linux
2 Sun Solaris
3 AIX
4 HPUX
5 Oracle DB Instances
6 SQL Server DB
7 MySQL
8 Exchange (2003/2007)
9 Network
10 Security
11 Wintel Servers
12 Storage
13 Backup
14 Middleware
15 Lotus

Signature
EMS Engineer
Name:
Signature:
Date:
Index/Summary

EMS Agent installed
EMS Agent Configured and reflecting in MGT console

GIS Engineer
Name:
Signature:
Date:
List of the Reports (Y/N) Remarks
S.No
1
2
3
4
5
6
7
8
9
10
11
12
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Signature
EMS Engineer
Name:
Signature:
Date:

S.No

10
11

Signature
EMS Engineer
Name:
Signature:
Date:

S.No

6
7
8
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2

3
4
5
6
7
8
9
10
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5

Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3

Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6
7
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6
7
8

Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6
7
8
9
10
Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6

Signature
EMS Engineer
Name:
Signature:
Date:

S.No
1
2
3
4
5
6
7

Signature
EMS Engineer
Name:
Signature:
Date:

* Additional reports might need to be added after the KAP as per customer requirement
Report Name
Call dump and dashboard - Daily report for Wintel Server.
Disk Space Report of FTP and File Server.
CPU Percent Busy
Top Disk Busy
File System Utilization
Top Physical Memory Utilization
Top Network Busy System
Monthly Server Uptime Report
High Call Server Trend Report
Server Hardware Incident Report
Server Incident SLA Exception Report
Server Network Traffic Summary Report


Report Name
Call dump and dashboard - Daily report for Wintel Server.
Call dump and dashboard - Daily report for unix Server.
Disk Utilization
CPU Utilization
File System Utilization
Memory Utilization
Swap Utilization
Monthly Server Uptime Report
Error in Log Files
Critical Services Alert
Server Hardware Incident Report
Cluster Failover report
Server Incident SLA Exception Report
Server Network Traffic Summary Report


Report Name
Call dump and dashboard - Daily report for Wintel Server.
Call dump and dashboard - Daily report for unix Server.
Disk Utilization
CPU Utilization
File System Utilization
Memory Utilization
Swap Utilization
Monthly Server Uptime Report
Error in Log Files
Critical Services Alert
Server Hardware Incident Report
Cluster Failover report
Server Incident SLA Exception Report
Server Network Traffic Summary Report


Report Name

Oracle Availability Histogram

Oracle Availability Details

Oracle Instance Size Trend - Top 20

Oracle Tablespace Size Trend - Top 20

Oracle Segment Size Trend - Top 20

Oracle Tablespace Size - Top 20

Oracle Logons

Buffer Cache Hit Ratio

% of Library Cache Misses To Execution

% of Chained Rows Fetched


% of Dictionary Cache Hits


Report Name

MS SQL Availability Details

MS SQL Availability Histogram

MS SQL Database Size

MS SQL I/O - Top 20

MS SQL Sessions

MS SQL Transactions


Report Name
Database Size
Database Availability:
- Master
- Slave
Query Cache hit ratio.
Key Cache hit ratio.
InnoDB buffer pool Cache hit ratio.
Trend of Temp Table on disk.
Table Cache Utilization.
% of free space on data file device.
Thread Cache Utilization Report.
Incident SLA exception report.

Report Name
Database Size
Database Availability
Tablespace Size
Incident SLA Exception Report
% of free space on Tablespace Drive


Report Name
Daily Backup Status Report
Incident SLA Exception Report
Backup Application uptime


Report Name
Storage Up time Report
Storage Utilization report
Performance Summary


Report Name
Call dump and dashboard - Daily report for Network devices
Device Availability reports
Memory util Report
CPU Util Reports
Link Uptime Reports
Link Bandwidth Util Report
Interface/ port Availability Report


Description
Call dump and dashboard - Daily report for Security devices
Device Availability reports
Memory util Report
CPU Util Reports
Reports of respective native tools


Report Name
Monthly Server Uptime Report
Messaging Incident SLA Exception Report
Mailbox Size trend report
Folder Usage Trends
Top Mailboxes.
Transaction Logs
Inactive Mailboxes
Mailbox store stats

Mailbox Summary


Report Name
Middleware Server Availability Report
Application Availability report
Middleware Incident SLA Exception Report
thread pool usage trend report
JVM usage trend report
Connection pool usage trend report
Transactions trend report
Web application session report
EJB invocation reports
Health reports for all application servers based on response time

Report Name
Lotus Server Availability Report
Lotus Server Availability Index report
Call Analysis Report
Lotus Server Uptime Report
Port Queue Analysis Report
Lotus Incident SLA Exception Report


Report Name
Messaging Incident SLA Exception Report
SMTP Mail Traffic
User ID Creation, Deletion & Transfer
User ID Reconciliation
Top Ten Users
Messaging Server Uptime Report
Mail store stats

Mailbox Summary


Reports

Wintel Server
Description Frequency
Daily
Daily
Daily
Daily
Daily
Daily
Daily
Monthly
Quarterly
Monthly
Weekly
Daily

GIS Engineer
Name:
Signature:
Date:

Linux Server
Description Frequency
Daily
Daily
Daily
Daily
Daily
Daily
Monthly
Quarterly
Daily
Monthly
Monthly
Weekly

GIS Engineer
Name:
Signature:
Date:

Sun Server
Description Frequency
Daily
GIS Engineer
Name:
Signature:
Date:

Oracle DB
Description Frequency

This report contains daily histograms showing instance uptime and downtime in percentage and minutes. Daily
This report contains spectrum graphs showing minutes of uptime by day and hour for each instance. Daily
This report shows the number of megabytes used in the databases that had the most dynamic space usage over the reporting interval. Daily
This report shows the number of megabytes used in the tablespaces that had the most dynamic space usage over the reporting interval. Daily
This report shows the number of megabytes used in the tablespaces that had the most dynamic space usage over the reporting interval. Daily
This report shows Oracle physical I/O (reads plus writes) by tablespace by day for the tablespaces with the most physical I/O during the reporting interval. The report is sorted by total physical I/O. Daily
This report shows the number of current user logons per instance. Data was collected periodically throughout the day. Daily
This report shows the current percentage of buffer cache reads to physical reads for all databases. Monthly
This report shows the percentage of library cache misses to executions for all databases. Quarterly
This report shows the percentage of chained rows fetched. Monthly
This report shows the percentage of dictionary cache hits. Weekly

GIS Engineer
Name:
Signature:
Date:

SQL Server DB
Description Frequency

This report contains spectrum graphs showing minutes of uptime by day and hour for each instance.
This report contains daily histograms showing the number of instances in each range based on percent uptime, and tables below each histogram showing the minutes and percentage of uptime, downtime and unknown time for each instance.
This report shows megabytes allocated, megabytes used and percentage used for the top databases in each category.
This report shows SQL Server physical I/O (reads plus writes) by instance by day for the instances with the most physical I/O during the reporting interval.
This report shows the number of total connections for MS SQL Server instances.
This report shows transaction volume for SQL Server instances.

GIS Engineer
Name:
Signature:
Date:

MySQL
Description Frequency
Weekly

Weekly

Monthly
Monthly
Monthly
Monthly
Monthly
Monthly
Monthly
Monthly
GIS Engineer
Name:
Signature:
Date:

PostgreSQL
Description Frequency
Weekly
Weekly
Monthly
Daily
Monthly

GIS Engineer
Name:
Signature:
Date:

Backup
Description Frequency
Daily
Monthly
Monthly

GIS Engineer
Name:
Signature:
Date:

Storage Reports
Description Frequency
Daily
Monthly
Monthly

GIS Engineer
Name:
Signature:
Date:

Network Reports
Description Frequency
Daily/Weekly/Monthly
IP Availability of the Network device Daily/Weekly/Monthly
Memory Utilisation performance report Daily/Weekly/Monthly
CPU Utilisation performance report Daily/Weekly/Monthly
Link Utilisation performance report Daily/Weekly/Monthly
Bandwidth Utilisation performance report Daily/Weekly/Monthly
Interface availability report Daily/Weekly/Monthly

GIS Engineer
Name:
Signature:
Date:

Security Reports
Description Frequency
Daily/Weekly/Monthly
IP Availability of the security device Daily/Weekly/Monthly
Memory Utilisation performance report Daily/Weekly/Monthly
CPU Utilisation performance report Daily/Weekly/Monthly
Native reports Daily/Weekly/Monthly

GIS Engineer
Name:
Signature:
Date:

Messaging - Exchange
Description Frequency
Monthly
Daily
Weekly
Monthly
Monthly
Monthly
Weekly
Monthly
Detailed summary of mailbox usage: summary of each mailbox's use and number of mails in a mailbox per month. Monthly

GIS Engineer
Name:
Signature:
Date:

Middleware
Description Frequency
Monthly
Monthly
Monthly
Weekly
Weekly
Weekly
Weekly
Weekly
Weekly
Weekly
GIS Engineer
Name:
Signature:
Date:

Lotus
Description Frequency
Monthly
Monthly
Monthly
Monthly
Monthly
Monthly

GIS Engineer
Name:
Signature:
Date:

Messaging - Sun Messaging


Description Frequency
Monthly
Report of Weekly or Daily Mail Traffic Monthly
Monthly Report of User ID Creation and Deletion Report and Tra Monthly
Reconciliation of User ID's with HR Data Monthly
Top Ten Users which are generating mail traffic Monthly
Uptime Report of all the Messaging Servers and LDAP Server Monthly
Monthly
Detailed summary of mailbox usage: summary of each mailbox's use and number of mails in a mailbox per month. Monthly

GIS Engineer
Name:
Signature:
Date:
Manual/ Automated Configured (Y/N) Remarks
Automated
Manual
Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated

Manual/ Automated Configured (Y/N) Remarks


Automated
Manual
Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated

Manual/ Automated Configured (Y/N) Remarks


Automated
Manual
Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated

Manual/ Automated Configured (Y/N) Remarks

Automated

Manual

Automated

Automated

Automated

Automated

Automated

Automated

Automated

Automated
Automated

Manual/ Automated Configured (Y/N) Remarks

Automated

Automated

Automated

Automated

Automated

Automated

Configured (Y/N) Remarks


Configured (Y/N) Remarks

Manual/ Automated Configured (Y/N) Remarks

Manual/ Automated Configured (Y/N) Remarks

Manual/ Automated Configured (Y/N) Remarks


Automated
Automated
Automated
Automated
Automated
Automated
Automated

Manual/ Automated Configured (Y/N) Remarks


Automated
Automated
Automated
Automated
Automated

Manual/ Automated Configured (Y/N) Remarks


Automated
Automated
Automated
Automated
Automated
Automated
Automated
Automated

Automated

Configured (Y/N) Remarks


Configured (Y/N) Remarks

Configured (Y/N) Remarks


REPORT SAMPLES

OS Reports

Database Reports
Availability Details

Availability Histogram

Exchange Report
Websphere Report

Signature
EMS Engineer GIS Engineer
Name: Name:
Signature: Signature:
Date: Date:
S.No Parameters Monitored Warning Threshold
1 Availability Un-Reachable
2 CPU Utilization 75%
3 Memory Utilization 75%
4 Application Log Monitoring Warning
5 System Log Monitoring Warning
6 Disk Space 75%
7 Services Down
8 Event Log Monitoring Error
Event log file size
Luminescence (in Lux)
ftp server monitoring
Process Monitor
Server Temperature Monitor

Critical Services for Wintel

S.No Domain Type Service Display Name

1 Windows server without AD
Remote Procedure Call (RPC)
Workstation

2 Windows server in Domain
Remote Procedure Call (RPC)
DNS Client (if the server added to Domain)
Workstation
Net Logon
Windows Time

3 Windows Servers with AD Server
Remote Procedure Call (RPC)
DNS Server
Workstation
Distributed File System
Net Logon
File Replication
Kerberos Key Distribution Center
Windows Time

4 DNS
DNS Server

5 DHCP
DHCP Server

6 Cluster Server
Cluster Services

7 IIS
IIS Admin Service

8 Terminal Server
Terminal Services

9 ISA
Microsoft Firewall
Microsoft ISA Server Control
Microsoft ISA Server Job Scheduler
Microsoft ISA Server Storage
Net Logon

10 Oracle service
OracleServiceSID
Oracle Listener Service

11 Blackberry Server
BlackBerry Alert
BlackBerry Attachment Service
BlackBerry Controller
BlackBerry Dispatcher
BlackBerry MDS Connection Service
BlackBerry Policy Service
BlackBerry Router
BlackBerry Synchronization Service
Signature
EMS Engineer
Name:
Signature:
Date:
WINTEL PARAMETERS

Critical Threshold Polling Interval
Un-Reachable 10 Min
85% 15 Min
85% 15 Min
Critical 15 Min
Critical 15 Min
85% 15 Min
15 Min
10 Min
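
The warning and critical limits in the Wintel table can be applied to a polled sample roughly as sketched below. This is a minimal illustration, not the EMS tool's own logic: the metric keys, the `Threshold` type and the `classify` function are hypothetical names, and treating a value equal to the limit as a breach is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for a percentage metric (higher is worse)."""
    warning: float
    critical: float


# Limits copied from the Wintel table: warning 75%, critical 85%.
# The dictionary keys are illustrative names, not EMS identifiers.
WINTEL_THRESHOLDS = {
    "cpu_utilization": Threshold(warning=75.0, critical=85.0),
    "memory_utilization": Threshold(warning=75.0, critical=85.0),
    "disk_space": Threshold(warning=75.0, critical=85.0),
}


def classify(metric: str, value_percent: float) -> str:
    """Return 'OK', 'WARNING' or 'CRITICAL' for one polled sample.

    Assumption: a value equal to a limit counts as breaching it.
    """
    limits = WINTEL_THRESHOLDS[metric]
    if value_percent >= limits.critical:
        return "CRITICAL"
    if value_percent >= limits.warning:
        return "WARNING"
    return "OK"
```

For example, `classify("cpu_utilization", 80.0)` falls between the two limits and is reported as a warning.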

Service App.Name
RpcSs
lanmanworkstation

RpcSs
Dnscache
lanmanworkstation
Netlogon
W32Time

RpcSs
DNS
lanmanworkstation
Dfs
lanmanserver
Netlogon
NtFrs
kdc
W32Time

DNS
dhcp

ClusSvc

IISADMIN

TermServices

fwsrv
isactrl
isasched
ISASTG
Netlogon

OracleServiceSID
Oracle Listener Service

Description
Server Availability
CPU Utilization
Memory Utilization
Application Log Monitoring
System Log Monitoring
Space availability in the Disk
Critical service monitoring
Error Events monitoring
GIS Engineer
Name:
Signature:
Date:
Remarks

Depends on the application requirement.

Compliance - Yes/Partially/No
(Partially & No are NC)
S.No Parameters Monitored

1 MSExchangeIS: RPC Requests

2 MSExchangeIS: RPC Averaged Latency

3 MSExchangeIS\RPC Num. of Slow Packets

4 MSExchangeIS\RPC Operations/sec

5 MSExchangeIS Mailbox: Message Queued for Submission

6 MSExchangeIS Mailbox: Receive Queue Size

7 MSExchangeIS Mailbox: Messages Submitted/sec

8 MSExchangeTransport Queues(_Total)\Active Mailbox Delivery Queue Length

9 MSExchangeTransport Queues(_Total)\Active Remote Delivery Queue Length

10 MSExchangeTransport DSN\Failure DSNs Total

11 MSExchangeTransport Queues(_Total)\Submission Queue Length
12 MSExchange OWA\Failed Requests /Sec

13 Disk Quota Checking

14 Services

S.No Domain type

1 Exchange 2007
1 Exchange 2007

2 Exchange 2007

3 Exchange 2007

4 Exchange 2007
* Also need to monitor OS level services

Signature
EMS Engineer
Name:
Signature:
Date:
Server Roles

All Servers

Mailbox Server

Mailbox Server

Mailbox Server

Mailbox Server

Mailbox Server

Mailbox Server

Hub & Edge Transport Server

Hub & Edge Transport Server

Hub & Edge Transport Server

Hub & Edge Transport Server

CAS Server

Mailbox Server

All Exchange Servers

Critical Services for Exchange 2007

Role

Exchange Edge Server


Exchange Edge Server

Exchange Hub & CAS Server

Exchange HUB

Exchange Mailbox Server


EXCHANGE 2007 PARAMETERS

Warning Threshold

80

80

N/A

250

250

N/A

250

250

250

N/A

N/A

UP/Down

Critical Services for Exchange 2007

Server Display Name


Netlogon
Workstation
Server
DNS Client
IIS Admin Service
Microsoft Exchange ADAM
Microsoft Exchange Credential Service
Microsoft Exchange Anti-spam Update
Microsoft Exchange Monitoring
Microsoft Exchange Transport
Microsoft Exchange Transport Log Search

Netlogon
Workstation
Server
DNS Client
IIS Admin Service
Microsoft Exchange Active Directory Topology Service
Microsoft Exchange Anti-spam Update
Microsoft Exchange EdgeSync
Microsoft Exchange Monitoring
Microsoft Exchange Transport
World Wide Web Publishing Service
Microsoft Exchange Transport Log Search

Netlogon
Workstation
Server
DNS Client
IIS Admin Service
Microsoft Exchange Active Directory Topology Service
Microsoft Exchange File Distribution
Microsoft Exchange IMAP4
Microsoft Exchange Monitoring
Microsoft Exchange POP3
World Wide Web Publishing Service
Microsoft Exchange Service Host

Netlogon
Workstation
Server
DNS Client
Cluster Service
Microsoft Exchange Active Directory Topology Service
Microsoft Exchange Information Store
Microsoft Exchange Mailbox Assistants
Microsoft Exchange Mail Submission
Microsoft Exchange Monitoring
Microsoft Exchange Replication Service
Microsoft Exchange System Attendant
Microsoft Exchange Search Indexer
Microsoft Exchange Service Host
Microsoft Exchange Transport Log Search
Microsoft Search (Exchange)
World Wide Web Publishing Service

Critical Threshold

90

90

N/A

300

300

N/A

300

300

10

300

N/A

N/A

UP/Down

Service App. Name


Netlogon
lanmanworkstation
lanmanserver
Dnscache
IISADMIN
Dsamain
EdgeCredentialSvc
MSExchangeAntispamUpdate
MSExchangeMonitoring
MSExchangeTransport
MSExchangeTransportLogSearch

Netlogon
lanmanworkstation
lanmanserver
Dnscache
IISADMIN
MSExchangeADTopology
MSExchangeAntispamUpdate
MSExchangeEdgeSync
MSExchangeMonitoring
MSExchangeTransport

MSExchangeTransportLogSearch

Netlogon
lanmanworkstation
lanmanserver
Dnscache
IISADMIN
MSExchangeADTopology
MSExchangeFDS
MSExchangeImap4
MSExchangeMonitoring
MSExchangePop3

MSExchangeServiceHost

Netlogon
lanmanworkstation
lanmanserver
Dnscache
ClusSvc
MSExchangeADTopology
MSExchangeIS
MSExchangeMailboxAssistants
MSExchangeMailSubmission
MSExchangeMonitoring
MSExchangeRepl
MSExchangeSA
MSExchangeSearch
MSExchangeServiceHost
MSExchangeTransportLogSearch
msftesql-Exchange

GIS Engineer
Name:
Signature:
Date:
Polling Interval

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins

15Mins
Description

Indicates the number of MAPI RPC requests currently being serviced by the Microsoft Exchange Information Store service.

Displays the average latency of the RPC requests.

Indicates the number of RPC packets in the past 1024 that have latencies longer than two seconds.

Indicates the current number of RPC operations that are submitted to the Information Store each second.

Displays the current number of submitted messages that are not yet processed by transport.

Displays the number of items waiting to move from the SMTP queue to the mailbox store.

Indicates the rate at which messages are submitted by clients.

Displays the number of items in the active mailbox queues.

Displays the number of items in the active remote delivery queues.

Displays the number of failure delivery status notifications (DSNs) that have been generated.

Displays the items in the submission queue.

Failed Requests/sec is the number of Outlook Web Access requests that failed, per second.

Gives the present status of the mailbox size.

Critical service monitoring
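
The "RPC Num. of Slow Packets" description above (packets among the past 1024 with latency longer than two seconds) is a sliding-window count. The sketch below only illustrates that idea; it is not how Exchange computes the counter, and the class and method names are hypothetical.

```python
from collections import deque


class SlowPacketWindow:
    """Counts slow packets among the most recent `window_size` packets."""

    def __init__(self, window_size: int = 1024, slow_after_seconds: float = 2.0):
        # A bounded deque drops the oldest latency once the window is full.
        self._latencies = deque(maxlen=window_size)
        self._slow_after = slow_after_seconds

    def record(self, latency_seconds: float) -> None:
        """Add the latency of one completed RPC packet."""
        self._latencies.append(latency_seconds)

    def slow_count(self) -> int:
        """Number of packets in the window slower than the limit."""
        return sum(1 for latency in self._latencies if latency > self._slow_after)
```

Because the window is bounded, an old slow packet stops counting once 1024 newer packets have been recorded, which is why the counter can fall back to zero without a reset.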


Remarks Compliance - Yes/Partially/No (Partially & No are NC)
S.No Parameters Monitored

1 Database\Database Cache Size


2 MSExchangeIS\RPC Requests

3 MSExchangeIS\RPC Operations/sec
4 MSExchangeIS\RPC Averaged Latency

5 MSExchangeIS\RPC Num. of Slow Packets


6 Services

Critical Services for Exchange 2003

S.No Domain type

1 Exchange 2003

* Also need to monitor OS level services

Signature
EMS Engineer
Name:
Signature:
Date:

Warning Threshold Critical Threshold

512 MB 256 MB
80 90

N/A N/A
80 ms 90 ms

2/Sec 5/Sec
Down

Critical Services for Exchange 2003

Role Server Display Name


IIS Admin Service
Microsoft Exchange IMAP4
Microsoft Exchange Information Store
Microsoft Exchange Management
Microsoft Exchange MTA Stacks
NA Microsoft Exchange System Attendant
Message Queuing
Microsoft Exchange POP3
Microsoft Exchange Routing Engine
Simple Mail Transfer Protocol (SMTP)
World Wide Web Publishing Service
EXCHANGE 2003 PARAMETERS

Polling Interval

15 Min
15 Min

15 Min

15 Min

Service App. Name


IISADMIN
IMAP4Svc
MSExchangeIS
MSExchangeMGMT
MSExchangeMTA
MSExchangeSA
MSMQ
POP3Svc
RESvc
SMTPSVC
W3SVC

GIS Engineer
Name:
Signature:
Date:

Description
Indicates the number of MAPI RPC requests currently being serviced by the Microsoft Exchange Information Store service.
Reports the number of RPC requests being serviced by the Information Store.
Indicates the current number of RPC operations that are submitted to the Information Store each second.
Reports the latency of remote procedure calls that are serviced by the Information Store.
Indicates the number of RPC packets in the past 1024 that have latencies longer than two seconds.
Critical service monitoring
Remarks Compliance - Yes/Partially/No (Partially & No are NC)
S.No Parameters Monitored Warning Threshold
1 Availability Un-Reachable
2 CPU Utilization 70%
3 Memory Utilization 75%
4 Log File Monitoring 75%
5 Disk Utilisation 75%
6 Services DOWN

Critical Services Linux

S.No Domain type Technology
1 Linux Alone (OS related services) Common Services
2 Linux Cluster
3 Linux Internet Network addresses
4 Linux NIS
5 Linux DNS
6 Linux DHCP
7 Linux Volume Manager
8 Linux Squid
9 Linux Oracle
Signature
EMS Engineer
Name:
Signature:
Date:
Critical Threshold Polling Interval
Un-Reachable 10 Min
90% 15 Min
90% 15 Min
90% 15 Min
90% 15 Min
DOWN 10 Min

Linux

Services to be monitored

syslogd

ntpd
sshd

automountd
crond
Xinetd

clvmd
rgmanager
cman

portmap

ypserv
ypbind

ypxfrd

named

dhcpd

lvm

squid

ora_lgwr_*
ora_dbw*_*
ora_pmon_*
ora_smon_*
tnslsnr
ora_arc*_*
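
The wildcard entries in the list above (for example `ora_dbw*_*`) can be checked against the names of running processes with shell-style pattern matching. The sketch below assumes the process names have already been collected by some other means; the function name and the sample instance name `ORCL` are hypothetical.

```python
from fnmatch import fnmatchcase

# Oracle patterns copied from the Linux critical services list.
ORACLE_PATTERNS = [
    "ora_lgwr_*",
    "ora_dbw*_*",
    "ora_pmon_*",
    "ora_smon_*",
    "tnslsnr",
    "ora_arc*_*",
]


def missing_processes(patterns, running_names):
    """Return the patterns that no running process name matches."""
    return [
        pattern
        for pattern in patterns
        if not any(fnmatchcase(name, pattern) for name in running_names)
    ]


# Illustrative process list; "ORCL" is a made-up instance name.
running = ["ora_lgwr_ORCL", "ora_dbw0_ORCL", "ora_pmon_ORCL", "tnslsnr"]
down = missing_processes(ORACLE_PATTERNS, running)
```

Any pattern left in `down` would correspond to a service reported as DOWN in the table above.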
Remarks
Server Availability
CPU Utilization
Memory Utilization
Monitoring the Success or Failure of any job done.
Space availability in the Disk
Critical service monitoring

S.no

3
4

9
LINUX SOLARIS AIX HPUX PARAMETERS

Compliance - Yes/Partially/No
(Partially & No are NC)

Critical Services for Solaris

Domain type Technology

Solaris Alone (OS related services) Common Services

Solaris with Zones Zones

Solaris with Cluster Cluster

Solaris with Network NIS

Solaris DNS

Solaris DHCP

Solaris Volume Manager

Solaris Squid

Solaris Oracle
GIS Engineer
Name:
Signature:
Date:

Services to be monitored

/usr/lib/sysevent/syseventd
/usr/sbin/mdmonitord
/usr/lib/ssh/sshd
/usr/sbin/syslogd
/usr/lib/inet/xntpd

zsched

zoneadmd

zsched
zoneadmd

/usr/cluster/lib/sc/clexecd
/usr/cluster/lib/sc/failfastd
/usr/cluster/lib/sc/rgmd
/usr/cluster/lib/sc/scdpmd
/usr/cluster/lib/sc/rtreg_proxy_serverd
ypserv
ypbind
statd
ypxfrd

named

dhcpd

mdmonitord
rpc.metd
rpc.metadd

squid

ora_lgwr_*
ora_dbw*_*
ora_pmon_*
ora_smon_*
tnslsnr
ora_arc*_*
Critical Services for AIX

S.no Domain type Technology Services to be monitored
1 AIX Alone (OS related services) Common Services syslogd, ntpd, automountd, crond, inetd
2 AIX LPAR lpard
3 AIX NIS ypserv, ypbind, ypxfrd
4 AIX DNS named
5 AIX DHCP dhcpd
6 AIX Volume Manager lvm
7 AIX Squid squid
8 AIX Oracle ora_lgwr_*, ora_dbw*_*, ora_pmon_*, ora_smon_*, tnslsnr, ora_arc*_*
Critical Services for HP-UX

S.no Domain type Technology Services to be monitored
1 HP-UNIX Alone (OS related services) Common Services syslogd, ntpd, automountd, crond, inetd
2 HP-UNIX VPAR vparhbd
3 HP-UNIX Cluster gsd (Global Services Daemon)
4 HP-UNIX NIS ypserv, ypbind, ypxfrd
5 HP-UNIX DNS named
6 HP-UNIX DHCP dhcpd
7 HP-UNIX Volume Manager lvm
8 HP-UNIX Squid squid
9 HP-UNIX Oracle ora_lgwr_*, ora_dbw*_*, ora_pmon_*, ora_smon_*, tnslsnr, ora_arc*_*

S.No. DB Monitoring Parameters


1 Database Status

2 DB Background Process Check

3 Listener Background Process Check

4 Checking alert log file for ORA- errors.

5 Checking listener log file for TNS- errors.

6 Table space usage

7 Current Buffer Cache Hit Ratio

8 No of Invalid objects

9 No of segments approaching max extent

10 Checking chain count from tables

11 Tablespace Offline/Online

12 Index unusable

13 Database Job Status

14 Alert log size

15 % of free space on archive device


16 Maximum number of Sessions since startup

17 datafiles not ONLINE

18 Oracle CRS service (RAC Specific)

19 Oracle CRS log file (RAC Specific)

Oracle Alert Log


Sno ORA-#
1 ORA-%
2 shutting down archive processes
3 corrupt
4 alter
5 Hex
6 shutting down instance (abort)
7 shutting down instance (immediate)
8 shutting down instance (normal)
9 Starting ORACLE instance (normal)
10 Starting up Oracle RDBMS

Listener Log
Sno TNS-#
1 TNS-%
2 error
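
The keyword lists above can be applied to log lines with a simple substring scan. This is a minimal sketch under stated assumptions: `ORA-%` and `TNS-%` are read as "the prefix followed by anything", matching is case-insensitive, and the function name and sample lines are made up for illustration.

```python
# Keywords copied from the Oracle Alert Log and Listener Log tables.
# "ORA-%" and "TNS-%" are treated as plain prefixes (assumption).
ALERT_LOG_KEYWORDS = [
    "ORA-",
    "shutting down archive processes",
    "corrupt",
    "alter",
    "Hex",
    "shutting down instance (abort)",
    "shutting down instance (immediate)",
    "shutting down instance (normal)",
    "Starting ORACLE instance (normal)",
    "Starting up Oracle RDBMS",
]
LISTENER_LOG_KEYWORDS = ["TNS-", "error"]


def scan_log(lines, keywords):
    """Return (line_number, matched_keywords) for every line with a hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        matched = [k for k in keywords if k.lower() in lowered]
        if matched:
            hits.append((number, matched))
    return hits


# Made-up sample lines, only to show the shape of the result.
sample = [
    "Thread 1 advanced to log sequence 42",
    "ORA-00600: internal error code",
    "Shutting down instance (immediate)",
]
alert_hits = scan_log(sample, ALERT_LOG_KEYWORDS)
```

Short keywords such as `alter` and `Hex` match inside longer words with this approach, so a real configuration may need stricter patterns.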

EMS Engineer
Name:
Signature:
Date:
ORACLE PARAMETERS

Warning -Threshold Critical -Threshold


Down Down

Down Down

Down Down

Check Below Check Below

Check Below Check Below

> 80% > 90%

<90% < 85%

>0 >0

8 10

>0 >0

Offline Offline

>0 >0

>0 >0

4.5MB 5MB

< 20% < 15%


>80% > 85 %

>0 >0

Down Down
shutdown,error,offline shutdown,error,offline

Alert Log
Description
All ORA-error Keyword for search
Keyword for search
Keyword for search
Keyword for search
Keyword for search
Keyword for search
Keyword for search
Keyword for search
Keyword for search
Keyword for search

Listener Log
Description
Listener errors
All error keyword

Signature

Polling Interval Description
5 min Checks whether the database is up.
5 min Monitors whether the database background processes are running.
5 min Monitors whether the listener background process is running. If the listener is not running, users will not be able to connect to the database.
Every 10 min Monitors the Oracle alert log file for ORA errors.
Every 10 min Monitors the listener log file for TNS errors.
15 min Monitors the tablespace usage percentage.
4 hr The buffer hit ratio (BHR) indicates the current ratio of buffer cache hits to total requests, essentially the probability that a data block will be in memory on a subsequent block re-read. A correctly tuned buffer cache can significantly improve overall performance.
4 hr Monitors the number of invalid objects present in the database.
4 hr Monitors the number of segments reaching max extents.
Once Weekly Monitors the count of chained rows in tables.
10 Mins Monitors tablespace status.
120 Mins Monitors index status.
10 Mins Monitors database job status.
1 day Monitors the size of the alert log file. If the alert log is large, the database will take more time to read it.
15 min Monitors the free space available in the archive log destination.
1 hour Provides the maximum number of sessions reached since startup, from v$resource_limit.
5 min Monitors data file status.
5 min Monitors CRS services.
5 min Monitors the CRS log file.
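
The buffer cache hit ratio described above can be computed from three cumulative read statistics. The sketch below uses the commonly quoted formula (logical reads minus physical reads, over logical reads) and the `<90%` warning and `<85%` critical limits from the threshold table; how the statistics are collected and the function names are assumptions for illustration.

```python
def buffer_cache_hit_ratio(physical_reads, db_block_gets, consistent_gets):
    """Return the hit ratio in percent, or None if there were no logical reads."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None
    return 100.0 * (logical_reads - physical_reads) / logical_reads


def classify_hit_ratio(ratio_percent, warning_below=90.0, critical_below=85.0):
    """Lower is worse: compare against the table's <90% and <85% limits."""
    if ratio_percent < critical_below:
        return "CRITICAL"
    if ratio_percent < warning_below:
        return "WARNING"
    return "OK"
```

With 200 physical reads against 1000 logical reads the ratio is 80 percent, which falls below the critical limit.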

Background Process
ora_lgwr_*
ora_dbw*_*
ora_reco_*
ora_dbw*_*
ora_pmon_*
ora_smon_*
ora_ckpt_*
ora_cjq*_*
ora_arc*_*
tnslsnr
ora_lmon*_* RAC Specific Background Process
ora_lck*_* RAC Specific Background Process
ora_lmd*_* RAC Specific Background Process
crsd RAC Specific Background Process

Signature
GIS Engineer
Name:
Signature:
Date:
Remarks Compliance - Yes/Partially/No (Partially & No are NC)

See below for services

Background process *tns*

Depends on the application team's needs

All tablespaces

This is for jobs running as Oracle DBMS jobs

S.No Parameters Monitored

1 SQLServer.Extended-SQLServer %Processor Time

2 SQLServer:Cache Manager -- Cache Hit Ratio

3 SQLServer:Databases-Percent Log Used


4 SQLServer:General Statistics-User Connections

5 SQLServer:Memory Manager -- Lock Blocks

6 SQLServer:Access Methods -- Full Scans/sec

7 SQLServer:Access Methods-Index Searches/sec

8 SQLServer:Databases -- Transactions/sec

9 SQLServer:Locks-Number of Deadlocks/sec

10 SQLServer:General Statistics-Logins/sec

11 "%err%" pattern in sql error log


12 % datafile used
13 Services

Critical Services for SQL

S.No Domain Type
1 SQL
2 SQL

Signature
EMS Engineer
Name:
Signature:
Date:
SQL SERVER PARAMETERS

Warning Threshold Critical Threshold Polling Interval

65% 70% 30 minutes

90% 80% 1 hr

75% 80% 30 minutes


75% 80% 30 minutes

2 4 15 Min

10/Sec 15/Sec 1 hr

250/sec 300/sec 60 minutes

150/Sec 200/Sec 1 hr

2 4 15 Min

25/sec 30/sec 60 minutes

%err% %err% 15 minutes


85% 90% 10 minutes
Down 10 Min


Service Display Name Service App.Name


MSSQLSERVER
SQLSERVERAGENT

GIS Engineer
Name:
Signature:
Date:
Description Remarks
This counter monitors % Processor Time, which shows the sum across all processors for the SQL Server instance.

This counter monitors the ratio between cache hits and lookups. A rate of 90 percent or higher is desirable. Add more memory until the value is consistently greater than 90 percent, indicating that more than 90 percent of all requests for data were satisfied from the cache.

This counter monitors whether the SQL log file is full.

This counter shows the number of user connections on your SQL Server.

This counter monitors the current number of lock blocks in use on the server. It is refreshed periodically. A lock block represents an individual locked resource, such as a table, page, or row.

Number of unrestricted full scans per second. These can be either base-table or full-index scans.
Explanation:-

This parameters value is stored in the inside the Database itself for per
second. We are checking in every 1 hour using tool the last 1 hour o

This counter shows the number of index searches SQL Server is performing.
Number of transactions started for the database per second.

Number of lock requests per second that resulted in a deadlock.

This counter monitors the total number of logins started per second. This does not include pooled connections.
Checks for errors in the SQL Server error log file.
This counter monitors whether the data files are full.
This monitors whether the SQL Server services are running.
Compliance - Yes/Partially/No
(Partially & No are NC)
S.No. DB Monitoring Parameters Warning Threshold Critical Threshold
1 Database Status Down Down
2 Log file check (log file is located in DATADIR by default) Keywords to be monitored: Error, [ERROR], ended, [WARNING] Keywords to be monitored: Error, [ERROR], ended
3 Query cache hit ratio <90% <85%
4 InnoDB buffer pool ratio <90% <85%
5 Error Log Size 4.5MB 5MB
6 Slaves Status Down Down
7 Threads Connected <Depending on Application Requirement> <Depending on Application Requirement>
8 % of free space on data file device <20% <15%

Services (In Windows) Processes (In Linux)

Sl.No. Service Name Sl. No.

1 MySQL 1
2

Signature
EMS Engineer
Name:
Signature:
Date:
MySQL PARAMETERS

Polling Interval

15 min

15 min

4 hr

4 hr

1day

15 min

1hour

15m

Processes (In Linux)

Processes
/usr/bin/mysqld_safe --datadir=<data dir path> --pid-file=<pid file path>
/usr/sbin/mysqld --basedir=/ --datadir=<data dir path> --user=mysql --pid-file=<pid file path> --skip-external-locking

GIS Engineer
Name:
Signature:
Date:

Description

This is used to monitor MySQL database services on Windows and Unix.

This counter is used to monitor the MySQL database log.

A high query cache hit ratio indicates that queries in the cache are being
reused by other threads; a low ratio shows that either not enough memory is allocated to
the query cache or identical queries are not repeatedly issued to the server.

The InnoDB buffer pool ratio indicates the current ratio of cache hits to total requests,
essentially the probability that a data block will be in memory on a subsequent block
re-read. A correctly tuned pool can significantly improve overall performance.

This counter monitors the size of the error log file. If the error log is large, the database
takes more time to read the alert log.

This counter indicates the status of the slave configured for replication.

It provides information on whether the maximum number of sessions has been reached.

This counter monitors the free space available in the data file destination partition.
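
The two ratios described above can be sketched in Python as follows. The formulas are common conventions rather than anything stated in this checklist, so treat them as assumptions: the query cache ratio is taken as hits divided by hits plus executed SELECTs, and the buffer pool ratio as the share of logical read requests that did not need a disk read. The dictionary keys are MySQL status counter names as returned by SHOW GLOBAL STATUS; the function names are hypothetical. The default thresholds are the < 90% and < 85% values from the parameter table.

```python
def query_cache_hit_ratio(status):
    """Query cache hit ratio in percent: hits / (hits + executed SELECTs)."""
    hits = status["Qcache_hits"]
    total = hits + status["Com_select"]
    return 100.0 * hits / total if total else 0.0


def innodb_buffer_pool_hit_ratio(status):
    """Share of logical read requests served from the buffer pool, in percent."""
    requests = status["Innodb_buffer_pool_read_requests"]
    disk_reads = status["Innodb_buffer_pool_reads"]
    return 100.0 * (requests - disk_reads) / requests if requests else 0.0


def severity(ratio, warning=90.0, critical=85.0):
    """Map a hit ratio to a severity; lower ratios are worse."""
    if ratio < critical:
        return "critical"
    if ratio < warning:
        return "warning"
    return "ok"
```

Because the counters are cumulative since server start, a monitoring tool would normally compute the ratios from the difference between two polls rather than from the raw totals.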
Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)
S.No.  DB Monitoring Parameters         Warning Threshold      Critical Threshold

1      Database Status                  Down                   Down

2      PostgreSQL log file monitoring   FATAL, shutting down   PANIC, FATAL, shutting down
       (pg_log file)

Services (In Windows) Processes (In Linux)

Sl.No. Service Name Sl. No.


1 *postgresql* 1
2
3
4

Signature
EMS Engineer
Name:
Signature:
Date:
PostgreSQL PARAMETERS

Polling Interval

10 Mins

15 Mins

Processes (In Linux)

Processes
postgres
postgres: logger process
postgres: writer process
postgres: stats collector process

GIS Engineer
Name:
Signature:
Date:

Description

This is used to monitor PostgreSQL database services on Windows and Unix.

Monitors the PostgreSQL error log.


Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)

Firewalls/SMTP Gateway/Web Proxy/AAA Server/Security Analyser/IPS/IDS

SL No  Parameters Monitored  Monitoring - Warning Threshold
1      Availability          N.A
2      Memory Util           60
3      CPU Utilization       60
4      Interface status      N.A

* Apart from this, all the native tool parameters need to be configured.

Signature
EMS Engineer
Name:
Signature:
Date:
SECURITY PARAMETERS

SL No  Monitoring - CRITICAL Threshold  Monitoring Polling Interval  Description
1      When Node Down                   5 Min                        Device IP Availability
2      75                               5 Min                        Device Memory Utilisation
3      75                               5 Min                        Device CPU Utilization
4      When Node Down                   5 Min                        Device Interface down

Signature
GIS Engineer
Name:
Signature:
Date:
Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)
SL No  Parameters Monitored

Routers
1      Availability
2      Memory Util
3      CPU Utilization
4      Interface Status
5      Bandwidth Utilization - In & Out
6      Interface Errors - In & Out
7      Packet Drops

Switches - Layer 3 & Layer 2 and Radware LBs
1      Availability
2      Memory Util
3      CPU Utilization
4      Critical Ports (Uplinks / Servers)

Wireless Access Points
1      Availability

EMS Engineer
Name:
Signature:
Date:
NETWORK PARAMETERS

SL No  Monitoring - Warning Threshold  Monitoring - CRITICAL Threshold

Routers
1      N.A                             When Node Down
2      60                              75
3      60                              75
4      N.A                             When Interface Down
5      60%                             75%
6      0.50%                           1%
7      0.50%                           1%

Switches - Layer 3 & Layer 2 and Radware LBs
1      N.A                             When Node Down
2      60                              75
3      60                              75
4      N.A                             When Node Down

Wireless Access Points
1      N.A                             When Node Down
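
As an illustration of how the warning and critical pairs above might be applied to a polled value, here is a minimal Python sketch. The metric names in the dictionary are hypothetical labels, and treating a value equal to the threshold as a breach is an assumption; the source gives only the numbers. Availability and interface status are not numeric and are left out.

```python
def classify(value, warning, critical):
    """Return the alert level for a metric where higher values are worse."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


# (warning, critical) pairs taken from the Routers rows; key names are illustrative.
ROUTER_THRESHOLDS = {
    "memory_util_pct": (60, 75),
    "cpu_util_pct": (60, 75),
    "bandwidth_util_pct": (60, 75),
    "interface_error_pct": (0.5, 1.0),
    "packet_drop_pct": (0.5, 1.0),
}


def evaluate(sample):
    """Classify every known metric in one polled sample."""
    return {
        name: classify(value, *ROUTER_THRESHOLDS[name])
        for name, value in sample.items()
        if name in ROUTER_THRESHOLDS
    }
```

For example, a sample with 62% CPU and 80% memory utilization would be reported as a warning and a critical alert respectively.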

Signature
SL No  Monitoring Polling Interval  Description

Routers
1      5 Min                        Router IP Availability
2      5 Min                        Router Memory Utilisation
3      5 Min                        Router CPU Utilization
4      5 Min                        Input & Output Q drops
5      5 Min                        Bandwidth Utilization - In & Out
6      3 Min                        Interface Errors - In & Out
7      2 Min                        Packet Loss

Switches - Layer 3 & Layer 2 and Radware LBs
1      5 Min                        Switch IP Availability
2      5 Min                        Switch IP Memory Util
3      5 Min                        Switch CPU Utilization
4      2 Min                        Port Availability

Wireless Access Points
1      5 Min                        IP Availability

GIS Engineer
Name:
Signature:
Date:
Remarks

Link and link protocol both need to be checked, and the link description should be in place.

Description should be in place for each port.

Compliance - Yes/Partially/No
(Partially & No are NC)
S.No  Monitoring Parameters      Warning Threshold  Critical Threshold

1     Backup Status (Pass/Fail)  Failure            Failure

2     Backup Server Services     Down               Down

3     Backup Client Services     Down               Down

S.No  Domain Type

1     Backup
2     Backup

* Apart from this, all the native tool parameters need to be integrated.

Signature
EMS Engineer
Name:
Signature:
Date:
BACKUP PARAMETERS

As per backup timing

15 Mins

15 Mins

Critical Services for Backup

Service Name

GIS Engineer
Name:
Signature:
Date:
Description
Compliance - Yes/Partially/No
(Partially & No are NC)
S.No.  Monitoring Parameters       Warning Threshold

1      Availability of Storage     Down

SP Performance Characteristics
2      Utilization (%)
3      Queue Length
4      Response Time (ms)
5      Total Bandwidth (MB/s)
6      Total Throughput (I/O/sec)

LUN Performance Characteristics
7      Utilization (%)
8      Queue Length
9      Response Time (ms)
10     Total Bandwidth (MB/s)
11     Total Throughput (I/O/sec)

DISK Performance Characteristics
12     Utilization (%)
13     Queue Length
14     Response Time (ms)
15     Total Bandwidth (MB/s)
16     Total Throughput (I/O/sec)

* Need to check with the tools team how this will be monitored using SNMP/native tool integration.

Signature
EMS Engineer
Name:
Signature:
Date:

* Monitoring parameters and threshold values specified are defaults and depend purely on the customer setup.
STORAGE PARAMETERS

Critical Threshold  Polling Interval

Down                5 Min

SP Performance Characteristics

LUN Performance Characteristics

DISK Performance Characteristics

GIS Engineer
Name:
Signature:
Date:


Description

SP Performance Characteristics

2   The percentage of time during which the SP is servicing any requests.
3   The average number of requests within a polling interval that are waiting to be serviced by the SP, including the one currently in service.
4   The average time, in milliseconds, that it takes for one request to pass through the SP, including any waiting time.
5   The average amount of host data, in Mbytes, that is passed through the SP per second. This includes both read and write requests.
6   The average number of host requests that are passed through the SP per second. This includes both read and write requests.

LUN Performance Characteristics

7   The fraction of an observation period during which a LUN has any outstanding requests.
8   The average number of requests within a polling interval that are outstanding to this LUN.
9   The average time, in milliseconds, that a request to this LUN is outstanding, including its waiting time.
10  The average amount of host data, in Mbytes, that is passed through the LUN per second. This includes both read and write requests.
11  The average number of host requests that are passed through the LUN per second. This includes both read and write requests.

DISK Performance Characteristics

12  The percentage of time during which the disk is servicing any requests.
13  The average number of requests within a polling interval that are waiting to be serviced by the disk, including the one currently in service.
14  The average time, in milliseconds, that it takes for one request to pass through the disk, including any waiting time.
15  The average amount of data, in Mbytes, that is transferred to or from the disk per second. Total bandwidth includes both read and write requests.
16  The average number of requests to the disk per second. Total throughput includes both read and write requests.
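
The utilization, bandwidth and throughput definitions above can be illustrated with a small Python sketch that derives the per-interval values from two polls of cumulative counters. Everything about the counters is an assumption: the field names are hypothetical, 1 Mbyte is taken as 1024 x 1024 bytes, and how a given storage array actually exposes such values (SNMP or native tool) is still to be confirmed, as noted in the parameter sheet.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative counters read at one poll (hypothetical field names)."""
    timestamp_s: float    # time of the poll, in seconds
    bytes_total: int      # bytes read plus bytes written so far
    requests_total: int   # read plus write requests completed so far
    busy_time_s: float    # cumulative time spent servicing requests


def interval_metrics(prev, curr):
    """Derive utilization, bandwidth and throughput between two polls."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        # Percentage of the interval during which requests were being serviced.
        "utilization_pct": 100.0 * (curr.busy_time_s - prev.busy_time_s) / elapsed,
        # Mbytes transferred per second, reads and writes combined.
        "bandwidth_mb_s": (curr.bytes_total - prev.bytes_total) / (1024 * 1024) / elapsed,
        # Requests per second, reads and writes combined.
        "throughput_io_s": (curr.requests_total - prev.requests_total) / elapsed,
    }
```

With a 5-minute polling interval, `elapsed` would be about 300 seconds, and the same function could be applied to SP, LUN and disk counters alike.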
Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)
SL.No  Monitoring Parameters                                Warning Threshold

1      Website Availability                                 Unreachable
2      CPU Utilization for Web Server / process             75%
3      Memory Utilization                                   75%
4      Application Server / WebServer Log Monitoring        Warning
5      System Log Monitoring                                Warning
6      Disk Space                                           75%
7      Services                                             Down
8      Event Log Monitoring (when application server /      Error
       web server is installed on Windows OS)
9      JVM Memory Utilization                               75%
10     App / Web Server Queue Length                        75%
11     JDBC Connection Pool Utilization/Leakage             > 15
12     Website Response Time                                > 120 sec
13     Thread pool                                          > 200 (based on application usage)
14     EJB transactions                                     > 2 mins
15     Web application sessions                             > 1000 (based on application usage)
16     Server response time                                 > 30 sec
17     Low Memory GC Threshold                              < 25% free

Signature
EMS Engineer
Name:
Signature:
Date:

* Monitoring parameters and threshold values specified are defaults and depend purely on the customer setup.
MIDDLEWARE PARAMETER

SL.No  Critical Threshold  Polling Interval

1      Unreachable         10 Min
2      85%                 15 Min
3      85%                 15 Min
4      Critical            15 Min
5      Critical            15 Min
6      85%                 15 Min
7      Down                15 Min
8      Error               10 Min
9      85%                 10 Min
10     85%                 10 Min
11     > 25                15 Min
12     > 150 sec           15 Min
13     > 300               5 Min
14     > 5 mins            10 Min
15     > 2000              10 Min
16     > 1 min             10 Min
17     < 20% free          30 Min

GIS Engineer
Name:
Signature:
Date:


Description

1   Server availability
2   Ability to monitor CPU load by web server instance / process
3   server instance
4   Ability to monitor web server / AppServer log files for error/exception conditions such as 403, 404, 500, Out of Memory Error, SQL Exception, I/O Exception, etc./STUCK
5   System log monitoring
6   Space availability on the disk
7   Critical service monitoring (monitoring availability of the server and of the services deployed on the application server/web server)
8   Error events monitoring
9   Ability to monitor load (number of incoming connections)
10  Ability to monitor the length of the queue of incoming requests waiting to be serviced by the application server
11  Monitor the JDBC connection pool/leakage
12  Ability to monitor website response time in real time and provide the operations team access to this data in real time
13  Thread pool of the application server/web server
14  EJB transactional monitor
15  Number of user sessions on the particular application server
16  Particular JVM response time
17  Monitor the heap free percentage size, look up the cache size
Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)

When sustained consistently above 80% of CPU

Generate alerts when an error or exception is detected

Whenever the server / services are down

Website not available

Queue > 85% full

Connection pool utilization reaches 100% / connection leakage occurs

Website response time above 10 seconds

Performance issue
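
The remark "sustained consistently above 80% of CPU" implies that a single spike should not raise an alert. One way to express this, sketched below in Python, is to alert only when every sample in a sliding window exceeds the limit. The class name and the window of three consecutive polls are assumptions for illustration; the source specifies only the 80% figure.

```python
from collections import deque


class SustainedThreshold:
    """Signal an alert only when every sample in the window exceeds the limit."""

    def __init__(self, limit=80.0, window=3):
        self.limit = limit
        # Keeps only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record one polled value and return True if the breach is sustained."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(sample > self.limit for sample in self.samples)
```

With a 15-minute polling interval and a window of three, an alert would be raised only after roughly 45 minutes of continuously high CPU load.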

SL.No Monitoring Parameters Warning -Threshold

1 Server Availability Index <50

2 Dead Mail >75


3 Pending Mail >150
4 Domino Services monitoring Down

5 ACL Change in Names.nsf Any

6 Cluster Replication Failure Any

Domino Services

Sl.No.

Signature
EMS Engineer
Name:
Signature:
Date:
LOTUS PARAMETER

Critical -Threshold Polling Interval

<10 10 mins

>100 10 mins
>350 10 mins
Down 10 mins

Any 10 mins

Any 10 mins

Domino Services

Service Name

nserver

nreplica

nrouter

Amgr

Calcon

Collector

smtp

pop3

adminp

GIS Engineer
Name:
Signature:
Date:

Description

The server availability index is approximately equal to the percentage of the total server capacity that is still available. The server availability index (AI) is used to determine server workload and is based on the average response time.

Number of dead mail messages
Number of pending mail messages
Checking Domino services

Any access control list changes in names.nsf

Any failure in cluster replication


Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)
S.No  Parameters Monitored

1     Message Queued for Submission
2     Receive Queue Size
3     Messages Submitted/sec
4     Mailbox Delivery Queue Length
5     Failed Requests/Sec
6     Background Process
7     Messaging Server Disk Capacity (MT Store Server) Message Store Partition
8     Message Queue Performance
9     Message Queue Availability


S.No  Domain Type

1     Sun Messaging
2     Sun Messaging
3     Sun Messaging
4     Sun Messaging
5     Sun Messaging

Signature
EMS Engineer
Name:
Signature:
Date:


S.No Domain Type


1 Sun Messaging
2 Sun Messaging
3 Sun Messaging
4 Sun Messaging
5 Sun Messaging

* Monitoring parameters and threshold values specified are defaults and depend purely on the customer setup.
Warning Threshold

250

250

N/A

250

N/A

UP/Down

10 GB
NA
NA

Critical Services for Sun Messaging

Service Name

All Messaging services in messaging Server


Services in MTStore Server
Services in MTA Server
Web Mail Server MSHTTPD DAEMON
LDAP Services in LDAP Server

Reports for Sun Messaging

Report Type
SMTP Mail Traffic
User ID Creation, Deletion & Transfer
User ID Reconciliation
Top Ten Users
Messaging Server Uptime Report



SUN MESSAGING

Critical Threshold

300

300

N/A

300

N/A

UP/Down

5GB

Sun Messaging

Description

Messaging services in all Sun Messaging servers
Services to monitor message store functionality
Services to monitor SMTP and LMTP Mail
Web mail server services
Services to monitor LDAP

Messaging

Description
Report of Weekly or Daily Mail Traffic
Monthly report of user ID creation, deletion, and transfer
Reconciliation of user IDs with HR data
Top Ten Users which are generating mail traffic
Uptime Report of all the Messaging Servers and LDAP Server
S.No  Polling Interval  Description

1     15 Mins           Displays the current number of submitted messages that are not yet processed by transport.
2     15 Mins           Displays the number of items waiting to move from the SMTP queue to the mailbox store.
3     15 Mins           Indicates the rate at which messages are submitted by clients.
4     15 Mins           Displays the number of items in the server queues.
5     15 Mins           Failed Requests/sec is the number of Outlook Web Access requests that failed, per second.
6     15 Mins           Critical service monitoring
7     15 Mins
8     15 Mins
9     15 Mins

GIS Engineer
Name:
Signature:
Date:
Remarks
Compliance - Yes/Partially/No
(Partially & No are NC)