Designing Manageable Applications
August 2008
Table of Contents
Introduction .................................................................................................................................... 12
Intended Audiences ................................................................................................................ 12
How This Guide Is Organized ................................................................................................ 12
Chapter Outline ...................................................................................................................... 13
Scenarios Discussed in This Guide ....................................................................................... 14
Worked Example Used in This Guide .................................................................................... 14
Northern Electronics Shipping Application ......................................................................... 14
The Dynamic Systems Initiative (DSI) .................................................................................. 15
Patterns and Practices ........................................................................................................... 15
Feedback and Support ........................................................................................................... 15
Acknowledgments .................................................................................................................. 15
Section 1 ........................................................................................................................................ 17
Introduction to Manageable Applications ................................................................................... 17
Chapter 1 ....................................................................................................................................... 18
Understanding Manageable Applications .................................................................................. 18
Application Perspectives ........................................................................................................ 19
Operating Business Applications............................................................................................ 19
Application Dependencies ...................................................................................................... 20
Core Principles for Designing Manageable Applications ....................................................... 22
Northern Electronics Scenario................................................................................................ 23
Operations Challenges ....................................................................................................... 24
Development Challenges.................................................................................................... 25
Summary ................................................................................................................................ 25
Chapter 2 ....................................................................................................................................... 26
A High-Level Process for Manageable Applications .................................................................. 26
Roles Participating in the High-Level Process ....................................................................... 27
Understanding the Process .................................................................................................... 29
Designing the Manageable Application .............................................................................. 29
Developing the Manageable Application ............................................................................ 30
Deploying the Manageable Application .............................................................................. 30
Operating the Manageable Application .............................................................................. 30
Facilitating the Process: Guidance and Artifacts ................................................................ 31
Northern Electronics Scenario................................................................................................ 33
Summary ................................................................................................................................ 34
Section 2 ........................................................................................................................................ 36
Architecting for Operations ........................................................................................................ 36
Chapter 3 ....................................................................................................................................... 37
Architecting Manageable Applications ....................................................................................... 37
Designing Manageable Applications ...................................................................................... 37
Representing Applications as Managed Entities .................................................................... 39
Advantages of Using Managed Entities ................................................................................. 40
Providing an Operations View of an Application................................................................. 41
Ensuring That Instrumentation Is Sufficient ........................................................................ 41
Close Mapping to Configuration ......................................................................................... 41
Benefits of Defining a Management Model for the Application .............................................. 42
Designing, Developing, Deploying, and Maintaining Manageable Applications: Refining the Process ............ 42
Northern Electronics Scenario................................................................................................ 43
Summary ................................................................................................................................ 44
Chapter 4 ....................................................................................................................................... 45
Creating Effective Management Models .................................................................................... 45
Benefits of Using Management Models ................................................................................. 46
Management Model Views ..................................................................................................... 46
Comprehensive Management Models ................................................................................... 47
Configuration Modeling ....................................................................................................... 47
Task Modeling .................................................................................................................... 48
Instrumentation Modeling ................................................................................................... 49
Health Modeling .................................................................................................................. 49
Performance Modeling ........................................................................................................ 50
Modeling Instrumentation and Health .................................................................................... 51
Effective Instrumentation Modeling .................................................................................... 51
Types of Instrumentation................................................................................................. 51
Performance Counters .................................................................................................... 52
Events ............................................................................................................................. 52
Determining What to Instrument ......................................................................................... 53
Granularity of Instrumentation ......................................................................................... 54
Performance Considerations........................................................................................... 54
Building Effective Health Models ............................................................................................ 54
Health States ...................................................................................................................... 55
Health State Hierarchies ..................................................................................................... 56
Managed Entity Hierarchies ............................................................................................ 56
Aggregate Aspects .......................................................................................................... 58
Rolling Up Aspects into Managed Entities ...................................................................... 59
Monitoring and Troubleshooting Workflow ......................................................................... 60
Detection ......................................................................................................................... 60
Verification....................................................................................................................... 61
Diagnostics ...................................................................................................................... 61
Resolution ....................................................................................................................... 62
Re-verification ................................................................................................................. 62
Structure of a Health Model ................................................................................................ 62
Mapping Requirements to Individual Indicators.................................................................. 64
Multiple Distributed Managed Entities ................................................................................ 64
Northern Electronics Scenario................................................................................................ 65
Instrumentation Model ........................................................................................................ 66
Health Model ....................................................................................................................... 66
Summary ................................................................................................................................ 68
Chapter 5 ....................................................................................................................................... 69
Proven Practices for Application Instrumentation ...................................................................... 69
Events and Metrics ................................................................................................................. 69
Architectural Principles for Effective Instrumentation ............................................................. 69
Create a Flexible Instrumentation Architecture .................................................................. 70
Create Instrumentation That Operations Staff Easily Understands.................................... 70
Support Existing Operations Processes and Tools ............................................................ 70
Create Applications That Are Not Self-Monitoring .............................................................. 71
Support Flexible Configuration of Instrumentation ............................................................. 71
Using Instrumentation Levels to Specify Instrumentation Granularity ............................ 71
Using Infrastructure Trust Levels to Specify Instrumentation Technologies ................... 73
Designing Application Instrumentation ................................................................................... 73
Use the Capabilities of the Underlying Platform ................................................................. 73
Provide Separate Instrumentation for Each Purpose ......................................................... 74
Isolate Abstract Instrumentation from Specific Instrumentation Technologies ................... 74
Create an Extensible Instrumentation Architecture ............................................................ 75
Use Base Events for Instrumentation ................................................................................. 75
Use Event Names and Event IDs Consistently .................................................................. 75
Ensure Events Provide Backward Compatibility................................................................. 76
Support Logging to Remote Sources ................................................................................. 76
Consider Distributed Event Correlation .............................................................................. 76
Developing the Instrumentation.............................................................................................. 76
Minimize Resource Consumption ....................................................................................... 77
Consider the Security of the Event Information .................................................................. 77
Supply Appropriate Context Data ....................................................................................... 77
Record the Times Events Are Generated ........................................................................... 77
Provide Resolution Guidance ............................................................................................. 78
Building and Deploying Instrumentation ................................................................................. 78
Automate Implementation of Instrumentation ..................................................................... 78
Automate the Build and Deploy Process ............................................................................ 78
Monitor Applications Remotely ........................................................................................... 78
Summary ................................................................................................................................ 79
Chapter 6 ....................................................................................................................................... 80
Specifying Infrastructure Trust Levels........................................................................................ 80
Infrastructure Model Scenarios .............................................................................................. 81
In-House Application Scenario ........................................................................................... 81
ISV or Shrink-Wrap Application Scenario ........................................................................... 81
Privilege and Trust Considerations ........................................................................................ 82
Tools for Infrastructure Modeling............................................................................................ 84
Standalone Tools ................................................................................................................ 85
Integrated Tools .................................................................................................................. 85
Infrastructure Modeling with the TSMMD ........................................................................... 85
Instrumentation Technologies Supported by the TSMMD .............................................. 86
Northern Electronics Scenario................................................................................................ 87
Summary ................................................................................................................................ 87
Chapter 7 ....................................................................................................................................... 88
Specifying a Management Model Using the TSMMD Tool ........................................................ 88
Requirements for the TSMMD................................................................................................ 88
Creating a Management Model .............................................................................................. 89
The TSMMD Guided Experience ........................................................................................ 89
Creating the TSMMD File ................................................................................................... 89
Graphically Modeling an Operations View of the Application ............................................. 91
Executable Application .................................................................................................... 92
Windows Service ............................................................................................................. 92
ASP.NET Application ...................................................................................................... 93
ASP.NET Web Service ................................................................................................... 94
Windows Communication Foundation (WCF) Service .................................................... 95
Defining Target Environments for the Application .............................................................. 96
Defining Instrumentation for the Application ....................................................................... 97
Defining Abstract Instrumentation ................................................................................... 97
Defining Instrumentation Implementations .................................................................... 100
Discovering Existing Instrumentation in an Application .................................................... 105
Creating Health Definitions ............................................................................................... 109
Validating the Management Model ................................................................................... 111
Management Model Guidelines............................................................................................ 112
Northern Electronics Scenario.............................................................................................. 112
Summary .............................................................................................................................. 115
Section 3 ...................................................................................................................................... 116
Developing for Operations ....................................................................................................... 116
Chapter 8 ..................................................................................................................................... 117
Creating Reusable Instrumentation Helpers ............................................................................ 117
Creating Instrumentation Helper Classes ............................................................................ 117
Instrumentation Solution Folder ........................................................................................... 118
API Projects ...................................................................................................................... 119
Technology Projects ......................................................................................................... 120
Event Log Project .......................................................................................................... 120
Windows Eventing 6.0 Project ...................................................................................... 121
WMI Project ................................................................................................................... 121
Performance Counter Project........................................................................................ 121
Using the Instrumentation Helpers ....................................................................................... 121
Verifying That Instrumentation Code Is Called from the Application ..................................... 122
Summary .............................................................................................................................. 123
Chapter 9 ..................................................................................................................................... 124
Event Log Instrumentation ....................................................................................................... 124
Installing Event Log Functionality ......................................................................................... 125
Event Sources .................................................................................................................. 125
Using the EventLogInstaller Class ..................................................................................... 127
Writing Events to an Event Log ............................................................................................ 129
Using the WriteEntry Method ............................................................................................ 129
The WriteEvent Method .................................................................................................... 130
Reading Events from Event Logs ......................................................................................... 131
Creating and Configuring an Instance of the EventLog Class.......................................... 131
Using the Entries Collection to Read the Entries.............................................................. 132
Clearing Event Logs ............................................................................................................. 133
Deleting Event Logs ............................................................................................................. 133
Removing Event Sources ..................................................................................................... 134
Creating Event Handlers ...................................................................................................... 135
Using Custom Event Logs .................................................................................................... 135
Writing to a Custom Log ................................................................................................... 136
Installing the Custom Log.............................................................................................. 136
Writing Events to the Custom Log ................................................................................ 137
Other Custom Log Tasks .................................................................................................. 137
Summary .............................................................................................................................. 137
Chapter 10 ................................................................................................................................... 138
WMI Instrumentation ................................................................................................................ 138
WMI and the .NET Framework ............................................................................................. 138
Benefits of WMI Support in the .NET Framework............................................................. 139
Limitations of WMI in the .NET Framework ...................................................................... 140
Using WMI.NET Namespaces .......................................................................................... 141
Publishing the Schema for an Instrumented Assembly to WMI ........................................... 142
Republishing the Schema ................................................................................................. 143
Unregistering the Schema ................................................................................................ 143
Instrumenting Applications Using WMI.NET Classes ........................................................... 143
WMI.NET Classes ........................................................................................................... 144
Accessing WMI Data Programmatically ............................................................................... 144
Summary .............................................................................................................................. 145
Chapter 11 ................................................................................................................................... 147
Windows Eventing 6.0 Instrumentation ................................................................................... 147
Windows Eventing 6.0 Overview .......................................................................................... 147
Reusable Custom Views ................................................................................................... 147
Command Line Operations ............................................................................................... 148
Event Subscriptions .......................................................................................................... 149
Integration with Task Scheduler ....................................................................................... 149
Online Event Information .................................................................................................. 150
Publishing Windows Events ................................................................................................. 150
Event Types and Event Channels .................................................................................... 151
Event Types and Channel Groups ................................................................................ 151
Serviced Channel .......................................................................................................... 152
Direct Channel .............................................................................................................. 152
Channels Defined in the Winmeta.xml File ................................................................... 152
Creating the Instrumentation Manifests ............................................................................ 153
Elements in the Instrumentation Manifest ..................................................................... 153
Using Templates for Events .......................................................................................... 157
Using the Message Compiler to Produce Development Files ................................................ 157
Writing Code to Raise Events ........................................................................................... 158
Compiling and Linking Event Publisher Source Code ...................................................... 162
Installing the Publisher Files ............................................................................................. 162
Consuming Event Log Events .............................................................................................. 163
Querying for Events .......................................................................................................... 163
Querying Over Active Event Logs .................................................................................... 163
Querying Over External Files............................................................................................ 163
Reading Events from a Query Result Set ......................................................................... 164
Subscribing to Events ....................................................................................................... 164
Push Subscriptions ........................................................................................................... 165
Pull Subscriptions ............................................................................................................. 168
Summary .............................................................................................................................. 171
Chapter 12 ................................................................................................................................... 172
Performance Counters Instrumentation ................................................................................... 172
Performance Counter Concepts ........................................................................................... 172
Categories......................................................................................................................... 172
Instances........................................................................................................................... 173
Types ................................................................................................................................ 173
Installing Performance Counters .......................................................................................... 174
Writing Values to Performance Counters ............................................................................. 176
Connecting to Existing Performance Counters .................................................................... 178
Performance Counter Value Retrieval ................................................................................. 178
Raw, Calculated, and Sampled Data ................................................................................ 178
Comparing Retrieval Methods .............................................................................................. 180
Summary .............................................................................................................................. 180
Chapter 13 ................................................................................................................................... 181
Building Install Packages ......................................................................................................... 181
Section 4 ...................................................................................................................................... 182
Managing Operations ............................................................................................................... 182
Chapter 14 ................................................................................................................................... 183
Deploying and Operating Manageable Applications ................................................................ 183
Deploying the Application Instrumentation ........................................................................... 183
Running the Instrumented Application ................................................................................. 183
Event Log Instrumentation ................................................................................................ 185
Performance Counter Instrumentation ............................................................................. 187
WMI................................................................................................................................... 188
Trace File Instrumentation ................................................................................................ 188
Summary .............................................................................................................................. 189
Chapter 15 ................................................................................................................................... 190
Monitoring Applications ............................................................................................................ 190
Distributed Monitoring Applications ...................................................................................... 190
Management Packs.............................................................................................................. 192
Rules and Rule Groups ........................................................................................................ 192
Monitoring the Example Application ..................................................................................... 196
Monitoring the Remote Web Service ................................................................................... 201
Summary .............................................................................................................................. 209
Chapter 16 ................................................................................................................................... 210
Creating and Using Microsoft Operations Manager 2005 Management Packs ...................... 210
Importing a Management Model from the MMD into Operations Manager 2005 ................. 210
Viewing the Management Pack ........................................................................................ 212
Guidelines for Importing a Management Model from the Management Model Designer . 218
Creating and Configuring a Management Pack in the Operations Manager 2005
Administrator Console .......................................................................................................... 219
Guidelines for Creating and Configuring a Management Pack in the Operations Manager
2005 Administrator Console ............................................................................................. 232
Editing an Operations Manager 2005 Management Pack ................................................... 233
Editing Rule Groups and Subgroups ................................................................................ 233
Editing Event Rules, Alert Rules, and Performance Rules ............................................... 236
Editing Computer Groups and Rollup Rules..................................................................... 242
Creating and Editing Operators, Notification Groups and Notifications ........................... 246
Viewing and Editing Global Settings ................................................................................. 249
Guidelines for Editing an Operations Manager 2005 Management Pack ........................ 251
Create an Operations Manager 2005 Computer Group and Deploy the Operations Manager
Agent and Rules ................................................................................................................... 251
Guidelines for Creating an Operations Manager 2005 Computer Group and Deploying the
Operations Manager Agent and Rules ............................................................................. 257
View Management Information in Operations Manager 2005 .............................................. 258
Guidelines for Viewing Management Information in Operations Manager 2005 .............. 266
Create Management Reports in Operations Manager 2005 ................................................ 267
Guidelines for Creating Management Reports in Operations Manager 2005 .................. 269
Summary .............................................................................................................................. 269
Chapter 17 ................................................................................................................................... 270
Creating and Using System Center Operations Manager 2007 Management Packs ............. 270
Convert and Import a Microsoft Operations Manager 2005 Management Pack into
Operations Manager 2007.................................................................................................... 270
Guidelines for Converting and Importing a Microsoft Operations Manager 2005
Management Pack into Operations Manager 2007 .......................................................... 271
Creating a Management Pack in the Operations Manager 2007 Operations Console ........ 272
Guidelines for Creating a Management Pack in the Operations Manager 2007 Operations
Console ............................................................................................................................. 294
Editing an Operations Manager 2007 Management Pack ................................................... 295
Guidelines for Editing an Operations Manager 2007 Management Pack ........................ 303
Deploying the Operations Manager 2007 Agent .................................................................. 303
Best Practices for Deploying the Operations Manager 2007 Agent ................................. 306
Viewing Management Information in Operations Manager 2007 ......................................... 306
Guidelines for Viewing Management Information in Operations Manager 2007 .............. 311
Creating Management Reports in Operations Manager 2007 ............................................. 312
Guidelines for Creating Management Reports in Operations Manager 2007 .................. 315
Summary .............................................................................................................................. 316
Section 5 ...................................................................................................................................... 317
Technical References .............................................................................................................. 317
Appendix A .................................................................................................................................. 318
Building and Deploying Applications Modeled with the TSMMD ........................................... 318
Consuming the Instrumentation Helper Classes .................................................................. 318
Verifying Instrumentation Coverage ..................................................................................... 320
Deploying the Application Instrumentation ........................................................................... 322
Installing Event Log Functionality ..................................................................................... 322
Installing Windows Eventing 6.0 Functionality.................................................................. 323
Publishing the Schema for an Instrumented Assembly to WMI ....................................... 323
Installing Performance Counters ...................................................................................... 323
Using a Batch File to Install Instrumentation .................................................................... 324
Using the Event Messages File ........................................................................................ 324
Specifying the Runtime Target Environment and Instrumentation Levels ........................... 324
Generating Management Packs for System Center Operations Manager 2007 ................. 328
Importing a Management Pack into System Center Operations Manager 2007 ................. 330
Prerequisite Management Packs ...................................................................................... 331
Creating a New Distributed Application ................................................................................ 331
Appendix B .................................................................................................................................. 333
Walkthrough of the Team System Management Model Designer Power Tool ........................ 333
Building a Management Model ................................................................................................ 333
Generating the Instrumentation Code ...................................................................................... 347
Testing the Model with a Windows Forms Application ............................................................ 349
Generating an Operations Manager 2007 Management Pack ................................................ 353
Appendix C .................................................................................................................................. 355
Performance Counter Types .................................................................................................... 355
Copyright Information
Information in this document, including URL and other Internet Web site references, is subject
to change without notice. Unless otherwise noted, the companies, organizations, products,
domain names, e-mail addresses, logos, people, places, and events depicted in examples herein
are fictitious. No association with any real company, organization, product, domain name, e-
mail address, logo, person, place, or event is intended or should be inferred. Complying with all
applicable copyright laws is the responsibility of the user. Without limiting the rights under
copyright, no part of this document may be reproduced, stored in or introduced into a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopying,
recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you
any license to these patents, trademarks, copyrights, or other intellectual property.
Microsoft, Windows, System Center Operations Manager, C#, Visual Basic, Visual Studio, and
Team System are trademarks of the Microsoft group of companies.
All other trademarks are property of their respective owners.
© 2008 Microsoft Corporation. All rights reserved.
Introduction
Welcome to Design for Operations: Designing Manageable Applications – January 2008 Release.
This guide describes how to create applications that are easier to manage than existing
applications. When used alongside the associated code artifacts, this guide should help
dramatically simplify the process of creating manageable applications, and therefore reduce the
costs associated with application operations.
Intended Audiences
This guide is designed for people involved in designing, developing, testing, deploying, and
operating business applications. These include people in the following roles:
• Solutions architects
• Infrastructure architects
• Developers
• Senior operators
People in each role are likely to use the guide in different ways; different sections are suitable
for different roles. For more information about which sections are appropriate for particular
roles, see the next section, "How This Guide Is Organized."
"Architecting for Operations" (solutions architects, infrastructure architects). Examines the architectural principles that should be followed to design manageable applications. Explains management models, and shows how these can be defined.
"Developing for Operations" (developers). Examines the development tasks that must be performed to create manageable applications. Shows how the management model can be consumed by developers to make developing manageable applications easier.
Chapter Outline
This guide includes the following chapters and appendices:
• "Introduction"
• Section 1, "Introduction to Manageable Applications"
◦ Chapter 1, "Understanding Manageable Applications"
◦ Chapter 2, "A High-Level Process for Manageable Applications"
• Section 2, "Architecting for Operations"
◦ Chapter 3, "Architecting Manageable Applications"
◦ Chapter 4, "Creating Effective Management Models"
◦ Chapter 5, "Proven Practices for Application Instrumentation"
◦ Chapter 6, "Specifying Infrastructure Trust Levels"
◦ Chapter 7, "Specifying a Management Model Using the TSMMD Tool"
• Section 3, "Developing for Operations"
◦ Chapter 8, "Creating Reusable Instrumentation Helpers"
◦ Chapter 9, "Event Log Instrumentation"
◦ Chapter 10, "WMI Instrumentation"
◦ Chapter 11, "Windows Eventing 6.0 Instrumentation"
◦ Chapter 12, "Performance Counters Instrumentation"
◦ Chapter 13, "Building Install Packages"
• Section 4, "Managing Operations"
◦ Chapter 14, "Deploying and Operating Manageable Applications"
◦ Chapter 15, "Monitoring Applications"
◦ Chapter 16, "Creating and Using Microsoft Operations Manager 2005
Management Packs"
◦ Chapter 17, "Creating and Using System Center Operations Manager 2007
Management Packs"
• Section 5, "Technical References"
◦ Appendix A, "Building and Deploying Applications Modeled with the TSMMD"
◦ Appendix B, "Walkthrough of the TSMMD Tool"
◦ Appendix C, "Performance Counter Types"
The technical reference section chapters are included in this outline for the sake of
completeness. However, these chapters are scheduled for inclusion in a later revision of the
guide. The plans for the final version of this guide are subject to change, based on feedback
from the community.
The Dynamic Systems Initiative (DSI)
For more details about the DSI initiative, see "Dynamic Systems Initiative" on the Microsoft
Business & Industry Web site at http://www.microsoft.com/business/dsi/default.mspx.
Patterns and Practices
Microsoft patterns & practices guides contain deep technical guidance and tested source code based
on real-world experience. The technical guidance is created, reviewed, and approved by
Microsoft architects, product teams, consultants, product support engineers, and by Microsoft
partners and customers. The result is a thoroughly engineered and tested set of
recommendations that you can follow with confidence when building your applications.
Acknowledgments
Thanks to the following individuals who assisted in the content development, code development, testing, and documentation of this guide:
Core Development Team
• William Loeffler, Microsoft Corporation
• Keith Pleas, Keith Pleas and Associates
• Fernando Simonazzi, Clarius Consulting
• Vanesa Cillo, Clarius Consulting
• Peter Clift, Tek Systems
• Alex Homer, Content Master Ltd
• Paul Slater, Wadeware LLC
Reviewers
• David Aiken, Microsoft Corporation
• Mary Gray, Microsoft Corporation
• Peter Costatini, Microsoft Corporation
• Marty Hough, Microsoft Corporation
• Kyle Bergum, Microsoft Corporation
• Alex Torone, Microsoft Corporation
• David Trowbridge, Microsoft Corporation
• Tim Sinclair, Microsoft Corporation
• Jeff Levinson, Boeing Corporation
Section 1
Introduction to Manageable
Applications
This section defines manageable applications and explains the benefits to operators, developers,
and architects of manageable applications. It also defines a high-level process for designing,
developing, deploying, and operating manageable applications.
This section should be of use primarily to solutions architects and infrastructure architects.
However, it also provides useful background information to developers and operators.
Chapter 1, "Understanding Manageable Applications"
Chapter 2, "A High-Level Process for Manageable Applications"
Chapter 1
Understanding Manageable
Applications
Hardware and software costs form only a small percentage of the total cost of ownership (TCO)
for enterprise applications. Over time, the costs of managing, maintaining, and supporting those
applications are far more significant.
A large portion of day-to-day running costs is attributable to application failures, performance
degradation, intermittent faults, and operator error. The resultant downtime can severely
impact business processes throughout an organization.
Many of these problems can be mitigated by ensuring that the enterprise applications are
designed to be manageable. As a minimum, a manageable application must meet the following
criteria:
• It is compatible with the target deployment environment.
• It works well with operational tools and processes.
• It provides visibility into the health of the application.
• It is dynamically configurable at run time.
It is also important to consider the three perspectives from which an application is viewed:
• User. The user is the consumer of the application. From the user perspective, an application is responsible for meeting user requirements. Requirements such as security, performance, and availability are typically defined in a service-level agreement (SLA).
• Operator. The operator is the facilitator of the application. From the operator perspective, the application must be provided to the user according to the requirements of the application SLA. The operator is responsible for ensuring that the requirements of the user are met and for taking appropriate action if they are not. Appropriate action includes troubleshooting problems, providing the user with feedback, and providing the developer with feedback that may lead to further development.
• Developer. The developer is the creator of the application. From the developer perspective, the application must be designed and built to meet the needs defined by the user. However, when creating manageable applications, the developer perspective should also capture the needs of the operator and the tasks the operator must perform.
Each of these perspectives is held by multiple job roles, all of whom should be involved in
developing and consuming a manageable application. For example, the developer perspective
will typically be held by one or more architect roles, along with the application developers. For
more details about the specific job roles involved in a manageable application, see Chapter 3,
"Architecting Manageable Applications."
The operations team is responsible for ensuring day-to-day availability of the application, yet
they are often provided with applications that are difficult to effectively manage. This often
results in a number of problems, including the following:
These problems limit the ability of the operations team to manage the application efficiently and can ultimately affect the experience of the users consuming the application.
To solve these problems, the work of the operations team needs to be considered throughout
application design, development, test, and deployment. In many cases, this will be an iterative
process. For example, the experience gained from the day-to-day operation of the system
should guide improvements to the application design over time. With manageable applications,
it is generally easier to transfer system knowledge between all phases of the IT life cycle.
Application Dependencies
Figure 1 illustrates a typical three-tiered architecture for an application.
Figure 1
Application three-tier architecture
From an operations perspective, applications always execute on a platform and generally
communicate over a network. Applications are dependent on their own underlying system and
network layers, but they may also communicate with, and be dependent on, other applications
and services.
Figure 2 illustrates the application from the perspective of an operations team.
Figure 2
Applications from an operations perspective
Operators collect information that corresponds to each of these layers, using the information to
ensure that applications continue to run smoothly. Understanding each layer as a separate
entity, and understanding the relationships between the layers, often allows the operations
team to quickly isolate the source of any problem.
For example, if a computer running a SQL database that provides data to an application
becomes unavailable, the functionality of the application could be affected. In this situation, the
operator needs to know several things:
• What has caused the SQL Server to become unavailable? Typically, this is exposed in the
form of instrumentation at the system tier and network tier. For example, the computer
running SQL Server may have shut down or a network cable may have been removed.
• What are the consequences to the application? Typically, this is exposed in the form of
instrumentation at the application tiers. For example, some functionality of the
application may be lost or performance of the application may be affected.
• What are the consequences to the business operations of the company? Typically, this
can be exposed in the form of instrumentation at the application business logic tier and
may depend on factors outside the application itself. For example, if a business
operation that occurs once a month is affected, and the problem occurs when there are
25 days before the operation occurs again, the problem is less critical than if the
operation must occur every day.
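The bottom-up diagnosis described above can be sketched in code. The following is an illustrative Python probe, not part of the guide's .NET example; the host, port, and application-level check are all stand-ins:

```python
import socket

def check_network(host, port, timeout=2.0):
    """System/network tier: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def diagnose(host, port, app_check):
    """Walk the tiers bottom-up and report the first one that fails."""
    if not check_network(host, port):
        return "network/system tier: cannot reach %s:%s" % (host, port)
    if not app_check():
        return "application tier: server reachable but application checks fail"
    return "healthy"

# Usage (hypothetical SQL Server host; app_check would run a test query):
#   diagnose("sqlhost", 1433, app_check=run_test_query)
```

Checking the lowest tier first lets the operator distinguish a network or host outage from an application-level fault with a single probe.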
Typically, developers are not concerned with the details of the lower layers. However, an architect who is designing for operations should have a greater awareness of these details, because issues at a lower level can lead to problems with the health of the application itself.
Operations Challenges
Many of the product shipping problems faced by Northern Electronics stem from the existing product shipping application. The operations team for this application faces the following challenges:
• They rely on users to detect and report faults. Sometimes, users cannot provide
sufficient or accurate information; this makes diagnosis and resolution of faults difficult,
costly, and time-consuming.
• They may have to visit the computer to investigate issues. The information they
receive or can extract from the event logs or performance counters may not provide the
appropriate data required to resolve the fault.
• They cannot easily detect some problems early. These problems include impending
failure of a connection to a remote service caused by a failing network connection or
lack of disk space on the server. They are unlikely to monitor performance counters and
event logs continuously and, instead, use them solely as a source of information for
diagnosing faults.
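As a sketch of the early-detection idea (illustrative Python; the thresholds and alert wording are invented, not from the guide), a monitoring script can warn about low disk space before it causes a failure:

```python
import shutil

def free_space_fraction(path="."):
    """Fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def disk_alert(path=".", warn_at=0.20, fail_at=0.05):
    """Raise an alert well before the disk actually fills up.

    Returns an alert string, or None if the volume is healthy.
    """
    free = free_space_fraction(path)
    if free < fail_at:
        return "critical: volume holding %s is nearly full" % path
    if free < warn_at:
        return "warning: volume holding %s is running low on space" % path
    return None
```

Run periodically (for example, from a scheduler), a check like this turns a silent impending failure into an actionable alert, rather than leaving counters to be consulted only after a fault.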
Development Challenges
The solutions architect is committed to making the new product shipping solution a manageable
application. However, he faces several challenges in achieving this goal:
• The development team has no experience in developing manageable applications, and
there is no budget for using external developer resources.
• Northern Electronics is planning to modify the design of its infrastructure, and these
plans are currently not finalized.
• Northern Electronics is planning to migrate early to Windows Vista and Windows Server
2008.
The solutions architect plans to use a management model for the application to help him
overcome these challenges.
Summary
This chapter examined the different perspectives that interact with an application and focused
more closely on the operations perspective, which must be well understood to design
manageable applications. It introduced some core principles that should be followed when
designing manageable applications. It also provided more details about the Northern Electronics
scenario.
Chapter 2
A High-Level Process for Manageable
Applications
The high-level process for manageable applications defines four interconnected stages that
capture the application through design, development, deployment, and operations, as shown in
Figure 1.
Figure 1
High-level process for manageable applications
This chapter describes each stage and demonstrates how the stages are used together in
manageable applications. As illustrated in Figure 1, the stages are the following:
• Design. A management model is used to define how the application will function in
operations. The management model captures, at an abstract level, the entities that
make up the application, the dependencies between them, the deployment model for
the application, and an abstract representation of the health and instrumentation in the
application.
• Develop. A manageable application will include extensive health and instrumentation
artifacts represented in the management model. Information contained in the
management model is used to help determine the specifics of the health and
instrumentation implementation. Instrumentation will include event IDs, performance
counters, categories, and messages. The application may also perform additional health
checks, such as synthetic transactions.
• Deploy. After the application is developed, it must be deployed. The infrastructure
model (defined as part of the management model) for the application affects the
specific environment that the application runs in, which in turn, affects the health and
instrumentation technologies that can be used. For example, an application deployed in
a low trust environment may not be able to log to a Windows Event Log.
• Operate. After the application is deployed, it must be operated on a day-to-day basis.
Typically, the operations team uses management tools to consume the health and
instrumentation information provided by the application in daily operations and makes
necessary changes to application configuration.
Four job roles are directly involved in this process:
• Solutions architect. The solutions architect is responsible for defining the application at
the logical level. This involves determining how the application should be structured,
how health can be determined for the application (in an abstract sense), and the
instrumentation that is necessary to make that determination.
To help define the various manageability requirements of an application, the solutions architect should create a management model; typically, this is created in collaboration with the infrastructure architect.
• Developer. The developer is responsible for consuming the model created by the
solutions architect and creating the application, along with appropriate health,
instrumentation, and configuration artifacts, as defined in the model.
• Infrastructure architect. The infrastructure architect is responsible for specifying the
environment in which the application will run. This information may be specified in an
infrastructure model, which may affect decisions made by the solutions architect (for
example, the trust environment into which the application will be deployed). The
infrastructure architect must also ensure that the application can be deployed in the
environment; if it cannot be deployed in the environment, the infrastructure architect
must ensure that the appropriate changes are made to the application or the
environment.
• Operator. The operator is responsible for the ongoing running of the application and
responds to application and system alerts using a variety of operations tools. The
operator may also adjust run-time configuration of the application in response to
certain events.
Figure 2 illustrates how these job roles participate in the high-level process.
Figure 2
High-level process showing job roles
Many additional job roles participate at some point in the life cycle of a manageable application.
The following table lists these roles and the perspectives that they would hold on the
application. For more information about application perspectives, see Chapter 1,
"Understanding Manageable Applications."
User Product Manager (User PM). Holds the user perspective. Defines user needs and the required features of the application; works with the solutions architect and infrastructure architect to define the service-level agreement (SLA) for the application.
User Education. Holds the developer perspective. Responsible for content in error messages, events, and Help files.
In many cases, individuals are responsible for more than one role in a project.
Creating a management model for the application does not prevent you from using an
iterative approach when designing your application—the model should be flexible enough to
be altered as changes occur in later iterations.
Typically, the infrastructure architect and the solutions architect are the main roles involved in
creating a management model. The infrastructure architect provides input about the
environment in which the application will be deployed, which may include factors such as
network connectivity, network zones, and allowed protocols. This information is critical to the
overall design, because it can affect the way instrumentation will be implemented in the
application. For example, if the application is to be deployed in a low-trust environment, it is
typically not possible to write events to an event log. In some cases (for example, for a shrink-
wrapped application), it may not be possible to determine in advance what the deployment
environment will be, so multiple trust levels may have to be supported.
Generally, the solutions architect is responsible for the specifics of the management model. The
management model defines how the application is broken into manageable operational units
(known as managed entities). It also contains abstract information about the application, which
defines how the application is developed, deployed, and, ultimately, how it is managed. This
information includes an instrumentation model, which indicates all the instrumentation points
for the application, and a health model, which indicates the various health states for the
application.
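As an illustration only, not the TSMMD's actual schema, the ingredients just described (managed entities, their dependencies, an instrumentation model, and a health model) can be pictured as plain data structures. The Python below uses invented names:

```python
from dataclasses import dataclass, field

@dataclass
class InstrumentationPoint:
    name: str          # abstract event, e.g. "DbConnectionFailed"
    technology: str    # e.g. "event log" or "performance counter"

@dataclass
class HealthState:
    name: str          # e.g. "Red", "Yellow", "Green"
    indicators: list   # names of instrumentation points that signal this state

@dataclass
class ManagedEntity:
    name: str
    depends_on: list = field(default_factory=list)
    instrumentation: list = field(default_factory=list)
    health_states: list = field(default_factory=list)

# A two-entity model: a shipping service that depends on a database.
db = ManagedEntity("ShippingDatabase")
svc = ManagedEntity(
    "ShippingService",
    depends_on=[db],
    instrumentation=[InstrumentationPoint("DbConnectionFailed", "event log")],
    health_states=[HealthState("Red", ["DbConnectionFailed"])],
)
```

The point of the sketch is the separation of concerns: the entities and their health states are abstract, while the instrumentation technology is a property that can change per deployment environment.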
For more information about creating a management model, including information about how to use the Team System Management Model Designer Power Tool (TSMMD), see Chapter 4, "Creating Effective Management Models," Chapter 5, "Proven Practices for Application Instrumentation," and Chapter 6, "Specifying Infrastructure Trust Levels."
The developer may also need to incorporate specific health indicators, which are used to determine the health of an application, and configurability support, which is used to modify what instrumentation is used at run time.
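These two ideas can be sketched together. The following illustrative Python (hypothetical names; the guide's generated helpers are .NET code) shows an instrumentation helper that exposes abstract, named events while letting operators change the emitted granularity at run time:

```python
import logging

# Granularity levels an operator might switch between at run time.
LEVELS = {"off": logging.CRITICAL + 1, "coarse": logging.WARNING, "verbose": logging.DEBUG}

class InstrumentationHelper:
    """Abstract instrumentation facade: the application raises named events,
    and the helper decides which of them the current level lets through."""

    def __init__(self, level="coarse"):
        self._logger = logging.getLogger("shipping")
        self.set_level(level)

    def set_level(self, level):
        # Run-time reconfiguration: no rebuild or redeployment needed.
        self._logger.setLevel(LEVELS[level])

    def order_received(self, order_id):
        self._logger.debug("Order %s received", order_id)   # verbose only

    def db_connection_failed(self, error):
        self._logger.warning("Database connection failed: %s", error)

helper = InstrumentationHelper("coarse")
helper.order_received("A-100")   # filtered out at the coarse level
helper.set_level("verbose")
helper.order_received("A-101")   # now passes the level filter
```

Because the application calls only the named event methods, the choice of backing technology and the active granularity stay out of the application code, which is the role the generated instrumentation helpers play in the guide.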
• Guidance. This guidance can be used at all stages of the application life cycle. Chapter 4
includes detailed architectural guidance for designing manageable applications.
Chapters 8–15 provide developer and deployment guidance. Chapters 16 and 17
provide detailed guidance for operating manageable applications.
• TSMMD. This tool is integrated with Visual Studio; it supports many of the
requirements involved in developing manageable applications. The feature set of
TSMMD includes the following:
◦ Modeling capabilities. You can use the tool to model many of the artifacts
required in a manageable application. TSMMD represents the application as a
series of related managed entities. By defining different properties of the
managed entities, you can create an abstract representation of application
health, instrumentation, and the target infrastructure.
◦ Automated generation of instrumentation code. TSMMD includes recipes for
automatically generating instrumentation code from the information in the
management model. Instrumentation code is generated in the form of
instrumentation helpers, which separate the process of instrumentation from
the application itself. This means that the application can call abstract
instrumentation, and the application developer does not have to worry about
the specifics of the instrumentation technologies being used.
◦ Validation. TSMMD supports two forms of validation. It ensures that the model
is internally consistent and does not contain orphaned elements. It also
validates that defined instrumentation is called from the application. If
instrumentation represented in the management model is not included in the
application code, the tool generates warnings in Visual Studio.
◦ Management Pack Generation. The TSMMD can generate Management Packs
for System Center Operations Manager directly, using the information about the
instrumentation stored in the Management Model.
• MMD. The Management Model Designer (MMD) is a standalone tool that can be used
to create a hierarchy of managed entities and define a health model for the application.
The MMD can also be used to create Management Packs for Microsoft Operations
Manager (MOM) 2005 and System Center Operations Manager 2007.
• Trust levels. In some cases, the application architect will not know the specifics of the
deployment environment for the application. By specifying multiple trust levels for an
application, the application can support multiple deployment environments, and the decision
about which trust level to use can be deferred until run time. For more details, see
Chapter 6, "Specifying Infrastructure Requirements."
• Run-time configuration. At an architectural level, it is usually not possible to be sure
exactly how the application will be used in daily operations. Therefore, the application
architect should support flexible operations by providing run-time configuration of the
application. Typically, manageable applications need to support run-time configuration
of instrumentation so the operations team can turn on and turn off instrumentation in
real time and modify the granularity level of instrumentation.
• Management Packs. Management Packs provide a predefined, ready-to-run set of
rules, monitoring scripts, and reports that encapsulate the knowledge required to
monitor, manage, and report about a specific service or application. A Management
Pack monitors events that are placed in the application event log, system event log, and
directory service event log by various components of an application or subsystem. The
rules and monitoring scripts also can monitor the overall health of an application or
system and alert you to critical performance issues in several ways:
◦ They can monitor all aspects of the health of that application or system and its
components.
◦ They can monitor the health of vital processes that the application or system
depends on.
◦ They can monitor service availability.
◦ They can collect key performance data.
◦ They can provide comprehensive reports, including reports about service
availability and service health, and reports that you can use for capacity
planning.
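The separation between abstract instrumentation and concrete technologies that the TSMMD generates can be sketched in a few lines. The guide targets .NET, so this Python sketch is purely illustrative; the class and event names are invented, not part of any TSMMD-generated code.

```python
# Illustrative sketch of an instrumentation helper: the application raises
# abstract events, and the helper routes them to whatever concrete
# technologies are configured. All names here are hypothetical.

class EventLogWriter:
    """Stands in for a concrete technology such as the Windows Event Log."""
    def __init__(self):
        self.entries = []
    def write(self, name, message):
        self.entries.append((name, message))

class TraceWriter:
    """Stands in for a concrete technology such as trace-file output."""
    def __init__(self):
        self.lines = []
    def write(self, name, message):
        self.lines.append(f"{name}: {message}")

class InstrumentationHelper:
    """The application depends only on this abstraction."""
    def __init__(self, targets):
        self.targets = targets
    def raise_event(self, name, message):
        for target in self.targets:
            target.write(name, message)

# The application code never references a concrete technology directly,
# so the mapping can change without touching application logic.
event_log, trace = EventLogWriter(), TraceWriter()
helper = InstrumentationHelper([event_log, trace])
helper.raise_event("OrderReceived", "Order 42 accepted")
```

Because the application calls only `raise_event`, swapping or adding a concrete technology is a change to the helper's configuration, not to the application.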
Figure 3 illustrates how the guidance and other artifacts provided can be used to facilitate the
process.
Figure 3
The process showing guidance and artifacts
Summary
This chapter examined a high-level process for designing, developing, deploying, and operating
manageable applications. It examined the roles that participate in that process and the
responsibilities that each role holds. It also examined the artifacts that are available to facilitate
the process of designing manageable applications.
Section 2
Architecting for Operations
This section examines the architectural principles that should be followed when designing
manageable applications. It examines management models and looks in detail at modeling
health and instrumentation. It also captures best practices for instrumenting applications, and it
discusses how to instrument applications that may be deployed to different infrastructures.
Lastly, it shows how to use the Team System Management Model Designer Power Tool
(TSMMD) to create a management model for an application.
This section should be of use primarily to solutions architects and infrastructure architects.
Chapter 3, "Architecting Manageable Applications"
Chapter 4, "Creating Effective Management Models"
Chapter 5, "Proven Practices for Application Instrumentation"
Chapter 6, "Specifying Infrastructure Trust Levels"
Chapter 7, "Specifying a Management Model Using the TSMMD Tool"
Chapter 3
Architecting Manageable Applications
There are a number of significant challenges the architect faces when determining how to
design a manageable application. This chapter examines the fundamental design principles that
must be addressed. It then demonstrates a structure of a manageable application. Finally, it
shows how creating a management model for the application can simplify the work of the
architect and other members of the development team.
• The application instrumentation should be isolated from the rest of the application code.
The architect should make informed choices about the instrumentation technologies to
use, and enforce the use of those technologies by isolating the instrumentation code in
an instrumentation helper. In this case, the application developer calls only abstract
instrumentation code and this is mapped to concrete instrumentation technologies. For
more details, see Chapter 5, "Proven Practices for Application Instrumentation."
• The application should be designed with the target environment (or environments) in
mind. Some instrumentation technologies cannot be used in low trust environments
because they require the application to have a higher level of trust than is available.
Abstracting the specifics of instrumentation can allow for increased flexibility in this
area. In cases where the architect knows the nature of the deployment environment
ahead of time, the appropriate concrete instrumentation can be mapped to the
abstract representation of the instrumentation. In other cases, increased flexibility will
be needed, and the decision about the specific instrumentation technology used must
be deferred until the application is deployed.
• The application should provide configuration options useful to the operations team.
The information provided by extensive instrumentation is of use to the operations team
only if they can perform an action based on that information. In some cases, the
operations team will need to restart the application or individual services. In other
cases, it may be possible to make other real-time changes to application configuration
to solve a problem. Instrumentation information that is closely related to configuration
options is more relevant to operations. It should also be possible to configure the
instrumentation options themselves—for example, to increase the amount of
information that is reported when troubleshooting a problem. Where possible,
configuration settings should be constrained to ensure that the operations team does
not create incorrect settings or change the wrong settings.
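The last principle, constraining configuration settings so the operations team cannot apply invalid values, can be sketched as a simple validation step. The setting names and allowed values below are invented for illustration.

```python
# Sketch of constrained run-time configuration: each setting declares its
# valid values, and changes outside that set are rejected. The setting
# names here are hypothetical examples, not from the guide's scenario.

ALLOWED = {
    "InstrumentationLevel": {"Coarse", "Fine", "Debug"},
    "MaxQueueLength": set(range(1, 1001)),
}

def apply_setting(config, name, value):
    """Apply a setting only if it is known and its value is valid."""
    if name not in ALLOWED:
        raise KeyError(f"Unknown setting: {name}")
    if value not in ALLOWED[name]:
        raise ValueError(f"Invalid value for {name}: {value!r}")
    config[name] = value

config = {}
apply_setting(config, "InstrumentationLevel", "Fine")  # accepted
try:
    apply_setting(config, "MaxQueueLength", 5000)      # out of range, rejected
except ValueError:
    pass
```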
Relationships between managed entities are very important in a management model. These
relationships can directly affect the health, instrumentation, and performance of an entire
system. For example, in Figure 1, the Products Web service is dependent on the Products
database and the Transport Web service. This means that a change in the health state of the
Transport Web service may affect the health of the Products Web service.
The way these relationships are specified depends on the tooling used to represent the
management model. For example, the Management Model Designer (MMD) tool (which focuses
predominantly on health) enforces a parent-child hierarchy between managed entities and uses
the relationship to determine the health of a managed entity. In this case, the health of child
managed entities is rolled up to provide an indication of the health of a parent managed entity.
By contrast, the Team System Management Model Designer Power Tool (TSMMD) tool (which
focuses predominantly on instrumentation) does not use a parent-child relationship.
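The roll-up behavior described for the MMD can be sketched as a worst-state-wins rule. This policy is an assumption for illustration; real tools may apply richer aggregation rules.

```python
# Illustrative roll-up of child health states to a parent managed entity,
# using a simple worst-state-wins policy over the three standard states.

SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}

def roll_up(child_states):
    """Return the parent's health: the worst state among its children."""
    return max(child_states, key=lambda s: SEVERITY[s])

# A parent whose children are healthy except one degraded service:
parent_state = roll_up(["GREEN", "YELLOW", "GREEN"])
```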
• One managed entity for each Web service exposed by the application:
◦ ShippingService
◦ PickupService
◦ TransportService
• One managed entity for each database used by the application:
◦ Transport
◦ Shipping
• One or more managed entities for each workstation application that communicates
with a Web service:
◦ WarehouseClient. This corresponds to the application running on the
warehouse workstation.
The application running on the Transport Office workstation has two distinct pieces of
functionality of concern to the operations team. The solutions architect has decided to reflect
this by representing the application as two separate managed entities.
The solutions architect plans to use these managed entities as the basis for an application
management model. As a minimum, he plans to define abstract events and measures for each
managed entity, along with default instrumentation levels for each abstract event, and
mappings to concrete instrumentation technologies. He will also define trust levels for the
application and health states for the application.
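The artifacts the solutions architect plans to define can be pictured as a small data structure. The schema below is a simplification for illustration only; the event name and mappings are assumptions, not taken from the guide's scenario.

```python
# A simplified picture of the planned management model: each managed
# entity carries abstract events with a default instrumentation level
# and mappings to concrete technologies. "ShipmentFailed" and the
# mappings shown are invented examples.

management_model = {
    "ShippingService": {
        "events": {
            "ShipmentFailed": {
                "default_level": "Coarse",
                "mappings": ["EventLog", "WMI"],
            },
        },
        "measures": ["RequestsPerSecond"],
    },
}

def technologies_for(model, entity, event):
    """Look up which concrete technologies an abstract event maps to."""
    return model[entity]["events"][event]["mappings"]
```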
Summary
This chapter examined the overall design of a manageable application and discussed the design
principles that should be adhered to when architecting manageable applications. It also used
these principles to refine the high-level process previously discussed in Chapter 2, "A High-Level
Process for Manageable Applications," and provided additional information about the Northern
Electronics Scenario.
Chapter 4
Creating Effective Management
Models
Creating a management model is a key part of designing manageable applications.
Comprehensive management models provide an abstract representation of all knowledge about
the application; they do this by capturing information that is relevant to the successful
management of the application. Management models ensure that manageability is built into
every service and application; they also ensure that management features are aligned with the
needs of the administrator who will be running the application. As a result, they can
dramatically simplify the deployment and maintenance of applications in a distributed IT
environment.
Information contained in a comprehensive management model for an application has a number
of uses for the operations team, including the following:
• It provides operations with a broader view of the applications they need to maintain by
encapsulating all the information about an application in a coherent, organized manner.
• It provides an abstraction of day-to-day operations from low-level technologies. For
example, if a database that forms part of a business application fails, the operations
team will often have to examine low-level events in a SQL log to determine the cause of
a problem. However, if the management model encapsulates the functionality of the
application, a management tool can be used to diagnose and correct the problem.
• It demonstrates how the various technologies that form a solution relate to one
another in operations.
• It predicts the impact of proposed changes to the environment.
• It provides effective troubleshooting information and a detailed view of issues,
including the impact of any problem.
• It provides well-defined, prescriptive configurations for deployment.
• It automates operations with pre-defined command line tools and scripting.
The output from a management model can form the basis for the definition of many artifacts
required during development, including instrumentation and health artifacts. This ultimately
leads to well-designed application instrumentation that supports full monitoring, diagnosis and
troubleshooting by IT operations staff. Effective management models can also reduce the time
needed to adopt a new application, because operations staff will have a more thorough
understanding of the application architecture.
Management models should represent the application as comprehensively as possible.
However, even a partial management model can be very useful in creating a manageable
application. This chapter discusses the elements that make up a comprehensive management
model, and then it discusses in more detail two of the key areas that the rest of this guide will
focus on: instrumentation and health.
After a comprehensive management model is in place, management of the complete system can
be performed through the model.
Comprehensive Management Models
Creating a comprehensive management model consists of modeling in a variety of different
areas to provide a total system view, including the following:
• Configuration modeling. This involves encapsulating all the settings that control the
behavior or functionality of an application or system component.
• Task modeling. This involves cataloging the complete list of tasks that administrators
have to perform to administer and manage a software system or application.
• Instrumentation modeling. This involves capturing the instrumentation used to record
the operations of a system or application. Instrumentation provides information to the
operations team to increase understanding about how the application functions, and to
diagnose problems with an application.
• Health modeling. This involves defining what it means for a system or application to be
healthy (operating normally) or unhealthy (operating in a degraded condition or not
working at all). A health model represents logically the parts of an application or service
the operations team is responsible for keeping operational.
• Performance modeling. This involves capturing the expected baseline performance of
an application. Performance counters can then be used to report and expose
performance on an ongoing basis, and a monitoring tool can compare this performance
to the expected performance.
Configuration Modeling
In a corporate setting, system administrators frequently have to configure thousands of client
computers and hundreds of servers in their organizations. Standardizing and locking down
configurations for client computers and servers helps simplify this complexity. Recent studies on
total cost of ownership (TCO) identify loss of productivity at the desktop as one of the largest
costs for corporations. Lost productivity is frequently attributed to user errors, such as
modifying a system configuration in a way that renders applications unworkable, or to
complexity caused by non-essential applications and features on the desktop. Configuration modeling
attempts to address this problem by capturing all the settings that control the behavior or
functionality of an application or system component.
Configuration modeling addresses only those settings that are controllable by an administrator
or an agent. Typically, a configuration model captures the valid configuration settings for client
computers and users, and also for member servers and domain controllers in an Active Directory
forest.
In many cases, configurations will be standardized and centrally managed using technologies
such as Group Policy or Systems Management Server (SMS).
For an application to be managed using Group Policy, that application must have built-in
support for Group Policy. For example, applications built with Enterprise Library can be
managed through Group Policy by using the manageability support that Enterprise Library provides.
Task Modeling
Administrators typically must learn to use multiple tools to achieve a single administrative task.
Task modeling helps address this problem by enumerating the activities that are performed
when managing a system as defined tasks. These may be maintenance tasks, such as backup,
event-driven tasks, such as adding a user, or diagnostic tasks performed to correct system
failures. Defining these tasks guides the development of administration tools and interfaces and
becomes the basis for automation. The task model can also drive self-correcting systems when
used in conjunction with instrumentation and health models.
Task models describe the administration of a component or application in terms of tasks.
Tasks are defined as complete
actions that accomplish a goal that has a direct value to the administrator. They enable task-
based administration; this makes it easier to define, enforce, and delegate responsibilities to
different system administrators. In the future, task models will provide a foundation for role-
based access control.
Building all command-line and GUI administration tools based on the same task model can
dramatically lower the time and effort required to learn how to manage Windows operating
systems, server applications, and client applications; it also enables the automation of
system administration tasks.
The following are the most important benefits of building a task-based administration model:
• Administrative tasks can more closely reflect the operations experience. The
administration of applications is described in terms of tasks that are understandable by
system administrators instead of simply reflecting the way in which the application was
developed.
• User experiences are consistent with the administrative tools. Administrative tools
may be GUI-based snap-ins, command line-based utilities, or scripts (for example,
PowerShell scripts). Consistency between all these administrative tools allows
administrators to start working with the system using easy-to-understand GUI tools,
and then directly use this knowledge to manage applications with command-line tools
and build automated management scripts.
• Role-based administration is easier to implement. Task models can be the foundation
for implementing role-based administration for your application. Role-based
administration allows you to simplify the access control list (ACL) complexity that exists
today. Task models provide a simplified method for assigning and grouping
responsibilities and access rights. A user role can then be defined as a collection of
tasks. Being a member of a particular user role simply implies being allowed to perform
a set of tasks.
• System management costs for your software are easier to estimate. Each task in the
task model has an associated cost when performing the task. The cost of executing a
task depends on different factors, such as how frequently the task should be
performed, how long it takes to do it, the skill level of the person who runs it, and so on.
It currently takes a substantial amount of effort to gather these statistics. Capturing this
data in task models allows you to do the following:
◦ Calculate the management cost for your product.
◦ Compare it to the management cost of the previous version or a competitor’s
product.
◦ Show your customers the financial benefits of migrating to the new version.
◦ See what tasks cost your customers the most to perform.
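The idea that a user role is simply a collection of tasks can be sketched directly: authorization becomes a set-membership check. The role and task names below are invented for illustration.

```python
# Sketch of role-based administration built on a task model: a role is a
# collection of tasks, and permission is set membership. Role and task
# names here are hypothetical.

ROLES = {
    "BackupOperator": {"backup-database", "restore-database"},
    "HelpDesk": {"add-user", "reset-password"},
}

def can_perform(role, task):
    """A member of a role may perform exactly the tasks in that role."""
    return task in ROLES.get(role, set())
```

Delegating a responsibility then means adding a task to a role, rather than editing access control lists on individual resources.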
Instrumentation Modeling
Applications often contain minimal instrumentation or instrumentation that is not relevant to
operations. This results in applications that are difficult to manage, because the operations staff
is not provided with the information it needs to manage the application on a daily basis or to
troubleshoot issues as they occur.
Instrumentation modeling helps to ensure that appropriate instrumentation is built into the
application from the beginning. An instrumentation model allows you to discover the
appropriate instrumentation requirements and then implement this instrumentation within the
application.
Benefits of instrumentation modeling include the following:
• It makes the task of developing the instrumented application more straightforward
for the application developer. The application architect can create the instrumentation
model in abstract form in advance of the development process, clearly defining the
nature of instrumentation required in the application.
• It provides relevant feedback about the application to the operations staff. Well-
designed instrumentation will correlate closely to the operations view of the application
and assist in daily operations tasks. In other words, it will correspond directly to the
configuration and task models. At a deeper level, instrumentation will provide
diagnostic information that the operations team can use to troubleshoot application
problems.
• It provides feedback about the application to the application developers.
Instrumentation can also provide information to a developer that is directly relevant to
the design of the application. This makes application testing easier, and it reduces the
costs of future development cycles. This type of administration is generally hidden from
operations.
Health Modeling
Health modeling defines what it means for a managed entity to be healthy or unhealthy. Good
information about the health state of an application or system is necessary for maintaining,
diagnosing, and recovering from errors in applications and operating systems deployed in
production environments.
Health modeling uses instrumentation as the basis on which monitoring and automated
recovery are built. Frequently, information is supplied in a way that has meaning for developers
but does not reflect the user experience of the administrator who manages, monitors, and
repairs the application or system day to day. Health models allow you to define both what kinds
of information should be provided and how the administrator and the application or system
should respond.
When customers are evaluating a new application, they expect to receive important information
about its capabilities, along with deployment and setup instructions. However, they frequently
are never given the guidance or tools to operate that software on a daily basis after it is
deployed.
Providing IT and operations customers with the correct view of an application (what it looks
like when it is functioning normally and when it is not), and providing them with the correct
knowledge to troubleshoot issues, allows them to meet their service level agreements (SLAs)
to their own customers.
delivered to customers when an application is released will substantially improve the adoption
and deployment rates for any new or updated application. Customers will be more comfortable
and confident in deploying new technology when they can monitor how it is performing in
production and know how to get out of trouble quickly when something goes wrong.
Most problems that impact the service delivery of an application could be fixed before the
problem is visible to end users. Effective health modeling ensures that the operations team
thoroughly understands what affects the health of their system, so problems can be detected
before service is impacted and troubleshooting and resolution can be automated as much as
possible. When a problem is detected, the management model facilitates a thorough diagnosis
and a proper solution. Health modeling also enables the operator to take preventive-care
measures before problems occur to maximize system up-time.
Performance Modeling
Performance modeling is used to capture the expected performance of a system, defining a
baseline that can be measured against in the future. Performance modeling is closely related to
instrumentation modeling (performance counters are a form of instrumentation) and health
modeling (an application that is performing poorly compared to a pre-determined baseline is
typically considered to be unhealthy).
Performance modeling is useful in capacity planning because it can be used to help determine
expected performance when a system is put under stress or when the configuration of a system
is changed in some way.
A monitoring tool is normally used to measure an application against the performance
information in a management model. When the monitoring tool detects that the application is
not responding or is failing to meet the expected performance level, it can raise an alert to the
operations staff and send an e-mail message. Operators can check the performance and event
logs to get diagnostic information about the problem that will help them recover the application
in the shortest possible time.
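The comparison a monitoring tool makes between a measured counter and the modeled baseline can be sketched as follows. The tolerance and counter name are assumptions for illustration.

```python
# Sketch of checking a measured performance counter against the baseline
# captured in the performance model, alerting when the measurement
# degrades beyond a tolerance (here, for counters where lower is better).

def check_counter(name, measured, baseline, tolerance=0.25):
    """Return an alert string when measured exceeds baseline by more
    than the tolerance, or None when performance is within bounds."""
    if measured > baseline * (1 + tolerance):
        return f"ALERT: {name} at {measured}, baseline {baseline}"
    return None

# A request-time counter well above its modeled baseline triggers an alert:
alert = check_counter("Request Execution Time (ms)", 180, 120)
```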
Your management models should capture the abstract instrumentation requirements for your
application. The developer can then use these requirements to create the corresponding
instrumentation artifacts.
Types of Instrumentation
Typically, instrumentation takes one of two forms in an application:
• Performance counters
• Events
When determining how to support manageability, you should consider how operations will
consume the instrumentation you create. Instrumentation created by the developer may be
consumed in a relatively raw form by the operator—for example, by examining event logs or by
using a low-level tool or script to examine Windows Management Instrumentation (WMI)
events. However, particularly in larger organizations, the operator may have access to a tool
such as Microsoft Operations Manager (MOM), which allows him or her to see the information
in a more structured way, and can automate many of the processes of effective operations, such
as creating rule sets and issuing alerts.
Performance Counters
Performance counters provide continuous metrics for specific processes or situations within the
system. For example, a performance counter may indicate the current processor usage as a
percentage of its maximum capacity or the percentage of memory available. The metric can also
be an absolute value instead of a percentage, such as the number of current connections to a
database, or the number of queued requests for a Web server.
The operating system and the default services, such as Internet Information Server (IIS) and the
Common Language Runtime (CLR), expose built-in performance counters. In general, you should
aim to use these where possible, complementing them with custom performance counters only
where necessary. For example, your management model should specify use of the built-in IIS
Request Execution Time counter if this can provide the information required by the
management model. In this case, adding an equivalent custom counter will simply add to the
load on the server; it will not achieve anything extra.
Built-in counters cover a wide range of processes in IIS, ASP.NET, the CLR, and SQL Server. For
a complete list of these counters, see "Windows Server 2003 Performance Counters
Reference" on Microsoft TechNet at
http://technet2.microsoft.com/WindowsServer/en/library/3fb01419-b1ab-4f52-a9f8-
09d5ebeb9ef21033.mspx.
Events
Monitoring tools can read the event logs of each server in a distributed application and use this
information to raise alerts and send e-mail messages to specified groups of operators when
problems occur. They can also indicate recovery from a problem, which allows operators to
verify that resolution of a problem was successful. Events may take many forms, including
Windows Event Log events, WMI events, and trace statement file entries.
You should consider specifying events for all possible state transitions; operators can filter those
that are of interest. To allow filtering to take place in the monitoring environment, events must
specify a severity and a category in addition to the description and, where possible, recovery
information. Events can also specify security levels; in this case, filtering can take place based on
an operator's security status.
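An event record carrying severity and category, and operator-side filtering over it, can be sketched as follows. The field set and severity scale are illustrative assumptions.

```python
# Sketch of events that carry severity and category so the monitoring
# environment can filter them; the exact field set is an assumption.

from dataclasses import dataclass

@dataclass
class OperationalEvent:
    description: str
    severity: str       # e.g. "Error", "Warning", "Information"
    category: str       # e.g. "Connectivity", "Lifecycle"
    recovery: str = ""  # recovery information, where available

def filter_events(events, min_severity):
    """Keep only events at or above the requested severity."""
    order = {"Information": 0, "Warning": 1, "Error": 2}
    return [e for e in events if order[e.severity] >= order[min_severity]]

events = [
    OperationalEvent("Database unreachable", "Error", "Connectivity",
                     "Check the connection string"),
    OperationalEvent("Service started", "Information", "Lifecycle"),
]
urgent = filter_events(events, "Warning")
```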
You can use events to indicate non-error conditions if this is appropriate for your application, or
if it is necessary to indicate state changes. To indicate a state change, you can arrange for a
service to raise an event when it starts and again when it completes processing of each Web
service request. An event handler can then be used to determine the average number of
requests in a particular period, the average request time, and the total number of requests. If
these values reach some pre-defined threshold, the event handler then raises another event.
In this case, you are using events to implement a counter, and then monitoring the counter
within your code. However, the overall result is that, in line with the principles of health
monitoring, your application raises an event to the monitoring system that indicates a state
transition. Your management model will simply indicate that the specified process can undergo
a state transition and the parameters that indicate when this state transition takes place.
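The start/complete pattern just described can be sketched: a handler tallies request completions and raises a state-transition event when a threshold is crossed. The threshold and timings below are invented for illustration.

```python
# Sketch of using completion events to derive a counter and raise a
# state-transition event when the average request time crosses a
# pre-defined threshold. Numbers here are illustrative only.

class RequestMonitor:
    def __init__(self, max_avg_ms, raise_event):
        self.max_avg_ms = max_avg_ms
        self.raise_event = raise_event
        self.durations = []

    def on_complete(self, duration_ms):
        """Handle a request-complete event and check the running average."""
        self.durations.append(duration_ms)
        avg = sum(self.durations) / len(self.durations)
        if avg > self.max_avg_ms:
            self.raise_event(f"Average request time {avg:.0f} ms exceeds "
                             f"threshold {self.max_avg_ms} ms")

alerts = []
monitor = RequestMonitor(max_avg_ms=200, raise_event=alerts.append)
for d in (150, 180, 320):   # the third request pushes the average over 200 ms
    monitor.on_complete(d)
```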
Another approach is to focus instrumentation on elements that are relevant to operations. If the
application is structured according to the principles outlined earlier in this chapter, the services
that make up the application will correspond to those defined in the management model; they
will also correspond to the units of operation that can be seen and, in many cases, configured by
operations. This approach offers a number of advantages:
• It provides information directly relevant to operations, which can be acted upon.
• It requires less development time (although potentially more initial time in determining
the services).
The difficulty of the second approach is that it requires the developer to map instrumentation to
the operator's view of the application. In some cases, it may not be possible to determine
exactly what instrumentation will be most useful at run time, although involving the
infrastructure architect and administrators in the creation of the management model should
help.
The recommended approach when determining what to instrument is to perform extensive
instrumentation of all elements that could be of use to operations, and provide run-time
configurability to allow operators fine-grained control over instrumentation.
Granularity of Instrumentation
After you decide what to instrument, you need to determine the appropriate level of granularity
for instrumentation. Normally, the appropriate level of instrumentation will depend on the
current health state of an application. When an application is functioning normally, the operator
will require minimal detail to indicate successful operations. However, if the application has a
problem, or is about to have a problem, more detailed and granular information is useful.
To support this, you should consider instrumenting the application at a fine-grained level but
allowing the operator to configure the level of instrumentation that is exposed at application
runtime.
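This run-time gating of fine-grained instrumentation can be sketched as a level check applied on every emission. The level names and messages are illustrative assumptions.

```python
# Sketch of fine-grained instrumentation gated by an operator-configurable
# level, so detail can be increased at run time without redeploying.

LEVELS = {"Coarse": 0, "Fine": 1, "Debug": 2}

class Instrumentation:
    def __init__(self, level="Coarse"):
        self.level = level
        self.emitted = []

    def emit(self, message, level):
        # Emit only events at or below the currently configured level.
        if LEVELS[level] <= LEVELS[self.level]:
            self.emitted.append(message)

inst = Instrumentation("Coarse")
inst.emit("Service started", "Coarse")       # recorded
inst.emit("Cache miss for key 42", "Debug")  # suppressed at Coarse
inst.level = "Debug"                         # operator raises the level
inst.emit("Cache miss for key 43", "Debug")  # now recorded
```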
Performance Considerations
When determining how to instrument your application, you should bear in mind that monitoring
performance counters and raising events absorbs resources from the system being monitored.
As a general rule, you should ensure that monitoring does not consume more than 10 percent of
the available resources on the host.
Health States
The overall health of an application or system is determined by the health of the managed
entities that make up the application. A managed entity is typically considered to be in any one
of three health states:
• RED. This corresponds to a failed state.
• YELLOW. This corresponds to a less than fully operational state.
• GREEN. This corresponds to normal operation within expected performance
boundaries.
In some cases, it is considered beneficial to differentiate between a failed state and an offline
state. In this case, the failed state is represented by RED and the offline state is represented by
BLACK.
Information about the health state of managed entities can be manually gathered by operators
or by management tools that allow operators to do the following:
• Detect a problem.
• Verify that the problem still exists.
• Diagnose the cause(s) of the problem.
• Resolve the problem.
• Verify that the problem was resolved.
• Build the correct application structure, which is made up of components derived from
appropriately predefined components (base classes as defined in the Common Model
Library [CML]) and the relationships between them.
• Build a hierarchy of managed entities that represent the logical services and objects the
application exposes—in a way IT professionals can understand.
• Identify the functional aspects for each managed entity that are of interest for
monitoring. For more information about aspects, see the definition of a managed entity
in Chapter 3 of this guide.
• Identify all the health states that are possible for the application.
• Identify the verification steps that need to be taken to confirm or refute whether an
aspect is in a particular health state.
• Provide the instrumentation required to detect each health state.
• Identify the diagnostic steps needed to determine the root causes for each aspect's
health state.
• Identify the recovery steps that need to be taken to resolve each root cause and return
an aspect and its parent managed entity to full health.
Figure 1
Dependencies in the example application
The following table indicates the health states of the low-level entities illustrated in Figure 1.
Entity State Description and effect
The customer Web service and products Web service both have dependencies on these low-
level entities and a corresponding dependency on the health state, as shown in the following
table.
Aggregate Aspects
Different audiences or consumers of an application or service require different views of the
health of a managed entity. An aggregate aspect provides a higher-level view of a health state
by aggregating health state information from different aspects (and potentially other aggregate
aspects). A common scenario for using aggregate aspects is when you need to represent the
health state for a particular functional area of an application at the managed entity level and a
managed entity has multiple instances (multi-instance managed entity). Figure 2 illustrates a
case where health state can be aggregated at different levels.
Figure 2
Health state of an aggregate aspect
In this case, Web service A is a parent of multiple instances of Web service B (residing on a Web
farm). Web service B has an aspect named connectivity, which corresponds to connectivity to a
database. An administrator wanting to monitor Web service A looks at the connectivity aspect,
which turns yellow if 50 percent of the instances of Web service B have no connectivity and red
if 75 percent of the instances of Web service B have no connectivity.
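The aggregation rule in this example can be sketched as a small Python function. The thresholds come from the scenario above; the function itself is illustrative and not part of any monitoring API.

```python
def aggregate_connectivity(instance_states):
    """Roll the per-instance connectivity states of Web service B up
    into a single aggregate aspect for Web service A.

    instance_states: list of per-instance states, "GREEN" or "RED".
    Returns "RED" when 75 percent or more of the instances have no
    connectivity, "YELLOW" at 50 percent or more, otherwise "GREEN".
    """
    if not instance_states:
        return "GREEN"
    failed = sum(1 for s in instance_states if s == "RED") / len(instance_states)
    if failed >= 0.75:
        return "RED"
    if failed >= 0.50:
        return "YELLOW"
    return "GREEN"
```

Whether the thresholds are inclusive or exclusive is a design decision for the health model; the sketch treats them as inclusive.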
In some cases, a RED (failure) for one aspect may cause its managed entity, and perhaps even
the entire application, to fail. In other cases, a RED (failure) for one aspect may cause only
degraded performance of one managed entity (YELLOW). There are no definite rules to help you
decide, because each application is unique and the effects of each component and managed
entity will vary. However, there are two general rules:
• If any child of a parent is RED, the parent should be RED or YELLOW. Otherwise,
operators viewing only roll-up indicators will not realize that there is a failure
somewhere in the application.
• If a managed entity is vital for operation of the application, a RED state must cause all
parents and ancestors to be RED, indicating at the top level of the monitoring tree that
the application has failed.
Exactly how roll-ups are used in determining health will depend on the technology used. For
example, System Center Operations Manager (SCOM) 2007 uses roll-ups differently from the
MMD tool.
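The two general roll-up rules can be encoded as a small function applied to each parent/child pair. This is an illustrative sketch only; real monitoring tools such as SCOM implement their own roll-up policies.

```python
def roll_up(parent_state, child_state, child_is_vital=False):
    """Apply the two general roll-up rules to one parent/child pair."""
    if child_state != "RED":
        return parent_state
    if child_is_vital:
        # Rule 2: a vital child in RED forces the parent (and, applied
        # recursively up the tree, every ancestor) to RED.
        return "RED"
    # Rule 1: any child in RED means the parent must be at least YELLOW,
    # so operators viewing only roll-up indicators still see the failure.
    return "YELLOW" if parent_state == "GREEN" else parent_state
```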
Figure 3
Troubleshooting workflow
Detection
A monitoring agent defines how the health states of a particular aspect can be detected.
Typically, there are multiple ways to detect a problem with a managed entity. To detect a
problem, a monitoring agent can do the following:
• Listen for events related to the health of the managed entity.
• Poll and compare performance counters against the specified thresholds as the basis to
detect a problem.
• Scan trace logs for information used to detect a problem.
• Use health indicators, such as heartbeats or synthetic transactions, to determine health.
For more details about the specific health indicators you can use, see Chapter 6,
"Specifying Infrastructure Requirements."
The instrumentation items listed within a single detector are linked by the OR logic operator—
that is, the application enters the health state if any of the items is detected. If multiple
detectors are present, they are linked by the AND operator, and all the conditions need to be
detected simultaneously for the application to enter the health state.
A detector can also be given a NOT flag, in which case the health state is signaled by the
absence of the items listed within the monitoring agent within a particular timeframe.
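The OR, AND, and NOT combination logic described above can be expressed compactly. The following sketch is hypothetical and simply encodes those rules.

```python
def detector_fires(item_results, negate=False):
    """A single detector: its instrumentation items combine with OR.

    With the NOT flag set, the detector fires only when none of its
    items were observed within the timeframe.
    """
    fired = any(item_results)
    return (not fired) if negate else fired


def health_state_entered(detectors):
    """Multiple detectors combine with AND: every detector must fire
    simultaneously for the application to enter the health state.

    detectors: list of (item_results, negate) pairs.
    """
    return all(detector_fires(items, negate) for items, negate in detectors)
```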
When the defined problem signatures are detected, a problem associated with the operational
condition and health state is indicated. Until the condition is verified, the associated health state
is not updated, and diagnosis and recovery steps should not be attempted.
Verification
After a problem is detected, it is often necessary to verify that it actually still exists. This step is
critical to make sure the problem was not simply a surge in resource consumption, spike in
workload, or simply a transient issue that has gone away. Verification is basic confirmation that
the application is in a particular operational condition without trying to diagnose why or to
recover from it.
The logic that verifies whether an aspect is in a red or yellow operational condition should be
factored into a separate external verifier that simply returns which of the three possible
conditions is in effect at the time. Verifiers should not attempt any kind of diagnosis, because
they need to be lightweight; their job is only to confirm whether or not the loss of
functionality (such as "Queue Latency Critical" or "Can't Print") is still observed. Building the
verifier as an external script or executable file allows the same piece of code to be used both
for the initial verification and for re-verification after recovery.
Diagnostics
After a negative health state has been detected within an aspect and confirmed to still exist, it
may be necessary to perform diagnosis to determine the root cause of the problem so the
appropriate recovery actions can be taken. Wherever possible, you should try to have
instrumentation that is specific enough to lead directly to resolution, thereby avoiding this
step. Even if you do not know the exact root cause of a problem, there is usually a good
indication of where to start diagnosis based on the context of how the problem was detected.
In many cases, further analysis is required during diagnosis. For example, it may be known that
there is a network connectivity problem of some kind because of an error code that was
returned to the application. However, until it has been determined that the IP address lease
from the DHCP server was lost, the steps needed to fix it (attempting to renew the lease) are not
clear. Additional trace logs may have to be examined, correlation of information from other
events may have to be done, or even querying the live run-time state may be necessary to
determine the true root cause of a problem.
The diagnostics step uses all forms of available instrumentation, such as events, performance
counters, WMI providers, and activity traces, to correlate information and determine the root
cause. The diagnostics step can take a long time and further disrupt service while it is
happening. It can be necessary to inspect a much broader set of internal state parameters and
correlate between applications to perform the diagnostics step.
The diagnosis step captures the step-by-step instructions of what someone needs to do to
diagnose the root cause of a problem, and may also include script or code to automate this
diagnosis. It can be thought of as a function that takes as input a general high-level indication
of what is causing a particular aspect's health state and returns a specific root cause that can
then be used to take the appropriate recovery steps. The event or performance counter that
leads to the detection of the health state will usually indicate where to start diagnosis for the
problem.
Resolution
After the root cause is identified, the next step is to attempt to resolve the problem. This
process can involve reconfiguration of the application, restarting a service, manipulating internal
state by calling some management API, or performing some other administrative task.
Resolution may also be in the form of a code or a script that will attempt to automate the
resolution steps. It can also reference the GUID of another "blame" managed entity that is
failing to provide the needed services that will become the new starting point for diagnosis.
Re-verification
The same verification procedure that was used to verify the existence of the operational
condition is used to re-verify that the operational condition has indeed been corrected. When
the issue is successfully resolved, the verifier confirms that the operational condition no
longer exists.
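The detect, verify, diagnose, resolve, and re-verify steps form a workflow that can be sketched as a single pass. The callables here are placeholders for whatever scripts or tools implement each step; nothing in this sketch is a real monitoring API.

```python
def troubleshoot(detect, verify, diagnose, resolve):
    """Run one pass of the troubleshooting workflow.

    detect()            -> True if a problem signature was observed
    verify()            -> True if the operational condition still exists
    diagnose()          -> a root-cause identifier
    resolve(root_cause) -> attempts the recovery steps
    Returns True when no unresolved condition remains.
    """
    if not detect():
        return True            # no problem signature observed
    if not verify():
        return True            # transient issue; the condition has gone away
    root_cause = diagnose()
    resolve(root_cause)
    return not verify()        # re-verify with the same verification procedure
```

Note that verification runs twice with the same verifier, once to confirm the detected problem and once to confirm the resolution, which is why the guide recommends a lightweight external verifier.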
Figure 4 illustrates the structure and content sections of typical health models for the System,
Application, and Business Operations categories.
Figure 4
The structure and content sections of a typical health model
As shown in Figure 4, each section contains the following:
• Requirement. This includes a series of rule definitions in appropriate terms for the
section in which it resides—for example, a rule in the Business Operations section that
all orders marked as "urgent" should be processed and completed within four hours.
• Detection Information. This includes a series of rules or functions that implement the
detection information. The rules or functions indicate the health state or condition of
the application (such as "offline" or "failed"), the criticality (RED, YELLOW, or GREEN),
the alerts to send to the operator and monitoring system, and a series of indicators that
define the Health and Diagnostics Workflow sections to which this rule applies. There
may also be a contingency plan that describes workaround procedures while awaiting
rectification of the fault.
• Health and Diagnostics Workflow. This describes the steps to verify the fault, diagnose
the causes, resolve the problem, and re-verify the solution afterward.
When mapping the health model to application instrumentation, keep the following points in
mind:
• The health model may define requirements for which measurement is possible only
through using indicators built into the operating system or some underlying application
code, such as one of the Enterprise Library application blocks. In this case, you would
take advantage of the indicators provided by the operating system or underlying
application instead of creating a specific indicator. However, the rule is still part of the
health model for the application you are designing.
• A state transition in specific components may reflect one of a set of different underlying
problems. Correct implementation of the instrumentation will include indicators for
each detectable condition, such as failure to open a connection, failure to update data,
or failure to commit a transaction. Each indicator will return a RED, YELLOW, or GREEN
status, allowing operators to see if the failure is because of, for example, an incorrect
connection string, incorrect permissions within the database, or failure of another of
the series of data access operations.
• Some failures may have more than one cause but only one effect that is detectable
within the application. For example, failure to access a database may be the result of a
network failure, an incorrect connection string, or a database server failure. However,
indicators for this aspect within the application will probably not be able to detect the
actual cause and will just return RED (failed). To diagnose this failure requires other
indicators within the database system, which are not part of the health model for this
application.
A state transition defined in the health model will not always map directly to a single indicator
status change; in some cases, it will reflect the overall results from a combination of settings.
Figure 5
Rolling up managed entities from a Web server farm
The example in Figure 5 illustrates how you can circumvent some of the issues you may
encounter when mapping a health model to the managed entities it describes. To create an
indicator for the overall state of this section of the application in terms of the number of servers
online (if this was a requirement of the health model), you would have to use a probe, such as a
ping request to each server through its IP address, to detect individual server failures.
The alternative of rolling up the individual aspects from the servers through a rule in the health
model makes more sense and reduces the impact on the servers. In the monitoring application,
you would implement the health model rule using the tools that monitor the event logs of the
individual servers and a roll-up rule or combining rule that produces the overall indication based
on the definition in your health model of the minimum number of servers to be online.
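A combining rule of this kind, deriving the overall indication from the number of servers online against the minimum defined in the health model, might look like the following sketch. The threshold behavior is an assumption for illustration, not taken from any specific tool.

```python
def farm_state(servers_online, min_online):
    """Overall indication for the Web farm based on the health model's
    definition of the minimum number of servers that must be online."""
    if servers_online == 0:
        return "RED"            # no servers responding: the farm has failed
    if servers_online < min_online:
        return "YELLOW"         # running, but below the required minimum
    return "GREEN"
```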
The solutions architect has also defined the following abstract measures for the application:
• PickupServiceConfirmPickup
• ConfirmShippingService
• ConfirmShippingServiceResponse
• DelayedShippingService
• DelayedShippingServiceResponse
• ShippingRequestPerSecond
• TransportServiceOrderTransport
• TransportServiceOrderTransportResponse
Health Model
The solutions architect has defined the following operational requirements for the application:
• The Transport Order Web service must be available at all times.
• The Transport Order application must be available at all times.
• The Warehouse Management application must have an availability exceeding 90
percent.
• The Shipping Service must be available at all times. However, if the Transport Order
Web service is not available, it will store transport requests until it can pass them to the
Transport Order Web service and must store them for a maximum of two hours.
The solutions architect will use the health model for the application to help determine whether
these requirements are being met. The solutions architect does not consider it necessary to
differentiate between offline and failed health states, so he defines the following three health
states for each managed entity:
• Green. This indicates the managed entity is working normally.
• Yellow. This indicates the functionality of the managed entity is degraded.
• Red. This indicates the managed entity is either unavailable or offline.
The solutions architect has defined the following aspects for each managed entity:
• Connectivity
• Data access
Other documentation that references Northern Electronics differentiates between a failed and
offline state, so it uses the four health states: Green, Yellow, Red, and Black.
The Transport Order Web service depends on the Transport Order application and the
Warehouse Management application. This means that the health of the Transport Order Web
service is affected by the health of these other managed entities and their corresponding
aspects. Figure 6 illustrates how problems with the Transport Order Web service and
Warehouse Management applications affect the health of the Transport Order application and
the Shipping Service.
Figure 6
Rolling up health states
In this case, the connectivity aspect of the Transport Order Web service is RED, indicating a
failure. This means that the Transport Order application is also RED because it cannot process
requests, even though its own Data Access aspect is GREEN. The Warehouse Management
application is GREEN because its Data Access aspect has only just transitioned to YELLOW, and
this situation is within the operating parameter defined in the health model (90 percent
availability).
The health model also defines the contingency situation for the Shipping Service in that it will
store requests for the Transport Order application for a maximum of two hours. Assuming that
this period has not yet passed, the Shipping Service is YELLOW, indicating a pending problem but
not a failure.
Summary
This chapter has described how to create effective management models that capture the
knowledge about an application. In cases where all the knowledge cannot be captured, it is still
effective to use a management model to capture health and instrumentation information, which
are critical to designing manageable applications.
Chapter 5
Proven Practices for Application
Instrumentation
Software instrumentation provides information about executing applications. This information
can be used for a number of purposes, such as troubleshooting, capacity planning, business
monitoring, optimizing development, and security auditing. In this guide, instrumentation is
created to support the management of software applications by operations staff, where their
primary concern is the health of the application—for example, the response time for specific
operations, the availability of key resources, or the status of integration points. This chapter
provides a number of proven practices for the architecture and design of software
instrumentation in general, but it focuses on those aspects of instrumentation that help
operations staff determine application health.
Each event can be assigned one of the following granularity levels:
• Coarse. This level indicates the event is raised during all operations.
• Fine. This level indicates the event is raised during diagnostic and debug operations.
• Debug. This level indicates the event is raised only during debug operations.
• Off. This level indicates the event is not raised at all.
Do not use the granularity level for anything other than verbosity control. Configuration levels
should be inclusive; if you require different behavior, you should use different events.
After levels are defined for each event, an overall instrumentation level can be specified for
each managed entity in the configuration file for that managed entity. Whether a particular
event is raised is dependent on comparing these two values. For example, an architect could
specify that if an event is specified as fine, and the overall instrumentation level for the
managed entity is specified in configuration as coarse, the event will not be raised. However, in
this case, if the overall instrumentation level in the configuration file is changed to fine or
debug, the event will be raised.
The information in the following table shows in more detail how these rules are used.
Event level    Overall instrumentation level    Event raised
Coarse         Coarse                           Yes
Coarse         Fine                             Yes
Coarse         Debug                            Yes
Coarse         Off                              No
Fine           Coarse                           No
Fine           Fine                             Yes
Fine           Debug                            Yes
Fine           Off                              No
Debug          Coarse                           No
Debug          Fine                             No
Debug          Debug                            Yes
Debug          Off                              No
Off            Coarse                           No
Off            Fine                             No
Off            Debug                            No
Off            Off                              No
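The comparison behind these rules can be sketched in a few lines. The ordering of levels follows the definitions above; the function name is illustrative.

```python
# Granularity levels ordered from least to most verbose.
LEVELS = ["Off", "Coarse", "Fine", "Debug"]


def should_raise(event_level, overall_level):
    """An event is raised only when the overall instrumentation level
    configured for the managed entity is at least as verbose as the
    event's own level, and neither level is Off."""
    if event_level == "Off" or overall_level == "Off":
        return False
    return LEVELS.index(overall_level) >= LEVELS.index(event_level)
```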
By specifying an overall instrumentation level in the configuration file for each managed entity,
it is possible to change the instrumentation level at run time in response to a change in
circumstances. For example, if a stop event is raised during coarse grain monitoring, the level of
instrumentation may be changed to fine, so that more information about the application can be
gained.
You may require additional configurability of instrumentation at run time. As a minimum, you
should be able to turn on or turn off instrumentation at run time. However, in many cases, in
addition to granularity control, you will also need to define other settings, such as designating a
remote source for logging.
The default overall instrumentation level will normally be set to Coarse in production
environments to maximize application performance.
The next sections describe each of these design practices in more detail.
Windows Vista supports event forwarding, which allows the application to write events locally
and then have the events forwarded to a centralized location automatically.
Summary
This chapter has examined proven practices that can be used for instrumentation when
architecting, designing, and deploying applications. You should use this chapter in conjunction
with Chapter 2, "Architecting Manageable Applications," to ensure that you create instrumented
applications that support the overall goal of designing manageable applications.
Chapter 6
Specifying Infrastructure Trust Levels
It is important to understand the infrastructure to which an application is going to be deployed
when the application is first developed. This allows the solutions architect to ensure that the
application will work as expected in the target environment; it also allows the application to
take full advantage of the existing infrastructure.
If different deployment infrastructures are not considered at design time, the application will
not function as expected in some cases; this results in increased costs from change requests at
staging or deployment time. However, it is not always possible to determine the exact nature of
the deployment infrastructure at design time. In many cases, the full details of each datacenter
are not known. Even when the deployment infrastructure is well known at design time, there
may still be a requirement to support multiple infrastructure environments.
In some cases, changes will occur to the target infrastructure during application development.
The deployment environment may also change during deployment or after deployment in a way
that affects the application. If these changes are communicated back to the application
architect, appropriate changes can be made to the application design.
Understanding the target infrastructure is particularly important when designing manageable
applications. The environment into which an application is deployed can have a significant effect
on the instrumentation technologies that can be used in that application. For example, if an
application cannot be installed or run with administrator privileges, that application cannot
write to an event log.
The developer architect must ensure that the application uses instrumentation that will work in
the target environment. In cases where the target environment is not known, the architect will
typically need to support multiple forms of instrumentation, and allow the specific technologies
to be configured at deployment time or run time.
You should design applications to execute properly in the lowest possible trust environment.
The trust environment used leads to design decisions that a developer architect must make
early in the development life cycle.
This chapter describes different infrastructure model scenarios, and then it examines tools that
can be used to create an infrastructure model. The chapter then describes how the Team
System Management Model Designer Power Tool (TSMMD) can be used to capture information
about an infrastructure pertinent to manageable applications.
The trust levels define the following access:
• LowTrust. Writable access to isolated storage, the local event log, and the Web.
• MedTrust. Writable access to isolated storage, the local event log, the Web, the file
system, and the system registry.
• Everything. Unrestricted access to performance counters and the event log, message
queues, and the service controller, in addition to the file system, system registry, and
the Web.
Several of these capabilities have some overlap. For example, creating a performance counter
requires administrator privileges because it essentially writes to a secure part of the system
registry.
Figure 1 illustrates the .NET Framework Configuration tool that can be used to specify the
permission set used for an application.
Figure 1
.NET Framework Configuration tool
The Team System Management Model Designer Power Tool (TSMMD) does not attempt to
enforce selections made when defining trust levels or the instrumentation technologies that are
available within the trust level. This is because each deployment infrastructure is potentially
unique.
This means the TSMMD is flexible in that it may be adapted to a wide range of infrastructure
definitions represented as trust levels. However, with this flexibility comes the opportunity to
make mistakes when defining the available instrumentation technologies for a target
deployment environment.
The next sections describe each of these types of tools in more detail.
Standalone Tools
Standalone tools, such as Microsoft Visio, can be used to create a comprehensive model of each
infrastructure that should be supported by an application. In its simplest form, the model would
consist of a named instance of each infrastructure element, with a number of properties
defined. This model can be modified over time as the infrastructure changes.
In some cases, a standalone tool may be designed to export information directly to a
development environment, allowing the developer architect to directly use the information in
his or her design. In other cases, the architect would simply read the model and use it to make
decisions on the design of the application.
Integrated Tools
Integrated tools, such as the TSMMD, are used to make the infrastructure model part of the
overall design of the application. This allows the developer architect to directly model elements
of the infrastructure at design time, and it ensures that the infrastructure considerations are not
missed when the application is developed. The TSMMD also allows the instrumentation
technologies for each infrastructure scenario to be automatically generated. The next section
provides more details about the TSMMD.
• Enterprise Library Logging. Enterprise Library, from the Microsoft patterns & practices
division, contains the Logging Application Block that allows developers to perform a
wide range of logging tasks using a standardized and easy-to-use interface. The TSMMD
integrates with the Logging Application Block to allow architects to model and specify
events that the Logging Application Block will handle, and which it will write to the
configured target. By default, the TSMMD configures the Logging Application Block to
write events to the Windows Event Log, but administrators can change the
configuration to send events to any target medium supported by the Logging
Application Block (such as email, database, MSMQ, or text files).
• Windows Event Logging. The Windows Event Log service enables an application to
publish, access, and process events. Events are stored in event logs, which can be
routinely checked by an administrator or monitoring tool to detect occurrences of
problems on a computer. The Windows Event Log SDK allows users to query for events
in an event log, receive event data as events occur (subscribe to events), create event
data and raise events (publish events), and display event data in a readable format
(render events). For more information about Windows Event Logging, see Chapter 9,
"Event Log Instrumentation."
• Windows Eventing 6.0 events. The Windows Eventing 6.0 service added to Windows
Vista and Windows Server 2008 extends the capabilities of the event logging system
while still providing familiar access to Windows Event Logs. Applications can publish,
access, and process events, and administrators or monitoring tools can use the logs to
detect occurrences of problems on a computer. For more information about Windows
Eventing 6.0 Event Logging, see Chapter 11, "Windows Eventing 6.0 Instrumentation."
• Event Tracing for Windows. Event Tracing for Windows (ETW) provides application
programmers the ability to start and stop event tracing sessions, instrument an
application to provide trace events, and consume trace events. Trace events contain an
event header and provider-defined data that describes the current state of an
application or operation. You can use the events to debug an application and perform
capacity and performance analysis.
• WMI events. Windows Management Instrumentation (WMI) is the instrumentation
standard used by management applications such as Microsoft Operations Manager
(MOM), Microsoft Application Center, and many third-party management tools. The
Windows operating system is instrumented with WMI, but developers who want their
own products to work with management tools must provide instrumentation in their
own code. WMI in the .NET Framework is built on the original WMI technology and
allows the same development of applications and providers with the advantages of
programming in the .NET Framework. For more information about WMI events, see
Chapter 10, "WMI Instrumentation."
• Performance counters. Windows collects performance data on various system
resources using performance counters. Windows contains a pre-defined set of
performance counters with which you can interact; you can also create additional
performance counters relevant to your application. For more information about how to
programmatically create performance counters and how to read performance counters,
see Chapter 12, "Performance Counter Instrumentation."
To support this scenario, the solutions architect decides to define two target environments:
• Low trust will be used when a managed entity is writing to a trace file. Low trust will be
defined as a target environment for the instrumentation of all the Web service managed
entities.
• High trust will be used when the managed entity is writing to the event log and creating
performance counters. High trust will be defined as a target environment for the
instrumentation of both the Web service and the workstation application managed
entities.
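Deferring the choice of instrumentation technology to deployment or run time can be sketched as a lookup keyed by the configured target environment. The environment names mirror the scenario above; the mapping and the fallback behavior are assumptions for illustration.

```python
# Hypothetical mapping from target environment to permitted technologies.
TARGET_ENVIRONMENTS = {
    "LowTrust": ["trace file"],
    "HighTrust": ["event log", "performance counters"],
}


def instrumentation_for(environment):
    """Return the instrumentation technologies permitted in the
    configured target environment, falling back to the most
    restrictive set when the environment is unknown."""
    return TARGET_ENVIRONMENTS.get(environment, ["trace file"])
```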
Summary
This chapter discussed the use of infrastructure trust levels, which are used to support different
target deployment environments. If an architect can define different infrastructure models that
the application supports, the decision about which instrumentation technologies to use can be
deferred until the application is deployed or run. This helps to ensure that the application will
function as expected in the target environment, without requiring changes to the underlying
application code.
Chapter 7
Specifying a Management Model
Using the TSMMD Tool
Creating a management model for your application can be somewhat challenging. To simplify
the process, this guide includes a tool, known as the Team System Management Model Designer
Power Tool (TSMMD), which allows you to graphically model an operations view of the
application. You can use the tool to apply instrumentation and some basic health artifacts to
this model.
This chapter describes the requirements for the TSMMD, and then it demonstrates how to use it
to create a management model for your application.
You must install both the C# and C++ languages when you install Visual Studio 2008.
The TSMMD requires C++ to generate instrumentation for Windows Eventing 6.0
events.
To obtain the Team System Management Model Designer Power Tool, visit the Design For
Operations community Web site at http://www.codeplex.com/dfo/.
Creating a Management Model
The following are the high-level steps to creating a management model with the TSMMD tool:
1. Create a TSMMD file.
2. Graphically model an operations view of the application.
3. Define Target Environments for the application.
4. Define instrumentation for the application.
5. Create health definitions for the application.
6. Validate the model.
If you cannot see the Management Model Explorer window, click the View menu,
point to Other Windows, and click Management Model Explorer.
3. Ensure that the guidance packages for the TSMMD are loaded. To do this, click
Guidance Package Manager on the Visual Studio Tools menu. If the list of recipes in the
Guidance Package Manager dialog box does not contain any entries that apply to Team
System Management Model, follow these steps to enable the recipes:
◦ Click the Enable/Disable Packages button.
◦ Select the two guidance packages named Team System MMD Instrumentation
and Team System MMD Management Pack Generation.
If you do not see the two guidance packages in the list, you may need to reinstall the
TSMMD guidance package.
4. In Management Model Explorer, select the top-level item named Operations. In the
Visual Studio Properties window, enter values for the Description, Knowledgebase,
Name, and Version. If you cannot see the Properties window, press F4.
5. Enter values for the Description, Knowledgebase, Name, and Version in the controls of
the wizard, and then click Next.
6. In Management Model Explorer, expand the Target Environments node and select the
target environment named Default. Change the values of properties to indicate
instrumentation technologies you want to use in the default target environment.
7. Right-click the top-level model entry in Management Model Explorer and click Add
New Target Environment if you want to add more target environments, setting the
appropriate instrumentation technology check boxes for each one.
You use the properties of a target environment to specify that you require any
combination of Enterprise Library Logging events, Windows Event Log events, trace file
events, Windows Eventing 6.0 events, Windows Management Instrumentation (WMI)
events, and Windows performance counters for that target environment. You can also
add more than one target environment to a model to describe different deployment
scenarios.
8. On the File menu, click Save All to save the entire solution.
The following procedures detail how to model these artifacts in the designer.
To create managed entities using the Wizard:
1. Right-click the designer surface of the management model diagram or right-click the
top-level item in Management Model Explorer, and then click New Managed Entity
Wizard.
2. Enter the required information into the pages of the wizard. The wizard allows you to
specify the name, type, description, discovery type and target, and enable model
extenders for the new managed entity.
Each managed entity must have a name that is unique from other managed entities and
external managed entities in the management model. Validation code checks for this and
prompts you with a dialog box if two entities are identically named.
For a local managed entity (an entity from the list above that is part of the application, but
excluding the External Managed Entity), you can specify values for the properties shown in the
following table.
Discovery Target This property, in conjunction with the Discovery Type, defines the way that the
monitoring system will locate the entity to check whether it exists on a monitored
server; in other words, whether this part of the application is deployed on that server.
Discovery Type This property defines where the monitoring system should look for the Discovery
Target value. Depending on the type of entity, the options are FilePath,
RegistryValue, ServiceName, and IISApplicationName.
Executable Application
The Executable Application entity represents a Windows Forms application, a console
application, or any other type of application that is not a Windows Service or an ASP.NET based
application or service. You can specify a Discovery Type of either FilePath or RegistryValue for
this type of managed entity. There are no additional properties for an Executable Application
entity.
Windows Service
The Windows Service entity represents a Windows service that (usually) has no runtime user
interface. You can specify a Discovery Type of only ServiceName for this type of managed entity.
The Windows Service entity has one additional property shown in the following table.
Windows Service Extension Enabled This Boolean property specifies whether the process that
generates management packs will add a specific extender monitor to the
management pack that checks the status of a Windows service by querying
WMI at timed intervals. The monitor will raise an alert if the service is
configured to start automatically and is not currently running. The monitor
will not raise an alert if the service is disabled, if it is configured to start
manually and is not running, or when it is stopped.
ASP.NET Application
The ASP.NET Application entity represents an ASP.NET application that runs on an Internet
Information Services (IIS) Web server. You can specify a Discovery Type of only
IISApplicationName for this type of managed entity. The ASP.NET Application entity has the
additional properties shown in the following table.
ASP.NET Extension Enabled This Boolean property specifies if the process that generates
management packs will add specific extender monitors to the
management pack. The default setting is False. When set to True, the
following properties in this table specify the parameters for the
extender monitors.
Exception Error Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Critical (RED) for
exceptions generated by the entity within a time specified by the
Exception Sample Time Interval property. The default value is 50.
Exception Sample Time Interval This property defines the duration in seconds over which the extender
monitor counts exceptions occurring in the entity, and compares this
figure with the Exception Error Threshold and Exception Warning
Threshold values. The default value is 30 seconds.
Exception Warning Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Warning (YELLOW)
for exceptions generated by the entity within a time specified by the
Exception Sample Time Interval property. The default value is 30.
Performance Error Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Critical (RED) for
degraded performance measures incurred by the entity within a time
specified by the Performance Sample Time Interval property. The
default value is 50.
Performance Sample Time Interval This property defines the duration in seconds over which the
extender monitor counts degraded performance measures incurred by
the entity, and compares this figure with the Performance Error
Threshold and Performance Warning Threshold values. The default
value is 30 seconds.
Performance Warning Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Warning (YELLOW)
for degraded performance measures incurred by the entity within a
time specified by the Performance Sample Time Interval property. The
default value is 30.
Response Time (ms) This property defines the maximum time within which the application
must respond to the request. The default value is 5000 (5 seconds).
These extended properties allow you to specify the behavior of the application in terms of the
intrinsic performance and internal errors that it generates. This is useful for monitoring and
reporting scenarios that ensure the application meets business requirements and Service Level
Agreements (SLAs).
ASP.NET Web Service
The ASP.NET Web Service entity has the additional properties shown in the following table.
ASP.NET Extension Enabled This Boolean property specifies if the process that generates
management packs will add specific extender monitors to the
management pack. The default setting is False. When set to True, the
following properties in this table specify the parameters for the
extender monitors.
Exception Error Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Critical (RED) for
exceptions generated by the entity within a time specified by the
Exception Sample Time Interval property. The default value is 50.
Exception Sample Time Interval This property defines the duration in seconds over which the extender
monitor counts exceptions occurring in the entity, and compares this
figure with the Exception Error Threshold and Exception Warning
Threshold values. The default value is 30 seconds.
Exception Warning Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Warning (YELLOW)
for exceptions generated by the entity within a time specified by the
Exception Sample Time Interval property. The default value is 30.
Performance Error Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Critical (RED) for
degraded performance measures incurred by the entity within a time
specified by the Performance Sample Time Interval property. The
default value is 50.
Performance Sample Time Interval This property defines the duration in seconds over which the
extender monitor counts degraded performance measures incurred by
the entity, and compares this figure with the Performance Error
Threshold and Performance Warning Threshold values. The default
value is 30 seconds.
Performance Warning Threshold This property defines the threshold value at which point the extender
monitor will change the health state of the entity to Warning (YELLOW)
for degraded performance measures incurred by the entity within a
time specified by the Performance Sample Time Interval property. The
default value is 30.
Response Time (ms) This property defines the maximum time within which the service must
respond to the request. The default value is 5000 (5 seconds).
These extended properties allow you to specify the behavior of the service in terms of the
intrinsic performance and internal errors that it generates. This is useful for monitoring and
reporting scenarios that ensure the application meets business requirements and Service Level
Agreements (SLAs).
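As a rough sketch, the threshold logic described in the tables above maps an exception (or degraded-performance) count for one sample interval to a health state. The following Python sketch uses the default threshold values from the tables and assumes the comparisons are inclusive; the exact comparison used by the generated monitors is not specified in this guide:

```python
def health_state(count_in_interval: int,
                 warning_threshold: int = 30,
                 error_threshold: int = 50) -> str:
    """Map a per-interval exception count to a health state.

    Defaults mirror the tables above: 30 events in one sample interval
    turn the entity Yellow (Warning), 50 turn it Red (Critical).
    """
    if count_in_interval >= error_threshold:
        return "Red"
    if count_in_interval >= warning_threshold:
        return "Yellow"
    return "Green"

assert health_state(10) == "Green"
assert health_state(30) == "Yellow"
assert health_state(50) == "Red"
```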
You can use External Managed Entities to model services that your application consumes, but
which are not part of your model. You can also use External Managed Entities to split a large
model into smaller models. In this case, the External Managed Entity simply represents the
section that does not appear in the current diagram. It is important to avoid repeating
Managed Entities in more than one section of a management model.
However, you can add only one management model to a solution, and so—in this release—the
management model equates to the solution. If you need to divide your application into
multiple management models, you must create multiple solutions.
Everything outside of the model is classed as external. It is likely that external entities such as
databases, Web sites, and Web services will already be instrumented and managed by other
tools, such as existing management packs (for example, the System Center Operations
Manager pack for SQL Server).
For instrumentation helpers to be generated for managed entities, you must define target
environments and associate them with those managed entities. Validation code checks to
ensure that you have at least one target environment defined for each managed entity, and it
displays a warning if not.
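The validation rule can be pictured as follows. This Python sketch assumes a simplified model shape (an entity name mapped to a list of target environment names) rather than the real TSMMD object model:

```python
def validate_target_environments(model: dict) -> list:
    """Return a warning for each managed entity that has no associated
    target environment, since no instrumentation helpers can be
    generated for such an entity."""
    warnings = []
    for entity, environments in model.items():
        if not environments:
            warnings.append(
                f"Managed entity '{entity}' has no target environment; "
                "no instrumentation helpers will be generated for it.")
    return warnings

# One entity is associated with the Default environment, one is not:
model = {"OrderService": ["Default"], "ShippingWebSite": []}
assert len(validate_target_environments(model)) == 1
```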
If you decide to manually create the instrumentation for your model by adding each abstract
and concrete implementation individually, you will perform the following steps:
1. Specify the abstract instrumentation (events and measures) for each entity.
2. Specify the implementations of these events and measures for each target environment
for each entity.
3. Map the event and measure implementations to each aspect in the health model.
This section describes how to create both forms of instrumentation (abstract and implemented)
individually, and how to map one to the other. It consists of the following procedures:
• Modeling abstract events
• Modeling abstract event parameters
• Modeling Enterprise Library logging events
• Modeling event log events
An abstract event has two properties that you can set, as shown in the following table.
Instrumentation Level This property specifies the level at which the entity will raise the event. The options
are Coarse (all operations, the default), Fine (diagnostic and debug operations
only), and Debug (debug operations only). For information about how this setting
affects the behavior of an application, see Appendix A.
Name This property contains the name of the abstract event definition.
In addition, you will define one or more parameters for each abstract event. For each
parameter, architects set the three properties shown in the following table.
Name This property contains the name of the event parameter.
Index This property is an integer value that specifies which placeholder in the
message template the value of the parameter will replace.
Type The data type of the parameter. The available types are DateTime, Double,
Int32, Int64, and String (the default).
Event parameter names should use title-style capitalization (the first letter must be
capitalized). Validation code checks for this and displays an error message if this is not the
case. Also, if multiple event parameters are used, they should be numbered in increasing order
from 1, with no duplicates and no missing integers. Again, validation code checks for this.
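These naming and numbering rules are straightforward to check. The following Python sketch illustrates the checks; it is not the TSMMD's actual validation code:

```python
def validate_event_parameters(params: list) -> list:
    """Validate abstract event parameters as described above.

    `params` is a list of (name, index) pairs. Names must start with a
    capital letter; indexes must run 1..n with no gaps or duplicates.
    """
    errors = []
    for name, index in params:
        if not name or not name[0].isupper():
            errors.append(f"Parameter name '{name}' must start "
                          "with a capital letter.")
    indexes = sorted(index for _, index in params)
    if indexes != list(range(1, len(params) + 1)):
        errors.append("Parameter indexes must run consecutively from 1 "
                      "with no duplicates or missing integers.")
    return errors

assert validate_event_parameters([("OrderId", 1), ("CustomerName", 2)]) == []
# Lower-case name and a gap in the numbering both produce errors:
assert len(validate_event_parameters([("orderId", 1), ("CustomerName", 3)])) == 2
```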
An abstract measure has two properties that you can set, as shown in the following table.
Abstract measure property Description
Instrumentation Level This property specifies the level at which the entity will update the counter. The
options are Coarse (all operations, the default), Fine (diagnostic and debug
operations only), and Debug (debug operations only). For information about how
setting this affects the behavior of an application, see Appendix A.
Name This property contains the name of the abstract measure definition.
The following table shows the properties of an Enterprise Library Event implementation.
Categories This property specifies a list of Categories that allows you to filter logging
events using a Category Filter in the Enterprise Library Logging Application
Block configuration. Separate each category name with a carriage return.
Event ID This property specifies the identifier for the event, and should be different from
any existing events.
Message This property specifies the text that Enterprise Library Logging Application
Block will include in the log message it generates.
Name This property contains the name of the Enterprise Library Logging Event
implementation. The name should start with a capital letter, and can contain
only alphanumeric characters (letters and numbers) and underscores.
Priority This property specifies the priority of the event using a positive or negative
numeric value. The priority allows you to filter logging events using a Priority
Filter in the Enterprise Library Logging Application Block configuration.
Severity This property specifies the severity of the event. You can select
Critical, Error (these are equivalent to Windows Event
Log Error events), Information, Resume, Start, Stop, Suspend, Transfer,
Verbose (these are equivalent to Windows Event Log Information events), or
Warning (equivalent to Windows Event Log Warning events).
Title This property specifies the text that Enterprise Library Logging Application
Block will use as the title of the log message it generates.
The following table shows the properties of an Event Log Event implementation.
Category This property contains a value list that allows you to filter individual events.
Event ID This property specifies the identifier for the event, and should be different from any existing
events.
Log This property specifies the target Windows Event Log name such as Application, or the
name of a custom Event Log.
Name This property contains the name of the Event Log Event implementation. The name should
start with a capital letter, and can contain only alphanumeric characters (letters and
numbers) and underscores.
Severity This property specifies the severity of the error, which sets the type of icon shown in
Windows Event Log and is useful for filtering events in a monitoring tool. The options
available are Error, Warning, Information, SuccessAudit, and FailureAudit.
Source This property contains the name to pass to the event system as the source of the error or
event.
Message Template This property is a template containing placeholders where the event system will
insert the values from event parameters when raising the event. If the abstract
event defines any parameters, you must include placeholders for the value of each
parameter. The placeholders must start with %1 and run consecutively to the
number of parameters defined for the event.
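The %1-style placeholder convention can be illustrated with a small substitution routine. The placeholder syntax follows the description above; the expansion code itself is only a sketch, not how the event system performs the substitution:

```python
import re

def expand_template(template: str, values: list) -> str:
    """Replace %1, %2, ... placeholders with the corresponding event
    parameter values (%1 takes the first value, and so on)."""
    def replace(match: re.Match) -> str:
        index = int(match.group(1))
        return str(values[index - 1])
    return re.sub(r"%(\d+)", replace, template)

assert expand_template("Order %1 failed after %2 retries", ["A42", 3]) \
       == "Order A42 failed after 3 retries"
```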
The following table shows the properties of a Windows Eventing 6.0 Event implementation.
Channel This property specifies the channel to use to deliver the event. The channels
you can use are Operational, TraceClassic, System, Application, Security,
Analytic, and Debug. Generally, you should use the three channels that target
the Event Log. These are Application, System, and Security.
Level This property specifies the severity or importance of the event. The values you
can select are Error, Critical, Warning, Informational, and Verbose. The
usual approach is to select Error for events that cause a transition to a Red
(failed) state, Warning for events that cause a transition to a Yellow (degraded)
state, and Information for events that cause a transition to a Green (working
normally) state.
Message Template This property is a template containing placeholders where the event system
will insert the values from event parameters when raising the event. If the
abstract event defines any parameters, you must include placeholders for the
value of each parameter. The placeholders must start with %1 and run
consecutively to the number of parameters defined for the event.
Name This property contains the name of the Windows Eventing 6.0 Event
implementation. The name should start with a capital letter, and can contain
only alphanumeric characters (letters and numbers) and underscores.
Operation This property indicates the type of low-level operation the application was
executing when the event occurred. The values you can choose are Info,
Start, Stop, DC_Start, DC_Stop, Extension, Reply, Resume, Suspend, and
Send.
Provider This property contains the value passed to the event system to indicate the
provider, and provides an indication to administrators and operators of the
source of the event. The default value is a combination of the name of the
model and the name of the current managed entity.
Value This property is a unique identifier for the event, and should therefore be
different from any other events so that the monitoring system can filter on this
value.
The Trace File Entry implementation has a single property, Name, which contains the name of
the Trace File Entry implementation. The following table shows the properties of a WMI Event
implementation.
Name This property contains the name of the WMI Event implementation.
Namespace This property contains the WMI namespace within which the event will reside.
The following table shows the properties of a Performance Counter implementation.
Counter Category Name This optional property contains the category name of the Windows Performance
Counter that supplies the values for this measure.
Counter Object Name This property contains the name of the Windows Performance Counter that
supplies the values for this measure. It must start with a capital letter.
Counter Type This property specifies the type of counter to use in terms of the way that it
aggregates or measures the target object, such as AverageBase or
ElapsedTime.
Name This property contains the name of the Performance Counter implementation.
Now that you have modeled concrete events and performance counters, you can map them to
the trust levels associated with the managed entity.
Discovering Existing Instrumentation in an Application
The Team System Management Model Designer Power Tool can discover instrumentation in
assemblies that are part of an existing application solution. The process will discover most
common instances of Windows Event Log Events, WMI Events, Enterprise Library Logging
entries, and Windows Performance Counters. The assemblies must reside in one or more
projects located in the Solution Items folder of the Management Model solution. The TSMMD
will compile the projects automatically when required.
To discover existing instrumentation:
1. Open the TSMMD solution that contains the application project(s) from which you want
to discover existing instrumentation. If you have not yet created a TSMMD solution
containing the application project(s), do the following:
a. Create a new TSMMD solution by following the steps in the topic "Creating a
New Management Model."
b. In Solution Explorer, right-click the Solution Items folder, point to Add, and then
click Existing Project.
c. Navigate to the existing project, and then click Open to add it to the TSMMD
solution.
d. Repeat steps b and c to add any other required projects.
2. Ensure that the TSMMD guidance package is enabled:
a. On the Tools menu, click Guidance Package Manager.
b. In the Guidance Package Manager dialog box, click the Enable/Disable
Packages button.
c. In the Enable and Disable Packages dialog box, select the TSMMD
Instrumentation and TSMMD Management Pack Generation check boxes.
d. In the Enable and Disable Packages dialog box, click OK, and then click Close in
the Guidance Package Manager dialog box.
3. Open an existing .tsmmd management model file in the designer, and then open
Management Model Explorer. If you cannot see Management Model Explorer, on the
View menu, point to Other Windows, and then click Management Model Explorer.
4. Right-click the top-level node in Management Model Explorer, and then click Discover
Instrumentation.
5. The Discover Instrumentation Wizard opens, showing a list of all assemblies in all
projects with a check box next to each one. The check boxes for assemblies that will be
searched are already set. You can change the settings to add or remove individual
assemblies from the discovery process as required.
6. Select the type of instrumentation you want to discover in the Instrumentation Type
option list under the list of assemblies. You can select Event Log Event, WMI Event,
Performance Counter Measure, or Enterprise Library Logging, depending on whether
the assemblies you select contain instances of these types of instrumentation. Figure 2
shows the Discover Instrumentation Wizard.
Figure 2
The Discover Instrumentation Wizard
7. Click the Discover button. The Discovery Results window opens in Visual Studio showing
a list of all the discovered instrumentation. Figure 2 shows the Discovery Results
window after discovering Event Log Events instrumentation.
After you discover the instrumentation within one or more projects, you must map that
instrumentation to the appropriate managed entities in the management model. The following
procedure describes this process.
To map discovered instrumentation to a model:
1. Perform the steps of the previous procedure to generate a list of discovered
instrumentation using the TSMMD Discover Instrumentation recipe.
2. Locate the rows containing the instrumentation you want to import. You can filter the
list of instrumentation rows using the drop-down lists at the top of some of the columns
to help locate rows, and then click a column heading to sort the rows based on the
values in that column.
3. If you are not sure of the actual implementation of an instrumentation item, such as an
event or performance counter, right-click that item in the list of rows, and then click
one of the Go To options. For example, with Enterprise Library Logging, you can go to
the source code line that makes the call into the Logging Application Block or go to the
line that writes the logging entry.
4. Some of the instrumentation rows may contain one or more values that the discovery
process could not resolve. It marks these values as <Not Resolved>. Some of the
unresolved values may be optional (such as the instance name of some performance
counters), while others are mandatory. You must provide these values as part of the
mapping process.
5. Select the rows in the Discovery Results window that contain the discovered
instrumentation items you want to import into your management model. You can hold
down CTRL or SHIFT while clicking the list to select multiple items.
6. Now you can specify the mapping between the selected instrumentation items in the
Discovery Results window and the management model entities. To map one or more
instrumentation items to a specific managed entity, right-click the selected item rows,
click Quick Map, and then click the name of the entity. If none of the rows contains
unresolved mandatory items, you will see the managed entity name appear in the
Mapped To column.
7. If any row contains an unresolved mandatory item, you will see a dialog box that asks if
you want to resolve mandatory properties. Click Yes to display a dialog box where you
can provide values to override those in all the selected rows in the discovered
instrumentation list. For example, Figure 3 shows the Event Details dialog box, where
you specify the mandatory Source, Severity, and Log Name properties for an Enterprise
Library Logging event.
Figure 3
The Event Details dialog box for specifying unresolved mandatory instrumentation
properties
8. Alternatively, you can force the TSMMD to display the mapping details window;
perhaps because you want to change some values for the properties of the discovered
instrumentation or there are unresolved mandatory properties for which you know you
must provide values. In these cases, right-click the selected rows in the Discovery
Results window, click Map to open the mapping details window and enter the relevant
values, and then click OK.
9. The TSMMD adds the instrumentation to the Discovered Instrumentation section of
the selected management entity in the Management Model Explorer. Open the
Discovered Instrumentation section in Management Model Explorer to see the result,
to rename events or measures, and to make any remaining edits you require to the
properties.
The following tables describe the properties that you can set or edit for discovered
instrumentation. The Events section can contain definitions of Event Log Events and WMI
Events. For an existing or imported Event Log Event, the architect defines or edits the properties
shown in the following table.
Description This property contains a description of the existing Event Log Event.
Event ID This property specifies the identifier for the event, and should be different from any
existing events.
IsDiscovered This Boolean property indicates if the Event Log Event was discovered by the
TSMMD or entered manually into the model.
Log This property specifies the target Windows Event Log name such as Application,
or the name of a custom Event Log.
Message This property contains the error message for this event.
Name This property contains the name of the existing Event Log Event.
Severity This property specifies the severity of the error, which sets the type of icon shown in
Windows Event Log and is useful for filtering events in a monitoring tool. The
options available are Error, Warning, Information, SuccessAudit, and
FailureAudit.
Source This property contains the name to pass to the event system as the source of the
error or event.
For an existing or imported WMI Event, the architect defines or edits the properties shown in
the following table.
Existing WMI Event property Description
IsDiscovered This Boolean property indicates if the WMI Event was discovered by the TSMMD or
entered manually into the model.
Name This property contains the name of the existing WMI Event.
Namespace This property contains the WMI namespace within which the event will reside.
The Measures section can contain only definitions of Performance Counters. For an existing or
imported Performance Counter, the architect defines or edits the properties shown in the
following table.
Existing Performance Counter property Description
Counter Category Name This optional property contains the category name of the Windows Performance
Counter that supplies the values for this measure.
Counter Instance Name This property contains the instance name of the Windows Performance Counter
that supplies the values for this measure.
Counter Object Name This property contains the name of the Windows Performance Counter that
supplies the values for this measure. It must start with a capital letter.
Counter Object Type This property specifies the type of counter to use in terms of the way that it
aggregates or measures the target object, such as AverageBase or ElapsedTime.
IsDiscovered This Boolean property indicates if the Performance Counter was discovered by the
TSMMD or entered manually into the model.
Name This property contains the name of the existing Performance Counter.
Visible Name This property indicates the name of the counter as seen by the operating system.
You cannot mix events and measures in an aspect. All the states you define for an
aspect must be either events or measures (performance counters).
9. If you added a new Event Formula, use the Event property to specify the event that will
act as the indicator for this state transition. You can select an abstract event that you
previously defined in the Management Instrumentation section of this entity.
Alternatively, you can select an event discovered by the TSMMD or defined in the
Discovered Instrumentation section.
10. If you added a new Measure Formula, use the Measure Formula property to specify the
measure that will act as the indicator for this state transition. You can select an abstract
measure that you previously defined in the Management Instrumentation section of
this entity. Alternatively, you can select a performance counter discovered by the
TSMMD or defined in the Discovered Instrumentation section.
11. For a Measure Formula, you must also specify the conditions that trigger a state
transition. Select the Measure Formula node and specify values for the Upper Bound
and Lower Bound properties.
12. Repeat steps 5 through 11 to specify the yellow and red states for the aspect. In
addition to the mandatory green health state, you can specify either or both of the
yellow and red health states for a managed entity.
13. Repeat steps 2 through 12 to add any other aspects you require to the managed entity.
14. Repeat the complete procedure to add aspects to all other managed entities in the
model.
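Conceptually, the Measure Formulas defined in steps 10 and 11 partition the range of a measure into health states. The following Python sketch illustrates the idea; whether the real bounds are inclusive is an assumption, and the state ranges shown are invented for the example:

```python
def evaluate_measure_formula(value: float, states: dict) -> str:
    """Return the health state whose [Lower Bound, Upper Bound] range
    contains the current measure value."""
    for state, (lower, upper) in states.items():
        if lower <= value <= upper:
            return state
    return "Unknown"

# Hypothetical bounds for a measure such as queue length:
states = {"Green": (0, 69), "Yellow": (70, 89), "Red": (90, 100)}
assert evaluate_measure_formula(50, states) == "Green"
assert evaluate_measure_formula(75, states) == "Yellow"
assert evaluate_measure_formula(95, states) == "Red"
```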
• Entry points to the model from other managed entities not represented in the model
should be shown as unmanaged entities.
• If multiple models are used to represent a system, each managed entity should only be
represented in one model; this managed entity can be represented as an external
managed entity in other models.
Figure 2
The Transport Consolidation Solution
Figure 3
The Shipping Solution
Summary
This chapter described how to use the TSMMD tool to create a management model, and it
provided guidelines for effective use of the TSMMD tool. It also showed how the TSMMD tool
was used to model the solutions in the Northern Electronics Scenario.
Section 3
Developing for Operations
This section focuses on the developer tasks necessary for creating well-instrumented
manageable applications. It describes how to create reusable instrumentation helpers from the
model defined in the Team System Management Model Designer Power Tool (TSMMD) and
discusses the instrumentation artifacts that are generated. It examines the developer tasks that
are necessary to create and manage event log, Windows Management Instrumentation (WMI),
Eventing 6.0, and performance counter instrumentation. The section also includes a chapter
about building install packages for instrumentation; however, this chapter is not complete in the
preliminary version of this guide.
This section should be of use primarily to application and instrumentation developers.
Chapter 8, "Creating Reusable Instrumentation Helpers"
Chapter 9, "Event Log Instrumentation"
Chapter 10, "WMI Instrumentation"
Chapter 11, "Windows Eventing 6.0 Instrumentation"
Chapter 12, "Performance Counters Instrumentation"
Chapter 13, "Building Install Packages"
Chapter 8
Creating Reusable Instrumentation
Helpers
After the architect defines the management model for the application, it is up to the developer
to write instrumentation code that reflects the management model. It is recommended that you
isolate instrumentation in an instrumentation helper. This chapter describes how to use the
guidance automation supplied with the Team System Management Model Designer Power Tool
(TSMMD) to automatically create the instrumentation helper, and it includes details about the
artifacts that are created. It then discusses how to consume the instrumentation from an
application.
The guidance automation included with the TSMMD tool simplifies the process of creating
instrumentation helper artifacts. However, you can use the information contained in this
chapter to manually create your own instrumentation helper classes.
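The abstract API class and its concrete per-environment implementation follow a standard abstract-base-class pattern. The following Python sketch illustrates the shape of such a helper; the entity name, event name, and in-memory "logging" are all hypothetical, and the C# helpers the TSMMD generates differ in detail:

```python
from abc import ABC, abstractmethod

class ShippingServiceAPI(ABC):
    """Abstract instrumentation helper: one method per abstract event
    defined for the (hypothetical) ShippingService managed entity."""

    @abstractmethod
    def raise_order_rejected(self, order_id: str) -> None:
        """Raise the OrderRejected abstract event."""

class ShippingServiceDefaultEnvironment(ShippingServiceAPI):
    """Concrete helper for one target environment. A real implementation
    would write to the Event Log, WMI, or the Enterprise Library Logging
    Application Block; this sketch just records the entries in memory."""

    def __init__(self) -> None:
        self.entries: list = []

    def raise_order_rejected(self, order_id: str) -> None:
        self.entries.append(f"OrderRejected: {order_id}")

# The application codes against the abstract API only, so swapping the
# target environment never changes the calling code:
helper = ShippingServiceDefaultEnvironment()
helper.raise_order_rejected("A42")
assert helper.entries == ["OrderRejected: A42"]
```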
Figure 1
Instrumentation Helper code generated by the TSMMD guidance automation
The next sections describe each of these types of projects in more detail.
API Projects
One API project is created for each managed entity. Each of these projects contains an abstract
class. The abstract class is a helper class that defines the following:
The guidance automation in the TSMMD names the API projects ManagedEntityNameAPI.
Implementation Projects
Implementation Projects
One implementation project is created for each managed entity's trust level. Each of these
projects contains one class as the concrete implementation of the API class previously
described. This concrete helper class extends the API class and implements the abstract event
methods by using the instrumentation technologies defined in the management model.
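The split between API and implementation projects can be sketched as follows. This is a hand-written illustration, not actual TSMMD output; the entity name ShippingService and the event method are hypothetical.

```csharp
using System;

// Abstract API class: one per managed entity (hypothetical entity name).
public abstract class ShippingServiceAPI
{
    // One abstract method per event defined in the management model.
    public abstract void RaiseOrderFailedEvent(string orderId);
}

// Concrete implementation class: one per trust level.
public class ShippingServiceImplementation : ShippingServiceAPI
{
    public override void RaiseOrderFailedEvent(string orderId)
    {
        // A real implementation would call into a technology project here
        // (event log, WMI, or performance counters). The console is used
        // only to keep this sketch self-contained.
        Console.WriteLine("OrderFailed: " + orderId);
    }
}
```

Application code programs against the abstract API class, so the instrumentation technology can change without modifying the application code itself.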
Technology Projects
One technology project is created for each technology used. Exactly what each technology
project contains depends on the technology. This section describes the three technologies
currently represented in the TSMMD tool: event logs, Windows Management Instrumentation
(WMI) events, and performance counters.
For more information about how the event logs, WMI events, and performance counters are
used, see Chapters 9, 10, and 11 of this guide.
There is no technology project for Enterprise Library Logging events. The TSMMD generates
the code required to create logging entries within the API helper classes.
The guidance automation provided with this guide names the event log project
EventLogEventsInstaller.
The guidance automation provided with this guide names the Windows Eventing 6.0 project
WindowsEventing6EventsInstaller.
The TSMMD can create a Windows Eventing 6.0 View file that administrators can use to create
a custom view in Windows Event Log in Windows Vista and Windows Server 2008 to view
events generated by a TSMMD-based application.
WMI Project
A WMI project contains the following:
• One class for each WMI event defined across managed entities.
• One WmiEventsInstaller class.
The guidance automation provided with this guide names the WMI project
WmiEventsInstaller.
The guidance automation provided with this guide names the performance counter project
PerformanceCountersInstaller.
The results of the validation check appear in the Output window. Figure 2 shows a case where
helper methods are not called from the application.
Figure 2
Error list generated when Verify Instrumentation Coverage runs
You can use the validation check to provide a checklist of tasks when instrumenting your
application. The TSMMD can verify coverage for applications written in Visual Basic and C#. If
you create your application using any other language, the TSMMD will not be able to locate
calls to the instrumentation, and will report an error.
An additional limitation in this release is that the TSMMD cannot discover instrumentation
calls made from an ASP.NET Web application written in Visual Basic.
Summary
This chapter described how to generate instrumentation helper classes for an application, and
how to call the instrumentation from application code. By starting with a management model
defined in the TSMMD, you can automatically create the instrumentation code you require, and
then call the abstract event methods from your application code. The instrumentation helpers
ensure that the correct instrumentation technologies are used.
Chapter 9
Event Log Instrumentation
In Windows, an event is defined as any significant occurrence—whether in the operating system
or in an application—that requires users to be notified. Critical events are sent to the user in the
form of an immediate message on the screen. Other event notifications are written to one of
several event logs that record the information for future reference.
Event logging in Microsoft Windows provides a standard, centralized way for you to have your
applications record important software and hardware events. Operations staff can access events
written to the event logs using the Event Viewer and use them to diagnose application
problems.
This chapter focuses on the eventing mechanism used in versions of Windows up to and
including Windows Server 2003. Windows Vista and later versions of Windows use a different
eventing mechanism, Eventing 6.0. For information about Eventing 6.0, see Chapter 11 of this
guide.
In addition to the default Windows logs, such as the Application, System, and Security logs,
other programs, such as Active Directory, may create their own default logs. You can also create
your own custom logs for use with your own applications.
This chapter demonstrates how developers can create event log events in code and ensure that
they are written to the appropriate event log. Where appropriate, code examples reflect the
code used in the Northern Electronics Transport Consolidation Solution.
Not all of the event log instrumentation code described in this chapter is implemented in the
instrumentation helpers generated by the TSMMD tool. For example, no code is generated to
clear existing event logs or to delete event logs. However, it is still included in this chapter
because it may be required.
Installing Event Log Functionality
Before you can write event log entries, you must specify settings for the event log in the
Windows registry. These changes require administrative rights over the local computer, so they
should usually be performed when the application is installed instead of at run time. This section
describes how to use the EventLogInstaller class to install event log functionality for your
application.
Event Sources
One of the primary responsibilities of the EventLogInstaller class is to create an event source for
the application. Event sources are used to uniquely identify a source of events in the event log.
They are defined in the registry under
HKLM\System\CurrentControlSet\Services\EventLog\EventLogName.
Typically, an event source will be named after the application or managed entity that the event
arose from. Figure 1 shows an event in Event Viewer, with the source value for the event
highlighted.
Figure 1
Event log entry with the event source highlighted
By default, an event source for an application is defined in the Windows Application log.
However, it is possible to specify different logs, including custom event logs. For more
information, see "Using Custom Event Logs" later in this chapter.
The EventLogInstaller class can install event logs only on the local computer.
It is common for the source to be the name of the application or another identifying string. Any
attempt to create a duplicate Source value will result in an exception. However, a single event
log can be associated with multiple sources.
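As a minimal sketch of this registration, the following code checks for and creates an event source programmatically. The source name is illustrative; creating a source requires administrative rights and, as discussed above, should normally happen at install time rather than at run time.

```csharp
using System.Diagnostics;

// "NorthernShipping" is an illustrative source name.
if (!EventLog.SourceExists("NorthernShipping"))
{
    // Registers the source against the Application log in the registry.
    EventLog.CreateEventSource("NorthernShipping", "Application");
}
```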
Using the EventLogInstaller class
To install an event log, you should create a project installer class that inherits from Installer and
set the RunInstallerAttribute for the class to true. Within your project, create an
EventLogInstaller instance for each event source and add the instance to your project installer
class.
When the install utility is called, it looks at the RunInstallerAttribute. If this attribute is set to
true, the utility installs all the items in the Installers collection associated with your project
installer. If RunInstallerAttribute is false, the utility ignores the project installer.
You can modify the properties of an EventLogInstaller instance either before or after adding the
instance to the Installers collection of your project installer. You must set the Source property if
your application will write to the event log.
If the specified source already exists when you set the Source property, EventLogInstaller
deletes the previous source and recreates it, assigning the source to the log you specify in the
Log property.
Typically, you would set the following additional properties:
• Log. This property is the event log that events will be written to. If it is not set, the
event source is registered to the Application log.
• UninstallAction. This property gets or sets a value that indicates whether the installer
tool (Installutil.exe) should remove the event log or leave it in its installed state at
uninstall time.
• CategoryResourceFile. This property identifies a category resource file, which is used to
write events with localized category strings. It should only be used if you are creating
events with categories.
• CategoryCount. This property sets (and gets) the number of categories in the category
resource file. It should only be used if you are creating events with categories.
• ParameterResourceFile. This property gets or sets the path of the resource file that
contains message parameter strings for the source. It is used when you want to
configure an event log source to write localized event messages with inserted
parameter strings.
• MessageResourceFile. This gets or sets the path of the resource file that contains
message formatting strings for the source. It is used when you want to configure an
event log source to write localized event messages.
These last four properties in the preceding list provide a lot of flexibility in creating events that
are useful for manageability purposes. By using message resource files, categories, and inserting
parameters, you can create messages with more useful information, and manageability
applications can perform automated processes based on particular parameters. For more
information about how these properties are used, see "Writing Events to an Event Log" later in
this chapter.
Typically, you should not call the methods of the EventLogInstaller class from within your code;
they are generally called only by the InstallUtil.exe installation utility. The utility automatically
calls the Install method during the installation process. It backs out failures, if necessary, by
calling the Rollback method for the object that generated the exception.
The following code example shows a project installer class that creates an EventLogInstaller
instance and adds it to the Installers collection. The source name shown is illustrative.
C#
using System;
using System.ComponentModel;
using System.Configuration.Install;
using System.Diagnostics;

namespace EventLogEvents.InstrumentationTechnology
{
    [RunInstaller(true)]
    public class EventLogEventsInstaller : Installer
    {
        // constructor
        public EventLogEventsInstaller()
        {
            EventLogInstaller eventLogInstaller = new EventLogInstaller();
            // Illustrative source name; the source must be unique.
            eventLogInstaller.Source = "NorthernElectronicsShipping";
            eventLogInstaller.Log = "Application";
            eventLogInstaller.UninstallAction = UninstallAction.Remove;
            this.Installers.Add(eventLogInstaller);
        }
    }
}
The following code example shows how to write an event with an array of replacement strings
by using the WriteEvent method. The event instance identifier is illustrative, and eventLog is an
EventLog instance whose Source property has already been set.
C#
EventInstance eventInstance = new EventInstance(1001, 0,
    EventLogEntryType.Information);
object[] values = { "FirstInsertionString", "SecondInsertionString" };
eventLog.WriteEvent(eventInstance, values);
Set values to a null reference if the event message does not contain formatting placeholders
for replacement strings.
You can specify binary data with an event when it is necessary to provide additional details for
the event. For example, use the data parameter to include information about a specific error.
The Event Viewer does not interpret the associated event data; it displays the data in a
combined hexadecimal and text format. You should use event-specific data sparingly; include it
only if you are sure it will be useful. You can also use event-specific data to store information the
application can process independently of the Event Viewer.
The specified source must be registered for an event log before using WriteEvent. The specified
source must be configured for writing localized entries to the log; the source must at minimum
have a message resource file defined.
If your application writes entries using both resource identifiers and string values, you must
register two separate sources. For example, configure one source with resource files, and then
use that source in the WriteEvent method to write entries using resource identifiers to the
event log. Then create a different source without resource files, and use that source in the
WriteEntry method to write strings directly to the event log using that source.
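The second case, writing a direct string through a source registered without resource files, can be sketched as follows. The source name is illustrative and must already be registered.

```csharp
using System.Diagnostics;

// "MyAppStringSource" is an illustrative source registered without
// resource files; WriteEntry writes the string directly to the log.
EventLog log = new EventLog("Application");
log.Source = "MyAppStringSource";
log.WriteEntry("Order 42 could not be shipped.", EventLogEntryType.Error);
```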
Reading events from event logs is not included in the functionality of the instrumentation
helper classes automatically generated by the TSMMD tool.
You should treat the data from an event log as you would any other input coming from outside
your system. Your application may need to validate the data in the event log before using it as
input. Another process, possibly a malicious one, may have accessed the event log and added
entries.
• Log. This property indicates the log with which you want to interact.
• MachineName. This property indicates the computer on which the log resides.
• Source. This property indicates the source string that will be used to identify your
component when it writes entries to a log. In this case, you are reading from a log, so
you do not need to specify this property.
To read from an event log, you must specify the Log and MachineName properties, so that the
component is aware of which log to read from. The following code shows the Log and
MachineName properties specified.
C#
eventLog.Source = source;
eventLog.Log = logName;
eventLog.MachineName = machineName;
The Entries collection is read-only, so it cannot be used to write to the event log.
The following example shows how to retrieve all of the entries from a log.
C#
foreach (System.Diagnostics.EventLogEntry entry in EventLog1.Entries)
{
Console.WriteLine(entry.Message);
}
If you ask for the count of entries in a new custom log that has not yet been written to, the
system returns the count of the entries in the Application log on that server. To avoid this
problem, make sure that logs you are counting have been created and written to.
Clearing Event Logs
Event logs are set to a maximum size that determines how many entries each log can contain.
When an event log is full, it either stops recording entries or begins overwriting the oldest
entries with new entries, depending on the settings specified in the Windows Event Viewer. In
either case, you can clear the log of its existing entries to free the log and allow it to start
recording events again. You must have Administrator rights to the computer on which the log
resides in order to clear entries.
Clearing event logs is not included in the functionality of the instrumentation helper
automatically generated by the TSMMD tool.
By default, the Application log, System log, and Security log are set to a default maximum size of
4992 K. Custom logs are set to a default maximum of 512 K.
You can also use the Windows Event Viewer to free up space on a log that has become full.
You can set the log to overwrite existing events, you can write log entries to an external file, or
you can increase the maximum size of the log. However, you cannot remove only some of the
entries in a log; when you clear a log, you remove all of its contents. For more information, see
"How to: Launch Event Viewer" on MSDN or your Event Viewer documentation.
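You can also adjust these limits programmatically through the EventLog class. The following is a minimal sketch; the log name is illustrative, and the calls require administrative rights.

```csharp
using System.Diagnostics;

// "MyCustomLog" is an illustrative log name.
EventLog log = new EventLog("MyCustomLog");
// Increase the maximum size of the log to 1 MB.
log.MaximumKilobytes = 1024;
// Overwrite entries older than seven days when the log becomes full.
log.ModifyOverflowPolicy(OverflowAction.OverwriteOlder, 7);
```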
You use the Clear method to clear the contents of an event log. The following code is used to
clear the events from EventLog1.
C#
EventLog1.Clear();
Deleting event logs is not included in the functionality of the instrumentation helper
automatically generated by the TSMMD tool.
To delete an event log, you should use the Delete method and specify the name of the log you
want to delete. The Delete method is static, so you do not need to create an instance of the
EventLog component before you call the method—instead, you can call the method on the
EventLog class itself, as shown in the following code.
C#
System.Diagnostics.EventLog.Delete ("MyCustomLog");
Re-creating an event log can be a difficult process. It is good practice to not delete any of the
system-created event logs, such as the Application log. You can delete your custom logs and
re-create them as needed.
The following code shows an example of verifying a source and deleting a log if the source
exists. This code assumes that an Imports or Using statement exists for the System.Diagnostics
namespace.
C#
if (System.Diagnostics.EventLog.Exists("MyCustomLog"))
{
System.Diagnostics.EventLog.Delete("MyCustomLog");
}
Removing event sources is not included in the functionality of the instrumentation helper
automatically generated by the TSMMD tool.
To remove an event source, you should call the DeleteEventSource method, specifying the
source name to remove. The following code shows an event source named MyApp1 being
removed from the local computer.
C#
System.Diagnostics.EventLog.DeleteEventSource("MyApp1");
The instrumentation helper automatically generated by the TSMMD tool does not create event
handlers.
To receive notifications when entries are written to an event log, perform the following steps:
1. Create an event handler for the EntryWritten event of an EventLog instance.
For more information about this syntax, see "Event Handlers in Visual Basic and Visual
C#" on MSDN at http://msdn2.microsoft.com/en-us/library/aa984105(VS.71).aspx.
2. Create the EntryWritten procedure and define the code you want to use to process the
entries.
3. Set the EnableRaisingEvents property to true.
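The EntryWritten pattern described above can be sketched as follows. The handler body is illustrative; it simply echoes the message of each new entry.

```csharp
using System;
using System.Diagnostics;

// Monitor the Application log on the local computer (".").
EventLog eventLog = new EventLog("Application", ".");

// Attach the EntryWritten procedure and enable event raising.
eventLog.EntryWritten += new EntryWrittenEventHandler(OnEntryWritten);
eventLog.EnableRaisingEvents = true;

// Called each time an entry is written to the monitored log.
static void OnEntryWritten(object sender, EntryWrittenEventArgs e)
{
    Console.WriteLine("New entry: " + e.Entry.Message);
}
```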
• Installing the custom log (only necessary if the log does not already exist)
• Writing events to the custom log
The following code example shows a project installer that registers an event source in a custom
event log. The source and log names are illustrative.
C#
namespace EventLogEvents.InstrumentationTechnology
{
    [RunInstaller(true)]
    public class EventLogEventsInstaller : Installer
    {
        // constructor
        public EventLogEventsInstaller()
        {
            EventLogInstaller eventLogInstaller = new EventLogInstaller();
            eventLogInstaller.Source = "NorthernElectronicsShipping";
            eventLogInstaller.Log = "MyCustomLog";
            this.Installers.Add(eventLogInstaller);
        }
    }
}
Summary
This chapter has demonstrated many of the developer tasks associated with event log
instrumentation. Many of these tasks are automated by the TSMMD tool. However, it is still
important for developers to understand the work performed by the TSMMD tool when
developing instrumented applications.
Chapter 10
WMI Instrumentation
Windows Management Instrumentation (WMI) is the Microsoft implementation of Web-based
Enterprise Management (WBEM), which is an industry initiative developed to standardize the
technology for managing enterprise computing environments. WMI uses classes based on the
Common Information Model (CIM) industry standard to represent systems, processes, networks,
devices, and other enterprise components.
WMI supplies a pre-installed class schema that allows scripts or applications written in scripting
languages, Visual Basic, or C++ to monitor and configure applications, system or network
components, and hardware in an enterprise. For example, instances of the Win32_Process class
represent all the processes on a computer, and the Win32_LogicalDisk class can represent any
disk devices. For more information, see "Win32 Classes" in the Windows Management
Instrumentation documentation in the MSDN Library at http://msdn.microsoft.com/library.
The WMI architecture consists of the following tiers:
• Client software components. These perform operations using WMI, such as reading
management details, configuring systems, and subscribing to events.
• Object manager. This is a broker between providers and clients that provides some key
services, such as standard event publication and subscription, event filtering, query
engine, and other services.
• Provider software components. These capture and return live data to the client
applications, process method invocations from the clients, and link the client to the
infrastructure being managed.
Not all the WMI instrumentation code described in this chapter is implemented in the
instrumentation helpers generated by the Team System Management Model Designer Power
Tool (TSMMD). However, it is still included in this chapter as it may be required.
Administrators can use WMI Control to specify security constraints for a specific namespace. For
more information, see "Locating the WMI Control" in the Windows Management
Instrumentation documentation in the MSDN Library at http://msdn.microsoft.com/library.
The WMI namespaces, such as root\cimv2 and root\default, should not be confused with the
.NET Framework namespaces System.Management and
System.Management.Instrumentation. The System.Management namespace contains the
classes used to perform WMI operations from the .NET Framework. The
System.Management.Instrumentation namespace contains the classes for adding
instrumentation to your application.
Administrators and IT developers can use the classes in System.Management to write
applications that access WMI data in any .NET Framework language, such as C#, Visual Basic
.NET, or J#. These applications can do the following:
• Enumerate or retrieve a collection of instance property data, such as the FreeSpace
property of all the instances of Win32_LogicalDisk on all the computers of a network.
For more information, see "Win32_LogicalDisk" in the Windows Management
Instrumentation documentation in the MSDN Library at
http://msdn.microsoft.com/library.
• Query for selected instance data. WMI in .NET Framework uses the original WMI WQL
query language, a subset of SQL. For more information on WQL, see "WQL query
language" in the Windows Management Instrumentation documentation in the MSDN
Library at http://msdn.microsoft.com/library.
• Subscribe to events, defined as instances of event classes. An event occurs when an
instrumented application (provider) creates an instance of one of its event classes.
As a convenience for developers at design time, the schema is automatically published the first
time an application raises an event or publishes an instance. This avoids having to declare a
project installer and running the InstallUtil.exe tool during rapid prototyping of an application.
However, this registration will succeed only if the user invoking it is a member of the Local
Administrators group, so you should not rely on this as a mechanism for publishing the
schema.
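As a sketch of raising a WMI event from an instrumented application, the following code uses the Instrumentation.Fire method; the event class name and its properties are hypothetical.

```csharp
using System.Management.Instrumentation;

// Hypothetical WMI event class; the schema is published on first use
// if the caller is a local administrator, otherwise it must be
// registered by the project installer.
[InstrumentationClass(InstrumentationType.Event)]
public class OrderProcessedEvent
{
    public string OrderId;
    public int DurationInMilliseconds;
}

// Raising the event delivers it to any subscribed WMI clients.
OrderProcessedEvent wmiEvent = new OrderProcessedEvent();
wmiEvent.OrderId = "42";
wmiEvent.DurationInMilliseconds = 125;
Instrumentation.Fire(wmiEvent);
```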
The event (or instance) class schema resides in the assembly and is registered in the WMI
repository during installation.
To publish a schema to WMI, you must first define an installer for the project. You can use the
ManagementInstaller class provided in the System.Management.Instrumentation namespace,
or derive your project installer from the DefaultManagementProjectInstaller class, which adds a
ManagementInstaller for you, as shown in the following code.
C#
[RunInstaller(true)]
public class WmiEventsInstaller : DefaultManagementProjectInstaller
{
    // constructor
    public WmiEventsInstaller()
    {
    }
}
Typically, you should not call the methods of the ManagementInstaller class from within your
code; they are generally called only by the InstallUtil.exe installation utility. The utility
automatically calls the Install method during the installation process. It backs out failures, if
necessary, by calling the Rollback method for the object that generated the exception.
If the currently registered schema becomes corrupted for any reason, there might be cases in
which re-running InstallUtil.exe will not detect the need to re-register the original schema. In
this case, it is possible to force the installer to re-install the schema using the /f or /force switch.
It is not always necessary to recompile the client application when the schema changes. If the
event schema has been changed by adding properties and methods, and none of the earlier
defined properties or methods were removed, you can move the application's instrumentation
to a different WMI namespace, and not recompile the client application.
The instrumentation helpers automatically generated by the TSMMD tool do not perform
queries for WMI data.
Creating targeted queries can noticeably increase the speed with which data is returned, and
make it easier to work with the returned data. Targeted queries can also cut down on the
amount of data that is returned, an important consideration for scripts that run over the
network.
The following code example shows how a query can be invoked using the
ManagementObjectSearcher class. In this case, the SelectQuery class is used to specify a
request for environment variables under the System user name. The query returns results in a
collection.
C#
using System;
using System.Management;

SelectQuery query = new SelectQuery("Win32_Environment",
    "UserName = '<SYSTEM>'");
ManagementObjectSearcher searcher = new ManagementObjectSearcher(query);
foreach (ManagementObject envVar in searcher.Get())
{
    Console.WriteLine("Variable: {0}", envVar["Name"]);
}
Summary
This chapter has demonstrated many of the developer tasks associated with WMI
instrumentation. Most of the developer tasks you will need to perform are automated by the
TSMMD tool. However, it is still important for developers to understand the work performed by
the TSMMD tool when developing applications with WMI instrumentation.
Chapter 11
Windows Eventing 6.0
Instrumentation
In versions of the Windows operating system earlier than Windows Vista, you would use either
Event Tracing for Windows (ETW) or the event logging API to log events. Windows Vista
introduces a new eventing model that unifies the ETW and Windows Event Log APIs.
The new model uses an XML manifest to define the events that you want to publish. Events can
be published to a channel or an ETW session. You can publish the events to the following types
of channels:
• Admin
• Operational
• Analytic
• Debug
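In the instrumentation manifest, each channel is declared in the provider's channels element. The following fragment is an illustrative sketch; the channel name, chid, and symbol are examples, not taken from this guide's sample manifest.

```xml
<channels>
  <channel name="Microsoft-Windows-SamplePublisher/Operational"
           chid="SampleOperational"
           symbol="SAMPLE_OPERATIONAL_CHANNEL"
           type="Operational"
           enabled="true"/>
</channels>
```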
This chapter provides an introduction to the Windows Eventing 6.0 mechanism, and describes
the tasks that must be performed when developing Eventing 6.0 instrumentation. For more
information about Windows Event Log, see "Windows Event Log" on MSDN
(http://msdn.microsoft.com/en-us/library/aa385780(VS.85).aspx).
Administrators can execute scripts against the Event Log if Windows PowerShell is installed.
Event Subscriptions
An instance of the Event Viewer enables administrators to view events on a single local or
remote computer. However, some troubleshooting scenarios may involve examining filtered
events stored in logs on multiple computers. The new Windows Eventing system includes the
ability to forward copies of events from multiple remote computers and collect them on a single
computer.
Administrators create event subscriptions to specify exactly which events will be collected, and
in which log they will be stored. Forwarded events can be viewed and manipulated like any
other local events. Event subscriptions require configuring the Windows Remote Management
(WinRM) service and the Windows Event Collector (Wecsvc) service on participating forwarding
and collecting computers.
There are restrictions on channel naming. Channel names can contain spaces, but a channel
name cannot be longer than 255 characters, and cannot contain '>', '<', '&', '"', '|', '\', ':', '`', '?',
'*', or characters with codes less than 31. Additionally, the name must follow the general
constraints on file and registry key names.
Direct Channel
You cannot subscribe to a direct channel, but you can query a direct channel. A direct channel is
performance-oriented. Events are not processed in any way by the eventing system. This allows
the direct channel to support high volumes of events. Direct channels have the following types:
• Analytic. Analytic events are published in high volume. They describe program
operation and indicate problems that cannot be handled by user intervention.
• Debug. Debug events are used solely by developers to diagnose a problem for
debugging.
The following XML example shows how to use substitution parameters in event messages. A
printer name value can be substituted into the message (as it is the first parameter) during an
event.
XML
Print Spooler has failed to connect to %1 printer.
All further print jobs to this printer will fail.
Ping the printer to check if it is online.
The following XML example shows an instrumentation manifest with each of the preceding
elements defined.
XML
<!-- <?xml version="1.0" encoding="UTF-16"?> -->
<instrumentationManifest
xmlns="http://schemas.microsoft.com/win/2004/08/events">
<instrumentation xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:win="http://manifests.microsoft.com/win/2004/08/windows/events">
<events xmlns="http://schemas.microsoft.com/win/2004/08/events">
<!--Publisher Info -->
<provider name="Microsoft-Windows-EventLogSamplePublisher"
guid="{1db28f2e-8f80-4027-8c5a-a11f7f10f62d}"
symbol="MICROSOFT_SAMPLE_PUBLISHER"
resourceFileName="C:\temp\Publisher.exe"
messageFileName="C:\temp\Publisher.exe">
<!-- channel, template, and event definitions omitted -->
</provider>
</events>
</instrumentation>
<localization>
<resources culture="en-US">
<stringTable>
<!--This is how event data can be used as part of Message String -->
<string id="Publisher.EventMessage"
value="Prop_UnicodeString=%1;%n
Prop_AnsiString=%2;%n
Prop_Int8=%3;%n
Prop_UInt8=%4;%n
Prop_Int16=%5;%n
Prop_UInt16=%6;%n
Prop_Int32=%7;%n
Prop_UInt32=%8;%n
Prop_Int64=%9;%n
Prop_UInt64=%10;%n
Prop_Float=%11;%n
Prop_Double=%12;%n
Prop_Boolean=%13;%n
Prop_GUID=%14;%n
Prop_Pointer=%15;%n
Prop_FILETIME=%16;%n
Prop_SYSTEMTIME=%17;%n
Prop_SID_Length=%18;%n
Prop_SID=%19;%n
Prop_Binary=%20"/>
</stringTable>
</resources>
</localization>
</instrumentationManifest>
You can also create event descriptions in multiple languages, by adding the localized strings to
the localization element of the instrumentation manifest.
An event publisher application uses these files along with the Windows Event Log API to publish
events to an event channel.
If MC.exe is used on the instrumentation manifest shown in the previous section, the following
Publisher.h file is generated.
C++
// publisher.h
#pragma once
__declspec(selectany) GUID MICROSOFT_SAMPLE_PUBLISHER = {0x1db28f2e, 0x8f80,
0x4027, {0x8c, 0x5a,0xa1,0x1f,0x7f,0x10,0xf6,0x2d}};
#define SAMPLE_PUBLISHER 0x10
__declspec(selectany) EVENT_DESCRIPTOR PROCESS_INFO_EVENT = {0x1, 0x0, 0x10,
0x4, 0x0, 0x0, 0x8000000000000000};
#define MSG_Publisher_EventMessage 0x00000000L
// end of publisher.h
The Publisher.h header file contains an EVENT_DESCRIPTOR variable definition that was defined
in the instrumentation manifest. This variable will be used in the EventWrite function call to
publish the event.
#include <windows.h>
#include <comdef.h>
#include <sddl.h>
#include <iostream>
#include <tchar.h>
#include <string>
#include <vector>
#include <evntprov.h> // ETW Publishing header
#pragma comment(lib, "advapi32.lib")
#include <winevt.h> // EventLog header
#pragma comment(lib, "wevtapi.lib")
#include "publisher.h" // Header generated by mc.exe
// from manifest (publisher.man)
using namespace std;
// Register a publisher
REGHANDLE hPublisher = NULL;
ULONG ulResult = EventRegister(
&MICROSOFT_SAMPLE_PUBLISHER, // provider guid
NULL, // callback; unused for now
NULL, // context
&hPublisher); // handle required to unregister
if ( ulResult != ERROR_SUCCESS)
{
wprintf(L"Publisher Registration Failed!. Error = 0x%x", ulResult);
return;
}
// EventData
std::vector<EVENT_DATA_DESCRIPTOR> EventDataDesc;
EVENT_DATA_DESCRIPTOR EvtData;
// inType="win:UnicodeString"
PWSTR pws = L"Sample Unicode string";
EventDataDescCreate(&EvtData, pws, ((ULONG)wcslen(pws)+1)*sizeof(WCHAR));
EventDataDesc.push_back( EvtData );
// inType="win:AnsiString"
CHAR * ps = "Sample ANSI string";
EventDataDescCreate(&EvtData, ps, ((ULONG)strlen(ps)+1)*sizeof(CHAR));
EventDataDesc.push_back( EvtData );
// inType="win:Int8"
INT8 i8 = 0x7F;
EventDataDescCreate(&EvtData, &i8, sizeof(i8));
EventDataDesc.push_back( EvtData );
// inType="win:UInt8"
UINT8 ui8 = 0xFF;
EventDataDescCreate(&EvtData, &ui8, sizeof(ui8));
EventDataDesc.push_back( EvtData );
// inType="win:Int16"
INT16 i16 = 0x7FFF;
EventDataDescCreate(&EvtData, &i16, sizeof(i16));
EventDataDesc.push_back( EvtData );
// inType="win:UInt16"
UINT16 ui16 = 0xFFFF;
EventDataDescCreate(&EvtData, &ui16, sizeof(ui16));
EventDataDesc.push_back( EvtData );
// inType="win:Int32"
INT32 i32 = 0x7FFFFFFF;
EventDataDescCreate(&EvtData, &i32, sizeof(i32));
EventDataDesc.push_back( EvtData );
// inType="win:UInt32"
UINT32 ui32 = 0xFFFFFFFF;
EventDataDescCreate(&EvtData, &ui32, sizeof(ui32));
EventDataDesc.push_back( EvtData );
// inType="win:Int64"
INT64 i64 = 0x7FFFFFFFFFFFFFFFi64;
EventDataDescCreate(&EvtData, &i64, sizeof(i64));
EventDataDesc.push_back( EvtData );
// inType="win:UInt64"
UINT64 ui64 = 0xFFFFFFFFFFFFFFFFui64;
EventDataDescCreate(&EvtData, &ui64, sizeof(ui64));
EventDataDesc.push_back( EvtData );
// inType="win:Float"
FLOAT f = -3.1415926e+23f;
EventDataDescCreate(&EvtData, &f, sizeof(f));
EventDataDesc.push_back( EvtData );
// inType="win:Double"
DOUBLE d = -2.7182818284590452353602874713527e-101;
EventDataDescCreate(&EvtData, &d, sizeof(d));
EventDataDesc.push_back( EvtData );
// inType="win:Boolean"
BOOL b = TRUE;
EventDataDescCreate(&EvtData, &b, sizeof(b));
EventDataDesc.push_back( EvtData );
// inType="win:GUID"
GUID guid;
EventDataDescCreate(&EvtData, &guid, sizeof(guid));
EventDataDesc.push_back( EvtData );
// inType="win:Pointer"
PVOID p = NULL;
EventDataDescCreate(&EvtData, &p, sizeof(p));
EventDataDesc.push_back( EvtData );
// inType="win:FILETIME"
SYSTEMTIME st;
FILETIME ft;
GetSystemTime(&st);
SystemTimeToFileTime(&st, &ft);
EventDataDescCreate(&EvtData, &ft, sizeof(ft));
EventDataDesc.push_back( EvtData );
// inType="win:SYSTEMTIME"
GetSystemTime(&st);
EventDataDescCreate(&EvtData, &st, sizeof(st));
EventDataDesc.push_back( EvtData );
// inType="win:SID"
PSID pSid = NULL;
ConvertStringSidToSidW(L"S-1-5-19", &pSid); // LocalService
// inType="win:Binary"
// Note: if you change the size of this array you'll have to change the
// "length" attribute in the manifest too.
BYTE ab[] = {0,1,2,3,4,5,4,3,2,1,0};
EventDataDescCreate(&EvtData, ab, sizeof(ab));
EventDataDesc.push_back( EvtData );
if ( EventEnabled(hPublisher, &PROCESS_INFO_EVENT) )
{
ulResult = EventWrite(hPublisher,
&PROCESS_INFO_EVENT,
(ULONG)EventDataDesc.size(),
&EventDataDesc[0]
);
if (ulResult != ERROR_SUCCESS)
{
//Get Extended Error Information
wprintf(L"EventWrite failed. Not able to fire event. Error = 0x%x",
ulResult);
LocalFree(pSid);
return;
}
}
else
{
wprintf(L"Event is disabled; skipping EventWrite.\n");
}
wprintf(L"Success\n");
LocalFree(pSid);
// end of publisher.cpp
The Publisher.cpp file is shown in the preceding example (it includes the generated Publisher.h).
The Publisher.res file is the resource file generated from the Publisher.rc file.
This command is usually limited to members of the Administrators group and must be run with
elevated privileges. As a result, this step will typically occur when the application is installed.
Consuming Event Log Events
Eventing 6.0 includes a number of mechanisms for consuming event log events, such as
querying, reading, and subscribing. This section outlines those mechanisms.
if ( queryResult == NULL )
return GetLastError();
EvtClose(batch[i]);
}
Subscribing to Events
Subscribing to events means receiving notifications when selected events are raised. To
select events for a subscription, an event query is applied to events that are logged in one or
more channels. For information about creating a query, see "Event Selection" on MSDN at
http://msdn2.microsoft.com/en-us/library/aa385231.aspx. Because a data stream is logged,
subscriptions can get events that occur during periods when the subscriber is not connected. A
subscriber does not miss events that occur during down times (computer startup or shutdown).
Not only can log subscribers get the live events that pass the subscription filter, they can also get
the events that occurred before they were connected. At the time the subscription starts, any
events in the log that match the subscription start criteria are queued first and then live events
are added to the queue as they occur.
The subscription start criteria include the following:
• If a subscriber wants to ensure that it never misses a record and does not receive repeat
events, it indicates the last record that it received, which is marked by a bookmark.
The starting criteria for a subscription are specified in the Flags parameter of the EvtSubscribe
function by passing in a value from the EVT_SUBSCRIBE_FLAGS enumeration.
Push Subscriptions
In the push subscription model, events are delivered asynchronously to the callback function
that is provided to the EvtSubscribe function.
The following C++ example shows how to set up a push subscription by passing a callback
function into the Callback parameter of the EvtSubscribe function. The example subscribes to all
the Level 2 events in the Application channel.
C++
#include <windows.h>
#include <iostream>
#include <conio.h> // _getwch
#include <winevt.h> // EventLog header
#pragma comment(lib, "wevtapi.lib")
if( !hSub )
{
wprintf(L"Couldn't subscribe to events. Error = 0x%x", GetLastError());
return;
}
else
{
// Keep listening for events until 'q' or 'Q' is hit.
WCHAR ch = L'0';
do
{
ch = _getwch();
ch = towupper( ch );
Sleep(100);
} while( ch != L'Q' );
}
/**********************************************************************
Function: SubscriptionCallBack
Description: This function is called by the Event Log to deliver real-time
events. Once an event is received, it is rendered to the console.
Return: DWORD. 0 if succeeded; otherwise, a Win32 error code.
***********************************************************************/
DWORD WINAPI SubscriptionCallBack(
EVT_SUBSCRIBE_NOTIFY_ACTION Action,
PVOID Context,
EVT_HANDLE Event )
{
WCHAR *pBuff = NULL;
DWORD dwBuffSize = 0;
DWORD dwBuffUsed = 0;
DWORD dwRes = 0;
DWORD dwPropertyCount = 0;
// Render the event as XML; the first call reports the required buffer size.
BOOL bRet = EvtRender(NULL, Event, EvtRenderEventXml, dwBuffSize, pBuff,
&dwBuffUsed, &dwPropertyCount);
if (!bRet)
{
dwRes = GetLastError();
if( dwRes == ERROR_INSUFFICIENT_BUFFER )
{
// Allocate the buffer size needed for the XML event.
dwBuffSize = dwBuffUsed;
pBuff = new WCHAR[dwBuffSize/sizeof(WCHAR)];
bRet = EvtRender(NULL, Event, EvtRenderEventXml, dwBuffSize, pBuff,
&dwBuffUsed, &dwPropertyCount);
if( !bRet )
{
dwRes = GetLastError();
wprintf(L"Couldn't render events. Error = 0x%x", dwRes);
delete[] pBuff;
return dwRes;
}
dwRes = ERROR_SUCCESS;
wprintf(L"%s\n", pBuff); // display the rendered event XML
}
}
// Cleanup
delete[] pBuff;
return dwRes;
}
Pull Subscriptions
The pull subscription model is used to control the delivery of events by allowing the caller to
decide when to get an event from the queue.
To create a pull model subscription, the caller must provide an event to the SignalEvent
argument in the EvtSubscribe function. The event that is provided in the SignalEvent argument
is set when the first event arrives in the queue. The event is also set when an event arrives after
the client has attempted to read an empty queue.
A client can wait on an event until it is set. After the event is set, the client can read the
subscription results using the EvtNext function until the EvtNext function fails because of an
empty queue (in which case the client can start waiting again).
In the pull subscription model, the user obtains an enumeration object over the result set and
uses its methods to retrieve the event instances.
The following C++ example shows how to subscribe to events from event log channels using a
pull subscription. It registers a subscriber by providing an XPATH query, and then if any events
are received, the event XML is displayed on console using EvtRender.
C++
#include <windows.h>
#include <wchar.h>
#include <winevt.h> // EventLog Header
#pragma comment(lib, "wevtapi.lib")
WCHAR *pBuff = NULL;
DWORD dwBuffSize = 0;
DWORD dwBuffUsed = 0;
DWORD dwPropertyCount = 0;
DWORD dwRes = 0;
// Render the event as XML; the first call reports the required buffer size.
BOOL bRet = EvtRender(NULL, batch[i], EvtRenderEventXml, dwBuffSize, pBuff,
&dwBuffUsed, &dwPropertyCount);
if (!bRet)
{
dwRes = GetLastError();
if( dwRes == ERROR_INSUFFICIENT_BUFFER )
{
// Allocate the buffer size needed for the XML event.
dwBuffSize = dwBuffUsed;
pBuff = new WCHAR[dwBuffSize/sizeof(WCHAR)];
bRet = EvtRender(NULL, batch[i], EvtRenderEventXml, dwBuffSize, pBuff,
&dwBuffUsed, &dwPropertyCount);
if( !bRet )
{
wprintf(L"Couldn't render events. Error = 0x%x",
GetLastError());
delete[] pBuff;
// Close the remaining event handles for this batch.
for(DWORD j=i; j < numRead; j++)
{
EvtClose(batch[j]);
}
break;
}
wprintf(L"%s\n", pBuff); // display the rendered event XML
}
}
// Cleanup
delete[] pBuff;
EvtClose(batch[i]);
}
}
else
{
DWORD waitResult = 0;
result = GetLastError();
if( result == ERROR_NO_MORE_ITEMS )
{
// Wait for the subscription results
waitResult = WaitForSingleObject( signalEvent, INFINITE );
if( waitResult == WAIT_OBJECT_0 )
{
result = ERROR_SUCCESS;
}
else
{
result = GetLastError();
break;
}
}
}
}
}
CloseHandle(signalEvent);
Summary
This chapter provides information about the Windows Eventing 6.0 mechanism. It describes how
to view and handle events in Windows Vista and Windows Server 2008. It also contains technical
information about the way that Windows Eventing 6.0 works and how you can interact with the
mechanism in your own programs. You can define and implement Windows Eventing 6.0 events
in an application using the Team System Management Model Designer.
Chapter 12
Performance Counters
Instrumentation
Windows collects performance data about various system resources using performance
counters. Windows contains a pre-defined set of performance counters with which you can
interact; you can also create additional performance counters relevant to your application. This
chapter describes how to install performance counters, how to write to them, and how to read
existing performance counters.
Example code automatically generated by the Team System Management Model Designer
Power Tool (TSMMD) for the Northern Electronics scenario is used in this chapter to illustrate
its concepts.
Categories
Performance counters monitor the behavior of aspects of performance objects on a computer.
Performance objects include physical components, such as processors, disks, and memory;
system objects, such as processes and threads; and application objects, such as databases and
Web services.
Counters that are related to the same performance object are grouped into categories that
indicate their common focus. When you create an instance of the PerformanceCounter object,
you first indicate the category (for example, the Memory category) for the object and then
choose a counter to interact with from within that category (for example Cached Bytes).
If you create new performance counter objects for your application, you cannot associate them
with existing categories. Instead, you must create a new category for the performance counter
object.
Instances
In some cases, categories are further subdivided into instances. If multiple instances are defined
for a category, each performance counter in the category also has those instances defined. For
example, the Process category contains instances named "Idle" and "System." Each counter
within the Process category can therefore report data for either of these instances, showing
information about idle processes or about system processes. Figure 1 illustrates the structure of the category and
counters.
Figure 1
Performance counter categories and instances
Although instances are applied to the category, you create an instance by specifying an
instanceName on the PerformanceCounter constructor. If the instanceName already exists,
the new object will reference the existing category instance.
Types
There are many different types of performance counters. Each type is distinguished by how the
performance counter performs calculations. For example, there are counters that are used to
calculate average values over a period of time, and counters that measure the difference
between a current value and a previous value.
The following table lists the most commonly used counter types.
• NumberOfItems32. Maintains a simple count of items, operations, and so on. You might use
this counter type to track the number of orders received as a 32-bit number.
• NumberOfItems64. Maintains a simple count with a higher capacity. You might use this
counter type to track orders for a site that experiences very high volume; stored as a 64-bit
number.
• RateOfCountsPerSecond32. Tracks the amount per second of an item or operation. You
might use this counter type to track the orders received per second on a retail site; stored as
a 32-bit number.
• RateOfCountsPerSecond64. Tracks the amount per second with a higher capacity. You might
use this counter type to track the orders per second for a site that experiences very high
volume; stored as a 64-bit number.
• AverageTimer32. Calculates the average time to perform a process or to process an item.
You might use this counter type to calculate the average time an order takes to be
processed; stored as a 32-bit number.
Some performance counter types rely on an accompanying base counter that is used in the
calculations. The following table lists the base counter types with their corresponding
performance counter types.
• AverageBase. The base counter for AverageTimer32 and AverageCount64.
• CounterMultiBase. The base counter for CounterMultiTimer, CounterMultiTimerInverse,
CounterMultiTimer100Ns, and CounterMultiTimer100NsInverse.
• RawBase. The base counter for RawFraction.
• SampleBase. The base counter for SampleFraction.
For a detailed description of all the Performance Counter types available, see "Appendix B.
Performance Counter Types."
• CounterName. This property is used to get or set the name of the custom counter.
• CounterHelp. This property is used to get or set the description of the custom counter.
• CounterType. This property is used to get or set the type of the custom counter.
If the performance counter relies on a base counter, the performance counter creation data
must be immediately followed by the base counter creation data in code. If it is not, the two
counters will not be linked properly.
If you do not specify a counter type when creating the counter, it defaults to
NumberOfItems32.
Now that the individual custom counters are created, they can be added to the
PerformanceCounterInstaller collection. The following code shows a performance counter
named ConfirmPickup in the category WSPickupService being added to the
PerformanceCounterInstaller collection.
C#
using System;
using System.Management.Instrumentation;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Configuration.Install;
namespace PerformanceCounters.InstrumentationTechnology
{
[RunInstaller(true)]
public class PerformanceCountersClass : Installer
{
// constructor
public PerformanceCountersClass()
{
// Installer for performanceCounters with category name: WSPickupService
PerformanceCounterInstaller WSPickupServicePerfCountInstaller = new
PerformanceCounterInstaller();
WSPickupServicePerfCountInstaller.CategoryName = "WSPickupService";
// CounterCreation for event ConfirmPickup
CounterCreationData confirmPickupCounterCreation = new
CounterCreationData();
confirmPickupCounterCreation.CounterName = "ConfirmPickup";
confirmPickupCounterCreation.CounterHelp = "Counter Help"; //n/a now
confirmPickupCounterCreation.CounterType =
PerformanceCounterType.NumberOfItemsHEX32;
WSPickupServicePerfCountInstaller.Counters.Add(confirmPickupCounterCreation);
Installers.Add(WSPickupServicePerfCountInstaller);
}
}
}
Typically, you should not call the methods of the PerformanceCounterInstaller class from within
your code; they are generally called only by the InstallUtil.exe installation utility. The utility
automatically calls the Install method during the installation process. It backs out failures, if
necessary, by calling the Rollback method for the object that generated the exception.
Incrementing by a negative number decrements the counter by the absolute value of the
number. For example, incrementing with a value of 3 will increase the counter's raw value by
three. Incrementing with a value of –3 will decrease the counter's raw value by three.
You can only increment values on custom counters; by default, your interactions with system
counters via a PerformanceCounter component instance are restricted to read-only mode.
Before you can increment a custom counter, you must set the ReadOnly property to false on
the PerformanceCounter instance you are using to access it.
There are security restrictions that affect your ability to use performance counters. For more
information, see "Introduction to Monitoring Performance Thresholds" on MSDN at
http://msdn.microsoft.com/library/en-
us/vbcon/html/vbconintroductiontomonitoringperformancethresholds.asp.
To write values to performance counters
1. Create a PerformanceCounter instance and configure it to interact with the desired
category and counter.
2. Write the value using one of the methods listed in the following table.
• To increase the raw value by more than one, call the IncrementBy method with a positive
integer.
• To decrease the raw value by more than one, call the IncrementBy method with a negative
integer.
• To reset the raw value to any integer, instead of incrementing it, set the RawValue property
to a positive or negative integer.
The following code shows how to set values for a counter in various ways. This code assumes
that you are working on a Windows Form that contains a text box named txtValue and three
buttons: one that increments the raw value by the number entered in the text box, one that
decrements the raw value by one, and one that sets the raw value of the counter to the value
set in the text box.
C#
protected override void DoIncrementByPickupServiceConfirmPickup(int increment)
{
using (PerformanceCounter counter
= new PerformanceCounter("WSPickupService", "ConfirmPickup", false))
{
counter.IncrementBy(increment);
}
}
When you create custom performance counters, you may have to restart the Performance
Monitor (Perfmon.exe) tool that is installed with Windows before you can see the custom
counter in that application.
There are security restrictions that affect your ability to use performance counters. For more
information, see "Introduction to Monitoring Performance Thresholds" on MSDN at
http://msdn.microsoft.com/library/en-
us/vbcon/html/vbconintroductiontomonitoringperformancethresholds.asp.
Figure 2
Performance counter values: raw, calculated, and sampled
The diagram in Figure 2 shows a representation of the data contained in a counter named
Orders Per Second. The raw values for this counter are individual data points that vary by
second, where the calculated average is represented by the line showing an increasing order
receipt over time. In this chart, the following data points have been taken:
• The user has used the NextValue method to retrieve the calculated value at three
different times, represented by NV1, NV2, and NV3. Because the next value is
constantly changing, a different value is retrieved each time without specifying any
additional parameters.
• The user has used the NextSample method to take two samples, indicated by S1 and S2.
Samples freeze a value in time, so the user can then compare the two sample values
and perform calculations on them.
Comparing Retrieval Methods
Retrieving a raw value with the RawValue property is very quick, because no calculations or
comparisons are performed. For example, if you are using a counter simply to count the number
of orders processed in a system, you can retrieve the counter's raw value.
Retrieving a calculated value with the NextValue method is often more useful than retrieving
the raw value, but this value may also give you an unrealistic view of the data because it can
reflect unusual fluctuations in the data at the moment when the value is calculated. For
example, if you have a counter that calculates the orders processed per second, an unusually
high or low amount of orders processed at one particular moment will result in an average that
is not realistic over time. This may provide a distorted view of the actual performance of your
system.
Samples provide the most realistic views of the data in your system by allowing you to retrieve,
retain, and compare various values over time. You would retrieve a sample, using the
NextSample method, if you needed to compare values in different counters or calculate a value
based on raw data. This may be slightly more resource-intensive, however, than a NextValue
call.
The NextSample method returns an object of type CounterSample. When you retrieve a
sample, you have access to properties on the CounterSample class, such as RawValue,
BaseValue, TimeStamp, and SystemFrequency. These properties let you get a very detailed look
at the data that makes up the sample data.
Summary
This chapter demonstrated how to create performance counters in custom performance
categories and how to connect to existing performance counters so that performance counter
data can be retrieved. For more detailed information about specific performance counter types,
see "Appendix C: Performance Counter Types."
Chapter 13
Building Install Packages
In this preliminary version of the guide, this chapter is still under development. It is
anticipated that a future release of the guide will include detailed information about building
install packages for instrumented applications.
Section 4
Managing Operations
This section focuses on the tasks performed by the operations team when managing
applications. It demonstrates an application in use and describes event log events, performance
counters, Windows Management Instrumentation (WMI) events, and event trace entries for the
application. It examines some of the important concepts involved in creating Management
Packs for Microsoft Operations Manager (MOM) 2005 and System Center Operations Manager
2007, and it describes the tasks involved in creating those Management Packs, including
importing Management Packs from the Management Model Designer (MMD) tool and the
TSMMD.
This section should be of use primarily to the operations team and for Management Pack
developers.
Chapter 14, "Deploying and Operating Manageable Applications"
Chapter 15, "Monitoring Applications"
Chapter 16, "Creating and Using Microsoft Operations Manager 2005 Management Packs"
Chapter 17, "Creating and Using System Center Operations Manager 2007 Management Packs"
Chapter 14
Deploying and Operating Manageable
Applications
After you define your application in the TSMMD, generate instrumentation for the application,
and call the instrumentation code from your application, you are ready to deploy the application
in a test, and ultimately a production environment. It is at this point that the instrumentation
you have created can be used by the test and operations teams.
This chapter uses the Transport Order application and the Transport Order Web service, two
parts of the Northern Electronics example, to illustrate how instrumentation can be created for
an application.
For more information about application instrumentation and how it is installed, see Chapters
8–12 of this guide.
The dates shown in this and other screenshots in this chapter are shown in dd/mm/yyyy
format.
However, in operation, the Transport Order application depends on configuration values such as
the URL of the Transport Order Web service. If this configuration information is incorrect,
perhaps because the operations team has been reorganizing the servers or the Transport Order
Web service has experienced a failure, posting the order results in an error. The application
detects that the order submission failed and displays a message on the left side of the page, as
shown in Figure 2.
Figure 2
The result when the application cannot contact the Transport Order Web service
Figure 5
The performance counters exposed by the Transport Order Web service
WMI
The instrumentation for the application includes Windows Management Instrumentation (WMI)
events. Figure 6 shows the WMI event generated when the database is stopped.
Figure 6
WMI event raised from the Transport Order application
Summary
This chapter showed how to install the application instrumentation and demonstrated the use
of application instrumentation for the Transport Order application, which forms part of the
Northern Electronics example.
Chapter 15
Monitoring Applications
Defining a management model for an application and using the management model to ensure
that your application is well instrumented and can report health state is an important
requirement for designing manageable applications. However, a manageable application alone
is not enough to ensure that the operations team can manage it easily. You must also have a
solution for monitoring the application, such as Microsoft Operations Manager
2005 or System Center Operations Manager 2007.
This chapter examines how to monitor applications; as an example, it uses Operations Manager
2005 monitoring the Transport Order Web service (part of the Northern Electronics scenario).
Management Packs
Microsoft Operations Manager, like most other monitoring applications and environments,
relies on a series of Management Packs that define the rules, views, and alerts for a specific set
of monitoring processes. Each Management Pack contains a rule group, which contains the set
of rules applicable to the monitored application or system.
The usual approach is to create a Management Pack that matches the management model you
create for your application and install it along with the standard Management Packs that
monitor other features and systems. For example, the Management Pack generated by the
Management Model Designer (MMD) and used with the Northern Electronics application
contains a series of rules and alerts that map directly to the instrumentation within the
application. Separate Management Packs (provided with Operations Manager) monitor the
basic features of the remote computers as sent by the agents installed on the remote
computers.
This division of monitoring tasks into separate functional areas means that your Management
Pack should only include rules that directly relate to your application and that measure
features your application can influence. As an example, you should not include a rule to
monitor the amount of free memory in your application Management Pack, because this does
not directly relate to your application processes. Instead, you install and use the Management
Pack that contains the remote server information and use this to monitor all the
non-application related features, such as processor loading and memory usage.
Figure 2
The MOM 2005 Administrator Console showing the TransportOrderApplication rule group
You can use a rule group to associate a set of rules with an individual server or a group of
servers. You can enable and disable a complete group, and display and work with just a single
group, without associating each rule that you add or modify directly with one or more servers.
This makes management and monitoring much easier, especially as the architecture and
deployment of an application change over time.
Figure 3 illustrates some of the properties of the TransportWebServiceFailed event rule. This
rule takes as its source the application event log on the monitored server (where the main
Transport Order application runs) and uses criteria to select the event log entries that
correspond to the "Unable to connect to the remote server" error. When it detects this event, it
generates a critical error alert within the monitoring system, using the source and description of
the original event log entry for the new alert.
Figure 3
The event rule for the TransportWebServiceFailed event
Figure 4 illustrates the TransportServiceResponseTime performance rule. In this case, the
source of the values for the rule (the provider) is the AverageDocumentProcessingTime
performance counter implemented by the instrumentation within the Transport Order Web
service. In this example, the monitoring system interrogates the counter every minute and
stores the values so it can present a graph of performance over time.
Figure 4
The Performance rule for the Transport Order Web service response time
With these rules in place, the Operator Console will display the overall state of the application
(based, of course, on the defined rules) by rolling up the individual values of each alert raised by
the event and performance rules. You can specify how these rules roll up (how they combine
when there is more than one monitored entity). In this example with only a single monitored
instance, the overall state directly reflects the worst case—this is described in greater detail in
Chapters 17 and 18 of this guide.
Figure 5 illustrates the state view in the Operations Manager 2005 Operator Console. You can
use the Group list on the toolbar at the top of the window to select the displayed scope; in this
case, it is set to show only the TransportApplication rule group. This rule group is associated
with only a single server named DELMONTE that (in this simple scenario demonstration)
implements both the main Web application and the Transport Order Web service. You can see
that there are no open (in other words, unresolved) alerts for the entire application running on
this server.
Figure 5
Monitoring the overall state of the Transport Order application running on a single remote server
One useful feature of monitoring and recording user errors (as opposed to application faults) is
that you gain an insight into the usability of the application and the kinds of problems that
users face when using it. In this particular example, you may decide that some mechanism that
prevents users submitting orders with no Expected Weight value (such as client-side
validation) would reduce the loading on the servers and make the application easier to use.
This is useful feedback for the architect and developer; it is automatically collected and reflects
actual usage instead of user perception and opinion.
As soon as the Transport Order Web service fails, the Operations Manager agent on the Web
server sends details of the event log entry to the central monitoring server, which automatically
changes the state to the worst of all currently unresolved alerts. The
TransportOrderServiceFailed rule maps event log entries to a critical error alert, so this is the
state displayed in the Operator Console (as shown in Figure 11).
Figure 11
The critical error state caused by failure to connect to the Transport Order Web service
Viewing details of the alert provides very little useful information—mainly the information that
is visible in Windows Event Log, plus a count of the number of times that Operations Manager
detected this event, the period within which they occurred, and the mapped rule name (see
Figure 12).
Figure 12
Alert details and summary for the alert raised by the TransportOrderServiceFailed rule
However, the TransportOrderServiceFailed rule specifies both product knowledge and company
knowledge that is directly useful for diagnosing and resolving the problem indicated by this
alert. For example, as you can see in Figure 13, the CAUSES and RESOLUTIONS sections identify
the configuration error that causes this failure to connect to the remote service and provides
the correct value (or points to where the operator could obtain the current value).
Figure 13
The product knowledge provided by the TransportOrderServiceFailed rule
Figure 14 illustrates the company knowledge for this rule. Operators could use this editable area
to store the correct current value for the Transport Order Web service or notes that indicate
how to deduce or discover its location if it changes on a regular basis.
Figure 14
The company knowledge provided by the TransportOrderServiceFailed rule
After correcting the incorrect configuration value, the operations staff can reattempt the
request to ensure it completes successfully. In fact, as in the example application, the target
Web service may expose a simple method that does no processing but simply indicates a
successful connection to the service. In this case, the knowledge will include details of how to
execute this method to verify resolution of the connection problem.
In the example application, having resolved the connection problem, the operations staff might
now discover that the Transport Order Web service itself is failing. However, this is not directly
obvious because the only indication is that the controls on the Web page remain populated with
the original values, even after posting the order to the Transport Order Web service, as shown in
Figure 15. In normal circumstances, as demonstrated earlier, code in the application clears the
controls and allows the user to select another order.
Figure 15
Failure of the Transport Order Web service is not directly obvious in the Transport Order
application
However, the monitoring system shows the real situation, because the
TransportWebServiceFailed rule maps event log entries created by the Transport Order Web
service when it encounters an error to an alert in Operations Manager. Figure 16 illustrates the
new critical error alert at the top of the list and the event log message in the lower-right part of
the window. In this case, there is much more information in the error message, including the
useful fact that the code detected the DataBaseName key missing in the application
configuration file.
Figure 16
The alert created by the failure of the Transport Order Web service
The product knowledge stored within the rule indicates in more detail why this error occurred
and how to resolve it. The RESOLUTIONS section indicates that the DataBaseName key should
have the value "Transport", as shown in Figure 17.
Figure 17
The product knowledge provided by the TransportWebServiceFailed rule
Looking at the Web.config file for the Transport Order Web service, it becomes obvious why this
error occurred—someone has commented out the DataBaseName key, as shown in Figure 18.
Removing the enclosing comment markers "<!--" and "-->" and running the application again
results in successful execution of the Transport Order Web service.
Figure 18
The error arises because the DataBaseName key is commented out
From this simple example, you can see just how the combination of a suitable health model with
the appropriate instrumentation and application monitoring makes it much easier to detect,
diagnose, and resolve problems in complex and distributed applications. It solves the following
three issues encountered at the start of this chapter:
• The operations team no longer needs to rely on users to detect and report faults.
Sufficient, accurate information, in the form of knowledge stored within the health
model and the monitoring rules, makes diagnosis and resolution of faults easier, less
costly, and less time-consuming.
• The operations team does not have to visit the computer to investigate, nor do they
have to depend on scant information they may extract from the event logs or
performance counters. The health model knowledge provides the detailed data
required to resolve the fault.
• The operations team can easily detect problems early, such as impending failure of a
connection to a remote service caused by a failing network connection or lack of disk
space on the server, without having to continuously monitor performance counters and
event logs or use them as the sole sources of information for diagnosing faults.
Summary
Defining a management model for an application is very important in ensuring that it can be
managed by the operations team. However, without an effective way of monitoring the
instrumentation that is generated by your application, the application may still prove difficult to
manage. This chapter explained the benefits of monitoring applications and described some of
the most important components of monitoring software, using Operations Manager 2005 as an
example. The following chapters will examine the use of Operations Manager 2005 and
Operations Manager 2005 Management Packs in more detail.
Chapter 16
Creating and Using Microsoft
Operations Manager 2005
Management Packs
As discussed in Chapter 15, Management Packs are the basis for monitoring applications in
Microsoft Operations Manager 2005; they describe the rules, views, and alerts for a specific set
of monitoring processes. This chapter describes a number of scenarios for creating and using
Management Packs. It includes detailed information about the following:
The Transport Order application is used as a running example throughout this chapter. This
application forms part of the Shipping solution in the Northern Electronics worked example
used throughout this guide.
For more details about rule groups and rules, see "Creating and Configuring a Management
Pack in the Operations Manager 2005 Administrator Console".
As an example of the way that the MMD translates a management model into a Management
Pack, Figure 3 illustrates the General page of the Properties dialog box for the
TransportOrderUIErrors event rule. This rule uses Warning entries in the Windows Application
Event Log to detect input errors by the user. In the management model, this causes a state
change to YELLOW for the user interface application, and the MMD appends this state to the
event name.
Figure 3
The General page of the Properties dialog box for a newly imported event rule
The MMD uses the values you enter when specifying the detector in the management model (in
this case, the TransportOrderUIErrors event) to generate the appropriate criteria for the event
rule. As shown in Figure 4, the MMD sets the Source of the event, and generates a regular
expression for the event ID to match that specified in the management model.
Figure 4
The Criteria page of the Properties dialog box for a newly imported event rule
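To make the generated criteria concrete, the following short sketch shows how a regular expression can match a specific event ID. The event source name, the ID value, and the function are invented for illustration; the MMD's actual generated criteria are those shown in Figure 4.

```python
import re

# Hypothetical regular expression of the kind the MMD generates for an
# event rule: match only event ID 2050.
EVENT_ID_PATTERN = re.compile(r"^2050$")

def matches_rule(source: str, event_id: int) -> bool:
    """Return True when an event log entry satisfies the rule criteria."""
    return (source == "TransportOrderUI"
            and bool(EVENT_ID_PATTERN.match(str(event_id))))

print(matches_rule("TransportOrderUI", 2050))   # True
print(matches_rule("TransportOrderUI", 20500))  # False: the anchors require an exact match
```

The "^" and "$" anchors matter: without them, the pattern would also match IDs such as 20500 that merely contain the digits 2050.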
The Alert page of the Properties dialog box shows that the MMD set the severity to Warning
(equivalent to the YELLOW health state), and specified that Operations Manager should create
an alert when this event occurs. It uses the values of the Source and Description from the event
to populate these fields of the alert, as shown in Figure 5.
Figure 5
The Alert page of the Properties dialog box for a newly imported event rule
The MMD uses the knowledge included in the management model to generate information for
the rules it generates. Figure 6 shows the Knowledge Base page for the new
TransportOrderUIErrors event rule, which contains sections labeled Summary, Diagnose,
Resolve, and Verify. These correspond directly to the steps defined in the management model
for monitoring and maintaining the application.
Figure 6
The Knowledge Base page of the Properties dialog box for a newly imported event rule
Figure 7 shows the Threshold page of the Properties dialog box for a newly imported
performance rule. The MMD sets the Threshold value and Match when the threshold meets
the following condition options for the rule based on the aspects you specified when creating
the management model. In this case, the rule will generate an alert and indicate a state change
when the value of the performance counter that measures the Transport Order Web service
response time exceeds 4999 (milliseconds).
Figure 7
The Threshold page of the Properties dialog box for a newly imported performance rule
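The threshold logic that such a rule applies can be sketched as follows. The function and the state names are illustrative assumptions; the 4999-millisecond value is the one set by the MMD in the rule shown in Figure 7.

```python
THRESHOLD_MS = 4999  # threshold from the imported performance rule (Figure 7)

def evaluate_sample(response_time_ms: float) -> str:
    """Return the health state implied by one response-time sample.

    A sample above the threshold generates an alert and indicates a
    state change (YELLOW here is an assumption about the modeled state).
    """
    if response_time_ms > THRESHOLD_MS:
        return "YELLOW"  # degraded: raise an alert
    return "GREEN"       # healthy

print(evaluate_sample(5200))  # YELLOW
print(evaluate_sample(850))   # GREEN
```

Note that the rule fires only when the value exceeds 4999; a sample of exactly 4999 milliseconds leaves the state unchanged.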
After importing a Management Pack from the Management Model Designer, you may need to
edit it, add new rules, or change the behavior of some sections. The remaining procedures in
this chapter show these processes in detail.
You also need to generate computer groups in the Administrator Console that correspond to the
sets of computers that will run the application and deploy the rules to these computers. For
more information, see "Create an Operations Manager 2005 Computer Group" and "Deploy the
Operations Manager Agent and Rules".
This section contains only enough information to create a Management Pack with rule groups
and rules in place. In many cases, you will need to perform additional editing of the
Management Pack. For more information about these other tasks, see "Editing an Operations
Manager 2005 Management Pack" later in this chapter.
Figure 8
A new rule group in Microsoft Operations Manager 2005
◦ Specify the Resolution state for the alert. By default, this is New, but you can
set it to Assigned to in order to assign it to a group of people such as a
helpdesk or vendors, or mark it as requiring scheduled maintenance. You can
use the Global Settings section of the Administrator Console to modify or
define new resolution states. For more information, see the later section
"Viewing and Editing Global Settings."
◦ Specify the value for the Alert source. This is the text displayed as the Source in
the Operator Console when this alert occurs. You can enter custom text or select
from any of the fields in the event that causes this alert. The default is to use the
Source field value.
◦ Specify the value for the Description. This is the text displayed as the
Description in the Operator Console when this alert occurs. You can enter
custom text or select from any of the fields in the event that causes this alert.
The default is to use the Description field value.
◦ Specify details of the role of the server in the alert process using the Server role,
Instance, Component, and Customer Fields options.
Not all of the controls on the Alert page are available for every type of event
rule. Depending on the type of rule and the provider source, some of the
controls may be disabled.
11. If you are creating a Consolidate Similar Events rule, the next page is the Consolidate
page. Use the check boxes in the list of event fields to specify those that must have
identical values in order for Operations Manager to consolidate multiple events into
a single alert. You can also specify the period within which the multiple events must
occur as a number of seconds. Operations Manager will only raise one alert in the
Operator Console for any number of consolidated events within this period.
12. If you are creating a Filter Event rule, the next page is the Filter page. Select the
option required for the way you want Operations Manager to evaluate other rules
that match the source event. You can specify if it should add matching events to the
database or ignore them as it continues evaluating rules.
13. If you specified a timed event (a regular scheduled occurrence) as the source of this
rule, or if you are creating an Alert on or Respond to Event or a Detect Missing
Event rule, the next page is the Alert Suppression page. Use the check boxes in the
list of alert fields to specify those that must have identical values in order for
Operations Manager to ignore (suppress) repeated alerts.
14. Click Next. If you are not creating a Consolidate Similar Events or Collect Specific
Events rule, the next page is the Responses page. Here, you specify the actions
Operations Manager should perform when a matching event occurs. If you do not
specify any response, Operations Manager simply generates an alert (provided you
have specified this on the Alerts page), and changes the state displayed in the
Operator Console. You can specify the following types of response:
◦ Launch a Script. This opens a dialog box where you select an existing Operations
Manager script or create a new script. You also specify whether the script should
run on the remote computer (where the Operations Manager agent resides) or
on the Operations Manager management server, the script timeout, and any
parameters required by the script.
◦ Send an SNMP trap. This opens a dialog box where you choose whether to
generate the trap on the remote computer that raised the alert (SNMP must be
installed and enabled there) or on the Operations Manager management server.
◦ Send a Notification to a Notification Group. This opens a multi-tabbed dialog
box. On the Notification tab, select an existing notification group, modify an
existing notification group, or create a new notification group. On the Email
Format tab, you can accept the standard format for a notification e-mail or edit
this to create a custom format using placeholder variables. On the Page Format
tab, you can accept the standard format for a pager notification message or edit
this to create a custom format using placeholder variables. On the Command
Format tab, you can accept the standard command to run another application
or batch file, or you can edit this to create a custom format using placeholder
variables.
◦ Execute a command or batch file. This opens a dialog box where you can select
the Application and/or the Command Line, and the Initial directory. You also
specify whether the command or batch file should run on the remote computer
(where the Operations Manager agent resides) or on the Operations Manager
management server and the command timeout.
◦ Update state variable. This opens a dialog box where you can add state
variables that correspond to specific actions based on the values of fields in the
source event. Click the Add button in this dialog box and select an action (such
as incrementing the value of the variable or storing the last n occurrences), and
then select the field from the source event that provides the value for this
action. You also specify whether the operation is performed on the remote
computer (where the Operations Manager agent resides) or on the Operations
Manager management server.
◦ Transfer a file. This opens a dialog box where you specify a virtual directory for
the transferred file, whether to upload or download files, and the source and
destination file names. You can use values in the source event fields to select
the appropriate file, and use the standard Windows environment variables (such
as %WINDIR%) to specify the paths.
◦ Call a method on a managed code assembly. This opens a dialog box where you
specify the assembly name and type name for the managed code assembly you
want to execute. You must also enter the method name within that assembly
you want to call, specify whether it is a Static method or an Instance method,
and provide any parameters required for the method. You also specify whether
the assembly is located on the remote computer (where the Operations
Manager agent resides) or on the Operations Manager management server and
the response timeout.
15. Click Next to display the Knowledge Base page, click Edit, and enter any company-
specific knowledge appropriate for the rule that may be useful to operators and
administrators.
16. Click Next to display the Knowledge Authoring page. Click Summary in the Sections
list at the top of the page and enter summary information for this event. Repeat the
process by clicking Causes, Resolutions, and the other available categories and
entering the appropriate information. For each entry, you can specify the GUID of
another rule that shares this entry—this reduces the duplication that may occur if
many rules require the same knowledge.
17. Click Next to display the Advanced page, where you can mark this rule as deleted,
and specify the way it will be exported within a Management Pack. For a new rule,
leave the values set to the defaults.
18. Click Next to display the General page, where you provide a name for the new rule.
By default, the rule is enabled but you can disable it using the check box in this page.
You can also allow overrides of the rule by specifying the override name.
19. Finally, click Finish to create the new rule, which appears in the Event Rules section
of the left-side tree view. If you want to immediately force the new rule (or any
updated rules) through to the Operations Manager agents on remote computers,
instead of waiting for the scheduled update cycle, right-click the Management Packs
entry in the left-hand tree view, and then click Commit Configuration Change.
By default, Operations Manager pushes rule changes to all remote agents every five
minutes. To change this value, right-click Global Settings in the left-side tree view, click
Management Server Properties, click the Rule Change Polling tab, and then select the
required value.
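The consolidation behavior described in step 11 can be sketched in Python. The field names, the 60-second window, and the sample events are assumptions for illustration only; Operations Manager performs this grouping internally.

```python
from datetime import datetime, timedelta

CONSOLIDATION_FIELDS = ("source", "event_id")  # fields that must be identical
WINDOW = timedelta(seconds=60)                 # consolidation period

def consolidate(events):
    """Group events whose selected fields match within the time window,
    so that each group produces a single alert in the Operator Console."""
    alerts = []
    for event in sorted(events, key=lambda e: e["time"]):
        key = tuple(event[f] for f in CONSOLIDATION_FIELDS)
        # Reuse an open alert when the key matches and the window has not expired
        for alert in alerts:
            if alert["key"] == key and event["time"] - alert["first"] <= WINDOW:
                alert["count"] += 1
                break
        else:
            alerts.append({"key": key, "first": event["time"], "count": 1})
    return alerts

t0 = datetime(2008, 8, 1, 9, 0, 0)
events = [
    {"source": "TransportOrderUI", "event_id": 2050, "time": t0},
    {"source": "TransportOrderUI", "event_id": 2050, "time": t0 + timedelta(seconds=30)},
    {"source": "TransportOrderUI", "event_id": 2051, "time": t0 + timedelta(seconds=40)},
]
print(len(consolidate(events)))  # 2 alerts: the two 2050 events are consolidated
```

Only the fields you select on the Consolidate page participate in the key, so two events that differ in an unselected field (the description, for example) would still consolidate into one alert.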
• Use the management model you developed for your application to help you decide
what rules and performance counters you need to create.
• Create a top-level rule group that corresponds to the application, using a name for the
group that makes it easy to identify. You will later be able to use this top-level rule
group to expose the overall rolled-up state of the entire application. Then create child
rule groups to build a multi-level hierarchy that mirrors that of the management model,
adding the appropriate rules into each child group.
• Create only rules directly relevant to your application. Avoid duplicating rules that are
available in built-in Management Packs, such as measuring processor usage or free
memory.
• Use alerts to raise urgent issues to operations staff immediately, perhaps through e-
mail or pager.
• Take advantage of specific features of the monitoring application, such as timed events
that can provide heartbeat monitoring of remote services, or the ability to run scripts or
commands in response to alerts (for example, to query values or call a method that
provides diagnostic information, and then generates a suitable alert).
• Provide as much useful company-specific and application-specific knowledge as possible
for each rule group and rule to make problem diagnosis, resolution, and verification
easier for operators and administrators.
After making the required changes to the properties of the rule group, click Apply or OK in the
Properties dialog box. If you want to immediately force the changes through to the Operations
Manager agents on remote computers, instead of waiting for the scheduled update cycle, right-
click the Management Packs entry in the left-side tree view, and then click Commit
Configuration Change.
By default, Operations Manager pushes rule changes to all remote agents every five minutes. To
change this value, right-click Global Settings in the left-side tree view, click Management Server
Properties, click the Rule Change Polling tab, and then select the required value.
Editing Event Rules, Alert Rules, and Performance Rules
To edit rules in the Administrator Console, you must expand the list of rule groups under the
Rule Groups entry (which is under the Management Packs entry) to show all the currently
configured groups. Then expand the group that contains the rule you want to edit, and select
the appropriate rule type (Event Rules, Alert Rules, or Performance Rules). The right window
shows a list of the rules in the selected section. Right-click the rule you want to edit, and then
click Properties (or double-click the rule).
You can search for rules that meet specific criteria if you cannot remember where a rule
resides, or if you want to find rules that have specific properties. Right-click Rule Groups (or
any group under the Rule Groups entry) in the tree view, and then click Find Rules. This opens
the Rule Search Wizard; in it, specify the criteria, such as the location, name, type, or response.
When you click Finish, a new console window appears containing all the matching rules.
The Properties dialog box for an event rule (see Figure 11) contains ten tabs that allow you to
edit individual features and settings for this rule.
Figure 11
The Properties dialog box for an event rule
The Properties dialog box for an alert rule (see Figure 12) contains seven tabs that allow you to
edit individual features and settings for this rule.
Figure 12
The Properties dialog box for an alert rule
The Properties dialog box for a performance threshold rule (see Figure 13) contains eleven tabs
that allow you to edit individual features and settings for this rule. The Properties dialog box for
a performance measuring rule is similar, but it does not have the Criteria, Threshold, Alert, and
Alert Suppression tabs.
Figure 13
The Properties dialog box for a performance threshold rule
Many of the pages in the Properties dialog box are common across the three types of rules:
• General. On this page, you can edit the name of the rule. To disable this rule, or re-
enable it, clear or select the This rule is enabled check box. If you want to override this
rule with another rule defined elsewhere, select the Enable rule-disable overrides for
this rule check box, click the Set Criteria button, and then click the Add button in the
Set Override Criteria dialog box. Select a computer or a computer group, and then
specify Enable (0) or Disable (1) in the Edit Override Criteria dialog box. This allows you
to specify whether this rule will apply to the selected computer or group.
• Data Provider. (This page is not available for an alert rule.) On this page, you can select
the source of the event or the performance counter that acts as the data source for the
rule:
◦ For an event rule, you can select a Windows Event Log, a scheduled (timed)
event, a WMI event, or a custom script event. To specify a source not in the list,
click the New button, select an event type in the Select Provider Type dialog
box, click OK, and then specify the details for this source.
◦ For a performance rule, you can select any of the performance counters
exposed by Operations Manager and the Operations Manager agent installed on
monitored computers, or a script-generated or internally generated event. To
specify a source not in the list, click the New button, select a performance
counter type in the Select Provider Type dialog box, click OK, and then specify
the details for this source. To edit the properties, such as the counter location or
synchronization, click the Modify button and edit the values as required.
• Schedule. On this page, you can specify the periods when the rule is active. By default,
the rule is active at all times. To specify the active periods, select either Only process
data during the specified time or Process data except during the specified time, select
the start and end times, and then select the check boxes for the days of the week to
which this period applies.
• Criteria. (For an alert rule, this page is labeled Alert Criteria; this page is available for all
rule types except for a performance measuring rule.) On this page, you can specify how
an event rule or a performance threshold rule matches the source event or
performance counter, or how an alert rule matches the source alert:
◦ For an event rule, you can specify the Source, ID, Type, and Description
properties of the source event. Alternatively, click the Advanced button to
specify individual criteria for matching on any of the fields of the source event,
using a range of string matching, regular expressions, and numerical order
matching operations.
◦ For a performance threshold rule, you can specify the Instance, Domain, and
Computer properties of the source counter. Alternatively, click the Advanced
button to specify individual criteria for matching on any of the fields of the
source counter, using a range of string matching, regular expressions, and
numerical order matching operations.
◦ For an alert rule, you specify the Alert source and Severity properties of the
source alert generated by an event rule or a performance rule. If you only want
to match alerts from rules in a specific rule group, select the only match alerts
generated by rules in the following groups: check box, click the Browse button,
and then select the appropriate rule group. You can also click the Advanced
button to specify individual criteria for matching on any of the fields of the
source alert, using a range of string matching, regular expressions, and
numerical order matching operations.
• Threshold. (This page is available for only a performance threshold rule.) On this page,
you can specify the way that the rule samples the counter values, and the way that it
matches the sampled values. You can specify that the rule should calculate the
Threshold value using a single counter value, the average of a specified number of
values, or a specified change in the values. You can also specify whether the sampled value
must be greater than or less than the threshold value you provide, or whether the rule should
raise an alert for all values. Finally, you can use this page to enable an override for this rule, and specify the
overriding rule.
• Alert. (This page is not available for an alert rule or a performance measuring rule.) On
this page, you can turn on and turn off generation of an alert when this event rule or
performance rule is activated and set the properties for the alert it generates. The
controls on this page allow you to do the following:
◦ Specify whether the event or counter will generate an alert by selecting the
Generate alert check box.
◦ Turn on alert severity condition checking by selecting the Enable state alert
properties check box.
◦ Specify the Alert severity (such as Critical Error, Warning, or Success) if you
always want to generate the same severity alert for this event or counter.
Alternatively, you can specify a series of If conditions and an Else condition so
that the severity depends on the parameter values for the event or counter. This
allows you to define, for example, that a particular event or counter will
generate a Service Unavailable condition for specific values of the parameters,
and a Success condition for other values. Click the Edit button to enter the
condition criteria.
◦ Specify the name of the person responsible for tracking and resolving the alert
as the Owner. This allows Operations Manager to direct the alert to the
appropriate administrators and operators listed in the Notification Groups
section of the Administrator Console.
◦ Specify the Resolution state for the alert. By default, this is New, but you can set
it to Assigned to in order to assign it to a group of people such as a helpdesk or
vendors, or mark it as requiring scheduled maintenance. You can use the Global
Settings section of the Administrator Console to modify or define new
resolution states.
◦ Specify the value for the Alert source. This is the text displayed as the Source in
the Operator Console when this alert occurs. You can enter custom text or select
from any of the fields in the event or counter that causes this alert. The default
is to use the Source field value.
◦ Specify the value for the Description. This is the text displayed as the
Description in the Operator Console when this alert occurs. You can enter
custom text, or select from any of the fields in the event or counter that causes
this alert. The default is to use the Description field value.
◦ Specify details of the role of the server in the alert process using the Server role,
Instance, Component, and Customer Fields options.
Not all the controls on the Alert page are available for every type of event rule
or performance rule. Depending on the type of rule and the provider source,
some of the controls may be disabled.
• Alert Suppression. (This page is not available for an alert rule or a performance
measuring rule.) On this page, you can compound multiple events or counter samples
into a single alert; this prevents the generation of duplicate alerts for the same source
condition. Turn on alert suppression using the check box at the top of this page, and
then select the check boxes in the list of alert fields below for those that must be
identical to suppress duplicated alerts.
• Responses. On this page, you can specify the actions that should occur when the event
rule, alert rule, or performance rule is activated. Click the Add button to show a list of
the available responses and click the one you require. Alternatively, click the Edit
button to edit an existing response selected in the list, or click the Remove button to
remove the selected response. The response actions available are the following:
◦ Launch a Script. This opens a dialog box where you select an existing Operations
Manager script or create a new script. You also specify if the script should run on
the remote computer (where the Operations Manager agent resides) or on the
Operations Manager management server, the script timeout, and any
parameters required by the script.
◦ Send an SNMP trap. This opens a dialog box where you specify where to
generate the trap: on the remote computer (where the Operations Manager
agent resides) or on the Operations Manager management server. You can use
SNMP responses to communicate alerts to other computers and systems that
run a wide variety of operating systems.
◦ Send a notification to a notification group. This opens a multi-tabbed dialog
box. On the Notification tab, select an existing notification group, modify an
existing notification group, or create a new notification group. On the Email
Format tab, you can accept the standard format for a notification e-mail or edit
this to create a custom format using placeholder variables. On the Page Format
tab, you can accept the standard format for a pager notification message or edit
this to create a custom format using placeholder variables. On the Command
Format tab, you can accept the standard command to run another application
or batch file or edit this to create a custom format using placeholder variables.
◦ Execute a command or batch file. This opens a dialog box where you can specify
the Application and/or the Command Line, and the Initial directory. You also
specify if the command or batch file should run on the remote computer (where
the Operations Manager agent resides) or on the Operations Manager
management server, and the command timeout.
◦ Update state variable. This opens a dialog box where you can add state
variables that correspond to specific actions based on the values of fields for the
counter. Click the Add button in this dialog box to select an action (such as
incrementing the value of the variable or storing the last n occurrences), and
then select the field from the source counter that provides the value for this
action. You also specify if the operation is performed on the remote computer
(where the Operations Manager agent resides) or on the Operations Manager
management server.
◦ Transfer a file. This opens a dialog box where you specify a virtual directory for
the transferred file, whether to upload or download files, and the source and
destination file names. You can use values in the source counter fields to select
the appropriate file, and use the standard Windows environment variables (such
as %WINDIR%) to specify the paths.
◦ Call a method on a managed code assembly. This opens a dialog box where you
specify the Assembly name and Type name for the managed code assembly you
want to execute. You must also enter the Method name within that assembly
you want to call, specify whether it is a Static or an Instance method, and
provide any Parameters required for the method. You also specify if the
assembly is located on the remote computer (where the Operations Manager
agent resides) or on the Operations Manager management server, and the
response timeout.
• Advanced. On this page, you can specify how Operations Manager 2005 will export this
rule within a Management Pack. If you want to mark the rule as deleted, select the
Mark this rule as deleted check box. Select from the three options that govern the
export of this rule. The default option is Export as a vendor produced rule. If the
rule is disabled, then do not export. If you want to include the child group (in order to
import the rules into Operations Manager 2000), select the Export as a vendor
produced rule. Export rule if it is enabled or disabled check box. If you want to export
the rule as a modified rule, which Operations Manager will not overwrite when
importing Management Packs, select the Export as a customer created/modified rule
check box.
• Knowledge Base. On this page, you can view the Knowledge Base content and the
Company Knowledge Base content for this rule. Click the Edit button if you want to edit
the Company Knowledge Base content. You cannot edit the overall Knowledge Base
content in this page—you must use the Knowledge Authoring page for this.
• Knowledge Authoring. On this page, you can edit the overall Knowledge Base content.
It displays a list of knowledge Sections (such as Summary, Causes, and Resolutions).
Select a section in this list and then edit the knowledge content for that section in the
text box in this page. You can also specify that each knowledge section is shared with
other rules by clicking the Share new button and entering the sharing rule ID. This
reduces duplication of content and makes updates easier. When complete, click the
Generate Knowledge button to create the formatted content. To see the result, go back
to the Knowledge Base page.
After making the required changes to the properties of the rule, click Apply or OK in the
Properties dialog box. If you want to immediately force the changes through to the Operations
Manager agents on remote computers, instead of waiting for the scheduled update cycle, right-
click the Management Packs entry in the left-side tree view, and then click Commit
Configuration Change.
By default, Operations Manager pushes rule changes to all remote agents every five minutes.
To change this value, right-click Global Settings in the left-side tree view, click Management
Server Properties, click the Rule Change Polling tab, and then select the required value.
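The If/Else severity conditions described for the Alert page can be sketched as follows. The counter parameter and the boundary values are invented for illustration; in the console you define the conditions by clicking the Edit button on the Alert page.

```python
def alert_severity(response_time_ms: float) -> str:
    """Map a counter value to an alert severity using a series of If
    conditions and an Else condition, as configurable on the Alert page."""
    if response_time_ms > 10000:  # If condition 1 (hypothetical boundary)
        return "Service Unavailable"
    if response_time_ms > 4999:   # If condition 2 (hypothetical boundary)
        return "Warning"
    return "Success"              # Else condition

print(alert_severity(12000))  # Service Unavailable
print(alert_severity(6000))   # Warning
print(alert_severity(850))    # Success
```

This is how a single rule can raise a Service Unavailable alert for some parameter values and a Success alert for others, as the Alert page description explains.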
• General. On this page, you can edit the name and description for the group.
• Included Subgroups. (This page displays a list of the subgroups within this group.) On
this page, you can add and remove subgroups. Click the Add button to open the Add
Subgroup dialog box, select an existing computer group, and then click OK to move it
from its current position in the computer groups hierarchy to become a child of the
current group. To remove a subgroup from the current computer group, select it in the
list on the Included Subgroups page, and then click the Remove button.
• Included Computers. (This page displays a list of the computers within the current
computer group.) On this page, you can add a new computer to the group. Click the
Add button to open the Add Computer dialog box, which shows a list of computers that
have an Operations Manager agent installed. Select the check box next to computers in
the list that you want to add to this group, and then click OK. To add a computer that is
not listed, click New in the Add Computer dialog box, enter the domain name and
computer name, and then click OK. To remove a computer from the current computer
group, select it in the list on the Included Computers page, and then click the Remove
button.
• Excluded Computers. (This page displays a list of the computers that are always
excluded from the current computer group, even if they are listed on the Included
Computers page.) On this page, you can exclude a computer. Click the Add button to
open the same Add Computer dialog box as used on the Included Computers page
(described earlier). Alternatively, click the Search button to open the Computer dialog
box where you can specify computers to exclude using wildcard strings or regular
expressions to match on the domain name or the computer name. Select a computer
on the Excluded Computers page, and then click Edit to edit an existing computer or
Remove to remove the selected computer.
• Search for Computers. On this page, you can specify criteria that select computers to
add to this computer group. You can search for different types of computer (such as
servers, clients, and domain controllers), and use wildcard strings or regular expressions
to match on the domain name or the computer name.
• Formula. On this page, you can specify a formula that selects computers based on the
criteria entered on the Search for Computers page. You can generate the formula using
a range of attributes for the target computers, such as the IP address, subnet, operating
system, fully qualified domain name, and more. You can also use a range of operators
and string matching functions, and select from lists of other computer groups.
• State Rollup Policy. On this page, you can specify how the overall state for a computer
group will reflect the states of individual members of the group. The members can be
the subgroups included within this group and/or the individual computers in the group.
The three options on this page (see Figure 15) are the following:
◦ The worst state of any member computer or subgroup. If you select this option,
Operations Manager will set the State value displayed in the Operator Console
to that specified for the Severity for the worst of the current unresolved alerts
for the members of this group. The alert Severity states range from Success
(best) to Server Unavailable (worst). You can see a list of these states on the
Alert page of the Properties dialog box for any of your existing event rules,
performance rules, or alert rules.
◦ The worst state from the specified percentage of best states in the computer
group. If you select this option, you must specify a percentage that defines the
proportion of the group that will act as the state indicator for the group. Operations
Manager will select a set of members from the group that consists of the
computers with the best health state up to the percentage you specified of the
total group membership. In other words, if there are 10 computers and you
specify 60%, Operations Manager will select the six members of the group that
currently have the least severe state. It then uses the worst (the most severe)
state of the subset it selects as the overall (rolled-up) state for the group, and
displays this in the Operator Console as the State value for this computer group.
◦ The best state of any member or subgroup. If you select this option, Operations
Manager will set the State value displayed in the Operator Console to that
specified for the Severity for the best of the current unresolved alerts for the
members of this group. It is unlikely that you will use this option very often,
because it effectively hides the state of most of the members of the group as
long as one member is performing correctly.
Figure 15
The State Rollup Policy page of the Properties dialog box for a computer group
• Console Scopes. (This page displays a list of the scopes where the current computer
group is used. By default, every group is a member of every scope.) On this page,
administrators can specify custom sets of computer groups for each scope (Operations
Manager Users, Operations Manager Authors, and Operations Manager Administrators)
using the Console Scopes options within the main Administration section of the
Administrator Console.
• Parents. This page displays a list of the parent computer groups for this group, if it is a
child (nested) group.
• Rules. On this page, you can enable and disable the rules in this computer group and its
child subgroups. Select the check box at the top of the page to disable all the rules in
this group and all its child subgroups (if any). The Rules page also shows a list of any
rule groups associated with parent computer groups that this computer group inherits.
At the bottom of the page, a list shows the rule groups already associated with this
computer group, which its child computer groups will inherit. To add a rule group to this
list, click the Add button to open the Select Rule Group dialog box, select the required
rule group, and then click OK. To remove a rule group from the list, select it, and then
click the Remove button.
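To make the State Rollup Policy options concrete, the three policies can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation; the numeric severity scale is an assumption (higher means worse, running from Success at the best end to Server Unavailable at the worst):

```python
import math

# Illustrative severity scale (an assumption, not the product's values):
# higher numbers are worse, from 0 = Success (best) up to
# 5 = Server Unavailable (worst).

def worst_state(states):
    """The worst state of any member computer or subgroup."""
    return max(states)

def best_state(states):
    """The best state of any member or subgroup."""
    return min(states)

def worst_of_best_percentage(states, percent):
    """The worst state from the specified percentage of best states.

    Take the healthiest `percent` of the members, then report the
    worst state found within that subset.
    """
    count = math.ceil(len(states) * percent / 100)
    best_subset = sorted(states)[:count]  # least severe first
    return max(best_subset)

# Ten computers, 60%: the six healthiest members are considered.
states = [0, 0, 1, 1, 2, 2, 3, 4, 5, 5]
print(worst_state(states))                   # 5
print(best_state(states))                    # 0
print(worst_of_best_percentage(states, 60))  # 2
```

Note how the second policy matches the 10-computer, 60% example above: the six least severe members are selected, and the worst state within that subset becomes the rolled-up state.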
After making the required changes to the properties of the computer group, click Apply or OK in
the Properties dialog box. If you want to immediately force the changes through to the
Operations Manager agents on remote computers, instead of waiting for the scheduled update
cycle, right-click the Management Packs entry in the left-side tree view, and then click Commit
Configuration Change.
By default, Operations Manager pushes rule changes to all remote agents every five minutes.
To change this value, right-click Global Settings in the left-side tree view, click Management
Server Properties, click the Rule Change Polling tab, and then select the required value.
For details about how to create a computer group, see the later section, "Create an Operations
Manager 2005 Computer Group and Deploy the Operations Manager Agent and Rules."
Figure 17
The main Properties dialog box for the Global Settings in Operations Manager 2005
The main Properties dialog box contains eleven tabbed pages:
• Notification Command Format. On this page, you can specify a custom application that
you want to execute in response to an operator alert. You can specify the command line
for the application and include placeholders that Operations Manager replaces with
values when it executes the command. These placeholders include the Operator ID you
specify in the Properties dialog box for each operator.
• Knowledge Base Template. On this page, you can edit the HTML template Operations
Manager uses to generate the multi-section knowledge base content for items in a
Management Pack. The template contains placeholders of the form <!section-name>
that indicate where Operations Manager will insert the separate sections of knowledge
content text.
• Database Grooming. On this page, you can specify how Operations Manager will
automatically mark alerts as resolved after a certain period, removing them from the
Operator Console display.
• Operational Data Reports. On this page, you can automatically send to Microsoft
reports about the way you use Operations Manager 2005; this provides valuable
feedback to the development team about typical usage patterns.
• Custom Alert Fields. On this page, you can change the names of the five custom fields
displayed in alerts. You can use these if you want to add application-specific or
company-specific information to every alert.
• Alert Resolution States. On this page, you can modify the existing alert resolution
states or add new ones. The default states include Acknowledged, Assigned to xxx, and
Resolved. You can also specify the service level interval within which each state should
be resolved, the shortcut key assigned to this state, and whether users can set the state
within the Operator Console and the Web Console.
• Email Server. On this page, you can configure the settings used to send e-mail alerts
through your SMTP mail server.
• Licenses. On this page, you can manage the number of management licenses for
remote managed clients.
• Web Addresses. On this page, you can specify the URL of the Operations Manager Web
Console and the URL used for online product knowledge (the default is the Microsoft
Support Web site). You can also specify custom Web addresses for your file server (for
transferring files to clients), and for your company knowledge base.
• Communications. On this page, you can specify the port Operations Manager uses for
encrypted communication with remote managed computers.
• Security. On this page, you can specify features of the authentication and the response
execution for the Operations Manager server and remote managed computers.
After making the required changes to the global settings, click Apply or OK in the Properties
dialog box.
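Both the Notification Command Format and Knowledge Base Template pages rely on placeholder substitution: Operations Manager replaces tokens in a command line or HTML template with live values. The sketch below illustrates the general idea using the <!section-name> token form mentioned above; the section names and the expand_placeholders function are invented for illustration:

```python
import re

def expand_placeholders(template, values):
    """Replace <!name> tokens with supplied values.

    Unknown tokens are left intact. The <!name> form follows the
    knowledge base template syntax described above; the section
    names used below are invented for illustration.
    """
    def substitute(match):
        return str(values.get(match.group(1), match.group(0)))
    return re.sub(r"<!([A-Za-z-]+)>", substitute, template)

template = "<h2>Summary</h2><!summary><h2>Causes</h2><!causes>"
sections = {
    "summary": "Disk queue length exceeded the threshold.",
    "causes": "A failing disk or a runaway process.",
}
print(expand_placeholders(template, sections))
```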
In the Management Servers Properties dialog box and the Agents Properties dialog box, you
can fine-tune the behavior of the Operations Manager server and the Operations Manager
agents installed on the Operations Manager server and on remote computers. You will usually
not need to change these settings, and this chapter does not describe them in detail. For more
information, examine the Operations Manager 2005 Help file or click the Help button in the
relevant Properties dialog box.
After you finish editing your Management Pack(s) and settings, you can turn off
Authoring mode. Right-click Rule Groups in the left-side tree view, click Disable
Authoring mode, and then click Yes in the confirmation dialog box.
Figure 20
Specifying the State Rollup Policy for a computer group
11. Click Next to open the Confirmation page, which provides a summary of the options
you have set in the wizard. To change any settings, click the Back button to return to
the relevant page.
12. If you are happy with the settings shown, click Next, and then click Finish. The new
computer group appears in the Administrator Console tree view. If you specified any
existing groups as subgroups of the new group, they move to appear under the new
group in the tree view.
To associate a rule group with a computer group and deploy the rules
1. In the left-side tree view of the Administrator Console, expand the list to show the
Rule Groups entry (which is under the Management Packs entry). If the tree-view
pane is not visible, click Customize on the View menu. In the Customize View dialog
box, select the Console tree check box, and then click OK.
2. Expand the list of rule groups, and then right-click the group of rules you want to
deploy to a specific set of computers. On the shortcut menu, click Associate with
Computer Group to open the Properties dialog box for this rule group with the
Computer Groups page selected. Alternatively, you can right-click the rule group,
select Properties, and then select the Computer Groups tab.
3. On the Computer Groups page, click Add to open the Select Item dialog box. Select
the computer group to which you want to deploy the rules in this rule group, and
then click OK. Repeat the process if you want to deploy the rules to more than one
computer group.
4. Back in the Properties dialog box for the rule group, click OK.
5. If you want to immediately force the rules in this rule group through to the
Operations Manager agents on remote computers, instead of waiting for the
scheduled update cycle, right-click Management Packs in the left-side tree view, and
then click Commit Configuration Change.
By default, Operations Manager pushes rule changes to all remote agents every five
minutes. To change this value, right-click Global Settings in the left-side tree view, click
Management Server Properties, click the Rule Change Polling tab, and then select
the required value.
• Create a top-level computer group that includes all the computers that will execute the
application, and which you want to monitor. If the application has distinct separate
sections, such as separate Web services running on different computers or separate
groups of servers that may be in use at different times, create separate child rule
groups for each set of computers within a parent (top-level) rule group.
• Use the state rollup options for the top-level computer group to specify the overall
state for all the computers involved in the application, so the console displays the
appropriate state indication to operators. Use the appropriate severity settings for each
rule to represent the three basic states RED ("failed" or "unavailable"), YELLOW
("degraded"), and GREEN ("working normally" or "available").
• Combine the state of each subgroup using the same approach as for the top-level
group, so operators can drill down, monitor, and see the state of individual components
or sections of the application. This makes diagnosis of problems easier.
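The mapping from alert severity to the three basic states can be pictured as a simple lookup. The sketch below is illustrative only; the exact severity names that map to each state depend on how you configure the severity settings for your rules:

```python
def traffic_light(severity):
    """Map an alert severity to one of the three basic states.
    The severity strings are examples; adjust to match your rules."""
    red = {"Critical Error", "Server Unavailable"}
    yellow = {"Error", "Warning"}
    if severity in red:
        return "RED"      # failed or unavailable
    if severity in yellow:
        return "YELLOW"   # degraded
    return "GREEN"        # working normally

print(traffic_light("Critical Error"))  # RED
print(traffic_light("Warning"))         # YELLOW
```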
Additional Management Packs allow you to detect faults in the underlying infrastructure, such as
performance degradation or core operating system service failures, and monitor services such
as Microsoft Exchange and SQL Server.
This section includes procedures for using both the Operator Console and the Web Console. The
Operator Console allows you to view the state of an application and drill down to see details of
the events, alerts, performance counters, and computers that run the application. The Web
Console has less functionality, but it can still be of great use to operators, particularly when the
Operator Console is not installed.
To view state information, alerts, events, and computers in Operations Manager 2005
using the Operator Console
1. Open the Operations Manager 2005 Operator Console, and use the Group: drop-
down list at the top of the window to select the computer group for which you want
to view information. Click the State link in the Navigation pane at the lower-right section
of the window to show the overall health state for the application you selected in
the Group: drop-down list (see Figure 21).
Figure 21
The State view of an application in the Operations Manager 2005 Operator Console
If you cannot see all of the panes shown in Figure 21, on the View menu, select the
pane you want to open (Navigation Pane or Detail Pane).
2. Figure 21 indicates that the overall state for this computer group (all the computers
running this application) is Critical Error. The lower section of the window shows the
computers in this computer group (in this case, there is only one), and indicates the
total number of open or unresolved alerts, and the total number of events.
3. Click the Alerts link in the navigation pane to see all the open alerts for the computer
group. The lower window now shows details of the selected alert, including the
properties (field values) of the event or counter threshold that caused the alert (see
Figure 22). The Alert Details section in the lower window also displays the product-
specific and company-specific knowledge for the rule that detected the problem.
This knowledge assists in diagnosing, resolving, and verifying resolution of the
problem that originally caused the alert (see Figure 23).
Figure 22
The list of all alerts for the computer group and details about the selected alert
Figure 23
Viewing the product knowledge for the alert
4. To view only the alerts for a specific computer within the computer group, go back
to the State view and double-click an alert in the State upper window for the
computer you want to view, or double-click the computer in the State Details lower
window. You see the same view as in Figure 23, but it contains a list of only the
alerts for the selected computer.
5. Click the Events link in the navigation pane to see all the events from the Windows
Event Log for computers in the computer group. The list shows the domain and
computer names, and the lower window contains the values of the event fields for
the event selected. You can view a list of alerts raised by this event on the Alerts
tabbed page in the lower window, and the parameters of the event on the
Parameters tabbed page (see Figure 24).
Figure 24
The list of events for all computers within the selected computer group
Right-click the upper window in any view, and then click Personalize View to select the
columns displayed in the list or to change the order of the columns.
6. Click the Performance link in the navigation pane to see a list of all the computers
within the currently selected Group: scope. Select a computer in the list, and then
click the Select Counters button to display a list of all the performance counters for
that computer. This includes the standard operating system counters implemented
by the built-in Management Packs in Operations Manager 2005, such as processor
usage and elapsed time (see Figure 25).
Figure 25
Selecting a performance counter to view
7. Select the check boxes next to the counters you want to view results for, and then
click the Draw Graph button. In Figure 26, you can see the results for the
WSTransport Service counter implemented in an example application.
Figure 26
A chart showing performance counter data samples collected by Operations Manager
8. Click the Computers and Groups link in the navigation pane to see a list of all the
subgroups within the current group (the group selected in the Group: drop-down list
at the top of the window) and the state of each one. Double-click a subgroup to
navigate to that group and view the state and details of the group.
9. Click the Diagram link in the navigation pane to see a schematic diagram of the
current computer group, its subgroups, and the computers within each group. It also
displays the current health state of each group and computer (see Figure 27). This
makes it easy for operators to grasp visually the overall state of the application and
the individual components.
Figure 27
A computer group in Diagram view showing the state of each computer
10. Double-click a computer (not a computer group) in the right window in Diagram
view to switch to Alerts view for that computer.
11. You can use the My Views link in the navigation pane to create custom views of the
monitoring information. You can also define custom Public Views for viewing in the
Operator Console using the Console Scopes section within the Administration
section of the Administrator Console. For more details, see the Operations Manager
Help file.
Operations Manager 2005 also installs a Web-based Operator Console. While this has fewer
features, it can be used for remote monitoring and problem diagnosis from locations outside
your own network.
To use the Web Console for remote monitoring and problem diagnosis
1. To open the Web Console from the Administrator Console, select the Microsoft
Operations Manager entry directly below the console root in the left-side tree view.
On the right-side Home page, click the Start Web Console link in the Operations
section of the page.
2. To discover the URL of the Web Console, expand the Administration section of the
left-side tree view in the Administrator Console, and then select the Global Settings
entry. Double-click Web Addresses in the right window to see the Web Console
Address. This is, by default, a non-standard port on the local computer, such as
http://machine-name:1272. Enter this URL into your Web browser.
The Web Console provides three views of the monitoring information: Alerts, Computers, and
Events. These are very similar to the views you see in the Operator Console. For example, Figure
28 shows the Alerts view in the Web Console. You can select an alert and view the properties,
events, knowledge, and history just as you can in the Operator Console.
Figure 28
The Alerts view in the Operations Manager 2005 Web Console showing the product knowledge
Figure 29
Viewing the Management Group Agents report for an Operations Manager management group
The other two links on the Microsoft Operations Manager Reporting page open submenus
containing a range of other pre-defined reports. The Operational Data Reporting page contains
links to view all alerts and events, as well as general health reports and a report listing any
script or response errors.
The Operational Health Analysis page contains a number of more detailed reports that drill
down into the operational history of the management group. These include analysis of alerts,
events, and performance by type, severity, time, frequency, and computer group. You can also
view reports on the association between rule groups, computer groups, and individual
computers.
Summary
Management Packs can be a very useful tool for the operations team in managing applications.
This chapter demonstrated how to create and import Management Packs in Operations
Manager 2005 and then showed how to edit the Management Packs to provide the functionality
required when monitoring an application.
Chapter 17
Creating and Using System Center
Operations Manager 2007
Management Packs
Chapter 16 of this guide described creating and authoring Management Packs in Microsoft
Operations Manager 2005. This chapter describes how to perform the same tasks using System
Center Operations Manager 2007, discussing the same scenarios for creating and using
Management Packs.
The Transport Order application is used as a running example throughout this chapter. This
application forms part of the shipping solution in the Northern Electronics worked example
used throughout this guide.
4. Use the MPConvert tool to convert the XML file into an Operations Manager 2007
Management Pack file. The syntax is the following:
mpconvert [folder_name\]source_file.xml [destination_folder\]new_filename.xml
5. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Administration
button. If the navigation pane is not visible, click Navigation Pane on the View
menu.
6. In the left-side tree view, right-click Administration (at the top of the tree), and then
click Import Management Pack(s).
7. In the Select Management Pack(s) to import dialog box, select the .mp or .xml file
for the Management Pack you want to import. You can hold down the SHIFT or CTRL
keys while clicking to select more than one file.
Files with the .mp file name extension are Sealed Management Packs that you cannot
edit. Files with the .xml file name extension are Unsealed Management Packs that you
can edit.
8. Operations Manager imports the Management Packs you selected and installs them.
A dialog box reports the results, indicating any that it cannot import. When you close
the dialog box, Operations Manager 2007 begins monitoring; it collects the same
data as in Microsoft Operations Manager 2005.
Figure 1
The Distributed Application Designer
8. If you selected the Blank (Advanced) option in the Distributed Application Designer
dialog box, you will see an empty designer surface. To add items to the designer,
click the Add Component button in the toolbar at the top of the window to open the
Create Component Group dialog box, where you specify the type of component you
want to add. Enter a name for the new component, and select the Objects of the
following type(s) option. Then select the component type in the tree view at the
bottom of the Create Component Group dialog box.
The list contains a wide selection of possible component types. For a Web-based
application or Web service, expand the Application Component node of the tree view
to see components such as Database and Web Site. For a Windows-based application,
expand the Local Application node of the tree view and then expand the Windows
Local Application node to see the various types of user and local application types.
These include Health Service components such as a Management Server, Notification
Server, Windows Cluster Service, Windows Local Service, and Windows User
Application.
9. To create a relationship between the items you add to the designer, click the Create
Relationship button in the toolbar at the top of the window, click the source item in
the relationship, and then click the target item. This creates a relationship such that
the source item "uses" (depends on) the target item and the arrow points towards
the target item. Click the Create Relationship button again to switch out of the
Create Relationship mode and return to the normal "arrow" mouse pointer.
You use component groups and relationships to separate the sets of rules for each
component into logical groups that correspond to the separation between the
components of the application. You can apply rollup rules to the overall health state of
the component groups to generate the appropriate health state indication at higher
levels of the application structure.
10. Click the Save button in the toolbar at the top of the window and close the
Distributed Application Designer window.
To create a new monitoring group in the Operations Manager 2007 Operations Console
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left pane of the main window to show the Groups node
if it is not already visible. Right-click the Groups node, and then click Create a new
Group.
3. In the Create Group Wizard dialog box, enter a name for the group and type in a
description that will help administrators and operators to identify the group.
4. Select the Management Pack to which you want to add the new group in the drop-
down list at the bottom of the dialog box. If you have not already created a
Management Pack, click the New button and follow the instructions in the earlier
procedure, "To create a new Management Pack in the Operations Manager 2007
Operations Console."
5. Click Next in the Create Group Wizard dialog box to show the Choose Members from
a List page. On this page, you can explicitly choose the members for the new group.
To add a member, click the Add/Remove Objects button to open the Object
Selection dialog box. Select the type of entity you want to add in the Search for
drop-down list or leave the list set to Entity to search for all suitable objects. Enter
all or part of the name of the items you want to find in the Filter by part of name
text box or leave it blank to search for all items of the selected type.
6. Click the Search button to display the items that match your selection in the
Available items list. The list shows the available entities (computers, databases,
sites, and applications) based on a range of features, such as the name, operating
system, or status within the Operations Manager 2007 environment (such as
Notification Server). Select the individual items you want to add, and then click the
Add button. You can hold down the SHIFT and CTRL keys while clicking to select
more than one item. To remove an item selected in the Selected objects list, click
the Remove button.
7. Click OK in the Object Selection dialog box to return to the Create Group Wizard
dialog box, and then click Next to show the Create a Membership Formula page. On
this page, you can create rules and use formulae to automatically select computers
to add to the new group. Click the Create/Edit rules button to open the Query
Builder dialog box, and then select the type of items you want to add to the group in
the drop-down list at the top of the window.
8. Click the Add button to create a row in the grid where you can specify an expression
for selecting items. In the first column of the conditional expression row, select a
property for the item you added, such as the Display Name, and then select the
criteria for matching the property in the second column of the grid. You can use a
range of criteria, such as partial and full string matching on the name, wild-card
string matching, and regular expressions. Enter the criteria value for this row in the
third column of the grid. Then repeat the process to add more conditional
expressions to the grid as required.
Clicking the Insert button adds a conditional expression row to the grid. However, if
you click the small "down arrow" next to the Insert button, you can create a series of
AND and OR groups containing conditional expressions. Select an expression row and
click the Formula button to view the conditional expression for that row, or click the
Delete button to remove any row from the grid.
9. After you create any rules you require for selecting objects, click Next in the Create
Group Wizard dialog box to show the Choose Optional Subgroups page. On this
page, you can select other groups you have already created to build a hierarchy of
groups that allows you to use rollup rules to expose the health state of the group
members as a whole. Click the Add/Remove Subgroups button to open the Group
Selection dialog box, and enter any part of the name of the group(s) you want to add
in the text box at the top of the window. If you want to see a list of all groups, leave
the text box empty.
10. Click the Search button, and the Available items list shows all available groups.
Select the groups you want to add as children of the new group, and then click the
Add button. You can hold down the SHIFT and CTRL keys while clicking to select
more than one item. To remove an item selected in the Selected objects list, click
the Remove button.
11. Click OK in the Group Selection dialog box to return to the Create Group Wizard
dialog box, and then click Next to show the Specify Exclude List page. Here, you can
specify any objects that the rules set up earlier in the Create Group Wizard would
otherwise include, but that you do not want in the group.
12. Click the Exclude Objects button to open the Object Exclusion dialog box, and select
the type of entity you want to exclude in the Search for drop-down list or leave the
list set to Entity to search for all suitable objects. Enter all or part of the name of the
items you want to find in the Filter by part of name text box or leave it blank to
search for all items of the selected type.
13. Click the Search button to display the items that match your selection in the
Available items list, select the individual items you want to exclude, and then click
the Add button to add them to the Selected objects list. You can hold down the
SHIFT and CTRL keys while clicking to select more than one item. To remove an item
selected in the Selected objects list, click the Remove button.
14. Click OK in the Object Exclusion dialog box to return to the Create Group Wizard
dialog box, and then click Create. The new group appears in the Groups list in the
Operations Console.
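The membership formula built in steps 7 and 8 amounts to a small query over object properties: each conditional expression row is a (property, operator, value) triple, and AND/OR groups combine the rows. The following Python sketch illustrates this idea only; the property names, operator set, and sample computers are invented for illustration:

```python
import fnmatch
import re

def row_matches(obj, prop, op, value):
    """Evaluate one conditional expression row against an object.
    The operator names here are illustrative, not the product's list."""
    actual = obj.get(prop, "")
    if op == "equals":
        return actual == value
    if op == "contains":
        return value in actual
    if op == "wildcard":
        return fnmatch.fnmatch(actual, value)
    if op == "regex":
        return re.search(value, actual) is not None
    raise ValueError("unknown operator: " + op)

def in_group(obj, rows):
    """An AND group: every row must match. An OR group would use any()."""
    return all(row_matches(obj, p, op, v) for p, op, v in rows)

# Hypothetical computer objects and a two-row AND formula.
computers = [
    {"DisplayName": "WEB01.northern.local", "OS": "Windows Server 2003"},
    {"DisplayName": "SQL01.northern.local", "OS": "Windows Server 2003"},
]
rows = [("DisplayName", "wildcard", "WEB*"), ("OS", "contains", "Server")]
members = [c["DisplayName"] for c in computers if in_group(c, rows)]
print(members)  # ['WEB01.northern.local']
```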
To create a new rule for a group in the Operations Manager 2007 Operations Console
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left pane of the main window to show the Rules node
(which is under the Management Pack Objects node) if it is not already visible, and
then click the Rules node to select it. The main window shows a list of all the rules
installed in Operations Manager 2007, grouped by type.
3. Click the Change Scope hyperlink in the small notification area above the list to open
the Scope MP Objects by target(s) dialog box. You can use this feature to limit the
list of items to those within a particular scope (such as a Management Pack, group,
or distributed application), which makes it easier to find and work with the rules and
other objects you create.
4. Type part of the name of the group or Management Pack you want to scope in the
Look for text box at the top of the Scope MP Objects by target(s) dialog box. The list
changes to show the matching items, and their check boxes are selected. To see all the
items, select the View all targets option button, select
the check boxes of any other targets you want to include, and then click OK.
Alternatively, use the Look for text box and the Find Now button below the
notification area to select specific rules that match a search string.
5. Right-click the Rules node in the left-side tree view, and then click New rule, or click
the New rule link on the toolbar or in the Actions window at the right of the main
window to start the Create Rule Wizard. If you cannot see the Actions window, click
Actions on the View menu.
6. The Select a Rule Type page of the Create Rule Wizard allows you to select the type
of rule you want to create. You can create an alert generating rule based on an
event; a collection rule based on an event, a performance counter, or a probe; or a
timed command that executes a command or a script. Figure 2 shows the rule type
selection page of the wizard.
Figure 2
The different types of rule available in the Create Rule Wizard
• For an alert generating rule, you can select the following:
◦ Generic CSV Text Log (Alert). This rule type matches against the entries stored
in a Comma-Separated-Values log file and generates an alert when a value that
you specify using a pattern matches an entry in the log file.
◦ Generic Text Log (Alert). This rule type matches against the entries stored in a
generic text log file and generates an alert when a value that you specify using a
pattern matches an entry in the log file.
◦ NT Event Log (Alert). This rule type matches against the properties of events in
Windows Event Log of the monitored computers and generates an alert when a
matching event occurs. You can match on any of the fields of an event log entry,
such as the name, computer name, event number, category, and description.
◦ SNMP Trap (Alert). This rule type listens for events generated by specific classes
and traps of an SNMP provider on the monitored computers and generates an
alert when a matching event occurs.
◦ Syslog. This rule type matches against syslog entries forwarded to the
monitored computers, and generates an alert when a matching event occurs.
You can match on any of the values in the incoming syslog entry.
◦ WMI Event (Alert). This rule type uses a Windows Management Instrumentation
(WMI) query within a namespace you specify, which runs at intervals you define,
to query WMI objects and generate an alert when a query match occurs.
• For a collection rule, you can select from three categories of rule type, located in the
three folders named Event Based, Performance Based, and Probe Based. The collection
rule types are the following:
◦ Generic CSV Text Log (event-based rule). This rule type collects and logs to the
Operations Manager database entries stored in a comma-separated values log
file, using pattern matching to locate entries in the log file.
◦ Generic Text Log (event-based rule). This rule type collects and logs to the
Operations Manager database entries stored in a generic text log file, using
pattern matching to locate entries in the log file.
◦ NT Event Log (event-based rule). This rule type collects and logs to the
Operations Manager database events occurring in the Windows Event Log of the
monitored computers.
◦ SNMP Event (event-based rule). This rule type collects and logs to the
Operations Manager database events from a specified SNMP provider on the
monitored computers.
◦ SNMP Trap (Event) (event-based rule). This rule type collects and logs to the
Operations Manager database event traps from a specified SNMP provider on
the monitored computers.
◦ Syslog (event-based rule). This rule type collects and logs to the Operations
Manager database syslog entries forwarded to the monitored computers.
◦ WMI Event (event-based rule). This rule type uses a WMI query within a
namespace you specify, which runs at intervals you define, to collect and log
results to the Operations Manager database.
◦ SNMP Performance (performance-based rule). This rule type collects and logs to
the Operations Manager database performance counters exposed by a specified
SNMP provider on the monitored computers.
◦ WMI Performance (performance-based rule). This rule type collects and logs to
the Operations Manager database performance counters exposed through WMI
on the monitored computers.
◦ Windows Performance (performance-based rule). This rule type collects and
logs to the Operations Manager database values from Windows performance
counters defined on the monitored computers.
◦ Script (Event) (probe-based rule). This rule type collects and logs to the
Operations Manager database details of events that cause a specified script to
run when a matching event occurs on the monitored computers.
◦ Script (Performance) (probe-based rule). This rule type collects and logs to the
Operations Manager database values of performance counters that cause a specified
script to run when a matching event occurs on the monitored computers.
• For a timed command, the rule types are the following:
◦ Execute a Command. This rule type runs a specified command using the
Operations Manager Windows command shell at the intervals you specify.
◦ Execute a Script. This rule type runs a specified script, either VBScript or JScript,
at the intervals you specify.
7. While you are still on the Select a Rule Type page, select the Management Pack to
which you want to add the new rule in the drop-down list at the bottom of the
dialog box. If you have not already created a Management Pack, click the New
button and follow the instructions in the earlier procedure, "To create a new
Management Pack in the Operations Manager 2007 Operations Console."
8. Click Next, and enter a name for the new rule and a description that will help
administrators and operators to identify the rule. Then click the Select button to
open the Select a Target Type dialog box. This dialog box shows a list of all the types
of object to which you can apply the new rule. Type part of the name of the entity
(group, computer, or Management Pack) you want to apply the rule to in the Look
for text box at the top of the dialog box. The list changes to reflect matching items
and the check boxes of these matching items become selected. To see all the
available entities, select the View all targets option button, select the check boxes of
any other targets you want to include, and then click OK.
9. Make sure that the Rule is enabled check box is selected (unless you want to create
the new rule but not enable it yet), and then click Next. The page you see next
depends on the type of rule you are creating:
◦ For a rule that uses a Generic Text (or CSV) Log as its source, you see the
Application Log Data Source page where you specify the source log file path and
name, and the pattern you want to use to match values in the log file. You can
also specify if the log file is in UTF8 format instead of the more usual UTF16
format. Then click Next to show the Build Event Expression page, where you
specify how the rule will map to values your pattern selects from the log file.
You can use a range of criteria, such as partial and full string matching on the
name, wild-card string matching, and regular expressions. Enter the criteria
value for this row in the third column of the grid. Click the Insert button to add
more conditional expressions to the grid as required.
Clicking the Insert button adds a conditional expression row to the grid.
However, if you click the small "down arrow" next to the Insert button, you
can create a series of AND and OR groups containing conditional expressions.
Select an expression row and click the Formula button to view the conditional
expression for that row, or click the Delete button to remove any row from
the grid.
◦ For an NT Event Log or an NT Event Log (Alert) rule, you see the Event Log Name
page where you specify the source event log (such as Application, System, or
Security). Click the ellipsis button (...) to open a dialog box where you can select
a computer, and then select from the list of all available Windows Event Logs on
that computer. Click OK to return to the Event Log Name page, and then click
Next to show the Build Event Expression page where you specify how the rule
will map to events in the Windows Event Log. You can match on the standard
event properties, or use a numbered parameter, and specify a conditional
expression to match to that property value. You can use a range of criteria, such
as partial and full string matching on the name, wild-card string matching, and
regular expressions (see Figure 3). Enter the criteria value for this row in the
third column of the grid. Click the Insert button to add more conditional
expressions to the grid as required.
Figure 3
Specifying the mapping between a Windows Event and an event rule
◦ For a rule that uses SNMP as its data source, you see an SNMP object identifier
configuration page. Here, you must specify the community string that identifies
the SNMP provider. If you are creating a collection rule, you can
also change the collection frequency using the drop-down list on this page. Then
specify the object identifier properties for each property you want to access.
Alternatively, if you are creating an alert generating rule, you can select the All
Traps check box.
◦ For a rule that uses a forwarded syslog entry as its data source, you see a Build
Event Expression page similar to that for the NT Event Log rule types. You can
use a range of criteria to match the value your pattern selects from the log
entry, such as partial and full string matching on the name, wild-card string
matching, and regular expressions. Enter the criteria value for this row in the
third column of the grid. Click the Insert button to add more conditional
expressions to the grid as required. Click the small "down arrow" next to the
Insert button to create AND and OR groups containing conditional expressions.
◦ For a rule that uses WMI as its data source, you see the Configure WMI Settings
page. Here, you specify the WMI namespace and the query. You can change the
polling interval using the drop-down list in this page.
◦ For a Windows performance rule, you see the Performance Object, Counter, and
Instance page. Click the Browse button to display the Select Performance
Counter dialog box and select the source computer, the performance counter
object (either a built-in object such as .NET CLR Data or your application
performance counter object), and the actual counters contained in this counter
object. Click the Explain button to see the explanatory text for the selected
counter. Then click OK to automatically populate the Object, Counter, and
Instance text boxes. Alternatively, you can use pattern matching strings for
these values to select multiple counters. You can also select the check box below
the text boxes to specify that the rule should include all instances of the
specified counter. Finally, change the collection Interval settings as required,
and click Next to show the Optimized Performance Collection Settings page.
Here, you must specify a tolerance for changes in the sample values collected
from the data source. Low tolerance (low optimization) means that even small
changes in the values will cause Operations Manager to create a database entry,
while high tolerance (high optimization) stores less data, providing less granular
information on changes in performance. You can also specify an
absolute tolerance value or a percentage (see Figure 4).
Figure 4
Specifying the Optimized Performance Collection Settings
◦ For a Script (Event) or a Script (Performance) rule, you see the Schedule page,
where you specify how often the script should execute. The default is every 15
minutes, and you can enter a specific synchronization time from which the
intervals are measured. Then click Next to open the Script page, where you
enter the name of the script to execute, specify the script timeout, and select
the language (VBScript or JScript). Then edit the script in the window or click the
Edit in full screen button, and then type (or copy and paste) the script you
require. If your script requires parameters, click the Parameters button and
enter the parameter names. You can click the Target button next to the
Parameters list to insert property placeholders such as the display name or ID of
the computer or management group. Click OK and, back on the Script page, click
Next. If you are creating a Script (Event) rule, you see the Event Mapper page.
Use the ellipsis buttons (...) next to each text box to specify the Computer, Event
source, Event log, Event ID, Category of the event that will cause the script to
execute, and select the Level (such as Information, Warning, or Error) from the
drop-down list. If you are creating a Script (Performance) rule, you see the
Performance Mapper page. Use the ellipsis buttons (...) next to each text box to
specify the Object, Counter, Instance, and Value of the counter that will cause
the script to execute.
◦ For the Execute a Command rule, you see the Specify your Schedule Settings
page, where you can specify execution of the command or script on a simple
recurring interval basis or create a weekly schedule to execute the command or
script. After creating a suitable schedule, click Next to show the Configure
Command Line Execution Settings page. Here, you specify the full path and
name of the program to execute and any parameters you want to pass to that
program. You can click the arrow button next to the Parameters text box to
insert value placeholders, such as the display name or ID of the computer or
management group. In the Additional settings section of this page, you can
specify the working directory for the program, whether to capture the program
output, and the timeout for program execution.
◦ For the Execute a Script rule, you see the Specify your Schedule Settings page,
where you can execute the command or the script at a simple recurring interval,
or you can create a weekly schedule to execute the command or script. After
creating a suitable schedule, click Next to show the Script page. This is the same
page that appears for the Script (Event) or a Script (Performance) rules discussed
earlier and allows you to create the script to execute for this rule.
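The Optimized Performance Collection Settings described for the Windows performance rule store a new sample only when it differs sufficiently from the last stored value. The following sketch (in Python, purely for illustration; the function name and the exact algorithm Operations Manager uses internally are assumptions) conveys the idea for both absolute and percentage tolerances:

```python
def optimized_samples(values, tolerance, percentage=False):
    """Keep only samples that differ from the last *stored* sample
    by more than the tolerance (absolute value or percentage)."""
    stored = []
    for v in values:
        if not stored:
            stored.append(v)  # always store the first sample
            continue
        last = stored[-1]
        limit = abs(last) * tolerance / 100.0 if percentage else tolerance
        if abs(v - last) > limit:
            stored.append(v)  # change exceeds tolerance: create an entry
    return stored
```

A low tolerance stores almost every change; a high tolerance discards small fluctuations and stores far fewer database entries.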
10. If you are creating an alert-generating rule, you now see the Configure Alerts page.
On this page, you must specify the Name, Description, Priority, and Severity of the
alert that the rule will generate. Select Low, Medium, or High in the Priority drop-
down list, and Warning, Information, or Critical in the Severity drop-down list. If you
want to suppress repeated occurrences of this alert, click the Alert suppression
button to open the Alert Suppression dialog box, and select the check boxes next to
the fields of the source event that must have identical values for the alert to be
considered as a duplicate and suppressed.
You can use custom fields to pass values from event rules to alerts and monitors. Click
the Custom alert fields button and enter the values for any of these fields you want to
use, or click the ellipsis button (...) next to a field text box and select a value from the
target entity or the source alert in the lists available in the Alert Description dialog box
that appears.
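Alert suppression, as described in step 10, treats a new alert as a duplicate when every field you selected in the Alert Suppression dialog box has an identical value in an existing alert. A minimal sketch of that rule (the function name and dictionary shape are illustrative assumptions):

```python
def is_duplicate(existing_alerts, new_alert, suppression_fields):
    """Treat the new alert as a duplicate (to be suppressed) when every
    field selected for suppression has an identical value in some
    existing unresolved alert."""
    return any(
        all(alert.get(f) == new_alert.get(f) for f in suppression_fields)
        for alert in existing_alerts
    )
```

Fields not selected for suppression (such as a timestamp) are ignored when deciding whether two alerts match.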
11. Click Create on the final page of the Create Rule Wizard and the new rule appears in
the list in the main window of the Operations Console.
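The Build Event Expression pages in the procedure above define a grid of conditional expressions, optionally nested in AND and OR groups. The following sketch shows how such a grid evaluates against an event; the field names, operator names, and function are illustrative assumptions, but the wildcard and regular-expression options mirror the criteria mentioned in step 9:

```python
import re

def eval_expression(event, expr):
    """Evaluate a conditional expression row or an AND/OR group.
    A row is (field, operator, value); a group is ("AND"|"OR", [children])."""
    if expr[0] in ("AND", "OR"):
        op, children = expr
        results = [eval_expression(event, c) for c in children]
        return all(results) if op == "AND" else any(results)
    field, operator, value = expr
    actual = str(event.get(field, ""))
    if operator == "equals":
        return actual == value
    if operator == "contains":               # partial string matching
        return value in actual
    if operator == "matches-wildcard":       # * matches any characters
        return re.fullmatch(value.replace("*", ".*"), actual) is not None
    if operator == "matches-regex":
        return re.search(value, actual) is not None
    raise ValueError(operator)
```

For example, an AND group containing a Source match and an OR group over EventID and Description reproduces the kind of rule shown in Figure 3.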
To create a probe monitor in the Operations Manager 2007 Operations Console
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left pane of the main window to show the Management
Pack Templates node if it is not already visible, and then expand this node to show a
list of available templates. Right-click the Management Pack Templates node or one
of the template nodes, and then click Add Monitoring Wizard.
3. On the Select Monitoring Type page of the Add Monitoring Wizard, select the type of
probe monitor you want to create from the list. The four template types are the
following:
◦ OLE DB Data Source. This probe monitor tests the connectivity to any OLE-DB
compliant database at the specified intervals.
◦ TCP Port. This probe monitor sends a "ping" to the specified port on a specified
computer at the specified intervals.
◦ Web Application. This probe monitor sends one or more HTTP requests to a
specified Web site at the specified intervals.
◦ Windows Service. This probe monitor sends commands to a specified Windows
service at the specified intervals.
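The TCP Port probe's "ping" is essentially a timed connection attempt. A minimal sketch in Python (the function name and default timeout are assumptions; in practice the probe is configured entirely through the wizard):

```python
import socket

def probe_tcp_port(host, port, timeout=2.0):
    """Probe a TCP port: report True if a connection can be opened
    within the timeout, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A refused connection, an unreachable host, or a timeout all count as a failed probe.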
4. Click Next to show the General Properties page, and enter a name for the probe
monitor. Enter a description that will help administrators and operators to identify
the monitor. Then select the Management Pack to which you want to add the new
monitor in the drop-down list at the bottom of the dialog box. If you have not
already created a Management Pack, click the New button and follow the
instructions in the earlier procedure, "To create a new Management Pack in the
Operations Manager 2007 Operations Console."
5. The page you see next depends on the type of monitor you are creating:
◦ OLE DB Data Source. For this type of probe monitor, you see a page where you
specify the connection details for the database. You can specify a Simple
Configuration using the Provider name, the IP address or device name, and the
name of the Database. Alternatively, you can select Advanced Configuration
and provide the full connection string. Click the Test button to check the
connection.
◦ TCP Port. For this type of probe monitor, you see a page where you specify the
IP address or device name and the Port number of the target computer you
want to probe. Click the Test button to check the availability of the specified
port.
◦ Web Application. For this type of probe monitor, you see a page where you
specify the URL of the Web application or Web page you want to probe. Click
the Test button to check the availability of the specified URL.
◦ Windows Service. For this type of probe monitor, you see a page where you
specify the service name. Click the ellipsis button (...) to open the Select
Windows Service dialog box, where you can select a computer and see a list of
the available services on that computer. Then go directly to step 9.
6. Click Next and, for all types except the Windows Service monitor, you see the
Choose Watcher Nodes page. This displays a list of all computers running the
Microsoft Operations Manager remote agent. Select the check box next to the
computer(s) that you want to execute this probe monitor. You can execute it from
the Microsoft Operations Manager management server or any of the remote agent-
managed computers in the management group.
7. Use the controls at the bottom of the Choose Watcher Nodes page to change the
frequency at which the probe monitor executes to the required value. The default is
every two minutes.
8. Click Next to see a summary of your settings. If you are creating a Web Application
probe monitor, you can select the check box at the bottom of this page to start the
Web Application Editor, where you can specify exact details of the request, create
group requests, and even record navigation using your Web browser.
9. Click Create to create the new monitor and close the wizard. Then expand the tree
view in the left pane of the main window, and then select the Monitors node (which
is under the Management Pack Objects node) if it is not already visible. The main
section of the window shows a list of all the monitors. You can use the Change Scope
link in the toolbar to limit the list of items to those within a particular scope (such as
a Management Pack, Group, or Distributed Application), which makes it easier to
find and work with the monitors you create.
To create a unit monitor in the Operations Manager 2007 Operations Console
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left pane of the main window to show the Monitors
node (which is under the Management Pack Objects node) if it is not already visible,
and select it. Use the Change Scope link in the toolbar to limit the list of items to
those within the required scope.
3. For each entity (such as a distributed application or group), you can create monitors
for the four categories: Availability, Configuration, Performance, and Security.
Right-click the category node to which you want to add a new monitor, click Create a
Monitor, and then click Unit Monitor to start the Create Monitor Wizard.
4. On the Select a Monitor Type page of the Create Monitor Wizard, select the type of
monitor you want to create. There are many different types available, organized in
folders denoting the type (see Figure 6). These monitor types equate to the rule
types described in more detail in the earlier procedure, "To create a new rule for a
group in the Operations Manager 2007 Operations Console."
Figure 6
Some of the different types of unit monitor you can create
There are five basic types of unit monitor. You can create a monitor that reacts to an event, to
changes in a performance counter value, to the result of executing a custom script, or to an
SNMP event or trap, or one that monitors a Windows service.
For an event monitor, you can detect one or more events that match a specified criteria (a
correlated event), a combination of different events occurring over a specified period, a
missing event that you expect to occur, or a series of repeated events. You can also specify if
the operator must reset the state manually, or if another event or a timer can reset the state.
For a performance monitor, you can detect specific values or specify threshold value ranges,
and expose a two-state (RED and GREEN) or a three-state (RED, YELLOW, and GREEN) health
status. You can also create a baseline performance monitor that measures average
performance over time. This is useful for measuring adherence to service level agreements
(SLAs) and estimating performance capabilities of the application and its individual
components.
For a script monitor, you can expose a two-state or a three-state health status.
For an SNMP monitor, you can detect a combination of different events occurring over a
specified period.
For a Windows service monitor, you can detect changes to the state and operation of the
service.
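The two-state and three-state health statuses described above amount to threshold comparisons. The sketch below assumes a counter where higher values are worse (for example, a queue length); for counters such as free disk space the comparisons would be reversed, and the names are illustrative:

```python
def health_state(value, warning_threshold, critical_threshold):
    """Map a counter value to a three-state health status, assuming
    higher values are worse (for example, a queue length)."""
    if value >= critical_threshold:
        return "RED"      # Critical
    if value >= warning_threshold:
        return "YELLOW"   # Warning
    return "GREEN"        # Healthy
```

A two-state monitor is the same idea with a single threshold separating RED from GREEN.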
5. The following pages of the Create Monitor Wizard collect all the information
required for the specific monitor type you select. Each asks first for the name and
description of the monitor, and the Management Pack to add it to. Then there is a
different series of pages, but all follow the same basic pattern. The first steps help
you to set up the correlation (mapping) between one or more source events,
counters, or script executions and the new monitor:
◦ Event Monitor. For this type of monitor, you specify the name of the correlated
event logs, and expressions that match the events you want to monitor. This is a
similar process to that described in the earlier procedure for creating an event
rule.
◦ Performance Monitor. For this type of monitor, you specify the counter name
and location, and the threshold values. You can also use this type of monitor to
create baseline information (including varying the "learning rate") that indicates
the average performance of the monitored application or its individual
components over long or short business cycles (see Figure 7).
Figure 7
Specifying the threshold and learning cycle values for a baseline performance monitor
◦ Script Monitor. For this type of monitor, you specify the script to execute, and
any parameters it requires.
◦ SNMP Monitor. For this type of monitor, you specify one or more expressions
that match the SNMP traps or probes.
◦ Windows Service Monitor. For this type of monitor, you specify the location and
name of the service you want to monitor.
6. Complete the remaining pages of the Create Monitor Wizard. These pages include
the following:
◦ The Configure Health page that allows you to specify the health states that
Operations Manager will display when the correlated event, counter threshold,
or script execution occurs. You assign a Critical (RED), Warning (YELLOW), or
Healthy (GREEN) health state to each occurrence or value of the correlated
event, counter, or script execution.
◦ The Configure Alerts page that allows you to specify if changes to the state
detected by this monitor will raise an alert to display in the console (and,
optionally, send it to operators as an e-mail or pager message). You also specify
the severity of the alert here.
7. Click Create to create the new monitor and you will see it appear in the list in the
main window of the Operations Console.
8. To add product or company knowledge to a monitor, select it in the list in the main
window, right-click, and then click Properties. Open the Product Knowledge page,
click the Edit button, and enter the required information that helps operators and
administrators to diagnose, resolve, and verify resolution of the problem.
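The baseline performance monitor described earlier (see Figure 7) learns typical behavior over time. The following is only a simplified illustration using an exponentially weighted moving average; the actual baselining algorithm in Operations Manager, and the meaning of its "learning rate," are more sophisticated than this sketch:

```python
def baseline_envelope(samples, learning_rate=0.1, tolerance=0.2):
    """Learn a baseline as an exponentially weighted moving average and
    flag samples that fall outside a tolerance band around it."""
    baseline = samples[0]
    flags = []
    for value in samples:
        low, high = baseline * (1 - tolerance), baseline * (1 + tolerance)
        flags.append(not (low <= value <= high))
        baseline += learning_rate * (value - baseline)  # the "learning rate"
    return flags
```

Steady values stay within the envelope; a sudden spike falls outside it and would change the monitor's health state.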
To create a health rollup monitor in the Operations Manager 2007 Operations Console
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left pane of the main window to show the Monitors
node (which is under the Management Pack Objects node) if it is not already visible,
and select it. Use the Change Scope link in the toolbar to limit the list of items to
those within the required scope, and expand the nodes below the Entity Health
node in the main Operations Console window to see the four categories:
Availability, Configuration, Performance, and Security.
3. If you want to create a rollup monitor that reflects the health state of a complete
distributed application or a top-level group, select the Entity Health node for that
distributed application or group. If you want to create a rollup monitor that reflects
the health state of one of the four categories below the Entity Health node
(Availability, Configuration, Performance, and Security), select that node instead.
4. Right-click the selected node, click Create a monitor, and then click either
Dependency Rollup Monitor or Aggregate Rollup Monitor to start the wizard.
A dependency rollup monitor allows you to specify the rollup policy based on subsets
of computers or components within the same group and specify what state to expose
when monitoring is unavailable or the computers are in maintenance mode
(temporarily disconnected from the monitoring system). An aggregate rollup monitor
simply exposes the best or worst state of all the computers or components within the
group.
5. In the General Properties page of the wizard, enter a name for the monitor and a
description that will help administrators and operators to identify the monitor. Then
click the Select button to open the Select a Target Type dialog box. This dialog box
shows a list of all the types of object to which you can apply the new monitor. Type
part of the name of the entity (group, computer, or Management Pack) you want to
apply the monitor to in the Look for text box at the top of the dialog box. The list
changes to reflect matching items and the check boxes of these matching items
become selected. To see all of the available entities, click the View all targets option
button and select the check boxes of any other targets you want to include.
6. Click OK to close the Select a Target Type dialog box, and select the appropriate
parent monitor that will act as a rollup for this monitor from the list on the main
wizard page.
7. Select the Management Pack to which you want to add the new monitor in the drop-
down list at the bottom of the dialog box. If you have not already created a
Management Pack, click the New button and follow the instructions in the earlier
procedure, "To create a new Management Pack in the Operations Manager 2007
Operations Console." Also make sure that the Monitor is enabled check box is
selected unless you do not want to enable the monitor immediately. Then click Next.
8. If you are creating an aggregate rollup monitor, the Health Rollup Policy page you
see next allows you to specify if the health state exposed by the monitor is that of
the worst state of any member of the group or the best state of any member of the
group. Select the required option, and then go to step 12 of this procedure.
As an example, if you select Worst state of any member while one computer has a
Warning state, one has a Critical state, and the rest have a Healthy state, the
monitor will show Critical. If you select Best state of any member while one computer
has a Warning state, one has a Critical state, and the rest have a Healthy state, the
monitor will show Healthy.
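The aggregate rollup policies in the example above reduce to a severity comparison across all members. A sketch (the SEVERITY ordering and function name are illustrative assumptions):

```python
# Order health states from least to most severe.
SEVERITY = {"Healthy": 0, "Warning": 1, "Critical": 2}

def aggregate_rollup(states, policy="worst"):
    """Aggregate rollup: expose the worst (or best) state of all members."""
    if policy == "worst":
        return max(states, key=SEVERITY.get)
    return min(states, key=SEVERITY.get)
```

Using the example from the note, a group with one Warning, one Critical, and the rest Healthy rolls up to Critical under the worst-state policy and Healthy under the best-state policy.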
9. If you are creating a dependency rollup monitor, the next wizard page contains a
tree-view list of the entities related to the current entity for which you are creating
the monitor. These relationships match those that you (or the template you used in
the Distributed Application Designer) created. You also see all the subgroups within
the current group. Expand the target entity or group for which you are creating a
Monitor, and you see the Entity Health node and the four category nodes,
Availability, Configuration, Performance, and Security. Within each of these nodes
are any Monitors you have already created, and any default monitors created by the
Distributed Application Designer (see Figure 8).
Figure 8
Selecting the target entity for a Dependency Rollup Monitor
10. Select the node for which you want to roll up the state of the members, and click
Next to show the "Configure Health Rollup Policy" page. This page allows you to
specify how the overall state for the group will reflect the states of individual
members of the group. The three options in this page (see Figure 9) are:
◦ Worst state of any member. If you select this option, Operations Manager will
set the State value displayed in the Operations Console to that specified for the
Severity for the worst of the current unresolved alerts for the members of this
group.
◦ Worst state of the specified percentage of members in good health state. If
you select this option, you must specify a percentage that defines the
proportion of the group that will act as the state indicator for the group. Operations
Manager will select a set of members from the group that consists of the
computers with the best health state up to the percentage you specified of the
total group membership. In other words, if there are 10 computers and you
specify 60%, Operations Manager will select the six members of the group that
currently have the least severe state. It then uses the worst (the most severe)
state of the subset it selects as the overall (rolled-up) state for the group, and
displays this in the Operations Console as the State value for this group.
◦ Best state of any member. If you select this option, Operations Manager will set
the State value displayed in the Operations Console to that specified for the
Severity for the best of the current unresolved alerts for the members of this
group. It is unlikely that you will use this option very often, as it effectively hides
the state of most of the members of the group as long as one member is
performing correctly.
Figure 9
Configuring the Health Rollup Policy for a Dependency Rollup Monitor
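The "worst state of the specified percentage of members in good health state" policy described in the list above can be sketched as follows; the names are illustrative assumptions, and the test mirrors the 10-computer, 60 percent example from the text:

```python
# Order health states from least to most severe.
SEVERITY = {"Healthy": 0, "Warning": 1, "Critical": 2}

def percentage_rollup(states, percent):
    """Take the best-health subset covering `percent` of the members,
    then report the worst state found within that subset."""
    count = int(len(states) * percent / 100)  # assumes at least one member selected
    best_subset = sorted(states, key=SEVERITY.get)[:count]
    return max(best_subset, key=SEVERITY.get)
```

With 10 members and 60 percent, the six members with the least severe state are selected, and the worst state among those six becomes the rolled-up state for the group.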
11. In the lower section of the "Configure Health Rollup Policy" page, use the two drop-
down lists to specify what state you want to assume for unavailable members of the
group (members where monitoring has failed, or members in maintenance mode). In
the first drop-down list, specify if the Rollup Monitor should treat a failed member's
state as either a Warning or an Error, or just ignore the failed member. In the second
drop-down list, specify if the Rollup Monitor should treat a member in maintenance
mode as either a Warning or an Error, or just ignore this member.
12. Click Next to show the "Configure Alerts" page (for both a Dependency Rollup
Monitor and an Aggregate Rollup Monitor). Select or clear the check box at the top of
the Alert Settings section of the page to specify if this Monitor will create an alert to
display in the console and send to operators when the health state changes. If you
turn on alerts, use the drop-down list below this check box to specify generation of
an alert for both a Critical state and a Warning state, or just for a Critical state. If
you require the alert to be automatically resolved when the monitor returns to a
Healthy state, select the check box below the drop-down list.
13. In the Alert Properties section of the page, enter a name for the Alert and a
description, and then select the Priority and Severity you want to assign to the
alert. The available
values for Priority are Low, Medium, and High. The available values for Severity are
Critical, Warning, and Information.
14. Click Create and you will see the new Monitor appear in the list in the main window
of the Operations Console.
15. To add product or company knowledge to a monitor, select it in the list in the main
window, right-click, and select Properties. Open the Product Knowledge page, click
the Edit button, and enter the required information that helps operators and
administrators to diagnose, resolve, and verify resolution of the problem.
To edit a rule
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left-hand pane of the main window to show the Rules
node (which is under the Management Pack Objects node) if it is not already visible,
and select it. Use the Change Scope link in the toolbar to limit the list of items to
those within the required scope.
3. Select the rule you want to edit, right-click it, and then click Properties (or double-
click the rule). The Properties dialog box contains the following tabbed pages:
◦ General. On this page, you can edit the Rule name and the Description.
However, you cannot change the rule target in this dialog box. To enable or
disable this rule, select or clear the Rule is enabled check box.
◦ Configuration. On this page, you can see the details of the source for the rule,
such as an event log, WMI query, or a performance counter. If the details are
available for editing, you will see an Edit button that opens a source type-
specific dialog box that allows you to change the settings for the source of this
rule. If the details are not editable, you will see a View button that opens a
source type-specific dialog box that allows you to view the settings for the
source of this rule.
◦ Configuration (Responses section). The Configuration page also lists any Responses
defined for this rule, such as creating an alert or running a script. If the details are available for
editing, you can click a response in the list and click the Edit button to view and
edit the properties of the selected response. You can also add new responses or
remove existing responses. If the details are not editable, you will see just a
View button that opens a dialog box that allows you to view any properties of
the selected response for this rule.
◦ Product Knowledge and Company Knowledge (for built-in rules). On this page,
you can see the knowledge associated with this rule. Click the Edit button to edit
the company knowledge for a built-in rule.
4. Click OK or Apply to save your changes to the rule properties.
To edit a monitor
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. In the navigation pane, click the Authoring button. If
the navigation pane is not visible, click Navigation Pane on the View menu.
2. Expand the tree view in the left pane of the main window to show the Monitors
node (which is under the Management Pack Objects node) if it is not already visible,
and select it. Use the Change Scope link in the toolbar to limit the list of items to
those within the required scope.
3. Select the monitor you want to edit, right-click it, and then click Properties (or
double-click on the monitor). The Properties dialog box contains the following
tabbed pages:
◦ General. On this page, you can edit the Name and the Description. Although you
cannot change the monitor target in this dialog box, you can select a different
parent monitor if you want a different roll-up monitor to handle state changes
for this monitor. To enable or disable this monitor, select or clear the Monitor is
enabled check box.
◦ Product Knowledge and Company Knowledge (for roll-up monitors). On this
page, you can click the Edit button and edit the product and company-specific
knowledge for this monitor.
◦ Health. On this page, you can specify the health state (Critical, Warning, or
Healthy) for each monitor condition. For example, you can map the Degraded
monitor state to a Warning health state.
◦ Alerting. On this page, you can edit the settings for an alert that this monitor will
generate.
◦ Diagnostic and Recovery. On this page, you can add, modify, and remove
diagnostic and recovery tasks that will execute when the state of the monitor
changes to Critical or Warning. For example, you can configure a script or a
command to execute.
4. Depending on the type of monitor you are editing, you may see other pages in the
Properties dialog box. These include details of the event, script, counter, WMI query,
log file, Windows service, or other source for the monitor. Each allows you to modify
the settings for this monitor source. There are also pages, depending on the monitor
type, for the schedule to execute a script or command, and the actions that reset the
monitor when the state changes.
5. Click OK or Apply to save your changes to the monitor properties.
Figure 12
Viewing the knowledge for an unresolved alert in Active Alerts view
◦ Computers. In this view, the main window shows a list of computers within the
current scope, and the health state of each one—including the monitored
features it supports such as Agent, Management Server, or Windows Operating
System. Right-click a computer, click Open on the shortcut menu, and then
select from the five available views: Alert View, Diagram View, Event View,
Performance View, or State View. You can also open the Health Explorer or a
PowerShell command prompt for the selected computer from here.
◦ Discovered Inventory. In this view, the main window shows the overall state,
display name, and the path to each computer in the current scope. Double-click
a computer to see the properties of that computer. Right-click a computer, click
Open on the shortcut menu, and then select from the four available views: Alert
View, Diagram View, Event View, or Performance View. You can also open the
Health Explorer or a PowerShell command prompt for the selected computer from
here.
◦ Distributed Applications. In this view, the main window shows the distributed
applications in the current scope and the overall state for each one. Right-click
an application, click Open on the shortcut menu, and then select from the five
available views: Alert View, Diagram View, Event View, Performance View, or
State View. You can also open the Health Explorer or a PowerShell command
prompt for the selected application from here.
◦ Task Status. In this view, the main window shows all the tasks that Operations
Manager carried out, such as discovering computers, installing agents, and
executing monitoring probes. The details pane shows the output from each task
as you select it in the main Task Status list. If the details pane is not visible, click
Detail Pane on the View menu. Right-click a task, and then click Health Service
Tasks to see a list of the many tasks you can execute. These include a range of
configuration, probe, discovery, recovery, and execution tasks.
3. The first four of the basic monitoring categories listed in the previous step provide
views containing more detailed information:
◦ Alert view. This shows a list of active alerts for only the selected computer or
application.
◦ Diagram view. This shows a schematic representation of this computer or
application. This is a useful view for understanding the structure of a distributed
application or series of hierarchical groups (see Figure 13). It shows the overall
health state for each component as well as the application as a whole, and you
can expand and collapse the nodes to explore where any problems or
performance issues exist. Right-click any of the components, and then click
Health Explorer to open a window that contains a tree view where you can
explore the individual rules and monitors for the entire application; you can also
see details of the state of each one and the associated knowledge that helps to
verify, diagnose, resolve, and re-verify any problems.
Figure 13
Viewing the schematic structure and state of a distributed application in Diagram
view
◦ Event view. This shows details of the source events for the computer or
application. Right-click an event, and then select Show associated rule
properties to see the rules associated with the event. The details pane shows
the properties of the selected event.
◦ Performance view. This shows the performance counters available for a
computer or application. Select a counter from the list to see a graph of the
values over time (see Figure 14). This window contains commands on the
Actions menu that allow you to select the time range, copy or save the graph
image, and copy the source data to the clipboard for further examination and
analysis. For a baseline counter, you can also pause or restart a collection, or
you can reset the baseline values.
Figure 14
Viewing the history for a performance counter in Performance view
◦ State view. This shows the overall state of the computer or application. This
window shows the state and properties of the selected computer or application,
and contains commands to show the Health Explorer window and view reports.
◦ Other options available in the views listed in steps 2 and 3 allow you to start
Maintenance mode for an application or a computer, or you can create
personalized views with specific columns, grouping and ordering to suit your
requirements.
4. In any view, select a distributed application or a computer and double-click to open
the Health Explorer window. In the left pane tree view, expand the nodes to show
the overall state for the application or computer (the Entity Health node). Within
this node, depending on the structure of your application, you see the rolled-up
health state and the individual category health states for each component. Figure 15
shows the health state for an example distributed application.
Figure 15
The Health Explorer window for a distributed application
5. As you select each node in the Health Explorer tree view, the Knowledge tabbed
page in the right pane shows the product and company-specific knowledge for that
node. The State Change Events tab page shows a list of the events that caused
changes to the state, and the event context.
6. Examine the other views available in Monitoring mode to see a high-level view of the
computers and the applications they are running, and the overall health state of
each one. Expand the nodes in the tree view in the left pane of the main Operations
Console window for the category of information you are interested in. Available
categories include agentless monitored computers, Windows client computers,
Windows Server computers, Web applications, network devices, Operations
Manager itself, and any application groups you have created. Select the State node
within a category group to see an overall view of the state for that category. You can
then right-click entries in the main window to see the different views (described in
step 3), or open Health Explorer or the PowerShell command prompt.
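The PowerShell command prompt mentioned in these views gives access to the Operations Manager 2007 Command Shell cmdlets. As a sketch only (not a procedure from this guide), the following shows one common use of that shell, listing the unresolved alerts with the most severe first; it assumes the Command Shell snap-in is loaded and uses the Get-Alert cmdlet provided with Operations Manager 2007:

```powershell
# Sketch, assuming the Operations Manager 2007 Command Shell is loaded.
# List unresolved alerts, most severe first (255 is the Closed resolution state).
Get-Alert | Where-Object { $_.ResolutionState -ne 255 } |
    Sort-Object -Property Severity -Descending |
    Format-Table Name, Severity, TimeRaised -AutoSize
```

This produces the same prioritized list that the Active Alerts view displays, which can be useful when scripting or scheduling checks outside the console.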
To use the Web Console to connect over an external network such as the Internet
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Web Console. The Web Console provides only monitoring features and
displays a much simpler interface for selecting and viewing information (see Figure
16).
Figure 16
The Web Console provided with System Center Operations Manager 2007
2. The left pane tree view displays only four basic categories and a reduced set of other
monitoring categories. However, it still provides a wealth of monitoring capabilities,
and works in much the same way as the standard Operations Console.
• If you connect directly to the management domain, use the Operations Console to
monitor applications and computers. If you connect from a remote location over the
Internet or an intranet, use the Web Console to monitor applications and computers.
• Use the Scope option on the View menu to limit your view to the appropriate
distributed applications or groups and subgroups, unless you want to see alerts raised
by all the managed computers for all events.
• Use the State view and the Diagram view to provide an overall picture of the health
state of the application. In Diagram view, you can also see the state of the subgroups
and individual computers.
• Use the Alerts view to obtain a list of alerts, generally sorted by descending severity,
which is useful in prioritizing diagnosis and resolution requirements, and the
corresponding actions.
• Use the Events view to see the details of source events, and use the Performance view
to see the values and history of performance counter samples. Both are useful in
diagnosing problems and verifying resolution.
• Use the Health Explorer to see the state of individual components, individual categories
(such as Configuration or Performance), and individual monitors and rules.
• Create personalized views if you want to see information displayed in a different order
or in different groups.
The reporting feature for System Center Operations Manager 2007 is a separate installation
from the monitoring system. You must rerun the setup for Operations Manager and select
Operations Manager 2007 Reporting to install the reporting feature.
To view monitoring and management reports in Operations Manager 2007
1. On the taskbar, click Start, point to System Center Operations Manager 2007, and
then click Operations Console. There are two ways to create a report. To view
information for a single computer or a single distributed application, go to step 2 of
this procedure. To view information for multiple computers, distributed applications,
or other entities, go to step 5 of this procedure.
2. To view information for a single computer or a single distributed application, click
the Monitoring button in the navigation pane at the lower-left of the window. If the
navigation pane is not visible, click Navigation Pane on the View menu.
3. In the left pane tree view, select either the computer you want to view information
for in the Computers section or the application you want to view information for in
the Distributed Applications section.
4. The Actions pane to the right of the main window contains a series of links to the
popular types of report. If you cannot see the Actions pane, click Actions on the
View menu. Click the report you want to generate to open the Report Viewer
window. Now go to step 7 of this procedure.
5. To view information for multiple computers, distributed applications, or other
entities, click the Reporting button in the navigation pane at the lower-left of the
window. Note that the Reporting button is not available until you install the
reporting feature for Operations Manager 2007.
6. In the left pane tree view, expand the Reporting node, and then click Microsoft
Generic Report Library. Right-click a report in the list in the main window, and then
click Open to open the Report Viewer window.
7. The Report Viewer window contains a series of controls where you specify the
period for the report, the objects to include, and any other parameters specific to
that report. For example, when you open the Alerts report, you can specify the
severity and priority of alerts that the report will include. Figure 17 shows these
parameter settings and the other Report Viewer controls, and the way that you can
select the period for the report.
Figure 17
The Report Viewer showing the controls for the parameters for the report
8. If you specified a computer or a distributed application and opened Report Viewer
from the Monitoring section of the Operations Manager console, the Objects list in
Report View will contain the item you selected. If you opened Report Viewer from
the Reporting section of the Operations Manager console, the Objects list will be
empty.
9. To add items to the Objects list, click the Add Group or Add Object button. In the
dialog box that opens, select a search option in the drop-down list, such as Contains
or Begins with, and then enter the text part of the name of the object(s) or group(s)
you want to find. If you want to specify the dates between which objects or groups
were created, or the management group they belong to, click the Options button,
and then enter the relevant details.
10. Click the Search button to view all matching items in the Available items list. Select
those you want to include in the report (you can hold down the SHIFT and CTRL keys
to select multiple items in the list), and then click the Add button to add them to the
Selected objects list. Then click OK to return to Report Viewer.
11. Set any other parameter values you require in the controls at the top of the Report
Builder window, and then click the Run button on the main toolbar to start the
report running. After a few moments, the report appears (see Figure 18).
Figure 18
The results of running the Event Analysis report for two computers
The reports included with Operations Manager 2007 allow you to view alerts and alert latency;
availability and health; custom configuration and configuration changes; event analysis, most
common events, and custom events; and performance and health details. You can also author
your own reports, and set up scheduled reporting. Figure 19 shows the graphical reports for
alert latency over 1 second.
Figure 19
The results of running the Alert Latency report for all alerts during one day
Summary
Management Packs can be a very useful tool for the operations team in managing applications.
This chapter demonstrated how to create and import Management Packs in Operations
Manager 2007, and then it described how to edit the Management Packs to provide the
functionality required when monitoring an application.
Section 5
Technical References
This section provides additional technical resources that can be of use when designing and
developing manageable applications. Chapter 18, "Design of the DFO Artifacts," is incomplete in
the preliminary version of this guide. Chapter 19 describes how to create or modify a guidance
package to modify the application management model defined in the Team System
Management Model Designer Power Tool (TSMMD).
This section is aimed primarily at solutions architects and application developers.
Appendix A, "Building and Deploying Applications Modeled with the TSMMD"
Appendix B, "Walkthrough of the TSMMD Tool"
Appendix C, "Performance Counter Types"
Appendix A
Building and Deploying Applications
Modeled with the TSMMD
In this preliminary version of the guide, this chapter provides guidance on how you can consume
the instrumentation artifacts generated by the Team System Management Model Designer
Power Tool (TSMMD) in your applications, and how you can deploy the applications complete
with the appropriate instrumentation. This chapter also explains how you can generate
Management Packs for System Center Operations Manager using the TSMMD. The topics in this
chapter are:
• Consuming the Instrumentation Helper Classes
• Verifying Instrumentation Coverage
• Removing Obsolete Events
◦ DatabaseEntity.MediumTrust.Impl
◦ WebsiteEntity.API
◦ WebsiteEntity.HighTrust.Impl
◦ WebsiteEntity.MediumTrust.Impl
4. In the code of the application that consumes the instrumentation, call the methods of
the instrumentation helper classes to raise events or increment performance counters.
For example, to raise an event named DatabaseFailedEvent that takes as a parameter
the name of the database, you can use code like the following.
C#
DatabaseEntity.API.DatabaseEntityAPI.GetInstance().RaiseDatabaseFailedEvent("SalesDatabase");
Visual Basic
DatabaseEntity.API.DatabaseEntityAPI.GetInstance().RaiseDatabaseFailedEvent("SalesDatabase")
For a detailed description of the instrumentation projects and artifacts, see Chapter 8 "Creating
Reusable Instrumentation Helpers".
An additional limitation in this release is that the instrumentation discovery process will not
locate instrumentation in an ASP.NET Web application written in Visual Basic.
• PerformanceCountersInstaller.dll
If you include Enterprise Library Log Events in your model, the configuration file created by the
TSMMD will contain the configuration information that Enterprise Library requires. You must
copy this into your application configuration file, as described in the section Specifying the
Runtime Target Environment and Instrumentation Levels. You must also ensure that Enterprise
Library is installed on the target computer(s) where you deploy your application.
The EventLogEventsInstaller class can install event logs only on the local computer.
Installing Windows Eventing 6.0 Functionality
The TSMMD creates a Windows Eventing 6.0 manifest file if the model defines any Windows
Eventing 6.0 events. Before your application can write event log entries, you must install the
publisher file, including the manifest, on the target system. To do this, you use the Wevtutil.exe
utility. You will usually execute the Wevtutil command during the installation process
for your application. The
Wevtutil utility can usually be executed only by members of the Administrators group, and must
run with elevated privileges.
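The guide does not reproduce the exact command here. As a sketch, the manifest-install invocation for Wevtutil.exe usually takes the following form; the manifest file name shown is hypothetical, so substitute the file the TSMMD generated for your model:

```
rem Install the event publisher manifest on the target system.
rem Run from an elevated command prompt; "MyTestModel.man" is an
rem illustrative file name, not one generated by this guide's example.
wevtutil im MyTestModel.man
```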
The TSMMD can also generate a Windows Eventing 6.0 view that allows you to display events
from your application in a custom view of the Event Log. To create a Windows Eventing 6.0
view, right-click on the top-level entry in the Management Model Explorer window and click
Generate Windows Eventing 6.0 View. The TSMMD creates a new XML view file named [model-
name]View.xml and opens it in Visual Studio.
InstallUtil Instrumentation\EventLogEventsInstaller\bin\Debug\EventLogEventsInstaller.dll
InstallUtil Instrumentation\PerformanceCountersInstaller\bin\Debug\PerformanceCountersInstaller.dll
InstallUtil Instrumentation\WmiEventsInstaller\bin\Debug\WmiEventsInstaller.dll
InstallUtil EventLogEventsInstaller.dll /i
<configuration>
  <configSections>
    <section name="tsmmd.instrumentation"
      type="Microsoft.Practices.DFO.Guidance.Configuration.ApplicationHealthSection,
            Microsoft.Practices.DFO.Guidance.Configuration"/>
    <!-- this section included if model contains Enterprise Library Events -->
    <section name="loggingConfiguration"
      type="Microsoft.Practices.EnterpriseLibrary.Logging.Configuration.LoggingSettings,
            Microsoft.Practices.EnterpriseLibrary.Logging, Version=3.1.0.0,
            Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" />
  </configSections>
  <tsmmd.instrumentation>
    <!--
    Attribute "targetEnvironment" can have values:
      Extranet
      LocalIntranet
      ...
    -->
    ...
    ... all other managed entities in model listed here ...
    ...
  </tsmmd.instrumentation>
<!-- this section included if model contains Enterprise Library Events -->
<loggingConfiguration name="Logging Application Block" tracingEnabled="true"
defaultCategory="General"
logWarningsWhenNoCategoriesMatch="true">
...
... default logging configuration here ...
...
</loggingConfiguration>
</configuration>
Developers copy the contents of this file (excluding the <configuration> element) into their
application configuration file and edit the values as required. The <tsmmd.instrumentation>
element contains an <add> element for each managed entity in the model, identified by the
entity name. Each <add> element defines two other attributes:
• targetEnvironment. This is one of the target environments defined in the model, and
controls which of the concrete event and measure (counter) implementations the
abstract API class methods will use in the application at runtime. It defines mapping
between the target environments in the model and the concrete event and measure
implementations.
• instrumentationLevel. This indicates the level at which the instrumentation will raise
events or increment counters. Every abstract event and measure in the original model
defines a value for its Instrumentation Level property. The options are Coarse (all
operations), Fine (diagnostic and debug operations only), Debug (debug operations
only), and Off (instrumentation disabled).
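Putting these two attributes together, a single entry in the <tsmmd.instrumentation> section takes the following general form. This is a sketch only: the entity name, the use of a name attribute as the identifier, and the attribute values shown are illustrative assumptions rather than output from a specific model.

```xml
<tsmmd.instrumentation>
  <!-- One <add> element per managed entity in the model.
       Entity name and values here are examples only. -->
  <add name="DatabaseEntity"
       targetEnvironment="LocalIntranet"
       instrumentationLevel="Coarse" />
</tsmmd.instrumentation>
```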
The following table shows how the combination of the Instrumentation Level property of an
event and the setting of the instrumentationLevel attribute in the configuration file affects the
raising of events.
Instrumentation Level property   instrumentationLevel attribute   Event raised?
Coarse                           Coarse                           Yes
Coarse                           Fine                             Yes
Coarse                           Debug                            Yes
Fine                             Coarse                           No
Fine                             Fine                             Yes
Fine                             Debug                            Yes
Debug                            Coarse                           No
Debug                            Fine                             No
Debug                            Debug                            Yes
Off                              Coarse                           No
Off                              Fine                             No
Off                              Debug                            No
The instrumentation configuration file created by the TSMMD code generation routines contains
settings that specify the runtime target environments and instrumentation levels for the
managed entities within the application. When you deploy your application, you must copy the
contents of the instrumentation configuration file into your application configuration file and
edit it to specify the appropriate settings.
The configuration file, named InstrumentationConfiguration.config, resides with the generated
instrumentation classes in the Instrumentation folder of the TSMMD solution. It contains in the
<tsmmd.instrumentation> section an <add> element for each managed entity in your
application. This element defines the target environment within which that entity will execute,
and the granularity of the instrumentation. You must copy the contents of this file (excluding
the <configuration> element) into your application configuration file and edit the values as
required.
If you specified any Enterprise Library Log Events in your model, you must also copy the entire
<loggingConfiguration> section, and the corresponding <section> element from the
<configSections> section, into your application configuration file.
Remember that the term "target environment" refers to the capability for specifying multiple
events or performance counters for an aspect of an entity, and having the entity use a specific
one of these events or counters at runtime depending on the requirements of the application,
the execution permissions available, and the limitations of the runtime environment.
You can also specify the properties required for the management pack in the properties window
for the TSMMD project, and then have the TSMMD create the management pack automatically
when you build the project.
To change the settings for automatic System Center Operations Manager management
pack generation
1. In Solution Explorer, right-click the TSMMD project entry and click Properties to open
the project properties window in the main Visual Studio editor pane. The Settings page
of the project properties determines if the TSMMD will automatically generate a
management pack when you build the TSMMD project, and the parameters for the
management pack generation process.
2. To enable automatic generation of a management pack, set the checkbox named
Enable Microsoft SCOM 2007 management pack generation at the top of the General
section.
3. Edit the default values in the text boxes below this checkbox as required. You can
specify the following properties:
◦ Management pack ID. This setting is the fully qualified identifier for the
management pack that the TSMMD will automatically generate. The default is
Application.[model-name]. The name must start with a letter or a number, and
contain only letters, numbers, periods, and underscore symbols. The total
length must be less than 255 characters, and the value must be unique within the
scope of the System Center Operations Manager server to which you will import
the management pack.
◦ Management pack display name. This setting is the name for the management
pack. The default is the current management model name.
◦ Default namespace. This setting is the namespace in which the management
pack will reside. The default is Application.
◦ Output path. This setting is the full path for the generated management pack.
Click the Browse button next to the Output path text box and select the folder
where you want to create the management pack. The default is a folder named
ManagementPack within your project folder.
• Microsoft.SystemCenter.ASPNET20.2007.mp
The first two of these management packs are part of the Microsoft Windows Server 2000/2003
Internet Information Services Management Pack, which you can obtain from the Microsoft
Download Center. The third of the management packs in the previous list is provided with
System Center Operations Manager, and can be found in the %Program Files%\System Center
Operations Manager 2007\Health Service State\Management Packs folder.
Figure 2
Specifying class instances for components in the Distributed Application Designer
8. The designer will create the common dependency and roll-up monitors for the
distributed application. However, you can delete some components if required; for
example, if you are creating separate environments for testing and production.
9. Click Save to create the new distributed application, or to save your changes if you are
editing an existing distributed application.
10. Unlike the original distributed application, you can modify the distributed application
afterwards if required by using the Operations Manager management console.
For details of how to edit and use a management pack for an application, see Chapter 17
"Creating and Using System Center Operations Manager 2007 Management Packs".
Appendix B
Walkthrough of the Team System
Management Model Designer Power
Tool
This topic contains a simple hands-on demonstration of the Team System Management Model
Designer Power Tool (TSMMD) that will help you understand what it does and how you can use
it.
Note that this walkthrough describes the minimum set of steps required to build a management
model and health definition, generate instrumentation, and generate a System Center
Operations Manager 2007 management pack. It does not implement good programming
practices, but it will serve as a valuable starting point for understanding the DFO process and the
TSMMD.
The process divides into discrete sections, so that you can complete as many as you want.
However, you must complete the first section if you want to generate the instrumentation code
and an Operations Manager management pack. The following are the four sections:
If you cannot see the Management Model Explorer window, click the View menu,
point to Other Windows, and click ManagementModel Explorer.
3. Ensure that the guidance packages for the TSMMD are loaded. To do this, click
Guidance Package Manager on the Visual Studio Tools menu. If the list of recipes in the
Guidance Package Manager dialog box does not contain any entries that apply to Team
System Management Model, follow these steps to enable the recipes:
◦ Click the Enable/Disable Packages button.
◦ Select the two guidance packages named Team System MMD Instrumentation
and Team System MMD Management Pack Generation.
If you do not see the two guidance packages in the list, you may need to reinstall the
TSMMD guidance package.
4. In Management Model Explorer, select the top-level item named Operations. In the
Visual Studio Properties window, change the Name property to MyTestModel, and then
enter some text for the Description and Knowledgebase properties. If you cannot see
the Properties window, press F4.
5. In Management Model Explorer, expand the Target Environments node and select the
target environment named Default. Change the value of the Event Log property to True
to indicate that you require instrumentation that writes to the Windows Event Log.
You use the properties of a target environment to specify that you require any
combination of Enterprise Library Logging events, Windows Event Log events, trace file
events, Windows Eventing 6.0 events, Windows Management Instrumentation (WMI)
events, and Windows performance counters for that target environment. You can also
add more than one target environment to a model to describe different deployment
scenarios.
The next stage is to create the graphical representation of the application entities.
To create the new management model
1. In Management Model Explorer, right-click the top-level MyTestModel entry, and then
click New Managed Entity Wizard. Enter the name CustomerApplication for this entity,
select Executable Application in the drop-down list, type a description for this entity in
the Description box, as shown in Figure 6, and then click Next.
Figure 6
First page of the Add New Managed Entity wizard
Alternatively, you can right-click the top-level MyTestModel entry and then click Add
New Executable Application or you can drag an Executable Application control from
the Toolbox onto the designer surface and then edit the properties in the Properties
window.
2. On the Specify Managed Entity properties page of the wizard, make sure FilePath is
selected in the Discovery Type box, and then type %Program
Files%\CustomerApplication.exe in the Discovery Target box, as shown in Figure 7.
Monitoring systems such as System Center Operations Manager use the settings on this
page (which are exposed in the management pack you generate) to check whether the
application is installed on a specific target computer. Click Finish to create the new
CustomerApplication managed entity, which appears on the designer surface. The
Properties window shows the settings and values you entered in the wizard.
Figure 7
Last page of the Add New Managed Entity wizard
Some managed entity types, such as ASP.NET Application and ASP.NET Web Service,
have extender properties that specify additional settings for the management pack
generated by the TSMMD.
3. Drag an External Managed Entity control from the Toolbox onto the designer surface.
In the Properties window, change the value of the Name property to
CustomerDatabase.
The wizard does not allow you to create external managed entities because the only
property they have is the name. External managed entities act as connectors or placeholders
for parts of the overall application or system that are outside the management scope.
4. In the Toolbox, click the Connection control, click the CustomerApplication entity, and
then click the CustomerDatabase entity. This creates the connection between the two
entities. You can edit or delete the Text property for the connection in the Visual Studio
Properties window.
5. In Management Model Explorer, expand the Managed Entities node to see the two
entities you added to the diagram. Notice that the External Managed Entity (named
CustomerDatabase) has no instrumentation or health sections. You do not create
instrumentation or health definitions for External Managed Entities. Figure 3 shows the
model at this stage.
6. On the Visual Studio File menu, click Save All.
Figure 3
The graphical representation of the application entities
The next stage is to populate the health definition and instrumentation sections of the
management model. The health model defines the health states for each entity as a series of
aspects and the indicators (the instrumentation) that cause transitions in these health states.
You can add events, measures, and aspects to the model individually and set their properties as
you develop and fine-tune your model. However, the TSMMD provides a wizard that helps you
create a new aspect and specify the associated instrumentation. When you build a complex
model, you will probably have to iterate through the process of using the wizard, and then
manually add and edit items in the graphical model as it evolves. However, the wizard makes it
easy to start adding instrumentation and health definitions to the model.
To add a health definition aspect and the associated instrumentation to the management
model
1. In Management Model Explorer, right-click the top-level MyTestModel entry, and then
click Validate All. The Visual Studio Error List window will show a warning indicating
that you must define at least one event or measure for the managed entity
(CustomerApplication).
This is a useful way to check that your model is valid as you work with it. You can also
validate individual sections of the model. For example, to check only the managed
instrumentation for this entity, right-click the Managed Instrumentation child node of
the CustomerApplication node in Management Model Explorer, and then click
Validate.
Figure 4
First page of the Add New Aspect wizard
These settings specify that you want to implement a two-state health indicator for this
aspect, which will be driven by two events—one that indicates connection failed
(RED), and one that indicates connection available or restored (GREEN). If you want to
implement instrumentation that displays a warning, you select Green-Red-Yellow and
will therefore need to specify three events. Alternatively, you can base an aspect on a
performance counter by selecting Measure instead of Event.
3. On the next page of the wizard, you specify the events for the NoDatabaseConnection
aspect. Click the ellipsis button (...) next to the Green Health State Event text box to
open the Browse Events dialog box (shown in Figure 5). The dialog box is currently
empty because your model does not define any events.
Figure 5
The Browse Events dialog box where you select an existing event or create a new event
6. In the Browse Events dialog box, click NoDatabaseConnection in the Events list and
then click OK. This adds the NoDatabaseConnection event to the Green Health State
Event box of the Add New Aspect wizard.
7. Repeat the process for the Red Health State. To do this:
◦ Click the ellipsis button (...) next to the Red Health State Event text box
9. Click Finish. The wizard creates the new aspect named NoDatabaseConnection and the
abstract event implementations NoDatabaseConnection and
DatabaseConnectionRestored. You can examine the new aspect and events in
Management Model Explorer, as shown in Figure 8.
Figure 8
The new aspect and events in Management Model Explorer
You can define parameters for events, which the instrumentation will populate and
expose to the Windows event system when that event is raised. In this example, the two
events will pass the name of the database as a parameter to the event system.
Therefore, the next step is to define these parameters.
10. In Management Model Explorer, right-click the NoDatabaseConnection node, and then
click Add New Event Parameter. In the Properties window for the new parameter,
make sure the Index property is set to 1, and the Type property is set to String. Change
the Name property to DatabaseName.
11. Repeat this process for the DatabaseConnectionRestored event by adding a new
parameter and changing the Name property to DatabaseName. Figure 9 shows the
result.
Figure 9
The events and their parameters shown in Management Model Explorer
The two events you have defined are abstract events. Now you must create the
concrete implementations of these events. Each abstract event and measure must
have an implementation for every target environment in the model. In this example,
you use just the default target environment; therefore, you require only one concrete
implementation of each event.
12. In Management Model Explorer, right-click the NoDatabaseConnection node, and then
click New Event Implementation Wizard. The first page of the wizard shows any
discovered (existing) events in your application and the managed implementations you
must create for each target environment (see Figure 10). There are no discovered
events in this example, and the wizard shows only the single implementation technology
you specified when you created the model (Event Log), which is selected by default.
Figure 10
First page of the New Event Implementation Wizard
◦ Type Application in the Log Name box (if it is not already there)
◦ Select Error in the Severity list box if it is not already selected
◦ Leave the default setting in the Source box
◦ Click Finish.
Figure 11
Last page of the New Event Implementation Wizard
The value Database name: %1 in the Message Template box is a string that will be
passed to the event system; it must contain a placeholder (%1) for the event
parameter you defined when you created the abstract event definition. In general, a
Message Template string must include a placeholder for each parameter you define
for an event. The placeholders must start with "%1" and run consecutively up to the
number of parameters you define for that abstract event.
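To make the placeholder rule concrete, the following is a minimal sketch in Python (only an illustration of the substitution; the TSMMD itself generates C# instrumentation helpers) showing how %1..%n placeholders are filled from an event's parameters in index order:

```python
def format_message(template: str, params: list) -> str:
    """Fill %1..%n placeholders with the event parameters, in index order.

    Simple illustration; assumes fewer than 10 parameters so that "%1"
    is never a prefix of a longer placeholder such as "%10".
    """
    message = template
    for index, value in enumerate(params, start=1):
        message = message.replace(f"%{index}", str(value))
    return message

# The DatabaseName parameter (Index 1) maps to the %1 placeholder.
print(format_message("Database name: %1", ["CustomerDatabase"]))
# Database name: CustomerDatabase
```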
The wizard creates the configurable implementation of the event. You can view this
event in the Management Model Explorer and see the property values you specified in
the Properties window. Figure 12 shows both of these windows at this stage.
Figure 12
The concrete implementation of the NoDatabaseConnection event
15. Repeat this process to create an Event Log Event implementation for the
DatabaseConnectionRestored event. To do this, run the New Event
Implementation Wizard, change the event name to
DatabaseConnectionRestoredEvent, change the event ID to 9001, set the severity to
Information, and enter the value Database name: %1 in the Message Template box.
If you have more than one target environment in the model, the wizard will display a
dialog box to collect information for the appropriate event or measure
implementation(s) for each target environment.
16. To confirm that you have created a valid model, right-click anywhere in Management
Model Explorer, and then click Validate All. You should see the following message in the
Visual Studio Output window:
You have now completed the simple health model definition and instrumentation definition for
the application, and you have validated the model. Figure 13 shows the model at this stage.
Figure 13
The management model showing the complete managed instrumentation definition
Of course, you will usually add more aspects to the model and specify the appropriate events
and measures (performance counters). Remember that the correct approach during application
and system design is to identify the health states and transitions first, which leads to the
definition of the instrumentation required to surface these transitions. This simple walkthrough
is designed to help you gain experience with the Team System Management Model Designer
Power Tool, so it assumes that you have previously identified the health states.
As you saw in this section of the walkthrough, the TSMMD can create the instrumentation
helper classes for an application and verify that your application actually does invoke all of the
instrumentation in the model. In other words, the application should raise every abstract event
and increment each abstract counter in at least one location in the code. Figure 1 shows the
instrumentation generated at this stage of the walkthrough.
Figure 1
The instrumentation projects, classes, and artifacts generated by the TSMMD
Figure 1
Adding a new application project to the solution
Visual Studio's IntelliSense feature will help you to enter the code quickly and easily.
8. Double-click the Connection Lost button to open the code editor with the insertion
point in the button2_Click method. Add the following line of code to the method.
C#
CustomerApplication.API.CustomerApplicationAPI.GetInstance().RaiseNoDatabaseConnection("CustomerDatabase");
9. On the Visual Studio File menu, click Save All, and then close the code editor and Form1
designer windows.
10. In Management Model Explorer, right-click the top-level MyTestModel entry, and then
click Verify Instrumentation Coverage. You should see that the Visual Studio Error List
window now contains no errors or warnings because your code now invokes all the
abstract events defined in the model.
Figure 2 shows the completed test application in the Visual Studio designer.
Figure 2
The completed test application
You are now ready to run the application, but first you must configure it. The instrumentation
generation routines in the TSMMD create a configuration file that allows administrators to
specify the target environment and the granularity of the instrumentation. You must copy the
contents of this file into your application configuration file (App.config or Web.config) and edit
the contents before you run the application.
To configure and run the test application
1. In Solution Explorer, right-click the CustomerApplication project entry, point to Add,
and then click New Item.
2. In the Add New Item dialog box, select Application Configuration File, and then click
Add.
3. In Solution Explorer, double-click the InstrumentationConfiguration.config file (located
in the Instrumentation folder of the main solution) to open it in the editor. Select the
entire contents of the <configuration> element, excluding the opening and closing
<configuration> tags, and copy it into the App.config file between the opening and
closing <configuration> tags.
By default, the configuration settings you just added to the App.config file specify the
Default target environment for the CustomerApplication entity, with the
instrumentation level set to Coarse. These are the values you need. If you created
other target environments or specified different instrumentation levels, you
would edit the values of the targetEnvironment and
instrumentationLevel attributes of the <add> element for each of the managed
entities in your model.
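As an illustrative sketch only (the element and attribute names other than targetEnvironment and instrumentationLevel are hypothetical; copy the real markup from the generated InstrumentationConfiguration.config file rather than typing it), an entry of this kind looks something like the following:

```xml
<!-- Hypothetical sketch: the "name" attribute and the surrounding section
     are illustrative; the generated InstrumentationConfiguration.config
     file is the authoritative source for the exact markup. -->
<add name="CustomerApplication"
     targetEnvironment="Default"
     instrumentationLevel="Coarse" />
```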
4. In Solution Explorer, right-click the CustomerApplication project entry, and then click
Set as Startup Project.
5. Press F5 to run the test application. Click the Connection Lost button to raise the
NoDatabaseConnection event, click the Database Connected button to raise the
DatabaseConnectionRestored event, and then close the test application.
6. Open Windows Event Viewer (from Administrative Tools in Control Panel) and view
the contents of the Application log. You will see the two events
(with the Source set to MyTestModel_CustomerApplication) raised by the
instrumentation in the test application.
Notice that, although you raised the abstract events in your test application
(NoDatabaseConnection and DatabaseConnectionRestored), the settings in the application
configuration file specify that the instrumentation helpers should raise the concrete
implementations of these events that you mapped to the Default target environment (the Event
Log Events named NoDatabaseConnectionEvent and DatabaseConnectionRestoredEvent).
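The mapping described above can be sketched as a lookup from (abstract event, target environment) to a concrete implementation. The Python below is only an illustration of the dispatch idea, not the generated C# helper code:

```python
# Illustrative sketch of configuration-driven event dispatch: the
# application raises abstract events, and the instrumentation helper
# resolves them to the concrete implementations mapped to the target
# environment named in the configuration file.
CONCRETE_EVENTS = {
    ("NoDatabaseConnection", "Default"): "NoDatabaseConnectionEvent",
    ("DatabaseConnectionRestored", "Default"): "DatabaseConnectionRestoredEvent",
}

def resolve_event(abstract_name: str, target_environment: str) -> str:
    """Return the concrete event name for the given target environment."""
    return CONCRETE_EVENTS[(abstract_name, target_environment)]

print(resolve_event("NoDatabaseConnection", "Default"))
# NoDatabaseConnectionEvent
```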
AverageCount64 An average counter that shows how many items are processed, on
average, during an operation. Counters of this type display a ratio of
the items processed to the number of operations completed. The ratio
is calculated by comparing the number of items processed during the
last interval to the number of operations completed during the last
interval.
Formula: (N1 - N0) / (B1 - B0), where N1 and N0 are performance
counter readings, and B1 and B0 are their corresponding
AverageBase values. Thus, the numerator represents the number of
items processed during the sample interval, and the denominator
represents the number of operations completed during the sample
interval.
Counters of this type include PhysicalDisk\Avg. Disk Bytes/Transfer.
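A small worked example of the AverageCount64 formula (Python for illustration; the sample values are invented):

```python
def average_count64(n1, n0, b1, b0):
    """AverageCount64: (N1 - N0) / (B1 - B0), items processed per operation."""
    return (n1 - n0) / (b1 - b0)

# 4096 bytes transferred in 8 operations during the sample interval:
print(average_count64(n1=10240, n0=6144, b1=20, b0=12))
# 512.0
```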
CounterDelta32 A difference counter that shows the change in the measured attribute
between the two most recent sample intervals.
Formula: N1 - N0, where N1 and N0 are performance counter
readings.
CounterDelta64 A difference counter that shows the change in the measured attribute
between the two most recent sample intervals. It is the same as the
CounterDelta32 counter type except that it uses larger fields to
accommodate larger values.
Formula: N1 - N0, where N1 and N0 are performance counter
readings.
CounterMultiBase A base counter that indicates the number of items sampled. It is used
as the denominator in the calculations to get an average among the
items sampled when taking timings of multiple, but similar items. Used
with CounterMultiTimer, CounterMultiTimerInverse,
CounterMultiTimer100Ns, and CounterMultiTimer100NsInverse.
CounterMultiTimer A percentage counter that displays the active time of one or more
components as a percentage of the total time of the sample interval.
Because the numerator records the active time of components
operating simultaneously, the resulting percentage can exceed 100
percent.
This counter is a multitimer. Multitimers collect data from more than
one instance of a component, such as a processor or disk. This
counter type differs from CounterMultiTimer100Ns in that it measures
time in units of ticks of the system performance timer, rather than in
100 nanosecond units.
Formula: ((N1 - N0) / (D1 - D0)) x 100 / B, where N1 and N0 are
performance counter readings, D1 and D0 are their corresponding
time readings in ticks of the system performance timer, and the
variable B denotes the base count for the monitored components
(using a base counter of type CounterMultiBase). Thus, the numerator
represents the portions of the sample interval during which the
monitored components were active, and the denominator represents
the total elapsed time of the sample interval.
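A worked example of the multitimer formula (Python for illustration; the tick values are invented):

```python
def counter_multi_timer(n1, n0, d1, d0, b):
    """CounterMultiTimer: ((N1 - N0) / (D1 - D0)) x 100 / B."""
    return (n1 - n0) / (d1 - d0) * 100 / b

# Two components (B = 2), both active for the entire 1000-tick interval,
# so the combined active time in the numerator is 2000 ticks:
print(counter_multi_timer(n1=2000, n0=0, d1=1000, d0=0, b=2))
# 100.0
```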
CounterMultiTimer100Ns A percentage counter that shows the active time of one or more
components as a percentage of the total time of the sample interval. It
measures time in 100 nanosecond (ns) units.
This counter type is a multitimer. Multitimers are designed to monitor
more than one instance of a component, such as a processor or disk.
Formula: ((N1 - N0) / (D1 - D0)) x 100 / B, where N1 and N0 are
performance counter readings, D1 and D0 are their corresponding
time readings in 100-nanosecond units, and the variable B denotes the
base count for the monitored components (using a base counter of
type CounterMultiBase). Thus, the numerator represents the portions
of the sample interval during which the monitored components were
active, and the denominator represents the total elapsed time of the
sample interval.
CounterMultiTimer100NsInverse A percentage counter that shows the active time of one or more
components as a percentage of the total time of the sample interval.
Counters of this type measure time in 100 nanosecond (ns) units.
They derive the active time by measuring the time that the
components were not active and subtracting the result from 100
percent multiplied by the number of objects monitored.
This counter type is an inverse multitimer. Multitimers are designed to
monitor more than one instance of a component, such as a processor
or disk. Inverse counters measure the time that a component is not
active and derive the active time from the measurement of inactive time.
Formula: (B - ((N1 - N0) / (D1 - D0))) x 100, where the denominator
represents the total elapsed time of the sample interval, the numerator
represents the time during the interval when monitored components
were inactive, and B represents the number of components being
monitored, using a base counter of type CounterMultiBase.
CounterMultiTimerInverse A percentage counter that shows the active time of one or more
components as a percentage of the total time of the sample interval. It
derives the active time by measuring the time that the components
were not active and subtracting the result from 100 percent multiplied
by the number of objects monitored.
This counter type is an inverse multitimer. Multitimers monitor more
than one instance of a component, such as a processor or disk.
Inverse counters measure the time that a component is not active and
derive its active time from that measurement.
This counter differs from CounterMultiTimer100NsInverse in that it
measures time in units of ticks of the system performance timer, rather
than in 100 nanosecond units.
Formula: (B - ((N1 - N0) / (D1 - D0))) x 100, where the denominator
represents the total elapsed time of the sample interval, the numerator
represents the time during the interval when monitored components
were inactive, and B represents the number of components being
monitored, using a base counter of type CounterMultiBase.
CounterTimer A percentage counter that shows the average time that a component
is active as a percentage of the total sample time.
Formula: (N1 - N0) / (D1 - D0), where N1 and N0 are performance
counter readings, and D1 and D0 are their corresponding time
readings. Thus, the numerator represents the portions of the sample
interval during which the monitored components were active, and the
denominator represents the total elapsed time of the sample interval.
ElapsedTime A difference timer that shows the total time between when the
component or process started and the time when this value is
calculated.
Formula: (D0 - N0) / F, where D0 represents the current time, N0
represents the time the object was started, and F represents the
number of time units that elapse in one second. The value of F is
factored into the equation so that the result can be displayed in
seconds.
Counters of this type include System\System Up Time.
RawBase A base counter that stores the denominator of a counter that presents
a general arithmetic fraction. Check that this value is greater than zero
before using it as the denominator in a RawFraction value calculation.
SampleBase A base counter that stores the number of sampling interrupts taken
and is used as a denominator in the sampling fraction. The sampling
fraction is the number of samples that were 1 (or true) for a sample
interrupt. Check that this value is greater than zero before using it as
the denominator in a calculation of SampleCounter or
SampleFraction.
SampleFraction A percentage counter that shows the average ratio of hits to all
operations during the last two sample intervals.
Formula: ((N1 - N0) / (D1 - D0)) x 100, where the numerator
represents the number of successful operations during the last sample
interval, and the denominator represents the change in the number of
all operations (of the type measured) completed during the sample
interval, using counters of type SampleBase.
Counters of this type include Cache\Pin Read Hits %.
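A worked example of the SampleFraction formula (Python for illustration; the sample counts are invented):

```python
def sample_fraction(n1, n0, d1, d0):
    """SampleFraction: ((N1 - N0) / (D1 - D0)) x 100, the hit percentage."""
    return (n1 - n0) / (d1 - d0) * 100

# 75 hits out of 100 sampling interrupts during the last interval:
print(sample_fraction(n1=175, n0=100, d1=300, d0=200))
# 75.0
```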