
XXXXX (XX)

IT Operational Process

Capacity Management Process Guide


XXXXX Information Technology
IT Capacity Management Procedure

Preface
The purpose of this document is to outline the procedures that govern the Capacity
Management process in XXXXX (XX) Information Technology. This document
describes how to use the process and provides a definition of the management controls
required and some rationale for instituting those controls.

XXXXX Page 2
IT Capacity Management Procedure

Table of Contents
Preface ................................................................................................................. 2
Table of Contents................................................................................................ 3
Introduction ......................................................................................................... 5
Section 1.1 -- Capacity Management Activities and Process ...................................... 5
Section 1.2 -- A Capacity Planner's View of the xxx IT Environment ......................... 12
Section 1.3 -- Approach to Capacity Management.................................................... 14
Gathering & Forecasting "New" Application Resource Requirements ........ 19
2.1 - Capacity management interface to the Development Life Cycle ....................... 21
2.2 Techniques For Data Collection And Analysis .................................................... 24
2.3 Definition Phase ................................................................................................. 28
2.4 Requirements Phase .......................................................................................... 36
2.5 Design Phase ..................................................................................................... 40
2.6 Development and Testing Phase ........................................................................ 48
2.7 Installation/Transition Phase ............................................................................... 55
Gathering and Forecasting "Existing" Application and Overall Resource
Requirements .................................................................................................... 58
3.1. Activities Performed As Required ...................................................................... 60
3.2 Activities Performed Quarterly ............................................................................ 62
3.3 Activities Performed Semi-Annually .................................................................... 66
3.4 Activities To Perform Annually ............................................................................ 69
Producing Capacity Management Reports ..................................................... 72
Section 4.1 -- Designing and Documenting Capacity Reports .................................. 75
Section 4.2 -- Producing an Annual Capacity Plan ................................................... 86
Section 4.3 -- Producing Periodic Capacity Forecast Reports .................................. 92
Section 4.4 -- Producing Periodic Capacity Status Reports ...................................... 94
Section 5 -- Developing and Maintaining Procedures, Tools, Techniques, and Standards ............ 98
Section 5.1 -- Data Management ............................................................................ 101
Section 5.2 -- Terminology and Metrics .................................................................. 118
Section 5.3 -- xxx IT Resource Model ..................................................................... 130
Section 5.4 -- Workloads and Traffic Types ............................................................ 138
Section 5.5 -- Network Connection Types............................................................... 144


Section 5.6 -- Network Tools and Metrics ............................................................... 147


Analysis and Forecasting Techniques .......................................................... 156
Section 6.1 -- Forecasting Overview and Terminology........................................... 157
Section 6.2 -- Baseline Creation ........................................................................... 164
Section 6.3 -- Analysis Techniques......................................................................... 173
Section 6.4 -- Relative I/O Content (RIOC) Metric Analysis .................................. 180
Section 6.5 -- Cluster Analysis for Grouping Similar Applications ......................... 182
Section 6.6 -- Business-driven Forecasting Techniques ......................................... 185
Section 6.7 -- Additional Analysis and Forecasting Techniques .............................. 190
Section 6.8 -- Application of Techniques ................................................................ 199


Introduction
This section introduces the host and network capacity planner to several key concepts and
constructs that are used throughout the rest of the methodology. Several of these
concepts will be entirely new to the intended reader while others will present a different
view of either the network or the host than the reader previously envisioned. In any case,
these concepts, definitions, and approaches to the xxx host and network capacity
management environment are the ones that lay the foundation for the rest of the material.

The following topics are described in this section:

1.1 Capacity Management Activities and Process
1.2 A Capacity Planner's View of xxx IT Resources
1.3 Approach to Capacity Management

Section 1.1 -- Capacity Management Activities and Process
There is a consensus among capacity management practitioners in the nation's largest
firms about the core set of activities which are included within the capacity management
process -- whether host-based or network-based. These activities have been summarized
in Table 1.1 along with a reference to where the activities are discussed. All references
are to Sections in this manual except for the last activity where further information can be
found in a separate document entitled "Managing the Capacity Management Process".


Common Capacity Management Activities -- Reference

Prepare for Capacity Management -- Sections 1, 5, 6
v Establish and sustain communication and capacity management process buy-in
v Understand service requirements and define service objectives and capacity thresholds
v Determine reporting requirements - purpose, standards, frequency, etc.
v Determine data collection requirements - workloads, metrics, tools
v Understand forecasting techniques and tools

Collect/validate information -- Sections 2 and 3
v System-generated usage and traffic data
v Application data - operating characteristics, peaks, business and network transactions
v Business information - business drivers and growth
v Environmental changes - changes to hardware, protocols, architecture, performance characteristics

Analyze information collected -- Sections 2 and 3

Forecast resources -- Sections 2 and 3
v Project and provide alternatives by workload
v Project and provide alternatives of aggregate workloads

Produce periodic capacity management reports -- Section 4
v Produce baseline report
v Produce capacity status reports - e.g., baseline, actual vs. projection variances
v Produce capacity forecast report
v Produce the annual capacity plan

Manage the process to ensure effective communications, feedback, and quality -- "Managing the Capacity Management Process"
v Measure effectiveness and efficiency indicators
v Regularly validate the effectiveness of reports
v Regularly validate input accuracy

Table 1.1. Capacity Management Activities and Manual References


This list of activities makes apparent a distinction between activities associated with capacity planning and those associated with performance management or service level management. For this integrated methodology, capacity management and performance management are uniquely defined:

Definition: Capacity Management is the process of planning and reporting on future IT resource requirements in a cost-effective manner to meet the business and service needs of xxx and yyy customers.

Definition: Performance Management is the process of planning and managing day-to-day workloads to achieve response and throughput requirements.

This clear delineation of the two processes, and the identification of linkages between them, helps organizations focus on achieving their processing objectives and measuring results. This does not mean that a person performing capacity management activities would not assist someone executing performance management activities; it does mean, however, that the roles and responsibilities for each process are well documented and understood. Figure 1.1 shows the other xxx systems management processes that interface to the capacity management process. This is referred to as an "Include/Exclude" chart for the capacity management process.

Each of the processes identified in Figure 1.1 has linkages to the xxx capacity
management processes. A linkage from capacity management to any one of these
processes means that the linked process is either a customer of the capacity management
process or a supplier to the capacity management process. A customer of the process
can be an individual, department, or process which has requirements for capacity
management. An example of a requirement would be for specific capacity-related
reports designed to support a decision that a customer has to make. A supplier to the xxx
capacity management process could be an individual, a department, a system, or a process
which provides input into the capacity management process based upon specific
requirements. An example of a supplier would be a business application owner
supplying business growth forecasts to the capacity planner. In this case, the business
application owner is both a supplier and a customer of the xxx capacity management
process.

Another perspective on defining the boundaries between what is done within the xxx
capacity management process and what is done by other xxx processes is by looking at
the information either sent or received by the capacity management process relative to the
linked processes. This is shown in Figure 1.2 for the major linkages to the xxx capacity
management process.

Figure 1.2 clearly defines the relationship between capacity management and other systems management processes. This level of detail is needed to bring more rigor to the capacity management process and to clarify the roles and responsibilities that lead to the successful implementation of the process.


Figure 1.1. "Include/Exclude" chart of process linkages between xxx capacity management and other xxx I/S processes.

Figure 1.2. "Input/Output" diagram for the xxx capacity management process.


The newly designed and implemented xxx host-based capacity management process was defined with five sub-processes, which are summarized in Table 1.2. These sub-processes are mapped to the commonly understood activities of capacity management described earlier and are the organizing principle for this document.

Sub-Process -- General Capacity Management Activities -- Reference

1. Prepare for capacity management -- Prepare for Capacity Management -- Sections 1, 5, 6
2. Gather and forecast "new" application resource requirements -- Collect/validate information; Analyze information collected; Forecast resources -- Sections 2 and 3
3. Gather and forecast "existing" application and environmental resource requirements -- Collect/validate information; Analyze information collected; Forecast resources -- Sections 2 and 3
4. Produce capacity management reports -- Produce various periodic reports -- Section 4
5. Manage the process to ensure effective communications, feedback, and quality -- Foster communications, feedback, and quality -- "Managing the Capacity Management Process"
Table 1.2. Cross Reference table for the xxx capacity management sub-processes.

So far, we have defined the xxx capacity management process by suppliers and customers, activities, and inputs and outputs. Putting all of this together into a process diagram, we arrive at Figure 1.3, which describes the new, integrated xxx capacity management process for the host and network platforms.

Figure 1.3 summarizes the key relationships for the integrated process. Each of the five sub-processes could be documented in the same fashion, with the box that contains sub-processes in Figure 1.3 instead containing the activities for that sub-process. This would provide a high-level view of all sub-processes. Since this document was developed at a procedural level, creating additional diagrams for the sub-processes was not deemed of high importance at this time.


Figure 1.3. Simplified xxx Capacity Management Process Diagram

There is more to the process than just the diagram, however. As with the design of any new process, there are design objectives and critical success factors that provide the guidelines for process design. In developing the host-based xxx capacity management process, a set of process requirements was obtained from xxx management. These process requirements are:

1. Realistic and "accurate" forecasts of service demands and resource requirements for new applications, major application changes, system transitions, and migrations.
2. Realistic and "accurate" multi-year equipment budget and cost projections.
3. Management reports designed to support capacity-related decisions.
4. Timely and specific feedback to the business owners on the accuracy and timeliness of their requirements estimates.

These process requirements guided the team's efforts and helped direct the design group to develop a business-driven process focused on meeting xxx customer requirements. It is described as a business-driven process for at least two main reasons:

1. Customer requirements for business decision-making play a key role in determining what information is collected, analyzed, and presented to the customer.
2. I/T resource usage, reporting, and forecasting are aligned to the business.


Additionally, the importance of a set of rigorous data and requirements-gathering procedures for the capacity planner became very evident. Most capacity forecasting efforts don't fail because of poor modeling techniques; they fail because faulty or inadequate data was used as input to the model, or because an erroneous assumption about the business application environment went unchallenged. This led the team to spend a considerable amount of time designing questions and forms to help the capacity planner develop a vision and plan for what data needed to be collected, when it should be collected, and to what level of detail.

While valid, accurate, and timely data from xxx customers is an important critical success factor addressed in this methodology, several other critical success factors (CSFs, the things that must go right for the process to be effective and successful) were identified for the xxx capacity management process:

1. Accurate and timely application growth projections.
2. Accurate and timely change management information for significant system and application changes and transitions.
3. Provision of specific application information by application developers and business owners throughout all phases of the application development life cycle.
4. Timely and useful feedback to application developers and business owners on the quality of information received in support of the capacity management process.
5. Quantification of the relationship between business volumes and system requirements.
6. Effective, ongoing communications with critical application developers and business owners.

Undergirding these six factors is the importance of the capacity planner's personal, ongoing communications and relationships with application developers and business owners. As with many processes, success is directly related to the personal commitment of process participants to the process and its goals.

Reference: The whole subject of process concepts and process management is covered in another document, Managing the Capacity Management Process.

The combination of requirements for a business-driven process and the need for strong,
on-going communications between capacity planner and xxx customer led to the use of
business process management techniques for documenting the capacity management
process (described in the Managing the Capacity Management Process document).
These techniques also include an approach to managing a cross-functional process.


Section 1.2 -- A Capacity Planner's View of the xxx IT Environment
The integration of host and network capacity management into a single xxx-wide process
requires the establishment of a logical view of the xxx IT environment that encompasses
all of its IT components and links them with the business and service requirements of its
customers. This common objective of meeting the service requirements of xxx customers
provides a natural link between host and network views by showing that all IT activity is
driven by the needs of the business.

This logical view of IT is presented in Figure 1.5 as being composed of three layers: (1) Business/Application, (2) Access, and (3) Backbone. Each of these layers plays a vital role in delivering service to the client. From a capacity planning perspective, the Business/Application layer generates and receives work to process. From there, the Access layer connects the user to non-local IT processing resources, either through other Access points or through the Backbone layer. Once data is received at the Backbone layer, it is transported at high speed to a matching Backbone receiving point, passed along to the Access layer, and processed at the intended Business/Application layer attachment point.

Physically, the Business/Application layer has several components. Here are a few
examples:

v xxx LPAR running the applications, including the bbb application on the host
v DASD
v Host workloads and applications
v Server workloads and applications
v Token-rings supporting ccs, ddd, eee, or xxx
v Workstations at ccs, ddd, eee, or xxx
v Terminal equipment at the ccs, ddd, eee, or xxx
v fff Systems
v Printers
v ggg
v Tape drives
v hhh hosts

Physically, the Access layer has several components. Here are a few examples:
v iiis (3745s), including hhh FEPs
v All 3745s in aaa
v Routers
v Bridges
v bbb lines, channel banks, modems, matrix switches, encryptors
v Channel Extenders
v kkk (kkks)
v Host software such as MVS, IMS, VTAM, and CICS


Figure 1.5. xxx IT layers from a capacity planner's perspective

Physically, the Backbone has several components. Here are a few examples:

v Bandwidth Managers (BMs) and their associated kkks
v CSUs
v Data Encryption devices
v T1, fractional T1, and T3 links

At this point, an analogy is appropriate to convey this multi-layered concept. Consider an electronic shopper who dials into an electronic shopping network to acquire a product. His Business/Application components include the shopper, his computer system, the application providing support for the electronic connection, the modem, and the wire connecting the system to the phone jack. The Access layer begins with the connection of the phone jack at his home into the local telephone system of wires and switching circuits (overly simplified, since this eliminates the Telco's NAU at the customer establishment). At the Telco's Central Office, the connection is passed on to a Backbone layer, which can be considered the long-distance carrier providing the high-speed trunk line to the appropriate Central Office, finally connecting the shopper's computer system to the remotely located electronic shopping center's computer system.

In this analogy, the electronic shopper passes through these layers transparently, without any awareness of the IT components he is using or the communication path he has taken. All the user knows is that he was able to conduct his business through the available technology. This is similar to the business user of the xxx systems, who does not need to be aware of the technology that enabled him or her to conduct business.


Reference: For a complete and detailed discussion of the layer concept, its
components, and its purpose please see Section 5.3, xxx IT Resource model.

Section 1.3 -- Approach to Capacity Management


The layer concept works well for the capacity planner in visualizing what information is
needed for resource planning. The business/application layer is clearly driven by the
business activity since the load on the business/application layer components can be
directly correlated with this business activity. This overall business load translates into a
traffic load for the Access layer. However, the relationship between messages being
handled and the originating business transaction becomes blurred at this layer and so a
different view of the Access layer traffic needs to be taken. This new view of the Access
layer traffic focuses on network traffic types and is more location-sensitive rather than
business-sensitive. The amount of traffic going into and out of a location (or access
point) is one of the primary determinants of what capacity is required to service that
location.
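To make the location-oriented view concrete, the traffic going into and out of each access point can be summed from raw traffic samples. The sketch below is illustrative only; the location names and traffic figures are hypothetical, not actual xxx data.

```python
from collections import defaultdict

# Hypothetical traffic samples: (location, direction, kilobits observed)
samples = [
    ("branch-A", "in", 1200), ("branch-A", "out", 800),
    ("branch-A", "in", 1500), ("branch-B", "in", 300),
    ("branch-B", "out", 450), ("branch-B", "out", 500),
]

def traffic_by_location(records):
    """Sum inbound and outbound traffic per access point."""
    totals = defaultdict(lambda: {"in": 0, "out": 0})
    for location, direction, kbits in records:
        totals[location][direction] += kbits
    return dict(totals)

totals = traffic_by_location(samples)
print(totals["branch-A"])  # {'in': 2700, 'out': 800}
```

The per-location totals can then be compared against the capacity of the circuits serving each access point to decide where upgrades are needed.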

This new model of the xxx IT environment allows for an approach to capacity
management that aligns the host and network resource planning efforts with the business.
Figure 1.6 shows a simplified flow of information in the development of a capacity
forecast.

Figure 1.6. Flow of information involved in forecasting.

However, the oversimplified view portrayed in Figure 1.6 raises an obvious question: how does the capacity planner gather the information required in each box in an organized fashion to develop the host and network resource forecasts? The approach in this methodology is to use a series
of forms to thoroughly organize and document the data required. These forms help the
capacity planner to characterize the business workload and the network traffic in a consistent way, which facilitates the forecasting process and fosters good, clear communications with xxx's customers.


Figure 1.7. Relationship between business and system model in the production of a
capacity plan.

Figure 1.7 presents an overall perspective of the two basic models that are used to
produce the capacity plan. The first model is the business model. The business model
receives business requirements and translates them into resource demands which feed the
system model. The system model produces estimated load information along with
alternative configuration data.

This capacity management methodology focuses on the business model to ensure that the process has correct forecast information on estimated loads, which in turn is one of the primary inputs to the capacity plan. The forms associated with Sections 2 and 3 help to organize the information to be used in the business model.
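The business model's translation of business volumes into resource demands is often approximated with a linear relationship fitted from historical data. The sketch below illustrates the idea under assumed figures; the business driver (claims per day) and CPU numbers are hypothetical, not actual xxx measurements.

```python
def fit_linear(volumes, cpu_seconds):
    """Ordinary least-squares fit: cpu = slope * volume + intercept."""
    n = len(volumes)
    mean_x = sum(volumes) / n
    mean_y = sum(cpu_seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in volumes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(volumes, cpu_seconds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: daily claims processed vs. CPU seconds consumed
volumes = [1000, 1200, 1500, 1800]
cpu = [210, 250, 310, 370]

slope, intercept = fit_linear(volumes, cpu)
# Forecast CPU demand at a projected business volume of 2500 claims/day
forecast = slope * 2500 + intercept
```

In practice the quality of such a fit depends entirely on the accuracy of the business volume history supplied by the business owner, which is why the forms in Sections 2 and 3 emphasize validating that input.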

Figure 1.8. Structure of host and network capacity planning tables.


Figure 1.7 describes the three main types of characterization data that are needed to support the flow of information into the system model for developing an IT resource forecast, as pictured in Figure 1.6. The organizational scheme portrayed in Figure 1.6 is a simplified way of viewing some of the more important data collection forms.

This scheme ties together all four main views of xxx IT resources: Business application,
User, Location, and Component. Location Characterization tables focus on the
business, the end-user, and the end-user's network connections to the host or LAN server.
Network Characterization tables focus on organizing the information about the current
network resources, the type of traffic that they transport, and the tools/metrics needed to
measure the volume of traffic. Host & Applications Characterization tables describe
the host or LAN server applications and workloads, the way they run on the host or server,
and the tools and metrics needed to measure their load and performance behavior.
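As a rough illustration of how these four views might be tied together in a single characterization record, consider the sketch below. The field names and values are assumptions made for illustration, not the actual xxx table layout.

```python
from dataclasses import dataclass

@dataclass
class WorkloadCharacterization:
    """Illustrative record linking the four views of IT resources.
    Field names are hypothetical, not the actual xxx form layout."""
    application: str        # business application view
    user_group: str         # user view
    location: str           # location / access-point view
    component: str          # host, server, or network component view
    peak_tps: float         # peak business transactions per second
    avg_msg_bytes: int      # average network message size
    cpu_sec_per_txn: float  # host CPU seconds consumed per transaction

w = WorkloadCharacterization("bbb", "claims-entry", "branch-A",
                             "host-LPAR-1", 42.0, 800, 0.015)
# Estimated CPU demand at peak: CPU-seconds consumed per elapsed second
peak_cpu_util = w.peak_tps * w.cpu_sec_per_txn
```

Capturing all four views in one place is what lets a forecast trace a host or network demand back to the business driver that generated it.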


Gathering & Forecasting "New" Application Resource Requirements

This section focuses on the capacity planner's involvement during the development life cycle. This applies to any new service, whether it is developed in-house or purchased. If a project is managed in phases, from definition and requirements through development, test, and installation, these phases provide the points at which a capacity planner can get involved to ensure resource requirements are accurate and future configuration alternatives are modeled.

This methodology does not assume that all projects will involve, or have involved, capacity planners at all phases of development. It does, however, expect critical production workloads to be decomposed into business and system/network constituents as described on the Production Phase forms, in support of the business-driven capacity management process model. Thus, actions to complete pertinent information on previous-phase forms should be initiated as required.

Throughout the data collection activities, the 80-20 rule is applied. Only 20% of the work
will be responsible for 80% of any usage changes. Thus, it is not necessary to track all
projects. Likewise, when examining an application's transactions, look only at the big
hitters: the 20% or so that consume 80% of the resources either now or in the future. The
same rule applies for workload decomposition, i.e., only the significant traffic flows for a
workload need to be considered when determining the expected use of network
components.
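The 80-20 selection described above can be sketched as a cumulative-share calculation over per-transaction resource usage. The transaction names and CPU figures below are hypothetical, chosen only to illustrate the technique.

```python
def top_consumers(usage, threshold=0.8):
    """Return the smallest set of transactions that together account
    for at least `threshold` (e.g. 80%) of total resource consumption."""
    total = sum(usage.values())
    selected, running = [], 0.0
    for name, cost in sorted(usage.items(), key=lambda kv: kv[1],
                             reverse=True):
        selected.append(name)
        running += cost
        if running / total >= threshold:
            break
    return selected

# Hypothetical CPU-seconds consumed per transaction type
usage = {"TRN1": 500, "TRN2": 300, "TRN3": 120, "TRN4": 50, "TRN5": 30}
print(top_consumers(usage))  # ['TRN1', 'TRN2']
```

Only the transactions returned by such a selection (the "big hitters") need detailed decomposition; the remainder can be tracked in aggregate.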

This document will discuss the process for collecting information, analyzing it, and
determining resource needs throughout the development life of an application or
workload. Its purpose is not to define a development or maintenance life cycle
methodology, nor to document activities comprising such a methodology. However, it will
examine the linkage to these typical life cycle methodologies from a capacity planner's
viewpoint and will document activities required by the capacity planner.

Key Point: The term "capacity planner" refers to the lead individual who has the
responsibility for gathering the necessary business information, translating it into
system/network demands, and projecting or forecasting resource requirements. Required skills include effective communication, both verbal and written, in addition to a broad knowledge of the system and network. Initially, a team may be engaged to gather
business information to ensure both system and networking technical depth is available.
The capacity planner must have a "management" discipline to ensure a project is
established and managed on an ongoing basis for all critical applications or user
groups.

Each phase of development will be addressed individually. Each begins with a set of activities that need to be performed to produce and document application requirements and forecasts for that specific application or workload. The activities typically included in the capacity management process are highlighted. When the application goes into
production, the decomposition and information collection are covered under "existing" applications in Section 3. Section 3 contains all activities to be performed for the production environment, including the aggregation of "new", "existing", and environmental changes into an overall forecast.

For each Capacity Management activity, the specific tasks, inputs, outputs, and control
mechanisms will be discussed. Together, they become the process for obtaining
requirements for new applications being purchased or developed in-house.

To assist in determining which of the forms in Appendix C to use and to ensure all the necessary activities are performed, the following are included in the appendices:

v A quick index of the forms and their use throughout the phases of development and after going into production is provided in Appendix A.

v A checklist of which forms to complete and what questions need to be asked to collect the desired information for a particular phase is provided in Appendix B.

v Capacity planner activities for each phase are summarized in Appendix D.

Samples of completed forms are included in Appendix F as a reference. Version 2 added these forms to illustrate network capacity planning. However, the sample forms were completed before a pilot was performed; therefore, the application is fictitious, and only information relevant to network capacity planning is illustrated for the sample application. One of each form is provided, though the phase each represents may differ. The intent is to provide an example of the data that should be entered, not a case study to demonstrate the methodology.


2.1 - Capacity management interface to the Development Life Cycle
Regardless of the existence of formal methodologies for managing the development of
applications, capacity planners need to be involved throughout a project to continually
assess the impact to system resources and to forecast resource requirements well in
advance of actual need.
The formality and enforcement of a single systems development life cycle methodology
varies throughout the data processing industry. However, all methodologies, formal or
informal, consist of phases similar to the following:

v Definition Phase - The major activity performed in this phase is a study describing the application and determining its feasibility. Cost benefits and risk assessment are key outputs. These, along with the identification of the target information technology environment, application functionality, service expectations, and business volumes, provide the necessary information for capacity planners to build initial estimates and models for the proposed application. Identifying comparable applications or workloads is a forecasting technique commonly utilized during this early phase, where limited real data may exist. Forecasting inaccuracies are highest during this phase.

v Requirements Phase - In this phase, functional requirements are provided in more detail, allowing the capacity planner to build the first baseline model for performance and capacity. More useful information becomes available, and forecasting accuracy increases a notch.

v Design Phase - Outputs during this phase may consist first of conceptual, or
external, design specifications, followed by detailed, or internal, design
specifications. During this phase, capacity planners need to translate business
workloads and volumes into application and system/network workloads and
demands. Service expectations need to be reexamined and designs adjusted
to ensure satisfaction of requirements. Also during this phase, performance
and capacity testing scenarios need to be planned. The accuracy of forecasts continues to improve with the additional design information.

v Development and Testing Phase - Testing against defined performance
expectations can expose problems before they become embarrassing and
allow timely changes to prior resource estimates. Capacity planners and
application test staff should be on the alert for significant variations from
expectations. These could be in the form of transactions spawning more
physical I/Os than expected, use of resources not originally expected, etc.
Although some performance and capacity estimate validation can be
performed during early test stages, i.e., unit and component testing, it is during
integration and stress testing that validations and variances are most likely
realized. These are the test stages when specific performance testing should
be done. Detailed capacity and performance measurements provide
information that can greatly improve forecasting accuracy.

XXXXX Page 21
IT Capacity Management Procedure

v Installation/Transition Phase - This phase begins with installation planning
and includes the periods of transition until production status is declared. A
phased production implementation, where the initial implementation is
sometimes called a pilot, provides important performance and capacity
information for quick adjustments to oversights in the application workload
assumptions. In this document, these pilot production implementations
will be addressed as part of this phase. This phase ends, and actions related
to production applications begin, when non-pilot production status is declared.

An information technology project should be managed via a consistent Development Life
Cycle Methodology. The capacity planner has the responsibility for maintaining current
application resource requirements throughout the development life cycle and forecasting
overall resource needs. The development life cycle is defined as beginning with a
request for service (project initiation) and ending when the service is declared to be
running in production status (project end). The projects in scope include:
purchased applications; applications designed and developed in-house; and any other
change requiring formal or informal project management.

Throughout the development phases, capacity-related information is available to the
capacity planner. What needs to be collected, and how to use this information to
refine resource forecasts, is the subject of this document. The capacity planner must
get involved at least once during each phase to refine individual application requirements
and forecasts, and, equally important, to maintain open communications with application
staff and information technology clients. Early involvement also allows sufficient time to
explore unclear workload characteristics through prototyping or simple benchmarking.

Usually it is the Business Owner or Application Project Manager who initiates the inclusion
of capacity planners to help determine the resource requirements and forecast. When to
involve a capacity planner, and for which projects, depends on decisions normally
made outside the capacity process. The Definition Phase in Section 2.3 suggests
criteria that should be established as an initial standard for notifying capacity planners
when significant or complex resource forecasting can be expected. Capacity planners
should be involved by at least the early part of the Requirements Phase in order to
determine the depth of forecasting analysis, the techniques, and the
monitoring/prototyping/benchmarking activities needed to ensure best possible resource
forecasting throughout the development project. Both capacity planners and performance
analysts should be involved in this phase for high exposure applications.


2.2 Techniques For Data Collection And Analysis

During the early phases of development, little technical information is available. It would
be meaningless to ask for information about which IMS transactions will be used or actual
network traffic sizes. Techniques generally used during the Definition and Requirements
Phases include one or both of the following: (1) comparing the
proposed application to existing ones and (2) decomposing the application
into business drivers and transactions. Both techniques are of value and can be used
as a cross-check of estimates. Section 6.6 discusses the technique of decomposing a
workload into business drivers and business transactions, as well as the further
decomposition into DP transactions. Figure 2.2.1 illustrates this technique for the ZZZ
application.

Figure 2.2.1. Decomposition of ZZZ into Business and DP Transactions

During the Design Phase, business transactions also get decomposed into more
detailed network traffic characterization and demands. A correlation of measured usage
to a business transaction and business drivers is important in order to better relate
resource usage and costs to the business. Unfortunately, particularly in the planning of
network resources, not all usage of resources can be easily attributed back to an
application or business user without the use of detailed monitoring that is expensive in
terms of cost and/or resources. Various forecasting techniques will need to be applied as
appropriate. These techniques are documented in Section 6, Analysis and Forecasting
Techniques. Section 5, Developing and Maintaining Procedures, Tools, Techniques, and
Standards, also provides necessary information about the various ways to view
information and, therefore, collect and organize it.

Three terms used in the xxx business-driven capacity management process are worthy of
being repeated; their understanding is imperative before continuing:


Definition: A Business Driver is an element of the business that drives the
need for I/T resources; e.g., for ZZZ, the drivers are items and files.

Definition: A Business Transaction is a specific business function
accomplished by an application or the end user; e.g., for ZZZ, business
transactions are business functions such as ZZ-Edit and ZZ-Distribution.

Definition: A DP Transaction is a unit of work as seen by the subsystems
servicing the application and user; e.g., an IMS transaction, batch job step or job,
etc. These can usually be measured directly by current monitoring tools.

Business drivers affect the number of business transactions to be executed. Business
transactions translate into DP transactions, which in turn demand CPU, DASD, and
network resources. Network traffic is also a result of executing business and DP
transactions, but the network demands are best described in terms of traffic flows
decomposed into characters/sec or bits/sec across the network components as illustrated
in Figure 2.2.2. The resulting system and networking metrics that measure the resource
loads are described in Section 5.2.
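
As an illustration of this decomposition chain, the sketch below simply multiplies volumes and per-transaction factors together; all of the numbers and factor names are invented for illustration and are not taken from ZZZ or any xxx workload:

```python
# Hypothetical decomposition of a business driver into DP transactions and
# resource demands. Every factor below is an invented illustration.

def forecast_demand(driver_volume,            # business driver units per peak hour
                    txns_per_driver,          # business transactions per driver unit
                    dp_txns_per_business_txn, # DP transactions per business transaction
                    cpu_sec_per_dp_txn,       # CPU seconds per DP transaction
                    ios_per_dp_txn,           # physical I/Os per DP transaction
                    bytes_per_dp_txn):        # network user data bytes per DP transaction
    business_txns = driver_volume * txns_per_driver
    dp_txns = business_txns * dp_txns_per_business_txn
    return {
        "dp_txns_per_hour": dp_txns,
        "cpu_sec_per_hour": dp_txns * cpu_sec_per_dp_txn,
        "ios_per_hour": dp_txns * ios_per_dp_txn,
        "net_chars_per_sec": dp_txns * bytes_per_dp_txn / 3600.0,
    }

# Example: 5,000 driver units in the peak hour; each drives 2 business
# transactions, each of which spawns 3 DP transactions costing 0.05 CPU sec,
# 4 physical I/Os, and 1,800 bytes of network user data.
demand = forecast_demand(5000, 2, 3, 0.05, 4, 1800)
```

The same arithmetic underlies the more formal techniques in Section 6; only the sources of the factors change as the phases progress.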

It is during the Development and Testing Phase that system-generated measurements of
the key metrics become possible. This is the point at which hypotheses about how the
system will perform are tested. Up to that point, the collection of information focused on
decomposing the business metrics into expected DP metrics and descriptions of business
and service requirements. The refinement of estimates utilizing system-generated
measurements of key work should improve the accuracy of estimates significantly. In
most cases the technique or tool used will be driven by transaction volumes and/or
expected I/O events. For example, the VENDOR tool can be used to describe the logical
I/Os and their frequency patterns and then project CPU and DASD resource
requirements. In some cases, such as network traffic across channel extenders
(classified as the channel extender traffic type), the relationship between processor
metrics (I/Os) and network metrics (messages) becomes obvious.

Figure 2.2.2. Business-To-Network Traffic Decomposition


Capacity planning for the network requires additional decomposition steps to describe the
information flowing over the network. Also, the information begins to lose its affinity with
the original business transaction and DP transaction due to the inadequacy of monitoring
tools and programming techniques to retain a transaction's identity from/to the application.

Analysis of information collected on forms xxxCP02 through xxxCP09 is specific to an
individual application/workload. xxxCP10 and xxxCP17 provide a network-wide
view of resource demands.

Figure 2.2.3 illustrates the two key forms (xxxCP08 and xxxCP09) that drive more
accurate forecasting of individual "new" applications.

Figure 2.2.3 also illustrates the two key forms (xxxCP10 and xxxCP17) that drive the
aggregation of individual workloads or measure system-wide metrics in preparation for
development or update of the overall Capacity Plan. Aggregation of individual workloads
and overall system-wide forecasting is covered in Section 3, Gathering and Forecasting
"Existing" Application and Overall Resource Requirements. Also, Section 3 is where the
periodic resource reports get created for reporting usage. These will permit the resource
managers to track exceptions to projections and to present the resource usage in the
view most appropriate for the designated recipient.

Section 2 will focus ONLY on the collection, analysis, and reporting of individual
application/workloads, and ONLY for applications/workloads still in development.

Regardless of the type of information collected or its completeness, analysis and
forecasts are still necessary. One cannot wait until xxxCP08 and xxxCP09 get completed
in the Design and Development & Test phases. The methods for analyzing the
information and producing the forecasts will vary from phase to phase. Analysis and
resource projections can be performed via simple paper-and-pencil techniques, via linear
or analytical modeling tools, or even simulation and benchmarks. The technique selected
depends on the accuracy (risk of being wrong -- usually based on cost impact to the
business), the time available to do estimates, and the granularity of the information itself.


Figure 2.2.3. Use of Forms for "New" Application Forecasts and the Development of the Overall
Capacity Plan
2.3 Definition Phase

The key activities to be performed during the Definition Phase are listed in Table 2.3.1.
The activities performed by the Business Owner are not part of the Capacity Management
process. They are activities usually included in a systems development life cycle
methodology and therefore part of the Application Development process. Because these
activities are critical to the initiation and success of the Capacity Management process,
they are also listed. It is imperative that the Application Development process monitor
these activities continuously to ensure its process is effectively and efficiently providing
the input and support to the Capacity Management process.

ID Activities Primary Responsibility
- Identify and schedule development phases Business Owner
- Determine and initiate capacity planner involvement Business Owner
1 Establish, maintain, and promote capacity planner Capacity Management
involvement standards
2 Initiate application resource forecast tracking - form Capacity Management
xxxCP01
3 Review xxxCP11 for system and environmental changes Capacity Management
that may impact the resource estimates. Collect
information - forms xxxCP02, xxxCP03, and xxxCP04.
4 Assess resource alternatives and project required Capacity Management
resources - form xxxCP15
5 Deliver recommendation summary package - form Capacity Management
xxxCP16
- Determine feasibility and approval Business Owner

Table 2.3.1. Definition Phase Activities

A summary of the steps to perform and key questions to ask to get the desired
information is provided in Appendix B. This also serves as a checklist to ensure all
activities are completed. For a quick reference to the different forms and their use
throughout the phases of development and production, see Appendix A. The table on
page 2 provides an index or key to which forms need to be completed, when, and where
to look for details.


Activity 1 - Establish, maintain, and promote capacity planner involvement
standards

In general, it is not justifiable to require capacity planning involvement in all projects. Most
projects are small in nature, i.e., of little impact or little cost. xxx has existing standards for
assessing new application risk and will utilize some assessment mechanism like the one
that follows to trigger involvement of capacity planners. This same mechanism should
document which phases are mandatory for capacity planning involvement.

A key step in project definition is to perform a technical assessment. This activity is
initiated via the Development Life Cycle Methodology used within the Application
Development Process and is defined as a review of the technical aspects of the
application to determine the impact on existing information technology resources.
Although the performers of the technical assessment may be the xxx capacity planners or
performance staff, the Application Development Process must first determine the need to
notify capacity planners and then define the extent to which they need to be involved.
Some standard for determining capacity planner involvement must be part of the
Application Development Process and may take the form of questions where an answer
of "yes" to any question will constitute continuous involvement. The following questions
are a starter set:

v Is this a complete replacement of another application?
v Is the number of users impacted > 100?
v Are the expected costs > $500,000?
v Is more than one location impacted?
v Are new technology or state-of-the-art techniques being planned?
v Is backup/recovery or availability a major concern?
v Are networking requirements unknown?

Management will continually review the criteria that will trigger capacity planning
involvement, and revise those criteria as dictated by necessity and experience.

Activity 2 - Initiate application resource forecast tracking - form xxxCP01

Precede the first meeting with a phone call to determine if any documentation is available
for review. If so, review the documentation and complete as much of the form as possible
before the first meeting. It is not necessary to transpose all the information from supplied
documentation onto the forms; simply refer to the document provided. However, in some
cases, it may be more constructive to summarize information from other sources onto the
forms. Document the application and project description. Review the technical
assessment and denote the interface points deemed necessary to ensure accurate
forecasts and timely delivery of resources. Also review the project schedule and complete
the Scheduled Phase Begin and End date fields for the interface points. Document who
will or has provided the input and who will receive the output for each phase known. Since
it is normal not to have all the information this early in the project, note on the "Action
Items and Tracking" part of the form any follow-up actions needed, along with
responsibilities and due dates. This form needs to be kept up to date to reflect current
application and resource forecast status.

Activity 3 - Review xxxCP11 for system and environmental changes that may
impact the resource estimates. Collect information - forms
xxxCP02, xxxCP03, and xxxCP04.

Review xxxCP11

xxxCP11 documents current environmental and business changes that have a potentially
significant impact on I/T resources. It should be reviewed to determine the impact of any
changes on this development effort and, if appropriate, updated. For example, a
technology change, such as the replacement of compression algorithms for DASD or
network traffic, may alter a specific or all applications' resource demands and
requirements.

Complete xxxCP02

A description of expected business drivers and transactions (xxxCP02, page 1) should
be obtained in this phase. Leave the 'technical' dictionary in your office and go out and
speak in business terms with the application owner to understand what in the business
will drive resource usage. This information will be used to determine the degree of
correlation to actual usage and provide an intellectual base for executive
communications.

An important part of this form is the application's daily work flow pattern, which describes
the application's key business workloads and usage throughout the 24-hour day. The
identification of both Host and Network application peaks provides a basis for discussing
the events that are expected to occur during the peaks. These peaks are the time periods
during which the application expects to impose the heaviest workload on the host (CPU
and DASD) and network resources. Note, the peaks for host and network may be
different; also, network peaks here do not refer to any specific network component, but
the time the application expects to transfer the most traffic through its access point.
Consider also the period when database backups are done. Are these considered
business transactions within the application, or operational functions at a fixed time of
day for all applications? They may be included here as a different workload. Other types
of peaks are also important and should be recorded on this form -- e.g., end-of-month,
end-of-year, seasonal, etc. Figure 2.3.1 illustrates a method for documenting the Daily
Work Flow Pattern. The capacity planner should start by listing the key business
workload types. Then, for each business workload, map the expected window of
operation, and denote both the host and network peak periods and the workload's
operating priority relative to other work. One-hour processing peaks are represented by
an "H" for Host resources and "Nx" for networking resources, where x equals the traffic
type. Traffic types for xxx are defined in Section 5, "Procedures, Tools, and Standards".
In the example below, N1 = Transaction, N2 = File Transfer (Bulk Data), N3 = Channel
Extender (MICR).


Workload 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 Priority
N M
Interactive IMS <--- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ---> High
DB Query H H
DB Update N1 N2 N1
Interactive TSO <--- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ---> Medium
Submit, validate, H H
output N1 N1

Batch - Updates, <--- ----- ----- ---> High
edits, reports, data H H
entry from other N2 N3
applications
EOD Backups <--- ---> Medium
H H
N2 N2

Figure 2.3.1. Daily Work Flow Pattern -- Fictitious Application

A description of the target environment (xxxCP02, page 2) gives the capacity planner information
needed for modeling the future environment. It is not necessary to request the expected
processor size here; just determine the processor platform, i.e., will it be a S/390 running
MVS/ESA or a UNIX processor. It is possible that you will be asked to model multiple platforms to
determine the resources needed and associated costs. The "Host" and "Network" target
environment entries also give an indication of transaction complexity.

To understand the initial networking requirements (xxxCP02, page 2), capture
expectations of network flow: where will the transaction begin? where is it destined? what
type of traffic load will it impose?

xxxCP02, page 3, should be duplicated and marked as either Host information or
Network information. It provides a place to document additional operating and work flow
characteristics, such as:

v expected peak-to-typical usage ratios
v an estimate of the amount of resources consumed during the peak relative
to the remaining periods of operation
v an estimate of the peak number of concurrent users for an interactive
workload or the peak number of concurrent jobs
v an estimate of the maximum percent of the expected total number of users
or tasks using the application at the same time
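
These entries combine through simple concurrency arithmetic. A minimal sketch, with all figures invented for illustration rather than taken from an actual xxxCP02:

```python
# Illustrative concurrency arithmetic from the xxxCP02 page 3 entries.
# All figures are invented, not taken from an actual form.
total_users = 400            # expected total user population
max_concurrent_pct = 0.25    # maximum percent of users active at the same time
peak_to_typical_ratio = 2.0  # peak-hour load relative to a typical hour

peak_concurrent_users = total_users * max_concurrent_pct                  # 100
typical_concurrent_users = peak_concurrent_users / peak_to_typical_ratio  # 50
```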

How this application will interact with other applications and systems (xxxCP02, Page 3) is
also important. A clear understanding of how application/workload growth will impact the
resource consumption of other workloads, and vice versa, is needed to accurately
forecast overall resource requirements. A documentation aid to visualize the interfaces is
a simplified data flow diagram such as the Application Interface Diagram in Figure 2.3.2
based on original information obtained for ZZZ.

An important element in estimating the required capacity for any resource is the
understanding of service requirements. Otherwise, there would be no need to determine
capacity needs -- all water in the world could be funneled through the same pipe to service
the thirst of all people. Only when a requirement is imposed that each and every human
must be given ten glasses of water per day does a single pipe become an inadequate
transport medium to serve the world. Service requirements (xxxCP02, page 3) include
response time, throughput, and availability.

Figure 2.3.2. Example of an Application Interface Diagram

Complete xxxCP03

This form allows the documentation of comparable workloads that could be used to base
estimates for the new application. If it is a purchased package, the capacity planner
should seek the information from the vendor or from a user running the application.
Noting similarities and differences will help you translate how the application will run in
your environment as opposed to theirs. It is important to document any assumptions and
reasoning that have gone into this effort. It is also important to revisit this document after
the application has been implemented into production to assess accuracy and to improve
forecasting techniques for future applications.

Often there may not be a complete application to compare against. It is then that
system commonalities need to be sought. For example, will this new application use DB2?
Will it generate code using a fourth generation language or CASE tool? Will it utilize
utilities that are already running in production servicing other applications? When looking
for a workload that uses the network in a comparable manner, start with similarities in
traffic types. Will file transfer operations across the network utilize the same software and
compression schemes as another application? How do the average size of the files and
frequency compare? It is not very difficult to find commonalities that shed some light on
the resources that will be required when the new application goes into production.


Another hint of what can be expected may lie in where the application is being developed.
Section 6.3 will present a technique called clustering that can be used to characterize
applications by development group, or other useful groupings. It is from cluster analysis
that conclusions are drawn for measurements like the Relative I/O Content (RIOC). This
metric provides a basis for determining CPU requirements from I/O expectations.

Key Note: RIOC is an important indicator of a workload's use of CPU and I/O
resources -- given the I/O expectations and the selection of an RIOC that is within a
range of numbers for the type of file accesses (DB2, IMS Fast Path, etc.) determined
from cluster analysis, the corresponding CPU can be estimated. See Section 6.4.
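
To illustrate the Key Note, the sketch below backs CPU demand out of an expected I/O rate and an assumed RIOC value; both numbers are invented for illustration, not results of actual cluster analysis:

```python
# RIOC-based CPU estimate: given an expected physical I/O rate and an RIOC
# (I/Os per CPU-second) assumed to come from cluster analysis of similar
# access types, back out the implied CPU demand. Both numbers are invented.
def cpu_busy(io_rate_per_sec, rioc_ios_per_cpu_sec):
    """CPU-seconds consumed per elapsed second, implied by the I/O rate."""
    return io_rate_per_sec / rioc_ios_per_cpu_sec

# 90 I/Os/sec against an RIOC of 30 I/Os per CPU-second -> 3 CPU-sec/sec.
util = cpu_busy(90, 30)
```

In practice the RIOC would be chosen from the range observed for the file access type in question (DB2, IMS Fast Path, etc.), as Section 6.4 describes.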

Comparing network usage is a bit more complex. To compare the "new" workload to an
existing one, a comparison of traffic types must first be made. The traffic type suggests
the type of business package that will be transmitted across the network: a file, a report, a
job, a transaction, etc. Knowing the business package, one can now compare its size and
quantity over a peak hour with that of workloads running today.
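
A hedged sketch of this comparison; the package sizes and counts below are invented, not measurements of any xxx workload:

```python
# Comparing a "new" workload's expected peak-hour business-package traffic
# to a measured comparable workload. Sizes and counts are invented.
def peak_hour_chars(package_size_chars, packages_per_peak_hour):
    """Business-package traffic in characters per peak hour (user data only)."""
    return package_size_chars * packages_per_peak_hour

existing = peak_hour_chars(250_000, 120)  # measured comparable workload
new_est = peak_hour_chars(400_000, 150)   # expectations for the new workload

# Scale the comparable workload's measured network statistics by this factor.
scaling_factor = new_est / existing
```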

Definition: Business packages are those units of information sent across the network
as known by the business user. They represent only the user data portion of the actual
traffic transmitted and are not meaningful for video and voice traffic types. Examples
are files, reports, transactions, images.

If there is a comparable workload, its measured network statistics would be useful for
estimating the "new" workload's requirements. Many factors need to be considered when
documenting network similarities and differences; a few are:

v Will the communications overhead be the same? This is a factor of
the application, communication protocols, components used, etc. See
the discussion in Section 6.
v Will network compression/compaction be the same?
v How do the business traffic rates compare?
v Do the network statistics for the measured comparable workload
include overhead and compression?
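
The overhead and compression factors above can be folded into a simple conversion from business-package user data to estimated wire traffic. The default factors in this sketch are placeholders, not measured xxx values:

```python
# Convert business-package user data into an estimated wire load by applying
# protocol overhead and compression. Both default factors are placeholders.
def wire_bytes(user_data_bytes, overhead_factor=1.25, compression_ratio=2.0):
    # overhead_factor > 1 adds protocol headers and acknowledgements;
    # compression_ratio > 1 shrinks the payload before transmission.
    return user_data_bytes * overhead_factor / compression_ratio

est = wire_bytes(1_000_000)  # 1,000,000 characters of user data
```

Whether measured statistics for a comparable workload already include these factors determines whether they should be applied again, which is the point of the last question above.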

Complete xxxCP04

Although not much will be known about the detailed flow of network traffic, at least the
workstations to be used and the location of the application and users should be
anticipated. xxxCP04 is the network capacity planner's tool for documenting the traffic
flows through the network. This should immediately alert the capacity planner and
network designers to the complexity of the user-application relationship: simple one-to-
one or many-to-one relationships, or more complex one-to-many or many-to-many
relationships.


At this point, when even the application location may still be questioned, no forecasting of
individual workload network components will likely be done; however, should the FEP
and FEP access links at the xxx locations be known, some estimate of growth could be
applied if necessary. The real objective of beginning xxxCP04 in the Definition Phase is to
gain early insight about which components might be impacted and to what degree. The
key output from this activity is the documentation of unknowns and activities to get the
answers. Follow-on activities should be logged on xxxCP01. Therefore, a "guess" at the
expected paths for each significant traffic flow should be attempted and documented on
this form. You may want to use a dotted-line to represent low confidence levels for
connections in question. xxxCP04 needs to be refined at each checkpoint until the actual
and final traffic flows are documented.

Activity 4 - Assess resource alternatives and project required resources - form
xxxCP15

The key forecasting metrics for processor and DASD are respectively power and
gigabytes required. Data collection comparisons and forecasts for processors need to
take into consideration physical versus logical mainframe partition measurements and
configuration differences, application of capture ratios and overhead, subsystem and data
base complexity differences, platform adjustments, etc.
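
One of these adjustments, the capture ratio, can be sketched as simple arithmetic; the measured MIPS, ratio, and platform factor below are hypothetical:

```python
# Adjusting measured CPU for capture ratio and platform differences before
# stating a MIPS forecast. All values are hypothetical.
def total_mips(captured_mips, capture_ratio, platform_factor=1.0):
    # capture_ratio: the fraction of the workload's CPU that the monitor
    # actually attributed to it; platform_factor scales between processors.
    return captured_mips / capture_ratio * platform_factor

est = total_mips(45.0, 0.75)  # 45 MIPS measured at a 75% capture ratio
```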

Forecasting network throughput requirements (characters/hr or bits/hr) for this
application/workload at this phase, as it is for CPU and DASD, is subject to higher
degrees of error than in subsequent phases due to the uncertainty of the inputs.
Therefore, it makes no sense to use sophisticated forecasting techniques to produce an
application estimate for every potential resource that may be impacted. The uncertainties
in the network traffic paths and intermediate components themselves suggest a more
focused estimate of resources on an individual application basis. A quick look at the
current capacities of the access nodes for the Location and Application may be in order. Will
an individual component model need to be run?

The objective of xxxCP15 for the network is to capture the network capacity planner's
estimates of application production throughput demands (characters per planned peak
hour) and alternatives for satisfying those demands; not a resource-by-resource estimate
or design. The overall forecasts (see Section 3) will address network-wide, point-in-time
aggregates of all workload demands and other environmental changes. Unless the
production date for this application falls within the overall forecasting period and the
expected loads will put this application within the top 20% of the applications contributing
to 80% of the workload, this application should be included in the "other" category (i.e.,
the 80% contributing to only 20% of the load) and not be estimated separately. Input to
the overall forecasts will typically be high-level percent growths by location based on
comparable studies documented on xxxCP03.
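
The 80/20 screen described above can be sketched as a ranking over expected peak-hour loads. The application names below reuse the document's placeholders, but every number is invented:

```python
# Sketch of the 80/20 screen: rank applications by expected peak-hour load
# and forecast individually only those that together contribute ~80% of the
# total; the rest go into the "other" category. Loads are invented.
loads = {"ZZZ": 60_000_000, "AAA": 25_000_000, "BBB": 10_000_000,
         "CCC": 3_000_000, "DDD": 2_000_000}  # characters per peak hour

total = sum(loads.values())
ranked = sorted(loads.items(), key=lambda kv: kv[1], reverse=True)

individual, cumulative = [], 0
for name, load in ranked:
    if cumulative >= 0.8 * total:
        break
    individual.append(name)
    cumulative += load

other = [name for name, _ in ranked if name not in individual]
```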

The purpose of analysis during the Definition Phase is really to define what is known
today and to communicate the actions needed to be done to prepare for the next phase of
capacity planning. xxxCP02 provides the initial considerations that will be used to
estimate planned capacity and contingent capacity. The capacity planner must also
consider additional overhead by processor supervisors or communication protocols and
report the overhead capacity. See Section 5.2 for discussion on capacity terms and
thresholds. The capacity estimates should be in MIPS, GBs, or characters/sec (CPS), or
per some other unit of time. Percentages are not meaningful for comparing resources
having different theoretical capacities.

For costing purposes, general cost metrics for $/MIPS and $/Gigabytes will likely be used.
One should also assess the value of determining and providing an equivalent metric for
network usage, such as $/(1000 characters transferred).
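
A minimal costing sketch using these three metrics; all unit costs and usage figures are hypothetical placeholders, not xxx rates:

```python
# Minimal cost roll-up using the three general cost metrics. Unit costs and
# usage figures are hypothetical placeholders, not xxx rates.
cost_per_mips = 25_000.0  # $/MIPS
cost_per_gb = 1_000.0     # $/GB of DASD
cost_per_kchars = 0.25    # $/(1000 characters transferred)

estimated_cost = (12 * cost_per_mips                     # 12 MIPS
                  + 150 * cost_per_gb                    # 150 GB of DASD
                  + 2_000_000 / 1000 * cost_per_kchars)  # 2M characters moved
```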

xxxCP15 should be viewed as a sample of the key outputs from analysis and an input to
the decision and budgeting processes. Since the specific resource may be unknown, and
the lead time before the application goes into production should be well within the
provisioning guidelines, one should only focus on the impact to a "type" of resource, e.g.,
S/390 or 3745 FEP. Restating previous comments, documenting follow-on actions is of
prime importance. When will we know the true network traffic paths and components?
How do we get this information? Can the business drivers be quantified and measured?
How do we begin measuring the business drivers so historical data will be available for
regression analysis later? In other words, look ahead at the Requirements and Design
Phases and begin your plans now. Document immediate actions on xxxCP01.
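
Measuring business drivers early is what makes later regression analysis possible. As a sketch, an ordinary least-squares trend over an invented monthly history of a driver:

```python
# Ordinary least-squares trend over an invented monthly history of a
# business driver, of the kind later regression analysis would use.
def linear_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

months = [1, 2, 3, 4]
items = [1000, 1100, 1200, 1300]  # measured driver: items processed per month

slope, intercept = linear_fit(months, items)
forecast_month_6 = intercept + slope * 6  # extrapolate two months ahead
```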

The last page of xxxCP15 is a result of a joint work effort between the capacity planner
and the system/network designer.

Activity 5 - Deliver recommendation summary package - form xxxCP16

After evaluating xxxCP02 information, researching comparable workloads (xxxCP03),
beginning xxxCP04, and performing your analysis (xxxCP15), xxxCP16 should be
completed to inform the project manager and business owner of current analysis and
follow-on actions. Both xxxCP15 and xxxCP01 could be attached if desired. It is now their
responsibility to evaluate your report and incorporate the information into their cost/benefit
analysis. Upon approval of the project, the Requirements Phase begins.


2.4 Requirements Phase

ID Activities Primary Responsibility
1 Update the Application Tracking Form (xxxCP01) and Capacity Management
Definition Phase forms (xxxCP02, xxxCP03, xxxCP04) as
applicable
2 Review xxxCP11 for impact. Collect information - forms Capacity Management
xxxCP05, xxxCP06, and xxxCP07.
3 Assess resource alternatives and project required resources Capacity Management
- xxxCP15
4 Deliver recommendation summary package - form xxxCP16 Capacity Management

A summary of the steps to perform and key questions to ask to get the desired
information is provided in Appendix B. This also serves as a checklist to ensure all
activities are completed. For a quick reference to the different forms and their use
throughout the phases of development and production, see Appendix A. The table on
page 2 provides an index or key to which forms need to be completed, when, and where
to look for details.


Activity 1 - Update the Application Tracking Form (xxxCP01) and Definition
Phase forms (xxxCP02, xxxCP03, xxxCP04) as applicable

Update xxxCP01 noting changes to dates and contacts, and log interim tasks required to
complete this phase. This form needs to be kept updated to reflect current application and
resource forecast status. xxxCP02 should also be updated as required. Complete or
update xxxCP03 with any new comparable information as required. The network capacity
planner should revisit xxxCP04 and refine it if better traffic flow information is now
available.

Activity 2 - Review xxxCP11 for impact. Collect information - forms xxxCP05,
xxxCP06, and xxxCP07.

Review xxxCP11

xxxCP11 documents current environmental and business changes that have the potential
to significantly impact I/T resources. It should be reviewed to determine the impact of
any changes on this development effort and, if there is an impact, updated accordingly.

Complete xxxCP05

During this phase, an application's functionality is defined. Business functions are
provided by one or more business transactions, which should be defined and
documented on xxxCP05 along with approximate quantities and frequencies. For
example, an interactive workload's business transaction may be "Open New Account" and
10 of these transactions might be expected every minute throughout the period of 8:00
A.M. - 5:00 P.M. Expected service to perform this transaction might be 30 seconds. For
batch, a business transaction may be the set of jobs to perform a business function, e.g.,
for ZZZ, "Delivery" or "ZZ-EDIT". xxxCP05, page 1, should be duplicated if both
interactive and batch workloads exist. Indicate whether the workload is interactive or
batch in the column provided.
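The arithmetic implied by such an entry is worth sketching. The figures below are the illustrative values from the example above (10 transactions per minute over a 9-hour window, 30 seconds of service each); the calculation itself is generic:

```python
# Rough sizing arithmetic for the "Open New Account" example above:
# 10 business transactions per minute across an 8:00 A.M. - 5:00 P.M.
# window, with 30 seconds of expected service time per transaction.

RATE_PER_MIN = 10        # business transactions per minute
WINDOW_HOURS = 9         # 8:00 A.M. - 5:00 P.M.
SERVICE_SECONDS = 30     # expected service per transaction

daily_volume = RATE_PER_MIN * 60 * WINDOW_HOURS              # transactions per day
busy_seconds_per_hour = RATE_PER_MIN * 60 * SERVICE_SECONDS  # service demand per hour
concurrency = busy_seconds_per_hour / 3600.0                 # average in-flight transactions

print(daily_volume)   # 5400 transactions per day
print(concurrency)    # 5.0 concurrently in service, on average
```

Numbers like these give the capacity planner a first feel for the workload's scale before any comparable-workload or modeling analysis begins.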

xxxCP05 also provides an initial pass at understanding the file processing expected. See
the form for an explanation of "File Access Characteristic" codes that may be used to
estimate the complexity of file accesses. Use this information to select a comparable
workload or file access transactions to use in modeling. This field will be evaluated
frequently to determine its value at this phase.

Any more detailed information about the business transaction or file accesses should be
documented in the Processing Flow column. Does it always follow some prerequisite
business transaction? Does it generate any new transactions, e.g., reports?

The most one might know about networking traffic at this time might be the types of traffic
that will flow on the network by business transaction. Inferences from experiences may


also provide the application planner a guideline for estimating the size and transmission
rate of "business packages" for the various traffic types. Will the traffic result in transfers
of files, transactions, images, MICR pages, or reports? Although it is too early to know
the actual sizes of the business packages, a best guess based by traffic type using
"bigger-than-a-breadbox" logic should be attempted. xxxCP05, page 2, provides a table
that can be used to complete the Business Package Characteristics matrix. From a
business perspective, what is the business package that will be transmitted across the
network? A file? If so, how many in a typical hour? How many in a peak hour? What is the
minimum and maximum size of the files? If the traffic will be interactive, will the package
be the data transmitted upon hitting the enter key? What is expected back? Text? An
image? How many and how big? Where will the package enter the network? Where is it
destined?

Should the largest potential business package size be used? Or should it be the average
or 90th-percentile size? Which statistical method will be used? This logic and selection of
statistical method must also be done for both the business package size and demand rate
(frequency of business packages per minute). Statistical methods for selecting the
planning peak can be found in Section 5.2. Always document the method used.
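As a sketch of how the candidate statistics compare, the following computes the maximum, average, and 90th-percentile package size for an invented sample; the actual method selection is governed by Section 5.2:

```python
# Illustrative comparison of candidate planning values for business package
# size. The sample sizes are invented; note how one outlier separates the
# maximum from the average and the 90th percentile.
import statistics

sizes_kb = [12, 15, 15, 18, 22, 25, 30, 45, 60, 200]  # hypothetical sizes, KB

maximum = max(sizes_kb)
average = statistics.mean(sizes_kb)
# 90th percentile: statistics.quantiles with n=10 yields deciles; index 8
# is the 90th-percentile cut point (exclusive method by default).
p90 = statistics.quantiles(sizes_kb, n=10)[8]

print(maximum, average, p90)
```

Whichever statistic is chosen, the same choice must be made consistently for both package size and demand rate, and the method documented.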

Once the business package is defined, its size determined, and its transmission
frequency estimated, the network capacity planner can apply rules-of-thumb (ROT) to
determine the segmentation of the business package and communication overhead.

Complete xxxCP06

An increase or decrease in workload volume for this application may affect the amount of
work another application must service. Likewise, another application's workload demands
may affect this application's workload. Document the interfaces to other applications,
including program-to-program interfaces and other application inputs and outputs,
and expected effects. If other major applications or workloads are affected, plans must
be made to resize their capacity requirements as well. If appropriate, document an action
on xxxCP01 at this time.

Complete xxxCP07

What will the development environment look like? The drivers of this workload are usually
the number of programmers and the number of testers. See the form and complete as
much as possible. Will the Development and Testing Phase require special test systems
and networking arrangements to support it? It is usually a good idea to note the minimum
number of major networking components of each type that will be required during
development and testing to ensure that reservation or provisioning of network resources
can be completed. Given practical service requirements, these should be the
determinants for resource demands. However, in some cases, the priorities for
development and test work are so low that existing resources are the only resources
available; hence, the quantity of available resources may determine the actual capacity
that will be consumed. These limitations are nice to know, but the job of capacity planning
is to determine the resource requirements given the service objectives. Alternatives for
applying resources become a trade-off of expense and service and are not part of the
capacity planning process.


Activity 3 - Assess resource alternatives and project required resources -
xxxCP15

CPU usage can still be estimated based on comparable transaction data as determined
during the Definition Phase (see documentation describing this technique in that phase).
In addition, some modeling tools can take I/O activity as input and project CPU usage.
The decomposition of business transactions into file accesses is generally what is
needed. Performing both techniques will provide a cross check.

DASD space requirements are a function of the number of logical records and the
average size of each record. However, also factor in the effects of archiving policies. Can
all or any portion of a file be archived? What percent of the file must be active at any one
time? How much archived data must be kept?
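A minimal sketch of this DASD arithmetic, with all input values invented for illustration:

```python
# Hypothetical DASD space estimate following the factors above: logical
# records x average record size, reduced to the active (non-archived)
# portion, plus an allowance for an index. All input values are invented.

LOGICAL_RECORDS = 2_000_000
AVG_RECORD_BYTES = 500
ACTIVE_FRACTION = 0.60      # portion of the file that must be active online
INDEX_OVERHEAD = 0.10       # extra space for an index, as a fraction

raw_bytes = LOGICAL_RECORDS * AVG_RECORD_BYTES
active_bytes = raw_bytes * ACTIVE_FRACTION
total_bytes = active_bytes * (1 + INDEX_OVERHEAD)
gigabytes = total_bytes / 10**9

print(round(gigabytes, 3))  # required online DASD space in GB
```

Archived data that must be retained would be sized separately, since it may reside on cheaper media with different service characteristics.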

The same level of analysis documented in the Definition Phase (see Section 2.3, Activity
4) needs to be applied here. Determining the communication overhead to transport
network user data (number of business packages x business package size) need not be
exact. Use some conservative estimate in the range of 15-80% of the user data
requirements based on observations of similar traffic (size, protocol, application
communication interface) running elsewhere.

As in the Definition Phase, specific physical technologies may still be in question. Thus,
for costing purposes, general cost metrics for $/MIPS and $/Gigabytes will likely be used.
One should also assess the value of determining and providing an equivalent metric for
network usage, such as $/(1000 characters transferred).

Activity 4 - Deliver recommendation summary package - form xxxCP16

From the information collected on the requirements and forecast forms, summarize
your recommendations and analysis in a brief letter to the application owner and
project manager. It is now their responsibility to evaluate your report and
determine whether the project should proceed and what the new cost estimates
will be.


2.5 Design Phase

ID  Activities                                                  Primary Responsibility

1   Update the Application Tracking Form (xxxCP01) and forms    Capacity Management
    xxxCP02, xxxCP03, xxxCP04, xxxCP06, and xxxCP07 as
    applicable
2   Review xxxCP11 and xxxCP05. Collect information - form      Capacity Management
    xxxCP08.
3   Assess resource alternatives and project required           Capacity Management
    resources - xxxCP15
4   Deliver recommendation summary package - form xxxCP16       Capacity Management

A summary of the steps to perform and key questions to ask to get the desired
information is provided in Appendix B. This also serves as a checklist to ensure all
activities are completed. For a quick reference to the different forms and their use
throughout the phases of development and production, see Appendix A. The table on
page 2 provides an index or key to which forms need to be completed, when, and where
to look for details.


Activity 1 - Update the Application Tracking Form (xxxCP01) and forms xxxCP02,
xxxCP03, xxxCP04, xxxCP06, and xxxCP07 as applicable

v Update xxxCP01 with any changes to dates and contacts, and log interim
  tasks required to complete this phase. This form needs to be kept up to
  date to reflect current application and resource forecast status.

v xxxCP02 should be kept updated to reflect the expected business drivers
  and any changes to business transactions and application characteristics.

v Review the effects of other application/process linkages on xxxCP06.
  Have the effects been included in the overall analysis of resource
  requirements?

v Are the resources needed for development and testing committed as per
  xxxCP07? Are any changes necessary?

v Once again revisit xxxCP03 to examine workloads that may be comparable.
  Collect additional information as required.

v xxxCP04 will require refinement now that better traffic flow information
  is available.

Activity 2 - Review xxxCP11 and xxxCP05. Collect information - form xxxCP08.

Review xxxCP11

xxxCP11 documents current environmental and business changes that may have a
significant impact on I/T resources. Although most beneficial for the overall
system/network point-in-time forecasts (see Section 3), xxxCP11 has importance to the
application development staff who may change their design based on new technological
changes. Thus, have an updated xxxCP11 available for their review. As stated in previous
sections, xxxCP11 may be an automated output from the Change Management process.
However, before conducting interviews to collect information during the Design Phase,
highlight changes that may impact the application/workload being reviewed.


Review xxxCP05

Use the information from xxxCP05, if available, as a base for determining enhanced
estimates of similar metrics on xxxCP08.

Complete xxxCP08

Figure 2.5.1 illustrates the flow for documenting information on this form. It should be
referenced while reading this section.

xxxCP08, page 1, provides a place to document information about the users of an
interactive workload. A business driver of resources for an interactive workload may be
unique types of users. For example, a financial analyst may demand heavy CPU
unique types of users. For example, a financial analyst may demand heavy CPU
resources to service financial modeling transactions; a clerk may impose high demands
on the DASD subsystem to perform database queries and updates; an operations person
may demand high network usage for DASD backups. Documenting a description of user
types and typical transaction mix (xxxCP08, page 1) allows the subsequent monitoring,
reporting, and correlation of usage to user type in later development phases and
production. The information on this page will be used with xxxCP12 and xxxCP13 when
the application goes into production. It is not used with any other forms during the
application development process.

Note: There are no pre-defined user types. Ask the business and application staff if the
users of the application can be grouped based on similar workload demands. Separate
users only if their expected resource usage is vastly different from other users and
growth in their group has unique impact considerations on future resource
consumption.

Another important consideration is how an end user will use the application; e.g., during
the expected peak period: how many users will be interacting with the system? how
frequently will they interact with the system (e.g., hit the "enter key")? what type of
interactions (business transactions) will they be performing? xxxCP08, page 1, provides a
place to log the expected interactions per hour and then to document the mix of
interactions, or business transactions.

xxxCP08, page 2, is the place where business and DP transaction relationships are
established. Document the mapping of business transactions to DP transactions on
xxxCP08, page 2, noting the subsystem or transaction processor used. Also, estimate
corresponding data base calls and physical I/Os per DP transaction. See the data from
the Requirements Phase for data base calls per business transaction.

Note: The information collected for the "Interactive User Profiles (Peak Hour)" table and
the "Business-To-DP Transaction Mapping" table should be kept in a performance and
capacity database repository that allows correlation of usage to the metrics. MICS is
a tool that allows this.


Figure 2.5.1. xxxCP08 Flow

xxxCP08, page 3, can be used to describe the Host workload characteristics:

0 how many business transactions per hour must be serviced in a


peak hour?
1 what is the expected service per business transaction? Hopefully,
the number of expected transactions time the expected service time
is less than 3600 seconds (a peak hour).
2 what types of files will be utilized and how? Can an approximate
number of logical I/Os per business transaction be expected? How
many DB calls are expected for the typical logical I/O? How may
physical I/Os can be estimated?
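The feasibility check in the second bullet can be written out directly; the transaction count and service time below are illustrative, single-server figures:

```python
# Sanity check from the bullets above: expected business transactions in
# the peak hour times expected service time per transaction should fit
# within the 3600 seconds of that hour (illustrative values).

PEAK_HOUR_TRANSACTIONS = 400
SERVICE_SECONDS_EACH = 6.5

demand_seconds = PEAK_HOUR_TRANSACTIONS * SERVICE_SECONDS_EACH
utilization = demand_seconds / 3600.0

print(demand_seconds, utilization)
assert demand_seconds < 3600, "peak-hour demand exceeds one hour of service"
```

If the assertion fails, either the service expectation, the peak volume, or the assumption of serial service needs to be revisited with the application staff.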

The Design Phase is also the phase when logical data records are described in terms of
physical relationships and characteristics. Thus, system/network metrics become
available and plans for measuring and tracking the metrics need to be developed. Not all
the information may be available in the early design phase; however, all information
should be available during the detailed design stage.

One metric used by some host metric forecasting methodologies is the Relative I/O
Content (RIOC). It is the ratio of logical I/O operations to the processor power consumed
by the I/Os. This ratio will usually stay constant unless there are changes to the underlying
access methods or changes to the application. Another interesting observation when
plotting the RIOC of applications is that clusters are readily distinguished and can be used
to classify workloads. This technique allows a capacity planner to assume the
characteristics of a new application and fit it to a pre-defined RIOC category. This can
then be used to estimate either processor usage or I/Os given the other. Usually, the DBA
can give an estimate of the I/O patterns for a new application. Section 6.1 provides a
more detailed explanation of the Relative I/O Content metric.
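A sketch of the cross-estimation this enables, assuming hypothetical RIOC clusters; the category names and values below are invented for illustration, not taken from this guide:

```python
# Sketch of using a Relative I/O Content (RIOC) category to cross-estimate
# CPU from I/O activity. The clusters and their RIOC values (logical I/Os
# per CPU-second) are hypothetical placeholders.

RIOC_CATEGORIES = {
    "query-heavy":   120.0,  # many logical I/Os per CPU-second consumed
    "balanced":       60.0,
    "compute-heavy":  20.0,
}

def cpu_seconds_from_ios(logical_ios: float, category: str) -> float:
    """Estimate the CPU seconds implied by a logical I/O count for a
    workload judged to fit the given RIOC cluster."""
    return logical_ios / RIOC_CATEGORIES[category]

# The DBA estimates 90,000 logical I/Os in the peak hour for a workload
# that resembles the "balanced" cluster:
print(cpu_seconds_from_ios(90_000, "balanced"))  # 1500.0 CPU-seconds
```

The same relationship can be inverted to estimate I/Os from a CPU projection, giving the cross check mentioned in the Requirements Phase.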

xxxCP08, page 4, allows the capacity planner to document information provided by the
DBA relative to the actual data sets to be created, along with their estimated sizes
based on the number of expected records and average record size. To this number, space
for an index needs to be added if necessary.

xxxCP08, page 5, is the beginning of the network traffic decomposition process.
The objective is to be able to estimate the traffic demands (characters per hour) for each
major network component utilized by the application. Six steps need to be taken to
determine the Hourly Network Component Demands finally reported on page 9:

Step 1: Sketch a diagram showing the significant traffic flowing between the application
and users.

The process begins by working with the xxx business and application staffs to sketch a
diagram showing the significant traffic flowing between the application and users. Figure
2.5.2 is an example of such a diagram. Note that it shows the direction of the flows, the
name of the business transaction, the DP transaction that processes the transmitted
information, the traffic type, and the subsystem servicing the DP transaction. One should
number the significant traffic flows that are relevant for estimating loads on the network.
These become the entries (Traffic Flow ID on this form) that will be used for estimating
network traffic demands.

Figure 2.5.2. Traffic flow mapping


Step 2: Describe the business packages for each Traffic Flow ID.

Where on xxxCP05 network business package sizes were described in abstract terms
(small, medium, large, extra large) based on ranges of sizes common for the expected
business transaction, a more quantitative guess at the actual business package size is
required. The information of importance is the number of packages sent and received and
the size of the packages. Business packages are those units of information sent across
the network as known by the business user. They represent only the user data portion of
the actual traffic transmitted and are not meaningful for video and voice traffic types. For
example, the logical business package that the user, or the user's application, may
send/receive could be a "file". If the application is an interactive application, the package
may be "transaction", the data sent upon hitting an Enter key. Other business package
types may be a "report", a "job", or an "image". Whatever the user can relate to
constitutes a business package. Speaking in business terms, one can ask how big the
packages will be and how many will be sent/received throughout the day. It is important to
estimate flows and packages in each direction, i.e., from the application and to the
application.

Step 3: Distribute the business traffic by hour of day by user location (page 6).

xxxCP04, which maps the resource path between the application and the users, and the
information on page 6, which indicates how the traffic will be distributed by hour-of-the-
day, are the essential information needed from the business and application staff.
Subsequent steps will be performed by the network capacity planner to translate and
apportion the traffic accordingly. Ask the business and application staff to apportion the
business traffic documented on page 5 to the locations being served. Maintaining the
independence of flows to and from the application, what percent of the overall traffic
originates in and terminates in each user location? List the xxx locations and percent of
daily traffic for each Traffic Flow ID. Now apportion the daily traffic across the day in
hourly intervals. Peak information already documented on xxxCP02 and xxxCP03 may be
used as a starting point for estimating the proportion by hour and the relative size of the
peak hour(s) compared to typical hourly flow. All numbers should be percentages.

Key Note: This step is important because actual traffic peaks for a specific network
component can only be determined after combining traffic from all locations together.
The goal is to determine the application's peak traffic demand on each network
component. Take for example an application with users distributed across the country,
such as ZZZ. The peaks, normalized to the same time zone, would most likely occur at
different hours. Thus, only by aggregating the traffic by hour can the real peak hour by
component be determined. Individual workload estimates will later be merged with
xxxCP17 to determine overall system demands.

Step 4: Translate the hourly business package traffic ("user" data) to characters and report
the number of characters per hour throughout the day by xxx location (page 7).

This is accomplished by first determining the raw "user" data characters transmitted
to/from the application for the day. Using the information on page 5, multiply the No. of
Business Packages x Business Package Size to derive the application's "Daily Traffic" in
characters transmitted/received. Use the page 6 percentages to derive the hourly
characters per traffic flow. The formula for deriving characters per hour per location is:

Characters/Hr = Daily Traffic x
                Percent of Daily Traffic x
                Percent of Traffic by Hour of Day
Do this for each hour of the day, and the results will be a matrix of Characters Transferred
by Hour of Day.
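A worked instance of the formula, with invented daily traffic, location splits, and an abbreviated hourly profile:

```python
# Worked instance of the Characters/Hr formula for one Traffic Flow ID.
# The daily traffic figure, location splits, and hourly profile are all
# illustrative placeholders.

DAILY_TRAFFIC_CHARS = 50_000_000  # from page 5: packages x package size

# Page 6: share of the flow's daily traffic by xxx location.
PCT_OF_DAILY_TRAFFIC = {"LocationA": 0.70, "LocationB": 0.30}

# Page 6: fraction of a location's daily traffic falling in each hour;
# only three hours shown for brevity - a real profile covers the full day.
PCT_BY_HOUR = {9: 0.15, 10: 0.20, 11: 0.10}

chars_per_hour = {
    loc: {hour: DAILY_TRAFFIC_CHARS * loc_pct * hr_pct
          for hour, hr_pct in PCT_BY_HOUR.items()}
    for loc, loc_pct in PCT_OF_DAILY_TRAFFIC.items()
}

# LocationA at 10:00: 50,000,000 x 0.70 x 0.20 = 7,000,000 characters
print(chars_per_hour["LocationA"][10])
```

The full result is exactly the matrix described above: characters transferred by hour of day, per location, ready for Step 5's component apportionment.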

Step 5: Determine the hourly demands on each network component expected to be used to
transport the traffic to/from the xxx locations (page 8).

xxxCP04 provides a mapping of the network components utilized for each traffic flow.
How much of the hourly location traffic will be passed through each network component in
the path? All of it, or in the case where multiple paths exist, only a proportion for alternate
resources? Multiply the hourly information on page 7 by the estimated percentage to
derive the Hourly Network Component Demand by Traffic Flow.

Key Note: To this point only "user" data being transmitted has been considered.
Adjustments will need to be made to factor in:

v Communications Overhead: Additional control characters resulting from
  protocol selection, application-specific message building (such as IMS
  MFS), and communication parameters (such as RUSIZE and MAXDATA) will
  increase the total traffic.

v Compression/Compaction: Application of these will usually result in
  lowering the traffic demands. Compression reductions typically fall in a
  range of 30-50%. However, depending on how compression is done, the
  reductions may only apply to specific traffic types and flows. Compaction
  techniques usually can reduce the number of characters transferred by
  20-30%.

v Application-specific: Included here are things the application programmer
  can do that may affect the real traffic demands. For example, the use of
  "chaining" to reduce a large user business package into smaller parts for
  more efficient transmission to the destination through alternative
  components can improve the response time of the individual transaction at
  the cost of sending more control characters needed to segment the
  package.

Section 6.6 will assist in determining the factors to be applied. The total characters
transferred after adjustments typically will be 15-80% higher than the raw user data;
hence the importance of applying these adjustment factors.
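A sketch of applying such adjustment factors, using midpoint values from the ranges quoted above; the actual factors come from Section 6.6, and whether compression applies before or after overhead depends on where compression is performed:

```python
# Sketch of the adjustment step: start from raw "user" characters and
# apply communications overhead, then compression. The chosen factors
# (+40% overhead, -40% compression) are illustrative midpoints of the
# ranges quoted above, not values from Section 6.6. This sketch assumes
# compression acts on the traffic after overhead is added.

def adjusted_chars(user_chars: float,
                   overhead_factor: float = 0.40,
                   compression_reduction: float = 0.40) -> float:
    with_overhead = user_chars * (1 + overhead_factor)
    return with_overhead * (1 - compression_reduction)

# 1,000,000 user characters -> 1,400,000 with overhead -> 840,000 compressed
print(adjusted_chars(1_000_000))
```

Because the two factors can nearly cancel, skipping either one distorts the component demand estimates in opposite directions.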


Step 6: Total each location's hourly network component demands to yield total demands
on each network component by hour of day (page 9).

An individual matrix for each location may also be produced if desired. As in Step 5,
adjustments must be made to reflect the actual number of characters transferred, not just
the user data. Section 6.6 will assist in determining the factors to be applied.

Activity 3 - Assess resource alternatives and project required resources -
xxxCP15

Unless a prototype is developed and executed, measured data is still not available.
However, many modeling tools will accept the information collected on these and
previously completed forms. The significant addition in the Design Phase was xxxCP08.
The focus was to capture the decomposition of business transactions to DP transactions
(xxxCP08, page 2), to identify user types that may be useful in business-driver correlation
analysis (xxxCP08, page 1), to estimate the CPU and DASD usage demands for the peak
business transaction period (xxxCP08, pages 3-4), and to decompose and estimate the
business transaction demands on the network resources (xxxCP08, beginning on page
5). Business-driver accounting needs to begin for this application. Business drivers were
described on xxxCP02. Consider recording the business drivers, which includes user
types on page 1 of xxxCP08, into MICS for long-term monitoring and trending.

Network analysis in this phase focuses on estimating this application's throughput
demands (characters/hr) on each major network component needed. The matrix on page
9 of xxxCP08 will be used for two purposes:

1. as input for an overall point-in-time estimate of network component
   demands and subsequent updating of the current capacity plan (see
   Section 3),
2. as input for communicating the estimated resource demands back to the
   application and business staff to allow them to understand and react
   earlier to expected usage and costs.

The key forecasting metrics for processor and DASD are respectively power and
gigabytes required. Data collection comparisons and forecasts for processors need to
take into consideration physical versus logical mainframe partition measurements and
configuration differences, application of capture ratios and overhead, subsystem and data
base complexity differences, platform adjustments, etc.

The objective of xxxCP15 for the network is to capture the network capacity planner's
estimates of application production demands (characters per planned peak hour) and
alternatives for satisfying those demands. The overall forecasts, see Section 3, will
address point-in-time forecasting that aggregates all workload demands and other
environmental changes.

xxxCP02 provides considerations for estimating contingent capacity. The capacity
planner must also consider additional overhead by processor supervisors or
communication protocols when reporting these capacities. See Section 5.2 for discussion
on capacity terms and thresholds.


xxxCP15 should be viewed as a sample of the key outputs from analysis and as input to
the decision and budgeting processes. The last page of xxxCP15 is a work effort of both
the capacity planner and the system/network designer. When appropriate, e.g., when
provisioning lead times dictate it, perform a joint study and complete page 3.

Activity 4 - Deliver recommendation summary package - form xxxCP16

From the information collected on the requirements and forecast forms, summarize
your recommendations and analysis in a brief letter to the application owner and
application project manager. Generic costs may have been determined to allow the
reporting of expected costs at this time. These costs should not be misinterpreted
as final costs, or costs based on any specific design study. It is now the
responsibility of the application project manager and business owner to evaluate
your report and determine whether the project should proceed and what the new
cost estimates will be.

2.6 Development and Testing Phase

ID  Activities                                                  Primary Responsibility

1   Update the Application Tracking Form (xxxCP01) and forms    Capacity Management
    xxxCP02, xxxCP03, xxxCP04, xxxCP06, and xxxCP07 as
    applicable
2   Review xxxCP11 for impact. Collect information - forms      Capacity Management
    xxxCP09 and xxxCP12.
3   Assess resource demands and alternatives - forms xxxCP08    Capacity Management
    and xxxCP15
4   Deliver recommendation summary package - form xxxCP16       Capacity Management

A summary of the steps to perform and key questions to ask to get the desired
information is provided in Appendix B. This also serves as a checklist to ensure all
activities are completed. For a quick reference to the different forms and their use
throughout the phases of development and production, see Appendix A. The table on
page 2 provides an index or key to which forms need to be completed, when, and where
to look for details.


Activity 1 - Update the Application Tracking Form (xxxCP01) and forms
xxxCP02, xxxCP03, xxxCP04, xxxCP06, and xxxCP07 as applicable

v Are the resources needed for development and testing committed as per
  xxxCP07? Are any changes necessary?

v Review the effects of other application/process linkages on xxxCP06.
  Have the effects been included in the overall analysis of resource
  requirements?

v Update xxxCP01 with any changes to dates and contacts, and log interim
  tasks required to complete this phase. This form needs to be kept up to
  date to reflect current application and resource forecast status.

v xxxCP02 should be kept updated to reflect the expected business drivers
  and any changes to business transactions and application characteristics.

v Once again revisit xxxCP03 to examine workloads that may be comparable.
  Collect additional information as required.

v The network capacity planner should review xxxCP04 and refine it if better
  traffic flow information is now available. The network components that
  will be impacted by this application need to be firmly identified during
  this phase to allow ample lead time for satisfying the demands.

Activity 2 - Review xxxCP11 for impact. Collect information - forms xxxCP09 and
xxxCP12.

Review xxxCP11

xxxCP11 documents current environmental and business changes that may have a
significant impact on I/T resources. Although most beneficial for the overall
system/network point-in-time forecasts (see Section 3), xxxCP11 has importance to the
application development staff who may now have to coordinate their test and install plans
with other changes. Thus, have an updated xxxCP11 available for their review. As stated
in previous sections, xxxCP11 may be an automated output from the Change
Management process. However, before conducting interviews to collect information
during this phase, highlight changes that may impact the application/workload being
reviewed.


Complete xxxCP09

xxxCP09 suggests desired capacity metrics that can be measured by monitoring tools.
This does not mean that it is necessary or practical to do so. The type of capacity study
will be one criteria when selecting tools and analysis/forecasting techniques (see Sections
5 and 6). To determine the practicality of measuring a specific capacity metric at a certain
level, a table such as Table 5.6.2.2 in Section 5.6.2 should be developed. Its intent is to
provide a quick index for determining whether the collection is possible and, if so, what
tool can be used to collect the information. An "x" in a table entry indicates that a
tool/technique is available to collect and report at this level of granularity, but not
necessarily that it is practical to do so for xxx. This table is only a sample and needs to
be developed for the xxx-specific tools.

xxxCP09 begins with the CPU- and DASD-related metrics. Tools should be available to
obtain these metrics. Most of the required information would come from SMF records
produced from the MVS operating system. The information is probably already extracted
from the SMF records and available in the MICS performance and capacity data base
(Section 5.1.5 discusses data collection and consolidated capacity data base).

The information collected serves two purposes: (1) measured information can be
compared to expected values to determine the quality of the estimates, and (2) the
measured data provides quantitative metrics for forecasting techniques, such as the
Business Transaction Forecasting Technique discussed in Section 6.6. Service
information is also desired to ensure that the capacity observed not only reflects the
needs to service the demands, but also that the service objectives are being met.

Pages 1 and 2 for CPU and DASD resources allow the capacity planner to measure
resource usage per DP Transaction and then calculate usage by business transaction
utilizing the Business-To-DP Transaction mapping table on xxxCP08. The completed
table information provides the essentials for applying the Business Transaction
Forecasting Technique to estimate future resources.
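The mapping arithmetic can be sketched as follows. This is an illustrative Python
fragment only; the transaction names, mapping counts, and CPU figures are invented,
not values taken from an actual xxxCP08.

```python
# Illustrative sketch: derive CPU usage per business transaction from measured
# DP-transaction usage and a Business-To-DP mapping table. All names and
# numbers below are hypothetical.

def usage_per_business_txn(dp_usage, mapping):
    """dp_usage: CPU seconds per single DP transaction, keyed by DP txn name.
    mapping: business txn -> {DP txn: count of DP txns invoked per business txn}.
    Returns CPU seconds per business transaction."""
    return {
        biz: sum(count * dp_usage[dp] for dp, count in dp_counts.items())
        for biz, dp_counts in mapping.items()
    }

# Measured during testing (CPU seconds per DP transaction):
dp_usage = {"INQ1": 0.020, "UPD1": 0.050}
# One "account inquiry" issues two INQ1s; one "transfer" issues one of each:
mapping = {"account_inquiry": {"INQ1": 2},
           "transfer": {"INQ1": 1, "UPD1": 1}}

per_biz = usage_per_business_txn(dp_usage, mapping)
# A forecast of 10,000 transfers in the peak hour then translates directly
# into a CPU-seconds demand:
cpu_seconds = 10_000 * per_biz["transfer"]
```

Once the per-business-transaction figures are established, future resource demands
follow from the business volume forecasts alone.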

The network measurements begin with page 3. Networking tools still lack the capability to
collect and report information for all desirable variations of granularity. Section 5.6.2 and
specifically Table 5.6.2.2 in Section 5.6.2 provide guidelines for understanding and
documenting the practical combinations for xxx. Therefore, the tables on this form for
networking statistics should be completed selectively, based on the need for them. Section 6.x
discusses capacity planning study selection criteria and potential implementation
considerations.

The sequence of the networking tables progresses from levels that can easily be obtained
to levels that will require special tools or techniques.

Page 3 is provided as an example of the metrics that can be measured and reported on a
component basis. It does not distinguish between the type of traffic flows on the network
imposed by this application/workload. It also assumes that the network components are
isolated from other workloads so that the measurement of the resources is reflective of
this application/workload's loads. This will not be likely for all components in a production
environment; in which case, a proportion of a shared resource can be assumed based on
past statistical studies. In reality, the purpose of an individual workload study is to
understand those resources unique to the workload and, where applicable, by location. All
the possibilities cannot be captured in a form or forms. Instead, what to study,
how to study it, what to collect, how to analyze it, how to utilize the information in
forecasts, and how to report it are covered in "methodology" form in Sections 4, 5, and 6.

The metrics here are similar to those estimated on xxxCP08 which provided the expected
values for this form. Special attention should be paid to the footnotes. Many of the
necessary metrics representing actual loads need to be derived from the measured
values. As explained in the text for xxxCP08 in Section 2.5, and in Section 5.2, statistical
methods and adjustments need to be selected. One will also need to know the capacity
thresholds or limits for the different resources as explained in Section 5.2.

Where page 3 provided a means to report expected and measured loads on network
components, page 4 focuses on the demands placed on the resources by the workload.
The tool used to capture this information will most likely be vendor-product. However,
host tools, like IMS for IMS applications, and vendor-product for channel extender traffic,
etc., may be required depending on the nature of the study and traffic types. xxx's Table
5.6.2.2 will direct the capacity planner to the right tool.

Page 5 represents the most granular level at which one can expect to get a handle on
capacity requirements. This table suggests the desired capacity metrics that might be
captured on a Traffic Flow (or message) basis. It is not anticipated that this will be done
other than on a test basis. "Test basis" here includes special tests during the testing of
the workload, as well as artificial transaction-generating applications automatically run
to periodically determine the use of resources for a specific workload. If the
measurements are obtained, they can be compared to the estimates by traffic flow on
xxxCP08 to help quantify and validate the estimates.

The following notes apply to both host and network capacity planning unless indicated
differently:

Note 1: Adjustments to estimates or measured demands might need to be made to
ensure that both either include or exclude network overhead. One should also verify
that excessive retransmission of characters due to errors does not distort the
measured capacities.

Note 2: Keep in mind the 80-20 rule. It is not necessary to individually monitor every
business transaction, or even every application, unless it is a major consumer of
resources. Focus on the 20% that consume 80% of the resources first. Group the
remaining entities into a single group for monitoring and reporting.
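One way to apply this rule mechanically is sketched below. The workload names and
usage figures are invented, and the 80% threshold is a parameter rather than a fixed
xxx standard.

```python
# Illustrative 80-20 split: track individually only the workloads that
# together account for roughly 80% of usage; lump the rest into "Other".

def split_80_20(usage, threshold=0.80):
    """usage: workload -> resource units consumed.
    Returns (workloads to track individually, total attributed to 'Other')."""
    total = sum(usage.values())
    tracked, cumulative = [], 0.0
    for name, amount in sorted(usage.items(), key=lambda kv: -kv[1]):
        if cumulative >= threshold * total:
            break
        tracked.append(name)
        cumulative += amount
    other = total - sum(usage[name] for name in tracked)
    return tracked, other

# Hypothetical CPU-seconds by workload:
usage = {"claims": 500, "billing": 300, "hr": 60, "mail": 40, "misc": 100}
tracked, other = split_80_20(usage)
# tracked -> the heavy consumers; other -> combined figure for the remainder
```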

Note 3: Section 1 described the capacity management view of the xxx environment.
Because of the lack of efficient tools for collecting information on real production
transactions at the application and transaction level for all resources, particularly the
resources within the Network Backbone Layer and Network Access Layer, it is
advisable to seek higher-level forecasting techniques first in order to get a broad
picture. Only if necessary should more granular capacity forecasting techniques be
utilized, for these may require detailed network line and component traces and
synchronization techniques (see Section 6, "Selecting Forecasting Techniques", for the
options available). Thus, the completion of xxxCP09 for network resource usage will
probably only be done as required and only for a subset of the network.

Note 4: A way to estimate resource requirements at the business and DP
transaction level is to set up transaction-generating "drone" applications that
periodically release the transactions for execution during specified time periods.
Several large clients have personal computer systems and applications that do this,
allowing just those systems to be monitored. Given a known quantity of transactions
and their types, an estimate of usage per transaction can be determined and used
as a base for estimation.
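The arithmetic behind the drone technique is simple: divide the measured usage, less
any idle-system baseline, by the known transaction count. The sketch below uses
invented figures.

```python
# Hypothetical sketch of the "drone" estimate: a known count of generated
# transactions is released in an otherwise quiet window, and measured usage
# is divided by that count.

def usage_per_transaction(measured_usage, txn_count, idle_baseline=0.0):
    """measured_usage: resource units consumed during the drone window.
    idle_baseline: usage the system shows with no drone traffic, subtracted
    so only the transactions' own demand is attributed to them."""
    if txn_count <= 0:
        raise ValueError("need a positive transaction count")
    return (measured_usage - idle_baseline) / txn_count

# 1,000 drone inquiries consumed 27 CPU seconds; 2 seconds were background:
per_txn = usage_per_transaction(27.0, 1000, idle_baseline=2.0)
```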

Note 5: The identification of measurable business drivers and business transactions,
and the periodic collection of these business measurements, is imperative to relating
usage to the business. Actions must be taken by the xxx Applications and
Business groups to track the business drivers and populate the data base
with periodic status.

When reporting resource usage, it is important to determine all the usage. Capacity
planners need to ensure that measured usage which does not reflect actual usage gets
adjusted by capture ratios, and that the usage of system workloads accumulated on
behalf of applications be distributed appropriately if desired. The volumes and resulting
usage values should then be compared to the most recent baseline values for this
application -- establishing baseline values is discussed in Section 6. Once established,
they should be recorded on the first creation of xxxCP09 and used on subsequent
updates to forecasts. Exceptions should be investigated, differences explained, and the
baseline updated as required.
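A minimal sketch of the two adjustments mentioned above, assuming a single overall
capture ratio and proportional distribution of the uncaptured usage (both simplifying
assumptions; in practice capture ratios vary by workload type):

```python
# Illustrative sketch, not the actual xxx procedure: gross up measured CPU by
# a capture ratio, then spread the uncaptured/system overhead across
# applications in proportion to their captured usage.

def adjust_for_capture(measured, capture_ratio):
    """measured: application -> captured CPU seconds; capture_ratio in (0, 1]."""
    total_captured = sum(measured.values())
    total_actual = total_captured / capture_ratio
    overhead = total_actual - total_captured
    return {app: cpu + overhead * (cpu / total_captured)
            for app, cpu in measured.items()}

# Hypothetical SMF-captured seconds for two applications, 80% capture ratio:
measured = {"appA": 600.0, "appB": 200.0}
adjusted = adjust_for_capture(measured, capture_ratio=0.80)
# adjusted now accounts for all CPU actually consumed on the applications' behalf
```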

The specific metrics required for capacity planning are discussed in Section 5,
"Developing and Maintaining Procedures, Tools, Techniques, and Standards". Sample
reports described in Section 4 could be used to report the metrics. If a new report is
desired, develop report specifications and forward them to the "Producing Capacity
Planning Reports" sub-process (Section 4) for report design and generation.

Complete xxxCP12

During the latter part of development or the early part of testing, it becomes important to
take the first cut at future growth. This will be done via growth estimates for expected
business drivers and business transactions. The business owner and application project
manager should provide these forecasts, which can then be translated into DP transaction
growth via the Business-to-DP Transaction Mapping performed during the Design Phase
and documented on xxxCP08.

Activity 3 - Assess resource demands and alternatives - forms xxxCP08 and xxxCP15

Network analysis in this phase focuses on estimating this application's throughput
demands (characters/hr) on each major network component needed. Demand estimates
documented on xxxCP08 from previous checkpoints need to be updated appropriately.
Results of testing captured on xxxCP09 may indicate differences in the expected
business package sizes, distributions of traffic, or associated overhead. Refining the
estimates on xxxCP08 should provide more accurate demands. A change in the expected
traffic path, see xxxCP04, may also require that xxxCP08 be updated. The matrix on page
9 of xxxCP08 will be used for two purposes:

1. as input for an overall point-in-time estimate of network component demands and
   subsequent updating of the current capacity plan (see Section 3),
2. as input for communicating the estimated resource demands back to the application
   and business staff to allow them to understand and react earlier to expected usage
   and costs.

The key forecasting metrics for processor and DASD are, respectively, power and
gigabytes required. Data collection comparisons and forecasts for processors need to
take into consideration physical versus logical mainframe partition measurements and
configuration differences, application of capture ratios and overhead, subsystem and data
base complexity differences, platform adjustments, etc.

The objective of xxxCP15 for the network is to capture the network capacity planner's
estimates of application production demands (characters per planned peak hour) and
alternatives for satisfying those demands. The overall forecasts, see Section 3, will
address point-in-time forecasting that aggregates all workload demands and other
environmental changes.

xxxCP02 provides the initial information that can be used to estimate contingent
capacity. The capacity planner must also consider additional overhead by processor
supervisors or communication protocols when reporting these capacities. See Section 5.2
for discussion on capacity terms and thresholds.

xxxCP15 should be viewed as a sample of the key outputs from analysis and as input to
the decision and budgeting processes. The last page of xxxCP15 is a work effort of both
the capacity planner and the system/network designer. When appropriate, e.g., when
provisioning lead times dictate it, perform a joint study and complete page 3.

Activity 4 - Deliver recommendation summary package - form xxxCP16

From the information collected on the requirements and forecast forms, summarize
your recommendations and analysis in a brief letter to the application owner and
application project manager. Generic costs may have been determined to allow the
reporting of expected costs at this time. These costs should not be misinterpreted
as final costs, or as costs based on any specific design study. It is now the
responsibility of the application project manager and business owner to evaluate
your report and determine whether the project should proceed and what the new cost
estimates will be.

2.7 Installation/Transition Phase

ID  Activities                                                   Primary Responsibility
1   Update the Application Tracking Form (xxxCP01) and forms     Capacity Management
    xxxCP02, xxxCP04, and xxxCP06 as applicable
2   Review xxxCP11 for impact; collect information - forms       Capacity Management
    xxxCP09 and xxxCP12
3   Assess resource demands and alternatives - forms xxxCP08     Capacity Management
    and xxxCP15
4   Deliver recommendation summary package - form xxxCP16        Capacity Management

A summary of the steps to perform and key questions to ask to get the desired
information is provided in Appendix B. This also serves as a checklist to ensure all
activities are completed. For a quick reference to the different forms and their use
throughout the phases of development and production, see Appendix A. The table on
page 2 provides an index or key to which forms need to be completed, when, and where
to look for details.

Activity 1 - Update the Application Tracking Form (xxxCP01) and forms xxxCP02,
xxxCP04, and xxxCP06 as applicable

• Update xxxCP01 with any changes to dates and contacts, and log
  interim tasks required to complete this phase. This form needs to be
  kept up to date to reflect current application and resource forecast
  status.

• xxxCP02 should be kept updated to reflect the expected business
  drivers and any changes to business transactions and application
  characteristics.

• The network capacity planner should review xxxCP04 and refine it if
  better traffic flow information is now available. The network
  components that will be impacted by this application need to be firmly
  identified during this phase to allow ample lead time for satisfying the
  demands.

• Review the effects of other application/process linkages on xxxCP06. Have the
  effects been included in the overall analysis of resource requirements?

Activity 2 - Review xxxCP11 for impact. Collect information - forms xxxCP09 and
xxxCP12.

Review xxxCP11

xxxCP11 documents current environmental and business changes that may have a
significant impact on I/T resources. Although most beneficial for the overall
system/network point-in-time forecasts (see Section 3), xxxCP11 has importance to the
application development staff who may now have to coordinate their installation and
migration plans with other changes. Thus, have an updated xxxCP11 available for their
review. As stated in previous sections, xxxCP11 may be an automated output from the
Change Management process. However, before conducting interviews to collect
information during this phase, highlight changes that may impact the application/workload
being reviewed.

Complete xxxCP09

See Section 2.6, Activity 2.

Complete xxxCP12

See Section 2.6, Activity 2.

Activity 3 - Assess resource demands and alternatives - forms xxxCP08 and xxxCP15

See Section 2.6, Activity 3.

Activity 4 - Deliver recommendation summary package - form xxxCP16

See Section 2.6, Activity 4.

Gathering and Forecasting "Existing" Application and Overall Resource Requirements
Unlike the Development phases (Section 2) where the activities were listed in the
sequential order of performance and where all activities identified for a particular phase
had to be completed for that phase, the activities for this phase are not necessarily
related. Instead, each activity in this section is driven by a periodic time interval, except for
activities 1 and 2, which are driven by changes to the system or business and done on an
as-required basis. Appendix A provides a summary of the forms needed for ongoing
capacity planning and suggests periodic intervals for performing the different activities.
The suggested periodicity is repeated by activity on the table below.

ID  Activities                                                  Primary          Periodicity
                                                                Responsibility
1   Update the Application Tracking Form as                     Capacity         As Required
    appropriate - xxxCP01                                       Management
2   Review and update existing application description and      Capacity         As Required
    characteristics forms - xxxCP02, xxxCP04, xxxCP06, and      Management
    xxxCP08
3   Use measurement tools to collect resource usage for         Capacity         Quarterly
    "existing" workloads - xxxCP09 and xxxCP10                  Management
4   Collect information on changes that are expected to         Capacity         Quarterly
    significantly affect resource planning and forecasts -      Management
    xxxCP11
5   Validate business drivers and collect future business       Capacity         Semi-Annually
    application growth estimates - xxxCP12 and xxxCP13          Management
6   Assess key "existing" application resource demands and      Capacity         Semi-Annually
    project future resource needs - Activity 2, xxxCP09,        Management
    xxxCP11, xxxCP14, and xxxCP15
7   Assess overall future resource requirements and             Capacity         Annually
    alternatives - xxxCP08, xxxCP11, xxxCP10, and xxxCP17       Management
8   Update the annual Capacity Plan as required                 Capacity         Annually
                                                                Management
Table 3.1. Activities for gathering and forecasting "existing" application and
overall resource requirements

Table A.2 in Appendix A indicates the use of the forms during the "Existing"
phase. If the application had not gone through this process during its
development life cycle, it may be necessary to complete forms xxxCP02,
xxxCP04, xxxCP06, and xxxCP08 as per the discussion in Section 2. Refer to
Table A.2 to determine where the desired form should have been created ("C" in
the column) and refer to the section indicated.

The direct use of the forms created in this section is not anticipated. Most likely, system-
generated reports or spreadsheets will be created to evaluate the current use of
resources. Reports of current usage are normally an output of the Performance
Management process and are needed to provide a base or baseline for forecasting
techniques. Forms in this section assist in understanding the capacity planning metrics
and data needed for forecasting resource requirements. They also suggest how the data
should be consolidated and viewed to support the forecasting effort. The Performance
Management and Capacity Management staff should work together to ensure the
baseline serves both functions.

Periodic reports drive normal capacity planning activities. Section 4, Producing Capacity
Management Reports, provides a discussion of the typical report outputs from capacity
management. Design considerations and general contents are discussed for status and
forecasting reports, as well as for the overall capacity plan.

Section 3 addresses the actions required for gathering information for "existing"
workloads, i.e., applications and supporting systems comprising the current processing
environment. It also provides an activity to solicit changes to the environment.
Environmental changes need to be considered when performing an overall, point-in-
time resource requirements forecast. Figure 3.1 summarizes the key forms and
information needed as input to the overall capacity plan, either the Annual Capacity Plan
as discussed in Section 4 or a subset or update to it.

Note: This section is not intended to be a tutorial in forecasting. Its sole intent is to
define a repeatable process for obtaining the necessary business information and
translating it into metrics that can be used by the capacity planner to feed desired
forecast or design tools or models. The metrics provided are general metrics used in
all forecasting. Variations may have to be derived to fit a specific tool. This section
references some key forecasting concepts and techniques described in Section 6.

Section 6, Analysis and Forecasting Techniques, provides several analysis and
forecasting techniques that a capacity planner can apply. The choice will depend on the
time available to complete the study, the accuracy required, the input data available, and
the cost of the effort.

Appendix B, Information Collection Checklist, provides illustrations and checklists to aid the
capacity planner with the use of the forms.

Table 3.1 describes the activities within the Capacity Management process that need to
be performed on an ongoing basis. Activities are initiated periodically or as required
according to the "Form Use By Phase" table in Appendix A. Each will be discussed in
subsequent sub-sections ordered by periodicity.

Figure 3.1. Key forms for preparing an overall or update to a capacity plan

3.1. Activities Performed As Required

Activity 1 - Update the Application Tracking Form as appropriate - xxxCP01

xxxCP01 was designed as a tracking form for a specific application/workload. The
second page of xxxCP01 can be used after the application is in production as well to log
any action items relative to that application, e.g., investigate why the actual usage per
transaction measurements no longer match the baseline established 12 months ago.
Instead of xxxCP01, the capacity planner may prefer to log day-to-day tasks with some
other tool for the production environment.
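A check like the action item above (usage per transaction drifting from its baseline) can
be automated along these lines; the 10% tolerance and the workload figures are
illustrative assumptions, not xxx standards.

```python
# Hypothetical baseline-drift check: flag workloads whose current usage per
# transaction differs from the recorded baseline by more than a tolerance.

def baseline_exceptions(current, baseline, tolerance=0.10):
    """current, baseline: workload -> usage per transaction.
    Returns the workloads whose relative drift exceeds the tolerance."""
    return sorted(
        w for w in current
        if w in baseline
        and abs(current[w] - baseline[w]) / baseline[w] > tolerance
    )

baseline = {"claims": 0.070, "billing": 0.040}   # CPU sec/txn recorded last year
current = {"claims": 0.072, "billing": 0.055}    # billing is up roughly 38%
flagged = baseline_exceptions(current, baseline)
# flagged workloads become action items to investigate on page 2 of xxxCP01
```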

Activity 2 - Review and update existing application description and characteristics forms
xxxCP02, xxxCP04, xxxCP06, and xxxCP08

This activity is driven by change or from one of the other activities. Examples of questions
that will ascertain if changes will necessitate an update are:

1 Have the operating or service requirements changed for any of the
  major applications?
2 Have the business drivers or business transactions changed for
  any of the major applications?
3 Have the workload schedules changed for any of the major
  applications?
4 Have new technologies, e.g., DASD or network traffic compression,
  been implemented?
5 Have interfaces between any of the major applications changed?
6 Has the network topology changed?

If any of the answers are "yes", you may want to review and update xxxCP02, xxxCP04,
xxxCP06, and/or xxxCP08 as required.

Key Note: Changes to the major network components for a large application/workload
will definitely require the capacity planner to update xxxCP04 and xxxCP08.

xxxCP08 contains information relevant to analysis and estimation of future resource
requirements. It is here where User Types (a possible business driver for interactive
workloads) are identified and Business-to-DP transaction relationships are defined. If this
information has changed, including invalidation of a driver because it does not correlate
strongly enough with the resources consumed, change this form. Relationships of
transactions and I/O demands are also defined. Assess the need to complete the rest of
this form when review is done to satisfy a change as described above. Periodically, as
indicated in the previous table, resource estimates for "existing" workloads should be
determined necessitating the update of this form.
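The form does not prescribe how to test whether a driver correlates with consumption;
one common approach is the Pearson coefficient, sketched here with invented monthly
figures and an assumed 0.8 cutoff.

```python
# Illustrative driver validation: if a business driver's periodic volumes do
# not correlate strongly with measured resource usage, it is a poor predictor
# and is a candidate for removal from the mapping. Data and cutoff are made up.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

driver_volumes = [100, 120, 140, 160, 180]   # e.g., accounts opened per month
cpu_hours      = [50, 61, 69, 81, 90]        # measured usage, same months
valid_driver = pearson(driver_volumes, cpu_hours) > 0.8
```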

Note: The information collected on xxxCP08 for the "Interactive User Profiles (Peak Hour)" table
and the "Business-To-DP Transaction Mapping" table should be kept in a repository that
allows correlation of usage to the metrics (MICS is a tool that allows this). If this is done, an
update to MICS would be done and a new mapping and user profile report would be printed.

Unfortunately, there may not be a change management process in place that notifies the
capacity planning function when changes such as those above occur. Therefore, it is
advisable to periodically have the business owner and application owner for at least the
major resource consuming applications review the information on these forms to ensure
satisfactory delivery of service and continued attainment of business requirements.

Refer to Section 2.3 to complete xxxCP02 and xxxCP04.
Refer to Section 2.4 to complete xxxCP06.
Refer to Section 2.5 to complete xxxCP08.

3.2 Activities Performed Quarterly

Activity 3 - Collect current resource usage by workload and verify its baseline - xxxCP09
and xxxCP10

Update or Complete xxxCP09 (Individual Workload Measurements)

The purpose of xxxCP09 is to record measurements of capacity indicators for each
significant workload. The volumes of work (number of transactions or jobs) and
corresponding resource usage will be used to determine arrival rates and resource
consumption per transaction. xxxCP09 is not meant to be the only way information can be
organized to understand the current use of resources. Performance management staff
usually produce current utilization and service reports useful for analysis. Unlike the
workloads being defined and developed during the application development life cycle
phases, the key production, or "existing", workloads should be well defined with
business information already available and updated in the same performance and
capacity data base. Throughout this document, it was indicated that both business and
system/network data should reside in the same data base to allow correlation of usage to
business events. It is assumed that this is the case. Since the selection and use of tools
is beyond the scope of this version of the methodology, it is assumed that the tool and
data collection procedures have been executed and all data now resides in the
performance and capacity data base.

See Section 2.6 for details on completing this form. It is during the test phases of
development that monitoring tools can first be used to collect capacity usage and service
information. Unlike the data collected in the test phase of development though, the data
collected here now represents the real production environment. One must readjust the
baseline for capacity forecasting accordingly. Baseline creation is discussed in more
detail in Section 6.3.

Key Note: Network data by application/workload may not be a practical thing to
collect continuously. Although possible for some network topologies where
workloads are for the most part isolated with their own resources, the complete
isolation of all resources, such as backbone trunks, may not be possible. Since
today's day-to-day monitoring tools do not typically record usage by workload, except
in the case of some SNA workloads, techniques to collect data by workload become
ad hoc studies utilizing special tools or parameters to capture very detailed data at
the cost of significant overhead. It should be noted that this technique is feasible to
do periodically to better understand the critical, resource-consuming workloads.

Note 1: Keep in mind the 80-20 rule. It is not important to individually monitor any
business transaction or even an application unless it is a major consumer of
resources. Focus on the 20% that consume 80% of the resources first. Group the
remaining entities into a single group for monitoring and reporting.

Note 2: Section 1 described the capacity management view of the xxx environment.
Because of the lack of efficient tools for collecting information on real production
transactions at the application and transaction level for all resources, particularly the
resources within the Backbone Layer and Access Layer, it is advisable to seek
higher-level forecasting techniques first in order to get a broad picture. Only if
necessary should more granular capacity forecasting techniques be utilized, for these
may require detailed network line and component traces and synchronization
techniques (see Section 6, "Selecting Forecasting Techniques", for the options
available). Thus, the completion of xxxCP09 for network resource usage will probably
only be done as required and only for a subset of the network.

Note 3: A way to estimate resource requirements at the business and DP transaction
level is to set up transaction-generating "drone" applications that periodically release
the transactions for execution during specified time periods. Several large clients
have personal computer systems and applications that do this, allowing just those
systems to be monitored. Given a known quantity of transactions and their types, an
estimate of usage per transaction can be determined and used as a base for
estimation.

When reporting resource usage, it is important to determine all the usage. Capacity
planners need to ensure that measured CPU usage which does not reflect actual usage
gets adjusted by capture ratios, and that the usage of system workloads accumulated on
behalf of applications be distributed appropriately if desired. Network throughput
estimates must also be adjusted to reflect traffic overhead.

The transaction volumes and resulting usage values should then be compared to the
most recent baseline values for this application (establishing baseline values is
discussed in Section 6). Once established, baseline values should be recorded on the
first creation of xxxCP09 and used on subsequent updates to forecasts. Exceptions
should be investigated, differences explained, and the baseline updated as required.
This information and business growth forecasts will be used in Activity 6 to estimate
application resource requirements.

Complete xxxCP10 (Aggregate Workload Statistics)

The purpose of xxxCP10 is to report historical trends of resource usage by workload.
Tables are provided to suggest the type of information that should be reported by
workload if possible. A consolidation of all workloads utilizing a system onto one form is
good preparation for forecasting the overall system requirements. Unlike xxxCP09 where
the peak hour of measurement could be different for each application/workload, the
measurements on this form for CPU and DASD are over a designated period of time, the
measurement period, that can be compared to the same periods in the past. Sometimes
the period isn't necessarily the same time-of-day or day of a month, but the absolute
peak period for a month or quarter, one that represents the 90th percentile for peak
work, or a bouncing-busy peak. Multiple copies of this form may be needed to evaluate
the peak loads during different shifts or processing periods. Like the other forms in this
section, this form highlights the metrics of interest and does not necessarily represent the
finished report.
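Selecting a 90th-percentile peak rather than the absolute peak can be sketched as
follows (nearest-rank method; the hourly figures are invented):

```python
# Illustrative measurement-period selection: instead of the single absolute
# peak hour, use the hour at the 90th percentile of the period's hourly loads,
# which is less distorted by one-off spikes.

def percentile_peak(hourly_loads, pct=0.90):
    """hourly_loads: one value per measured hour.
    Returns the load at the given percentile (nearest-rank method)."""
    ordered = sorted(hourly_loads)
    rank = max(0, int(round(pct * len(ordered))) - 1)
    return ordered[rank]

# Hypothetical CPU% per prime-shift hour; the 95 is a one-off spike:
loads = [40, 55, 62, 48, 95, 60, 58, 51, 70, 66]
plan_peak = percentile_peak(loads)
# plan_peak sits below the absolute maximum, giving a steadier planning figure
```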

For the networking resources, xxxCP10 focuses on the metrics that can be collected at
the network component level; however, a column to record the workload is provided when
the overall traffic can be reported by workload. The aggregate network traffic traversing
specific network components is reported. This is relatively easy to obtain from network
monitoring and management tools reporting traffic received by the component or sent
from the component. Metrics include messages (metric closest to the business), packets
(protocol-specific units), and characters (universal metric that can be derived from
messages and packets and can also be converted into bits). Utilization is included, but
may need to be derived from the characters/sec rate compared to the theoretical capacity
(see Section 5.2 for capacity definitions) of the resource.
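The derivation of utilization from a characters/sec rate can be sketched as follows; the
8-bits-per-character assumption ignores protocol framing, and the T1 line speed is only
an example:

```python
# Illustrative derivation: convert an observed characters/sec rate to bits/sec
# and compare it against the line's theoretical capacity to get utilization.

def line_utilization(chars_per_sec, line_speed_bps, bits_per_char=8):
    """Returns utilization as a fraction of theoretical line capacity."""
    return (chars_per_sec * bits_per_char) / line_speed_bps

# 9,650 characters/sec observed on a 1.544 Mbps T1 (hypothetical figures):
util = line_utilization(9_650, 1_544_000)
# util is a fraction; compare it against the effective-capacity threshold
# (Section 5.2) rather than against 100%
```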

Note: In most cases the current traffic being sent from the component will be the indicator
of the throughput; i.e., the total traffic sent from the component (total traffic to the user and
to the application) is equal to the traffic received plus potential retransmissions of traffic
due to errors.

It is possible that other metrics unique to the component and not on xxxCP10 are
reported by a monitoring tool. A report unique to the component of concern is desired.
For example, some tools use Information Frames sent and received to refer to the user
data serviced by a 3745; whereas, Messages include both user data and overhead
characters. When selecting the desired metrics, the capacity planner should seek the
throughput metrics that can be used to determine both user data rates and overhead.

The Capacity Management process alone does not result in the detailed configuration of a
network component; this is the output of the network design process. Network designers
will need to determine the effects of the traffic demands predicted by the capacity planner
on elements within a component as well as the processing engines. For example, will the
ports on a specific component handle a specific traffic load?

Key Note: Since many applications comprise the total traffic on any component, the
concept of from/to the application loses its meaning. Most monitoring tools reporting
information about a specific component will refer to traffic relative to that
component; hence, traffic received or sent.

Note: The loads for the key network components (see xxxCP04) should be in the
Capacity Database. Capacity status and trend reports are an alternative to this
form. See Section 4 for assistance in producing status and trend reports.

The information on the completed xxxCP10 or alternate report provides historical trends
that can be used to project future requirements. However, this should be used with
discretion, since the history does not necessarily predict the future. Use this information
as another input to resource requirements analysis and forecasting. This type of report
also allows a comparison of the current component load to the effective capacity
threshold.
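As a sketch of how such a historical trend might be projected against the effective capacity threshold, a simple least-squares line can be fitted to past peak-hour utilization; the monthly figures and the 75% threshold below are illustrative, not xxx measurements:

```python
# Minimal linear-trend sketch (illustrative numbers, not xxx data).
# Monthly peak-hour utilization (%) for one component, oldest first.
history = [52.0, 54.5, 56.0, 58.5, 60.0, 62.5]
effective_capacity_threshold = 75.0  # percent

n = len(history)
xs = range(n)
mean_x = sum(xs) / n
mean_y = sum(history) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Months until the fitted trend line crosses the threshold.
months_to_threshold = (effective_capacity_threshold - intercept) / slope
print(f"Trend: {slope:.2f} pts/month; threshold in ~{months_to_threshold:.1f} months")
```

As the text cautions, history does not necessarily predict the future; such an extrapolation is only one input to the forecast.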

The information on this form will be used in Activity 7 to estimate overall resource
requirements for an entire system.

Page 64
IT Capacity Management Procedure

Activity 4 - Collect information on changes that are expected to significantly affect resource planning and forecasts - xxxCP11

The purpose of xxxCP11 is to identify factors other than application growth that could
impact the capacity and forecasts of I/T resources. Areas that should be considered and
documented here are:

- Business - changes to regulatory reporting and deadlines, interstate
  banking, changes to input or output media requirements, etc.
- Workload Schedules - changes in deadlines, changes in priorities,
  etc.
- Software - operating system or subsystem upgrades, additional
  monitoring programs or operating changes, implementation of
  different technologies or exploitation of software features (e.g., VSAM
  to DB2 conversion, further implementation of system-managed
  storage)
- Hardware - new machines, LPAR or hardware configuration changes,
  implementation of new hardware features (e.g., data compression,
  increased cache use, 36-track tape conversion, T1 line conversions,
  etc.)
- Environmental - consolidation of data centers or workloads, changes
  to disaster or recovery plans, etc.

Page one of xxxCP11 provides a Data Source Matrix that shows where the information
can be obtained. It should be kept up to date as contacts are established. Alternatively,
one can produce the change list automatically from the change management process.
This assumes that resource impact analysis is performed for all changes and those with
significant impact are flagged to allow the generation of this report.

When performing collection activities during the development life cycle (see Section 2),
the most current update of this form should be checked to determine impact on the
application in development.


3.3 Activities Performed Semi-Annually

Activity 5 - Validate business drivers and collect future business application growth estimates - xxxCP12 and xxxCP13

The following text does not imply an order of execution. Most likely, historical data will first
be entered on both xxxCP12 and xxxCP13. Then future business growth estimates will be
collected from the Business Owner and Application Managers for both forms. xxxCP13
would then be completed with future projections derived from its regression algorithms as
part of Activity 6. Regression techniques are assumed to be understood and are not
documented in this methodology.

Correlation of network usage to specific application business drivers is not, in general,
viable. The application's contribution to the total traffic through a resource cannot be
easily collected for components shared widely with other applications. This limits the
usefulness of regression analysis based on business drivers. xxxCP13 does, however,
provide a format that can be used to determine correlation between business drivers or
user types (use the business driver fields) and selective network component throughput
(CPS) and business packages. Although the form was initially designed to relate
application business drivers and resource usage, it does lend itself to the correlation of
usage and user types/groups.

Complete xxxCP12

To determine resource requirements for existing workloads, the capacity planner must
meet with the business owner and application staff to get their forecast of how the
application workloads will grow. This conversation should be in "business-terms", not DP-
terms. xxxCP12 provides tables to capture growth estimates for Business Drivers and
Business Transactions. If it has been found that resource usage correlates highly enough
with one or a combination of business drivers, the job of projecting resource needs
becomes easy. Simply apply the forecasted Business Driver quantity to the regression
algorithm and take the result. This Business Driver Forecasting Technique is
explained in Section 6.6. Regression and correlation analysis is reviewed in Section 6.3.
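The arithmetic of the Business Driver Forecasting Technique is a straightforward application of the fitted regression; in this sketch the coefficients and driver forecast are hypothetical, standing in for values derived from xxxCP12/xxxCP13 history:

```python
# Business Driver Forecasting Technique, reduced to its arithmetic.
# Coefficients from a previously fitted linear regression (hypothetical).
slope, intercept = 0.12, 400.0      # CPU seconds per driver unit, base load
forecast_driver = 25_000            # business owner's forecast of the driver

# Apply the forecasted Business Driver quantity to the regression algorithm.
projected_cpu_seconds = slope * forecast_driver + intercept
print(round(projected_cpu_seconds, 2))
```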

A second business-driven forecasting technique, the Business Transaction
Forecasting Technique, can be used in addition to the Business Driver technique. It
requires input about the growth in terms of business transactions. Form xxxCP08 was
completed in the Design Phase to identify the Business Transactions and determine the
Business-To-DP transaction mappings. If the application has not gone through this
development life cycle interface capacity planning process (Section 2), it will be necessary
to complete this form before proceeding.

Always document the past business driver and transaction trends (nice reports or graphs
are a plus) as a guideline for projecting the future growth. xxxCP12 provides fields for
trending the business data.


Complete xxxCP13 as required

This form is a consolidation of all the information pertaining to business drivers. It documents the:

- Business Drivers
- Regression Algorithms used to project resources
- Correlation Coefficient for the algorithms (indicates degree of accuracy)
- Business Driver trends and projections
- Resource usage measured for past periods
- Projected resource usage per regression algorithm
- Selected resource usage projections for future periods

Business driver to usage correlation is highly desirable, but not always achievable. This is
not a regularly scheduled task and should not be allowed to consume much of a capacity
planner's time. Rather, it is an activity that should be performed occasionally when at
least twelve months of usage data and business driver history is available. One use is to
verify an individual's expectation that a business driver does in fact affect resource usage
in a mathematical way; or, just the contrary, to communicate that it is not a factor that
should be used to project resources.

It may be possible to find some correlation between resource usage for some network
components and business drivers or user types, but seeking correlation for components
that are widely-shared is not viable. In many cases, the number of components and
configuration are driven by backup/recovery and disaster needs as much as capacity
needs. xxxCP13 does provide suggested information to allow correlation analysis for
network variables.

Regression analysis cannot be performed without past data, both driver history and
usage. If either is missing, ignore the Correlation Coefficient field until this can be
determined. Regression analysis formulas are part of statistical packages, such as, SAS,
1-2-3, and MICS capacity planning interface. This document does not provide details on
how to do regression analysis, but see Section 6.3 for additional discussion.
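While the statistical packages above would normally perform the calculation, the underlying correlation check can be sketched in a few lines; the twelve months of driver and usage history below are illustrative only, not actual xxx data:

```python
# Checking whether a business driver correlates with resource usage,
# using twelve months of illustrative history.
driver = [10, 11, 11, 12, 13, 13, 14, 15, 15, 16, 17, 18]    # e.g., accounts (000s)
usage = [205, 210, 214, 221, 228, 230, 238, 245, 247, 255, 262, 270]  # CPU hours

n = len(driver)
mean_d = sum(driver) / n
mean_u = sum(usage) / n
cov = sum((d - mean_d) * (u - mean_u) for d, u in zip(driver, usage))
sd_d = sum((d - mean_d) ** 2 for d in driver) ** 0.5
sd_u = sum((u - mean_u) ** 2 for u in usage) ** 0.5

r = cov / (sd_d * sd_u)  # Pearson correlation coefficient, -1.0 to 1.0
print(round(r, 3))
```

A coefficient near 1.0 (or -1.0) supports using the driver to project resources; a coefficient near zero communicates that it should not be used.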

Activity 6 - Assess key "existing" application resource demands and project future resource needs - Activity 2, xxxCP09, xxxCP11, xxxCP14, and xxxCP15

This activity is here to initiate any application-specific study as deemed necessary.

See Activity 2 with respect to the review of xxxCP02, xxxCP04, xxxCP06, and xxxCP08.
Apply the capacity planning forecasting technique most appropriate for this workload and
summarize the results onto form xxxCP15. See Section 6 for forecasting technique
discussion. Refer to Section 2.5 to complete xxxCP08.


Complete xxxCP14 if appropriate

The information on xxxCP09 is the basis for the Business Transaction Forecasting
Technique. Given the expected volume of business transactions for the peak hour
(reported on xxxCP12) and now the usage per business transaction, simple multiplication
can yield the expected future resource needs.
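That multiplication can be sketched as follows; the per-transaction usage and forecast volume are hypothetical, standing in for xxxCP09 and xxxCP12 values:

```python
# Business Transaction Forecasting Technique, reduced to its arithmetic.
usage_per_txn_cpu_ms = 45.0        # CPU ms per business transaction (from xxxCP09)
peak_hour_txns_forecast = 80_000   # forecast peak-hour volume (from xxxCP12)

cpu_seconds_needed = usage_per_txn_cpu_ms * peak_hour_txns_forecast / 1000.0
print(cpu_seconds_needed)  # 3600.0
```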

Key Note: The usage values reported on xxxCP09 must include all the usage. Workload
CPU capture ratios, system CPU/DASD workload distributions, and network
protocol/compression factors should be applied before utilizing the usage data for future
projections. These adjustments are commonly understood by capacity planners and are
therefore not discussed in this document.
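For example, applying a workload CPU capture ratio before projecting can be sketched as follows (both the measurement and the ratio are hypothetical):

```python
# Adjusting measured usage by a workload CPU capture ratio before
# projecting forward.
measured_cpu_seconds = 2_550.0   # CPU the monitor attributes to the workload
capture_ratio = 0.85             # fraction of true usage the monitor captures

total_cpu_seconds = measured_cpu_seconds / capture_ratio
print(round(total_cpu_seconds))  # 3000
```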

Complete xxxCP15

xxxCP11 provides input about changes going on during the forecasting period that may
also affect the capacity forecasts. Review this form and include their impact in your individual
application forecast on xxxCP15. See Section 2.6 Activity 3 for the use of other key forms
and assistance for xxxCP15.


3.4 Activities To Perform Annually

Activity 7 - Assess overall future resource requirements and alternatives - xxxCP08 and xxxCP17

Version 1 of the methodology for CPU and DASD did not describe the forecasting
analysis and techniques in detail. Although version 2 is an extension of version 1 for the
networking resources, at least one example of a spreadsheet technique for consolidating
current and expected future network traffic demands is the use of forms xxxCP08 and
xxxCP17. Both forms segment the data by component, since
forecasts must be derived for individual components in the network. xxxCP17 provides a
spreadsheet identical to the final page of xxxCP08. Whereas xxxCP08 is used to capture
future load estimates for both "new" and "existing" applications individually, xxxCP17
documents how the resources are being utilized today. This can be used as a base, or
baseline, to which the demands on xxxCP08 can be added. Only after the spreadsheets
are merged by component can the peak hour be determined.
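A minimal sketch of that merge, assuming hourly loads per component from both forms (the component names and figures are made up for illustration); note that the merged peak hour need not coincide with the baseline's peak hour, which is why the peak can only be determined after merging:

```python
# Merge an xxxCP17-style baseline with xxxCP08-style "new" load
# estimates, per component and per hour, then find each peak hour.
baseline = {                      # current load (e.g., kchars/sec) by hour
    "router-A": [30, 70, 55, 40],
    "line-T1": [10, 20, 25, 15],
}
new_loads = {                     # estimated additional load from new apps
    "router-A": [5, 10, 30, 5],
    "line-T1": [2, 3, 4, 2],
}

peaks = {}
for component, hours in baseline.items():
    merged = [b + n for b, n in zip(hours, new_loads[component])]
    peak_hour = max(range(len(merged)), key=merged.__getitem__)
    peaks[component] = (peak_hour, merged[peak_hour])
    print(component, "peak hour:", peak_hour, "load:", merged[peak_hour])
```

In this example router-A's baseline peaks in hour 1, but after the new-application demand is added the peak shifts to hour 2.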

Note: Limiting the network components to be forecasted may make this task more
manageable. You could apply the analysis to 'critical' components, and to
components that are known to be approaching the critical performance threshold,
based on resource metrics that you have captured.

The overall, point-in-time system analysis and forecast (physical complex, site,
LPAR, network components) also needs to aggregate all potential resource impacts
within a specific forecasting period to derive a composite forecast. A consolidated
schedule should be created for the forecasting period denoting the factors to be
considered. Factors to consider should be:

- Impact of "New" applications in development (Section 2) -
  development, testing, transition, and live production workloads
- Growth or depletion of "Existing" applications and workloads
- Impact of changes to the business, environment, workload schedules,
  and hardware/software configuration changes

All the above information should now be available on previously completed xxxCPxx
forms, except the development of a consolidated schedule.

It is not the intent of this document to discuss the modeling and reporting tools to use for
a specific forecasting project; however, Sections 4, 5, and 6 provide basic considerations
and guidance for selecting the proper technique, the development of reporting standards,
and some insight on tools. The choice and use of a modeling tool will dictate the input
requirements and output. In most cases, the inputs should have been described and
captured on either xxxCPxx forms or other reports for input to the tool. The output MUST
be tailored for the recipient. Section 4 provides guidance on producing effective reports
for a variety of audiences and report occasions.


Activity 8 - Update the annual Capacity Plan as required

This activity is referenced here as a reminder. The development of a Capacity Plan is discussed in Section
4.


Producing Capacity Management Reports


The capacity planning reports that are distributed to the "customers" of the capacity
planning process are one of the most important means for communicating the results of
the process. Although face-to-face personal communication of the results of the capacity
planning process may be the most effective way of getting a particular message across, it
is not the most enduring method. Printed reports and charts, by their enduring nature,
tend to take on a life of their own -- far beyond the memories of personal conversations. Thus, it
is very important to make sure that report communications are done so that the results
are communicated effectively, succinctly, and with a clear, consistent message in support
of decisions to be made.

The following subsections cover the fundamentals of producing good capacity management reports that support the goals of the business:

4.1 Designing and Documenting Capacity Reports
4.2 Producing an Annual Capacity Plan
4.3 Producing Periodic Capacity Forecast Reports
4.4 Producing Periodic Capacity Status Reports

This whole section of the xxx methodology describes the most important facets of
designing and producing effective capacity reports. The first subsection deals with the
basics of creating good reports along with the standards needed to manage the reporting
process. The final three subsections focus on the three basic types of reports that can be
produced by the xxx capacity management process: (1) Annual capacity plan, (2)
Capacity forecast report, and (3) Capacity status report.

In addition to these reports, two other report types produced by the Service Level
Management and Performance Management processes are needed by the capacity
planner to deliver cost-effective capacity recommendations that satisfy business and
service objectives:

1. Service level attainment reports
2. Performance baseline reports

The xxx capacity planning process can use service level attainment and performance
baseline reports in developing capacity forecasts or capacity impact reports; but the xxx
capacity management process does not have the responsibility for producing them.

In preparing for capacity reporting, it is helpful to visualize the process as an iterative one
whose success is dependent upon receiving feedback from xxx management and the location
recipients. This has been diagrammed in Figure 4.1. From a high-level perspective, the
reporting process begins with customer requirements from xxx and the locations. Next,
the design activity begins. The measurement process takes its requirements from the
customers and the design activity. At this point an ongoing iterative set of activities takes


place: collect data, develop and generate report, receive feedback on the report, initiate
control procedures or corrective action, and improve the final report for the next iteration.

Figure 4.1 also shows that the overall process for reporting is a repeatable set of
activities. Throughout the next four subsections, we will be focusing on the design report
activity within the reporting process since the measurement activities are discussed in
several sections within Section 5. The control and feedback activities are addressed in
the Process Management Guide.

Throughout each section, a common set of report recipients for the xxx capacity reports is
used to represent the vast majority of the report recipients. Other recipients'
requirements can be grouped within one of these groups with some tailoring to that
group's or individual's xx requirements. These groups, and the terminology used to
reference them as types, are:

Type I   location management and the xx business owner
Type II  xxx senior management
Type III xxx technical management and staff

Note: Sections 4.2 through 4.4 were written with the assumption that Section 4.1 is
thoroughly understood.


Figure 4.1. Capacity Management Reporting process

8/10/12 IBM and xxx Confidential Section 4 - Page 74



Section 4.1 -- Designing and Documenting Capacity Reports
This subsection discusses how to effectively design a report that communicates the
information needed to make the desired decision in the most efficient and appealing
manner. It also provides guidance in establishing report standards that facilitate the
production and comprehension of information. Lastly, it introduces the concepts utilized in
the description of the three capacity reporting types that follow.

There are three basic types of reports that can be produced by the capacity
management process. These are discussed in sections 4.2 through 4.4:

1. Annual Capacity Plan
2. Capacity Forecast Report
3. Capacity Status Report

For each of these reports and others produced by the capacity planner, there are many
questions that need to be answered before any reports are produced and delivered:
What will these reports contain? How will they look? Who will receive them? What
decisions will be made from the information on the reports?

These and other questions need to be asked to design the most effective report. A report
showing utilization by device may be important to a capacity planner or technician, but it is
not necessarily the best for an executive trying to determine a least cost alternative.
Essentially, seven areas must be considered when designing a management report.
These are represented by the following questions with design-consideration areas in
bold print:

1. Who are the recipients -- Executives, management, technical staff?
2. What decisions must be made from the information presented?
3. What view does the recipient need presented?
4. What is the necessary depth or granularity of the information content?
5. What presentation format is most appropriate?
6. What is the decision-making time horizon?
7. What is the periodicity of the report?

Who are the recipients?

Understanding who the report recipients are is a key consideration when designing a
report. Knowing the recipients, their role in the organization, and the reasons why they
need information is most important and will become the base consideration for your
design. Many of the remaining decision considerations can usually be hypothesized by
referencing Table 4.1.1, based on some xxx recipient types. Composing an overall
perspective from all of the seven design considerations will result in a report that meets or
exceeds the recipient's expectations.
Key principle: All of the information needed to design meaningful capacity planning
reports can be obtained by asking key questions in each of the seven design
consideration areas.

Type I: location Business Management
- Level of management: Middle or first-line management
- Decisions generally made: Business workload balancing; scheduling; cost impact evaluation
- Views of the xxx system data: By application workloads; by locations; by resource aggregates; by business function
- Content and granularity: High-level utilization aggregates; costs and service attainment; trends; alternatives; business drivers and utilization correlations
- Typical standard report frequency: Annually, to correspond with budget; semi-annually or quarterly for forecasting

Type II: xxx Senior Management
- Level of management: Executive or middle management
- Decisions generally made: Application priorities; cost per application
- Views of the xxx system data: By business function; by application workloads; by locations; by resource aggregates
- Content and granularity: High to moderate level; utilization aggregates and details for major cost resources; costs, utilizations, and service trends; alternatives; business drivers and utilization correlations
- Typical standard report frequency: Annually, to correspond with budget; semi-annually or quarterly for forecasting; quarterly or monthly for trends, service attainment, and other status

Type III: xxx Technical Staff
- Level of management: First-line management or staff technicians
- Decisions generally made: Tuning; workload balancing; capacity recommendations
- Views of the xxx system data: By application workloads; by subsystem workloads; by resource component; by resource aggregates
- Content and granularity: Low to moderate level; utilization by resource and aggregates; historical trends; service attainments and exceptions
- Typical standard report frequency: Annually, to correspond with budget; semi-annually or quarterly for forecasting; quarterly or monthly for trends and status; monthly or weekly for service attainment
Table 4.1.1. Capacity Planning Report design considerations by recipient type

Asking the right questions


Asking the right questions not only guides the capacity planner in properly designing the
reports but it provides the capacity planner with a great opportunity to develop an ongoing
relationship with the report recipient. A report is just one mechanism for communicating
with the report's recipients -- face-to-face communication is almost a prerequisite to
successful management reporting.

The following questions are organized by report design consideration areas. These are
only suggested questions -- some of them can be directly asked of the future report
recipient and others, such as "level of management", are answered through data
collection or experience with the organization. Table 4.1.1 can be used as a basis for
developing the questions to be asked. This table can be easily updated and expanded as
more information becomes available about the population of report recipients.

(1) Recipient

a. Who is the recipient? Senior executive, middle management, first line management,
staff, technician?
b. For what area of the business is the recipient responsible?
c. What are this manager's responsibilities? Are they related to managing the
hardware, developing IT plans, developing applications, managing costs?
d. What style of management does this recipient have? (i.e., how does this manager
communicate with others and what level of detail does he expect?)

(2) Types of Decisions to be supported

a. What decisions will be made from the information in the report?
b. What type of data is needed for the decision to be made?
c. How much data is needed to support the decision? Over what time period?
d. How immediate is the decision to be made?
e. How will the information be used?
f. What are the criteria to be used in making the decision?
g. Does this recipient just want to be kept informed or protected from 'surprises'?
h. Does the recipient expect to make any decisions based upon the capacity-related
reports he receives?

(3) View of the xxx system

a. What view of the system does the recipient desire? (e.g., xxx, zzz-oriented,
application-oriented, hardware-oriented, network-oriented?)
b. What "language" does the recipient speak in regards to their job and the computer
systems? (e.g., business-based, cost-oriented, application-specification, service-
oriented?)

(4) Content depth and frequency


a. How much detail does this recipient expect?
b. Are trend reports needed? By day, week, month, quarter, year?
c. What metrics are needed to build the case for supporting any recommended action?

There are a variety of other questions which can be developed to better define the
contents of the report. At this point, the capacity planner should adopt a structured, single
page capacity planning Report Specification Sheet as seen in Figure 4.1.1.

Report Specification Sheet

For controlling the use of standards in generating reports that have a common "look and
feel" and for facilitating (as a checklist) good report design, a report specification sheet
(see Figure 4.1.1) is created for each key report. This specification sheet is used to:

- Document the report requirements and formalize the agreement with the
  recipient
- Monitor the production, distribution, and usefulness of the reports through
  a periodic review of the contents of the specification sheet.

The standard specification sheet serves as a documentation vehicle for ensuring that all of
the report requirements are known. This is used along with a set of graphical, tabular
chart, and technical standards which provide each capacity planning report with a
common "look and feel" for the recipient.
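One way such a specification sheet could be captured as a structured record is sketched below; the field names are assumptions for illustration, not the layout of the actual xxx form in Figure 4.1.1:

```python
from dataclasses import dataclass

# Hypothetical record for one Report Specification Sheet entry; the
# fields mirror the seven design-consideration areas, not an official layout.
@dataclass
class ReportSpec:
    title: str
    recipient_type: str            # "Type I", "Type II", or "Type III"
    decisions_supported: list      # decisions the report must support
    view: str                      # e.g., "by application workloads"
    presentation_format: str       # narrative, tabular, graph, or mixed
    frequency: str                 # e.g., "quarterly"

spec = ReportSpec(
    title="CPU Utilization Trend",
    recipient_type="Type II",
    decisions_supported=["application priorities"],
    view="by application workloads",
    presentation_format="mixed",
    frequency="quarterly",
)
print(spec.title, "->", spec.recipient_type)
```

Keeping the entries in one structured place makes the periodic revalidation of the sheets easier to automate.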

The graphical standards contain the standard ways in which certain report types will be
portrayed. This means that all bar charts, all 3D charts, and all other chart types will
appear in the same way with the same type fonts, same positioning of graphs on a page,
same use of color or shading, same date formats, same labeling techniques for axes, and
the same proportionality guidelines (e.g., the "three-quarter high rule").

There are standards for tabular charts as well. These standards contain guidelines for
row and column headings, shadowing of tables, relative type font sizes for column and
row headings versus the normal cell text, cell size, gutter size, and other presentation
standards.

The technical standards describe the mechanics of producing the report, i.e., how to get
the data, how to generate the report, and how to distribute it. It is also a place to define
terminology and data transformations. Terminology definitions within the technical
standards extend to very precise definitions of such key terms as capacity, peak hour,
daily average peak. These become especially important when different platforms are
involved, but may be just as important in enforcing the consistent use of terminology
between different hardware components such as CPU, Tape, and DASD. Some of the
key terms are described in Section 5.2, Terminology and Metrics; others are described in
the Glossary. However, much of it will need to be refined or created by the capacity
planning staff.


The use of these standards across hardware components and platforms is especially
important for reports designed for xxx and zzz executives. The common "look and feel"
makes the charts more usable and eliminates confusion about the chart's message.

The completion of the Report Specification Sheet also provides the opportunity to
establish or enhance communications with the recipients. Revalidation of the specification
sheets would be done regularly to sustain good working relationships with the recipients
and to ensure each report still satisfies the requirements and still supports the decision
that needs to be made from the report information content.
Figure 4.1.1. Report Specification Sheet example

What Views of the xxx System Need to be Presented?

There are several ways of presenting, or viewing, the xxx system in a report. In producing
the reports to meet the capacity-related decision-making needs of the recipient, the
capacity planner needs to determine what view or set of views would best support the
recipient's needs. There are four general views for presenting zzz capacity information:

1. Location: For example, a location or a xx location
2. User: For example, all xxxx, or even an entire location
3. Topology or Component: Such as the Backbone layer, or an individual
   resource.
4. Business/Application: These can be further divided by:
   a. location xxxxx (XX) and xxxxx (XX) applications or workloads
   b. location xx (xxxx) workload (a variant on the XX application's view)


Figure 4.1.2. Four views of the xxx network.


These four views are summarized in Figure 4.1.2. Determining which view or
combination of views to use will usually depend on the information that needs to be
analyzed and the decision that needs to be made.

Report language and message clarity

In designing reports to support capacity-related decisions, it is important that all attempts
are made to communicate the information in the language of the report recipient and to
have a clear message. A report's message can be:

1. Directive -- Action needs to be taken by the recipient and here is our recommendation
2. Informative -- (1) Action has already been taken and here are the results, or (2) No action
needs to be taken; here is the status of the capacity plan

The directive reports need to present the case for action to be taken and then support
the recommendation that is being made. The case needs to be supported with clear
evidence in charts and narrative that require no interpretation by the recipient. The report
becomes the message and is conveyed in language that the recipient understands.

An informative report is very, very simple. It can be a one page report stating that
everything is fine, everything is going according to the capacity plan, and no action is
required since the trend lines are understood and have been accurately predicted. If
action has been taken to correct a deviation in a forecast or avoid a capacity problem,
then that would be highlighted here -- not all reports have to contain problems to be fixed!

Key principle: A good capacity planning report should stand on its own and need no
further interpretation by the capacity planner. Although further discussions on alternative
recommendations may result, there should be no question or disagreement on the facts
presented in the report. As one noted capacity planner has said "Facts are facts -- you
can't argue with the facts". This leads to the next key principle.

Key principle: Terminology used for the metrics in capacity planning reports has to be
consistently used, clearly understood by all recipients, and unambiguously presented.

Content and Granularity

The Content and Granularity column of Table 4.1.1 indicates levels of low, moderate,
and high information depth and the general content each recipient would be interested in.

Reports designed for location management as well as xx business owners and
application managers would be summarizations of the key information. For each of the
twelve locations and the xx business owners, the information could be organized by: (1)
that location's "location xx" applications, essentially just their LPARs, (2) xx application
resource consumption or, (3) location xx. These views of capacity would include both
network and host-related information.


Reports designed for xxx Senior Management would be summarizations of information.
Typically the reports would be from one to three pages in length, preceded by a cover
letter. Only the essential pieces of information would be provided in narrative form. They
focus on several areas: (1) CPU, DASD and the network, (2) location xx applications, (3)
xx Applications, and (4) location xx. If available, service attainment and service
exception information could be presented together in this report.

Reports for technical staff and management would be oriented toward meeting the
operational needs of the xxx organization. These reports are tailored to meet the detailed
requirements of capacity planners and performance management specialists. They could
report on a variety of metrics for CPU, DASD, and network loads by: (1) LPARs or
images, (2) Processor complex, (3) location xx applications, (4) xx Applications, and (5)
Location.

Presentation Format

There are several formats for presenting data: (1) narrative, (2) tabular, (3) chart or
graph, and (4) mixed. Of these four, the mixed format can be the best choice for conveying a
lot of information in a concise, easily understood fashion to the recipient. The narrative
format is useful for conveying summary information at a high level.

The chart or graph format is one of the most effective formats to use if the right chart or
graph type is chosen. Seven types of graphs are commonly used in capacity
management:

1. Bar chart -- Used to present discrete measured data for comparative purposes and trending
2. Line chart -- Used to show trends. Best for either trending or continuous data.
3. Pie chart -- Used to show relationships and proportions.
4. Stock Market chart -- This chart is used to convey information where the distribution of data
such as the highs and lows are important. An example would be to have a high bar at the 90th
percentile of response time, a low bar at the 10th percentile, and a marker at the average.
5. Area chart -- Used to show continuity and the proportion of resources consumed by a
workload or an application group.
6. Scatter chart -- Used for representing sample points. Useful for representing clusters of data
as in a "cluster analysis" of application programs grouped by performance behavior.
7. 3D or Manhattan chart -- Useful for conveying three dimensions or units for particular
workloads or resources.
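To make item 4 concrete, the high bar, low bar, and average marker of a stock-market chart can be computed directly from measured samples. A minimal Python sketch; the response-time samples and the nearest-rank percentile rule are illustrative assumptions, not xxx standards:

```python
import math
import statistics

def hi_lo_markers(samples, hi_pct=90, lo_pct=10):
    """Return the high bar, low bar, and average marker values for a
    'stock market' style response-time chart (nearest-rank percentiles)."""
    ordered = sorted(samples)

    def pct(p):
        # Nearest-rank percentile: the value at ceil(p/100 * n), 1-indexed.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return pct(hi_pct), pct(lo_pct), statistics.mean(ordered)

# Hypothetical hourly response-time samples (seconds).
high, low, avg = hi_lo_markers([0.8, 1.1, 0.9, 2.4, 1.0, 1.3, 0.7, 3.0, 1.2, 1.1])
```

The three returned values map directly onto the high bar, low bar, and marker described for the stock-market chart.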

8/10/12 IBM and xxx Confidential Section 4 - Page 82


IT Capacity Management Procedure

Table 4.1.2. Example of a tabular report used for technical analysis: NetSpy Line Usage
for a particular NCP under study.

The tabular format is useful for conveying a variety of related data in a convenient
organized fashion. This can be very effective if important numbers can be highlighted
within the table. It is also very useful in presenting a number of characteristics, aspects,
or variables related to a single resource, application, or workload. It is usually very
difficult and confusing to try to develop a single graph that has more than two dimensions
(i.e., a three-dimensional chart or a two-dimensional chart which requires one dimension,
such as the y-axis, to have two units of measure). Table 4.1.2 provides an example of a
tabular report useful for technical analysis.

The mixed form is a combination of the graph and tabular formats. If viewed on a single
page, a graph would appear in the upper two thirds of the page and the table containing
the values for the y-axis would be on the bottom third of the page. The y-axis values
would be lined up underneath their corresponding x-values. Figure 4.1.3 is an example
of the mixed format.

Effective Presentation of Forecast Information

The selective use of thoughtful, well-designed graphs and charts can significantly
enhance the recipient's understanding and retention of key messages highlighted by the
capacity planner. Figure 4.1.3 is an example of how a graph can summarize data
concisely and convey a lot of critical information about the current environment, significant
capacity events (e.g., a processor upgrade or the addition of new work), and forecasted
capacity requirements over time.


Another useful chart and capacity planning approach is to introduce the concept of
unplanned capacity or unplanned reserve capacity. The chart shown in Figure 4.1.4
would then present two types of capacity information: planned and unplanned. This is
very helpful to track and differentiate those drivers of growth that were planned in the
original annual capacity plan from those which were unplanned. This is not to attribute
blame for the unplanned growth, but rather to acknowledge that it is occurring and must
be included in the capacity plan as unplanned reserve when forecasting requirements into
the future.

Unplanned reserve capacity is identified in the capacity plan and is tracked in the forecast
reports as updates to the plan. Items which arise as requirements outside of the normal
planning or budgetary cycle would be identified as unplanned items. These would be
tracked by showing how much of the unplanned reserve capacity forecasted in the
capacity plan was consumed by the actual unplanned activities and projects.
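The tracking described above reduces to simple arithmetic: sum the unplanned items that actually occurred and compare the total against the unplanned reserve forecast in the annual plan. A small sketch; the MIPS figures and item names are hypothetical:

```python
# Sketch, with hypothetical MIPS figures: how much of the unplanned
# reserve forecast in the annual capacity plan has been consumed?
reserve_forecast_mips = 120.0   # unplanned reserve in the annual plan

# Unplanned items that arose outside the normal planning/budget cycle.
unplanned_items = {
    "emergency location cutover": 45.0,
    "database restructure": 20.0,
}

consumed = sum(unplanned_items.values())
remaining = reserve_forecast_mips - consumed
pct_consumed = 100.0 * consumed / reserve_forecast_mips
print(f"Unplanned reserve consumed: {consumed:.0f} MIPS "
      f"({pct_consumed:.0f}% of plan); {remaining:.0f} MIPS remain")
```

The consumed and remaining figures are exactly what a chart like Figure 4.1.4 would plot against the planned growth line.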


Figure 4.1.4. Combined chart of current environment, key capacity events and forecasted
requirements.


Section 4.2 -- Producing an Annual Capacity Plan

The annual capacity plan is an iterative report which starts with the known requirements
for the next three years. After it has been reviewed, a final capacity plan is produced
which reflects the organization's objectives and strategy. This is just the beginning of
many iterations which serve to refine this plan as more information becomes available --
especially as new applications near their production roll-out dates.

The primary questions that this report answers are:

1. What Information Technology resources are needed?
2. When are those resources needed?
3. Why are those resources needed?
4. What assumptions went into deriving those Information Technology requirements?

These questions are addressed in a report format that is easy to follow and meets the
decision-making requirements of the targeted audience. This type of report is primarily
directive. Its purpose is to communicate the IT resource requirements for meeting the
business and service needs of the zzzs, xxx, and other xxx customers. Basically, this
communication presents the recommendations for IT resources. The informative part of this
report is the section which builds the case for the recommendations.

The Annual Capacity Plan establishes standards and requirements for most of the other
reports produced by the capacity planning process. The standards inherent in the annual
capacity plan relate to workload definitions, terminology, service and performance levels,
and presentation style. The requirements for the other reports are supportive of the
Annual Capacity Plan:

Capacity Forecast Reports -- Provide updates, corrections, or validation of the IT
resource plans outlined in the Annual Capacity Plan.

Capacity Status Reports -- Provide status on the IT resources consumed by the
workloads defined in the annual capacity plan,
highlight exceptions, recommend corrective action,
and document corrective action that has been taken.

Baseline Report -- Provides status on the overall system metrics and
establishes a base for projections and Capacity
Forecast Reports. It is beyond the scope of this
document and normally is produced by the
Performance Management Process.

There are three basic types of xxx Annual Capacity Plans that can be produced based
upon the targeted audience: (1) Type I: location management and xx business owner,


(2) Type II: xxx senior management, and (3) Type III: xxx technical management
and staff. There may be other groups of people that receive these reports that are not
included in these types. But, the basic information to support all of the customers of the
capacity planning process should be in at least one of these report types.

Basic Elements of An Annual Capacity Plan


There are several basic elements that are considered standard for capacity plans although
their actual format and placement within the report can vary. These elements are:

1. Executive Overview
2. Summary
3. Introduction or Preface
4. Methodology or approach to deriving the IT resource forecast
5. Current Environment
6. Business-driven forecasts and rationale
7. Alternatives
8. Capacity schedule
9. Appendix

Key point: If separate xx capacity plans are produced for different technologies (e.g., a
CPU plan, a DASD plan, and a network plan), they should all have the same
presentation format since they are part of a single, cohesive capacity plan.

Element 1 -- Executive Overview

For senior management, this is the most important page or two of the entire annual
capacity plan. It is focused on answering the questions: "what IT resources do we
need?" and "when do we need them?" The answers should be very concise, such as "In
fiscal year 1995 we will need xxx MIPS to support y additional location transitions,
production cutovers of x xx Applications, and ongoing workload growth of nn %". A
brief description of alternatives considered should be presented which makes the capacity
planner's recommendation the obvious choice. This section is all narrative -- no charts!

Element 2 -- Summary

This is different from the Executive Overview in that it is a summary of the entire capacity
plan. It is perhaps five to ten pages in length and highlights the main points from the
report with more detail than the Executive Overview. Here, charts are used to
summarize the amount of resources needed by quarter for each major view of the system
(e.g., location-xx, xx Applications, location xx, Location, Component). The basic
rationale for the recommendation is given and a short statement about the alternatives
considered is presented.


Element 3 -- Introduction or Preface

The purpose of the capacity plan and its goals and objectives are described here. This is
where the benefits of capacity planning can be stated. It also gives an overview of the
organization of the report and how it is to be used. This is also a very short section -- just
a few pages.

Element 4 -- Methodology

This doesn't have to be very long. The capacity planner needs to explain the basic
process which was used to derive the resource forecasts. Enough information should be
provided to the report recipient so that they feel confident that the best approach was used
to derive the recommendations. The evaluation criteria for alternatives should also be
described here.

An important element of this section is the listing of key assumptions that were made in
support of the recommendations. Any assumptions about peak periods, workload
patterns, planned application cutovers to production, planned transitions, and technology
upgrades should be listed here.

Element 5 -- Current Environment

The ongoing monthly capacity status reports should provide sufficient information to the
recipient to understand the current xxx environment. However, some recipients may not
read those reports. Consequently, this section would describe the current status of the
system's capacity to service its workloads and would indicate growth over the past year, a
comparison of the previous year's forecast against the actual results, and exceptional
events. This section would also contain information on current workload performance
profiles or characterizations.

Element 6 -- Business-driven forecasts and rationale

This is the heart of the report. The actual forecasts are described through charts and
graphs. The IT resource forecast is presented in the appropriate views (e.g., location xx,
xx Applications) for the audience. The rationale or basis for the forecasts is provided.
This will require presenting the growth in business drivers, such as number of funds
transferred, along with the growth of the workload.
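One common way to tie resource growth to a business driver is a least-squares fit of measured usage against driver volumes, then projection at the forecast volume. A sketch under stated assumptions; the funds-transfer volumes and MIPS figures are hypothetical:

```python
# Sketch: fit a least-squares line relating a business driver to
# measured CPU load, then project usage at a forecast driver volume.
# The funds-transfer volumes and MIPS figures are hypothetical.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx          # slope, intercept

transfers = [10_000, 12_000, 14_000, 16_000]   # business driver per day
cpu_mips  = [200.0, 236.0, 272.0, 308.0]       # measured CPU load

slope, intercept = fit_line(transfers, cpu_mips)
forecast = slope * 20_000 + intercept          # projected MIPS at 20k/day
```

The slope is the resource cost per unit of business driver, which is exactly the rationale this section asks the planner to present alongside the forecast.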

Element 7 -- Alternatives

A part of the capacity planning process is to identify and evaluate alternative scenarios
and configurations which could meet the growth requirements of the zzz. This section
discusses the alternatives that have been evaluated, explains their pluses and minuses, and
explains why the capacity plan's recommendations were chosen from the alternatives.


This introduces how the evaluation criteria (described in the methodology section of the
capacity plan) were used to determine the forecasted environment and select the IT
resource requirements that would meet the business requirements.

Element 8 -- Capacity Schedule

This item could be relegated to the appendix. This is the schedule of what equipment will
be acquired and when it will be installed. It can be considered as a calendar of capacity
events.

Element 9 -- Appendix

All of the details go here. The appendix needs to be organized by some logical grouping
which is consistent with the rest of the document. New terms or information shouldn't be
introduced here unless they are referenced in the main body of the capacity plan. This is
where the bulk of the charts and tables will reside.


Customizing the Annual Capacity Plan by Recipient Type


Type I: location Management and xx Business Owner

This report is tailored to the operating environment of that particular location, application
development manager, or business owner. There is no need to cover what resources are
consumed by other locations or other business application areas since each location's and
business owner's requirements would have been properly collected and used in the
forecast. The CPU, DASD, and network information can be presented for their location
or business application area by:

1. location xx applications
2. location xx
3. xx Applications -- use and forecast
4. Overall xx Applications forecast and usage for the xx Application(s) for which
they are responsible
5. Host and Network Component forecast.

The elements of the capacity plan that they could receive would contain:

1. Executive Summary (optional)
2. Summary (for the location or business owner)
3. Introduction (optional)
4. Methodology (optional); assumptions should be covered in the Summary if this
section is left out
5. Current Environment -- only for their location
6. Business-driven forecasts and rationale -- only for their location
7. Alternatives -- system-wide tradeoffs and location-specific alternatives
8. Capacity Schedule (optional)
9. Appendix (optional)

Type II: xxx Senior Management

This type of report is probably the most difficult report for the capacity planner to
produce since it requires them to synthesize all of the information that they have analyzed
into a single short document. The focal point of the document is the Executive Summary.
This one to two page summary was described earlier and is the primary written vehicle
for the capacity planner to communicate capacity recommendations.

Typically, there are only a few elements of the annual capacity plan that the executives
would need. From the list of basic elements the following would apply:

1. Executive Summary (required)
2. Summary
3. Introduction
4. Methodology (optional); assumptions should be covered in the Summary if this
section is left out
5. Current Environment (optional)
6. Business-driven forecasts and rationale (optional)
7. Alternatives
8. Capacity Schedule (optional)
9. Appendix (if the other sections are covered properly you should never need
to distribute the Appendix to the executives).

Type III: xxx Technical Management and Staff

This is the most detailed of all of the types of annual reports. All elements of the capacity
plan are produced. One purpose of this report is to serve as a historical document for
the capacity planning process. Another is to be the reference point for the
capacity planner in producing quarterly forecast updates and capacity status reports.
A third is to provide detailed information on the plan for the xxx technical
management team. In this sense the capacity plan becomes the basis for building the xxx
tactical plan.


Section 4.3 -- Producing Periodic Capacity Forecast Reports

The Forecast report is an update to the Annual Capacity plan and is both informative and
directive. Exceptions, unexpected trends, and significant deviations from the forecast are
highlighted with corrective actions and cost impacts identified. Explanations are
provided for unexpected exceptions, unplanned changes in resources consumed, or
unexpected trends.

This report addresses the following questions:

1. What has changed in the forecasts?
2. How are those changes being addressed?
3. Are there any trends that look ominous?
4. What is the rationale behind the forecast changes?
5. What action needs to be taken?

The periodic forecast report is produced on a regular basis and its main purpose is to
identify what has changed. Its format contains some of the same elements of the annual
capacity plan and is produced with the same views of the data. The complete set of
common elements for a capacity forecast report is:

1. Executive Overview
2. Summary of Changes -- i.e., what has changed from the last forecast or the
annual capacity plan
3. Details of the changes:
Why they occurred
What the impact is
How they will be addressed
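The "what has changed" summary can be derived mechanically by comparing the current forecast against the annual plan, period by period. A sketch with hypothetical quarterly MIPS figures:

```python
# Sketch: derive the "Summary of Changes" by comparing the current
# quarterly forecast against the annual capacity plan (hypothetical
# MIPS figures).
annual_plan = {"Q1": 400.0, "Q2": 430.0, "Q3": 460.0, "Q4": 500.0}
current     = {"Q1": 400.0, "Q2": 445.0, "Q3": 490.0, "Q4": 520.0}

changes = {q: current[q] - annual_plan[q]
           for q in annual_plan if current[q] != annual_plan[q]}
# Each nonzero delta is a change to explain in the report: why it
# occurred, what the impact is, and how it will be addressed.
```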

There are three basic types of xxx forecast reports that can be produced based upon the
targeted audience: (1) Type I: location management and xx business owner, (2) Type
II: xxx senior management, and (3) Type III: xxx technical management and staff.
There may be other groups of people that receive these reports that are not included in
these types. But, the basic information to support all of the customers of the capacity
planning process should be in at least one of those report types.


Customizing the Capacity Forecast Plan by Recipient Type

Type I: location Management and xx Business Owner

Periodically, each location or business owner can receive a report on the changes that
affect them directly. This may form the basis for periodic budget update reports. Each
element of the forecast report could be used but would be specific to their location or the
business owner's application. They could be presented in the following format:

1. Executive Overview for the location or business owner
2. Summary of Changes -- i.e., what has changed from the last forecast or the
annual capacity plan for that location or applications for that business owner
3. Details of the changes for that location or business owner (optional):
Why they occurred
What the impact is
How they will be addressed

Type II: xxx Senior Management

This report is produced semi-annually; it is very simple and contains only the Executive
Summary. Optionally, the Summary of Changes would be included in the report. The
items of interest will be primarily cost and the impact on the quality of service delivery
caused by these changes. Most issues will already have action items in place to
address the change. Any exposures to the capacity plan or any ominous trends that have
been detected need to be presented with recommendations for action to be taken.

Type III: xxx Technical Management and Staff

Semi-annually, the capacity planner provides all of the update details to the capacity plan
to xxx technical management and to other levels of management through this report. It is
the capacity-related update to the xxx tactical plan. The report would contain the
following elements:

1. Executive Overview (optional)
2. Summary of Changes -- i.e., what has changed from the last forecast or the
annual capacity plan for the whole xxx environment
3. Details of all changes:
Why they occurred
What the impact is
How they will be addressed


Section 4.4 -- Producing Periodic Capacity Status Reports

The capacity status report is predominantly an informative type of report and is used to
communicate the usage of computer resources over the last month. However, any
exceptional conditions or recommendations for action to be taken should be included in
the report. The reports focus on answering some key questions:

Are resources being used as planned?
Are the workloads behaving as planned or expected?
How are current resources being used?
Were there any unplanned workload increases or capacity constraints?
What exceptions occurred during this period?
What were the underlying causes for the exceptions?
What action has been taken to correct problematic exceptions?

No decisions should be made from these reports unless some exceptional situations have
arisen. However, these exceptions should already have been addressed by the
performance management group based upon more detailed measurements. Since this
is a status report, the emphasis is on letting the recipient know what the current
consumption of resources is by xx Application, location xx application, location xx, or
hardware type.
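The exception focus of the status report lends itself to a simple rule: flag any workload whose actual consumption deviates from plan by more than a threshold. A minimal sketch; the workload names, MIPS figures, and 10% threshold are illustrative assumptions:

```python
# Sketch: flag workloads whose actual resource use deviates from the
# plan by more than a threshold. Workload names, MIPS figures, and
# the 10% threshold are illustrative assumptions.
PLANNED = {"ABC": 100.0, "DEF": 250.0, "GHI": 80.0}   # planned MIPS
ACTUAL  = {"ABC": 121.0, "DEF": 255.0, "GHI": 70.0}   # measured MIPS

def exceptions(planned, actual, threshold=0.10):
    """Return {workload: fractional deviation} for out-of-plan workloads."""
    flagged = {}
    for wkl, plan in planned.items():
        deviation = (actual[wkl] - plan) / plan
        if abs(deviation) > threshold:
            flagged[wkl] = deviation
    return flagged

over = exceptions(PLANNED, ACTUAL)   # ABC over plan, GHI under plan
```

Flagged workloads then get the "Impact / Action initiated" treatment shown in the sample report of Figure 4.4.1.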

There are three types of xxx monthly capacity status reports that can be produced based
upon the targeted audience: (1) Type I: location management and xx business owner,
(2) Type II: xxx senior management, and (3) Type III: xxx technical management and
staff. The information base from which these reports are derived is the same; only the
packaging of the information changes to meet the needs of the recipient.

Type I: location Management and xx Business Owner

This type of report is designed to summarize the key information needed for location
management as well as business owners and application managers on a periodic basis.
Consequently, each location or business owner is presented with only the capacity
information relative to its use. A separate report is produced for each of the twelve
locations. Within each location's report, information could be organized by (1)
that location's "location xx" applications, essentially just their LPARs, (2) xx application
resource consumption, and (3) the location's xx.


Type II: xxx Senior Management

This type of report is a high-level exception report. Produced periodically, it focuses on
highlighting exceptions in: (1) usage of hardware resources, (2) usage of location xx
applications, (3) usage of xx applications, and (4) location xx. Also, unexpected trends
in hardware, software, and applications are addressed. An abbreviated and very simple
example of this type of report is shown in Figure 4.4.1.

Type III: xxx Technical Management and Staff

This type of report covers all IT resources at xxx and is produced monthly. These reports
are primarily for use by the capacity planner and the performance management staff.
They are used in conjunction with the performance management and service level
management reports to determine if the month's capacity needs have been met and if
there is any need for corrective action.


Executive Level Capacity Status Report for Second Quarter, 2006

Hardware Resource Exceptions during Second Quarter, 2006

An unforeseen increase in the volume of work for the ABC application caused System A to exceed its planned usage during
the first three days of May. The increased volume of work was not caused by an increase in business volumes but by a
change to the structure of the application's data base.

Impact: Service levels were not missed during those three days. However, peak resource levels were 20%
higher than planned for those three days.
Action initiated: The xxx performance team quickly addressed the problem with the application developers and brought the
increased volume of work down to the planned level. No further action required.

...
Hardware Forecast Trends

A planned upgrade of DASD in support of location mm will be implemented in July, 2006 to meet a planned change to
Application WW.
...
location xx Exceptions during Second Quarter, 2006

All location xx applications performed within 10% of their planned usage of system resources during the quarter. The ABC
application aberration during the first three days of May did not affect service levels and had an explainable cause.

...

location xx Forecast Trends

No significant trends have been noted for any location xx application. The current forecast for Third Quarter, 2006 is
unchanged.
...

xx Application Exceptions during Second Quarter, 2006

The migration of location nn to the xx Application yyyy required 20% more resources during the peak hour than planned.

Figure 4.4.1. Sample of a High-level Type II Capacity Status Report for xxx senior management.


Section 5 -- Developing and Maintaining Procedures, Tools, Techniques, and Standards

In order to prepare for capacity planning, there needs to be a set of standards and
procedures in place. This section begins with some basic principles and guidelines for
the xxx capacity management process and builds upon that to cover the xxx-specific
details needed to implement the process.

Although the establishment of a process begins with documenting the repeatable
activities, identifying customers and suppliers along with their inputs and outputs,
and identifying value-add activities for continuous improvement, implementation requires
a set of foundational standards and procedures upon which to build. There is a lot more
involved in implementing a process than just outlining what needs to be done. The
capacity planner needs to assemble a tool kit of standards, definitions, techniques, and
concepts to execute his job. This section covers the basics and provides a high-level
customization guide for developing that foundation:

5.1 Data Management
5.2 Terminology and Metrics
5.3 xxx IT Resource Model
5.4 Workloads and Traffic Types
5.5 Network Connection Types
5.6 Network Tools and Metrics

Standards are necessary for an effective capacity management process. They provide a
common definition and description of what is to be collected, when it is to be collected,
how it is to be collected and how it is to be reported. Their use simplifies communications
and facilitates a common understanding of the systems environment. Standards for
capacity planning fall into three general categories: (1) data collection, (2) management
reporting, and (3) performance thresholds. While other areas may need standards,
these three are the most important for the capacity management process.

Formalized procedures and techniques are needed to facilitate the execution of
capacity planning activities. Although not all techniques can be anticipated and
documented in a single methodology, the subsections in Sections 5 and 6 focus on
techniques which have been customized to the xxx environment. A goal of formalizing
procedures is to automate as many of the repetitive activities as possible so that the
capacity planner can be freed up to focus on more challenging analysis, forecasting,
modeling, and communication activities. Automation has become a critical success
factor in many organizations to effectively and efficiently manage the capacity of their
Information Systems resources. To prepare for automation, one must first:

1. Review the existing procedures
2. Identify the time-consuming, repetitive procedures that need to be performed by the capacity
planner
3. Define automation criteria (e.g., frequently executed system tasks with very little variability)
and objectives (e.g., improve productivity)
4. Select automation candidates
5. Formalize all procedures
6. Automate those of immediate benefit.
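Step 4, selecting automation candidates, can be approximated by ranking tasks on the labor they consume while filtering out highly variable ones (which resist automation). A sketch; the task names, frequencies, and variability scores are hypothetical:

```python
# Sketch of step 4: score automation candidates by the labor they
# consume (runs per month x minutes per run) and keep only tasks with
# low run-to-run variability. All task names and figures are hypothetical.
tasks = [
    # (name, runs per month, minutes per run, variability 0..1)
    ("dump SMF monitor data", 30, 20, 0.05),
    ("build trend charts", 4, 90, 0.10),
    ("ad hoc modeling study", 1, 240, 0.90),
]

candidates = sorted(
    (t for t in tasks if t[3] <= 0.25),        # low variability only
    key=lambda t: t[1] * t[2], reverse=True)   # most labor first
```

The highest-scoring surviving tasks are the candidates "of immediate benefit" in step 6.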

Large benefits can be gained by automating labor-intensive data collection, extraction,
and summarization activities. Most organizations already have automated procedures to
dump monitor data and build the capacity and performance database for system and
network data. If monitoring tools are used that do not write SMF records, their outputs
should also be automated. One future plan for xxx is to consolidate all system resource
data into the MICS database.

Additional benefits can be gained by standardizing and automating the production and
distribution of periodic capacity-related reports, especially Capacity Status and historical
trends reports. Formalized procedures will also allow a less skilled individual to perform
tasks that cannot be automated.

This section addresses network tools needed to facilitate the process. Contrary to
standard practice in most firms, it should be noted that tool selection criteria for
capacity planners are defined by the needs of the business as reflected in the capacity
planning process. This is a different approach than what occurs by default in most
organizations. In implementing a new capacity management process, it quickly becomes
evident that a small set of tools can do all that is needed. Standards are more easily
maintained, productivity is improved, and transferable expertise is built up on a smaller
tool set within the capacity planning team.

Key point: Process-driven tool selection will lead to a common set of fewer tools, greater
breadth of understanding of tool capabilities among capacity planners, and improved
productivity.


Section 5.1 -- Data Management

The complexity of the xxx environment necessitates an organized approach to the
collection of system and network data and the development of documentation standards
in order to maintain control over the potentially large volumes of data needed to support
the capacity planning process. This subsection covers this topic by exploring five areas
which are crucial to the successful implementation of a Capacity Management data
management plan:

5.1.1 Overview of Data Management Activities
5.1.2 Classifying the Capacity Planner's Data
5.1.3 An Approach to Capacity Data Management
5.1.4 Techniques for Managing the Data
5.1.5 Capacity Data Base


Section 5.1.1 -- Overview of Data Management Activities

At times, more than half of the capacity planning effort is spent collecting and managing
data. While some of the data gathering process is automated, a lot of the most important
information (e.g., business metrics, application plans) has to be collected through
research or personal interviews. The absence of a data management plan for capacity
planning is one of the major causes of inefficiency and ineffectiveness in the capacity
management process.

This topic outlines the five key activities which compose the data management
subprocess. Each activity is critical to the success of the rest of the capacity
management process:

1. Collecting data
2. Storing data
3. Archiving and accessing data
4. Analyzing data
5. Reporting data

Collecting Data

The collecting data activity focuses on ensuring that the right data is collected in a timely
and efficient manner. This is accomplished by identifying the capacity data to be
collected, creating a data source matrix to locate the data's source, and identifying the
format and the means by which the data is collected.

The capacity planner is responsible for defining his data collection requirements to the
data collection specialist (or whoever does the data collection):

1. What data/metrics are to be collected?
2. When is it to be collected?
3. In what format is the data to be stored?
4. How is the data to be stored?
5. How critical is the data?
6. How long is the data required to be retained?
7. How is data validation to be done?

The data collection specialist is responsible for putting procedures in place to meet these
requirements. Section 5.1.4 provides sample reports that can be used to help summarize
some of the requirements.
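One way to capture these requirements in a form the data collection specialist can act on is one structured record per metric. A sketch; the field names and example values are assumptions for illustration, not xxx standards:

```python
# Sketch: one structured record per metric, answering the planner's
# data collection questions. Field names and example values are
# assumptions, not xxx standards.
from dataclasses import dataclass

@dataclass
class CollectionRequirement:
    metric: str          # what is to be collected
    schedule: str        # when it is to be collected
    fmt: str             # format in which it is to be stored
    store: str           # how/where it is to be stored
    critical: bool       # how critical the data is
    retention_days: int  # how long it must be retained
    validation: str      # how validation is to be done

req = CollectionRequirement(
    metric="CPU busy by LPAR", schedule="every 15 minutes",
    fmt="SMF type 70", store="MICS capacity data base",
    critical=True, retention_days=730,
    validation="reject intervals with missing records")
```

A list of such records is, in effect, the data source matrix mentioned above.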

Storing Data

The data collection specialist is responsible for the storing data activity. As a part of this
activity, the specialist defines:


1. the place where the collected data will reside
2. the format of the data
3. the method for extracting and summarizing the data from the original source
and storing it into the MICS data base
4. what the metric means (e.g., "messages per second" needs to be broken
down into what constitutes a message for the tool that measured and
reported the data)
5. the method for summarizing the data and storing it into a MICS data base
6. the procedures for maintaining a data inventory (including a data dictionary
type function for identifying where the data is used).

Archiving and Accessing Data

The main purpose of the archiving and accessing data activity is to ensure that the
required capacity data is available when needed by the capacity planner. This includes
determining how long the data needs to be retained and how it can easily be retrieved.
The data collection specialist's related responsibilities for this activity are:

1. documenting the procedures and tools for accessing the data to be used by
the capacity planner for analysis, forecasting, and reporting
2. documenting and executing tools for archiving the data (this includes the
identification of retention periods and levels of aggregation)
3. defining and implementing methods to retrieve data from archived files
4. maintaining the security of the data.
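A retention policy of the kind described here, full detail for recent data, aggregation for older data, and archiving beyond that, can be expressed as a simple rule. The 45-day and 400-day thresholds below are illustrative assumptions:

```python
# Sketch: a retention rule keeping full detail for recent data,
# daily summaries for older data, and archiving the rest. The
# 45-day and 400-day thresholds are illustrative assumptions.
def retention_action(age_days, detail_days=45, summary_days=400):
    if age_days <= detail_days:
        return "keep detail"
    if age_days <= summary_days:
        return "aggregate to daily summary"
    return "move to archive"
```

Encoding the policy this way makes the retention periods and aggregation levels explicit and easy to document alongside the archiving tools.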

Analyzing Data

The analyzing data activity within the capacity data management function does not address the content of the data; rather, it focuses on making sure the data conforms to the needs of the capacity management process. In other words, it seeks to validate the data and provide feedback when the data does not conform to requirements in accuracy, format, or content.

The analysis activity looks at how well the data management process is functioning and
looks for ways to improve the process to better meet the capacity planner's data
requirement. The data management specialist focuses on answering these questions:

- is the data being collected and stored properly?
- is the data always formatted correctly?
- are data archival and security requirements being met?
- can the procedures used for data management be improved?
- is there a better capacity data base product which would improve the usability of the data and its quality?


- is there a better capacity data base product that can better meet the capacity planner's requirements at a lower cost?
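A validation pass of the kind these questions imply might look like the following sketch. The record layout and thresholds are assumptions for illustration:

```python
# Minimal validation pass over collected utilization samples, in the spirit
# of the questions above. Field names and ranges are illustrative assumptions.
def validate_samples(samples):
    """Return a list of (index, problem) pairs for records failing checks."""
    problems = []
    for i, rec in enumerate(samples):
        if "timestamp" not in rec or "cpu_pct" not in rec:
            problems.append((i, "missing required field"))
            continue
        if not isinstance(rec["cpu_pct"], (int, float)):
            problems.append((i, "non-numeric value"))
        elif not 0.0 <= rec["cpu_pct"] <= 100.0:
            problems.append((i, "value out of range"))
    return problems

samples = [
    {"timestamp": "1995-06-01T09:00", "cpu_pct": 72.5},   # valid
    {"timestamp": "1995-06-01T09:15", "cpu_pct": 135.0},  # out of range
    {"cpu_pct": 40.0},                                    # missing timestamp
]
print(validate_samples(samples))
```

The list of problems returned is exactly the feedback the data management specialist would route back to the collection step.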

Reporting Data

The reporting data activity focuses on delivering information on the status and "health" of
the data. It also reports information to interested parties on operational statistics related
to data management such as the number of errors detected during a month, number of
bytes stored and/or archived, and any issues related to the process.

The reporting data activity would respond to questions such as:

- how many data files (or data bases) do I have, and what is the content and size of that data?
- where are certain data fields used?
- can I merge data from these two sources?
- do I have any room in this data base to accommodate the new SMF records that I need to collect?


Section 5.1.2 -- Classifying the Capacity Planner's Data

The amount of data to be collected by the capacity planner for the host and network
environment can be staggering. The way to begin to make decisions about what data to
collect, which data is important and which data needs to be retained is to create a
classification scheme for the capacity planner's host and network data. This data
classification scheme is based on eight major types of data that the capacity planner
handles:

- business
- host load
- traffic load
- estimates of demand
- service
- performance and capacity usage
- operational measures
- product.

That data can be further classified as metric and non-metric data. Our focus in this subsection will be primarily on the metric data type.

Definition: A data metric is a single numeric value representing a characteristic or attribute of the business, host, or network. It can be either a directly measured value or a derived value.

Non-metric data is composed of such diverse information as interview notes, customer surveys, hardware vendor product specification sheets, and business plans. This data is usually maintained in the capacity planner's personal filing system. Numerous references to this type of data are made within various sections of this methodology.

Data Classification Scheme

Table 5.1.2.1 lists the eight data types in the first column and identifies which of the four
host and network views is the Primary (P) data source and which is the Secondary (S)
data source. This data classification scheme helps to identify and assign data ownership.


Type of Data                          Component  Location  Bus/Applic.  User

(1) Business                                                P
(2) Host Workload                     P          S         P            S
(3) Traffic Load                      P          S         S            S
(4) Estimates of Demand               P          S         P            S
(5) Service                           P          P         P            P
(6) Performance and Capacity Usage    P          S
(7) Operational                       P          P         P            P
(8) Product                           P

Table 5.1.2.1. Classification of Capacity Metrics (P = Primary data source, S = Secondary data source)

(1) Business: This type consists of data which characterizes the business activity that drives
the IT resource load at the host and in the network. Most of this data is time series data,
i.e., it is a series of data points sampled or collected at regular intervals of time.

Examples of this type of metric are: daily or hourly counts of business transactions, daily or hourly quantities of funds transferred, number of checks processed, daily count of customer requests, total number of customers at the end of a month, and net number of new customers added per month. This category quantifies business activity over time, and the data can include forecasted or estimated values for business metrics as well as historical data.

Examples of non-metric data that fits into this category are: business plans, business
driver correlation and regression studies, application plans, application design documents,
interview data, business reports on operational data (e.g., business volumes for a central
application).
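One of the non-metric items above, a business driver correlation and regression study, typically rests on a simple least-squares fit of resource load against a business volume. A minimal sketch, with illustrative figures:

```python
# Sketch of a business-driver regression: fit host CPU consumption against
# a business volume driver. All sample figures are illustrative assumptions.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Daily business transactions (thousands) vs. measured CPU seconds consumed:
volumes = [10.0, 20.0, 30.0, 40.0]
cpu_secs = [1100.0, 2100.0, 3100.0, 4100.0]
slope, intercept = fit_line(volumes, cpu_secs)
print(slope, intercept)   # 100.0 CPU seconds per thousand transactions,
                          # plus a 100.0-second fixed base load
```

With such a fit, a forecast of the business driver translates directly into a forecast of IT resource load, which is the purpose of this data category.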

(2) Host Load: This type of data characterizes the IT resource load on the major host
components -- processor, storage, DASD, and Tape.

Metrics can be either directly measured or derived and can be expressed as specific to an
application, workload, business area, location, or user group depending on the level of
granularity allowed by the measurement tool. They are also time-specific and usually
expressed as data points measured or derived on a periodic basis to form a time series.
There are some basic types of host component load measurements that fit in this
category: (1) throughput (e.g., jobs/second, transactions/second), (2) utilization (as a
percent or as an amount consumed, e.g., MIPS), (3) Activity rates (e.g., I/Os per second),
(4) workload characterization ratios (e.g., Relative I/O Content), and (5) overhead factors
(e.g., measured capture ratios, LPAR management overhead).

Examples of non-metric data within this data type are: capacity reports, performance
graphs, performance studies, "morning" reports, problem log for host performance, vendor
documentation on processor, DASD, and Tape performance.
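Two of the basic load metric types above, utilization and an overhead (capture ratio) factor, can be derived from raw interval samples as in this illustrative sketch; the figures are assumptions, not xxx measurements:

```python
# Deriving host-load metrics from a raw measurement interval (illustrative).
def utilization_pct(busy_seconds: float, interval_seconds: float) -> float:
    """Utilization expressed as a percent of the measurement interval."""
    return 100.0 * busy_seconds / interval_seconds

def capture_ratio(accounted_cpu: float, total_cpu: float) -> float:
    """Fraction of total CPU time attributed to measured workloads;
    the remainder is uncaptured (overhead) time."""
    return accounted_cpu / total_cpu

# A 15-minute (900-second) RMF-style interval with 630 busy CPU seconds,
# of which 567 seconds were attributed to specific workloads:
print(utilization_pct(630.0, 900.0))   # 70.0
print(capture_ratio(567.0, 630.0))     # 0.9
```

Both values are time-specific, so computing them per interval yields the time series this category calls for.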


(3) Traffic Load: This type is very similar to the host load metrics except that the IT
resource load is relative to network nodes and links.

The same basic types of load metrics as the host are created for the network: (1)
throughput (e.g., characters/second, messages/second, bits/second, frames/second,
cells/second, and packets/second), (2) utilization (as a percent or an amount that has
been allocated or consumed -- this could apply to access nodes or links) (3) Activity rates
(e.g., "Reads and Writes per second" for Channel Extenders, error rates, number of bytes
retransmitted, percent of messages retransmitted), (4) traffic characterization (e.g.,
percent mix of traffic types on a particular 3745, average message/cell/frame size), and
(5) overhead factors (e.g., measured average protocol overhead, measured compression
factors).

Examples of non-metric data within this data type are: network capacity reports, performance graphs, performance studies, "morning" reports, problem logs for network performance, vendor documentation on network component performance studies.

(4) Estimates of Demand: This is estimated or forecasted data for the network and the host
environments. The particular metrics to represent the demand for IT resources depend
upon the source of the estimate. Most estimates for applications, locations, or systems will
be translated into an overall estimated resource load. Some common metrics that can be
used to characterize that estimate are: characters per second, MIPS, transactions/second,
I/Os per second, messages/second, and business packages/second.

Examples of non-metric data within this data type are: business forecasts (if not kept in
the Business category); change list for planned changes in hardware or software for the
next 6-12 months.

(5) Service: This data relates to the level of service delivered to xxx customers.

All metrics identified in service level agreements are included in this category. Also, the
comparison of actual performance against internal service objectives is included in this
category. Examples of this type of metric are: number of service level exceptions per
month, response time, network transit time, host transit time, percent of reports delivered
on time, average batch job turnaround time during the prime shift, and percent of time an
application was available during a month.

Examples of non-metric data within this data type are: service level agreements, service
level compliance reports, any service reports, service delivery plans.
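As a small worked example of one service metric listed above, the percent of time an application was available during a month can be derived from outage minutes; the figures here are illustrative assumptions:

```python
# Availability percentage for a month, from recorded outage minutes
# (illustrative figures, not actual xxx service data).
def availability_pct(outage_minutes: float, days_in_month: int) -> float:
    """Percent of the month's scheduled minutes the application was up."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - outage_minutes) / total_minutes

# 216 outage minutes in a 30-day month:
print(availability_pct(216.0, 30))   # approximately 99.5
```

The same computed value is what would be compared against the availability target in the service level agreement.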

(6) Performance and Capacity Usage: This is the data related to special performance
studies, benchmarks, and intensive performance measurements.

Examples of metric data are: performance data from a particular benchmark run, program
timings, processor and router capacity ratings.


Examples of non-metric data within this data type are: reports from special studies,
vendor performance and capacity information, performance and capacity reports, capacity
plan, capacity reports.

(7) Operational: This data is a measure of the efficiency and effectiveness of host and
network operations.

Examples of metrics of this type are: number of Help Desk calls received, number of
batch jobs received during the day, number of job reruns during a shift, number of
diagnostic line traces done per day, number of tape mounts, number of print lines (or
boxes of paper) per shift.

Examples of non-metric data within this data type are: operation schedules, problem logs,
change reports, production schedules, configuration reports, topology diagrams.

(8) Product: This category contains a single metric type relevant to the capacity planner: product data, which describes the attributes (capacity among them) of a software or hardware product. Much of this data is what would be found in the Resource tables that a capacity planner keeps, except that it is stored in a capacity data base.

Some of the metrics one would expect to find here are: hardware component "feeds and speeds", rated capacities, data compression factors and algorithms, overhead factors, rules of thumb for capture ratios, protocol overheads, capacity charts for CPU and DASD, and other "overhead" items. The lack of volatility is this data's primary characteristic; there is no time series data at all in this category. The data captured is driven primarily by events such as the addition of a new hardware device, the migration to a new release of an operating system, updated rated capacity numbers for IT resources, and changes to compression algorithms.

Examples of non-metric data within this data type are: vendor documentation, vendor
product specification sheets, advertisements, articles from magazines and periodicals.

Table 5.1.2.2 provides a summary of these eight data types along with examples of which
data collection techniques are used for this data type and some examples of data that
may be expected to be collected within this category.


Type of Data                 Data Collection Tool or                 Examples of Data
                             Technique (examples)                    to be Collected

(1) Business                 - Use forms for Sections 2 and 3        - Business volumes
                             - Interviews                            - Business drivers
                             - Automate reduction of business data   - Business plans

(2) Host Load                - Use forms for Sections 2 and 3        - CPU utilization
                             - RMF/SMF data                          - I/O rates
                                                                     - Jobs/second

(3) Traffic Load             - Use forms for Sections 2 and 3        - FEP utilization
                             - NetSpy                                - Error rates
                             - NetView                               - Characters/second

(4) Estimates of Demand      - Use forms for Sections 2 and 3        - Business forecasts
                             - Interviews                            - Growth statistics
                             - User surveys                          - Transactions/second

(5) Service                  - Use forms for Sections 2 and 3        - Response time
                             - Selective sampling of response        - Error rates
                               time via PC drone tool
                             - Interviews
                             - NetSpy
                             - RMF/SMF

(6) Performance & Capacity   - NPM or NetSpy                         - FEP buffer utilization
    Usage                    - NetView                               - Message rate
                             - RMF/SMF                               - Message size

(7) Operational              - Problem logs                          - Component availability
                             - Interviews                            - Number of problems
                                                                     - Problem resolution time

(8) Product                  - Reading                               - Component "feeds & speeds"
                             - Vendor discussions/presentations      - "Rated" capacity of a component

Table 5.1.2.2. Examples of Tools and Data for each major Data category.


Section 5.1.3 -- An Approach to Capacity Data Management

While planning for data collection usually occurs on a project-by-project basis, a rigorous, systematized approach to data collection is typically not taken across a capacity planning organization. For the xxx capacity management methodology, the development of a plan or strategy for capacity data management needs to consider inclusion of these key elements:

- a vision of where data management fits within the overall capacity management process, and a clear outline of roles and responsibilities
- a classification scheme for capacity data
- a classification of levels of intensity for data collection efforts
- the source and purpose of the data to be collected
- for each data type: a description of how it is to be collected, who is to collect it, how often it is to be collected, and where the data is to be stored (see Section 5.1.2 for suggestions on how to document this)
- identification of the tools to collect the data, who owns and manages the tools, how access to the data is controlled, how it is backed up and recovered, and a retention plan (i.e., how long the data is needed at this level of granularity)
- a process for validating the data
- a feedback mechanism for ensuring valid data and that the process is functioning as designed.

While this methodology does not provide xxx with a data management plan for capacity
management, various sections within this document provide a significant amount of guidance on
how to develop the information for each of the key elements of a plan identified above:

Section 5.1.2 provides ways to structure data. This looks at the categories of data to be
collected and where they fit into an overall schema that can be handled by a capacity data base
system like MICS.

Section 2, along with its forms in Appendix C, provides an approach to collecting data at each
major milestone within the application development life cycle. A review of this section and
the forms provides the data collection specialist and the capacity planner with a comprehensive
list of data to be collected.

Section 3, along with its forms, focuses on capturing the appropriate data on the existing
host and network environment to support the capacity management reporting function and
capacity forecasting.

Section 6.2 describes the creation of a host and network baseline. This involves the collection
of additional metrics specific to building the reference point upon which the forecast can be built.
From this section, the appropriate metrics for baselining are described.


Section 5.1.4, Techniques for Managing the Data, describes some approaches to documenting
an inventory of the data collection facilities.

Many of the sections also describe precisely what data needs to be collected; however, it is
recognized that the implementation of these approaches must be practically adapted to xxx'
existing tool set and disciplines.

As can be seen from the numerous references to data collection activities, metrics, and
requirements within this methodology, this particular process is considered to be critical to the
success of the new xxx capacity management process. In fact, implementation of the many data
collection suggestions contained within this methodology is expected to be a priority within the
new process. It is a prerequisite to maturing the other activities of capacity management:
forecasting and reporting.


Section 5.1.4 -- Techniques for Managing the Data

This topic focuses on a method of formalizing the inventory of data and tools to clearly reflect what
system-generated data is produced, how it is produced, and how it gets summarized by extraction
and reporting tools. The formalization is in terms of inventory reports for three key capacity data
management functions:

1. Data collection
2. Data summarization
3. Data reporting

Examples of inventory reports are provided via Figures 5.1.4.1 through 5.1.4.3.

The Data Collection Inventory Report, Figure 5.1.4.1, is an example which shows the monitor
records and their formats as produced by the system and network data collection tools. Figure
5.1.4.1 uses the SMF record format as an example. The level of detail is up to the individual and
should support its shared use across departments. Using this example as a starting point will
provide insight as to what gets produced, when it gets produced, how, and where it ends up. The
final data repository should be the performance and capacity data base -- an extraction of the
pertinent performance and capacity information and summarization of that information.

The Data Summarization Inventory Report, Figure 5.1.4.2, is a description of the performance and capacity data base: how it is built and maintained, the frequency of extractions, and the grouping of information are among the items to be documented here. Tools such as MICS provide facilities for reporting the information that is stored. These reports should be reviewed periodically to ensure that all of the information that is needed is extracted and stored, and that any extraneous information is eliminated -- thus saving disk space.

The Data Reporting Inventory Report, Figure 5.1.4.3, documents the reports produced, how they are produced, the owner of each report (i.e., its creator), the report's recipients, and the frequency of its distribution. Additionally, this report provides a quality check by ensuring that the report specifications that support the recipients' needs are identified.


_____________________________________________________________________________

Data Collection Inventory Report

Record Format: SMF
Collection Tools and Source: IBM: MVS, RMF, DB2, CICS
Collection Frequency: Continuously
How Often Records are Created: Varies by record -- event and interval driven
Record Type and Size (bytes): VBA, 32760
Archive Requirements: 5 years off-site
Permanent Data Storage: Tape cartridges, SMFARCHx names
File Size: 600MB, 1 cartridge per day
Number of copies: 2
Online Disk Space Requirements: 458 MB/day, 340 3390 cylinders

--------------------------------------------------------------------------------

Data Description                When Written       Written By      Extracted To

SMF0, IPL                       Each IPL           MVS             MICS-Acctg
SMF6, JES Print                 Job purge          MVS             MICS-Acctg
SMF14/15, NonVSAM datasets      Dataset close      MVS             MICS-Acctg
SMF70-78, Performance           Every 15 minutes   RMF             MICS-Performance
SMF245, DASD Cache              Daily              Cache monitor   MICS-Performance
SMF28, Network                  Sample Interval    NPM             MICS-Performance
SMF132, NetSpy                  Sample Interval    NetSpy          MICS-Performance

_____________________________________________________________________________

Figure 5.1.4.1. Sample Data Collection Inventory Report


_____________________________________________________________________________

Data Summarization Inventory Report

Record Format: MICS
Summarization Tools and Source: Legent: MICS
Summarization Frequency: Daily at end-of-day
Archive Requirements: File extract tape cartridges kept 5 years off-site;
                      data details summarized after 2 weeks
Permanent Data Storage: 3390 volumes, MICS01-02
File Size: Various databases totalling 1200MB
File Names: SYS1.MICS.ACCTG, MICS-Accounting
            SYS1.MICS.CMPM, MICS-Performance & Capacity Planning
            SYS1.MICS.NW, MICS-VTAM & NCP

                          SYS1.PROCLIB    Extraction      Extraction
Data Source               Procedure       Frequency       Destination

SMF Records               XTRACTS1        Daily at EOD    MICS-Acctg
CICS Journal Records      XTRACTC3        Daily at EOD    MICS-Acctg
Netview Log Data          XTRACTN1        Daily at EOD    MICS-Acctg
DB2 Analyzer Records      XTRACTD1        Weekly          MICS-Performance
SMF245, DASD Cache        Cache monitor   Daily           MICS-Performance

Note: A record of the specific fields extracted and new variables produced is also recommended. This can easily be accomplished by highlighting the fields on a copy of the data source record descriptions and printing a list of variables and descriptions from the extraction data base.

_____________________________________________________________________________

Figure 5.1.4.2. Sample Data Summarization Inventory Report


_____________________________________________________________________________

Data Reporting Inventory Report


Reporting Tool and Source: MXG - Merrill Consultants
Report Procedures: SYS1.REPORT.PROC
Report Specifications: CP.REPORT.SPECS

Report    Report                   Report   Report           Report      Report
Specs     Description              Owner    Recipients       Procedure   Frequency

CRPT      LPAR Status              Joe      Sue, Bob, Len    CRPT22      Monthly
CTREND    CPU Util. Trends         Joe      Sue, Len         CTREND3     Quarterly
DTREND    DASD Space Trends        Karen    Sue, Charlie     DTREND2     Quarterly
SFCAST    Semi-annual Projection   Harry    Mike, Sue        SFCAST13    Semi-annually

_____________________________________________________________________________

Figure 5.1.4.3. Sample Data Reporting Inventory Report


Section 5.1.5 -- Capacity Data Base

Dealing with large volumes of data is a challenge to the capacity planners. Efficient handling of all
of the data identified within this methodology document necessitates the use of a robust capacity
data base tool. Currently, xxx uses a well-established capacity data base tool called MICS. It was
designed for capacity planners and performance managers and greatly reduces the manual effort
that could be involved in handling all of the data.

The MICS product supports each of the five data management activities identified in Section
5.1.1: collecting data, storing data, archiving and accessing data, analyzing data, and reporting
data. However, in the implementation of the capacity data base there are at least two main areas
that need to be well understood by the capacity planner and the data collection specialist:

- single versus multiple data repositories for all xxx capacity data
- control over access to the data

The first area to be understood is that of controlling the potential proliferation of data bases. Such proliferation can occur if different organizations believe that to "use the data, you must own the data". This view is not only false, but can lead to problems with capacity data being used incorrectly.

The use of a single xxx capacity or performance data base is necessary to ensure that capacity data is controlled by and available to the capacity planners. This is done by extracting only the necessary data from raw SMF data or other monitoring tool output and structuring it into a database (i.e., MICS), organized to produce reports as required and summarized to keep disk space to a minimum. A single repository for capacity data is desired to eliminate hunting for the data and to provide a single capacity reporting interface.
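The extract-and-summarize step described here can be sketched as follows; the record layout is an assumption, standing in for raw SMF or monitor output:

```python
from collections import defaultdict

# Sketch of "extract only what is needed, then summarize": 15-minute raw
# interval records are reduced to hourly averages before loading into the
# capacity data base. The record layout is an illustrative assumption.
def summarize_hourly(raw_records):
    """Average cpu_pct per (system, hour) from 15-minute interval records."""
    buckets = defaultdict(list)
    for rec in raw_records:
        hour = rec["timestamp"][:13]          # e.g. "1995-06-01T09"
        buckets[(rec["system"], hour)].append(rec["cpu_pct"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

raw = [
    {"system": "SYSA", "timestamp": "1995-06-01T09:00", "cpu_pct": 60.0},
    {"system": "SYSA", "timestamp": "1995-06-01T09:15", "cpu_pct": 70.0},
    {"system": "SYSA", "timestamp": "1995-06-01T09:30", "cpu_pct": 80.0},
    {"system": "SYSA", "timestamp": "1995-06-01T09:45", "cpu_pct": 90.0},
]
print(summarize_hourly(raw))
```

Summarizing at load time is what keeps the single repository small enough to retain history while still answering the planner's reporting needs.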

The next issue deals with control over the capacity data and its use. From the standpoint of this
methodology, it was assumed that the control and maintenance of the capacity data was the
responsibility of skilled performance or capacity personnel. However, because the same data is
of value to others, such as operations and performance analysts, the sharing of the information
may be necessary. Two approaches may be taken to deliver the necessary information to others:
(1) provide a reporting service to handle their requests and produce the reports presenting the
information they need or (2) make the database directly available to them and let them produce
their own reports.

Regardless of which approach is taken, misuse and misinterpretation of capacity data needs to be
minimized. The former approach may provide the better alternative to manage this; however, the
number of capacity data requests by other groups must be understood and monitored. One
approach to handling those requests is to consider the advantages of providing a data
consolidation and report service that complies with defined standards and procedures. This would
provide for a number of benefits to the capacity management process:


1. Ability to control the number of unique reports, and therefore the resources needed to produce
them
2. Ability to assure that the reports clearly present the information and that the data is indeed what the
recipient needs to make a decision
3. Ability to assure that the data used is valid and is not compromised by some abnormal event
during the processing period.

Caution: The exposure to the integrity of the capacity planning process is that non-capacity
planners may use the MICS data inappropriately and draw the wrong conclusions about current
capacity status and usage. The guiding rule should be that the capacity data is owned by the
capacity planners and its interpretation is the responsibility of the capacity planners.


Section 5.2 -- Terminology and Metrics

One of the strongest requirements for an effective integrated capacity management process is to
have crisp, clear terminology which is used throughout an I/S organization like xxx. This facilitates
clear communications with IT management executives, Location management, and end-users.
This subsection defines the essential terms needed for a common host and network process.
The following terms are defined:

Capacity
Utilization
Workload
Bandwidth
Throughput
Response time
Performance Thresholds
Service Level Objectives
Order point, or Acquisition point
Peak Period
Performance Spike

Appendix G, Glossary, contains a very brief high level description of these terms. Section 5.6,
Network Tools and Metrics, describes some additional terms that are more platform-specific. It
also ties together the metrics needed and what tools help to gather or create them.


Section 5.2.1 -- Definitions

Capacity

Of all capacity management terms, the most important and the most misused is capacity. In its broadest sense, the word capacity conveys the idea of some physical limit on the capability of a host resource or network component to perform work over a period of time.
For processors, a common descriptor of capacity has been MIPS; although other measures such
as ITR (Internal Throughput Rate) can be more accurate in conveying a processor's capability to
process work. Overall, the capacity of a component or system is usually expressed as the
component's or system's throughput (see definition for throughput later in this section).

A broad, operating definition of capacity used in this methodology is:

Definition: The capacity of an IT resource or component is the maximum amount of work that a
component can process within a given period of time.

However, this is not sufficient for planning purposes. Further classification of capacity into
descriptive types is useful at this point to differentiate the different uses and purposes for which
capacity is used. There are several useful ways of defining capacity and capacity-related
thresholds for host and network components:

Capacity-related Threshold Definitions:

1. Theoretical (or Engineering)
2. Effective
3. Practical

Capacity Definitions:

1. Available (or Usable)
2. Allocated (and its opposite, Unallocated)
3. Used or Planned
4. Buffer for Growth
5. Contingent (or Contingency)
6. Overhead

The relationship between each of these is illustrated in Figure 5.2.1.1.


[Figure 5.2.1.1 is a stacked diagram of throughput (e.g., MIPS, GBs, CPS) for a host or network resource. It shows how the capacity bands -- Overhead, Unallocated or Reserved, Contingent, Buffer for Growth, and Used or Planned -- stack between the Max Theoretical, Effective, and Practical capacity thresholds, with the Available (or Useable) and Allocated portions of the resource marked alongside.]

Figure 5.2.1.1. Definitions of "capacity" for a host or network resource.

Theoretical or Engineering capacity is used to describe the vendor's stated maximum capacity threshold of the component. For a line, this might be expressed as the capability to transfer data at 56k bits/second. However, this can be misleading, since 20% of that "capacity" can be consumed by non-data transmissions required by the line protocol. The concept of "available" capacity helps to describe this gap, but not all components are rated this way. For example, channel speeds on S/370 processors were typically reported as the rate at which actual data can be sent across the channel -- protocol time was excluded from the channel speed rating.

Effective capacity is a threshold that takes into consideration the service objectives for a given
resource. Usually this translates into the capacity point at which queuing begins -- this is also
typically expressed as a percent utilization of a resource. For the xxx methodology, that is the
definition that will be used. There are many cases where the effective capacity is much less than
the theoretical capacity.

Practical capacity is a threshold that takes into consideration service objectives (effective
capacity threshold), estimated downtime (component availability), and overhead. It is equal to:

Estimated Component Availability x Effective Capacity Threshold x Available Capacity

Available capacity describes the amount of theoretical capacity that is actually available for doing work. Essentially, for a host environment this would be the theoretical capacity minus the LPAR management overhead (typically around 5-7%). This is the amount of processing power available to applications to do work on that processor. For the network, the same computation applies: subtract the protocol "overhead" from the theoretical capacity to get the true available capacity of the network resource to do work.
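A worked example of the practical capacity relationship defined above, using assumed figures (98% estimated availability, queuing beginning at 70% utilization, and 95 MIPS available after LPAR overhead on a 100-MIPS processor):

```python
# Worked example of the practical-capacity formula above. All figures
# (availability, effective threshold, available MIPS) are illustrative
# assumptions, not xxx measurements.
def practical_capacity(availability: float,
                       effective_threshold: float,
                       available_capacity: float) -> float:
    """Practical capacity =
       estimated availability x effective threshold x available capacity."""
    return availability * effective_threshold * available_capacity

print(practical_capacity(0.98, 0.70, 95.0))   # about 65 MIPS
```

The result, roughly 65 MIPS out of a nominal 100, illustrates how far the planning figure can sit below the vendor's theoretical rating once availability, service objectives, and overhead are all accounted for.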


Allocated capacity is the amount of available capacity that is either physically allocated (as in the
case of the channels within the Backbone trunks) for a resource or the planned amount of
capacity needed to meet the service objectives for the system, application, or workload. For a
fully allocated resource, it is the sum of: (1) Buffer for Growth, (2) Contingent capacity, and (3) used or expected capacity to be used during the peak period (either hour, half-hour, or 15-minute period).

Unallocated capacity is the gap between what is allocated and the sum of the buffer for growth,
contingent capacity, and "used" capacity. This is usually not a meaningful number until one deals
with physical allocations. Examples of allocated resources are: (1) allocating or dedicating
bandwidth on the Backbone, (2) allocating memory for logical partitions (LPARs) on a processor
complex, and (3) allocating ports on a 3745.

Used (or planned) capacity is that amount which is actually used or planned to be used during a
production period. For a capacity planner, this used capacity has two components: (1) the
amount used (or planned) for "typical" processing and (2) the amount used (or planned) for
"peak" processing. The capacity planner determines what is "typical" and what is "peak" (e.g., is
it the peak hour of the month or the average daily 15-minute spike).

Buffer for Growth is the amount of capacity that is estimated to be required to meet future
growth requirements. This buffer is expected to have shrunk to 0 by the time the hardware
resource is "out of capacity".

Contingent (or Contingency) capacity is that amount of a resource's capacity that is "set-aside"
or "reserved" for use in emergency situations such as a component or site failure. Through
thorough planning, the amount of the contingent capacity needed to meet recovery requirements
is determined and included in the capacity plan.

Overhead capacity is that capacity used by the component and its software to service the user's
work. For example, on a partitioned processor, it would include the capacity used to manage the
logical partitions. Overhead on a host processor may also include operating system and
subsystem consumption. Network overhead would be the additional information attached to user
data to transmit it through the network plus network protocol management messages. Overhead
is not a trivial amount that can be ignored; it must either be accounted for separately in this
overhead category or distributed to the user workloads and accounted for in the used or planned
category. Either way, it should be clearly delineated and documented.

Another term used to describe a workload-specific capacity for a particular vendor's hardware
device is rated capacity. This is a figure usually provided by a vendor or service group and
represents an estimated maximum capacity based on some measurements obtained from the
vendor or independent benchmarking group. For example, in an IMS environment, a particular
vendor's processor complex may be rated at 100 MIPS or as having an Internal Throughput
Rate (ITR) of 21 IMS transactions per CPU busy second. A rated capacity conveys with it the fact
that some benchmark has been performed for a particular type of workload for a particular vendor
hardware product. This term applies equally as well to network hardware products.


Utilization

As with the previous term, utilization is a very misunderstood and misused characterization of the
usage of a network or host resource over some period of time. People always attach some value
judgment to the term when it is connected with a lower-than-expected numeric value. In this
methodology, this term has two very particular meanings:

Definition: Utilization is: (1) measured -- the percent of time that a host or network resource
was found to be busy during a particular measurement interval, or (2) derived -- the percent of
resource "capacity" that was used during a particular measurement interval.

An example of the first definition is a host processor that was measured by the operating system
to be 80% busy. A complication arises when a processor complex contains more than one
processor: is the utilization contained within the range from 0% to 100% busy, or is it the sum of
the individual processor busy percentages for the entire complex? The simplest representation is
to use the range from 0% to 100% and describe the utilization metric as being for the entire
processor complex.

An example of the second definition for utilization is a router that processes an average of 150
pps (packets per second) for an hour. In this case, the utilization needs to be derived based upon
the "capacity" of the router. If the capacity for the router is 300 pps, then the utilization is derived
to be 50%.
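The derivation in the router example is simply the measured rate divided by the stated capacity; a minimal sketch using the figures from the example above:

```python
def derived_utilization(measured_rate, rated_capacity):
    """Percent of resource capacity used during the measurement interval."""
    return 100.0 * measured_rate / rated_capacity

# Router example from the text: 150 pps measured against a 300 pps capacity.
print(derived_utilization(150, 300))  # 50.0
```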

Others have taken the definition further and differentiated several types of utilization, such as
effective utilization and adjusted utilization. These other definitions help clarify the meaning
of the term within a particular context. Generally, neither of these two terms has much use at xxx.

Workload

This term is described within Section 5.4, Workloads and Traffic Types. As a summary,

Definition: A workload is a logical grouping of processing work for the purpose of
management, measurement, or control.

Bandwidth

This term is frequently used to describe the carrying capacity of a link. It has a very technical
definition which is not normally used: the difference between the highest and lowest frequencies
on a link (expressed in MHz). Although the more frequently used definition is fine, since most
everyone knows what it means, this methodology avoids the term in order to maintain
consistency.


Throughput

Definition: Throughput is a metric which describes the rate of work or traffic that is processed
by an IT resource over a period of time.

Throughput is closely related to capacity. Usually the capacity of an IT resource is stated in terms
of throughput such as packets per second for a router. On a host processor, two terms can be
used to describe the capacity of the processor: Internal Throughput Rate (ITR) and Internal
Execution Rate (IER). While the ITR value is a throughput measure since it reflects how many
transactions or jobs can be processed per CPU busy second, the IER metric reflects the speed of
the processor (similar to the less robust construct called MIPS).

Similarly, in the network environment, packets per second is like an ITR (as long as the resource
is run at 100% of capacity) and bits per second is like an IER. Unlike the host processor
environment, there is no clear standard for stating the capacity of a network resource. So, this
methodology will not differentiate a measure of speed for a resource from a measure of work
in describing throughput for a network resource.

Some of the commonly used metrics for throughput that are used in this methodology are:

1. bits per second (bps)
2. characters per second (cps)
3. packets per second (pps)
4. frames per second
5. cells per second
6. messages per second

Throughput needs to be expressed as a rate of work per period of time. The units per second in
the previous list could easily have been expressed as hourly, daily, or monthly rates when a
resource is viewed over a long period of time.

This methodology focuses on expressing throughput in the lowest common denominator for
network resources. This means that bits per second will be used to express throughput demands
and loads wherever possible. For each of the network measures of throughput, a bits per second
value can be derived. For example, a packets per second value can be translated into bits per
second by multiplying by an average packet size (in bits).
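As a sketch of that translation, a packets-per-second figure combined with an assumed average packet size yields bits per second:

```python
def pps_to_bps(packets_per_second, avg_packet_bytes):
    """Translate a packet rate into a bit rate via an average packet size."""
    return packets_per_second * avg_packet_bytes * 8  # 8 bits per byte

# Assumed figures: 150 pps with an average packet of 512 bytes.
print(pps_to_bps(150, 512))  # 614400 bps
```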

Response Time

Definition: Response time is a transaction-oriented metric that has been either measured,
estimated, derived, or sampled and represents the total amount of time between the initiation of
a transaction by an end-user and the receipt of a response by that end-user. It is the "round-
trip" time for the transaction.

The definition sounds very simple and straightforward; however, in reality there are a lot of
different ways of measuring that "round-trip" depending upon the tool and person taking the
measurement. Whatever the delineation is, the capacity planner (and especially the service level
manager) needs to be consistent across a workload and hopefully across the entire system.

The areas to consider in measuring response time are:

1. delineation of start and end of the "round-trip"
2. whether to measure all response times or selectively sample response times
3. whether to measure real end-user response times or provide a Drone PC which
executes a script of standard representative transactions on behalf of that user
for service level agreement measurements
4. whether to look at and report on: averages, medians, 90th or 95th percentiles,
High-Average-Low, control limits, exceptions

For initiating a transaction, here are some of the choices:

1. End-user presses the ENTER key
2. First character sent
3. Last character sent
4. First character received at the host
5. Last character received at the host

For completing a transaction, there are several valid ways of marking the end of the transaction
as seen from the end-user's standpoint:

1. First character received
2. Last character received
3. First keyboard unlock after the ENTER key is depressed

The best approach, if all of the tools would support it, is:

1. measure response time from the first character sent to the last character
received
2. use a drone PC to run through a standard set of application scripts at each
user location, possibly for the top 5 key applications or workloads
3. focus on the 90th percentile of response time
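A 90th percentile can be computed from sampled round-trip times in several ways; this sketch uses the nearest-rank method on invented sample values (a drone PC script would supply real measurements):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct% of the samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Invented response-time samples in seconds.
samples = [0.8, 1.1, 0.9, 2.4, 1.0, 1.3, 0.7, 3.1, 1.2, 1.0]
print(percentile(samples, 90))  # 2.4
```

Note that the nearest-rank method is one of several percentile conventions; whichever is chosen should be applied consistently across workloads.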


Performance Thresholds

Definition: A performance threshold value is a metric which has been determined to reflect a
point at which the quality of service delivery begins to degrade.

Performance Thresholds are usually established by the performance management group based
upon service level agreements. The performance managers set these thresholds to alert
themselves to potential exposures to meeting performance-oriented service level objectives and
other performance problems before they occur. The capacity planner needs to be aware of those
thresholds since they should reflect what it means to be "out of capacity" on a particular resource.

The normal guidelines for these thresholds are widely known in the MVS performance community
but are usually tailored to fit the requirements of a particular organization. Also, the group
responsible for setting those thresholds is performance management. Guidelines for these
thresholds have already been established by xxx.

Service Level Objectives

Service level objectives are established externally in Service Level Agreements and internally
within xxx. Overall, service level objectives represent the target levels of performance and
throughput that xxx and its customers expect.

Definition: An external service level objective is a statement of the service to be delivered
from xxx to its customer. This statement can include measures of responsiveness, availability,
problem handling, throughput, and other measures of the attributes of service delivery. It is
sometimes formalized into a service level agreement.

Definition: An internal service level objective is a performance goal or threshold that is more
aggressive than the external service level and is used to ensure that the externally committed
service level agreements can be met.

Order Point, or Acquisition Point

This is the point in time at which the capacity planner has determined that an order for a new
component or IT resource needs to be placed. To determine this point, the capacity planner
needs to be aware of the ordering lead times for each key component and determine either the
date or the throughput threshold at which a new component or an upgrade to existing equipment
must be ordered so that the product will arrive before the "capacity" limit is exceeded.


Peak Period

The term peak period applies to applications, business workloads, business drivers,
components, systems, and nearly everything that is either forecasted or used in a forecast of
capacity requirements. The peak period can be as short as 15 minutes or as long as an entire
shift (and in some cases, longer). However, for most purposes, the peak period is considered to
be the peak hour within a day, week, or month, when the most work is being processed by the
component, application, or system under study.

One of the biggest sources of confusion about this term is how it is determined and what it
represents. For example, the peak hour for a particular 3745 or a processor can be any one of
the following:

1. the busiest integral hour during a 24-hour day (integral hour meaning from 10
a.m. to 11 a.m. or, for example, 1300 to 1400)
2. the busiest 60-minute period of contiguous time during the day; those 60
minutes do not have to begin on the hour

The first description is referred to as a time-constant busy hour. The second description is for
the bouncing busy hour. While each is valid, the capacity planner needs to understand and
communicate clearly which one is being measured and forecasted.
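The difference between the two can be shown with per-minute measurements; the traffic values below are synthetic, chosen so that a burst straddles a clock-hour boundary:

```python
def time_constant_busy_hour(minutes):
    """Busiest integral clock hour (minutes 0-59, 60-119, ...)."""
    return max(sum(minutes[h:h + 60]) for h in range(0, len(minutes), 60))

def bouncing_busy_hour(minutes):
    """Busiest 60 contiguous minutes, starting at any minute."""
    return max(sum(minutes[i:i + 60]) for i in range(len(minutes) - 59))

# Three hours of synthetic per-minute traffic: a 60-minute burst from
# minute 90 to minute 149 straddles the second and third clock hours.
data = [10] * 90 + [50] * 60 + [10] * 30

print(time_constant_busy_hour(data))  # 1800 -- the burst is split across two hours
print(bouncing_busy_hour(data))       # 3000 -- the full burst is captured
```

The bouncing busy hour is always greater than or equal to the time-constant busy hour, which is why the choice must be communicated clearly.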

Performance Spike

A performance spike is typically the peak 15 minutes within the peak hour, though the duration
may be as short as 1 or 5 minutes. The idea of the performance spike is that it represents a
unique performance characteristic of the data and gives the capacity planner some idea of the
burstiness of the traffic under study.


Section 5.2.2 -- Adjustments to Data

The capacity planner needs to take into account the various adjustments that the application data
package undergoes as it moves from its origination point to its destination. The forms in
Appendix C and Sections 2 and 3 refer to various "overhead" factors or adjustments that need to
be made to the basic data as seen from the application. There are three data adjustment topics
that will be considered in this subsection:

1. Adjustments to "user data" traffic demand estimates
2. ROT technique for estimating traffic overhead
3. Data compression

Adjustments to "user data" traffic demands.

Data on a network can be described as user data or system data. User data is often referred to
as a message. This message is transformed into a transmission unit or units to get it from its
source to its destination.

Definition: A Message is the application-related information that one end of a session sends
to the other end as a whole object. It is an application-defined data unit, therefore not
architected, and not always explicitly defined; for instance, in an interactive application the
block of data may be screen-formatted data of variable size. Sometimes a message is more
definitively called a block data message. Block sizes may be more important for massive
data transfer applications, such as bulk data file transfers or tape channel extender traffic. In
some cases the block size may also be the recovery unit for an application.

Definition: A Transmission Unit is a block of information actually sent between network
components. It will include additional information to identify the part of the message being sent
and architecture-specific information for transmitting the data through the network
components.

System data is everything else. System data can be one of two types: (1) data sent without user
data, such as traffic associated with managing the network, establishing sessions, or response
acknowledgments, or (2) additional information attached to the user data as a result of
segmenting the message into smaller units to facilitate transfer or information needed to transmit
the unit to its final and/or intermediate destination. For the latter type, segmentation information
will be added to the transmission unit to identify it as one part of the whole message. Likewise,
transmission information will be added by various protocols for transmission through the
network components. Thus, the actual number of characters transferred will increase beyond the
original message size.

System data is commonly considered overhead characters, or simply overhead. Overhead is a
factor of the traffic type, the application transaction subsystem, communication network
architectures, protocols, and hardware/software parameter settings. Overhead can be as much as
an additional 15-80% of the user data on a network. Thus, it must be considered when doing
capacity planning.

Key Note: The contents of data collected and reported can vary from collection tool to collection
tool. It can also differ within a tool from collection function to collection function. The metrics used
to report the size and frequency of data on a network need to be understood before applying
them. Do they include just user data, or do they include all or part of the additional characters
added to transmit the data across the network or between specific component architectures? The
key capacity metrics by tool need to be summarized in an inventory report discussed in Section
5.1.2.

The level of exactness to which overhead needs to be considered depends on the capacity
effort being performed. Details are not necessary when estimating an individual
application/workload's throughput (characters per hour) before the Design Phase of the
development life cycle. At that time it may not even be known what network components will be
used, let alone where the application will reside. Thus the need for a Rule-Of-Thumb (ROT)
technique.

ROT Technique for Estimating Traffic Overhead

For quick capacity throughput estimates, a simple method can be applied to approximate the
average overhead in a network. Instead of applying a resource-by-resource overhead factor, an
average should suffice when no measurement data is yet available. In this case, one could
simply use a ROT near the mid-point of the typically seen range, say 45%. ROTs, however,
need to be customized for xxx' network. Thus, a study to understand the various overheads
would be beneficial.

Such a study may begin with the use of a tool that can collect and report the elements comprising
the transfer of a user/application message. Some tools, such as NetSpy, provide information that
can be used to derive an estimate of protocol overhead for workloads that pass through
components it supports. See the discussion on reporting of control PIU's in the NetSpy
documentation.

A technique used by some planners to estimate the overhead for a particular component is as
follows:

1. Measure the user-data throughput (characters/hr) through the resource
2. Derive the resource utilization implied by that throughput and the resource's
theoretical capacity, i.e., throughput / theoretical capacity
3. Measure the actual utilization of the resource
4. Overhead = #3 - #2, i.e., the measured utilization not accounted for by the
user-data throughput
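A sketch of those four steps with invented measurements, assuming the throughput in step 1 counts user data only, so that the measured utilization exceeds the derived figure by the overhead:

```python
# All figures are invented for illustration.
theoretical_capacity = 40_000_000   # characters/hr the component could move in theory
user_throughput = 20_000_000        # step 1: measured user-data throughput (chars/hr)

derived_util = user_throughput / theoretical_capacity   # step 2: 0.50
measured_util = 0.65                # step 3: utilization actually measured on the component

overhead = measured_util - derived_util   # step 4: the gap attributed to overhead
print(f"{overhead:.0%}")  # 15%
```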

Design and forecasting tools must, and usually do, apply overhead to determine overall resource
capacity and response. They will usually ask for or obtain information about the maximum amount
of user data that can be transferred in one piece. This is normally defined by the application, the
communication software, or the hardware architecture. They may also account for additional
system data traffic to set up and acknowledge the transfer of user data.

Key Note: This methodology is not a cookbook that explains the multitude of considerations
involved in determining overhead -- that is left to the design tools. Its purpose is to better
estimate the user data to be transferred, which is more important. For example, assume that the
capacity planner inaccurately estimated the user data for a workload at 7 million characters per
hour at its peak and applied an accurate 50% overhead factor to yield 10.5 million characters
per hour. When the workload went into production, its measured user data was 10 million
characters per hour with an overhead of 50%, yielding 15 million characters per hour. Now
suppose a more accurate user data estimate, say 9 million characters per hour, was obtained
and a less exact overhead percentage, say 25% (off by 100%), was used to yield 11.25 million
characters per hour of capacity requirements. Which was more correct? The focus of this
methodology is to improve the accuracy of the user data estimates.
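The arithmetic in that example can be replayed directly; all figures come from the key note above:

```python
def with_overhead(user_chars_per_hr, overhead_factor):
    """Apply an overhead factor to a user-data estimate."""
    return user_chars_per_hr * (1 + overhead_factor)

actual = with_overhead(10_000_000, 0.50)          # 15,000,000 seen in production
poor_estimate = with_overhead(7_000_000, 0.50)    # 10,500,000: bad user data, exact overhead
better_estimate = with_overhead(9_000_000, 0.25)  # 11,250,000: good user data, rough overhead

# The better user-data estimate lands closer despite the worse overhead factor.
print(abs(actual - poor_estimate))    # 4500000.0
print(abs(actual - better_estimate))  # 3750000.0
```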

Data Compression

Compression of user data can be considered a tuning technique from the perspective of network
transmission time. Like the analysis for overhead, one could apply a general ROT, e.g.,
compression suppresses unnecessary characters by a range of 30-50%. One can also derive the
information to be used in the conversion tables by taking certain steps during production runs of
existing file transfer programs. By noting the size of the files (in characters) to be transferred, and
recording characters actually transferred during execution of the program, one can determine a
conversion factor to be used in estimating the effect of future workloads of the same traffic type.
This process can, and should, be repeated to determine if the effect is consistent or highly
variable. If consistent, the planner has a reliable factor to be used; if highly variable, the planner
can use a conservative estimate to ensure adequate capacity is planned for.
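As a sketch of that derivation, the conversion factor is simply one minus the ratio of characters actually transferred to the original file size; the file sizes below are invented:

```python
def compression_factor(original_chars, transferred_chars):
    """Fraction of characters eliminated by compression for one transfer."""
    return 1 - transferred_chars / original_chars

# Invented observations: (original file size, characters actually transferred).
observed = [(2_000_000, 1_300_000), (5_000_000, 3_100_000), (800_000, 520_000)]
factors = [compression_factor(orig, sent) for orig, sent in observed]
print([round(f, 2) for f in factors])  # [0.35, 0.38, 0.35]

# Consistent factors -> use the average; highly variable -> use the most
# conservative (smallest) factor so that adequate capacity is planned.
conservative = min(factors)
print(round(conservative, 2))  # 0.35
```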

xxxCP08, explained in Section 2.5, provides a place to document general overhead and
compression percentages. As explained in the key note above, the methodology is more
concerned with obtaining accurate user data estimates than with overheads. When network
design is done, overheads will be applied.


Section 5.3 -- xxx IT Resource Model

In order to conceptualize all of the IT resources that need to be forecasted and to clarify
communications with others, the capacity planner uses high-level models to organize the
planner's thoughts. This subsection presents two models for the xxx capacity planner. The first
is the Overall xxx IT Resource Model. This model portrays all of the IT resources in a way that
uniquely classifies each resource as a member of only one of three layers. Section 5.3.1 covers
this model in detail.

The second model used by the capacity planner is the component-specific IT resource model.
From the planner's standpoint, any IT resource can be viewed as having no more than three
components that need to be forecasted: (1) processor, (2) storage, and (3) channels. Each of
these three components has platform-specific metrics. Section 5.3.3, IT Resource Component
Model, describes this model for the host and network environments at xxx.

A third section, Section 5.3.2, "Typical" Location Model, describes a simple model of the network
components at a Location. This is done at a high level without any intent to minimize the
complexity of the actual configuration at a Location.

The three subsections covered here are:

5.3.1 Three-Layer Model of the IT Environment
5.3.2 "Typical" Location Model
5.3.3 IT Resource Component Model


Section 5.3.1 -- Three-layer Model of IT Environment

This topic covers the details behind the new xxx IT resource model which was introduced at a
high level in "Section 1 -- Introduction". Figure 1.5 is repeated here (as Figure 5.3.1.1) as the
starting point for our discussion.

[Figure 5.3.1.1. xxx IT Resource Model -- a three-layer diagram showing the
Business/Application layer (workstations, LANs and servers, and business applications such as
LAN-based server applications, host-based applications like ACH and IAS, and host transaction
processors like IMS and TSO), the Access layer (workstation controllers, FEPs, dial lines,
routers and bridges, channel extenders, CMC, VTAM, and NCP), and the Backbone layer
(DBMC components, CSUs, Trunks, and the Access, Backbone, and Access-Backbone links).]

Figure 5.3.1.1. xxx IT Resource Model.

Briefly, the layers can be defined as:

Business/Application layer: Generates and receives work from the business or end-user.

Access layer: Connects the user to non-local IT processing resources either through other
Access nodes or through the Backbone layer.

Backbone layer: Transports data at high speeds to a matching Backbone receiving point. It
interconnects access points.

This particular model is an obvious oversimplification of the xxx IT environment. However, it
serves as an understandable picture of that environment that an end-user can readily
comprehend. In personal communications, reducing complex ideas or concepts to the lowest
common denominator can greatly facilitate understanding.


The Business/Application layer is made up of business attachments and business
applications. Business Attachments act as surrogates in the network for the end users. They
establish business connections to the Access Layer, enabling the transfer of user data between
end points (the business attachment and business application).

Business applications are the entities with which the business attachments maintain a
relationship. They execute business processes on behalf of the end users. Business applications
and attachments communicate using the facilities of the Access Layer and the Backbone layer.

The Access layer is composed of links and networking components which transport data from
one location to another. Access layer components can provide additional services such as
protocol conversions, error recovery, and rerouting. The non-link components of the Access layer
are referred to as Access Nodes. These access nodes are the access layer components that
provide the additional connection services between the Business/Application layer and the
Backbone as well as connecting two Business/Application layers without going through the
Backbone. The key here is that the Access layer is the entry point for end-users and applications
to access non-local computing resources in the Business/Application layer.

The Backbone layer is composed of three basic components: BMs (Bandwidth Management
Controllers), CSUs (Communications Service Units), and Trunks. These components serve to
provide a high-speed digital transport mechanism to move data from one access node to another.
For the capacity planner, the Trunks are the T1, fractional T1, and T3 lines. These particular
lines are composed of channels. These channels can be thought of as chunks of bandwidth on
the BM circuits that allow data on a Backbone Link to pass transparently through the BMs. The
channels have the same data-carrying capacity as the Backbone Links with which they are
identified. It is actually the channels on the Trunks that are included in the Capacity Management
process.

The relationship between the layers and their components is illustrated in Figure 5.3.1.2. As can
be seen from the figure, some Business/Application components communicate by attaching
directly to the Backbone Layer, while others attach to the Backbone Layer through the Access
Layer facilities. The choice of attachment is generally governed by the requirement to take
advantage of an additional level of concentration provided by the Access Layer. This requirement
for a higher level of concentration results from a large number of resources at a single location
site or a large number of resources spread across multiple location sites.

We have selected certain components from the network layers to include in the xxx Integrated
Capacity Management methodology. These components have been selected due to their critical
nature and because they include resources whose capacity can be managed. The network
component types included in the process are listed below.

1 Access Links
2 Backbone Links
2a FEPs (Front-End Processors)
2b Aaa Router
2c Bbb


2d Channel Extender
3 Access-Backbone Links
4 Trunks
5 FEP (Front-End Processor)
6 Token Ring

The numbering scheme above is used to map to the charts in Figures 5.3.1.2 and 5.3.2.2
(contained in the next section, Section 5.3.2). Use of those two figures will be helpful in
understanding the following definitions:

1 Access Link: This is indicated on Figure 5.3.1.2 by an 'I' and is the link that connects the
Business workstation to an Access node. For SNA links, they are referred to as SNA
Access Links. On Figure 5.3.2.2, a '1' indicates that the link connects a PU (SNA term for
a Physical Unit) in the Business/Application layer to an IBM 3745 in the Access layer.

2 Backbone Link: This is indicated on Figure 5.3.1.2 by a 'II' and is the link (or line) that
connects the Access node to the Backbone layer (a BM). On Figure 5.3.2.2, '2a' through
'2d' provide examples of the various Access node connections to the Backbone that can
be made. An example of how to use this list is to note that for 2a, FEP refers to a
Backbone link attached to a FEP. This link is called a FEP Backbone Link. Similarly, for
2b Aaa Router refers to a Backbone link attached to an Aaa Router and is called an Aaa
Router Backbone Link.

2a FEP (Front-End Processor), represented by '5', connecting to a BM.
2b Aaa Router, represented by link '2b', connecting to a BM.
2c Bbb, represented by link '2c', connecting to a BM.
2d Channel Extender, represented by link '2d', connecting to a BM.

3 Access-Backbone Link: These are links that connect PUs (SNA Physical Units) directly to
the Backbone. This is indicated on Figure 5.3.1.2 by a 'III'. On Figure 5.3.2.2, a '3'
indicates that the PU connects directly to a BM.

4 Trunks: These links are totally contained within the Backbone layer and connect BMs.
They are indicated on Figure 5.3.1.2 by a 'IV'. On figure 5.3.2.2, a '4' indicates the trunk
between BMs. On the same figure, the encryption device is noted between BMs.

5 FEP (Front-End Processor): This component is an Access Node. It is identified in Figure
5.3.2.2 as '5', a 3745. Another FEP on that chart is the Unisys FEP.

6 Token Ring: In Figure 5.3.2.2, this is indicated by a '6', connecting a 3745 to a qqq 3745
within one of the QQQn sites. This component resides totally within the Access layer.


The next section, Section 5.3.2, provides a high-level description of the "typical" Location model,
which is used to show how the various network components fit within the three-layer model of
the xxx IT environment.

Figure 5.3.1.2. Network Layer Components and Terminology.


Section 5.3.2 -- "Typical" Location Model

From the xxx capacity management point-of-view, a simplistic model of a "typical" Location was
constructed to represent all of the network components and link types that are covered by this
methodology. Figure 5.3.2.1 diagrams this "typical" (or model) Location and shows the various
connections that can be made between the Location and a qqq site.

Figure 5.3.2.2 expands upon the prior diagram and identifies which network components and links
are covered by this methodology and ties it in with the three-layer model and terminology
described in Section 5.3.1. This diagram has included inter-Location communications through
routers within the Aaa project by having one router in one Location linked to one router in the
other Location (which is shadowed in Figure 5.3.2.2).

Figure 5.3.2.1. "Typical" network connections for a Location to a xxx center.


Figure 5.3.2.2. "Typical" Location connections tied to capacity management components in the
methodology.


Section 5.3.3 -- IT Resource Component Model

Each IT resource can be characterized by three key components: processor, storage,
and channels. For the host environment, this is pretty straightforward. However, for the
network, this decomposition needs further explanation. For example, if we look at an IBM
3745, it has all three components: processor, storage (the area of interest here is in
"buffers", the way that the 3745 looks at its memory), and channels (these are the ports
on the 3745).

In this model, links in the network are planned for indirectly. In our example, the IBM
3745 would be the major IT resource and through the planning of its ports the link
requirements become known. Thus, links are forecasted by their effects on a network
node such as a FEP, a gateway, a router, a bridge, or a BM.


Section 5.4 -- Workloads and Traffic Types

Workload is one of the most overloaded words in the capacity planner's vocabulary. Each I/S
organization has its own way of using the word, and it is often used differently, and
inconsistently, across departments. The following definition frames this word around the idea
that there is a common bond or characteristic that represents a meaningful way of viewing the
applications, or the load that they present to the host or the network.

Definition: A workload is a logical grouping of processing work for the purpose of
management, measurement, or control.

Similarly, the networking term traffic type is subject to the same level of misuse as
workload in most companies. Due to the large number of different ways that information
can be transported from one place to another, the many types of information (including
image and video) that can be transported across the same media, and the sheer
complexity and rapid rate of change in the telecommunications industry, a general
definition of traffic load is offered for use in the xxx methodology.

Definition: A traffic type is a logical grouping of network traffic which reflects the
underlying business package traffic for the purpose of management, measurement, or
control.

Another related networking term that can be very confusing is traffic load. Rather than
being a grouping of work (as in the term workload), it is a metric that represents the
amount of work (more specifically, traffic) that is on the network, i.e., a measured
amount. A related term is demand, which represents the amount of capacity needed on the
network; demand is an estimated amount. Once again, there are many views of the traffic
load that can be taken, such as a geographic view, a component view, and an application
view.

Definition: A traffic load is a metric that represents the amount of work (or traffic) that
is put onto the network or a particular part of the network.

In the parlance of the network capacity planner, the terms offered load and carried load
are sometimes used (albeit in an academic sense) in place of traffic load. The spirit of
those terms is not ignored in this methodology; however, clarity of meaning sometimes wins
out over precision. Since these terms find their corollaries in economics, the following
convention will be used:

Offered Load: the load which the end-user would place on the communications facility if
there were no resource constraints. This is simply a term for expressing end-user demand for
communication resources.
Carried Load: the load which the communications facility actually carried. This is simply a
term for expressing the supply of communication resource capacity.


Queue: the end-users who wait longer than expected for a response at their terminal device
because the offered load has exceeded the carrying capacity of the communications
resources.
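As a concrete (and entirely hypothetical) illustration of the convention above, the relationship between offered load, carried load, and the load that queues can be sketched in a few lines of Python. The figures and the simple min() model are invented for illustration only; they are not drawn from xxx data.

```python
# Hypothetical illustration of offered vs. carried load on a single facility.
# "capacity" is the usable line rate in bits/second; all figures are invented.

def offered_load(users, bits_per_user_per_sec):
    """Load end-users would place on the facility with no constraints (demand)."""
    return users * bits_per_user_per_sec

def carried_load(offered, capacity):
    """Load the facility actually carries (supply); excess demand queues."""
    return min(offered, capacity)

demand = offered_load(users=120, bits_per_user_per_sec=600)   # 72,000 bps offered
supply = carried_load(demand, capacity=56_000)                # a 56 kbps line
queued = demand - supply                                      # 16,000 bps must wait

print(demand, supply, queued)
```

When the offered load is below the line rate, the carried load equals the offered load and nothing queues; the gap between the two only opens once demand exceeds capacity.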

The use of the workload and traffic type concepts is necessary for a capacity planner
to organize his data into groups that are meaningful to a variety of audiences, e.g.,
management, technical staff supporting multiple technology platforms, business end-users,
and application developers. Therefore, the capacity planner will find that there are
multiple ways of viewing a workload, and these will need to be used to communicate
capacity usage effectively and to understand capacity issues.

Figure 5.4.1 shows the four most important views to be taken of the xxx capacity data.
These four views apply to the way both the workloads and the traffic loads need to be
viewed. This is done primarily for reporting and internal costing purposes; nonetheless,
the customers of the capacity management process are the ones who determine the process
requirements.

For the capacity planner, developing a forecast will require combining and re-grouping
workloads in order to develop a complete picture of the network or traffic load. Two
frequently combined groupings are the Location view and the Network traffic type. By
combining these two, the capacity planner can create a picture of the traffic flowing into
and out of a particular Location by traffic type. In forecasting the Access components,
this type of mapping allows the planner to match the right technology to the right traffic
load, as characterized by the traffic type, for a particular geography.

This technique of carving up the network and host environments into manageable parts is
called segmenting or sectionalizing the data. Thus a Geographic workload can be
segmented into traffic types or Business applications. This process works in the other
direction as well since a business workload can be segmented into geographic
workloads.
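A minimal sketch of this segmenting technique, with hypothetical record fields and values (Python is used purely for illustration; at xxx the equivalent re-grouping would typically be done in MICS or a spreadsheet): the same traffic records are re-grouped by Location alone, or by Location and traffic type together.

```python
# Sketch of "segmenting": the same capacity records grouped by different views.
# Record fields and values are invented for illustration.

from collections import defaultdict

records = [
    {"location": "District-A", "traffic_type": "Transaction",   "bytes": 1_200},
    {"location": "District-A", "traffic_type": "File Transfer", "bytes": 80_000},
    {"location": "District-B", "traffic_type": "Transaction",   "bytes": 900},
    {"location": "District-A", "traffic_type": "Transaction",   "bytes": 1_500},
]

def segment(records, *keys):
    """Total bytes grouped by the chosen view(s), e.g. location, traffic_type."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[k] for k in keys)] += r["bytes"]
    return dict(totals)

print(segment(records, "location"))                   # Geographic view
print(segment(records, "location", "traffic_type"))   # Geography by traffic type
```

The same function works in either direction: a business workload can be segmented into geographic pieces simply by choosing a different key order.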

At this point, the Network traffic type breakout needs a fuller description, since it is
one of the most important, but least well-defined, groupings at xxx. In Table 5.4.1, six
traffic types are described quantitatively. While the qualitative description is useful
when communicating with the application developer or the end-user, the quantitative
description is the most useful for understanding the overall load that a new workload may
contribute to the network.


[Figure 5.4.1 is a four-quadrant chart; the recoverable content of each quadrant is
summarized below.]

Users view -- The network is viewed by user group or type. This view organizes data based
upon the characteristics of a particular set of user groups. Key concerns are service
delivery and service level attainment by user group, and requests for new equipment or
location changes.

Location view -- Focus is on service delivered to the Districts and the various locations
and DIs. This view is interested in network performance over the elements they can
control; key data are traffic volumes, service level attainment, and actual vs. planned
capacity.

Business/Application view -- Performance of specific applications, such as FEDNET, IAS,
ACH, or DORPS, dominates this view. Key inputs are business forecasts and equipment
requests, feedback on application performance, and application plans and changes; the
important issues are cost and throughput.

Topology or Component view -- The network is viewed by its component parts and the
location of circuits, controllers, etc. The project view (e.g., the network
compartmentalized into the Access, DTN, and Backbone parts) is a part of this view;
project status and network resource usage data feed it.

Figure 5.4.1. Four views of the xxx network.

Network Traffic Types

Table 5.4.1 is one way of quantifying the traffic types based upon their business package
sizes. These package sizes relate to the basic information unit that is transported
during the execution of a business transaction. The descriptors Small, Medium, Large, and
Extra Large are useful when the capacity planner is discussing these with an application
developer or end-user. The qualitative characteristics of these traffic types are:

1. Transaction, or Interactive -- short transactions, e.g., data base queries.
2. File Transfer -- large bursts of data (can include sub-types of BulkData, NJE, and PC
uploads/downloads).
3. Channel Extender -- supports remotely connected tape, printers, MICR, 3270s, and
consoles.
4. Image -- large bursts of data over short time periods, including image and fax
transmissions.
5. Video -- real-time transmissions (can be compressed, freeze-frame, or non-compressed
video).
6. Voice -- low-speed voice transmissions.


Business Package Sizes

Traffic Type                    Small (S)      Medium (M)              Large (L)                Extra Large (XL)
Transaction (IMS, TSO,          Size < 30      30 < Size < 300         300 < Size < 3000        Size > 3000
  RRDF, Bond Securities, etc.)
File Transfer (BulkData,        Size < 40000   40000 < Size < 100000   100000 < Size < 500000   Size > 500000
  Upload/Download, etc.)
Channel Extender (Print,        Size < 40000   40000 < Size < 100000   100000 < Size < 500000   Size > 500000
  MICR, Tape, NJE, etc.)
Image                           Size < 10000   10000 < Size < 50000    50000 < Size < 200000    Size > 200000
Video                           N/A            N/A                     N/A                      N/A
Voice                           N/A            N/A                     N/A                      N/A

Table 5.4.1. Quantification of Traffic Types.
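Table 5.4.1 is, in effect, a lookup table and can be encoded directly. The sketch below (in Python, purely for illustration; the thresholds are copied from the table, and a strict "less than" reading of each boundary is assumed, since the table leaves the exact boundary values open) classifies a business package size for a given traffic type:

```python
# Sketch of Table 5.4.1 as a lookup: classify a business package size into
# Small/Medium/Large/Extra Large for a given traffic type. Video and Voice
# have no size classes (N/A in the table). Boundary handling is an assumption.

THRESHOLDS = {                # (S/M boundary, M/L boundary, L/XL boundary)
    "Transaction":      (30, 300, 3000),
    "File Transfer":    (40_000, 100_000, 500_000),
    "Channel Extender": (40_000, 100_000, 500_000),
    "Image":            (10_000, 50_000, 200_000),
}

def package_class(traffic_type, size):
    bounds = THRESHOLDS.get(traffic_type)
    if bounds is None:                 # Video, Voice: N/A in the table
        return "N/A"
    s, m, l = bounds
    if size < s:
        return "S"
    elif size < m:
        return "M"
    elif size < l:
        return "L"
    return "XL"

print(package_class("Transaction", 250))     # M
print(package_class("Image", 250_000))       # XL
print(package_class("Voice", 1000))          # N/A
```

Such a function would let the planner tag measured or estimated package sizes consistently when building the data base of "comparables" described later in this section.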

Beyond the business package classification, each of these traffic types can be
characterized by:

1. the type of data that is transmitted,
2. the burstiness of the traffic,
3. the speed of the traffic,
4. message sizes, and
5. the service objectives for that type.

For example, the Channel Extender traffic type is characterized by:

1. Type of data: data encapsulated by channel commands; essentially, an S/370 or ESA
channel data stream.
2. Burstiness of traffic: channel-type bursts of traffic.
3. Speed: high-speed traffic (3.5 Megabytes/second or more).
4. Message size: variable message sizes; small inbound and large outbound.
5. Service objectives: very low tolerance for turnaround delay. This is necessary to
avoid device timeouts.

Transaction Traffic Type

The Transaction, or Interactive, traffic type is usually an input message to an
application that is answered with one or two response messages from the application. This
traffic type usually has a requirement to support a relatively quick response from the
application. An example of a typical user of this traffic type is a bank teller who is
servicing a customer and needs to access account balance information, process a deposit
or withdrawal, and send the customer on his or her way.


This requirement for a quick response poses a dilemma for the capacity planner.
Maximizing the response to the workstation user means ensuring that no elements
(including someone else's data flowing over the network) cause a delay in response. For
economic reasons, the planner must make tradeoffs between minimizing line costs and
maximizing the efficiency of the response. The planner is also concerned with the error
rates in a network that supports this traffic type. Minimizing errors will help maximize
productive use of the network and help the planner achieve his objective.

File Transfer Traffic Type

The File Transfer traffic type is characterized by the requirement to move large amounts
of data across the network. There may be an associated requirement to complete the
transaction within a specified time period, another constraint that the planner must
consider. The transmission system must be robust enough to sustain a transaction that
lasts for an extended period of time. Applications that support this traffic type may
have features designed to minimize exposure to errors and maximize the productivity of
the transaction, such as the ability to resume interrupted transactions and compression,
which reduces the amount of data transmitted to complete the transaction. The planner can
use this information to improve his productivity.

Note that a requirement to support both File Transfer and Transaction traffic types
provides an additional twist to the capacity planner's dilemma. The two traffic types
place competing demands on the network. One (File Transfer) needs to maximize utilization
of the network, while the other (Transaction) requires that utilization be intelligently
throttled to maximize the user's response.

Fortunately, the capacity planner has a solution that is (relatively) easy to understand
but more difficult to implement. The planner must collect data that allows him to
understand how much capacity is required to support the File Transfer traffic type within
the defined constraints, and how much additional capacity is required to keep utilization
low enough to most economically support the response requirements of the Transaction
traffic type.

Although some link protocols include features to mediate the requirements of the file
transfer and transaction traffic types, they do not resolve the problem for the planner.
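One simplified way to reason about this solution (not the methodology's own procedure, but a common rule-of-thumb approach) is to hold total link utilization under a target that protects transaction response, and then pick the smallest standard line speed that carries both loads under that target. A Python sketch with hypothetical loads, speeds, and utilization target:

```python
# A deliberately simplified sizing sketch: keep total link utilization under a
# target (here 50%, a common rule of thumb for protecting interactive response),
# then find the smallest standard line speed that carries both the interactive
# and file-transfer loads. Speeds and loads are hypothetical.

STANDARD_SPEEDS_BPS = [56_000, 128_000, 256_000, 512_000, 1_536_000]

def required_speed(transaction_bps, file_transfer_bps, max_utilization=0.5):
    """Smallest standard speed keeping total utilization <= max_utilization."""
    total = transaction_bps + file_transfer_bps
    for speed in STANDARD_SPEEDS_BPS:
        if total / speed <= max_utilization:
            return speed
    raise ValueError("load exceeds the largest standard speed at this target")

# Hypothetical busy-hour loads:
speed = required_speed(transaction_bps=20_000, file_transfer_bps=90_000)
print(speed)   # 256_000: 110,000 / 256,000 is about 0.43 utilization
```

The utilization target itself is the planner's judgment call; a lower target buys better transaction response at the price of idle file-transfer capacity.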

Channel Extender Traffic Type

The Channel Extender traffic type has many of the same characteristics as the File
Transfer traffic type; however, implementations of channel extender devices may have side
effects, such as additional traffic to support proprietary channel extender device
protocols, or administrative traffic to support their functions.

Image Traffic Type

The Image traffic type, like File Transfer, imposes a requirement to support large
amounts of data, but generally for shorter periods of time. Furthermore, there may be
many transactions whose frequency is unpredictable. Image traffic can have some of the
same response time requirements as the Transaction traffic type, depending on the nature
of the business application. The planner must understand the nature and requirements of


the application. A thorough understanding of the requirements will enable the planner to
suggest alternatives such as:

1. dedicating the required capacity,
2. routing traffic in such a way that the network can accommodate the idiosyncratic
demands of the business application, or
3. adjusting the expectations of the users.

Video Traffic Type

The Video traffic type can be quite sensitive to delay. It also has timing sensitivity,
in that 'information' elements cannot be displaced in time from one another. Such delays
and displacements may impair the quality of transmission enough to make video useless.
Furthermore, the capacity requirements can be higher than those of any of the other
traffic types, and this large capacity requirement makes Video an important traffic type
for the planner to consider.

Voice Traffic Type

Consider the network used by the Voice traffic type. This is traffic that flows over a
voice-oriented network, which has the following characteristics:

1. The delay imposed on voice by resources in the network is minimal. This is a
requirement of the primary payload of the network, voice communications. Most users of
voice communication would find it disconcerting to encounter unexpected delays within
words, and within syllables of a word. Indeed, a conversation would be unintelligible
under circumstances where random delay is experienced.
2. It has historically supported applications that can tolerate 'relatively' high (for
data) error rates.
3. It has historically been used by data users where it is the only economically viable
alternative, either because of the low volume of traffic or the occasional nature of its
use.

Changes occurring in the marketplace are having a significant impact on the
characteristics of the network that carries this type of traffic. Inexpensive 'higher
speed' error-correcting modems, compression devices, and products that combine the two
technologies make it increasingly likely that more data, with more stringent constraints,
will be carried over these networks.

It is easy to see that a capacity planner with the correct insight into the requirements
of the traffic types can ensure that the appropriate metrics are identified and the data
collected to correctly estimate capacity requirements for new applications, understand
how much capacity current applications are using, and project future requirements.


Section 5.5 -- Network Connection Types

To understand the network facilities utilized by network traffic, two views are presented for
the capacity planner: traffic-flow and physical connection.

The traffic-flow view visualizes the major network components involved in the
transmission of traffic from/to the application. Section 2 presents a data collection and
documentation technique using form xxxCP04 to record traffic flows and the components
used. Documentation should begin in the early stages of application development, even
though only a general knowledge of how an application/workload will use the network may
be available. The information on xxxCP04 needs to be refined throughout the development
life cycle and maintained when the application is in production. Figure 5.5.1 illustrates
a traffic flow on xxxCP04 for an application that is going to run on the server#1 at the
yyy.

The physical connection view, illustrated in Figures 5.5.2 and 5.5.3, focuses on the
network connections typically needed for different types of traffic. These are generic
representations that may not be sufficient for detailed network design purposes. However,
they are helpful to the capacity planner as background when discussing network
connections and traffic flow with the application and business staff. Physical network
connections were categorized in previous xxx consulting engagements as follows:

1. Channel Extender
2. National Dial
3. Interactive
4. Work station, WAN connection
5. Work station, Token Ring connection
6. FLASH protocol
7. SNI
8. QQQ to QQQ

A useful vehicle for describing these eight network connections is a "bubble" chart
indicating each major network component that is used to establish the connection between
the end-user (client) and the application (host or server). Two "bubble" charts have been
created as examples. Figure 5.5.2 illustrates the network component connections for the
Channel Extender traffic type. Figure 5.5.3 illustrates the network connections for the
Dial-Out manager environment.

Early in the development cycle of a new application, the application developer needs some
assistance from the capacity planner in describing how the application will use the
network. Figure 5.5.1 and the corresponding xxxCP04 form in the Appendix are useful
vehicles for walking a developer through the potential paths that the application data
may take. In fact, if this has not already been done, the paths for the major
applications should be documented in a similar fashion.


Figure 5.5.1. Sample of a possible Aaa network connection.


Figure 5.5.2 is a very simple description of the Channel Extender connection type. It also
happens to be a traffic type. From this diagram, the major network component types that
would be involved in transferring data from the host to the tape or printer device could be
identified.

[Figure 5.5.2 is a diagram showing the Channel Extender path from the HOST (VTAM) through
DSU/CSU and encryption units, across the FRAS District link, to the remote TAPE and PRINT
devices.]

Figure 5.5.2. Channel Extender network connection type.

The starter set of network connection types is not meant to be exhaustive. As other more
important or complex network connections need to be described, additional types should be
added to the list. The idea was to create some sample types representative of the
majority of the types in use at xxx.


Section 5.6 -- Network Tools and Metrics

The capacity management process is heavily dependent upon tools for automating parts
of the process and performing the activities efficiently. Most of the tools used by the
capacity planner are shared among other processes such as Performance Management,
Network Design, and Problem Management. For the capacity management process,
these tools can be classified into three areas.

1. Data collection
2. Extraction, summarization, and reporting
3. Forecasting

The various functions of data management are considered to be a part of each tool
although there is a considerable degree of variation in their implementation of these
functions. For a description of data management functions necessary for capacity
management please refer to Section 5.1, Data Management.

There are two major parts contained within this subsection. The first part describes each
tool category. The second part maps capacity metrics with the appropriate tool. Parts of
the first and second sections of Section 5.6 are left to the xxx capacity planner to
complete due to the dynamic nature of the information. Section 5.1.4, Techniques for
Managing the Data, provides examples of how to organize and document this tool
information. With a comprehensive list of capacity metrics for the xxx environment, the
current inventory of tools can be mapped against the metrics to determine any gaps in the
tool inventory or in the data collection process.

5.6.1 Network Tools
5.6.2 Source Matrix for Capacity Metrics

Note: Table 5.6.1 is an important part of this section since it summarizes the primary
xxx tools which fit into the capacity management process for the network. It provides a
summary of the capacity management tools, a brief description of their capabilities, and
an identification of the role each plays in the capacity planning effort.

Note: Of all the network tools available at xxx, only those directly related to the
network capacity management process are considered here. Where redundancy exists, only
the most capacity-relevant tool is listed in Table 5.6.1. This redundancy can be seen in
that table by the listing of the Prod2 tool rather than either of the carrier-provided
tools (AT&T's and MCI's 800 number reporting tools). While the carrier-provided tools are
useful in validating vendor bills or calibrating Prod2 statistics, the Prod2 tool is
sufficient for the capacity planner since it contains all of the relevant capacity
information in a form that can be used by other data management and analysis tools.


Capacity Management Tools

Tool                       Capacity Planning Capabilities                 Role
BulkData Log (SLF)         File throughput, average time per file         1, 2
CF3745                     3745 configuration design tool                 3
IMS                        Queue-to-queue time (response time);           1
                           transaction statistics
MICS                       Handles all metrics                            2
MICS/SNA                   SNA metrics for MICS                           2
MXG                        Handles a variety of metrics                   1, 2
Network Traffic            Line and traffic analysis tool                 1, 2, 3
  Analyzer (NTA)
NetSpy or NPM              Access node and link statistics; access        1, 2
                           node utilization
NetView                    Access node and link statistics; session       1, 2
                           statistics
QSI                        Estimates tariffs based upon output from       3
                           the Time Design tool
RMF for Channel            Channel busy, translated into bytes per        1, 2
  Extenders                second
Spreadsheets (e.g.,        Statistical tool set, graphics                 2, 3
  EXCEL, 1-2-3)
VTAM                       Session statistics                             1

Role codes: 1 = Data Collection; 2 = Extraction, Summarization, & Reporting;
3 = Forecasting.

Table 5.6.1. Capacity Management Tools and their function.


Section 5.6.1 -- Network Tools

The purpose of the capacity management network tools is to facilitate the execution of the
capacity management process. In general, the effectiveness and efficiency of the tool will
be determined by the answers to these questions:

How easy is the tool to install and maintain?
Does the tool have an interface to other xxx capacity management tools that allows data
to be shared?
How robust are the tool's data management facilities? Or does it require another product
for data management?
Is the tool sufficient to achieve the desired capacity management objective without
necessitating another tool?
How comprehensive, readable, and helpful is the tool's supporting documentation?
Are there good references provided by the vendor which could be used for follow-on
dialogs on tool usage?
Does the tool fill a current gap or deficiency in the capacity management data
collection, analysis, or forecasting process?
Are the limits of the tool well documented?
What amount of training is required to be able to use the product? Or are the tool's
manuals sufficient to begin using the tool?
Does the tool support the level of granularity needed for data collection, analysis,
summarization, reporting, or forecasting?
How well defined are the tool's definitions of the metrics that it collects, analyzes,
reports, or forecasts?

Data Collection Tools

The purpose of this type of tool is to facilitate the collection of capacity-related information
for the capacity planner. For the purpose of this methodology, capacity data collection
tools either collect data on network components or application entities. This data is then
stored for future retrieval and use by the same or a different tool.

For network components, this data can be counted, sampled, derived, or some combination
of the three. This delineation of data into these three categories is useful for analysis
purposes. For most network components that have processing and storage capability, the
data is collected and stored at the component itself. For example, IBM 3745 data
collection and storage is done by NetSpy on the 3745 itself. For many routers, data is
stored in SNMP's MIB (Management Information Base) format for retrieval and processing by
programs that can accept that format.

Application entities can be address spaces (e.g., IMS, CICS, VTAM), applications or
workloads, DP transactions, or messages. For these application entities, data is


collected on a host or server through software programs like RMF. These programs store
the data in main storage and then write the data periodically to tape or disk.

From a capacity planner's perspective, one way to look at these tools is that they help
him answer questions about network resource load, traffic load, and application behavior.
Some examples of data to be collected at the network component level are:

1. hourly traffic rate;
2. hourly traffic rate for the network component by application or location;
3. hourly average message size for the network component by application or location;
4. error rate for the network component;
5. network component utilization by hour;
6. network component buffer usage by hour.

For application entities, some examples of data to be collected are:

1. hourly number of messages sent and received;
2. hourly average message size;
3. business driver volumes by hour.
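As an illustration of how such hourly metrics might be derived from raw interval counters, the following Python sketch (with invented sample data; in practice the counters would come from tools such as NetSpy, NPM, or RMF and be summarized in MICS) aggregates message and byte counts by component and hour:

```python
# Sketch of turning raw per-interval counters into hourly metrics: hourly
# message count, byte count, and average message size per network component.
# Sample records are hypothetical.

from collections import defaultdict

samples = [  # (component, hour, messages, bytes)
    ("FEP-1", 9, 1_000, 180_000),
    ("FEP-1", 9, 1_400, 250_000),
    ("FEP-1", 10,  800, 120_000),
]

def hourly_metrics(samples):
    agg = defaultdict(lambda: [0, 0])        # (component, hour) -> [msgs, bytes]
    for comp, hour, msgs, nbytes in samples:
        agg[(comp, hour)][0] += msgs
        agg[(comp, hour)][1] += nbytes
    return {
        key: {"messages": m, "bytes": b, "avg_msg_size": b / m}
        for key, (m, b) in agg.items()
    }

for key, metrics in sorted(hourly_metrics(samples).items()):
    print(key, metrics)
```

The same aggregation pattern extends to the by-application and by-location breakouts listed above simply by adding those fields to the grouping key.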

Section 5.1.4, Techniques for Managing the Data, describes a method for the capacity
planner to document the data to be collected. A sample Data Collection Inventory
Report is described and shown in Figure 5.1.4.1. This report standardizes the data to be
collected. This information could then be supplied to the data dictionary facility in MICS.

Extraction, Summarization, and Reporting Tools

The purpose of these tools is to: (1) extract pertinent capacity-related data from the data
that has been collected (this could be reading an SMF tape or a NETLOG file); (2)
summarize the extracted data into meaningful groupings or subsets for analysis or
forecasting (this is referred to as building summarization files in MICS); (3) use the
summarized data to develop and produce a report. One of the current xxx tools, MICS,
performs all of these functions and serves as the central repository for machine readable
metrics.

The extraction and summarization functions within this tool category assist the capacity
planner in reducing the total amount of data that he needs to gather and analyze. Only
the pertinent data needs to be extracted and summarized. With MICS, data is extracted by
particular keys (e.g., time) and summarized for use with analysis and forecasting tools.
This allows the capacity planner to focus on data for particular points in time,
particular locations, particular links, or particular components.

In planning for the use of the extraction and summarization tools, the capacity planner
should address the following sample set of questions. These questions don't apply just to
these tools but to the development of the overall data collection strategy for capacity
management.

How am I going to look at the component data -- by time, location, application, or all
three (i.e., what are the keys going to be)?
What data do I need to keep to determine trends and patterns of behavior?
How long should I keep this data?
What business data and system/network data do I need to collect to determine
correlations?

These tools can be documented in two types of reports that are described in Section
5.1.4, Techniques for Managing the Data. Figure 5.1.4.2, Data Summarization
Inventory Report, covers the summarization data by record format. The example shows
possible entries for MICS format data. Figure 5.1.4.3, Data Reporting Inventory
Report, shows an inventory report using the data reporting product MXG as an example.

Forecasting Tools

The purpose of these tools is to transform resource demand requirements, as represented
within the MICS data base, into estimated loads on host and network resources. Often two
types of tools get put into this category: (1) capacity requirements forecasting tools,
and (2) network configuration and design tools.

For this methodology, the capacity management forecasting tools focus on estimating the
demand for network resources. The output from those tools feeds the network design
process and the network configuration and design tools. From the network design
perspective, these design tools take the capacity demand estimates and determine the best
configuration or topology to meet the estimated network demands. This information is then
used within the capacity management process to finalize the capacity requirements and
produce the capacity plan.

In determining the requirements for a network forecasting tool, the capacity planner
needs to address the following questions:

What kind of forecasting techniques does this tool support?
Can I use capacity data in its current stored form, or do I need to translate it into a
form acceptable to the tool?
What are the limits of the tool?
Does the tool support the use of the business-driven techniques described in this
Methodology?
How do I validate the modeling tool's results?
Does the tool provide built-in statistical analysis functions to understand the data?
Does the tool provide the forecast results in a form that I can use and to the level of
detail that I need?
Can I understand and communicate to others the forecasting technique used by the
modeling tool?


Section 5.6.2 -- Source Matrix for Metrics

The earlier parts of Section 5.6 covered a variety of ways of looking at tools within the xxx
capacity management environment. This subsection takes a look at mapping traffic types
to network metrics and then mapping capacity metrics, components, and tools against the
view that is under analysis.

Table    Title
5.6.2.1  Capacity Metrics by Traffic Type
5.6.2.2  Capacity Metrics and Tools mapped against network components, applications, and
         business entities

Capacity Metrics by Traffic Type

Section 5.4, Workloads and Traffic Types, focused on defining what traffic types are and
how they can be used within the capacity management process at xxx. Table 5.6.2.1
describes what metrics need to be collected to characterize that particular traffic type.
This characterization is needed for developing a data base of traffic types for use as
"comparables" for applications under development. Early in the requirements phase of an
application, not much may be known about the application's traffic characteristics except
a qualitative description of the traffic types. Having a data base of traffic types with the
metrics identified in Table 5.6.2.1 would help to quantify some of the characteristics of
that new application.

Metrics and Tools to be Collected by Component, Application, and Business

In Section 4 and throughout this methodology, there have been references to the four key
views -- location, business/application, user, and component -- that the capacity planner
needs to address. The questions that quickly come to mind when considering those views
are: What metrics can be captured to present each view? Is the granularity available
within my tools to get to the level I need to meet my customer's requirements?

Table 5.6.2.2 presents an outline of a matrix for the capacity planner to complete. Once
again, this is a good exercise for the capacity planner to understand and document the
source of the metrics and to determine the comprehensiveness of his data collection plan
to support xxx capacity planning objectives.


Table 5.6.2.1. Traffic characterization by traffic type.


The following legend is to be used when either developing or interpreting Table 5.6.2.2:

Legend note: A letter indicates which tool can be used. An "X" indicates that collection is
possible with a tool or technique, but impractical for xxx because either xxx does not have
the tool or the technique is too costly. Sample codes are:

A NetSpy
B RMF
C RDS
D Bbb 2000
E Prod4
F VTAM
G IMS
H JES/MVS
I Prod6
J LAN Manager
X Possible, but impractical

Legend note: For the "Physical Component" column, a number may follow the tool letter
to indicate the type of component this metric can be collected for. The network
component codes are:

1 SNA Access Link
2 FEP Backbone Link
3 Aaa Router Backbone Link
4 Bbb Backbone Link
5 Channel Extender Backbone Link
6 Access-backbone Link
7 Trunk
8 FEP
9 Token Ring


Table 5.6.2.2 spans three column groups -- Network Components, Application
Entities, and Business Entities -- with the following sub-columns:

Network Components:   For Network Component; For Workstation
Application Entities: For Address Space Workload; For Application (e.g., IMS);
                      For DP Trans. (or Message)
Business Entities:    For Traffic Flow; For User Location or ID

Per the legend, the tool/component codes recorded so far fall in the "For
Network Component" (Physical Component) column:

Capacity Metric                   For Network Component
Character/Bit Rate                A/8, C/5, J/9
Characters/Bits Transferred
Characters Retransmitted
Error Rate                        A/8
Frame Size
Frames Transferred
Frame Rate                        D/4, E/4
Message Size                      C/5
Messages Transferred
Message Rate
Message Transit Time (Host)
Message Transit Time (Network)
Message Transit Time (total)
Packet Size
Packets Transferred
Packet Rate                       D/4, E/4
Transaction Size
Transactions Executed
Transaction Rate
Utilization (%)                   A/1/2/8/9, D/4, E/3

Table 5.6.2.2. Capacity metrics by tool and physical component.

Analysis and Forecasting Techniques

Several capacity planning principles and techniques useful for analyzing capacity-related
data and developing a forecast are introduced in this section:

6.1 Forecasting Overview and Terminology
6.2 Baseline Creation
6.3 Analysis Techniques
6.4 Relative I/O Content (RIOC) for CPU and I/O analysis
6.5 Cluster Analysis For Grouping Similar Applications
6.6 Business-driven Forecasting Techniques
6.7 Additional Forecasting Techniques
6.8 Application of Techniques

Section 6.1 -- Forecasting Overview and Terminology

Forecasting

Within the capacity management process, the objective of forecasting is to estimate the "amount"
of I/T resources that will be required to support the business functions. Business functions are
executed on mainframes, mini-computers, or personal computers. The term "processor" will refer
to the machines running the applications. Other equipment with processing elements that provide
only system/network functions in support of the application, such as a FEP, will be referred to as
"support processors".

Version 2.0 of this methodology is limited to XXX centralized and location-unique applications.
However, future processing environments, e.g., one supporting a cooperative processing
application, were taken into consideration to provide a methodology extensible to those
environments.

Like the processor resources, DASD resources can also reside in multiple places. For XXX
centralized or location-unique applications, the production databases reside on DASD attached
directly to the processor running the application. However, one should realize that in an
environment where the applications take the form of cooperative, distributed, or client/server
application models, the information required may be physically remote from the main or
initiating piece of application code. In this case, the patterns of reference to the remote DASD
would also need to be considered when planning the network resource requirements.

The estimation of resource requirements cannot be limited to just the main processors and
DASD. Contingency plans nurtured by the System/Network Design processes define the needs
for redundant systems for availability, backup and recovery. Backup systems could be defined as
a type of support processor and need to be considered when estimating all resource requirements
for an application. The backup/recovery plans would also define how the information will be
transferred to the backup systems for recovery readiness. XXX utilizes the network resources
heavily for this purpose. Thus, a major consideration for network resource forecasting will be the
amount of resources (and topology design) to support the transmission of "backup" information
to remote systems.

The forecasting of processor and DASD resources for centralized and location-unique
applications via a business-driven methodology is much more straight-forward than forecasting
the network resources. Facilitating the processor and DASD forecasting effort are mature data
collection, analysis, and modeling tools and techniques. Although specific resources within a
network may have the same robustness, the overall forecasting of network resources, from the
viewpoint of techniques and tools, is still in the early stages of evolution. The expected
maturation path of network capacity planning techniques and tools will be one which follows that
of mainframe resources.

Mainframe capacity planning began slowly around the period when the IBM S/360 and OS/MVT
operating system were popular. Even as late as 1975, tools and operating systems were still
deficient in reporting performance data, particularly in relation to business functions. Around
the late 70s and early 80s, business-driven capacity methodologies were being documented; but
without the supporting measurement and modeling tools, their usefulness was limited, and,
therefore, generally overlooked. Unlike the past when hardware monitors were attached to
processors to determine resource utilization, most operating systems now monitor system events
and do sampling to record not only overall resource usage, but also the percent of use per
application or user.

The challenge for XXX network capacity planners is to select the appropriate level of analysis
and forecasting techniques despite known limitations of monitoring and forecasting tools to
report metrics for all network components and reporting views (see Sections 4 and 6.8).
Although the need still exists to understand desired reporting requirements, the delivery of every
view may not be practical for all resources.

Alternative forecasting techniques are provided in Section 6.5. All the techniques support the
overall business-driven methodology, but with varying degrees of accuracy, or success, and
effort. As better measurements, monitoring tools, and forecasting tools become available to
support a business-driven methodology and the distributed environment, the specifics and
mechanics of the techniques can be expected to be more automated.

Resource Usage

Consumption of I/T resources is a function of the I/T business application code, the use of
support systems and utilities, the organization and placement of application data, the
system/network design, and the volume of work that needs to be performed. The term
representing the volume of work estimated on a system or resource will be "demand". Hence,
resource demands are a measure of work on a resource. The metrics that will be used to report
the demands will be expressed in some measurable unit per time period. Metrics can be
expressed in business or DP terms. For example, to the business user, the business demands on a
resource may be the number of files per hour traversing a network, or the number of business
transactions per second; but to the system and monitoring tools used to measure the work, the
demands get transformed into DP demands. Whereas demand is an estimate of work, the term
load is a measurement of work.

The actual resource load can be obtained from both software and hardware monitoring tools.
Section 5.8 provides an overview of the key metrics needed for capacity planning. Section 5.1
provides instruction on how to maintain an inventory of tools and to document the relationship of
metrics to tools. Examples of inventory formats are provided, as are some examples of how
network capacity metrics in general are related to existing XXX tools.

Collecting Capacity Data in Preparation for Forecasting

There are two considerations for planning the metrics needed to be collected: (1) the analysis and
reporting requirements, and (2) the input requirements of analysis and forecasting tools.

Section 4, Producing Capacity Planning Reports, outlines the basic elements of capacity planning
reports. Sections 2 and 3 are the sections of the methodology that contain the activities that
produce the reports. Version 2.0 of the methodology does not address the specific report formats
or contents, but does provide the methodology for developing the report standards, report
specifications, and the metrics of interest to report. The logical grouping, or segmentation, of
data to support various views is discussed in Section 5.2.

Once the types of reports are understood and the analysis and forecasting tool requirements are
known, the steps to collect the data and the packaging of the data can be defined. Analysis steps
can use a variety of tools, from simple hand calculations or spreadsheets to sophisticated
modeling tools or techniques. Outputs from these tools should not be assumed to be usable as is
by all XXX or yyy management and staff (refer again to Section 5.2). Tool inputs will generally
seek similar information, but may require it in a slightly different way. For example, one tool
may ask for messages per second, whereas another may want packets per second. Both are
measurements of demand. Even if the same term is used, its use may be different. For example,
does a message include or exclude protocol overhead characters?

Key Note: The data input requirements for tools will vary. The terminology, although similar,
may mean something entirely different. A clear understanding of the terminology and input
requirements must be established and related to the XXX terminology used throughout this
methodology.
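As a small illustration of the demand-metric conversions such tools may force (the message and packet sizes below are hypothetical, not XXX values), a message rate can be translated into a packet rate once an assumed packet payload size is fixed:

```python
import math

def packets_per_second(msg_rate, avg_message_bytes, packet_payload_bytes):
    """Convert a message rate into a packet rate, assuming each message is
    segmented into fixed-payload packets (protocol overhead excluded)."""
    packets_per_message = math.ceil(avg_message_bytes / packet_payload_bytes)
    return msg_rate * packets_per_message

# Hypothetical: 50 messages/second of 1,500-byte messages carried in
# 512-byte packet payloads -> 3 packets per message.
pkt_rate = packets_per_second(50, 1500, 512)
```

Whether the payload figure should include protocol overhead is exactly the kind of terminology question the Key Note above warns about.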

The next decision after determining what metrics need to be collected is how to collect them. This
entails not only which tool to use, but other considerations, such as the frequency of collection
and the level of detail, e.g., should transaction counts and rates be recorded by system,
application, user, etc.? These considerations are again mostly driven by the reporting and
analysis needs.

An even more important consideration deals with analysis techniques. Once data is collected,
which data should be used for capacity forecasting? Should averages of data across 24 hours be
used? How about using the peak 15 minutes? See Section 5.3 for discussion on these important
aspects of capacity planning.

Analysis

The recommended approach for analyzing and forecasting resource capacity is to start at the
highest level (least effort) for all key applications (the 20% utilizing, or expected to utilize, 80%
of the resources), then apply more detailed analysis approaches as required. Section 6.6 presents
the overall business-driven methodology forecasting techniques. Two techniques are discussed:
(1) the Business Driver technique and (2) the Business Transaction technique. Basically, the
Business Driver technique attempts to find a correlation between the business drivers of an
application/workload and resource usage. Statistical techniques are applied against historical data
(usage and business drivers) to produce a regression algorithm that can be used to estimate
future usage based on growth of business drivers. Complementary, but not necessary if the
Business Driver technique correlates sufficiently, is the decomposition of the workload from the
way the business sees the functions it performs (business transactions) into the units of work that
the operating system and monitoring tools can measure (DP transactions). This technique
requires much more knowledge of the application and entails more effort to produce a forecast.
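As a sketch of the Business Driver technique (the driver and all figures below are hypothetical, not XXX data), a least-squares regression of historical resource usage against a business driver produces a line that projects future usage from a business forecast:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x over paired history samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical history: monthly sales orders (business driver) vs. CPU hours.
orders = [1000, 1200, 1400, 1600, 1800]
cpu_hours = [52, 60, 68, 76, 84]

a, b = fit_line(orders, cpu_hours)
# If the business forecasts 2,500 orders/month, estimated usage is a + b*2500.
projected_cpu_hours = a + b * 2500
```

In practice, the goodness of fit (e.g., the correlation coefficient) must be checked before trusting such a projection; if the correlation is weak, the Business Transaction technique becomes necessary.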

Complementing the business-driven forecasting techniques, which are designed to improve the
"business model" part of the overall forecasting effort, are many other techniques for analyzing
information and forecasting resource loads that have long been used and are still applicable.
Some of these techniques are discussed in the following subsections.

Network analysis is much more complex than processor and DASD analysis. Far more resource
components are involved, and because multiple paths may be selected dynamically, the
correlation of usage of a resource further downstream to a specific workload may become
impossible. Just the number of components in a network, as is the case for XXX, begs the
question: which techniques are practical to use? Section 6.6 will introduce how network
forecasting can be tied to the business. Section 6.8 will discuss and position the various analysis
and forecasting techniques relative to the XXX environment.

Network design plays a much more important role in providing the final conclusions
and recommendations for future resource needs than it does in a mainframe environment. Network
design is a process in itself that supports all the service-related processes: capacity, performance,
and availability. Section 1 and the Process Manager's Guide touch on the interfaces between the
capacity management process and system/network design processes. For completeness, however,
Section 6.8 introduces the Topology View to address the overall objectives of the Network
Design process: to manage the current configuration and provide alternatives for future
configurations. The Capacity Management process provides just one input to the design effort --
capacity growth requirements. However, other information collected for capacity planning
purposes would also be useful to the Design processes, e.g., service and operational
considerations as documented on form XXXCP02.

Often, the complexity of alternative paths for performance and contingency requirements can
only be deduced after running network design models. The same applies to the configuration of
elements within a single network component, such as a 3745, where adapters, buffers, and many
other elements must be determined. Tools for forecasting the capacity of specific components,
such as a 3745 and its links, exist today and are used by XXX. These require specific inputs,
such as number of messages in/out and message sizes in/out. The objective of a business-driven
methodology is to improve the inputs, as well as to quantify them in terms of business needs as
much as possible.

However, individual component analysis is just a small portion of network planning. Designing
the best overall topology in support of all the workloads, which must include both demands and
contingency considerations, is still a difficult and arduous task. To date, simulation methods have
provided the means of modeling various topologies against workload demands. XXX has used
IBM SNAPSHOT services for this purpose in the past. However, more efficient and cost-
effective methods are needed to provide these same results.

Several vendors have introduced analytical modeling tools to address some of the network
components. Usually the models are based on well-defined segments of a topology utilizing
specific protocols or network processing where metrics exist from tools today. For example, in
an SNA environment, tools like NetSpy capture many of the metrics needed for capacity
planning for the Access nodes in the network. However, few analytical models exist that can
model an entire enterprise, taking into consideration the variety of components, the alternate
paths, and the various protocols used.

The techniques discussed in Section 6.6 focus on how to obtain the most reliable inputs to
forecasting analysis tools. Since this methodology isn't written for a specific tool, the techniques
will produce the metrics most commonly needed by any tool. Transformations of the metrics
may be necessary for a specific tool. It is also possible, as XXX realized when the prod1
modeling tool was selected, that a tool will provide additional input activities depending on the
purpose of the tool. For example, prod1 is a specialized tool for estimating the performance of an
application even before code is available for testing. Thus, it provides steps for generating the
inputs discussed in this methodology, i.e., logical I/Os. Similar specifics can be assumed for tools
selected for network modeling. However, the specifics are not expected to change the
methodology of relating business needs to I/T resource demands. Modeling tools usually begin
with the demands.
It should be noted, however, that some tools are now providing a means of capturing business
driver values and statistical techniques, such as regression analysis, to facilitate the correlation of
usage back to business functions. These should be explored and utilized.

Forecasting Approach

An accurate and timely forecast of host and network resources that will meet customer service
and business requirements is the goal of the XXX capacity management process. It would be
nice if there was a single forecasting approach to achieve this objective -- but there isn't. The
approach used in this methodology document is a practical one which focuses on selecting the
right combination of forecasting approaches that will improve the accuracy and timeliness of the
resource estimates in the most efficient manner possible.

From a capacity planner's point of view, there are really two stages to forecasting. The first stage
is the forecasting of the growth in IT resource demand. The second stage is the translation of
that growth into a forecast of resource load on a target host and network configuration. This
second stage takes the predicted increase in the demand for computer resources, evaluates the
load on the current configuration, and determines or designs a configuration that will support the
new demand and meet all service objectives.

Note: For a host capacity planner, both stages are done within the capacity management process.
For the network, the first stage is handled by the network capacity planner and the second stage
is handled by the network design process.

All forecasts of IT resource demand and IT resource loads are based upon a model. This model
is a representation of the environment to be forecasted. It can be a very simple model or a very
complex model; it can be a set of graphs representing the behavior of the system or it can be a
very extensive simulation model which can represent and simulate all of the behavior of the
system. Whatever form the model takes, it is the embodiment of the capacity planner's
assumptions about how the business, users, applications, workloads, and IT resources interact
and affect each other.

Most modeling packages provide only half of the model needed for the capacity planner since
they exclude a business model and assume that the capacity planner can somehow provide
quantifiable, accurate application growth and change information to the system model. This
means that there is no direct tie-in between the system model and the business model. This
methodology provides the capacity planner with the structure to develop a rigorous, quantifiable
business model for the capacity forecasting effort.

Note: This methodology provides the capacity planner with the structure to develop a more
rigorous business model aligned with the IT model and to base IT resource forecasts on business
forecasts.

Estimating traffic demands and forecasting resource loads is not an easy task for the capacity
planner. The three major sources of growth in IT resource demand are: growth of existing
workloads (Existing), introduction of new applications (New), and changes to the application
environment (Changes). All of these factors can be forecasted and need to be considered in the
development of any IT resource forecast or capacity plan.

Growth of existing workloads describes the increase in work to be done by the network and host
due to increases in the demand for "existing" applications. The factors causing an increase in
demand can be: (1) an increase in business activity (e.g., more sales orders received by a
business means that more sales order transactions on the host and network will occur); (2) an
increase in the number of users of the system (e.g., the number of e-mail users is doubled
causing a doubling of the number of e-mail transactions in the system).

Introduction of new applications causes new work to be added to the host and network
environment. This work is typically planned from a timing standpoint and sufficient
performance information can be gathered prior to its roll-out to estimate the capacity impact.

Changes to the application environment involve: (1) changes to the environmental software or
hardware (e.g., changes in operating system or subsystem software levels, changes in hardware),
or (2) application maintenance changes like increasing the size of a record. These changes are
also commonly referred to as Environmental Changes.

These three sources of growth are represented in Figure 6.1.1, where it can be observed that at t0,
a baseline of resource usage is established and exists until t1. Usage based upon projections for
the three areas of growth is then added to this baseline. Whether the baseline should be a
constant or a revised baseline adjusted for cyclical and seasonal behavior is beyond the scope of
this methodology. However, it is usually wiser to keep things as simple and as straightforward
as possible when representing a baseline over a period of time.
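The composition described above can be sketched as simple arithmetic (all figures are hypothetical): the forecast for a period is the measured baseline, grown for existing workloads, adjusted for environmental changes, plus the load of new applications.

```python
def forecast_load(baseline, existing_growth_pct, new_app_load, env_change_pct):
    """Compose a resource forecast from a measured baseline plus the three
    growth sources: existing-workload growth, new applications, and
    environmental changes."""
    grown_existing = baseline * (1 + existing_growth_pct / 100)
    # Environmental changes (OS upgrades, record-size changes) are modeled
    # here, as one simple assumption, as a percentage adjustment on the
    # grown existing workload.
    adjusted = grown_existing * (1 + env_change_pct / 100)
    return adjusted + new_app_load

# Hypothetical: 100 MIPS baseline, 10% business growth, a new application
# expected to add 15 MIPS, and a 5% overhead increase from an OS upgrade.
total = forecast_load(100.0, 10.0, 15.0, 5.0)
```

How the three components interact (e.g., whether environmental changes also affect the new application) is a modeling choice the capacity planner must make explicitly.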

Section 6.8 will pull together the various techniques introduced throughout Section 6 and help
guide the capacity planner in preparing the individual application forecasts and overall forecasts
in support of the Capacity Management methodology.

Figure 6.1.1. Composition of a growth forecast.

Section 6.2 -- Baseline Creation

One of the most pivotal steps in establishing a capacity management process is the creation of a
baseline. The importance of the baseline is that it is the foundation for any host or network
model being used for forecasting. This is seen in Figure 6.1.1 (from the previous section)
wherein the baseline usage is shown as the base upon which all of the estimates of growth are
built.

Definition: A baseline is the characterization of the typical behavior of workloads and traffic
over a representative period.

The objective of creating a baseline is to establish a time-specific set of workload and traffic data
which can be used as the base set of measurements upon which a forecast can be built. Whether
the forecasts are related to an application view, a location view, or a component view, the
ultimate goal of the forecast is to estimate the load on host and network components based upon
resource demand so that the right configuration of components can be used in the future to meet
service delivery objectives.

Since the role of the baseline is so critical to achieving valid forecasts, the capacity planner needs
to focus on addressing several questions related to a particular baseline study prior to initiating
that study. These questions cover each area identified in the baseline definition.

What is the content and scope of the baseline?
What is the representative period needed for the baseline?
What metrics do I use to characterize the baseline?

An approach to answering these questions is provided over the next few pages. However, these
approaches cannot cover all possible cases, and it is left to the capacity planner to use his
intuition and experience to decide how to apply them.

How to Determine the Content and Scope of a Baseline

The first item to address by the capacity planner is: what is the scope of the baseline? Section
6.8.1, XXX Data Collection, Analysis, and Forecasting Views, describes three views, any one of
which could be considered as the basis for a baseline study: application view (Section 6.8.1.1),
location view (Section 6.8.1.2), or the component view (Section 6.8.1.3).

If the application view is chosen, then the next steps will depend upon whether the application or
workload is in production or some pre-production phase of development. In the case of a new
application not in production, Form XXXCP02 in Appendix C provides the application
information on the daily work flow patterns and other known peak processing or traffic periods.
This information is used to determine a possible representative period for the baseline. For an
existing production application, Form XXXCP08 provides actual data on the hourly network
traffic by location and component. Obviously, prior to the study, the capacity planner
maintaining XXXCP08 needs to determine whether the data for that form represents a peak day
or an average day or some composite day since that will directly affect the additional work that
may have to be done while conducting the baseline study.

If a location view is chosen, then the capacity planner needs to determine which network and
host components should be included in the baseline for a particular location. For a
comprehensive location view, all key host and network resources should be included. For this
effort, Form XXXCP17 can be used to summarize the measured load for each hour of the day by
network component. Similarly, a chart of utilizations, MIPS, or other host load metric could be
used in place of characters on XXXCP17 and host component substituted for network
component.

Additional work for a location view study is ongoing and follows the guidelines in Section
6.8.1.2 for this type of study. Once again, the next step is to determine what the representative
period of time is to be.

If a component view is chosen, then the capacity planner needs to determine which components
are to be included in the baseline. For the network, XXXCP17 can be used as with the location
view. The study can follow the guidelines in Section 6.8.1.3 for the component view.

How to Determine a Representative Period for the Baseline

Once the content and scope of the baseline is identified, then a baseline period needs to be
selected. The selection of a baseline period needs to consider what the time frame of the
forecasted environment will be (is this forecast for an environment 6 months or 2 years away?)
and what will be forecasted (is the capacity planner forecasting the load expected during the peak
hour of the peak day of the peak month within a quarter or is the forecast oriented towards
forecasting the daily average load without regard to handling peak hour demands?). These
questions reflect service level considerations. Forecasting on average daily load usually results
in some periods of poor responsiveness during the day; however, if that is acceptable to the
customer, then it could be acceptable to the capacity planner.

Another key question to ask during the planning stages is: should the forecast of the load on a
particular set of network or host resources focus on the concurrent workload peaks or the highest
hourly resource load during a representative day?

The definition of a baseline specifies the time period to be a "representative period". There
are two aspects to a representative period. The first aspect is that it is a specific period of
time, whether it be an hour, a full 24-hour day, or a week.

The second aspect is that it is representative of the workload or traffic that is currently being
executed. This applies to the behavioral characteristics of existing applications only; baselines
don't apply to applications under development. All of their behavioral characteristics are
estimated, with the performance characteristics of the new workload established by benchmarks,
using the data collection forms from Section 2 and the appropriate forecasting techniques from
Section 6.

For applications, a representative period of time is determined by examining the hourly behavior
of the application over a 24-hour day to determine the peak hour. From knowledge of the
business drivers as documented in the forms and the determination of seasonal and cyclical
patterns of behavior, the capacity planner can determine what is a representative peak hour or
period to use for the baseline. This characterization of the peaks for an application along with
the characterization of its hourly loads for a 24-hour "representative" day is extremely helpful
when workloads are aggregated to evaluate concurrent peaks on host and network components.
This becomes the baseline for an application.

For components, a representative time period isn't necessarily related to a particular workload or
application. Consequently, a representative peak hour for a component may occur during a time
that is different from any particular workload or traffic peak hour. The type of information
necessary to capture this component peak is shown in Form XXXCP17 of the Appendix. Once
again it involves collecting information on the total number of characters transmitted per hour
during a 24-hour period. Doing this for several days that are known from experience to be
"heavy processing" days should quickly generate a reasonable representative peak load --
whether a peak hour is used or a more granular 15-minute spike.
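Finding the component peak hour from a day of collected counts, as described above, is straightforward; a minimal sketch (the hourly figures are hypothetical):

```python
def peak_hour(hourly_load):
    """Return (hour, load) for the busiest hour in a 24-entry list of
    measured loads (e.g., characters transmitted per hour)."""
    hour = max(range(len(hourly_load)), key=lambda h: hourly_load[h])
    return hour, hourly_load[hour]

# Hypothetical 24-hour character counts (millions) for one component.
day = [2, 1, 1, 1, 2, 4, 9, 14, 22, 30, 28, 25,
       27, 31, 29, 24, 18, 12, 8, 6, 5, 4, 3, 2]
hour, load = peak_hour(day)   # busiest hour of this component's day
```

Running this over several known "heavy processing" days and comparing the results gives the representative peak described above.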

Crossing multiple time zones presents an interesting synchronization problem when data needs to
be aggregated across time zones. Some data reduction programs can automatically adjust the
data based upon the location of the component (if a reasonable naming convention is used) while
others would suggest using a common clock base like GMT.
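Normalizing to a common clock such as GMT can be sketched as follows (the offsets are illustrative; a production data reduction program would derive them from the component naming convention mentioned above):

```python
from datetime import datetime, timedelta, timezone

def to_gmt(local_ts, utc_offset_hours):
    """Attach the recording site's UTC offset, then convert to GMT/UTC so
    samples from different locations line up on a common clock."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return local_ts.replace(tzinfo=local_tz).astimezone(timezone.utc)

# Two components each record a peak at 09:00 local time, five and eight
# hours behind GMT (e.g., US Eastern and Pacific standard time).
east = to_gmt(datetime(2024, 1, 15, 9, 0), -5)    # 14:00 GMT
west = to_gmt(datetime(2024, 1, 15, 9, 0), -8)    # 17:00 GMT
gap_hours = (west - east).total_seconds() / 3600  # peaks are not concurrent
```

On the common clock the two local peaks turn out to be three hours apart, which matters when aggregating loads to look for concurrent peaks.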

How to Characterize the Baseline

Creating a baseline can generate a lot of data. However, much of that data is not needed by the
capacity planner in maintaining a data base of baselines. The majority of the data that is
collected during a baseline is validation data which confirms that the baseline data collection met
the acceptance criteria (e.g., 90th percentile of response time during the peak period remained
below the threshold, re-transmissions were less than 1% during the peak, no performance
aberrations occurred like runaway jobs, etc.).

In all cases, the basic information that the capacity planner maintains about the baseline is: (1)
time period, (2) load information (whether it is MIPS, characters per second, or jobs/second),
(3) business volume data for the business drivers, and (4) transaction or job volume data, if
relevant. Form XXXCP17 in the Appendix describes the metrics that need to be maintained for
the baseline. That is fairly straightforward as long as the baseline passed the acceptance criteria.
That form uses characters per hour per component for a 24-hour day.

One important point to keep in mind about the baselines is that the capacity planner is looking at
measured loads for the baseline data. Applications under development can only look at estimated
demands, such as in Form XXXCP08; they don't actually have a baseline, as can be seen in
Figure 6.1.1, which shows the "new application" workload being built upon the existing baseline
for an IT resource.

Producing a Baseline Report

The baseline reports are predominantly informative reports, used to characterize the
performance behavior and characteristics of the four baseline types in a standard way.
existing applications, the baseline would be specific to a point in time, such as the peak hour of a
particular day or for an entire day or week. It may also include information on the relationship of
the business drivers to business transactions and DP transactions. The characterizations are done
with measured data from the production environment. For new applications, the baseline would
represent the results of stress testing or a benchmark.

From a performance profile on a host, most of the questions answered by these reports relate to
"how much" of a particular resource such as CPU or DASD does the application workload
consume and "how often" does it do that? The primary questions answered for the host
environment are:

How many CPU milliseconds does this transaction consume?
How many CPU seconds does this batch job or set of batch jobs consume?
What is the relative I/O content (RIOC) of the application during the peak period?
What is the relative I/O content of the application during the prime shift?
How many I/Os, Data Base calls, or logical I/Os does the application perform per transaction or
batch job?
What is the behavior pattern of this application workload or system over time?
What business drivers can be used as predictors of application workload capacity requirements?

There are two types of data used in the baseline reports: behavioral and characteristic. The
behavioral data are descriptive of the behavior of an application or component over time. It
relates to the volume of work over time and the relationship between business drivers and
capacity consumed. Most of this would be portrayed in graphical form to represent the pattern of
resource usage over time. Examples of this would be a graph of CPU seconds or MIPS
consumed by hour of day or by prime shift during a week or a month. Most of the behavioral
information is best described through graphics.

The characteristic data are the particular set of metrics used to characterize this workload. This
focuses on the amount of resources consumed per transaction or job or the traffic load generated
by a workload. In the host environment it focuses on the usage characteristics of the individual
application or job.

Section 2 and its XXXCPxx forms define most of the baseline metrics and when to collect them
during the development life cycle. Existing applications and their baseline metrics are
documented in Section 3 and related XXXCPxx forms. These are summarized in subsequent
subsections of this section.

What Characteristic Data Needs to Be Documented for an Application Workload Baseline?

The characteristic data are the performance metrics that characterize a workload by its
consumption of resources or its throughput. This information is needed to determine if the mix
of transactions has changed, the workload has shifted, or if an anomaly has occurred.

From the network perspective, the necessary metrics are documented in Sections 2 and 3 and
their associated forms for this type of baseline.

On the host, an on-line transaction is characterized by these metrics:

Average CPU milliseconds per transaction
Average number of Physical I/Os to DASD per transaction
Average number of Logical I/O requests per transaction
Average number of DB calls per transaction

The host metrics for an on-line workload are:

Average number of MIPS consumed per peak hour
Average number of Physical I/Os to DASD per peak hour
Average number of Logical I/O requests per peak hour
Average number of DB calls per peak hour
Relative I/O content
Total number of volumes used during the peak hour for production
Total number of volumes available for backup
Total amount of DASD space in MBs that has been allocated

The transaction characteristic metrics for this on-line workload are used to further describe the
workload in this way: First, the workload's transactions are grouped by either business function
or by some performance-oriented grouping (such as simple, medium, or complex). Then the
relative mix of that transaction grouping is measured for a peak hour. For example, the workload
transaction mix could be composed of 30% simple, 40% medium, and 30% complex. Then, for
each of these three categories (or whatever breakout is needed) that group is characterized by the
transaction characteristic metrics.
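The mix-weighted characterization described above can be sketched as follows; all group percentages and per-transaction metrics are illustrative, not measured values:

```python
# Hypothetical per-group transaction characteristics (CPU ms and physical
# I/Os per transaction) with the measured peak-hour mix; numbers invented.
groups = {
    "simple":  {"mix": 0.30, "cpu_ms": 10.0, "phys_io": 2.0},
    "medium":  {"mix": 0.40, "cpu_ms": 25.0, "phys_io": 5.0},
    "complex": {"mix": 0.30, "cpu_ms": 80.0, "phys_io": 15.0},
}

def weighted_metric(groups, metric):
    """Mix-weighted average of a per-transaction metric across the groups."""
    return sum(g["mix"] * g[metric] for g in groups.values())

avg_cpu_ms = weighted_metric(groups, "cpu_ms")    # 0.3*10 + 0.4*25 + 0.3*80 = 37.0
avg_phys_io = weighted_metric(groups, "phys_io")  # 0.3*2 + 0.4*5 + 0.3*15 = 7.1
```

A shift in the mix (say, complex transactions growing from 30% to 40%) changes the workload-level averages even if each group's per-transaction cost stays constant, which is exactly what this breakout is meant to expose.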

For the host, a batch job or batch "transaction" is characterized by:

Average CPU busy seconds per job
Average number of Physical I/Os to DASD per job
Average number of Logical I/O requests per job
Average number of DB calls per job
Relative I/O content

For the host, a batch workload is characterized by:

Average number of jobs per peak hour
Average number of MIPS consumed per peak hour
Average number of Physical I/Os to DASD per peak hour
Average number of Logical I/O requests per peak hour
Average number of DB calls per peak hour
Relative I/O content for the peak hour
Total number of jobs for the workload during its primary production period
Total number of MIPS consumed for the workload during its primary production period
Total number of Physical I/Os to DASD during its primary production period
Total number of Logical I/O requests to DASD during its primary production period
Total number of DB calls during its primary production period
Relative I/O content for the primary production period
Total number of volumes used during the peak hour for production
Total number of volumes available for backup
Total amount of DASD space in MBs that has been allocated

The batch job characteristic metrics for this batch workload are used to further describe the
workload in this way: First, the workload's jobs are grouped by either business function or by
some performance-oriented grouping (such as simple, medium, or complex; or short-running and
long-running jobs, or "Top 20" jobs). Then the relative mix of that job grouping is measured for
a peak hour and is based upon the relative amount of CPU time. For example, the workload job
mix could be a "Top 20" mix in which the 20 largest jobs (based upon either total CPU time or
DASD accesses) form one group and the rest of the jobs form the other group. These two groups
are then characterized by the batch job characterization metrics.

Suggestion: the "top 20" jobs based upon CPU seconds and the "top 20" jobs based upon total
number of DASD accesses should be known and characterized.
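A minimal sketch of the "Top 20" grouping, assuming job accounting records carrying CPU seconds and DASD access counts; the job names and figures are invented:

```python
# Illustrative job accounting records: (job name, CPU seconds, DASD accesses).
jobs = [("PAYROLL", 420.0, 90_000), ("BILLING", 1800.0, 250_000),
        ("EXTRACT", 95.0, 400_000), ("REORG", 60.0, 12_000)]

def top_n(jobs, key_index, n=20):
    """Return the n largest jobs by the chosen accounting field
    (1 = CPU seconds, 2 = DASD accesses)."""
    return sorted(jobs, key=lambda j: j[key_index], reverse=True)[:n]

top_cpu = top_n(jobs, 1, n=2)    # BILLING, PAYROLL
top_dasd = top_n(jobs, 2, n=2)   # EXTRACT, BILLING
```

Note that the two rankings need not agree: an I/O-bound job like the hypothetical EXTRACT can lead the DASD list while barely registering on the CPU list, which is why both "Top 20" views are worth maintaining.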

What Behavioral Data Needs to Be Documented for the Baseline?

On the host, the behavioral data is time series data used to characterize the behavior of the
workload over time. It can be viewed as characteristic data applied over time. For example, this
might be the peak hour utilization of a component each day for a month.

There are two parts to the behavioral data: (1) Computer resource load, and (2) business usage
metrics. The computer resource load is the behavioral description of the workload in terms of
CPU and DASD consumption. The business usage metrics part describes the workload in terms
of its correlation to computer load, i.e., how well do certain business metrics predict the CPU and
DASD load. We will deal with the computer resource load portion first.

The way the workload was characterized for the characteristic data needs to be the same way that
the workload is characterized for the behavioral data. The behavioral data itself will be workload
related and not focused on individual transactions or jobs.

The goal of the behavioral characterization is to document what the normal pattern of behavior
looks like in terms of resources consumed for the typical hour, typical peak hour, typical day,
typical week, and typical month. Additionally, it is important to note daily, weekly, monthly,
quarterly, and annual cycles (or seasons) of behavior.

Suggestion: Rather than get overwhelmed with data, it is best to choose a representative day,
week, and month as your "baseline" measurement period and note what have been the observed
cycles in behavior for monthly, quarterly, and annual cycles.

The behavior of a workload can be baselined (also known as production benchmarking) by
graphing the amount of MIPS consumed for the workload by:

peak hour during a typical day
hourly averages for a typical day (i.e., 24 hour averages)
peak-to-average ratio (this is where "peak" means the MIPS consumed during the peak hour
within the prime shift, and "average" represents the average MIPS consumed during the prime
shift).
average week's peak hour MIPS consumed (i.e., the peak hour MIPS consumed for Monday
through Friday)
average week's daily MIPS consumed (i.e., the daily prime shift MIPS consumed for Monday
through Friday)
peak hour during a month
peak daily average MIPS consumed during the month
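Several of these behavioral metrics can be computed directly from a day of hourly MIPS readings. The sketch below assumes an 08:00-17:00 prime shift and uses invented numbers:

```python
# Illustrative 24 hourly MIPS readings for one day (index = hour of day).
hourly_mips = [5, 4, 4, 3, 3, 4, 8, 14, 22, 30, 34, 36,
               35, 33, 31, 28, 24, 18, 12, 9, 7, 6, 5, 5]

PRIME_SHIFT = range(8, 17)   # assumed 08:00-17:00 prime shift

peak_hour_mips = max(hourly_mips)                       # peak hour during the day
prime = [hourly_mips[h] for h in PRIME_SHIFT]
prime_avg = sum(prime) / len(prime)                     # average prime-shift hour
peak_to_avg = peak_hour_mips / prime_avg                # peak-to-average ratio
```

Computed over a representative day, week, and month, these three figures capture most of the behavioral shape without overwhelming the baseline data base.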

The business usage metric part of the behavioral characterization is primarily descriptive of what
the correlation is between business growth or activity and MIPS consumed or DASD allocated.

How Do I Create a Baseline for a Workload That Hasn't Gone Into Production?

For a "new" workload prior to production cutover, the characteristic data is based upon: (1)
stress testing or a benchmark, or (2) an estimate based upon "comparable" transactions. The
behavioral data can only be estimated before production cutover.

Suggestion: Use a freehand drawing on graph paper to represent the performance behavior over
time. For the daily averages and weekly measurements a relative scale can be used, since
application developers or business owners may find it easier to describe the pattern of behavior
in relative terms than in absolute terms. For example, in describing the peak hour during the
day, it is easier to say that the peak hour uses twice as many MIPS as the prime shift hourly
average than to say that the peak hour uses 40 MIPS and the average prime shift hour uses 20
MIPS.

How Do I Create a Baseline for a Workload That is Both Batch and On-line?

If an application "workload" has a distinctive batch and a distinctive on-line component, like
IAS, then it should be characterized by two sets of data -- just as if it were two workloads. That
means that you would have a set of metrics characterizing IAS batch and another set of metrics
characterizing IAS on-line -- even though you are dealing with just one business workload called
IAS.

Handling the network component of the baseline is done similarly with the Forms for Sections 2
and 3: unique types of work are characterized as just another workload type.

Section 6.3 -- Analysis Techniques

Another important task for the capacity planner is analyzing capacity data. This is done
routinely by the capacity planner for a variety of common reasons. These reasons fall into
several analysis categories:

Analysis Type              Purpose of Analysis

Data Validation            Determining the quality of the measured data and the level of error.

Variance Analysis          Explaining variations between what was estimated and what was
                           observed, as in the difference between actual capacity consumed
                           and planned capacity.

Root Cause Analysis        Determining the root cause of a problem or observed event.

Sensitivity Analysis       Determining cause and effect relationships.

Regression Analysis        Quantifying the relationship between independent and dependent
                           variables, e.g., business activity (independent variable) and IT
                           activity (dependent variable).

Behavior Characterization  A type of variance analysis focusing on determining and
                           characterizing the underlying patterns of variation observed in
                           applications, workloads, host components, and network components.

Descriptive Analysis       Determining the best way to present statistical data in order to
                           inform the recipient (e.g., bar charts, tabular reports, histograms).

The list of analysis types focuses on those types of analysis which are performed by the capacity
planner. From a performance management perspective, one may have expected to see specific
techniques that contain the word analysis like contention analysis, threshold analysis,
performance exception analysis, throughput analysis, service level compliance analysis, and the
list goes on. However, upon closer scrutiny it appears that these are not really new and different
techniques, but rather old techniques which have been adapted and improved to fit a new trigger
event.

The trigger event for these other techniques is the existence of contention at a component level,
the exceeding of a performance threshold, a performance exception occurring, a level of
throughput being achieved (or not achieved), or the compliance (or lack thereof) to a service
level. Thus, the generalization of analysis into seven types easily allows for adaptation of the
techniques to each of these other areas.

In developing analysis skills, the focus should be on establishing a foundation in each of these
areas and then applying those base skills to the XXX host and network environment.

Preparing for Analysis

In order for the capacity planner to be effective and efficient in conducting his analysis, he needs
to have in place the proper framework for analysis. As a prerequisite to effective analysis, the
capacity planner needs to have his data collection process in place which includes a robust
capacity data base. Tools, like SAS, need to be available to automate statistical data reduction.

Preparation includes implementing the recommendations on procedures and techniques in
Section 5.1, Data Management, and standardizing on terminology as described in Section 5.2,
Terminology and Metrics.

Analytical Thought Process

Each of these analysis types serves a different analytical aim. These are all of the major types
of analysis that a capacity planner uses. Each has its own set of primary tools and techniques,
but at the core of each is a thought process grounded in the following series of activities. Most
planners follow these steps intuitively:

Set the objective for the analysis
Develop hypotheses
Gather data and test hypotheses
Analyze and synthesize data
Develop conclusions
Validate results
Communicate results.

Discussion of Analysis Types

Data Validation

Data validation should be done in any of the following cases: (1) a new data collection method is
used, (2) a new data analysis technique or tool is used for the first time, or (3) the risk of
drawing wrong conclusions from the data is very high. Automated procedures in the data
collection and storage process should be in place to ensure that the data is being collected in the
prescribed manner. This can be done through a verification process which tests the data
collection process to make sure that the process is being executed as designed.

The Data management guidelines in Section 5.1 provide recommendations for establishing this
process to ensure the integrity of the data. Please refer to Section 5.1 for details on creating the
process.

Variance Analysis

Every capacity status report is based upon solid variance analysis since any exceptions to the
capacity plan need to be explained. This is a fairly straightforward process and in general,
follows these steps once a variation from plan has been detected:

Step One
Determine magnitude and significance of variation. In general, any deviation of 10% for a
major network or host component should be evaluated if it was based upon a 6-month projection.
This applies whether it is a deviation of 10% or more above or below what was planned.

Step Two
Determine root cause of the variation. Possible causes to explore:
growth assumptions have changed
business driver regression formula no longer applies
significant application change which caused the message size or number of messages to
change
significant hardware or software change
unplanned growth occurred such as additional new users
link errors
data compression or compaction used for data on the link
user behavior changed
new regulations in place
new incentives or disincentives to use the application at different times during the day
new pathing
contingent capacity used due to unplanned outages
unplanned load due to traffic rerouting to this node

Step Three
Determine if root cause is an exception or structural (i.e., a defect in the current design).

Step Four
Determine if root cause needs to be corrected. If so, develop recommendations for immediate
correction. If root cause is structural, then modify capacity planner's model or assumptions of
how the host or network operates to accommodate the deviation.

Step Five
Document and communicate the results.
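Step One's 10% screening rule can be sketched as a simple check; the threshold default mirrors the guideline above, and the figures in the examples are illustrative:

```python
def variance_pct(planned: float, actual: float) -> float:
    """Signed deviation of actual from planned capacity, in percent."""
    return (actual - planned) / planned * 100.0

def needs_review(planned: float, actual: float, threshold: float = 10.0) -> bool:
    """Flag any deviation of 10% or more, above or below plan (Step One)."""
    return abs(variance_pct(planned, actual)) >= threshold

needs_review(100.0, 112.0)   # True: 12% over plan
needs_review(100.0, 95.0)    # False: only 5% under plan
```

Components flagged by this screen then proceed to the root-cause step; those within tolerance simply get noted in the capacity status report.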

Root Cause Analysis

This term arose from the emphasis on American quality programs in the 1980s. The analysis
techniques for this approach are quality control techniques plus common sense. The aim of the
analysis is to determine the root cause of a deviation (or a defect in quality terms) so that it can
be eliminated. These techniques apply to any measurable environment with repeatable
processes. Some of the techniques used for root cause analysis are:

Fishbone diagramming to create a "Cause and Effect" diagram and determine the original causal
factor for the deviation.
Pareto diagramming, which can be modified to fit the performance and capacity environment.
To explore a service level exception (only an example and not really a part of the capacity
planner's job) for response time, a pareto diagram could be constructed with the components of
response time listed along the x-axis with the y-axis representing the number of service
exceptions (or performance thresholds exceeded). The resulting Pareto diagram would quickly
identify the component of response time that had the most exceptions ("defects" in quality terms)
to its performance threshold and thus the element to focus on first. This assumes that the
performance thresholds are properly set.
Control charts, which can be used to determine whether the "abnormal" occurrence is a "trend"
or not, and whether there is any "periodicity" to that abnormality. Control charts have been used
for monitoring service levels. In particular, these charts have been very helpful in understanding
the variations of batch processing milestones during daily runs. Guide to Quality Control by
Kaoru Ishikawa, 1982, Asian Productivity Organization, is one of the more enlightening and
practical books on the subject of statistical quality control. The application of quality control
techniques to the IT environment has been done by the IBM Consulting Group, but the work has
not been published.
Scatter diagramming, which is the quick technique of "eyeballing" the data to see if a "cause and
effect" relationship may exist -- at least give the statistical appearance that it does. If a negative
or positive correlation appears, then more sophisticated techniques which create a coefficient of
correlation or a Sign Test Table should be used to quantify the relationship and confirm your
suspicions as to the candidate root cause.
Statistical sampling in order to confirm your hypotheses about a root cause. Ishikawa describes
five sampling techniques in his book: (1) random sampling, (2) two-stage sampling, (3)
stratified sampling, (4) cluster sampling, and (5) selected sampling.

Sensitivity Analysis

Although this term has a particular meaning within statistics, we are using the term to refer to the
type of analysis which attempts to determine cause and effect relationships through the use of
controlled experiments -- or benchmarks. These application or network benchmarks are run in a
controlled environment with a single parameter allowed to vary. In a network benchmark, this
variable could be the number of terminals on a line, the number of concurrent sessions, message
size, message rate, or the number of hops.

The emphasis is on finding out the level of sensitivity that a particular network or application
operating characteristic has on the overall performance of the system. This type of network
benchmarking is also useful in determining thresholds for particular components. For example,
in a controlled environment, it may be determined that doubling the packet size while
maintaining the same packet rate will greatly decrease the effective capacity of a particular
component.

For "new" applications, this type of analysis can occur during the early phases of development if
prototyping is done. At later developmental phases, actual benchmarks can be done to measure
the effect of varying certain parameters in a controlled environment.

For the network, reasonable benchmarks of production systems can be done by creating a
representative environment for a component, subsystem (e.g., Channel Extenders), yyy location,
or technology type. Good experiment design using sound sampling techniques for representing
the current environment can create valuable insights into the host and network capacity
alternatives for the capacity planner.

Regression

This type of analysis focuses on identifying business and system variables which are good
predictors (or estimators) of system and network load. Historical production data is used almost
exclusively except where sensitivity analysis has proven a strong correlation between a system
variable and expected resource usage.

The two major statistical approaches used within this analysis type are regression analysis and
correlation analysis. There are numerous techniques within each, but we will discuss only the
most germane.

Statistical Correlation works with historical data to determine if there is some statistical
correlation between a system variable (e.g., 3745 utilization) or set of variables and a line or
curvilinear line representing another variable (e.g., message rate) or set of variables. The result
of the test is to determine if the variable is statistically correlated (either negatively or
positively) or uncorrelated. This metric is called the coefficient of correlation. However,
statistical correlation does not mean that a cause and effect relationship has been established. It
is up to the capacity planner to apply common sense and other techniques to determine if a cause
and effect relationship underlies the statistical correlation.
Correlation doesn't tell you what the formula is for predicting a future value for a variable (like
3745 utilization). However, it does tell you whether a particular formula or line has a similar (or
correlated) behavior pattern. This behavior pattern can be represented by regression lines and
thus a positive or negative correlation can tell the capacity planner whether a regression line
might be a good representation of the variable's behavior. Regression is used to determine the
formula that "best" fits the sampled points for the variable.

Regression deals with estimating or forecasting changes in one variable based upon changes in
another variable. Its critical assumption is that the relationship between two or more variables in
the past will remain the same in the future. For capacity planners, it is important to keep that
assumption in mind because the past relationship can, and most likely will, change. It is up to the
capacity planner to anticipate that change and adapt the predictive model to reflect that known
change. Often this adaptation is called a "fudge" factor.

Note: Regression is important for the capacity planner in determining the best curve or formula
for the business drivers. Therefore, it is a pivotal analysis type in implementing a business-
driven capacity management process.

Much of the data that a capacity planner deals with is time series data. When regression
techniques are used in this sense, the regression line can be called a trend line which is then used
for expressly estimating or predicting the value of a variable such as CPU MIPS.

There are a lot of different curve fitting techniques that are described in statistical textbooks so
we won't cover them in any detail here. The techniques start with the simplest which is just
sketching out the variables on paper in a scatter chart and then "eyeballing" the line. However,
more sophisticated tools are available in SAS and MICS that automate the process for the
capacity planner and eliminate a lot of drudgery from the task -- leaving the creative work to the
capacity planner.

Behavior Characterization

This analysis type has two major components. The first is the determination and analysis of the
attributes which uniquely describe this application, component, system, network, or system
resource. Examples of attributes that would characterize a host workload are Relative I/O
Content (RIOC), I/O rate, and DASD space. The attributes mentioned in Section 5.4 describing
the Channel Extender traffic type are another good example.

The second component of behavior characterization is associated with time series analysis. This
looks at the behavior of a host's or network resource's variables over time as characterized by its
system metrics (e.g., message rate, buffer utilization). Once again, programs like SAS and MICS
can quickly aid the capacity planner in doing this type of analysis.

Some of the questions that a capacity planner would look to answer from performing this
analysis are: does the variable(s) have a trend? are there any cyclical or patterned movements
by this variable over time? is there any seasonality to this variable's movements over time? how
consistent are the peaks during the week? are there consistent monthly peaks? In economic
forecasting, this type of analysis is used to factor out these predictable movements to get at the
base data.

Quantifying the behavior of a workload or IT resource component over time is important to the
capacity planner in forecasting future behavior and determining when exceptions, variances, or
shifts are occurring.

Descriptive Analysis

This is not as rigorous as the other approaches and requires a more intuitive approach to analysis.
Capacity statistics can be very impersonal, misleading, or confusing to a person not directly
involved with capacity planning. Unfortunately, statistics do not speak for themselves. They
have to be packaged properly in order to be meaningful to the recipient.

This should be titled the art of descriptive analysis since the capacity planner is seeking ways of
conveying the proper message to the intended recipient without compromising the integrity of the
data or leaving any doubt about the underlying message. Statistical vehicles such as pie charts,
histograms, and tabular reports are the most sophisticated that this technique gets. The challenge
is how to simplify the results as much as possible and present them in a way that is crystal clear to
the recipient.

Section 6.4 -- Relative I/O Content (RIOC) Metric Analysis

One method for describing a workload and comparing the relative capacity of processors is
Relative I/O Content analysis. This concept was introduced and further illustrated in articles by
IBM Canada's Joe Major (see [1][2][3] in the Recommended Reading section of the Appendix).

The relationship between the amount of CPU consumed and the rate of I/Os executed in a
specific time period is called the Relative I/O Content (RIOC). XXXCP10 contains the
mathematical relationship and the information required to use it. This relationship is repeated below:

R = S / (M * B)

where,
R = Relative I/O Content
S = I/O Rate (logical I/Os per second)
M = Processor's power rating (MIPS)
B = CPU busy %

The use of the relationship has many applications. For example, if an application is being
developed, measured CPU and I/O usage may be unavailable prior to the stress testing phase.
However, DBAs usually can estimate an I/O rate based on predicted accesses to their data bases.
Given an I/O rate and an estimate of the RIOC based on past experiences for applications
utilizing the same access methods and coding techniques the CPU power requirements (M*B)
can be derived via the formula.
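Both directions of the formula can be sketched directly; the rates and ratings below are illustrative only, and the assumed RIOC of 0.25 simply falls in the DB2-like range mentioned later in this section:

```python
def rioc(io_rate: float, mips: float, cpu_busy: float) -> float:
    """R = S / (M * B): relative I/O content, where S is the logical I/O
    rate per second, M the processor power rating in MIPS, and B the CPU
    busy fraction."""
    return io_rate / (mips * cpu_busy)

def cpu_power_needed(io_rate: float, assumed_rioc: float) -> float:
    """Invert the formula to estimate M * B (MIPS consumed) from a DBA's
    predicted I/O rate and an RIOC assumed from comparable applications."""
    return io_rate / assumed_rioc

rioc(120.0, 40.0, 0.60)        # 120 / (40 * 0.60) = 5.0
cpu_power_needed(120.0, 0.25)  # an assumed RIOC of 0.25 implies 480 MIPS consumed
```

This inversion is what makes the metric useful before stress testing: the DBA's I/O estimate plus a borrowed RIOC yields a first-cut CPU requirement.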

Ray Wicks' bulletin ([6] in Recommended Reading of the Appendix) explains the stability of the
RIOC; the factors that could change the RIOC for a particular application are summarized
below. The factors affecting the "S" component of the formula are:

Access method - change to an access method which has enhanced buffering would reduce the
physical I/Os
Blocking factor - more data per I/O
Data in memory - slower and more costly CPU for physical I/Os would be replaced by quicker
and more efficient memory I/Os at the cost of more memory
Operating System - utilization of new architectural features designed to reduce DASD I/O or
DASD space, such as data compression
Systems Managed Storage implementation - again, different way of doing I/O

The factors affecting the "M*B" component of the formula are:

Software Upgrades - path length changes to support new functions, such as Systems Managed
Storage. Sometimes the subsystem performing work on behalf of an application does more in a
new release.
Complexity - as I/T customers become more sophisticated with an application, they tend to use
more resource consuming functions

The RIOC metric is easy to measure, and when it changes from an established baseline, it
indicates that a change to an application's workload characteristics has occurred and that more
detailed analysis should be done to reevaluate the new baseline to use.

The RIOC can also be used when defining a new application. XXXCP03 lists the RIOC as a
metric to collect for the "comparable" application. Even if there is no comparable application, an
estimate can be made based on the subsystems being used. For example, empirical measurements
suggest that DB2 applications generally fall within the 0.2 to 0.3 range. See Cluster Analysis
for Comparable Applications in Section 6.5.1 for more details on applying the I/O to CPU
analysis.

Section 6.5 -- Cluster Analysis for Grouping Similar Applications

One of the most challenging tasks to perform in Section 2, "Gathering and Forecasting 'New'
Application Resource Requirements" is to identify a comparable application or workload to use
for developing initial estimates of application behavior and resource consumption patterns.
Often one of the best indicators of how a program is going to perform is the programming style
used by the developers. Independent of external influences, development organizations tend to
produce programs with similar performance characteristics due to the use of common
programming and data base design techniques. This comes about from the use of either
published or de facto development standards within a group.

The implications of this for the XXX organization are that if a Location's programming style can
be quickly and objectively characterized, then the most probable candidates for "comparable"
transactions, "comparable" batch jobs, or "comparable" workloads can be found within the
application portfolio of the "new" application developer's own Location. The use of cluster
analysis can help validate this as well as serve as a tool for identifying a performance cluster that
best represents any "new" application.

This subsection is not intended to be a tutorial on cluster analysis; rather, it is a description of the
basic procedure to do the analysis. Obviously, the capacity planner would use this as a starting
point for further analysis.

Initially, the easiest way to start the analysis is at the transaction or application level by utilizing
two metrics: (1) Normalized CPU busy milliseconds per transaction and (2) Logical I/Os per
transaction. These are easy to collect for an application or workload (or set of applications as a
whole such as IAS) and can be applied across different systems, release levels, and platforms.

These two metrics are very similar to the metrics used in computing the relative I/O content
(RIOC) of a workload. When they are plotted on a graph, as in Figure 6.5.1, clusters of work
having similar workload attributes are usually apparent. This is a "fast path" way of classifying
and understanding workloads having similar resource consumption traits and presents
opportunities for more specific analysis and improved conclusions and recommendations.

Page 182
Figure 6.5.1. Cluster Analysis showing clusters of applications by Location with similar
performance characteristics

Normalized CPU busy milliseconds per transaction (or batch job) captures the average
processing time of the transaction in a processor-independent metric: normalized CPU busy
milliseconds. The normalization is done by applying a factor to the measured CPU busy
milliseconds using some standard MIPS rating as obtained from either the IBM LSPR numbers
or Gartner's MIPS ratings so that transactions from different processing systems can be
compared. This is similar to the RIOC metric "M*B" where M is a factor based on the processor
power rating and B is the percent of CPU resource consumed. In both cases the measurement of
processor usage must be adjusted to include "uncaptured" CPU time.
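As an illustration of this normalization step, the sketch below (in Python, using purely hypothetical MIPS ratings and capture ratio, not vendor figures) scales measured CPU time to a common reference processor after adjusting for uncaptured time:

```python
# Hypothetical example: normalize measured CPU ms per transaction so that
# transactions from two different processors can be compared.
# The MIPS ratings and the capture ratio below are illustrative only.

def normalized_cpu_ms(measured_cpu_ms, processor_mips, reference_mips, capture_ratio):
    """Adjust measured CPU ms for uncaptured time, then scale to a reference processor."""
    total_cpu_ms = measured_cpu_ms / capture_ratio           # add back "uncaptured" CPU time
    return total_cpu_ms * (processor_mips / reference_mips)  # express in reference-MIPS terms

# The same transaction measured on a 120-MIPS system and on a 60-MIPS system
a = normalized_cpu_ms(measured_cpu_ms=25.0, processor_mips=120.0,
                      reference_mips=100.0, capture_ratio=0.85)
b = normalized_cpu_ms(measured_cpu_ms=50.0, processor_mips=60.0,
                      reference_mips=100.0, capture_ratio=0.85)
print(round(a, 1), round(b, 1))  # both 35.3: the same amount of work
```

Once normalized this way, transactions from different systems, release levels, and platforms plot on the same axes.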

Logical I/Os per transaction (or batch job) represents the measured average number of logical
I/Os per transaction for a transaction type or entire workload.

The first goal of the analysis is to create a plot such as the one seen in Figure 6.5.1. In this
fictitious comparison, there appears to be logical clustering by both the access method utilized,
and, within an access method cluster, by Location. Analysis of the latter may lead to the
conclusion that coding techniques influence CPU and/or I/O resource utilization and that each
Location tends to develop applications that inherit certain performance characteristics. If this
were true, the reasons behind it would have to be explored further.

The procedures used to identify a comparable application candidate for a "new" application are:

1. Identify a set of applications or workloads which are potential candidates. These may come
from different Location development groups or may be applications which perform similar
functions or are perceived to be "comparable".
2. Choose peak hour measurements for the candidate workloads and plot them on a graph similar
to Figure 6.5.1.
3. Look for clusters.
4. Work with the application development manager or business owner to estimate which cluster
would be the best fit for the new application.
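These steps can be sketched programmatically. The fragment below is a minimal illustration with invented peak-hour measurements (normalized CPU ms per transaction, logical I/Os per transaction); a real study would use a statistics package and validate the number of clusters. It groups candidate workloads and assigns a "new" application estimate to the nearest cluster:

```python
# Minimal cluster-analysis sketch using a simple k-means pass.
# The sample points are fabricated for illustration.
import math

def kmeans(points, k, iters=20):
    centroids = points[:k]  # naive seeding, adequate for this sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # recompute each centroid as the mean of its cluster
        centroids = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Peak-hour samples (CPU ms/txn, logical I/Os/txn): two apparent groups
samples = [(12, 30), (14, 28), (11, 33), (45, 110), (48, 120), (43, 105)]
centroids, clusters = kmeans(samples, k=2)

# A "new" application estimate is assigned to the nearest cluster
new_app = (13, 31)
best = min(range(2), key=lambda c: math.dist(new_app, centroids[c]))
print("closest cluster centroid:", centroids[best])
```

The cluster that the new application falls into then supplies the "comparable" candidates for further analysis.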

Section 6.6 -- Business-driven Forecasting Techniques

The forecasting techniques are assumed in this document to be business-driven. This means that
the forecasting techniques and projections are based on the growth of business variables or
factors. These are then translated into growth of DP transactions or expected resource usage.
The following figure defines some of the terms used in the methodology and provides an
example of the two forecasting techniques -- Business Driver Forecasting Technique and
Business Transaction Forecasting Technique.

In Example 2 on the right side of Figure 6.6.2, an interactive inventory application is driven by
users accessing the data base to look up part numbers (Query), edit entries (Update), and display
a component and its parts (Display Component). Thus, an IT customer would perform one of the
business transactions: Query, Update, or Display Component. Using the Business Transaction
Forecasting Technique, the capacity planner would ask the IT business owner for the mix of
business transactions for a typical customer and for the expected growth of each business
transaction. These would then be translated into the DP transactions required to service each
business transaction. This translation is called the Business-To-DP Transaction Mapping.
Monitoring tools and logs would capture the resource usage per DP transaction; adjustments
would be made to factor in uncaptured usage (see Capture Ratios); and, using the Business-To-DP
Transaction Mapping, the usage per business transaction would be derived.

Figure 6.6.2. Two forecasting techniques, definitions, and examples of usage.

How much workload will exist for an application may also be estimated from increases or
decreases in its business drivers. In Example 2, the IT customers themselves could be a driver, as
well as those depicted in the figure. If a conclusive correlation is determined between the
resource usage for an application and a business driver or combination of business drivers, the
Business Driver Forecasting Technique would suffice to estimate future resource projections.
Through regression analysis, a mathematical formula and coefficient of correlation can be
derived where the variables are one or more of the business drivers and the result is the
measured usage. Success varies with this technique and high accuracy may be sacrificed.
However, it does usually have a benefit for longer-term budgeting, and it provides a ball-park
estimating technique that executives can relate to.

Both techniques should be employed to allow quick projections (Business Driver Forecasting
Technique ) and more accurate estimates (Business Transaction Forecasting Technique). Figure
6.6.3 illustrates the amount of information needed to be measured or counted, the effort or
complexity to develop a conversion algorithm, and the degree of accuracy for each forecasting
technique. Note the higher accuracy of utilizing just DP transactions; however, it is impractical
to expect the IT customers to project growth in terms of DP transactions; hence the need for the
Business-To-DP Transaction Mapping.

Figure 6.6.3. Two forecasting techniques and tradeoffs to be made in their usage.

Figure 6.6.4 illustrates the implementation of the Business Driver Forecasting Technique. It is
an oversimplified illustration of correlation; the mathematically accepted way is to use
regression analysis and determine the correlation coefficients. Form XXXCP13, discussed in
Section 3, Activity 5, provides the structure to document business driver trends, associated usage,
the regression algorithms for both CPU and DASD Space usage, and the derived coefficient of
correlation. This document is not a text on regression analysis or correlation and merely provides
the vehicle for capturing the necessary information and reporting the results.

Correlation of business driver values to workload resource usage will determine driver validity and allow
workload conversion factors to be derived.

Driver "# of Users" for Workload XYZ (measured against CPU MIPS):

    Date        Driver Value    Measured CPU MIPS
    -18 Mo's    10              100
    -12 Mo's    20              200
    - 6 Mo's    30              300
    Now         40              400

Good correlation: "# of Users" is a valid driver. Projecting +30 months with an expected driver
value of 100 and a workload conversion factor of 10 MIPS per user yields an estimated
100 x 10 = 1000 CPU MIPS.

Driver "# of Projects" (measured against CPU MIPS):

    Date        Driver Value    Measured CPU MIPS
    -18 Mo's    100             400
    -12 Mo's    100             200
    - 6 Mo's    90              300
    Now         100             400

No correlation: "# of Projects" is not a valid driver. Multivariate regression analysis may have
to be used to determine the usage relationships when no single variable correlates sufficiently.

Figure 6.6.4. Business Driver Volumes technique example.

Like the Business Driver Forecasting Technique example in Figure 6.6.4, Figure 6.6.5 illustrates
the Business Transaction Forecasting Technique. It depicts the measured volumes, associated
usage per transaction, and expected growth projections for Workload #1. Note the mappings for
business transactions A and B: A maps directly to DP transaction T4, whereas business
transaction B requires three DP transactions, T1, T2, and T3, to service it. It was determined
from shutdown logs that the average CPU MIPS per transaction T4 was 0.01. Since it maps
one-to-one, the usage per business transaction A is 0.01. Likewise, it was found that transactions
T1-T3 averaged 0.02 MIPS per DP transaction. Here the business-to-DP transaction mapping is
one-to-three, so the usage per business transaction B is 0.06. The estimated numbers of business
transactions were solicited from the business owner: 100 and 50 for business transactions A and
B, respectively. The expected usage for these quantities of business transactions can now be
estimated by multiplying the number of business transactions by the usage per business
transaction.
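The arithmetic above can be sketched directly, with the values taken from the Figure 6.6.5 example:

```python
# Map each business transaction to its DP transactions, weight the measured
# MIPS per DP transaction, and multiply by the business owner's volume estimates.

# business transaction -> (MIPS per DP txn, number of DP txns per business txn)
mapping = {"A": (0.01, 1),   # A maps one-to-one to T4
           "B": (0.02, 3)}   # B needs T1, T2, and T3
forecast_volumes = {"A": 100, "B": 50}

total = sum(per_dpt * n_dpt * forecast_volumes[bt]
            for bt, (per_dpt, n_dpt) in mapping.items())
print(round(total, 6))  # 4.0 estimated CPU MIPS for Workload 1
```

The same loop extends naturally to a weighted average when, as for business transaction C, the constituent DP transactions consume different amounts of CPU.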

    Workload  Business     DP           Measured CPU    Estimated CPU     Estimated     Estimated
              Transaction  Transaction  MIPS per DPT    MIPS per BT       # Future BTs  CPU MIPS
              (BT)         (DPT)                                                        Needed
    --------  -----------  -----------  --------------  ----------------  ------------  ---------
    1         A            T4           0.01            0.01              x 100         = 1
              B            T1-T3        0.02            0.06 (weighted)   x 50          = 3
                                                Workload 1 CPU Power Estimate           = 4
    2         C            T4-T8        0.06 (wtd avg)  0.3               x 10          = 3

The workload conversion factors (CPU MIPS per BT) come from the weighted business-to-DP
transaction translation; the growth estimates (# of future BTs) come from the business owner.
Figure 6.6.5. Transaction Volumes Technique example.

Various tools provide the facilities to map business transactions to DP transactions. See the
implementation procedures provided by the vendor of the product.

Three terms used in the XXX business-driven capacity management process are worth repeating;
understanding them is imperative before continuing:

Definition: A Business Driver is an element of the business that drives the need for I/T resources;
e.g., for fff, the drivers are items and files.

Definition: A Business Transaction is a specific business function accomplished by an
application or the end user; e.g., for fff, business transactions are business functions such as T1-
Edit and T2-Distribution.

Definition: A DP Transaction is a unit of work as seen by the subsystems servicing the
application and user; e.g., an IMS transaction, batch job step or job, etc. These can usually be
measured directly by current monitoring tools.

Business drivers affect the number of business transactions to be executed. Business transactions
translate into DP transactions, which in turn demand CPU, DASD, and network resources.
Network traffic is also a result of executing business and DP transactions, but the network
demands are best described in terms of traffic flows decomposed into characters/sec or bits/sec
across the network components as illustrated in Figure 6.6.6. The resulting system and
networking metrics that measure the resource loads are described in Section 5.
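As a hedged sketch of this decomposition, the fragment below walks one business transaction rate down to access-layer and backbone-layer demands. The messages per transaction, message size, and protocol overhead factor are assumed values for illustration, not measurements:

```python
# Decompose a business transaction rate (Figure 6.6.6 style) into
# messages/sec, characters/sec, and bits/sec. All factors are assumptions.

queries_per_min = 600
msgs_per_query = 2        # request + reply (assumption)
chars_per_msg = 800       # average message size in characters (assumption)
protocol_overhead = 1.25  # 25% framing/control overhead (assumption)

msgs_per_sec = queries_per_min / 60 * msgs_per_query   # access layer
chars_per_sec = msgs_per_sec * chars_per_msg           # access layer
bits_per_sec = chars_per_sec * 8 * protocol_overhead   # backbone layer

print(msgs_per_sec, chars_per_sec, round(bits_per_sec))  # 20.0 16000.0 160000
```

In practice each traffic type (interactive, bulk data, print) would carry its own message sizes and overhead factors, as the figure suggests.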

Application XYZ decomposes into business workloads (XYZ Decision Support, XYZ Batch
EOD), then into business transactions (Update, Query, Close-out, Report 59), then into DP
transactions (IMS transaction AJ123, Batch1-10), and finally into network traffic types:
interactive transactions, file transfer (bulk data), and channel extender (print), carrying the
business packages (query, file, report).

Traffic demands by layer:

    Layer                     Transaction          File Transfer    Channel Extender
                                                   (Bulk Data)      (Print)
    ----------------------    -----------------    --------------   ----------------
    Business/Application      Queries/Min,         Files/Hr         Reports/Hr
                              Transactions/Min
    Access                    Transactions/Sec,    Messages/Sec,    Messages/Sec,
                              Messages/Sec,        Packets/Sec,     Packets/Sec,
                              Characters/Sec       Characters/Sec   Characters/Sec
    Backbone                  Bits/Sec             Bits/Sec         Bits/Sec

Figure 6.6.6. Decomposition of an application into traffic demands.

Capacity planning for the network requires additional decomposition steps to describe the
information flowing over the network. Also, the information begins to lose its affinity with the
original business transaction and DP transaction due to the inadequacy of monitoring tools and
programming techniques to retain a transaction's identity from/to the application.

Section 6.7 -- Additional Analysis and Forecasting
Techniques

One or more techniques must be selected and applied when analyzing demands, forecasting
resource load, and analyzing new applications, existing workloads, and environmental changes.
Also, these techniques can be applied at an application-specific level or applied to all host and
network resources as would be needed in the development of an annual capacity plan. The
following techniques cover all the major categories:

Business Volume Indicator (or business volume predictor)
User group
Historical
Linear extrapolation (or Uniform growth)
Heuristic (or Rules of Thumb)
Simulation
Analytical and Queuing
Comparables
Benchmark
Trial and Error

For applications still in development, Section 2 of this methodology focuses on collecting the
right data during each phase of the application development life cycle. An extensive set of forms
contained in Appendix B provide a suggested structure for collecting and organizing the data
using a combination of the nine techniques listed above. Section 2 extends each of these
techniques into a business model that is used to establish (if reasonable and cost-effective to do
so) business activity as a predictor of an application's demand for IT resources. This business
relationship becomes extremely vital in forecasting an application's growth after it has been
cutover to a production environment.

Note: As will be seen in the description of each technique, the business model that is developed
in Section 2 and Section 6.6 is a prerequisite to the success of the forecasting effort.

For existing workloads, Section 3 of this methodology focuses on collecting the right data
periodically to support capacity analysis, forecasting, and reporting. An extensive set of forms
contained in Appendix B provide a suggested structure for collecting and organizing the data
using a combination of the nine techniques listed above. Section 3 extends each of these
techniques into a business model that is used to establish (if reasonable and cost-effective to do
so) business activity as a predictor of an application's demand for IT resources. This business
relationship becomes extremely vital in forecasting an "existing" application's growth.

Environmental changes include changes to operating system levels, network operating system
levels, platforms, upgrades to particular host or network components, link speeds, protocols,
technology, and the application of maintenance to a hardware or software component.
The same techniques used for new and existing application/workload forecasting are applied here
to environmental changes. These techniques have no direct connection with a business model as
they did when used for forecasting new and existing application IT resource demands. Rather,
the application of the results from these techniques is more often the determination of a factor to
be applied to the overall IT resource load than the determination of an independent load (except
perhaps in the case of CMC CPU load forecasting) to be added to a resource.

Some examples of variables to be forecast under the category of environmental changes are:

    Anticipated Event           Sample of variables to be estimated
    -------------------------   ---------------------------------------------
    Operating System upgrades   1. "Overhead" factor to apply to overall
                                   system load or components
                                2. Factor to apply to capture ratio

    Network component upgrade   1. Adjustment factor to be applied to
                                   utilization
                                2. Adjustment factor to be applied to
                                   throughput (e.g., characters/sec)

    Line speed changes          1. Adjustment factor for throughput
                                   thresholds
                                2. Adjustment factor for utilization
                                   estimates

    Protocol changes            1. Adjustments for message, packet, frame,
                                   or cell size
                                2. Adjustments for control character
                                   overhead

    Application maintenance     1. Adjustments to CPU per transaction
                                2. Adjustments to RIOC

As one can see from the previous example, the focus of the forecasting effort is the impact of a
change on other forecasted variables such as CPU utilization, throughput, or "overhead"
(additional protocol overhead or operating system "overhead"). In most cases, the effect of the
anticipated event will be relatively small in magnitude but can be rather large in scope (i.e., it
affects all applications) and may result in lowering the effective capacity of network components
to transport data.

Analysis and Forecasting Techniques

1. The Business Volume Indicator (or, Business Volume Predictor) approach uses a business
growth model in determining IT resource demands. The approach depends upon a strong
statistical correlation between business activity and IT activity. Similar approaches are: NFU
(Natural Forecasting Unit), Application Units (SPE). This relationship needs to be statistically
determined over a reasonably long period of time through a technique such as Multivariate
Regression (Least Squares Regression is the minimally accepted technique). Within this general
approach, two techniques are described in this methodology in Section 6.6:

Business Driver Forecasting Technique


Business Transaction Forecasting Technique

The Business Driver Forecasting Technique examines a particular workload, such as fff
processing, and identifies potential candidates for the correlation analysis. Usually, the
application developers and business owners know intuitively what business activity is going to
drive their application. In the case of fff, the business drivers were known to be Files, Batches,
and Items, and there appeared to be a direct correlation between these drivers and both the
volume of work processed by the host and the volume of work handled by the network.

2. The User group approach focuses on interactive traffic and is typically not used to predict
IT resource growth for batch environments. The underlying growth model for this technique is
based upon (1) the numbers and types of users (e.g., clerks, managers, and analysts are put into
their own aggregate group) and (2) the number and type of workstations. Each user group or
workstation represents a "typical" amount of work to be processed. In the aggregate, this
technique is useful when no business correlations exist or where business correlations are too
difficult to identify and track. This is especially true with environments characterized by ad hoc
queries and graphics work.

3. Historical Trending examines the past historical behavior of a workload and assumes that
its previous pattern of growth and behavior will continue in the future, regardless of any
correlation of growth with IT activity. No particular growth model is assumed for this approach.
Similar statistical techniques to the ones used in the Business Volume Indicator approach can be
used, or even simpler curve-fitting techniques (e.g., Least Squares). This is a very risky approach
if there is any volatility in the workload. Also, this technique usually results in greatly
over-estimating the capacity required in order to cover the lack of rigor in estimating growth.

Forecasting new application demands by historical growth has limited accuracy and no tie to
business activity at all. This technique looks at the past year's history of the system or network
under consideration and determines what percent of all new growth was due to the addition of
new applications. For example, if 30% of the growth in IT resource consumption over the last
year can be attributed to the addition of new applications on the host or network, then that same
30% factor is applied to the upcoming year. Not very scientific, but for a very stable, small
environment where prior year's history has proven to be fairly consistent and predictable, this
technique is sufficient.

4. Linear Extrapolation is one of the simplest techniques to be used. It can be done with a
pencil and paper. A Uniform growth model is assumed for this technique (usually an unwise
assumption in complex and volatile environments). Simply put, the current growth curve is
extended "up and to the right" without any consideration for workload fluctuations or cyclic
activity.

5. A Heuristic technique is just a fancy description for the use of Rules of Thumb. This is
also known as the common-sense approach based upon the experience and intuition of the capacity
planner. In less sophisticated environments, planners have a "gut" feel for what needs to be
added to the host and network resources to accommodate growth. These rules of thumb are the
guidelines that the capacity planner uses to make capacity decisions or recommendations. For
example, if the capacity planner has a processor that is currently averaging 50% busy with an
upgrade point of 75%, has an annual growth rate of 50% per year, and knows that it takes six
months for an upgrade to arrive, then he needs to order the upgrade now! This is the least
sophisticated of all of the approaches with no growth model other than personal experience and
intuition.
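That rule of thumb can be checked with simple compound-growth arithmetic (assuming growth compounds annually; the figures are those from the example above):

```python
# When does a 50%-busy processor growing 50% per year hit its 75% upgrade point,
# and by when must the upgrade be ordered given a six-month delivery lead time?
import math

current, threshold, annual_growth, lead_months = 0.50, 0.75, 0.50, 6

# utilization(t) = current * (1 + growth)^(t/12); solve for the threshold
months_to_threshold = 12 * math.log(threshold / current) / math.log(1 + annual_growth)
order_by = months_to_threshold - lead_months

print(round(months_to_threshold, 1), round(order_by, 1))  # 12.0 6.0
```

The threshold is reached in about twelve months, leaving only six months of slack against the lead time, which is why the planner in the example orders the upgrade now.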

Heuristic techniques involve the capacity planner's intuition and experience without necessarily
going to a repository of comparable changes. For instance, a capacity planner may know that, on
average, every migration to a new release of MVS has cost him 2% of his capacity without
accounting for any exploitation of new performance features.

The heuristic, or Rules of Thumb (ROTs), approach is similar to the historical growth technique
in that the intuition and experience of the capacity planner over a period of years with the system
or network under examination is used as the basis for the forecast. As an example, a capacity
planner may know intuitively that typically 20 to 30% of each year's growth is attributable to
new applications. He knows about when the applications get rolled out and that typically no new
applications are introduced during the last two months of the calendar year. For his organization,
this approach has been sufficiently "accurate" because the company is willing to accept a
forecast which is more than 20% inaccurate.

6. The simulation approach has been used at XXX to model the front-end process of the fff
application. A product from Bachman called prod1 was used to prototype the application and
estimate the CPU and DASD resource load of this application under various scenarios. Business
metrics were used to develop the volume estimates of DP transactions to be run through the
model since there was a strong correlation of the volume and mix of fff Files, Items, and Batches
with CPU and DASD load. The application was represented to the prod1 model through a meta-
language and prod1 used resource loading tables for IMS FastPath and DB2 calls to provide a
high-level simulation of the fff application.

This is just one example of the many types of simulation packages that can be used to forecast
the IT resource load of a new application. However, the value of this approach can only be
achieved if there is confidence in the application volumes that will be run through the simulator.
This confidence can only be achieved with a robust business model that provides accurate
forecasts of business loads along with quantified business drivers.

Another simulation technique that is useful is TPNS (IBM's Teleprocessing Network Simulator,
which is installed at XXX). This product requires production-level code and is usually used for
applications that are currently in production or very near the production cutover date. In this
environment, the application is actually run on the target system and the network environment is
simulated through various line and terminal drivers on another processor attached to a pair of
FEPs. In this way, the response times, line loads, and component loads can be measured for
various "what-if" scenarios.

Simulation is rarely needed to estimate release-level costs for operating system changes,
but it is used to evaluate the capacity impact of technology changes such as moving from 3380
DASD to 3390 DASD, or moving from X.25 to frame relay, or moving from one topology design
to another. In these latter cases, simulation is the only method short of a benchmark to accurately
quantify the capacity effects of those changes.

7. The use of Analytical and Queuing techniques requires a strong background in statistics to
obtain the full benefit of the techniques available in this category. Sometimes these techniques
are combined into a single forecasting tool. The analytical techniques refer to the use of
mathematical formulae to represent certain performance behaviors on various host and network
components. An example of this would be a linear equation to represent the increase in
component load as a proportion of the increase in the transaction volume of a particular type.

The queuing techniques are a type of analytical technique which characterizes the transactions,
jobs, or data moving through the network or host as a set of arrival distributions for a set of
resources. Each resource estimates its service time for each transaction from a statistically
determined distribution of service times. The key metrics for the model are based upon known or
derived distributions of arrival rates and service times.
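A minimal example of such a queuing technique is the M/M/1 approximation of a single resource, shown below; the service time and arrival rates are illustrative only:

```python
# M/M/1 approximation: mean response time R = S / (1 - rho), where
# rho = arrival_rate * service_time is the resource utilization.

def mm1_response_time(arrival_rate, service_time):
    """Mean response time for an M/M/1 queue."""
    rho = arrival_rate * service_time  # utilization
    if rho >= 1:
        raise ValueError("resource saturated (rho >= 1)")
    return service_time / (1 - rho)

s = 0.02  # 20 ms mean service time per transaction (assumption)
for tps in (10, 25, 40, 48):
    print(tps, "tps ->", round(mm1_response_time(tps, s) * 1000, 1), "ms")
```

Even this simple model shows the non-linear degradation of response time as utilization approaches saturation, which is exactly why the assumed linear scalability of an application breaks down at some point in the growth cycle.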

The Analytical and Queuing techniques are more frequently used with existing systems. A robust
business model is needed to provide input to the arrival rate and volume determination process.
The analytical techniques could be used to do a simple linear extrapolation of program test data
to larger transaction volumes. However, the assumed linear scalability of an application breaks
down at some point in the growth cycle.

Analytical and Queuing techniques may be the easiest to use for modeling CPU, network
component, or protocol changes since these models typically work with average service times
and can easily work with adjusted service times.

8. One of the best ways to forecast the impact of an environmental change is to examine the
data of "comparable" changes. Since environmental changes occur rather frequently, it is useful
for the capacity planner to develop a repository of changes for each major change type. From
this repository, the planner can roughly identify the impact of these changes in terms of
additional MIPS, reduced throughput, additional protocol overhead, or any other metric. This
technique is very subjective and is similar to the Heuristic technique described earlier.

The comparable application technique, documented in Section 2, seeks to estimate the IT
resource demands and loads for a new application based upon the perceived match between an
existing application's performance characteristics and the new application's expected
performance behavior. Section 6.5 describes a cluster analysis technique to be used to help
determine if an existing application may be "comparable" to the new application. Once a
"comparable" application has been identified, then the performance and behavioral
characteristics can be used to "model" the new application.

However, while this technique helps in quantifying the IT resource load to be forecasted, the
technique is not sufficiently comprehensive to help forecast IT resource demand. For that effort,
an IT resource demand technique needs to be used, such as those described in Section 2 and
Section 6.6.

9. Benchmarks are used where additional risk is involved and simulation models either don't
handle the particular case under study or the cost of being wrong is too high (The Missouri
"Show Me" State Syndrome).

The Benchmark technique is very labor-intensive but can be used to directly measure the
application's load on the processor, DASD, and potentially network, under various growth
scenarios. This technique requires production or near-production level code in order to provide a
planner with the accuracy needed. However, prototype code in the design and development
phases can be used to understand the basic performance characteristics of the application with
the aim towards improving its efficiency.

The accuracy of the measurements with this technique is usually not a serious problem.
The main concern with this technique is the requirement for good transaction volume estimates
as well as a good description of the transaction mix. This can be gleaned from a solid
understanding of the business factors driving the application. Business driver analysis needs to
be done prior to the benchmark in order to ensure that the benchmark scenarios reflect a realistic
target environment.

10. Trial and Error techniques really mean that a "fudge" factor is simply applied across the
board to what are believed to be affected components. This is done with almost no analysis and
heavy reliance upon the "fudge" factor to be sufficiently large to cover over any mistakes. In the
end, this approach leads to systems which are always over-configured. The justification for the
over-capacity situation is usually won on the basis of "FUD" ("Fear, Uncertainty, and Doubt")
arguments (otherwise known as a poor sales technique).

The Trial and Error technique is really just an educated guesstimate of the resources
required to meet the guesstimated IT resource demand. The only reason that this technique
survives is that it is quick and easy to apply and the capacity planner is allowed to apply a
significant "fudge" factor to the estimate to allow for the gross estimation error. This is tolerated
with the agreement that the "excess" capacity requested will be reduced to a reasonable amount
after the application has been put into production and actual measurement data is available.

Common Analysis Approaches

The types of analysis techniques were covered in Section 6.3. More specific techniques are
available to the capacity planner than were mentioned in that section; however, these are more
often used by participants in the performance management process. Table 6.7.1 describes these
approaches.

Analysis Approach Description


Component Threshold Analysis Each component has one or more performance or
service thresholds. Exceeding one of these
thresholds kicks off the analysis to determine the
cause of the exception. This technique is really an
approach to organizing the work of the performance
manager to focus on only those areas where
predetermined thresholds are exceeded.
Control Limit Analysis The variation in service levels, component
performance, traffic loads, and job completion times
can all be analyzed through the statistical process
control techniques used in quality initiatives. This
can be a very effective tool for identifying systemic
variances.
Response time and Throughput Analysis The main focus for any component analysis is on
the consistent delivery of response time and
throughput. Deviations from the "norm" kick off the
analysis project.
Growth Driver Analysis Correlation and regression techniques are used to
determine the drivers of growth in IT activity by
component or system. Analysis of variations is also
included.
Response Time Decomposition The focus is on understanding the components of
response time and focusing the tuning efforts on the
component with the largest contribution to the
response time and then looking at which
components contribute the greatest delay in
returning the response to the end-user.
Traffic Analysis (use of Erlangs) Mostly a telephone company approach to describing
traffic by offered and carried load, as well as using
the Erlang approach to estimating load.
Contention Analysis The main focus is on analyzing the component
within the host or network environment that has the
longest queues or contributes the most delay to
response time.

Table 6.7.1. Table of some approaches to analysis.

Section 6.8 -- Application of Techniques

This document describes an overall methodology designed to better tie I/T resource use and
forecasts to business functions and events. Its major focus is to improve the I/T resource growth
estimates and input to a system/network forecasting tool/technique by systematically collecting
and analyzing relevant business, environmental, and system-generated resource load
information. The methodology provides a disciplined Business Model to collect the necessary
information and translate it to resource growth demands. This discipline aims at improving the
accuracy of the input to the System Models used to forecast and design alternative configurations
for review. It further provides the infrastructure to efficiently manage the volumes of necessary
performance and business data, to design effective reports that focus on the decisions that need to
be made, and to initiate a quality capacity management process that fosters communications and
continuous improvement.

But where does one begin? Which parts of the methodology should be used? How can this
methodology be implemented in the XXX organization?

These and other questions will be answered in the following topics:

6.8.1 XXX Data Collection, Analysis, and Forecasting Views
6.8.2 General Considerations for Selecting and Applying Techniques
6.8.3 Applying the Techniques
6.8.4 Getting Started

This section was added exclusively for the forecasting of network resources and has not been
updated for CPU and DASD resources.

Page 199
Section 6.8.1 -- XXX Data Collection, Analysis, and Forecasting
Views

One could view a forecasting effort as those activities necessary to produce specific capacity
outputs or reports. Section 4 introduces three types of reports:

annual capacity plan (or periodic updates)
capacity forecast reports
capacity status reports

Although the details for each of these types were not the focus of the first two versions of this
methodology, recommendations of content and design were discussed. Sections 2 and 3 and the
associated forms suggest data content and organization which supplement information in Section
4 to yield a solid base from which effective capacity reports can be delivered. Section 4 also
addresses the need to customize reports to recipient requirements and suggests four general
views for presenting capacity-related information: location, user, topology or component, and
application.

The views that need to be presented also define the specifics of the capacity study to be
performed to produce these views. Consequently, the selection of data collection, analysis, and
forecasting techniques to complete the study will be driven by these views. Data collection
techniques (see Section 5), analysis techniques (see Sections 6.3-6.5), and forecasting
techniques (see Sections 6.6-6.7) were discussed in general terms in the referenced sections.
Section 6.8.3 will relate these techniques to three Capacity Management process views that will
define the content and depth of capacity studies and the selection of techniques:

Application View (see figure 6.8.1.1)
Location View (see figure 6.8.1.2) -- applies also to smaller user groups
Component View (see figure 6.8.1.3)

A fourth view as seen by the Network Design process, the Topology View (see figure 6.8.1.4),
will use information from the Capacity Management process. Planning relative to this view is the
responsibility of the Network Design process and considers the optimization of paths for
performance (input from Performance Management process), the redundancy of paths for
recovery (input from Recovery Management process), and capacity growth (input from the
Capacity Management process).

In summary, when performing a capacity planning study, one or more of the above views might
be taken. For example, to produce the Annual Capacity Plan (see Section 4.1.1, Basic Elements
of an Annual Capacity Plan) a Location View must be utilized to report a location's or user
group's contribution to future growth of resources, whereas an Application View would drive the
reporting of the effects of a new application. An element always present in an annual capacity plan is
Page 200
the documentation of configuration alternatives. The alternatives are driven by Component
Views when the focus is on specific configurations of key resources and by Topology Views when
component and path configuration alternatives are required.

Each of the three Capacity Management process views is presented in a structured format to
expedite its use as a reference base for further capacity data collection, analysis, and forecasting
discussions; the Topology View is also presented for continuity, but is not as structured or as
deep as the capacity planning views. The structure consists of the following elements:

Drivers - reasons why this view is taken
Objectives - typical objectives of a study designed to produce information for the view
Outputs - deliverables from the study in support of the view

The format's intent is not to exhaustively document every possible detail about the view, but to
quickly highlight the most important points relative to the view.

Overviews of the three Capacity Management process views and the Network Design process
view are available in the following subsections:

6.8.1.1 Application View Overview -- capacity management process view
6.8.1.2 Location View Overview -- capacity management process view
6.8.1.3 Component View Overview -- capacity management and network design process view
6.8.1.4 Topology View Overview -- network design process view

Page 201
Section 6.8.1.1 -- Application View Overview

[Figure: an application at the top, fanning out to Locations 1 through n, each served by Access
Links, FEPs, printers, tape and MICR units, and other network components.]
Figure 6.8.1.1. Application View for Data Collection, Analysis, and Forecasting

The Application View (figure 6.8.1.1) provides the view from the perspective of an application,
independent of other applications. It focuses on understanding that application's demands on
overall network resources. This is employed throughout the Application Development Life Cycle
by the Capacity Management methodology as described in Section 2. However, the view is also
valid for an existing application and may be prompted by periodic reports from Section 3 or as a
special project. Although the focus is on a specific application, the aggregation of individual
studies for all the key application contributors to the total network traffic is of importance when
estimating and designing the widely-shared network components, such as the backbone.

Page 202
The drivers of this view and the resulting study objectives and outputs are:

Drivers: (1) Capacity management methodology checkpoints defined in the Development
             Life Cycle (see Section 2)
         (2) An application or technological change that will impact the capacity requirements
         (3) Exceptions to an application's service objectives (driven by the Problem or
             Performance Management processes)
         (4) A special application study for some other purpose

Objectives: (1) To document an application/workload's demands on the overall network
            (2) To forecast the effects of a specific application/workload on all or part
                of the network
            (3) To provide necessary growth and estimated load requirements to the
                Network Design process

Outputs: (1) Resource capacity estimates for this application alone (see form XXXCP08)
         (2) Study report or summarization letter (like XXXCP16)
         (3) View of demands on the other, more widely shared network components
         (4) Capacity estimates by component by hour of the day (XXXCP08)

Page 203
6.8.1.2 -- Location View Overview

[Figure: a location at the top, fanning out to Applications 1 through n and other applications,
each served by Access Links, FEPs, printers, tape and MICR units, and other network components.]
Figure 6.8.1.2. Location View for Data Collection, Analysis, and Forecasting

The Location View provides a perspective of resource usage and growth requirements by
location; it is readily adaptable to smaller user groups as well. A location can be either a
user location or a XXX Processing Site. Its focus is on understanding the traffic
to/from the user/application location and the contribution of each location's demands on the
overall network and immediate Access Links, FEPs, and other business attachment components
(see Section 5.3). Specific resource estimates of widely-shared resources, such as backbone
resources, are not a result of studies with this view. However, an output from the forecasting
effort should be the proportion of overall network traffic contributed by each location. This is
one input of value to the network designer when designing the overall topology (see Section 6.8.1.4).

The drivers of this view and the resulting study objectives and outputs are:

Drivers: (1) Semi-annual capacity plan updates (see Section 3)
         (2) Annual Capacity Plan (see Section 3)
         (3) Relocation of an application or its users (quarterly review)
         (4) Change to a XXX Site's application portfolio (quarterly review)
         (5) Technological change (quarterly review of XXXCP11)

Page 204
(6) Budget cycle requirement (should correspond to the annual capacity plan)
(7) Exceptions to service objectives (driven by the Problem or Performance
    Management processes)

Objectives: (1) To understand a specific user location's application and user traffic
            (2) To determine the relationship of all user locations to the total network load
            (3) To estimate the FEP and Access Link loads for a location

Outputs: (1) Identification of the key workloads responsible for each location's traffic
         (2) Traffic analysis by location
         (3) Estimate of future loads on user location FEP(s) and Access Links
         (4) Estimate of future loads on XXX Site FEP(s) and Access Links
         (5) Study of business and usage relationships (see XXXCP12 through XXXCP14)

Page 205
6.8.1.3 -- Component View Overview

[Figure: two component views -- (a) a component broken down by Locations 1 through n, each
with its workloads; (b) a component broken down by Workloads 1 through n, each with its
locations.]
Figure 6.8.1.3. Component View for Data Collection, Analysis, and Forecasting

The Component View provides a perspective of resource usage and growth requirements by
network component or group of components. It is usually driven by an exception to a
component's capacity threshold or a "yellow" flag indicating that a threshold is about to be
exceeded. The latter could be a result of measuring existing resource usage, or the output
from an Application View or Location View study. Since the focus is on an individual
component, more than processor capacity would most likely be modeled; e.g., buffers or
attachments in a 3745. These are important issues and will be outputs from a design model run
by the network designers. Input to these models will be the traffic characteristics documented
from this capacity methodology and the configuration and contingency information maintained by
the Network Design process. The information below focuses only on the Capacity Management
process deliverables: traffic characterization and growth capacity.

The drivers of this view and the resulting study objectives and outputs are:

Drivers: (1) Quarterly capacity status reports for key resources (see Section 3)
         (2) Exceptions to service objectives (driven by the Problem or Performance
             Management processes)
         (3) Exceptions to component thresholds (see Section 5.2)
         (4) Change to a XXX Site's application portfolio (quarterly review)
Page 206
(5) Technological change (quarterly review of XXXCP11)
(6) Correlation analysis (see Section 6.5)

Objectives: (1) To measure the traffic loads on a component; i.e., usage, trends, and growth
            (2) To provide capacity growth demands to the Network Design process

Outputs: (1) Component capacity demands and estimated traffic loads (see form XXXCP08)
         (2) Capacity status and trend reports (see Section 3 and Section 4)
         (3) Information for use in other studies (views)

Page 207
6.8.1.4 -- Topology View Overview

[Figure: User Locations 1 through n interconnected with FRAS Sites 1 through n across the
network topology.]

Figure 6.8.1.4. Topology View for Network Design

A fourth view as seen by the network design staff, the Topology View (figure 6.8.1.4), will use
information from the Capacity Management process. Planning relative to this view is the
responsibility of the Network Design process and involves the optimization of paths for
performance (input from Performance Management process), the redundancy of paths for
recovery (input from Recovery Management process), and capacity growth (input from the
Capacity Management process).

This view encompasses all the resources commonly shared by more than one location; i.e., those
resources where the detailed study of resource usage by application or location may not be
warranted. Resources included here will be backbone components and other routing components;
in general, all those not included in the Location View. The planning of the components and
paths within the network is the responsibility of the network designer within the Network Design
process. Capacity planners play a key part in this design because capacity growth is one of the
considerations. Also, the capacity planning methodology collects other vital pieces of
information that would contribute to the design of a network, e.g., service and operational
requirements (Form XXXCP02). However, overall policies for recovery and availability from
other Systems Management processes will play an equal or even more important role in network
design when determining contingency requirements. Thus, capacity planning only provides input
to network design; other inputs must be obtained from the other processes.

Page 208
The XXX Capacity Management process and interfaces highlighted in Section 1, figures 1.1 and
1.3, indicate the key activities performed by each process. Figure 1.2 in Section 1 provides an
overview of the inputs and outputs from the Capacity Management process and respective
processes. The definition and documentation of the processes and interfaces ensures that
activities are clearly understood and that a complete set of activities are defined to complete a
business function.

Because the XXX customer is probably interested in the total costs associated with his
application, location, or other user group, an all-inclusive (capacity, recovery, availability, etc.)
resource recommendation needs to be produced by the network designer. Form XXXCP15, page 3,
suggests the information that should be reported. In short, it should provide an acquisition
plan (new equipment requirements) and/or other alternatives for satisfying the customer's
capacity, availability, recovery, and XXX' contingency requirements. Although the capacity
planning process collects some of the necessary information relevant to availability, service, and
recovery, it focuses only on what is needed to address production use demands.

Page 209
6.8.2 -- General Considerations for Selecting and Applying
Techniques

Some of the selection criteria which should guide the capacity planner in deciding which
technique to use are:

Who is the customer of the forecast and what does he want?
What level of accuracy is required of the forecast?
What is the exposure to service levels of being wrong?
What is the impact to IT costs of being wrong?
How much time do I have to create the forecast?
How complex is the change that needs to be forecasted?
Am I collecting the right data to do the forecast?
Do I need to collect additional data to do the level of forecasting I need?

Previous subsections in Section 6 described techniques for estimating growth and for forecasting
resource loads. This methodology assumes the capacity planner is experienced with most of the
techniques previously discussed. However, the selection of the most appropriate and efficient
technique is not always made and some capacity planners rely on one technique for everything.
This subsection focuses on five general selection criteria:

Time to complete
Accuracy required
Capability of tools
Data available
Cost

Time to complete a study is only one factor that needs to be considered when selecting an
approach and techniques for estimating growth demands and forecasting resource loads. In a
"reactive" environment, the completion date is always "yesterday". This precipitates techniques
that are far from adequate for projecting resources in today's rapidly changing environments.
This methodology exists to improve XXX' proactive posture relative to capacity planning.
Therefore, if effectively implemented, time can now be considered in its proper context.

Page 210
The techniques for estimating growth demands and forecasting resource loads are discussed in
Section 6.1. There can be many combinations of techniques. Following is a list of the Data
Collection and Growth Analysis techniques and some combinations of Component Load
Forecasting techniques, relative to the time and effort it takes to complete a study:

Data Collection and Growth Analysis Techniques

Trial and Error growth numbers (New, Existing, Changes)
Heuristic growth numbers (New, Existing, Changes)
Growth based on Historical Analysis (Existing)
Growth based on Comparables Analysis (New, Changes)
Business Drivers, Correlation, and Regression Analysis (New, Existing)
Business Transactions (New, Existing)

Resource Load Forecasting Techniques

Linear Projection using Trial and Error or Heuristic growth estimates (New,
Existing, Changes)
Projections based on Business Drivers using Regression Algorithm (Existing --
when a correlation exists between driver growth and resource use)
Linear/nonlinear Projection based on Historical growth
Projections improved by Comparables Analysis and Heuristic experiences (New,
Changes)
Use of Analytical & Queuing methods regardless of how growth is estimated
(New, Existing -- Note: Use of Business Driver or Business Transaction growth is
preferred to provide a direct tie back to the business)
Use of Simulation methods (New, Existing, Changes)
Benchmarking (New, Existing, Changes)

The lists above are not all-inclusive of the many combinations of techniques that can be applied
to a particular study. But they should provide an idea of which techniques to apply based on the
time available to complete the study.
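One of the combinations listed above -- a projection based on a business driver using a regression algorithm -- can be sketched as follows. The driver and load figures are invented for illustration; in practice they would come from the correlation work described in Section 6.5 and forms XXXCP12 through XXXCP14.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Historical pairs: business driver (accounts, in thousands) vs. CPU-seconds/day
drivers = [100, 110, 125, 140, 160]
cpu_load = [4200, 4610, 5260, 5880, 6700]
slope, intercept = fit_line(drivers, cpu_load)

# The business forecasts 190 thousand accounts -> projected resource load
projected_load = slope * 190 + intercept
```

The projection is only as good as the correlation behind it, which is why this methodology stresses validating the driver/usage relationship before relying on the fitted line.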

Accuracy of the Forecast is a second factor to consider when selecting capacity planning
techniques. Is the study a 5-year budgeting study, or is its purpose to ensure that the proper
resources are available for a highly-visible and critical application when it goes into production
six months out? Accuracy is usually driven by the risk of being wrong: What will it cost the
organization if not enough resources are available? What will it cost the organization if too much
is available?

Page 211
Tool practitioners and vendors continuously challenge each other as to the accuracy of their
methods or tools for forecasting resource loads: Are simulators more accurate than analytical
tools? Are linear projections always poor predictors of resource loads? Many books and articles
are devoted to the underpinnings of these methods. A vendor's product must apply one or more of
these mathematical models. So, which is better?

Without a methodology that focuses on improving the input to a forecasting tool, the choice
should be easy -- use the least expensive in cost and time. Accuracy of forecasts depends heavily
on the ability to estimate growth. In this methodology, this is accomplished by the "business
model", i.e., the activities centered on collecting accurate business requirements and translating
them into I/T resource growth demands.

Given the proper preparation for each forecasting technique and accurate input, the choice of one
method versus another is not worth contemplating when an I/T system needs to be modeled. The
determinant will usually be the effort to accurately produce the inputs for the method and the
capability of the tool to model the workloads. An example of the latter is XXX' decision to select
the Bachman/prod1 product to model fff during its Development and Test Phase. fff was not a
simple interactive/batch application commonly addressed by most modeling tools. Instead, it was
a series of sequential and parallel batch processes driven by time schedules and the availability of
input files. The look and feel of a product become niceties, not requirements, when the
capabilities are deficient.

Within the constraints of time and the capability of data collection tools lies the granularity of
the data available for analysis. Sometimes one must work with what one has. This might lead to
the selection of a technique that is known not to be the desired one, but is an acceptable
substitute given the data and time available.

The last selection criterion is cost. Obviously, if the only funds available are normal operating
expenses, the selection of a benchmarking technique would not seem practical. If other techniques
could be completed in the same time period, with just as acceptable accuracy, then cost
constraints would be the determining factor when selecting a technique.

The capacity planner should seek the most efficient techniques that provide an acceptable
forecast.

Page 212
Section 6.8.3 -- Applying the Techniques

Suppose the task is to produce a semi-annual update to the capacity plan (see Section 4 for
general content, but note that XXX' plan may be described differently). What is needed in the
plan? Is it required to report key application resource usage - the Application View? Is there a
requirement to report location usage -- the Location View? Do only resources approaching
capacity thresholds need to be reported -- the Component View?

Suppose a major development effort is in progress, such as that of the fff project, and its second
design phase capacity planning checkpoint is due. Which views would be of importance?
Obviously the Application View, since the objective of Section 2 is to flush out the network
demands for an individual application. However, one view may prompt the need for other views.
For example, while studying the needs of a new application, it was discovered that a FEP would
exceed its processor capacity threshold. The capacity planner must now shift hats and involve the
network designer in a more thorough study of that component if it falls within the provisioning
lead-time window. Hence, a Component View would be taken.
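The lead-time check described above can be sketched as a simple projection of utilization against a threshold. The utilization, growth rate, and threshold figures below are hypothetical:

```python
def months_until_threshold(current_util, monthly_growth, threshold):
    """Months until a component's utilization first reaches its
    capacity threshold, assuming compound monthly growth."""
    months = 0
    util = current_util
    while util < threshold and months < 120:   # cap the horizon at 10 years
        util *= 1 + monthly_growth
        months += 1
    return months

# A FEP at 52% utilization, growing 3% per month, with a 70% threshold
lead = months_until_threshold(0.52, 0.03, 0.70)
# Compare 'lead' with the component's provisioning lead time: if the
# threshold will be crossed inside that window, involve the network designer.
```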

Other scenarios could be given, but these are sufficient to suggest an order for selecting
techniques:

1. Determine the desired capacity output (status, individual workload forecast, capacity plan, etc.)
2. Determine the necessary views for satisfying the outputs, and the necessary depth
3. Understand the time available, the accuracy needed, cost constraints, and the availability
   of tools and data
4. Select the data collection, analysis, and forecasting techniques that fit

Note: Business-driven techniques, as this methodology suggests, should be applied in all cases
when time permits. If time is not available, get the job done some other way, but also ask why
insufficient time was available to apply proactive business-driven techniques.

The following subsections step through each view in more depth to apply the various techniques
to the methodology.

Page 213
Section 6.8.3.1 -- Application View

Section 6.8.1.1 presented an overview of the Application View. Figure 6.8.1.1 suggests that the
depth of an Application View could include Business/Application Layer and immediate Access
Layer network components (FEPs, access lines, etc.).

For "new" applications in development prior to the Design Phase, and even while in the Design
Phase when plenty of time still exists prior to going into production, the need to do a study to the
individual component level is debatable. During this period, comparable workloads and traffic
types should be identified and studied. For example, if the new application will be transferring
files via the same utility, an understanding of that utility and network usage to transfer files
would be in order. How will the size and frequency of file transfers for the new application
compare to the existing workloads using the utility? Can an analogy be drawn and a forecast
made from this information? The comparables technique is a very viable technique during this
period.
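An analogy of this kind is largely arithmetic. The sketch below scales the measured link load of an existing file-transfer workload by the ratio of data volumes; every figure is invented for illustration.

```python
# Existing workload using the same file-transfer utility (measured)
existing_files_per_day = 400
existing_kchars_per_file = 120      # thousands of characters per file
existing_link_load_kbps = 18.0      # measured contribution to the access link

# New application: fewer but larger transfers (estimated by the developers)
new_files_per_day = 200
new_kchars_per_file = 300

# Scale the measured load by the ratio of total volume moved per day
volume_ratio = (new_files_per_day * new_kchars_per_file) / (
    existing_files_per_day * existing_kchars_per_file)
estimated_new_load_kbps = existing_link_load_kbps * volume_ratio
```

The analogy holds only while the new workload really behaves like the comparable one (same utility, similar transfer scheduling), which is why the comparable must be studied first.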

However, this does not preclude using cluster analysis techniques to understand the potential
capacity effects of various development groups, generally reflective of their development
standards or environment. Using similar coding standards and facilities should result in
relationships similar to those of existing applications. This is again a comparables technique, but
one that looks at how a development group can influence the outcome.

The application of business techniques is critical for applications still in development. It
takes very little effort to hypothesize what the business drivers are; besides, it provides the
opportunity to foster relationships with your customers. An important follow-up activity is the
development and tracking of business driver measurements. Without business driver quantification,
no correlation or regression analysis can be performed. These are the techniques that provide
large benefits in tying resource usage to the business and, if correlation confidence is established,
can produce an easy forecasting technique (projection via a regression algorithm).

Although this approach involves more effort, understanding the business transactions and
mapping them to DP transactions will provide returns worth the investment. Initiating this task
early in development, before transaction naming standards are established, can simplify the
mapping and allow reporting of usage by business transaction when the application goes into
production.

So far, no tools have been mentioned. All the techniques are driven by Section 2. However, some
tools, like the Bachman/prod1 tool, allow the development staff to model performance as early as
the Design Phase of development. To do this, the tool's inputs and prerequisites need to be
examined and fulfilled. But one must consider the selection criteria before venturing off and
using a tool. Is there time to use this tool (nothing comes free)? Is its accuracy required at this
point in time; i.e., what is the risk of being off by 30%? Tools are not a magic solution that
replaces common sense and other techniques.

Page 214
This methodology begins to produce estimated traffic demands for an application on form
XXXCP08 during the Design Phase. It forces the capacity planner toward achieving the total
Application View as represented in figure 6.8.1.1, seeking application usage by location and the
resulting estimated loads on network resources by hour of day. The information by location
becomes valuable input to both the Component and Topology View studies.

Section 6.8.1.1 summarizes the key outputs of the Application View. With this information and
information from Section 4 and Section 5, the capacity planner will be positioned to create and
deliver the reports necessary for effective capacity planning.

The XXXCP08 approach for estimating resource loads can also be utilized for existing
workloads. The objectives of XXXCP08 are to:

facilitate data collection, analysis, and forecasting of new and existing applications via
business forecasting techniques
document demands from/to the application, estimate loads on network resources, and
determine usage by location in support of various forecasting techniques

The Application View is generally selected to understand an application's cost, determine its
funding requirements, and prepare for ordering equipment. Form XXXCP15 can be used to
summarize this information for both new and existing applications from Application View
studies.

The Application View study also feeds overall Component and Topology View studies. The
application schedules (see form XXXCP01) must be consolidated with other application
schedules into a composite schedule to determine which to use in a point-in-time analysis. An
overall component study may seek the information captured on XXXCP08 for the applications
within the forecasting period, consolidating their spreadsheets with a base spreadsheet (like
XXXCP17) to determine overall estimates of resource loads. This will usually be driven by semi-
annual capacity planning updates.
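The consolidation step can be sketched as a simple summation of per-application hourly estimates into a composite profile. The application names and figures below are hypothetical, and no particular XXXCP08 or XXXCP17 spreadsheet layout is assumed.

```python
# Per-application estimated loads on a shared FEP, in transactions per
# hour, keyed by hour of day (as would be captured on XXXCP08)
app_estimates = {
    "app_a": {9: 1200, 10: 1500, 11: 1400, 14: 1100},
    "app_b": {9: 300, 10: 900, 11: 1600, 14: 400},
    "app_c": {10: 250, 11: 250, 14: 250},
}

# Consolidate into a composite hourly profile and locate the peak hour
composite = {}
for hourly in app_estimates.values():
    for hour, load in hourly.items():
        composite[hour] = composite.get(hour, 0) + load

peak_hour = max(composite, key=composite.get)
peak_load = composite[peak_hour]
```

The composite peak, not the individual application peaks, is what sizes the shared component, since the per-application peak hours may not coincide.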

Page 215
6.8.3.2 -- Location View

Section 6.8.1.2 presented an overview of the Location View. Figure 6.8.1.2 suggests that detailed
studies could include Business/Application Layer and immediate Access Layer network
components. However, information relating to each location's proportion of traffic, as measured
from its FEP, also provides valuable information for the Component and Topology Views.

The Location View is normally driven by periodic capacity reports and budgeting cycles. The
questions that typically get answered are:

What is each location's use of the network?
How will a location's use grow?
What will be the expected impact of a location's growth in I/T resources?
What is the proper I/T resource funding for each location?

Given this view, the capacity planner will strive to learn the history of growth (historical
technique), and within that perspective, the workloads contributing to the resource loads.

Forecasts of future growth for existing applications can be driven by simple projection
techniques (linear or nonlinear curves). The goal of the business-driven techniques documented
in this methodology is to find correlation (correlation analysis) between business drivers and
resource usage (business driver technique). If sufficient correlation confidence exists, regression
techniques might produce a mathematical approximation of a curve that can be used to project
future loads. Business drivers, initially hypothesized on XXXCP02, are tested for validity
through the correlation and regression techniques represented by forms XXXCP12 through
XXXCP14. These forms were developed to help clarify the techniques and will usually be
replaced by outputs and methods associated with statistical tools, like SAS, LOTUS 1-2-3, and
MICF.

Unlike projections of growth by application identified in the Application View, the Location
View is a composite of applications. Each application could be scrutinized as suggested by figure
6.8.1.2 and an assumption of composite growth made. Like the correlation and regression analysis
done for a particular application, could not the capacity planner draw some conclusions using
similar techniques applied to the location? Are there business drivers for the location that would
simplify forecasting even more? Correlation and regression techniques should be a continuous
pastime for the capacity planner. Much can be learned whether correlation exists or not.
However, if attention has not been previously given to the data needed for correlation analysis,
the time element to produce a forecast may eliminate this technique.
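Testing correlation confidence before trusting a regression projection can be sketched as follows; the driver and traffic series are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between a business driver
    series and a resource usage series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Monthly business driver (claims processed) vs. location traffic (Mchars/day)
claims = [820, 845, 900, 950, 1010, 1080]
traffic = [41, 43, 45, 48, 50, 55]
r = pearson_r(claims, traffic)
# An r near +1 supports projecting traffic from the claims forecast;
# a weak r means the hypothesized driver should be re-examined.
```

In practice this calculation would come from a statistical tool such as SAS rather than hand-rolled code; the sketch only shows what the tool is computing.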

Page 216
The Location View plays an important role in capacity status reports. Historical and current
studies of resource usage by location are a type of user group technique. In addition, the Location
View may drive studies to even more granular depths. For example, could the drivers of a
location's demands be the result of a dominant traffic type? Could file transfers be the major
determinant of capacity requirements? This would certainly be useful to know. The study of
relationships between resource usage and potential growth drivers is invaluable.
The Topology and Component Views relative to backbone modeling will seek each location's
traffic demands and estimated resource loads. These will be a derivative of Location View
analysis or a composite of XXXCP08 forms for all applications per location. In general, the
traffic to/from the network components shared by the locations must equal the traffic from/to the
FEPs. Whereas the Application View estimated the loads on the FEPs by application, the
Location View would consolidate the individual application loads into a total composite FEP
load, or the estimated load on the backbone. Thus, input to backbone studies could be network
throughput requirements, e.g., characters per second or bits per second, by location. The network
designer would then run design models (System Models) to support his Topology View and to
deliver alternative designs to satisfy the capacity requirements determined by the Capacity
Management process.
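Translating per-location character throughput into a backbone utilization figure, with a simple M/M/1 delay approximation of the sort a design model would refine, can be sketched as follows; all figures are hypothetical.

```python
# Hypothetical peak-hour traffic from each location's FEP, in characters
# per second, offered to a shared backbone link
location_cps = {"loc_1": 2400, "loc_2": 1800, "loc_3": 3100}

BITS_PER_CHAR = 8
link_capacity_bps = 256_000        # assumed backbone link speed

offered_bps = sum(location_cps.values()) * BITS_PER_CHAR
utilization = offered_bps / link_capacity_bps

# A crude M/M/1 approximation of the time for one average message to
# clear the link; valid only while utilization stays below 1
avg_message_bits = 4000
service_time_s = avg_message_bits / link_capacity_bps
avg_residence_s = service_time_s / (1 - utilization)
```

The network designer's System Models replace this single-queue view with path-level analysis, but the same per-location throughput inputs feed both.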

Page 217
6.8.3.3 -- Component View

Section 6.8.1.3 presented an overview of the Component View. Figure 6.8.1.3 suggested two
more granular subsets that may be necessary to answer questions relative to a single component
or group of network components.

The Component View is usually the driver of a study when the focus is to better understand the
capabilities and requirements of a particular component. Take for example a 3745 FEP.
Determining the processing requirements may be the initial emphasis that signals the need for
more capacity. However, when it is necessary to place an order, the internal component
configuration will also need to be determined. This is a network design function which relies on
estimated traffic loads from/to the FEP as input. Thus, the Component View will seek estimated
loads from/to the component as documented on form XXXCP08 and discussed in both Sections 2
and 3.

As with the other views, the collection and analysis techniques for the Component View are
similar. Techniques to understand the current and historical usage of the component, and to
correlate that usage with business drivers, are of prime importance. However, the forecasting
techniques become more tool-driven. The selection of spreadsheet methods, analytical and
queuing methods, or simulation will be necessary to design the individual component
configuration, or an overall topology, more accurately. The tools to model the internal
requirements of a particular component should be available. Thus, the inputs to those tools
must be captured by the capacity methodology.
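As a minimal illustration of the analytical and queuing methods mentioned above, a component such as a FEP can be approximated as a single-server M/M/1 queue to estimate response time from an arrival rate and a service rate. This is a deliberately simplified model, assuming Poisson arrivals and exponential service times; the rates used are hypothetical.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time (seconds) of an M/M/1 queue.

    arrival_rate and service_rate are in messages per second;
    the system must not be saturated (utilization < 1).
    """
    utilization = arrival_rate / service_rate
    if utilization >= 1.0:
        raise ValueError("component is saturated; more capacity is required")
    return 1.0 / (service_rate - arrival_rate)

# Hypothetical case: 80 msgs/s offered to a component that services 100 msgs/s
t = mm1_response_time(80.0, 100.0)
print(f"mean response time = {t:.3f} s")  # utilization 0.80
```

Even a rough analytical estimate like this helps decide whether a spreadsheet method suffices or whether a full simulation of the component configuration is warranted.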

Note: The intent of this methodology is to provide general procedures and metrics to allow the
capacity planner to perform capacity planning activities. It may have to be modified if more
specific tool content is desired.

Tools for modeling overall component configurations, or entire topologies, are still not perfect.
Good tools do exist, such as the simulation services provided by some vendors, but even these
ignore the importance of accurately projecting traffic demands from a Business Model. This
methodology helps the capacity planner and network designer understand the requirements
for providing better input to simulators and for selecting overall modeling tools. The
Topology View study, introduced in Section 6.8.3 as a fourth view intended for network designers
only, is beyond the scope of this version of the methodology. However, its role in overall
network design, providing alternative configurations for consideration, is essential. The Capacity
Management methodology documents the interfaces (see Section 1 and the Management Guide)
and provides the Network Design process with the necessary inputs.

