
Problem Management

In This Lesson

• Purpose and Objective
• Scope of Problem Management
• Value to the Business
• Principles and Basic Concepts
• Techniques
• Activities of Problem Management
• Inputs and Outputs
• Interfaces to Other Processes
• Critical Success Factors, Challenges, and Risks

Definition of a Problem

Problem
The underlying cause of one or more incidents

Definition of Problem Management

Problem management is the process responsible for managing the lifecycle of problems
– Problem management seeks to trace the underlying root cause of problems and to minimize the impact of incidents caused by errors in the underlying infrastructure

Purpose of Problem Management

Purpose
– Trace the underlying root cause
– Document known errors and communicate them
– Initiate actions to improve or correct the situation

Objective of Problem Management

Objective
– Prevent problems and their resultant incidents
– Eliminate recurring incidents
– Minimize the impact of incidents that cannot be prevented

Scope of Problem Management

All activities required to diagnose the root cause of incidents
– Diagnosis
– Documentation of known errors and work-arounds
– Resolution through appropriate control mechanisms
• Change management
• Release and deployment management

Problem management also maintains information about problems to reduce the number and impact of incidents over time

Reactive vs. Proactive Problem Management

Problem management has both reactive and proactive components

Reactive
– Concerned with solving problems in response to one or more incidents

Proactive
– Concerned with solving problems and known errors before further incidents related to them can occur again

Proactive Problem Management

Trending of incident data

Major incident reviews

Review of operational logs and maintenance records to identify patterns

Periodic review of event logs

Brainstorming sessions

Using check sheets to proactively collect data on service or operational quality issues to detect underlying problems

A close relationship exists between problem management and continual service improvement

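Trending of incident data can be sketched as a count of incidents per category, with frequently recurring categories flagged as candidates for a problem record. This is a minimal illustration only: the incident records, the use of a category field, and the threshold of 3 are assumptions, not part of the lesson.

```python
from collections import Counter

# Hypothetical incident log as (incident id, category) pairs; values are made up.
INCIDENTS = [
    ("INC001", "email"),
    ("INC002", "vpn"),
    ("INC003", "email"),
    ("INC004", "printing"),
    ("INC005", "email"),
    ("INC006", "vpn"),
]

def recurring_categories(incidents, threshold=3):
    """Return (category, count) pairs with at least `threshold` incidents, most frequent first."""
    counts = Counter(category for _, category in incidents)
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]

# Categories that recur often enough to justify a closer look.
print(recurring_categories(INCIDENTS))
```

Lowering the threshold widens the set of candidates; a suitable value depends on incident volume and is a local decision.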
Value to the Business
Higher
– Availability of services by reducing the duration and number of incidents
– Productivity of IT staff by reducing unplanned labor costs caused by incidents

Reduced
– Expenditures on work-arounds or fixes that do not work
– Cost of effort in fire-fighting or resolving recurring incidents

Principles and Basic Concepts

Problem models
– Many problems are unique and require individual handling
– Some incidents may recur because of underlying problems
– Creating a known error record in the known error database
(KEDB) will ensure quicker diagnosis

– A problem model for these types of problems may be useful

Principles and Basic Concepts

Incidents versus problems


– Incidents and problems are closely related but they are distinctly
different

– Incidents never become problems


Principles and Basic Concepts

Rules for invoking the problem process in response to an incident can vary, but may include:
– Incident management cannot match an incident to a problem or known error
– Trend analysis reveals a problem might exist
– A major incident has occurred
– Other IT functions determine a problem exists
– The service desk has resolved an incident but has not determined the underlying cause
– Notification from a supplier that a problem exists
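The rules above can be expressed as a simple checklist evaluated for each incident. The sketch below is one possible encoding; the class, field names, and reason strings are hypothetical, and a real organization would define its own rules.

```python
from dataclasses import dataclass

@dataclass
class IncidentContext:
    """Facts known about an incident; every field name here is hypothetical."""
    matched_to_problem_or_known_error: bool = True
    trend_suggests_problem: bool = False
    is_major_incident: bool = False
    flagged_by_other_it_function: bool = False
    resolved_without_root_cause: bool = False
    supplier_reported_problem: bool = False

def reasons_to_invoke_problem_process(ctx):
    """Return the rules from the list above that this incident satisfies."""
    checks = [
        (not ctx.matched_to_problem_or_known_error, "no matching problem or known error"),
        (ctx.trend_suggests_problem, "trend analysis suggests a problem"),
        (ctx.is_major_incident, "major incident"),
        (ctx.flagged_by_other_it_function, "another IT function reported a problem"),
        (ctx.resolved_without_root_cause, "resolved without a known underlying cause"),
        (ctx.supplier_reported_problem, "supplier notification"),
    ]
    return [reason for triggered, reason in checks if triggered]

example = IncidentContext(is_major_incident=True, resolved_without_root_cause=True)
print(reasons_to_invoke_problem_process(example))
```

An empty result means none of the listed rules applies, so the incident would not by itself trigger the problem process.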

Techniques
Chronological analysis

Pain value analysis


Kepner and Tregoe analysis
Brainstorming
5 why analysis
Fault isolation
Affinity mapping
Hypothesis testing
Technical observation post
Ishikawa diagrams

Pareto analysis

Techniques

Situations in which these techniques are applied
– Complex problems where a sequence of events needs to be assembled to determine exactly what happened
– Uncertainty over which problems should be addressed first
– Uncertainty whether a presented root cause is truly the root cause
– Intermittent problems that appear to come and go and cannot be recreated or repeated in a test environment
– Uncertainty over where to start for problems that appear to have multiple causes
– Struggle to identify the exact point of failure for a problem
– Uncertainty over where to start when trying to find the root cause

Activities of Problem Management

– Detection
– Logging
– Categorizing
– Prioritizing
– Investigation and diagnosis
– Work-arounds and known errors
– Resolution
– Review and closure
– Major problem review

Problem Management
Process

Detection

Logging

Categorization

Prioritization

Investigation and diagnosis

Work-arounds and
known errors

Resolution

Review and Closure

Major Problem Review
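The process steps can be modeled as an ordered sequence. The sketch below assumes a strictly linear flow for simplicity; in practice a problem may loop back (for example, from resolution to further investigation), and the names used here are illustrative.

```python
from enum import IntEnum

class Activity(IntEnum):
    """Process activities in the order listed on the slide."""
    DETECTION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INVESTIGATION_AND_DIAGNOSIS = 5
    WORKAROUNDS_AND_KNOWN_ERRORS = 6
    RESOLUTION = 7
    REVIEW_AND_CLOSURE = 8
    MAJOR_PROBLEM_REVIEW = 9

def next_activity(current):
    """Return the activity that follows `current`, or None after the last one."""
    if current is Activity.MAJOR_PROBLEM_REVIEW:
        return None
    return Activity(current + 1)

print(next_activity(Activity.DETECTION).name)
```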

Detection

Reactive
– Suspicion of a root cause of one or more incidents at the service desk
– Analysis of incident data by a technical support group
– Automated detection of an infrastructure fault
– Notification from a supplier

Proactive
– Analysis of incidents
– Trending of historical incidents
– Activities taken to improve the quality of a service result in the need to raise a problem record to identify further improvement actions


Problem Logging and Categorization
User details

Service details

Equipment details

Date/time initially logged

Priority and categorization details

Incident description

Incident record numbers or other cross reference

Details of all diagnostic or attempted recovery actions taken
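The fields listed above could be captured in a simple record type. This is a sketch under assumptions: the class name, field names, types, and sample values are hypothetical and do not reflect any particular service management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ProblemRecord:
    """One problem record holding the details listed on the slide."""
    user_details: str
    service_details: str
    equipment_details: str
    logged_at: datetime
    priority: str
    category: str
    incident_description: str
    related_incident_ids: List[str] = field(default_factory=list)
    actions_taken: List[str] = field(default_factory=list)

# Sample values are invented for illustration.
record = ProblemRecord(
    user_details="Finance team",
    service_details="Email",
    equipment_details="Mail server 01",
    logged_at=datetime(2024, 1, 15, 9, 30),
    priority="2",
    category="software",
    incident_description="Mail delivery delayed by several hours",
    related_incident_ids=["INC001", "INC003"],
)
record.actions_taken.append("Restarted mail service")
print(record.related_incident_ids, record.actions_taken)
```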

Problem Logging and Categorization

Problems should be categorized using the same coding system as incidents so that the true nature of the problem can be traced and meaningful management information is available

Prioritization

Problems are prioritized in the same way and for the same reasons as incidents
– Priority is determined by impact and urgency

Things to consider
– Can the system be recovered or does it need to be replaced?
– How much will it cost?
– How many people and what skills are required?
– How long will it take to fix the problem?
– How extensive is the problem? (i.e., how many CIs are affected?)

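Deriving priority from impact and urgency is often done with a lookup matrix. The sketch below assumes three levels for each dimension and a five-point priority scale computed as their sum; both the scale and the formula are assumptions for illustration, not a prescribed scheme.

```python
# Assumed three-level scale: 1 is the most severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}

def priority(impact, urgency):
    """Map impact and urgency ("high", "medium", "low") to a priority from 1 (highest) to 5 (lowest)."""
    return LEVELS[impact] + LEVELS[urgency] - 1

# A high-impact but low-urgency problem lands in the middle of the scale.
print(priority("high", "low"))
```

The cost, skills, and time questions listed above would then inform how the resulting priority is scheduled, which this sketch does not attempt to model.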
Investigation and Diagnosis

Investigation requires the use of the CMS and KEDB

Problem solving techniques
– Chronological analysis
– Pain value analysis
– Kepner and Tregoe analysis
– Brainstorming
– 5 why analysis
– Fault isolation
– Affinity mapping
– Hypothesis testing
– Technical observation post
– Ishikawa diagrams
– Pareto analysis
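As one example from the list, Pareto analysis ranks causes by how often they occur and tracks the cumulative share, so that effort can go to the few causes behind most incidents. The sketch below is illustrative; the cause labels and counts are invented.

```python
from collections import Counter

def pareto_table(causes):
    """Return (cause, count, cumulative percent) rows, most frequent cause first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows, running = [], 0
    for cause, n in counts.most_common():
        running += n
        rows.append((cause, n, round(100 * running / total, 1)))
    return rows

# Hypothetical cause attributed to each of ten incidents.
CAUSES = ["disk full"] * 5 + ["expired certificate"] * 3 + ["bad config"] * 2

for row in pareto_table(CAUSES):
    print(row)
```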

Raising Work-arounds and Known Errors

Work-arounds are a temporary way of restoring service or minimizing the business impact
– Example: restarting a service on a server that has failed

Known errors are problems with a documented root cause
– Example: a defect in the configuration of an application causes user password lockouts

Work-arounds and known errors should be documented and communicated to support personnel and the service desk to assist in the incident management process. A known error should be raised as soon as diagnosis is complete, particularly where a work-around is found, or whenever it is useful to do so
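To illustrate how documented known errors assist incident management, the sketch below matches an incident description against a small in-memory known error list and returns the first match. The record structure, the keyword matching rule, and the entries (including the second root cause) are assumptions; a real KEDB would use richer search.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class KnownError:
    """A known error entry; field names are hypothetical."""
    error_id: str
    symptoms: FrozenSet[str]
    root_cause: str
    workaround: str

KEDB: List[KnownError] = [
    KnownError("KE001", frozenset({"password", "lockout"}),
               "Defect in the application configuration",
               "Unlock the account manually"),
    KnownError("KE002", frozenset({"service", "stopped"}),
               "Memory leak in the service (invented example)",
               "Restart the service on the server"),
]

def find_known_error(description: str, kedb: List[KnownError]) -> Optional[KnownError]:
    """Return the first known error whose symptom keywords all appear in the description."""
    words = set(description.lower().split())
    for entry in kedb:
        if entry.symptoms <= words:
            return entry
    return None

match = find_known_error("User reports password lockout after login", KEDB)
print(match.workaround if match else "No known error found")
```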

Resolution

Once a root cause has been determined and a solution to remove it has been developed, the solution should be applied

– RFCs should be raised and authorized before deployment of the resolution

– There may be problems for which a business case for the resolution cannot be justified (low impact and high cost)
• In this case the problem remains open and the work-around is used whenever the incident recurs

Review and Closure

Problems are formally closed when a final resolution has been applied

A review of major problems should be conducted to learn any lessons for the future
– What was done correctly?
– What was done wrong?
– What could be done better in the future?
– How can we prevent recurrence?
– Are there any third-party responsibilities or follow-up actions?

Knowledge gained from this review should be incorporated into the service review meeting with the business customer

Inputs to Problem Management

Incident records that have triggered the problem management process

Incident reports and histories that will be used to support proactive problem management

Information about CIs and their services

Inputs to Problem Management

Communication and feedback about incidents and their symptoms

Communication and feedback about RFCs and releases that have been implemented or planned for implementation

Communication of events that were triggered from event management

Inputs to Problem Management

Operational and service level objectives

Customer feedback on success of problem resolution activities and overall quality of problem management activities

Agreed criteria for prioritization and escalation of problems

Output of risk management and risk assessment activities

Outputs of Problem Management

Resolved problems and actions taken to achieve their resolution

Updated problem management records with accurate problem detail and history

RFCs to remove infrastructure errors

Outputs of Problem Management

Work-arounds for incidents

Known error records

Problem management reports

Output and improvement recommendations from major problem review activity

Interfaces to Other Processes

Service Strategy
– Financial management for IT services
Service Design
– Availability management
– Capacity management
– IT Service continuity management
– Service level management
Service Transition
– Change management
– Service asset and configuration management
– Release and deployment management
– Knowledge management
Continual Service Improvement
– The seven step improvement process

Problem Management CSFs and KPIs

Minimize the impact to the business of incidents that cannot be prevented
– Number of known errors added to the KEDB
– Percentage accuracy of the KEDB
– Percentage of incidents closed by the service desk without reference to other levels of support
– Average incident resolution time for those incidents linked to problem records

Problem Management CSFs and KPIs

Maintain quality of IT services through elimination of recurring incidents
– Total number of problems (as a control measure)
– Size of current problem backlog for each IT service
– Number of repeat incidents for each IT service

Problem Management CSFs and KPIs

Provide overall quality and professionalism of problem handling activities to maintain business confidence in IT capabilities
– The number of major problems (opened, closed, and backlogged)
– Percentage of major problem reviews completed successfully and on time
– Number and percentage of problems incorrectly assigned
– Number and percentage of problems incorrectly categorized
– Backlog of outstanding problems and trends
– Number and percentage of problems that exceeded target resolution time
– Percentage of problems resolved within SLA targets
– Average cost per problem
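Several of these KPIs are simple counts and ratios over problem records. The sketch below computes a few of them from invented sample data; the record fields, and the choice to measure "resolved within target" against resolved problems only, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    """Minimal problem record for KPI purposes; field names are hypothetical."""
    problem_id: str
    resolved: bool
    days_open: int
    target_days: int

def problem_kpis(problems):
    """Compute total, backlog, and target-related percentages for a list of problems."""
    resolved = [p for p in problems if p.resolved]
    within = [p for p in resolved if p.days_open <= p.target_days]
    exceeded = [p for p in problems if p.days_open > p.target_days]
    return {
        "total_problems": len(problems),
        "backlog": len(problems) - len(resolved),
        "pct_resolved_within_target": round(100 * len(within) / len(resolved), 1) if resolved else 0.0,
        "pct_exceeded_target": round(100 * len(exceeded) / len(problems), 1) if problems else 0.0,
    }

SAMPLE = [
    Problem("PRB1", True, 3, 5),
    Problem("PRB2", True, 10, 5),
    Problem("PRB3", False, 2, 5),
    Problem("PRB4", False, 8, 5),
]
print(problem_kpis(SAMPLE))
```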

Challenges

An effective incident management process must be in place

Problem resolution staff need the skills and capabilities to identify the true root cause

The ability to relate incidents to problems

The ability to integrate problem management activities with the CMS to determine the relationships between CIs and to refer to the history of CIs when performing problem support activities

Challenges

Problem management must be able to use all available knowledge and service asset and configuration management resources

Ongoing training of technical staff in the technical aspects of their job as well as the business implications of the services they support

Good working relationships between first-, second-, and third-line staff

Business impact must be well understood


Risks

Being inundated with problems that cannot be handled within defined timescales

Problems being bogged down because of inadequate support tools

Lack of adequate or timely information sources

Problem staff not adequately trained to investigate problems, find their root cause, or identify appropriate actions to remove errors

Mismatches in objectives or actions because of poorly aligned or non-existent OLAs and/or UCs

What We Covered

• Purpose and Objective
• Scope of Problem Management
• Value to the Business
• Principles and Basic Concepts
• Techniques
• Activities of Problem Management
• Inputs and Outputs
• Interfaces to Other Processes
• Critical Success Factors, Challenges, and Risks
