Problem Management (Gestión de Problemas)
In This Lesson
Purpose and Objective
Scope of Problem Management
Value to the Business
Principles and Basic Concepts
Techniques
Activities of Problem Management
Inputs and Outputs
Interfaces to Other Processes
Critical Success Factors, Challenges, and Risks
Definition of a Problem
Problem: the underlying cause of one or more incidents
Definition of Problem Management
Problem management is the process responsible for managing the lifecycle of problems
– Problem management seeks to trace the underlying root cause of problems and to minimize the impact of incidents caused by errors in the underlying infrastructure
Purpose of Problem Management
Trace the underlying root cause
Document known errors and communicate them
Initiate actions to improve or correct the situation
Objective of Problem Management
Prevent problems and their resultant incidents
Eliminate recurring incidents
Minimize the impact of incidents that cannot be prevented
Scope of Problem Management
All activities required to diagnose the root cause of incidents
– Diagnosis
– Documentation of known errors and work-arounds
– Resolution through appropriate control mechanisms
• Change management
• Release and deployment management
Problem management will also maintain information about problems to reduce the number and impact of incidents over time
Reactive vs. Proactive Problem Management
Problem management has both reactive and proactive components
Reactive: concerned with solving problems in response to one or more incidents
Proactive: concerned with solving problems and known errors before further incidents related to them can occur again
Proactive Problem Management
Trending of incident data
Major incident reviews
Review of operational logs and maintenance records to identify patterns
Periodic review of event logs
Brainstorming sessions
Using check sheets to proactively collect data on service or operational quality issues to detect underlying problems
A close relationship exists between problem management and continual service improvement
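Trending of incident data can be sketched in a few lines. The example below is only an illustration: the incident records, the category names, and the "strictly rising month over month" rule are assumptions, not part of any ITIL definition or tool.

```python
from collections import Counter

# Hypothetical incident records: (month, category). The values are
# illustrative assumptions, not taken from a real service desk tool.
INCIDENTS = [
    ("2024-01", "email"),
    ("2024-01", "network"),
    ("2024-02", "email"),
    ("2024-02", "email"),
    ("2024-03", "email"),
    ("2024-03", "email"),
    ("2024-03", "email"),
    ("2024-03", "printing"),
]

def monthly_counts(incidents, category):
    """Count incidents of one category per month, ordered by month."""
    counts = Counter(month for month, cat in incidents if cat == category)
    months = sorted({month for month, _ in incidents})
    return [counts[month] for month in months]

def is_rising(counts):
    """True if the counts strictly increase from month to month."""
    return len(counts) >= 2 and all(a < b for a, b in zip(counts, counts[1:]))

def candidate_problems(incidents):
    """Categories whose incident volume keeps rising (a crude trend signal)."""
    categories = sorted({cat for _, cat in incidents})
    return [c for c in categories if is_rising(monthly_counts(incidents, c))]

print(candidate_problems(INCIDENTS))
```

A category flagged this way is only a candidate; whether a problem record is raised remains a judgment for problem management.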
Value to the Business
Higher
– Availability of services by reducing the duration and number of incidents
– Productivity of IT staff by reducing unplanned labor costs caused by incidents
Reduced
– Expenditures on work-arounds or fixes that do not work
– Cost of effort in fire-fighting or resolving recurring incidents
Principles and Basic Concepts
Problem models
– Many problems are unique and require individual handling
– Some incidents may recur because of underlying problems that still exist
– Creating a known error record in the known error database (KEDB) will ensure quicker diagnosis
– A problem model for these types of problems may be useful
Incidents versus problems
– Incidents and problems are closely related but distinctly different
– Incidents never become problems
Rules for invoking the problem process in response to an incident can vary, but may include:
– Incident management cannot match an incident to a problem or known error
– Trend analysis reveals a problem might exist
– A major incident has occurred
– Other IT functions determine a problem exists
– The service desk has resolved an incident but has not determined the underlying cause
– Notification from a supplier that a problem exists
Techniques
Chronological analysis
Pain value analysis

Kepner and Tregoe analysis
Brainstorming
5 why analysis
Fault isolation
Affinity mapping
Hypothesis testing
Technical observation post
Ishikawa diagrams
Pareto analysis
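As an illustration of one technique from this list, Pareto analysis ranks causes by frequency and keeps the few that account for most incidents. The sketch below assumes an 80 percent cut-off and made-up cause names; both are assumptions for the example.

```python
from collections import Counter

def pareto(causes, threshold_pct=80):
    """Return the most frequent causes that together reach threshold_pct
    percent of all recorded incidents, as (cause, count) pairs."""
    ranked = Counter(causes).most_common()
    total = sum(count for _, count in ranked)
    selected, running = [], 0
    for cause, count in ranked:
        # Stop once the causes already selected cover the threshold.
        if running * 100 >= threshold_pct * total:
            break
        selected.append((cause, count))
        running += count
    return selected

# Hypothetical cause labels attached to ten incident records.
causes = (
    ["disk full"] * 5
    + ["expired certificate"] * 3
    + ["bad patch"]
    + ["user error"]
)
print(pareto(causes))
```

With this sample data, two of the four causes cover eight of the ten incidents, so those two would be investigated first.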
Techniques
Problem situations in which these techniques help identify the root cause:
– Complex problems where a sequence of events needs to be assembled to determine exactly what happened
– Uncertainty over which problems should be addressed first
– Uncertainty whether a presented root cause is truly the root cause
– Intermittent problems that appear to come and go and cannot be recreated or repeated in a test environment
– Uncertainty over where to start for problems that appear to have multiple causes
– Struggling to identify the exact point of failure for a problem
– Uncertainty over where to start when trying to find the root cause
Activities of Problem Management
Problem Management Process
Detection
Logging
Categorization
Prioritization
Investigation and diagnosis
Work-arounds and known errors
Resolution
Review and closure
Major problem review
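The activities above can be sketched as an ordered list of stages. Treating the flow as strictly linear, and running the major problem review only when a problem is flagged as major, are simplifying assumptions for this example; the function name is hypothetical.

```python
# Stage names follow the process list above. A strictly linear flow is a
# simplifying assumption made for illustration.
STAGES = [
    "Detection",
    "Logging",
    "Categorization",
    "Prioritization",
    "Investigation and diagnosis",
    "Work-arounds and known errors",
    "Resolution",
    "Review and closure",
    "Major problem review",
]

def next_stage(current, is_major=False):
    """Return the stage after `current`, or None when the lifecycle ends."""
    position = STAGES.index(current)  # raises ValueError for unknown stages
    if position == len(STAGES) - 1:
        return None
    following = STAGES[position + 1]
    # Assumption: only major problems go on to a major problem review.
    if following == "Major problem review" and not is_major:
        return None
    return following

print(next_stage("Detection"))
```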
Detection
Reactive
– Suspicion of a root cause of one or more incidents at the service desk
– Analysis of incident data by a technical support group
– Automated detection of an infrastructure fault
– Notification from a supplier
Proactive
– Analysis of incidents
– Trending of historical incidents
– Activities taken to improve the quality of a service result in the need to raise a problem record to identify further improvement actions
Problem Logging and Categorization
User details
Service details
Equipment details
Date/time initially logged
Priority and categorization details
Incident description
Incident record numbers or other cross-reference
Details of all diagnostic or attempted recovery actions taken
Problem Logging and Categorization
Problems should be categorized using the same coding system as incidents so that the true nature of the problem can be traced and meaningful management information is available
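A problem record holding the details listed above might look like the following sketch. The class name, field names, and category codes are hypothetical; the one point taken from the slides is that problems and incidents share a single coding system, modeled here as one shared set of codes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical category codes shared by incident and problem records.
CATEGORY_CODES = {"hardware", "software", "network"}

@dataclass
class ProblemRecord:
    user_details: str
    service_details: str
    equipment_details: str
    logged_at: datetime
    priority: int
    category: str
    incident_description: str
    incident_ids: List[str] = field(default_factory=list)
    actions_taken: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject categories that are not in the shared coding system.
        if self.category not in CATEGORY_CODES:
            raise ValueError(f"unknown category code: {self.category}")

    def link_incident(self, incident_id):
        """Cross-reference an incident record, ignoring duplicates."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)

record = ProblemRecord(
    user_details="Finance team",
    service_details="Payroll",
    equipment_details="Application server",
    logged_at=datetime(2024, 3, 1, 9, 30),
    priority=2,
    category="software",
    incident_description="Repeated timeouts when running payroll reports",
)
record.link_incident("INC-1001")
record.link_incident("INC-1001")
print(record.incident_ids)
```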
Prioritization
Problems are prioritized in the same way and for the same reasons as incidents
– Priority is determined by impact and urgency
Things to consider
– Can the system be recovered or does it need to be replaced?
– How much will it cost?
– How many people and what skills are required?
– How long will it take to fix the problem?
– How extensive is the problem? (For example, how many CIs are affected?)
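Deriving priority from impact and urgency is often done with a lookup matrix. The three levels and the 1 to 5 priority scale below are assumptions chosen for the example; each organization agrees its own matrix.

```python
# Hypothetical matrix: (impact, urgency) -> priority, 1 being most urgent.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("high", "low"): 3,
    ("medium", "high"): 2,
    ("medium", "medium"): 3,
    ("medium", "low"): 4,
    ("low", "high"): 3,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}

def priority(impact, urgency):
    """Look up the priority for a given impact and urgency level."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(f"unknown impact/urgency: {impact}/{urgency}") from None

print(priority("high", "medium"))
```

The "things to consider" questions above (cost, skills, time, number of CIs affected) would feed the impact and urgency ratings rather than appear in the matrix itself.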
Investigation and Diagnosis
Investigation requires the use of the configuration management system (CMS) and the KEDB
Problem solving techniques
Chronological analysis
Pain value analysis
Brainstorming
5 why analysis
Fault isolation
Affinity mapping
Hypothesis testing
Technical observation post
Ishikawa diagrams
Pareto analysis
Raising Work-arounds and Known Errors
Work-arounds are a temporary way of restoring service or minimizing the business impact
– Example: restarting a service on a server that has failed
Known errors are problems with a documented root cause
– Example: a defect in the configuration of an application causes user password lockouts
Work-arounds and known errors should be documented and communicated to support personnel and the service desk to assist in the incident management process. A known error should be raised as soon as diagnosis is complete, particularly where a work-around is found, or whenever it is useful to do so
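How a documented known error helps incident management can be sketched as a simple lookup. Everything here is hypothetical: the record layout, the identifier, the keyword matching, and the work-around text. Only the password lockout example comes from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KnownError:
    error_id: str
    symptoms: List[str]  # keywords expected in the incident description
    root_cause: str
    workaround: str

# A one-entry, in-memory stand-in for a KEDB.
KEDB = [
    KnownError(
        error_id="KE-001",
        symptoms=["password", "lockout"],
        root_cause="Defect in the configuration of the application",
        workaround="Unlock the account manually",  # hypothetical work-around
    ),
]

def find_known_error(description: str, kedb: List[KnownError]) -> Optional[KnownError]:
    """Return the first known error whose symptom keywords all appear
    in the incident description, or None if nothing matches."""
    text = description.lower()
    for entry in kedb:
        if all(keyword in text for keyword in entry.symptoms):
            return entry
    return None

match = find_known_error("User reports password LOCKOUT after login", KEDB)
print(match.workaround if match else "No known error; consider raising a problem")
```

When no entry matches, the incident falls under the first rule for invoking the problem process: incident management cannot match it to a problem or known error.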
Resolution
Once a root cause has been determined and a solution to remove it has been developed, the solution should be applied
– RFCs should be raised and authorized before deployment of the resolution
– There may be problems for which a business case for the resolution cannot be justified (low impact and high cost)
• In this case the problem remains open and using the work-around becomes the approach for handling incident recurrence
Review and Closure
Problems are formally closed when a final resolution has been applied
A review of major problems should be conducted to learn any lessons for the future
– What was done correctly?
– What was done wrong?
– What could be done better in the future?
– How can we prevent recurrence?
– Are there any third-party responsibilities or follow-up actions?
Knowledge gained from this review should be incorporated into the service review meeting with the business customer
Inputs to Problem Management
Incident records that have triggered the problem management process
Incident reports and histories that will be used to support proactive problem management
Information about CIs and their services
Communication and feedback about incidents and their symptoms
Communication and feedback about RFCs and releases that have been implemented or planned for implementation
Communication of events that were triggered from event management
Operational and service level objectives
Customer feedback on success of problem resolution activities and overall quality of problem management activities
Agreed criteria for prioritization and escalation of problems
Output of risk management and risk assessment activities
Outputs of Problem Management
Resolved problems and actions taken to achieve their resolution
Updated problem management records with accurate problem detail and history
RFCs to remove infrastructure errors
Work-arounds for incidents
Known error records
Problem management reports
Output and improvement recommendations from major problem review activity
Interfaces to Other Processes
Service Strategy
– Financial management for IT services
Service Design
– Availability management
– Capacity management
– IT Service continuity management
– Service level management
Service Transition
– Change management
– Service asset and configuration management
– Release and deployment management
– Knowledge management
Continual Service Improvement
– The seven step improvement process
Problem Management CSFs and KPIs
CSF: Minimize the impact to the business of incidents that cannot be prevented
– Number of known errors added to the KEDB
– Percentage of accuracy of the KEDB
– Percentage of incidents closed by the service desk without reference to other levels of support
– Average incident resolution time for those incidents linked to problem records
CSF: Maintain quality of IT services through elimination of recurring incidents
– Total number of problems (as a control measure)
– Size of current problem backlog for each IT service
– Number of repeat incidents for each IT service
CSF: Provide overall quality and professionalism of problem handling activities to maintain business confidence in IT capabilities
– The number of major problems (opened, closed, and backlogged)
– Percentage of major problem reviews successfully completed and on time
– Number and percentage of problems incorrectly assigned
– Number and percentage of problems incorrectly categorized
– Backlog of outstanding problems and trends
– Number and percentage of problems that exceeded target resolution time
– Percentage of problems resolved within SLA targets
– Average cost per problem
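A few of these KPIs can be computed directly from problem records. The record fields and sample values below are hypothetical. Using resolved problems as the denominator for the "within target" percentage is also an assumption; the slides do not define it.

```python
def percentage(part, whole):
    """Percentage rounded to one decimal place; 0.0 when there is no data."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)

# Hypothetical problem records.
PROBLEMS = [
    {"resolved": True, "hours_to_resolve": 30, "target_hours": 40, "miscategorized": False},
    {"resolved": True, "hours_to_resolve": 50, "target_hours": 40, "miscategorized": True},
    {"resolved": True, "hours_to_resolve": 10, "target_hours": 40, "miscategorized": False},
    {"resolved": False, "hours_to_resolve": None, "target_hours": 40, "miscategorized": False},
]

def problem_kpis(problems):
    """Compute a small subset of the KPIs listed above."""
    resolved = [p for p in problems if p["resolved"]]
    within_target = [p for p in resolved if p["hours_to_resolve"] <= p["target_hours"]]
    miscategorized = [p for p in problems if p["miscategorized"]]
    return {
        "total_problems": len(problems),
        "backlog": len(problems) - len(resolved),
        "pct_resolved_within_target": percentage(len(within_target), len(resolved)),
        "pct_incorrectly_categorized": percentage(len(miscategorized), len(problems)),
    }

print(problem_kpis(PROBLEMS))
```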
Challenges
An effective incident management process must be in place
Problem resolution staff need the skills and capabilities to identify the true root cause
The ability to relate incidents to problems
The ability to integrate problem management activities with the CMS to determine the relationships between CIs and to refer to the history of CIs when performing problem support activities
Problem management must be able to use all available knowledge and service asset and configuration management resources
Ongoing training of technical staff in the technical aspects of their job as well as the business implications of the services they support
Good working relationships between first-, second-, and third-line staff
Business impact must be well understood

Risks
Being inundated with problems that cannot be handled within defined timescales
Problems being bogged down because of inadequate support tools
Lack of adequate or timely information sources
Problem staff not adequately trained to investigate problems, find their root cause, or identify appropriate actions to remove errors
Mismatches in objectives or actions because of poorly aligned or non-existent OLAs and/or UCs
What We Covered
Purpose and Objective
Scope of Problem Management
Value to the Business
Principles and Basic Concepts
Techniques
Activities of Problem Management
Inputs and Outputs
Interfaces to Other Processes
Critical Success Factors, Challenges, and Risks
