
The Art of Problem Solving as a DBA

James F. Koopmann
Founder & President

jkoopmann@pinehorse.com
www.pinehorse.com

www.pinehorse.com www.dbcorral.com
James F. Koopmann
Where to Find Me on the NET
▪ N years of IT / core RDBMS experience
▪ Speaker
▪ Oracle (V6-10g), 8i & 9i OCP
▪ Contributing author

Activity                                              Frequency
Technical articles                                    2 / month
Forum expert / moderator                              Daily
Database-centric vendor reviews                       As needed
Columnist (new column coming in March)                1 / month
Database general-interest issues                      Bimonthly
Blogger (SQL Script-a-Day)                            Daily (in March/April)
Blogger (An Expert’s Guide to Database Solutions)     3-5 / week
Various ghost writing                                 ???
Various technical publications                        1-2 / year
================
                                                      10 / month


The Art of Problem Solving as a DBA
Outline

1. Communication
2. DBA
3. DBA Support / Problem Flow
1. Finding
2. Recording
3. Escalation
4. Solving
5. Deployment
6. Closure
4. End User Component
5. Alignment with Business Objectives
The Art of Problem Solving as a DBA
Reasoning – Why Are You Here?

▪ I’m tired of getting beat up
▪ There is a difference between what you have and what you want
▪ You have been thrown into the fire
  • Do you have the skills?
  • Do you need the skills?
  • What skills do you need?
▪ Do I need to be an expert?
▪ I’m a newbie; can I solve problems?
▪ What does my company want me to do?
The Art of Problem Solving as a DBA
DBA vs. the Corporate Food Chain
▪ Past / Current
  • Watch your system
    – CPU levels / Disk capacity / I/O levels / SQL Executions / …
  • We have learned ALL TOO WELL
  • Nut-n-Bolt / tinkering individuals
  • Prefer Fight OVER Solution
  • We hold the keys
  • Black box approach
▪ Current / Future
  • Automation of administrative tasks
  • Toolsets produce “Smarter” / Faster DBAs
  • Databases are less threatening
  • End Users want more
  • DBAs get pushed closer to End User community
  • Corporate Attitude
Times, they are a-changin’
DBA Support / Problem Flow
Communication

Honey, I ain’t drink’n no more

, I ain’t drink’n NO LESS

He who holds his tongue is wise


DBA Support / Problem Flow
Communication
[Figure: chart relating Stress Levels to Effectiveness]
Danger lurks around every corner
DBA Support / Problem Flow
Entities Involved
DBA (Solution Provider)      Customer
1. Gate Keeper               1. End-User
2. Developer                 2. Developer
3. Report Writer             3. Manager
4. System Analyst            4. C-Level
5. Gofer                     5. Divisional
                             6. Departmental
                             7. Machine
                             8. Database

DBA Support / Problem Flow
Impose simple task management (level 0)

Basic information flow
• Issues come in
• Solutions go out
• Verification occurs
• Issues get closed

[Diagram: DBA and Customer connected through process 1, "Track Issue / Solution", with flows labeled Issue, Solution, and Issue Verification.]

DBA Support / Problem Flow
Impose simple task management (level 1)
1.1 Problem Finding
1.2 Problem Recorded
1.3 Problem Escalation
1.4 Problem Solving
1.5 Solution Deployment
1.6 Problem Closure

[Diagram: the six steps sit between the DBA and the Customer, with flows labeled Issue, Solution, Research, Problem, Severity, Expert Knowledge, Notification, and Issue Verification.]
Finding Problems
Non-Customer-Facing DBA
▪ Your actions are reactive, not proactive, in nature
▪ The events that trigger an investigation are often very specific to an incident, narrow in scope, and the solutions typically do not take the full health of a database into consideration
▪ Most of your time is spent in problem detection, not problem solving
▪ Because of the time wasted in detecting the problem, using this method inherently wastes money
▪ Customers / users drive the work flow of the database administrators
▪ Database administration group is seen as ineffective

Finding Problems
Customer-Facing DBA
▪ User complaints are circumvented and drastically reduced
▪ Time searching for solutions is reduced
▪ Database administration group is seen as being very effective, and as trusted custodians of corporate information
▪ Allows database administrators to work on more strategic issues that do affect the bottom line and direction of the company
▪ You finally have confirmation that it is possible to find problems before they happen
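
Finding problems before they happen usually starts with comparing what you already watch (CPU, disk, I/O) against limits you have agreed on. The following is a minimal sketch of that idea. The metric names, the threshold values, and the sample readings are all hypothetical and would need to be replaced with figures from your own monitoring.

```python
from typing import Dict, List

# Hypothetical warning thresholds, as percentages; tune to your own systems.
THRESHOLDS: Dict[str, float] = {
    "cpu_pct": 85.0,
    "disk_used_pct": 90.0,
    "io_busy_pct": 80.0,
}


def find_problems(metrics: Dict[str, float],
                  thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return a warning for every metric at or above its threshold."""
    warnings = []
    for name, limit in sorted(thresholds.items()):
        value = metrics.get(name)
        if value is not None and value >= limit:
            warnings.append(f"{name} at {value:.1f}% (threshold {limit:.1f}%)")
    return warnings


# Sample readings; in practice these would come from your monitoring tool.
sample = {"cpu_pct": 42.0, "disk_used_pct": 93.5, "io_busy_pct": 80.0}
for warning in find_problems(sample):
    print(warning)
```

With the sample readings, the disk and I/O metrics are reported and the CPU metric is not.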

Recording Problems
Why Record EVERYTHING

1. Do you know what you should be doing?
2. Do you know what you did yesterday?
3. Can someone pick up where you left off?
4. How many have “searched” for a DBA to-do list?
5. Bookkeeping will dictate what you should be doing
   1. Simple log book
   2. Document the issues
6. Macro vs. micro principles
   1. What do you do when you find bad SQL?
   2. Do you have an internal process to handle the fix?
   3. Are you hampered by development approval?
   4. Are you a gatekeeper?
   5. “Black box” vs. “open the lid”
Recording Problems
Gauge / Core Area
Availability    The amount of time the database is available to work on requests. Uptime, downtime, or mean time to failure (MTTF).
Recoverability  The ability to recover the database from failure.
Reliability     The probability of errors or the mean time between errors.
Security        The ability of your database to handle attacks.
Planning        The tracking of your database system to enable modifications to meet future strategic business needs.

Recording Problems
Dial / Focus Area
Layout       Are the physical resources that your database consumes adequate to support the operations of your database and users? Are the systems on which your databases run configured in the best possible way to meet or exceed best practices and standards?
Response     What is the time interval between a request for work and the database’s response to that request? Does the system deliver results that meet end user needs?
Schema       Are the logical structures defined that allow for access to data?
Storage      How much disk space is required by your databases? While this could be part of the physical layout focus area, it is such a large issue in itself that I categorize it as a separate focus area.
Throughput   What is your database’s productivity? This is measured as the rate (requests per unit of time) at which requests can be performed by your database.
Utilization  How much time does the database use to work on a request? This is the ratio of busy time to total elapsed time over a given period. The period during which a resource is not being used is called the “idle time”; the resource with the highest utilization is called the “bottleneck.” Performance optimizations at this bottleneck offer the highest payoff, so finding the utilization of various resources inside the system is an important part of performance evaluation.
Workload     What are the requests made of the database, including queries, DDL, and administrative tasks?
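
The Throughput and Utilization definitions above are simple ratios, so a short worked example may help. This is a sketch only: the function names, the resource names, and the busy-time figures are assumptions made for illustration.

```python
from typing import Dict, Tuple


def utilization(busy_seconds: float, elapsed_seconds: float) -> float:
    """Ratio of busy time to total elapsed time over a period."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return busy_seconds / elapsed_seconds


def throughput(requests: int, elapsed_seconds: float) -> float:
    """Requests completed per unit of time (here, per second)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return requests / elapsed_seconds


def bottleneck(busy_by_resource: Dict[str, float],
               elapsed_seconds: float) -> Tuple[str, float]:
    """Return the resource with the highest utilization and its value."""
    name = max(busy_by_resource, key=busy_by_resource.get)
    return name, utilization(busy_by_resource[name], elapsed_seconds)


# Hypothetical busy times (seconds) measured over one hour.
busy = {"cpu": 1800.0, "disk": 3240.0, "network": 360.0}
name, used = bottleneck(busy, 3600.0)
print(f"bottleneck: {name} at {used:.0%} utilization")
print(f"throughput: {throughput(7200, 3600.0):.1f} requests/second")
```

In this made-up hour the disk is busy 90% of the time, so it is the bottleneck and the first place to look for a payoff.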

Recording Problems
Audience
DBA         You and anyone in the database administrative staff responsible for maintaining the corporate databases.
Management  Management to whom you directly report.
Executive   Management levels above your direct management.

Importance
Critical    That which is necessary to maintain a minimum level of performance.
Required    That which is necessary to improve the system.
Optional    That which is necessary to achieve Nirvana.
Note        Informational only.

Database
Vendor      Oracle / SQL Server / DB2 / MySQL / …
Version     What version, including patch level


Recording Problems
Problem Described
▪ Problem
  Define the nature of the problem. If you can’t make a clear statement about what the problem is, you should refocus your attention on something else.
▪ Why Important
  Every problem should carry an explanation of why solving it is important.
▪ Recommendation
  What you should do, based on your findings.
▪ Data Needed
  You should be able to provide a road map of the source for the data you are using for analysis. This can be as easy as showing the SQL statements.
▪ Analysis
  In this methodology, you base a recommendation for action on insights you’ve derived from the data you have extracted. (Typically this is pseudocode that explains the logic supporting your recommendation.)

Recording Problems
Urgency
Now                       The problem should be solved when discovered.
(1-5) days, (1-2) weeks   Because of your analysis and recommendation, the problem does not need to be solved immediately, but should be handled within a given time frame.
Leisure                   There are no negative consequences for the database and the company if a problem is never solved. Such issues serve more to create an environment that meets standards; nothing will fail if we don’t implement them.
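
Taken together, the Recording Problems slides describe the fields of one log-book entry. A possible shape for such an entry is sketched below. The class and field names are assumptions. The sample values borrow from the backup example used elsewhere in this deck, and the remainder (focus area, data needed, urgency) are invented purely to fill the record.

```python
from dataclasses import dataclass
from enum import Enum


class Importance(Enum):
    CRITICAL = "Critical"
    REQUIRED = "Required"
    OPTIONAL = "Optional"
    NOTE = "Note"


class Urgency(Enum):
    NOW = "Now"
    DAYS = "(1-5) days"
    WEEKS = "(1-2) weeks"
    LEISURE = "Leisure"


@dataclass
class ProblemRecord:
    """One log-book entry; field names are illustrative."""
    gauge: str            # core area, e.g. "Recoverability"
    dial: str             # focus area, e.g. "Layout"
    audience: str         # "DBA", "Management", or "Executive"
    importance: Importance
    vendor: str
    version: str
    problem: str
    why_important: str
    recommendation: str
    data_needed: str
    analysis: str
    urgency: Urgency

    def summary(self) -> str:
        """One-line view suitable for a to-do list."""
        return f"[{self.urgency.value}] {self.gauge}/{self.dial}: {self.problem}"


# Sample entry; several values are hypothetical.
record = ProblemRecord(
    gauge="Recoverability",
    dial="Layout",
    audience="DBA",
    importance=Importance.CRITICAL,
    vendor="Oracle",
    version="9.2.0",
    problem="Uncertainty in the viability of database backups",
    why_important="Other groups must be able to rely on recovery from failure",
    recommendation="Implement RMAN procedures in accordance with the Oracle manuals",
    data_needed="Current backup scripts and RMAN output",
    analysis="Test recovery under all known situations",
    urgency=Urgency.DAYS,
)
print(record.summary())
```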

Problem Solving
Six Steps

1. State problem
• Knowing / understanding the problem is half the issue
• You cannot solve anything until you know the problem
• Classify the problem (report issue)
• Put issue down in writing
• Determine your objectives around the solution
• Be careful not to start solving the problem
• Make sure it is a root problem
• Be careful of others’ solutions

If you can't find this problem, find an easy one.


Problem Solving
Six Steps

1. State problem
Backups
It has been conveyed to me and to the database group that
there is still some uncertainty in the viability of our
database backups and the ability to recover from failure.
It is therefore my intent to provide the following and
raise the level of confidence in this area.

If you can't find this problem, find an easy one.


Problem Solving
Six Steps

2. Research problem
▪ What do I need to know / already know?
  • What data is involved, and how does it relate to the unknown?
  • What are the conditions / variables involved?
  • Research may expose other problems
    – Sub, related, specific, or general problem
  • Are there any similar problems already solved?
  • When researching you will find solutions to problems not on your current issue list; document them
  • Should the problem be re-stated?
▪ The Art of asking effective questions
Problem Solving
Six Steps

2. Research problem

Backups
It has been conveyed to me and to the database group that
there is still some uncertainty in the viability of our
database backups and the ability to recover from failure.
It is therefore my intent to provide the following and
raise the level of confidence in this area.
1. Review and document the current backup mechanisms
2. Provide a plan and procedures for testing backups
3. Determine loopholes in current backups by testing recovery under all known situations
4. Provide a level of security and accountability to other functional groups so that they can rely on our backups being viable

If you can't find this problem, find an easy one.


Problem Solving
Six Steps

3. Form solution(s)
• Problem statement and research will drive you to a solution
• Do you already have an idea of what the solution might be?
• The simplest solution is often the best
• Are there checks and balances that you can impose on your solutions?
• Don’t forget your record of the problem (keep organized)
• If you cannot find a solution, go back to Step 2 and research the problem more to state a new problem
Finding the proper people is half the equation to solving a problem
Problem Solving
Six Steps

3. Form solution(s)
According to the Oracle manuals, the procedures the database
group has implemented for creating an RMAN backup do not
allow for complete recovery under certain conditions.
1. Implement RMAN procedures in accordance with the
Oracle manuals.

Finding the proper people is half the equation to solving a problem


Problem Solving
Six Steps

4. Test solution(s)
▪ You actually have to do something now
▪ Try and see if one of your solutions works
▪ Keep sight of the solution strategy developed
▪ Carry out a clear plan
▪ Allow for each step to be analyzed for correctness
▪ Can you fall back to prior steps?
▪ Make sure you can back / prove your findings
▪ Analyze the answers
▪ Evaluate the results
▪ Document any sub problems
▪ Keep it Simple and Straightforward
Problem Solving
Six Steps
4. Test solution(s)
RMAN> connect target /
connected to target database (not started)
RMAN> startup nomount pfile='/u01/app/oracle/PRD9I/product/9.2.0/dbs/initPRD9I.ora';
Oracle instance started
Total System Global Area 504452544 bytes
Fixed Size 456128 bytes
Variable Size 285212672 bytes
Database Buffers 218103808 bytes
Redo Buffers 679936 bytes
RMAN> RUN
2> {
3> RESTORE CONTROLFILE FROM '/u01/backup/PRD9I/rman/lev0_open_cf_PRD9I_114_1';
4> }
Starting restore at 28-MAY-04
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=16 devtype=DISK
channel ORA_DISK_1: restoring controlfile
channel ORA_DISK_1: restore complete
replicating controlfile
input filename=/u01/oradata/PRD9I/control01.ctl
output filename=/u01/oradata/PRD9I/control02.ctl
output filename=/u01/oradata/PRD9I/control03.ctl
Finished restore at 28-MAY-04

Problem Solving
Six Steps

5. Implement Solution(s)
• Practice before you leap
• Stay focused on YOUR solution
• Follow your plan EXACTLY
• Check / Validate each step
• Did the step clearly pass for correctness
• Now prove it to me
• Allow for a fall-back routine
• Sometimes looking forward is better than falling back
• The art of implementation
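
The implementation advice above (follow the plan exactly, validate each step, allow for a fall-back routine) can be sketched as a small plan runner. Everything here is an assumption made for illustration: the `Step` structure, the `run_plan` function, and the toy "pool size" setting stand in for whatever real change you are deploying.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    name: str
    apply: Callable[[], None]      # makes the change
    validate: Callable[[], bool]   # proves the step passed
    fall_back: Callable[[], None]  # undoes the change


def run_plan(steps: List[Step]) -> bool:
    """Apply steps in order; on a failed check, undo completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        step.apply()
        done.append(step)
        if not step.validate():
            print(f"{step.name}: validation failed, falling back")
            for finished in reversed(done):
                finished.fall_back()
            return False
        print(f"{step.name}: ok")
    return True


# Toy example: the "system" is a dict and each step changes one setting.
settings = {"pool_size": 10}


def set_pool(value: int) -> Callable[[], None]:
    def _apply() -> None:
        settings["pool_size"] = value
    return _apply


plan = [
    Step("raise pool to 20", set_pool(20),
         lambda: settings["pool_size"] == 20, set_pool(10)),
    Step("raise pool to 40", set_pool(40),
         lambda: settings["pool_size"] <= 30, set_pool(20)),
]
succeeded = run_plan(plan)
print("succeeded:", succeeded, "pool_size:", settings["pool_size"])
```

The second step deliberately fails its check, so the runner undoes both steps and the setting returns to its starting value. Whether to fall back or push forward is still a judgment call, as the slide notes.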

Problem Solving
Six Steps

6. Draw conclusions
▪ Time to look back at what you have done
▪ What data do you have after the test?
▪ Evaluate the results / Check your answers
▪ Is the answer realistic and what you expected?
▪ Has the problem gone away?
▪ Does anyone experience benefits?
▪ Does the data tell you anything?
  1. Is your hypothesis correct? Good – STOP!
  2. Is your test invalid? Try a different test
  3. Is your hypothesis wrong? CHANGE it
Problem Solving
Six Steps

1. State problem
2. Research problem
3. Form solution(s)
4. Test solution(s)
5. Implement solution(s)
6. Draw conclusions

Problem Solving
Best Tools for Effective Problem Solving

• Respect for the needs of the other


• Being open to new data and ideas
• Active listening
• Asking effective questions
• Clear and honest communication
• Persistence
• Firmness in goal of success

Problem Solving
Steps to Avoid

1. Creating problems with known solutions


2. Continually stating the problem
3. Hung up on research
4. My configuration = Your configuration
5. Next Problem Please

Solution Deployment

▪ Get the proper people together
▪ Get buy-in from all parties
▪ Map out success factors
  • This will Benefit
  • How Much
  • A reduction in
▪ Maintain SLAs
▪ Timing is Everything

Problem Closure

▪ Validate satisfaction levels
▪ Schedule training if needed
▪ Determine monitoring requirements
▪ Schedule follow-up timeline

End User Component
Database Performance

How do you gauge database performance?

Engine vs. Customer

End User Component
Customer

database end user (n)

Any individual, not directly responsible for
the maintenance of the database, who uses
computing resources, applications, or
database internals such as utilities and
internal code for the purpose of extracting
and viewing data.

End User Component
What End Users Want / Need
Poor Response Time = Assumed Corrupted Data
= Reduced Use of System
= ill-trusted System
Keep Communication Channels Open
1. Do your end users know that you are monitoring application
performance as it directly impacts their level of satisfaction?
2. Do your end users know they will be notified when you notice response
levels are below expected levels, even before the user community
notices this dip?
3. Do your end users have expectations about how quickly performance
issues should be resolved and how they should be notified about resolution?
4. Do your end users have a process or means to tell you when
performance levels are not up to their expectation levels?
5. Do your end users understand what you have to do each and every day
to maintain the service levels they are currently experiencing, and the
issues you encounter when solving their response time problems?

End User Component
Response Types
Response Type            Definition
Instantaneous            When a user hits the enter key, clicks a button, or submits a request, the screen is immediately painted with information. The response is so automatic that the end user does not even realize that work has been done in the background, down the pipe, and within a database.
Questionable             While waiting for a request to process, a user may grab a sip of coffee or jot something down on a notepad. The system response to their request isn’t instantaneous, but before they finish their interim task, the database has done the work, passed back the information, and presented it on the screen. What’s important to remember is that while an end user may not be as productive when the response is not instantaneous, they will not typically leave their terminal and focus on other tasks. Thus, there is still continuity in the user’s ability to do their work.
Deteriorating            An acceptable level of response that is slowly worsening. Users may or may not be aware of the deterioration because it could be increasing at very small intervals for any amount of time over days or weeks.
Delayed / Unresponsive   The user is experiencing delayed response or no response at all, and they feel they are trapped at the computer terminal, waiting for its response. They are torn between their ability to complete the task on the computer and quitting that task to work on something else. It is hard for them to get their work done, and they wonder if it would be easier to do the work without the computer.
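
The Deteriorating type is the hardest for users to notice, because the change is small from day to day, so it is a natural candidate for an automated check. The sketch below compares a recent window of response times against an earlier baseline. The window length, the 20% tolerance, and the sample timings are assumptions chosen for illustration, not values from the original deck.

```python
from statistics import mean
from typing import Sequence


def is_deteriorating(response_seconds: Sequence[float],
                     window: int = 7,
                     tolerance: float = 0.20) -> bool:
    """True if the latest window averages more than `tolerance` above the first."""
    if len(response_seconds) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    baseline = mean(response_seconds[:window])
    recent = mean(response_seconds[-window:])
    return recent > baseline * (1 + tolerance)


# Hypothetical daily average response times (seconds) over two weeks,
# creeping up by 0.05 s per day.
daily = [1.00 + 0.05 * day for day in range(14)]
print(is_deteriorating(daily))
```

A check like this lets you notify users about a slowdown before they start to complain about it.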
End User Component
Response Types
Response Type: Challenges / Traps

Instantaneous
▪ Complacency: Don’t become complacent about well-running applications.
▪ Making performance “improvements”. When we change things, we introduce more complexity (and more that could go wrong).
Questionable
▪ Assuming a user always wants and needs instantaneous response time. Users may actually appreciate response lags and be more productive.
▪ Assuming that if the user is working, all is well, and the slowdown isn’t important.
▪ Assuming that providing users with instantaneous response levels is simple, or even possible. I have often found that questionable response levels are “built” into applications and are a function of logic.
Deteriorating
▪ Conveying that the slowdown is just a momentary lapse in database performance, caused by unknown factors. Users often prefer to hear that response time will remain slower than before rather than hear that you don’t know or understand why performance has been impacted.
▪ Assuming that if your database is working just fine, everything else is fine. Remember that users will always assume that the database is the cause of performance degradation since that is where their data lives. Thus, whether or not it is technically your responsibility, you should still be involved in solving the response issue.
Delayed / Unresponsive
▪ You don’t know how to solve the problem, so you just do nothing. You should do something, even if you don’t know the ultimate solution. Your users already can’t use the system; they will appreciate attempts at resolving their issues.

End User Component
Response Types
Response Type: Opportunities

Instantaneous
▪ Highlighting the great performance users experience with their applications is a great way to open communications and relationships. There is nothing better for convincing users you are committed to excellent response time levels than an application that is already working well.
▪ When application or query response time is excellent, document everything you can about how this application or query works within the database. You may need this information when response times deteriorate.
▪ Determine what “instantaneous” really means. You may be able to un-tune, get by, and give resources to failing applications.
Questionable
▪ Caution: Know your user and application. Users may not be highly motivated to complete their work and move on to more productive items.
▪ If you can move them on to instantaneous response, the bond you will make through your commitment will leave a lasting impression.
Deteriorating
▪ If you can notify users that they are experiencing performance deterioration before they begin to complain about it, you are more likely to be seen as someone proactive and committed to customer satisfaction.
Delayed / Unresponsive
▪ Here is your chance to be a hero. Give your end users a time frame when you will be able to solve the problem, provide status updates, and get to work.
▪ Schedule improvement around when users will need the system next.
▪ Get a clear indication of what the acceptable levels of response are for the job or task your end user is doing before attempting to resolve the issue.
Alignment with Business Objectives
Do More with LESS

▪ Understand your support process
▪ Is it always about money?
  • Labor & Downtime Costs
  • Re-Acquisition of dissatisfied customer
  • Customer satisfaction
▪ Problem Life Cycle
  • Over 50% of time is spent recreating problem
  • Root cause analysis represents 80% of time
  • Solution deployment requires 20% of time
▪ Communication with end users wastes time / money
▪ Reduce occurrences of problems
▪ Proper prioritization