James F. Koopmann
Founder & President
jkoopmann@pinehorse.com
www.pinehorse.com
www.pinehorse.com www.dbcorral.com
James F. Koopmann
Where to Find Me on the NET
• N-years of IT/core RDBMS experience
• Oracle (V6-10g), 8i & 9i OCP
• Speaker
• Contributing Author: technical articles, 2 / month
• Forum expert / moderator: daily
The Art of Problem Solving as a DBA
Outline
1. Communication
2. DBA
3. DBA Support / Problem Flow
   1. Finding
   2. Recording
   3. Escalation
   4. Solving
   5. Deployment
   6. Closure
4. End User Component
5. Alignment with Business Objectives
The Art of Problem Solving as a DBA
Reasoning – Why Are You Here
Effectiveness
Danger lurks around every corner
DBA Support / Problem Flow
Entities Involved
DBA
Customer
DBA Support / Problem Flow
Entities Involved
DBA
1. Gate Keeper
2. Developer
3. Report Writer
4. System Analyst
5. Gofer
Solution Provider
Customer
1. End-User
2. Developer
3. Manager
4. C-Level
5. Divisional
6. Departmental
7. Machine
8. Database
DBA Support / Problem Flow
Impose simple task management (level 0)
[Diagram: issue flow between Customer and DBA. Labels: Issue, Solution, Issue Verification.]
DBA Support / Problem Flow
Impose simple task management (level 1)
[Diagram: issue flow between Customer and DBA. Labels: Issue, Issue Notification, Problem Severity, Research, Expert Knowledge, Solution, Solution Severity, Issue Verification.]
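The problem flow can be sketched as a small state machine. This is only an illustration: the stage names come from the outline (Finding through Closure), while the `Stage` and `advance` names, and the rule that closure waits for the customer's verification, are assumptions made for the example.

```python
from enum import Enum


class Stage(Enum):
    """Stages from the outline's DBA Support / Problem Flow list."""
    FINDING = 1
    RECORDING = 2
    ESCALATION = 3
    SOLVING = 4
    DEPLOYMENT = 5
    CLOSURE = 6


def advance(current: Stage, customer_verified: bool = False) -> Stage:
    """Move an issue to its next stage.

    Assumption (not stated on the slides): an issue stays in DEPLOYMENT
    until the customer has verified the solution.
    """
    if current is Stage.CLOSURE:
        raise ValueError("issue is already closed")
    if current is Stage.DEPLOYMENT and not customer_verified:
        return current
    return Stage(current.value + 1)


if __name__ == "__main__":
    stage = Stage.FINDING
    while stage is not Stage.DEPLOYMENT:
        stage = advance(stage)
        print(stage.name)
    # Closure only happens once the customer confirms the fix.
    print(advance(stage, customer_verified=True).name)
```

Keeping the verification gate in one function makes it harder to close an issue that the customer has not signed off on.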
Finding Problems
Non-Customer-Facing DBA
• Your actions are reactive, not proactive, in nature
• The events that trigger an investigation are often very specific to an incident, narrow in scope, and the solutions typically do not take the full health of a database into consideration.
• Most of your time is spent in problem detection, not problem solving
• Because of the time wasted in detecting the problem, using this method inherently wastes money
• Customers / users drive the work flow of the database administrators
• Database administration group is seen as ineffective
Finding Problems
Customer-Facing DBA
• User complaints are circumvented and drastically reduced
• Time searching for solutions is reduced
• Database administration group is seen as being very effective, and as trusted custodians of corporate information
• Allows database administrators to work on more strategic issues that do affect the bottom line and direction of the company
• You finally have confirmation that it is possible to find problems before they happen
Recording Problems
Why Record EVERYTHING
Recording Problems
Gauge / Core Area
Availability: The amount of time the database is available to work on requests. Uptime, downtime, or mean time to failure (MTTF).
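As a rough illustration of how the availability gauge might be computed from recorded uptime and downtime, here is a minimal sketch. The function names and the sample figures are hypothetical. It uses the common definitions: availability is uptime divided by total time, and MTTF is operating time divided by the number of failures.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the period during which the database was available."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the measured period must be longer than zero")
    return uptime_hours / total


def mean_time_to_failure(uptime_hours: float, failures: int) -> float:
    """Average operating time between failures over the measured period."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTTF")
    return uptime_hours / failures


if __name__ == "__main__":
    # Hypothetical month: 720 hours, two outages totalling 3 hours.
    print(f"availability: {availability(717, 3):.4%}")
    print(f"MTTF: {mean_time_to_failure(717, 2)} hours")
```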
Recording Problems
Dial / Focus Area
Layout: Are the physical resources that your database consumes adequate to support the operations of your database and users? Are the systems on which your databases run configured in the best possible way to meet or exceed best practices and standards?
Response: What is the time interval between a request for work and the database’s response to that request? Does the system deliver results that meet end user needs?
Schema: Are the logical structures defined that allow for access to data?
Storage: How much disk space is required by your databases? While this could be part of the physical layout focus area, it is such a large issue in itself that I categorize it as a separate focus area.
Throughput: What is your database’s productivity? This is measured as the rate (requests per unit of time) at which requests can be performed by your database.
Utilization: How much time does the database use to work on a request? This is the ratio of busy time to total elapsed time over a given period. The period during which a resource is not being used is called the “idle time”; the resource with the highest utilization is called the “bottleneck.” Performance optimizations at this bottleneck offer the highest payoff, so finding the utilization of various resources inside the system is an important part of performance evaluation.
Workload: What are the requests made of the database, including queries, DDL, and administrative tasks?
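The throughput and utilization definitions above reduce to simple ratios, and the bottleneck is the resource with the highest utilization. The sketch below shows that arithmetic. The resource names and busy-time figures are made up for illustration, and the function names are not from the slides.

```python
from typing import Dict


def throughput(requests: int, elapsed_seconds: float) -> float:
    """Requests completed per second over the measured period."""
    return requests / elapsed_seconds


def utilization(busy_seconds: float, elapsed_seconds: float) -> float:
    """Ratio of busy time to total elapsed time (idle time is the rest)."""
    return busy_seconds / elapsed_seconds


def bottleneck(busy_by_resource: Dict[str, float], elapsed_seconds: float) -> str:
    """Name of the resource with the highest utilization."""
    return max(
        busy_by_resource,
        key=lambda name: utilization(busy_by_resource[name], elapsed_seconds),
    )


if __name__ == "__main__":
    # Hypothetical busy times (seconds) observed over one hour.
    busy = {"cpu": 1800.0, "disk": 3240.0, "network": 600.0}
    for name, seconds in busy.items():
        print(name, f"{utilization(seconds, 3600.0):.0%}")
    print("bottleneck:", bottleneck(busy, 3600.0))
```

Tuning effort spent on the resource this reports should pay off more than the same effort spent elsewhere, which is the point the Utilization entry makes.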
Recording Problems
Audience
DBA: You and anyone in the database administrative staff responsible for maintaining the corporate databases.
Management: Management to whom you directly report.
Importance
Critical: That which is necessary to maintain a minimum level of performance.
Required: That which is necessary to improve the system.
Optional: That which is necessary to achieve Nirvana.
Note: Informational only.
Database
Vendor: Oracle / SQLServer / DB2 / MySQL / …
Recording Problems
Urgency
Now: The problem should be solved when discovered.
(1-5) days / (1-2) weeks: Because of your analysis and recommendation, the problem does not need to be solved immediately, but should be handled within a given time frame.
Leisure: There are no negative consequences for the database and the company if a problem is never solved. Such issues serve more to create an environment that functions to meet standards but will not fail if we don’t implement them.
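One way to record issues with the classifications above (focus area, audience, importance, vendor, urgency) is a small record type. The following is a sketch only: the class and field names are hypothetical, the sample issues are invented, and ordering the work queue by urgency first and importance second is an assumption rather than something the slides prescribe.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Importance(IntEnum):
    CRITICAL = 1
    REQUIRED = 2
    OPTIONAL = 3
    NOTE = 4


class Urgency(IntEnum):
    NOW = 1
    DAYS = 2      # handle within (1-5) days
    WEEKS = 3     # handle within (1-2) weeks
    LEISURE = 4


@dataclass
class IssueRecord:
    summary: str
    focus_area: str   # e.g. Layout, Response, Schema, Storage, ...
    audience: str     # DBA or Management
    importance: Importance
    urgency: Urgency
    vendor: str


def work_queue(issues: List[IssueRecord]) -> List[IssueRecord]:
    """Order issues by urgency, then importance (an assumed policy)."""
    return sorted(issues, key=lambda issue: (issue.urgency, issue.importance))


if __name__ == "__main__":
    issues = [
        IssueRecord("Tablespace 80% full", "Storage", "DBA",
                    Importance.REQUIRED, Urgency.WEEKS, "Oracle"),
        IssueRecord("Backups cannot be fully recovered", "Availability",
                    "Management", Importance.CRITICAL, Urgency.NOW, "Oracle"),
    ]
    for issue in work_queue(issues):
        print(issue.urgency.name, issue.importance.name, issue.summary)
```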
Problem Solving
Six Steps
1. State problem
• Knowing / Understanding the problem is half the issue
• You cannot solve anything until you know the problem
• Classify the problem (report issue)
• Put issue down in writing
• Determine your objectives around the solution
• Be careful not to start solving the problem
• Make sure it is a root problem
• Be careful of others’ solutions
1. State problem
Backups
It has been conveyed to me and to the database group that
there is still some uncertainty in the viability of our
database backups and the ability to recover from failure.
It is therefore my intent to provide the following and
raise the level of confidence in this area.
2. Research problem
What do I need to / already know
• What data is involved and relation to the unknown
• What are the conditions / variables involved
• Research may expose other problems
– Sub, related, specific, or general problem
• Are there any similar problems already solved
• When researching you will find solutions to problems
not on your current issue list; document them
• Should the problem be re-stated
The Art of asking effective questions
Problem Solving
Six Steps
3. Form solution(s)
• Problem statement and research will drive you to a
solution
• Do you have an idea of what the solution already
might be
• The simplest solution is often the best
• Are there checks and balances that you can impose on
your solutions
• Don’t forget your record of the problem (keep
organized)
• If you cannot find a solution, go back to Step 2 and
research the problem more to state a new problem
Finding the proper people is half the equation to solving a problem
Problem Solving
Six Steps
3. Form solution(s)
According to the Oracle manuals, the RMAN backup
procedures that the database group has implemented
do not allow for complete recovery under certain
conditions.
1. Implement RMAN procedures in accordance with the
Oracle manuals.
4. Test solution(s)
You actually have to do something now
Try and see if one of your solutions works
Keep sight of the solution strategy developed
Carry out a clear plan
Allow for each step to be analyzed for correctness
Can you fall back to prior steps
Make sure you can back / prove your findings
Analyze the answers
Evaluate the results
Document any sub problems
Keep it Simple and Straightforward
Problem Solving
Six Steps
4. Test solution(s)
RMAN> connect target /
connected to target database (not started)
RMAN> startup nomount pfile='/u01/app/oracle/PRD9I/product/9.2.0/dbs/initPRD9I.ora';
Oracle instance started
Total System Global Area 504452544 bytes
Fixed Size 456128 bytes
Variable Size 285212672 bytes
Database Buffers 218103808 bytes
Redo Buffers 679936 bytes
RMAN> RUN
2> {
3> RESTORE CONTROLFILE FROM '/u01/backup/PRD9I/rman/lev0_open_cf_PRD9I_114_1';
4> }
Starting restore at 28-MAY-04
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=16 devtype=DISK
channel ORA_DISK_1: restoring controlfile
channel ORA_DISK_1: restore complete
replicating controlfile
input filename=/u01/oradata/PRD9I/control01.ctl
output filename=/u01/oradata/PRD9I/control02.ctl
output filename=/u01/oradata/PRD9I/control03.ctl
Finished restore at 28-MAY-04
Problem Solving
Six Steps
5. Implement Solution(s)
• Practice before you leap
• Stay focused on YOUR solution
• Follow your plan EXACTLY
• Check / Validate each step
• Did the step clearly pass for correctness
• Now prove it to me
• Allow for a fall-back routine
• Sometimes looking forward is better than falling back
• The art of implementation
Problem Solving
Six Steps
6. Draw conclusions
Time to look back at what you have done
What data do you have after the test
Evaluate the results / Check your answers
Is the answer realistic and what you expected
Has the problem gone away
Does anyone experience benefits
Does the data tell you anything
1. Is your hypothesis Correct – Good – STOP!
2. Is your test invalid – Different Test?
3. Is your hypothesis Wrong – CHANGE
Problem Solving
Six Steps
1. State problem
2. Research problem
3. Form solution(s)
4. Test solution(s)
5. Implement solution(s)
6. Draw conclusions
Problem Solving
Best Tools for Effective Problem Solving
Problem Solving
Steps to Avoid
Solution Deployment
Problem Closure
End User Component
Database Performance
End User Component
Customer
End User Component
What End Users Want / Need
Poor Response Time = Assumed Corrupted Data
= Reduced Use of System
= ill-trusted System
Keep Communication Channels Open
1. Do your end users know that you are monitoring application
performance as it directly impacts their level of satisfaction?
2. Do your end users know they will be notified when you notice response
levels are below expected levels, even before the user community
notices this dip?
3. Do your end users have expectations about how quickly performance
issues should be resolved and how they will be notified about the resolution?
4. Do your end users have a process or means to tell you when
performance levels are not up to their expectation levels?
5. Do your end users understand what you have to do each and every day
to maintain the service levels they are currently experiencing, and the
issues you encounter when solving their response time problems?
End User Component
Response Types
Response Type: Definition
Instantaneous: When a user hits the enter key, clicks a button, or submits a request, the screen is immediately painted with information. The response is so automatic that the end user does not even realize that work has been done in the background, down the pipe, and within a database.
Questionable: While waiting for a request to process, a user may grab a sip of coffee or jot something down on a notepad. The system response to their request isn’t instantaneous, but before they finish their interim task, the database has done the work, passed back the information, and presented it on the screen. What’s important to remember is that while an end user may not be as productive when the response is not instantaneous, they will not typically leave their terminal and focus on other tasks. Thus, there is still continuity in the user’s ability to do their work.
Deteriorating: An acceptable level of response that is slowly worsening. Users may or may not be aware of the deterioration because it could be increasing in very small increments over days or weeks.
Delayed / Unresponsive: The user is experiencing delayed response or no response at all, and they feel they are trapped at the computer terminal, waiting for its response. They are torn between their ability to complete the task on the computer and quitting that task to work on something else. It is hard for them to get their work done, and they wonder if it would be easier to do the work without the computer.
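The four response types above could be approximated from measured response times. The sketch below is one possible heuristic, not a method from the slides. The thresholds (`instant_s`, `tolerable_s`), the `drift` ratio, and the half-versus-half comparison used to spot slow worsening are all assumptions to be tuned per application.

```python
from statistics import mean
from typing import Sequence


def classify_response(
    samples: Sequence[float],
    instant_s: float = 1.0,     # assumed ceiling for "instantaneous"
    tolerable_s: float = 10.0,  # assumed ceiling before users feel trapped
    drift: float = 0.2,         # assumed growth that counts as deterioration
) -> str:
    """Classify response times (seconds, oldest first) into a response type."""
    if not samples:
        raise ValueError("at least one sample is required")
    latest = samples[-1]
    if latest > tolerable_s:
        return "Delayed / Unresponsive"
    half = len(samples) // 2
    if half >= 1:
        older, newer = mean(samples[:half]), mean(samples[half:])
        # Still acceptable, but the recent half is noticeably slower.
        if newer > older * (1 + drift):
            return "Deteriorating"
    if latest <= instant_s:
        return "Instantaneous"
    return "Questionable"


if __name__ == "__main__":
    print(classify_response([0.2, 0.2, 0.2, 0.2]))
    print(classify_response([3.0, 3.0, 3.0, 3.0]))
    print(classify_response([1.0, 1.2, 1.6, 2.0]))
    print(classify_response([2.0, 3.0, 30.0]))
```

Running a check like this on a schedule is one way to notice the "Deteriorating" case before users do.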
End User Component
Response Types
Response Type: Challenges / Traps
Instantaneous
• Complacency: Don’t become complacent about well-running applications.
• Making performance “improvements”. When we change things, we introduce more complexity (and more that could go wrong).
Questionable
• Assuming a user always wants and needs instantaneous response time. Users may actually appreciate response lags and be more productive.
• Assuming that if the user is working, all is well, and the slowdown isn’t important.
• Assuming that providing users with instantaneous response levels is simple, or even possible. I have often found that questionable response levels are “built” into applications and are a function of logic.
Deteriorating
• Conveying that the slowdown is just a momentary lapse in database performance, caused by unknown factors. Users often prefer to hear that response time will remain slower than before rather than hear that you don’t know or understand why performance has been impacted.
• Assuming that if your database is working just fine, everything else is fine. Remember that users will always assume that the database is the cause of performance degradation since that is where their data lives. Thus, whether or not it is technically your responsibility, you should still be involved in solving the response issue.
Delayed / Unresponsive
• You don’t know how to solve the problem, so you just do nothing. You should do something, even if you don’t know the ultimate solution. Your users already can’t use the system; they will welcome attempts at resolving their issues.
End User Component
Response Types
Response Type: Opportunities
Instantaneous
• Highlighting the great performance users experience with their applications is a great way to open communications and relationships. There is nothing better for convincing users you are committed to excellent response time levels than an application that is already working well.
• When application or query response time is excellent, document everything you can about how this application or query works within the database. You may need this information when response times deteriorate.
• Determine what “instantaneous” really means. You may be able to un-tune, get by, and give resources to failing applications.
Questionable
• Caution: Know your user and application. Users may not be highly motivated to complete their work and move on to more productive items.
• If you can move them on to instantaneous response, the bond you will make through your commitment will leave a lasting impression.
Deteriorating
• If you can notify users that they are experiencing performance deterioration before they begin to complain about it, you are more likely to be seen as someone who is proactive and committed to customer satisfaction.
Delayed / Unresponsive
• Here is your chance to be a hero. Give your end users a time frame when you will be able to solve the problem, provide status updates, and get to work.
• Schedule improvement around when users will need the system next.
• Get a clear indication of what the acceptable levels of response are for the job or task your end user is doing before attempting to resolve the issue.
Alignment with Business Objectives
Do More with LESS