Microsoft® SQL Server™
Trademarks
SAP, the SAP logo, R/2, R/3, ABAP, and other SAP-related products mentioned herein are registered or
unregistered trademarks of SAP AG. All other products mentioned in this document are registered or
unregistered trademarks of their respective companies.
Simplification Group
SAP Labs, Inc.
3475 Deer Creek Road
Palo Alto, CA 94304
www.saplabs.com/simple
simplify-r3@sap.com
This book uses EcoFLEX lay-flat binding. With this lay-flat feature—developed by
and exclusively available at Johnson Printing Service (JPS)—you can open this book
and keep it open without it snapping shut on you. You need not worry about
breaking the spine. EcoFLEX makes books like this one easier to use.
Contents at a Glance
iv Release 4.6A/B
Detailed Table of Contents
Acknowledgements xix
Introduction xxi
What Is This Guidebook About?........................................................................ xxii
Who Should Read This Book?........................................................................... xxii
Prerequisites.......................................................................................................... xxiii
User ........................................................................................................................ xxiii
System.................................................................................................................... xxiv
How to Use This Guidebook .............................................................................. xxv
Organization ............................................................................................................xxv
What’s New .......................................................................................................... xxv
Content ....................................................................................................................xxv
Conventions........................................................................................................... xxvi
Special Icons...................................................................................................... xxvii
Chapter 1: R/3 System Administration Basics
Overview............................................................................................................... 1–2
Roles of an R/3 System Administrator.............................................................. 1–2
Within R/3 .............................................................................................................. 1–2
External to R/3....................................................................................................... 1–3
Traits of an R/3 System Administrator.............................................................. 1–4
R/3 System Guidelines........................................................................................ 1–4
Protect the System ................................................................................................ 1–5
Do Not Be Afraid to Ask for Help........................................................................... 1–5
Network with Other Customers and Consultants.................................................. 1–6
Keep It Short and Simple (KISS)........................................................................... 1–7
Keep Proper Documentation................................................................................. 1–7
Use Checklists....................................................................................................... 1–8
Use the Appropriate Tool for the Job .................................................................... 1–9
Perform Preventive Maintenance.......................................................................... 1–9
Do Not Change What You Do Not Have To........................................................ 1–10
Do Not Make System Changes During Critical Periods...................................... 1–11
Do Not Allow Direct Database Access................................................................ 1–12
Keep all Non-SAP Activity Off the R/3 Servers................................................... 1–12
Minimize Single Points of Failure ........................................................................ 1–13
Corollaries to Murphy’s Law ............................................................................ 1–13
Special Definitions ............................................................................................ 1–14
Database server ................................................................................................... 1–14
Application server ................................................................................................. 1–14
Instance ................................................................................................................ 1–14
System.................................................................................................................. 1–14
Chapter 2: Disaster Recovery
Overview............................................................................................................... 2–2
What Is a Disaster? ............................................................................................... 2–2
Why Plan for a Disaster? .................................................................................... 2–3
Planning for a Disaster ....................................................................................... 2–4
Creating a Plan...................................................................................................... 2–4
What Are the Business Requirements for Disaster Recovery? ............................ 2–4
Who will provide the requirements?.............................................................................. 2–4
What are the requirements?......................................................................................... 2–4
When Should a Disaster Recovery Procedure Begin? ......................................... 2–5
Expected Downtime or Recovery Time................................................................. 2–5
Expected Downtime................................................................................................ 2–5
Recovery Time........................................................................................................ 2–6
Recovery Group and Staffing Roles ..................................................................... 2–6
Types of Disaster Recovery .................................................................................. 2–7
Onsite ..................................................................................................................... 2–7
Offsite ..................................................................................................................... 2–7
Disaster Scenarios ................................................................................................ 2–8
Three Common Disaster Scenarios ...................................................................... 2–8
A Corrupt Database................................................................................................ 2–8
A Hardware Failure................................................................................................. 2–8
A Complete Loss or Destruction of the Server Facility........................................... 2–9
Recovery Script ................................................................................................... 2–10
Creating a Recovery Script ................................................................................. 2–10
Recovery Process ............................................................................................... 2–10
Major Steps........................................................................................................... 2–10
Crash Kit.............................................................................................................. 2–11
Business Continuation During Recovery ............................................................ 2–14
Offsite Disaster Recovery Sites .......................................................................... 2–15
Integration with your Company’s General Disaster Planning ............................. 2–15
When the R/3 System Returns............................................................................ 2–15
Test your Disaster Recovery Procedure......................................................... 2–15
Other Considerations........................................................................................ 2–16
Other Upstream or Downstream Applications..................................................... 2–16
Backup Sites........................................................................................................ 2–17
Minimizing the Chances for a Disaster ........................................................... 2–17
Minimize Human Error......................................................................................... 2–17
Minimize Single Points of Failure ........................................................................ 2–18
Cascade Failures ................................................................................................ 2–18
Chapter 3: Backup and Recovery
Overview............................................................................................................... 3–2
Restore ................................................................................................................. 3–2
Strategy ................................................................................................................. 3–2
Testing Recovery.................................................................................................... 3–3
Backup.................................................................................................................. 3–3
What to Backup and When ................................................................................... 3–3
Database ................................................................................................................ 3–3
Transaction Logs .................................................................................................... 3–5
Operating System Level Files................................................................................. 3–6
Backup Types........................................................................................................ 3–6
What Is Backed Up................................................................................................. 3–7
How the Backup Is Taken....................................................................................... 3–8
Chapter 9: Multi-Role Tasks
Starting the R/3 System ...................................................................................... 9–2
Start R/3—NT ........................................................................................................ 9–3
Stopping the R/3 System.................................................................................... 9–5
Tasks to Be Completed Before Stopping the System........................................... 9–6
System Message (SM02) ....................................................................................... 9–6
Check that No Active Users Are on the System (AL08/SM04) .............................. 9–9
Check for Batch Jobs Running or Scheduled (SM37).......................................... 9–11
Check for Active Processes on All Systems (SM51)............................................ 9–15
Check for External Interfaces ............................................................................... 9–15
Stopping R/3........................................................................................................ 9–16
STOP R/3—NT ..................................................................................................... 9–16
Chapter 10: R/3 System Administration
Overview............................................................................................................. 10–2
Major System Monitoring Tools....................................................................... 10–2
CCMS Central Alert Monitor (Transaction RZ20) ............................................... 10–2
Accessing the CCMS Alert Monitor (RZ20).......................................................... 10–4
Current View and Alert View................................................................................. 10–5
Switching Between the Current and Alert Views .................................................. 10–6
Finding an Alert .................................................................................................... 10–7
Configuring the Batch Job to Collect Historical Data (RZ21) ............................. 10–10
View the Alerts.................................................................................................... 10–12
Analyze the Alert ................................................................................................ 10–13
Acknowledge the Alert........................................................................................ 10–14
Provide System Configuration Information (Transaction RZ20)......................... 10–15
Maintaining The Alert Thresholds for RZ20........................................................ 10–17
Hiding SAP Standard Monitor Sets .................................................................... 10–19
Create a New Monitor Set .................................................................................. 10–23
Add a Monitor to the Monitor Set........................................................................ 10–24
System Administration Assistant (Transaction SSAA)...................................... 10–28
Specific Transaction Monitoring Overview .................................................. 10–32
Failed Updates (Transaction SM13) ................................................................. 10–32
Managing Update Terminates ............................................................................ 10–35
User Training ...................................................................................................... 10–37
System Log (Transaction SM21)....................................................................... 10–38
Locks (Transaction SM12) ................................................................................ 10–41
Active Users (Transactions SM04 and AL08)................................................... 10–43
Single-Instance System (Transaction SM04) ..................................................... 10–44
Multi-Instance System (Transaction AL08) ........................................................ 10–45
Work Processes (Transactions SM50 and SM51)............................................ 10–46
For a System with Application Servers............................................................... 10–46
For a System Without Application Servers......................................................... 10–47
ABAP Dump Analysis (Transaction ST22)........................................................ 10–48
Simple Selection ................................................................................................. 10–49
Free Selection..................................................................................................... 10–49
System Message (SM02)................................................................................. 10–51
Creating a Message .......................................................................................... 10–52
Editing a Message............................................................................................. 10–54
ABAP Editor (SE38) .......................................................................................... 10–55
For Information About a Program or Report....................................................... 10–56
Other....................................................................................................................... B–5
White papers........................................................................................................... B–5
SAPNet, Selected Items of Interest ........................................................................ B–5
Third-Party Resources ..........................................................................................B–7
Books:..................................................................................................................... B–7
R/3 .......................................................................................................................... B–7
UNIX ....................................................................................................................... B–8
NT ........................................................................................................................... B–8
OS/400.................................................................................................................... B–9
Microsoft SQL Server ............................................................................................. B–9
Informix ................................................................................................................... B–9
DB2....................................................................................................................... B–10
Oracle ................................................................................................................... B–10
Other Topics ......................................................................................................... B–10
Magazines: ........................................................................................................... B–11
Helpful Third-Party Information............................................................................. B–11
Web Sites ............................................................................................................B–11
SAP....................................................................................................................... B–11
SAP Affiliated........................................................................................................ B–12
Third Party ............................................................................................................ B–12
Internet News Groups .........................................................................................B–12
Other Resources .................................................................................................B–13
Operating System ................................................................................................. B–13
Database .............................................................................................................. B–13
Other Helpful Products: Contributed by Users..............................................B–13
UNIX ....................................................................................................................B–14
Backup.................................................................................................................. B–14
Monitor.................................................................................................................. B–14
Scheduler.............................................................................................................. B–14
Spool Management .............................................................................................. B–14
Other..................................................................................................................... B–14
NT ........................................................................................................................B–14
Backup.................................................................................................................. B–14
Monitor.................................................................................................................. B–14
Remote Control .................................................................................................... B–15
Scheduler.............................................................................................................. B–15
Spool Management .............................................................................................. B–15
Other..................................................................................................................... B–15
Common, Both UNIX and NT..............................................................................B–15
Network ...............................................................................................................B–16
Appendix C: Useful SAP Notes
Overview...............................................................................................................C–2
R/3 Notes ..............................................................................................................C–2
Operating System Notes.....................................................................................C–6
Common to Multiple Operating Systems ..............................................................C–6
NT ..........................................................................................................................C–6
UNIX ......................................................................................................................C–8
AS-400...................................................................................................................C–8
Database Notes ...................................................................................................C–9
MS SQL server ......................................................................................................C–9
DB2 / UDB ...........................................................................................................C–11
Informix................................................................................................................C–12
Oracle ..................................................................................................................C–13
The combined experience in SAP and general systems administration of those who contributed to this book
is measured in decades. I hope that I am able to share with you some of their wisdom.
I also wish to express appreciation to the following individuals who provided time, material, expertise, and
resources which helped make the Release 4.6A/B guidebook possible:
Customers and partners: Bill Robichaud, Bridgestone/Firestone; Chad Horwedel, XXX; Doris Steckel,
Agilent/HP; Gary Canez, Motorola; Hanumantha Kasoji, Celanese Acetate; John Blair, Steelcase; Joyce
Courtney, Infineon; Laura Shieh, John Muir Mt Diablo Health System; Kerry Ek, Finteck; Lynne Lollis,
e.coetry/Chaptec; Otis Barr, Ceridian; Paul Wiebe, TransAlta; Richard Doctor, Acuson; Sam Yamakoshi,
Timothy Rogers; Tony Schollum, Ernst & Young; Thomas Beam, NCUA; HP; Udesh Naicker, HP.
SAP AG: Andreas Graesser, Dr. Arnold Niedermaier, Dr. Carsten Thiel, Fabian Troendle, Georg Chlond,
Dr. Gert Rusch, Herbert Stegmueller, Joerg Schmidt, Dr. Meinolf Block, Michael Demuth, Michael Schuster,
Dr. Nicholai Jordt, Otto Boehrer, Rudolf Marquet, Stephen Corbett, Dr. Stefan Fuchs, Thomas Arend,
Thomas Besthorn, Dr. Uwe Hommel, Uwe Inhoff, and Dr. Wulf Kruempelman.
SAP America: “Casper” Wai-Fu Kan, Daniel Kocsis, Daniel-Benjamin Fig Zaidspiner, Jackie Wang, Lance
Pawlikowski, Maria Gregg, Sue McFarland.
SAP Labs: Dr. Arnold Klingert, Jaideep Adhvaryu, “Jody” Honghua Yang, John Wu, Kitty Yue, Nihad Al-
Ftayeh, Peter Aeschlimann, Philippe Timothee, Dr. Thomas Brodkorb.
SAP UK: Peter Le Duc.
Contributing authors: Patricia Huang, SAP America; Jerry Forsey, SAP America.
QA testers: Brad Barnes, e.coetry; Claudia Helenius; Jeff Orr, Utilx; Lynne Lollis, e.coetry; Marc Punzalan,
Heat and Control; Patrick McShane, Bramasol.
Documentation and production: Rekha Krishnamurthy, John Kanclier, Kurt Wolf.
Contents
What Is This Guidebook About?
Philosophy
Release 4.6 of the System Administration Made Easy Guidebook continues in the direction of the
4.0 version. The primary focus is the importance of the on-going nature of system
administration. This book is written for an installed system, where all installation tasks have
been completed. Installation and related tasks, which are usually performed once, have not
been included in this guidebook.
Organization
We have tried to group items and tasks in job role categories, which allows this guidebook
to be a better reference book.
Content
Real world practical advice from consultants and customers has been integrated into this
book. Because of this perspective, some of the statements in this book are blunt and direct.
Some of the examples we have used may seem improbable, but “facts can be, and are,
stranger than fiction.”
Because system administration is such a large area, it is difficult to reduce the volume to
what can be called “Made Easy.” Although material in this book has been carefully chosen, it
is by no means comprehensive. Certain chapters can be expanded into several books [two
examples are the chapters on disaster recovery (chapter 2) and security (chapter 11)].
What Is Not Provided
Although there are chapters on problem solving and basic performance tuning, these
chapters are only introductions to the subjects. This guidebook is not meant to be a
troubleshooting or performance tuning manual. Installation tasks are not presented. We assume
that your SAP consultant has completed these tasks.
Who Should Read This Book?
Senior consultants, experienced system administrators, and DBAs may find portions of this
guidebook very elementary, but hopefully useful.
Prerequisites
To help you use this guidebook, and to prevent this guidebook from becoming as thick as
an unabridged dictionary, we defined a baseline for user knowledge and system
configuration. The two sections below (User and System) define this baseline. Review these
sections to determine how you and your system match. This book is also written with
certain assumptions about your knowledge level and the expectation that particular system
requirements have been met.
User
We assume that you have a baseline knowledge of R/3, the operating system, and the
database. If you lack knowledge in any of the following points, we recommend that you
consult the many books and training classes that specifically address your operating system
and database.
You should know how to complete the following tasks at the:
• R/3 System level:
  - Be able to log on to R/3
  - Know how to navigate in R/3 using menus and transaction codes
    Some screens have no menu path and can be accessed only by transaction code.
    In the “real world,” navigating by transaction codes is faster and more
    efficient than using menus.
• Operating system level:
  - Be familiar with the file and directory structure
  - Be able to use the command line to navigate and execute programs
  - Set up a printer
  - Perform a backup using standard operating system tools or third-party tools
  - Perform basic operating system security
  - Copy and move files
  - Properly start and stop the operating system and server
• Database level:
  - Properly start and stop the database
  - Perform a backup of the database
R/3 runs on more than five different versions of UNIX. In many cases, significant
differences exist between these versions. These differences contributed to our decision to
not go into detail at the operating system level.
System
For an ongoing productive environment, we assume that the:
• R/3 System is completely and properly installed
• Infrastructure is set up and functional
The following checklist will help you determine if your system is set up to the baseline
assumptions of this book. If you can log on to your R/3 System, most of these tasks have
already been completed.
Hardware
Infrastructure
• Is the Uninterruptible Power Supply (UPS) installed?
• Is a server or system monitor available?
Software
• Is the following utility software installed (as appropriate)?
  - Backup program
  - Hardware monitors
  - System monitors
  - UPS control
• R/3 System
  - Is R/3 installed according to SAP’s recommendation?
  - Is the TPPARAM file configured?
    (In Release 4.6, TMS creates a file to be used as the TPPARAM file.)
  - Is the TMS/CTS configured?
  - Is the SAProuter configured?
  - Is the OSS1 transaction configured?
  - Is the ABAP workbench configured?
  - Has initial security been configured (default passwords changed)?
  - Are the NT sapmnt share or UNIX NFS sapmnt exports properly configured?
  - Is the online documentation installed?
• Can users log on to R/3 from their desktops?
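The baseline checklist above can also be kept as plain data and checked off item by item. The following is a minimal sketch in Python and is not part of the original guidebook: the item wording is condensed from the checklist, while the names `BASELINE_CHECKLIST` and `open_items` and the answer format are illustrative assumptions.

```python
# Hypothetical sketch: items are condensed from the guidebook's checklist;
# the data layout and helper names are assumptions made for illustration.
BASELINE_CHECKLIST = {
    "Infrastructure": [
        "UPS installed",
        "Server or system monitor available",
    ],
    "Utility software": [
        "Backup program installed",
        "Hardware monitors installed",
        "System monitors installed",
        "UPS control installed",
    ],
    "R/3 System": [
        "R/3 installed according to SAP's recommendation",
        "TPPARAM file configured",
        "TMS/CTS configured",
        "SAProuter configured",
        "OSS1 transaction configured",
        "ABAP workbench configured",
        "Initial security configured (default passwords changed)",
        "sapmnt share (NT) or NFS exports (UNIX) configured",
        "Online documentation installed",
    ],
    "Users": [
        "Users can log on to R/3 from their desktops",
    ],
}


def open_items(answers):
    """Return (section, item) pairs that are not confirmed in `answers`.

    `answers` maps an item string to True (done) or False (not done).
    Items missing from `answers` are treated as still open.
    """
    return [
        (section, item)
        for section, items in BASELINE_CHECKLIST.items()
        for item in items
        if not answers.get(item, False)
    ]


if __name__ == "__main__":
    # Example: everything confirmed except the SAProuter setup.
    answers = {
        item: True for items in BASELINE_CHECKLIST.values() for item in items
    }
    answers["SAProuter configured"] = False
    for section, item in open_items(answers):
        print(f"{section}: {item}")
```

Treating unanswered items as open is a deliberate choice in this sketch: an item nobody has verified should not count as part of the baseline.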
Desktop
For optimal results, we recommend that the minimum screen resolution be set as follows:
• For the users, 800 × 600
• For the system administrator, 1024 × 768 and a minimum color depth of 256 colors
The Release 4.6 GUI displays better with 64K colors.
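As a rough illustration that is not taken from the original guidebook, the recommended minimums can be written as a small check. The role names, the function `meets_recommendation`, and the reading that the 256-color minimum applies to the administrator desktop are assumptions.

```python
# Recommended minimums quoted from the guidebook; role names and the
# function below are hypothetical and exist only for illustration.
MIN_RESOLUTION = {
    "user": (800, 600),
    "administrator": (1024, 768),
}
# Assumption: the 256-color minimum is read as applying to the administrator.
MIN_ADMIN_COLORS = 256


def meets_recommendation(role, width, height, colors=None):
    """Check a desktop against the recommended minimums for `role`.

    `colors` is the number of displayable colors; pass None if unknown,
    in which case the color-depth check is skipped.
    """
    min_width, min_height = MIN_RESOLUTION[role]
    if width < min_width or height < min_height:
        return False
    if role == "administrator" and colors is not None:
        return colors >= MIN_ADMIN_COLORS
    return True
```

For example, `meets_recommendation("administrator", 800, 600, 256)` is false because the resolution is below 1024 × 768.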
How to Use This Guidebook
What’s New
This guidebook evolved from the previous versions of this guidebook and incorporates
customer and consultant comments. Send us your comments, so we can make future
versions better meet your needs.
Content
The new features of the Release 4.6 guidebook are:
• System Administration Assistant (transaction SSAA), chapter 10
• New chapters on:
  - Security (chapter 11)
  - Microsoft SQL Server / Windows NT (chapter 13)
  - Basic problem solving (chapter 17)
  - Basic performance tuning (chapter 22)
The procedures to perform regularly scheduled tasks have been moved to the Roles section.
The unscheduled tasks section from the 4.0B guidebook has become a role-oriented section.
This change accommodates customers who perform scheduled tasks at times other than the
times presented in this guidebook. Therefore, all the task procedures are classified in one
section and by job roles, where related tasks are placed together. Regardless of the job
schedule, all jobs related to a job role are grouped in one place.
Conventions
In the table below, you will find some of the text conventions used throughout this guide.
[Figure: sample R/3 screen with callouts for the Menu Bar, Standard Toolbar, Screen Title, Application Toolbar (♦), User menu, Workplace Menu (♣), Workplace, and Status Bar]
♦ Application toolbar:
The screenshots shown in this guide are based on full user authorization (SAP_ALL).
Depending on your authorizations, some of the buttons on your application toolbar may
not be available.
♣ Workplace menu:
Depending on your authorizations, your workplace menu may look different from
screenshots in this guide which are based on SAP_ALL. The User menu and SAP standard
menu buttons provide different views of the workplace menu.
To learn how to build user menus, see the Authorizations Made Easy guidebook, Release
4.6A/B.
Note: In this guidebook, we show the technical names of each transaction. To match our
settings, choose Extras → Settings and select Show technical names.
Special Icons
Throughout this guide special icons indicate important messages. Below are brief
explanations of each icon:
Exercise caution when performing this task or step. An explanation of why you should be
careful is included.
This information helps you understand the topic in greater detail. It is not necessary to
know this information to perform the task.
These messages provide helpful hints and shortcuts to make your work faster and easier.
Contents
Overview
Roles of an R/3 System Administrator
Traits of an R/3 System Administrator
R/3 System Guidelines
Corollaries to Murphy’s Law
Special Definitions
Overview
This chapter is about the roles that a system administrator plays. These roles cross all
functional areas, and the number and intensity of the tasks depend on the size of the
company. In a small company, one person can be the entire system administration
department. In a larger company, however, this person is probably part of a team. The
purpose of this chapter is to help clarify the roles of a system administrator. It also
includes a list of commonly used system administration terms and their definitions.
At the end of this chapter is a list of 14 R/3 System guidelines, which a system administrator
must be aware of while working with the system.
Sample guidelines include:
< Keep it short and simple (KISS)
< Use checklists
< Do not allow direct database access
Roles of an R/3 System Administrator
Depending on the size of the company and available resources, R/3 administrator(s) may
range from one person to several specialized people in several departments.
Factors that affect an R/3 system administrator’s tasks, staffing, and roles:
< Company size
< Available resources (the size of the Basis group)
< Availability of infrastructure support for:
Desktop support
Database
Network
Facilities
The R/3 system administrator may wear many hats, both within (or directly related to) R/3
and external (or indirectly related) to R/3.
Within R/3
< User administrator
Set up and maintain user accounts
< Security administrator
Create and maintain SAP security profiles
Monitor and manage security access and violations
Chapter 1: R/3 System Administration Basics
External to R/3
< DBA for the specific database on which the system is running
Manage database specific tasks
Maintain the database’s health and integrity
< Operating system administrator
Manage the operating system access and user IDs
Manage operating system specific tasks
< Network administrator
Manage network access and user IDs
Manage network support and maintenance
< Server administrator
Manage the servers
< Desktop support
Support the users’ desktop PCs
< Printers
< Facilities
Manage facilities-related support issues, such as:
Power/utilities
Air conditioning (cooling)
Traits of an R/3 System Administrator
R/3 System Guidelines
Protect the System
What
Everything you do as a system administrator should be focused on protecting and
maintaining the system’s integrity.
Why
< If the system’s integrity is compromised, incorrect decisions could be made based on
invalid data.
< If the system cannot be recovered after a disaster, your company could be out of
business.
How
< The system administrator must have a positive, professional attitude.
Without this attitude, critical tasks may not be properly
completed (for example, backups may not be taken as scheduled and backup logs may
not be checked, which reduces the chances of a successful recovery).
< System administrators should maintain a “my job is on the line” attitude.
This attitude helps to ensure that administrators focus on maintaining the integrity of
the system. The company may not survive if the system crashes and cannot be
recovered.
< The system must be protected from internal and external threats.
One problem today is employees “poking around” in the network.
Do Not Be Afraid to Ask for Help
Why
< R/3 is so large and complex that one person cannot be expected to know everything.
If you are unsure which task to complete or how to complete it, you could make a
mistake and cause a larger problem.
< Mistakes within the system can be expensive.
Certain things cannot be “undone,” and once set, are set forever.
< The only way to learn is to ask.
There are no dumb questions—only dumb reasons for not asking them.
How
< SAPNet R/3 notes
< Various web sites and news groups
< Consultants
Also see the section in this chapter that covers networking with other customers and
consultants.
Network with Other Customers and Consultants
What
Get to know the R/3 Basis and system administrators in other companies.
Why
< Other customers may be able to provide solutions to your problems.
< Customers who help each other reduce their consulting expenses.
< The more people you know, the better your chances of finding someone to help you
solve a problem.
When, Where, and How
When you have the opportunity, meet:
< Other SAP customers and consultants, especially those in your specialty area
< Others using your operating system or database
Where to network:
< Training classes
< SAP events
Technical Education Conference (TechEd)
SAPPHIRE
< Participate in user groups:
Americas SAP Users Group (ASUG)
Regional SAP users groups
Database user groups, such as those for Microsoft SQL Server, Informix, DB2, or
Oracle
Operating system user groups, such as those for UNIX (the various versions), NT, or
IBM (AIX, AS400, or OS390)
< Participate in professional organizations
Participation means getting involved in the organization. The more you participate, the
more people you meet and get to know.
Keep It Short and Simple (KISS)
Why
< Complex tasks are more likely to fail as situations change.
A process with 27 steps has 27 chances to fail, because complex tasks are difficult to
create, debug, and maintain.
< It is difficult to train people for complex tasks.
< Explaining a complex task on the telephone increases the chance that what is said will
not be properly understood and an error will be made. If the error is severe, you may
have a disaster on your hands.
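The "27 chances to fail" point can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every step succeeds independently with the same probability. The 99% per-step figure and the function name are assumptions, not values from this guidebook.

```python
def chance_all_steps_succeed(steps: int, per_step_success: float) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if steps < 0 or not 0.0 <= per_step_success <= 1.0:
        raise ValueError("steps must be >= 0 and probability within [0, 1]")
    return per_step_success ** steps


if __name__ == "__main__":
    # 0.99 is an assumed per-step success rate, chosen only for illustration.
    for steps in (3, 27):
        p = chance_all_steps_succeed(steps, 0.99)
        print(f"{steps:>2} steps: {p:.1%} chance of completing without error")
```

Under these assumptions, a 27-step process completes without error only about three times in four, even though each individual step almost always works.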
How
< Keep tasks as simple as possible.
< Test
Keep Proper Documentation
What
Document processes, procedures, hardware changes, configuration changes, checks
performed, problems, errors, etc. If in doubt about what to document, write it all down.
Why
< As time passes, you will forget the details of a process or problem.
At some point, you may not remember anything about the process or problem. In
extreme cases, as with short-term memory, you can forget the
information within minutes.
< If you violate the KISS principle, complete documentation becomes even more
important.
< If the process is complex, complete documentation reduces the chance of errors.
< If you are sick or unavailable, complete documentation can help someone else do the
job.
< If changes need to be undone, you will know exactly what needs to be done to complete
this task.
< Documentation helps train new people.
Employee turnover must be planned for. Proper documentation makes the training and
transition of new employees easier and faster.
When
Documentation must be changed when:
< Documented items change.
Inaccurate documentation could be dangerous because it describes a process that should
not be followed.
< Changes are made to the system.
< Problems, such as hardware failures, error log entries, and security violations, occur.
How
< Record everything done to the system, as it is being done, so details are not forgotten.
< Document items clearly and sufficiently so that, without assistance, a qualified person
can read what you have written and perform the task.
< Re-read older documentation to see where improvements can be made. Obvious items
get “fuzzy” over time and are no longer obvious.
< Use graphics, flowcharts, and screenshots to clarify documentation.
Where
< Keep a log (notebook) on each server and record everything that you do on the servers.
< Keep a log for everything done remotely to any of the servers.
< Keep a log for other related items.
Use Checklists
What
A checklist lists the steps required to complete a task. Each step requires an
acknowledgement of completion (a check) or an entry (date, time, size, etc.).
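The definition above maps naturally onto a small data structure. The following is a minimal sketch, not part of the guidebook's tooling: the class names, field names, and the sample backup steps are all hypothetical. Each step is either simply checked off or requires a recorded entry.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ChecklistStep:
    description: str
    needs_entry: bool = False            # True if a value (time, size, ...) must be recorded
    done_at: Optional[datetime] = None   # set when the step is acknowledged
    entry: Optional[str] = None

    def complete(self, entry: Optional[str] = None) -> None:
        """Acknowledge the step; refuse if a required entry is missing."""
        if self.needs_entry and not entry:
            raise ValueError(f"'{self.description}' requires an entry")
        self.entry = entry
        self.done_at = datetime.now()


@dataclass
class Checklist:
    title: str
    steps: List[ChecklistStep] = field(default_factory=list)

    def open_steps(self) -> List[str]:
        """Descriptions of steps that have not been acknowledged yet."""
        return [s.description for s in self.steps if s.done_at is None]


if __name__ == "__main__":
    backup = Checklist("Nightly backup (hypothetical example)", [
        ChecklistStep("Confirm backup job finished"),
        ChecklistStep("Record backup run time", needs_entry=True),
        ChecklistStep("Check backup log for errors"),
    ])
    backup.steps[0].complete()
    backup.steps[1].complete(entry="2h 15m")
    print("Still open:", backup.open_steps())
```

A paper checklist does the same job; the point of the sketch is only that a step counts as done when it carries an explicit acknowledgement or entry.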
Why
< Checklists enforce a standardized process and reduce the chance that you will overlook
critical steps.
For example, if you were to use a checklist every time you drive a car, then you would
remember to turn off your headlights when you park your car, or you would not drive
off with your parking brake still set.
< Checklists force you to document events, such as run times, which may later become
important.
When
Checklists are especially useful for tasks that are:
< Complex or critical
If a step is missed or done incorrectly, the result could be serious (for example, inability
to restore the database).
< Done for the first time
< Done infrequently
It is difficult to remember how to do a complicated task that you do only once a year.
How
See examples in Scheduled Tasks.
Use the Appropriate Tool for the Job
Sometimes a low-tech solution is best. Depending on the situation, a paper-and-pencil
solution may work better and be more cost effective than a computerized solution. Paper
and pencil still works during a power failure.
Perform Preventive Maintenance
What
Preventive maintenance is the proactive monitoring and maintenance of the system.
Why
< It is less disruptive and stressful if you can plan a convenient time to do a task, rather
than have it develop into an “emergency” situation.
< You can fix a potential problem before it negatively impacts the system and company
operations.
In an extreme situation, the entire system is down until a particular task is
completed. For example, if the log file space goes down to zero (0), the database will
stop, and then R/3 also stops. Until sufficient file space is cleared, R/3 will not run and
certain business operations, such as shipping, may stop.
When
< Checking for problems should be a part of your regular routine.
< Scheduling tasks to fix a problem should be based on your situation, and when least
disruptive to your users.
How
< Monitor the various logs and event monitors
< Obtain additional disk storage before you run out of room
< Regularly clean the tape drive(s)
< Check the database for consistency and integrity
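As one illustration of proactive monitoring, the sketch below warns when free disk space drops below a threshold, so that storage can be added before the log file space runs out. It is a minimal example only: the 20% threshold, the function names, and the use of the current directory as a stand-in for the log location are assumptions, and it does not replace the R/3 or database monitoring tools.

```python
import shutil
from typing import Optional


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the file system that is still free (0.0 if size unknown)."""
    return free_bytes / total_bytes if total_bytes else 0.0


def check_free_space(path: str, min_free: float = 0.20) -> Optional[str]:
    """Return a warning string if free space on `path` is below `min_free`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = free_fraction(usage.total, usage.free)
    if fraction < min_free:
        return f"WARNING: only {fraction:.0%} free on {path}"
    return None


if __name__ == "__main__":
    # "." stands in for the log directory; substitute the real location.
    print(check_free_space(".") or "Free space is above the threshold")
```

Running a check like this on a schedule turns "we ran out of space" from an emergency into a planned task.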
Do Not Change What You Do Not Have To
What
< If the system works, leave it alone.
< Do not change something just to upgrade to the latest version.
Why
< Risk
When something changes, there is a chance that something else may break.
< Cost
Upgrading is expensive in terms of time, resources, and consulting, etc.
When
< A business need exists.
< Legal requirements call for an update.
This really is not an option. If you do not keep up, you will not be complying with legal
requirements. The associated penalties can be expensive.
< The hardware or software release is no longer supported by the vendor.
< The new release offers specific functionality that provides added business value to your
company.
< Fixing a major problem requires an upgrade.
A fix is unavailable in a patch or an “advance release.”
How
< Make certain that you can recover to the before-the-change condition if the change
fails or causes problems.
< All changes must be regression tested to make sure that nothing else has been affected
by the change. In other words, everything still works as it is supposed to.
Regression testing of R/3 involves the functional team and users.
< Stage the change and test it in the following order:
1. Test system (a “Sandbox” system)
2. Development system
3. Quality Assurance system
4. Production system
Even if your company does not have all the above-mentioned systems, the key is to
maintain the general order. For example, if your company does not have a test system,
test the change in the following order:
1. Development
2. Quality Assurance
3. Production
By the time you reach the production system, you should be comfortable that nothing
will break.
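The staging rule above (keep the general order, skip only the systems you do not have) can be sketched in a few lines. The lowercase system names and both function names are illustrative assumptions; the order itself is the one given in the text.

```python
from typing import Iterable, List, Optional

# Order taken from the text; the short lowercase names are illustrative only.
STAGE_ORDER = ["test", "development", "quality assurance", "production"]


def rollout_order(available: Iterable[str]) -> List[str]:
    """Return the systems you actually have, in the recommended order."""
    have = {name.lower() for name in available}
    unknown = have - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown system(s): {sorted(unknown)}")
    return [stage for stage in STAGE_ORDER if stage in have]


def next_stage(available: Iterable[str], passed: Iterable[str]) -> Optional[str]:
    """First system, in order, where the change has not yet passed testing."""
    done = {name.lower() for name in passed}
    for stage in rollout_order(available):
        if stage not in done:
            return stage
    return None  # the change has passed everywhere, including production


if __name__ == "__main__":
    landscape = ["Production", "Development", "Quality Assurance"]  # no sandbox
    print(rollout_order(landscape))
    print(next_stage(landscape, passed=["Development"]))
```

Because `next_stage` always returns the earliest untested system, a change cannot reach production in this model until every earlier system has been passed.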
Do Not Make System Changes During Critical Periods
What
A critical period is when system disruptions could cause severe operational problems.
Why
If a problem occurs during a critical period, the business may be severely impacted.
Note the following sequence of events:
1. A system administrator changes a printer in Shipping at the end of the month.
2. R/3 cannot send output to the new printer.
3. The users cannot print shipping documents.
4. The company cannot ship their products.
5. Revenue for the month is reduced.
When
A critical period is any time when the users and the company may be “severely” impacted
by a system problem. These periods differ depending on the particular industry or
company. What is a critical period for one company may not be critical for another
company.
The following are “real” examples of critical periods:
< At the end of the month, when Sales and Shipping are booking and shipping as much as
they can, to maximize revenue for the month
< At the beginning of the month, when Finance is closing the prior month
< During the last month of the year, when Sales and Shipping are booking and shipping as
much as they can, to maximize the revenue for the year
< During the beginning of the year, when Finance is closing the books for the prior year
and getting ready for the financial audit
How
< Always coordinate potentially disruptive system events with the users.
Different user groups in the company, such as Finance and Order Entry, may have
different quiet periods that need to be coordinated.
< Plan all potentially disruptive systems-related activities during quiet periods when a
problem will have minimal user impact.
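A simple date check can help flag proposed change dates that fall inside a critical period. The sketch below encodes the four example periods listed above. The window lengths (last three and first five days of a month) and the assumption that the fiscal year matches the calendar year are illustrative only; replace them with the periods agreed with your users.

```python
import calendar
from datetime import date

# Assumed window sizes; adjust to your company's actual critical periods.
MONTH_END_DAYS = 3     # last N days of each month (Sales and Shipping)
MONTH_START_DAYS = 5   # first N days of each month (Finance closing)


def is_critical_period(day: date) -> bool:
    """True if `day` falls in one of the example critical periods."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    month_end = day.day > last_day - MONTH_END_DAYS
    month_start = day.day <= MONTH_START_DAYS
    year_end = day.month == 12     # last month of the year (calendar year assumed)
    year_start = day.month == 1    # beginning of the year (calendar year assumed)
    return month_end or month_start or year_end or year_start


if __name__ == "__main__":
    for proposed in (date(2024, 2, 29), date(2024, 3, 15)):
        status = "avoid" if is_critical_period(proposed) else "candidate quiet period"
        print(proposed.isoformat(), status)
```

A check like this is only a first filter. It does not replace coordinating the date with the affected user groups.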
Do Not Allow Direct Database Access
What
Direct database access means allowing a user to run a query or update directly against the
database without going through R/3.
Why
< By not going through R/3, there is the risk of corrupting the database.
< Directly updating the database could put the database out of sync with the R/3 buffers.
How
< When R/3 writes to the database, it could be writing to many different tables.
If a user writes directly to the tables, missing a single table may corrupt the database by
putting the tables out of sync with each other.
< With direct database access, a user could accidentally execute an update or delete, rather
than a read.
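The "missing a single table" risk can be shown with a toy model. The sketch below uses an in-memory SQLite database and two hypothetical tables (`order_header` and `order_item`); these are not R/3 tables, and SQLite merely stands in for the production database. The application-style write updates both tables in one transaction, while the direct write touches only one of them.

```python
import sqlite3
from typing import List


def setup() -> sqlite3.Connection:
    """Create the two hypothetical tables in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE order_header (order_id INTEGER PRIMARY KEY, total INTEGER)")
    db.execute("CREATE TABLE order_item (order_id INTEGER, amount INTEGER)")
    return db


def add_item_via_application(db: sqlite3.Connection, order_id: int, amount: int) -> None:
    """Application-style write: both tables change in one transaction."""
    with db:  # commits on success, rolls back if any statement fails
        db.execute("INSERT OR IGNORE INTO order_header VALUES (?, 0)", (order_id,))
        db.execute("INSERT INTO order_item VALUES (?, ?)", (order_id, amount))
        db.execute("UPDATE order_header SET total = total + ? WHERE order_id = ?",
                   (amount, order_id))


def inconsistent_orders(db: sqlite3.Connection) -> List[int]:
    """Orders whose header total no longer matches the sum of their items."""
    rows = db.execute(
        "SELECT h.order_id FROM order_header h "
        "LEFT JOIN order_item i ON i.order_id = h.order_id "
        "GROUP BY h.order_id, h.total "
        "HAVING h.total != COALESCE(SUM(i.amount), 0)"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = setup()
    add_item_via_application(db, 1, 100)
    print("After application write:", inconsistent_orders(db))
    # A direct write that updates the item table but forgets the header.
    db.execute("INSERT INTO order_item VALUES (1, 50)")
    print("After direct write:", inconsistent_orders(db))
```

In a real R/3 system the number of related tables is far larger, which is why going through the application layer is the only safe way to write.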
Keep all Non-SAP Activity Off the R/3 Servers
What
< Do not allow users to directly access (telnet, remote access, etc.) the R/3 server(s).
< Do not use the R/3 server as a general file server.
< Do not run programs that are not directly related to R/3 on an R/3 server.
Why
< Security
Not allowing users to have access to the R/3 server reduces the chance of files
being accidentally deleted or changed.
No access also means that users cannot look at confidential or sensitive information.
< Performance
Using the production R/3 server as a file server creates resource contention on a system
where performance is a primary concern. Programs running on the R/3 servers will contend
for the same resources that R/3 is using, which affects the performance of R/3.
How
Use other servers to perform functions unrelated to R/3.
Minimize Single Points of Failure
What
A single-point failure is when the failure of a single component, task, or activity causes the
system to fail or creates a critical event.
Why
Each place where a single-point failure could occur increases the chances of a system failure
or other critical event.
For example, if:
< You only have one tape drive and it fails, you cannot back up your database.
< You rely on utility line power, and do not have a UPS, the server will crash during a
power failure and possibly corrupt the database.
< You are the only one who can complete a task, and you are on vacation, the task will not
be completed until you return (or you will be “on call” while on vacation).
To guard against a single-point failure, consider the following options:
< Systems configured with a built-in backup
< Redundant equipment, such as dual power supplies
< On-hand spares
< Sufficient personnel
< On-call consultants
< Cross-training
< Outsourcing
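One way to look for single points of failure is to keep a simple inventory of how many of each critical item (equipment or people) you have, and to flag anything with no spare. The sketch below assumes such an inventory exists; the item names and counts are hypothetical examples based on the cases above.

```python
from typing import Dict, List


def single_points_of_failure(inventory: Dict[str, int]) -> List[str]:
    """Return items that have no spare or backup (count of 1 or less)."""
    return sorted(name for name, count in inventory.items() if count <= 1)


if __name__ == "__main__":
    # Hypothetical inventory: item -> number of independent units or people.
    inventory = {
        "tape drive": 1,
        "power source (utility line, UPS)": 2,
        "staff able to run the backup": 1,
        "server power supply": 2,
    }
    for item in single_points_of_failure(inventory):
        print("No backup for:", item)
```

Each flagged item is a candidate for one of the options listed above, such as an on-hand spare or cross-training.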
Corollaries to Murphy’s Law
Special Definitions
There are terms used in this guidebook that have very specific meanings. To prevent
confusion, they are defined below:
Database server
This is where R/3 and the database reside.
The system clock of the database server is the master clock for the R/3 system.
Application server
This is where the R/3 application runs.
On a two-tiered system, the application server is combined with the database server.
Application servers can be dedicated to online users, batch processing, or a mix.
Instance
An installation of R/3 on a server.
The two types of instances are central and dialog. More than one instance could exist on a
physical server.
System
The complete R/3 installation for a System ID (SID), for example, PRD.
A system logically consists of the R/3 central instance and dialog instances for the SID. This
physically consists of the database server and application servers for that SID.
A two-tiered configuration combines the application and database layers on a single server.
Chapter 2: Disaster Recovery
Contents
Overview
Why Plan for a Disaster?
Planning for a Disaster
Test your Disaster Recovery Procedure
Other Considerations
Minimizing the Chances for a Disaster
Overview
The purpose of this chapter is to help you understand what we feel is the most critical job of
a system administrator—disaster recovery.
We included this chapter at the beginning of our guidebook for two reasons:
< To emphasize the importance of the subject
Disaster recovery needs to be planned as soon as possible, because it takes time to
develop, test, and refine.
< To emphasize the importance of being prepared for a potential disaster
Murphy’s Law says:
“Disaster will strike when you are not prepared for it.”
The sooner you begin planning, the more prepared you will be when a disaster does happen.
This chapter is not a disaster recovery “how to.” It is only designed to get you thinking
and working on disaster recovery.
What Is a Disaster?
The goal of disaster recovery is to restore the system so that the company can continue
doing business. A disaster is anything that results in the corruption or loss of the R/3
System.
Examples include:
< Database corruption.
For example, when test data is accidentally loaded into the production system.
This happens more often than people realize.
< A serious hardware failure.
< A complete loss of the R/3 System and infrastructure.
For example, the destruction of the building due to natural disaster.
The ultimate responsibility of a system administrator is to successfully restore R/3 after a
disaster.
The ultimate consequence of not restoring the system is that your company goes out of
business.
The administrator’s goal is to prevent the system from ever reaching the situation where the
ultimate responsibility is called upon.
Disaster recovery planning is a major project. Depending on your situation and the size and
complexity of your company, disaster recovery planning could take more than a year to
prepare, test, and refine. The plan could fill many volumes. This chapter helps you start
thinking about and planning for disaster recovery.
Why Plan for a Disaster?
< A system administrator should expect and plan for the worst, and then hope for the best.
< During a disaster recovery, nothing should be done for the first time.
Unpleasant surprises could be fatal to the recovery process.
Developing a disaster recovery plan forces you to answer questions such as:
< Will business operations stop if R/3 fails?
< How much lost revenue and cost will be incurred for each hour that the system is down?
< Which critical business functions cannot be completed?
< How will customers be supported?
< How long can the system be down before the company goes out of business?
< Who is coordinating and managing the disaster recovery?
< What will the users do while R/3 is down?
< How long will the system be down?
< How long will it take before the R/3 System is available for use?
If you plan properly, you will be under less stress, because you know that the system can be
recovered and how long this recovery will take.
If the recovery downtime is unacceptable, management should invest in:
< Equipment, facilities, and personnel
< High availability (HA) options
HA options can be expensive. There are different degrees of HA, so customers need to
determine which option is right for them.
HA is an advanced topic beyond the scope of this guidebook. If you are interested in this
topic, contact an HA vendor.
Planning for a Disaster
Creating a Plan
Creating a disaster recovery plan is a major project because:
< It can take over a year and considerable effort to develop, test, and document.
< The documentation may be extensive (literally thousands of pages long).
If you do not know how to plan for a disaster recovery, get the assistance of an expert. A
bad plan (that will fail) is worse than no plan, because it provides a false sense of security.
What Are the Business Requirements for Disaster Recovery?
Who will provide the requirements?
< Senior management needs to provide global (or strategic) requirements and guidelines.
< The business units’ needs drive the specific detailed requirements.
These units should understand that as the requirement for the recovery time decreases,
the cost for disaster recovery increases. The units should budget for it, or if the funds
come from an administrative or IT budget, the units should support it.
What are the requirements?
Each requirement should answer the following questions:
< Who is the requestor?
< What is the requirement?
< Are other departments or customers affected by this requirement?
< Why is the requirement necessary?
When R/3 is offline, what does (or does not) happen?
What is the cost (or lost revenue) of an hour or a day of R/3 downtime?
The justification should be a concrete objective value (such as $20,000 an hour).
Define the cost (per hour, per day, etc.) of having the R/3 System down.
Example
What: No more than one hour of transaction data may be lost.
Why: The cost is 1,000 lost transactions per hour, which were entered
in R/3 and cannot be recreated from memory.
This inability to recreate lost transactions may result in lost sales and upset
customers. If the lost orders are those that the customer needs quickly, this
situation can be critical.
Example
What: The system cannot be offline for more than three hours.
Why: The cost (an average of $25,000 per hour) is the inability to book sales.
Example
What: In the event of disaster, such as the loss of the building containing the R/3
data center, the company can only tolerate a two-day downtime.
Why: At that point, permanent customer loss begins.
Other: There must be an alternate method of continuing business.
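The examples above become easier to compare when the cost is worked out as a number. The sketch below uses the figures from the examples (1,000 transactions per hour, $25,000 per hour, a three-hour limit). The function names are illustrative, and a simple rate-times-duration model is assumed.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Business cost of an outage, assuming a constant hourly cost."""
    return hours_down * cost_per_hour


def transactions_at_risk(hours_of_lost_data: float, transactions_per_hour: float) -> float:
    """Transactions that would have to be recreated after a data loss."""
    return hours_of_lost_data * transactions_per_hour


if __name__ == "__main__":
    # Figures from the examples in the text.
    print(f"3 hours offline: ${downtime_cost(3, 25_000):,.0f}")
    print(f"1 hour of lost data: {transactions_at_risk(1, 1_000):,.0f} transactions")
```

At the stated rate, the three-hour limit in the second example corresponds to $75,000 of unbooked sales, which gives management a concrete figure to weigh against the cost of a faster recovery.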
When Should a Disaster Recovery Procedure Begin?
Ask yourself the following questions:
< What criteria constitute a disaster?
< Have these criteria been met?
< Who needs to be consulted?
The person must be aware of the effect of the disaster on the company’s business and the
critical nature of the recovery.
Expected Downtime or Recovery Time
Expected Downtime
Expected downtime is only part of the business cost of disaster recovery. For defined
scenarios, this cost is the expected minimum time before R/3 can be productive again.
Downtime may mean that no orders can be processed and no products shipped.
Management must approve this cost, so it is important that they understand that downtime
is a potential business cost.
To help business continue, it is important to find out if there are alternate processes that can
be used while the R/3 System is being recovered.
Recovery Group and Staffing Roles
There are four key roles in a recovery group. The number of employees performing these
roles will vary depending on your company size. In a smaller company, for example, the
recovery manager and the communication liaison could be the same person. Titles and tasks
will probably differ based on your company’s needs.
We defined the following key roles:
< Recovery manager
Manages the entire technical recovery. All recovery activities and issues should be
coordinated through this person.
< Communication liaison
Handles user phone calls and keeps top management updated with the recovery status.
One person handling all phone calls allows the group doing the technical recovery to
proceed without interruptions.
To reduce interruption of the recovery staff, we recommend you maintain a status board.
The status board should list key points in the recovery plan and an estimate of when the
system will be recovered and available to use.
< If the disaster is a major geographical event (like an earthquake), your local staff will be
more concerned with their families—not the company.
< Depending on the disaster, key personnel could be injured or killed.
You should expect and plan for these situations. Plan for staff from other geographic sites
to be flown in and participate as disaster recovery team members.
A final staffing consideration is to plan for at least one staff member to be “unavailable.” Without this
person, the rest of the department must be able to perform a successful recovery. This issue
may become vital during an actual disaster.
Types of Disaster Recovery
Disaster recovery scenarios can be grouped into two types:
< Onsite
< Offsite
Onsite
Onsite recovery is disaster recovery done at your site. The infrastructure usually remains
intact. The best case scenario is a recovery done on the original hardware. The worst case
scenario is a recovery done on a backup system.
Offsite
Offsite recovery is disaster recovery done at a disaster recovery site. In this scenario, all
hardware and infrastructure are lost as a result of facility destruction such as a fire, a flood,
or an earthquake. The new servers must be configured from scratch.
A major consideration is that once the original facility has been rebuilt and tested, a second
restore must take place back to the customer’s original facility. Although this second restore
can be planned and scheduled at a convenient time to disrupt as few users as possible, the
timing is just as critical as during the disaster. While the system is being recovered, it is down.
Disaster Scenarios
There are an infinite number of disaster scenarios that could occur. It would take an infinite
amount of time to plan for them, and you will never account for all of them. To make this
task manageable, you should plan for at least three and no more than five scenarios. In the
event of a disaster, you would adapt the closest scenario(s) to the actual disaster.
The disaster scenarios are made up of:
< Description of the disaster event
< High level plan of major tasks to be performed
< Estimated time to have the system available to the users
To create your final scenario:
1. Use the Three Common Disaster Scenarios section below as a starting point.
2. Prepare three to five scenarios that cover a wide range of disasters that would apply to
you.
3. Create a high-level plan (made up of major tasks) for each scenario.
4. Test the planned scenarios by creating different test disasters and determining if (and
how) your scenario(s) would adapt to an actual disaster.
5. If the test scenario(s) cannot be adapted, modify or develop more scenarios.
6. Repeat the process.
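The three parts of a scenario (description, high-level plan, estimated time) and the "three to five scenarios" guideline can be captured in a small structure that also reports what is still missing. This is a minimal sketch: the class, field, and function names are hypothetical, and the sample entries use the sample downtimes this chapter gives for a corrupt database (eight hours) and a facility loss (eight days).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DisasterScenario:
    description: str
    major_tasks: List[str] = field(default_factory=list)
    estimated_downtime_hours: Optional[float] = None  # None = not yet estimated


def plan_problems(scenarios: List[DisasterScenario]) -> List[str]:
    """List gaps in a scenario set, using the guidance in the text."""
    problems = []
    if not 3 <= len(scenarios) <= 5:
        problems.append(f"expected 3 to 5 scenarios, found {len(scenarios)}")
    for s in scenarios:
        if not s.major_tasks:
            problems.append(f"'{s.description}' has no high-level plan")
        if s.estimated_downtime_hours is None:
            problems.append(f"'{s.description}' has no downtime estimate")
    return problems


if __name__ == "__main__":
    scenarios = [
        DisasterScenario("Corrupt database",
                         ["Recover R/3 database and related operating system files"], 8),
        DisasterScenario("Hardware failure"),  # still to be planned
        DisasterScenario("Loss of server facility",
                         ["Replace facilities, infrastructure, and hardware",
                          "Rebuild server and R/3 environment",
                          "Recover R/3 database and related files"], 8 * 24),
    ]
    for problem in plan_problems(scenarios):
        print(problem)
```

Your own downtimes will differ from the sample values, so treat the numbers here as placeholders until you have tested each scenario.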
Three Common Disaster Scenarios
The following three examples are ordered from best case to worst case.
The downtimes in the examples below are only samples. Your downtimes will be different.
You must replace the sample downtimes with the downtimes applicable to your
environment.
A Corrupt Database
< A corrupt database could result from:
Accidentally loading test data into the production system.
A bad transport into production, which results in the failure of the production
system.
< Such a disaster requires the recovery of the R/3 database and related operating system
files.
< The “sample” downtime is eight hours.
A Hardware Failure
< The following types of items may fail:
A system processor
A drive controller
A Complete Loss or Destruction of the Server Facility
< The following items can be lost:
Servers
All supporting infrastructure
All documentation and materials in the building
The building
< A complete loss of the facility can result from the following types of disasters:
Fire
Earthquake
Flood
Hurricane
Tornado
Man-made disasters, such as the World Trade Center bombing
< Such a disaster requires:
Replacing the facilities
Replacing the infrastructure
Replacing lost hardware
Rebuilding the server and R/3 environment (hardware, operating system, database,
etc.)
Recovering the R/3 database and related files
< The “sample” downtime lasts eight days and comprises:
At least five days to procure hardware.
In a regional disaster, this purchase could take longer if your suppliers were also
affected by the disaster.
Use national vendors with several regional distribution centers and, as a backup,
have an out-of-area alternate supplier.
Two days to rebuild the NT server (one person); 16 hours actual work time
As the hardware is procured and the server is being rebuilt, an alternate facility is
obtained and an emergency (minimal) network is constructed
One day to integrate into the emergency network
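The eight-day sample figure is the sum of the sequential phases (five days, two days, one day), with the alternate facility and emergency network prepared in parallel. The sketch below works this out from phase durations and prerequisites. The phase names and the function are illustrative, and the seven-day duration for the parallel facility work is an assumption, since the text only says it happens while the hardware is procured and the server is rebuilt.

```python
from typing import Dict, List, Optional, Tuple

# phase name -> (duration in days, prerequisite phases)
Phases = Dict[str, Tuple[float, List[str]]]


def finish_day(phase: str, phases: Phases,
               memo: Optional[Dict[str, float]] = None) -> float:
    """Earliest day a phase can finish, given its prerequisites."""
    memo = {} if memo is None else memo
    if phase not in memo:
        duration, prereqs = phases[phase]
        start = max((finish_day(p, phases, memo) for p in prereqs), default=0.0)
        memo[phase] = start + duration
    return memo[phase]


SAMPLE: Phases = {
    "procure hardware": (5, []),
    # Assumed duration: the text gives none for this parallel activity.
    "alternate facility and emergency network": (7, []),
    "rebuild server": (2, ["procure hardware"]),
    "integrate into emergency network": (
        1, ["rebuild server", "alternate facility and emergency network"]),
}

if __name__ == "__main__":
    total = finish_day("integrate into emergency network", SAMPLE)
    print(f"Sample downtime: {total:g} days")
```

Because the facility work runs in parallel, it adds nothing to the total as long as it finishes within the seven days needed to procure and rebuild the server. If it took longer, it would become the step that determines the downtime.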
Recovery Script
What
Why
Creating a Recovery Script
Creating a recovery script requires:
< A checklist for each step
< A document with screenshots to clarify the instructions, if needed
< Flowcharts, if the flow of steps or activities is critical or confusing
Recovery Process
To reduce recovery time, define a process by:
< Completing as many tasks as possible in parallel
< Adding timetables for each step
Major Steps
1. During a potential disaster, anticipate a recovery by:
< Collecting facts
< Recalling the latest offsite tapes
< Recalling the crash kit (see the Crash Kit section in this chapter).
< Calling all required personnel
These personnel include the internal SAP team, affected key
users, infrastructure support, IT, facilities, on-call consultants, etc.
< Preparing functional organizations (sales, finance, and shipping) for alternate
procedures for key business transactions and processes.
2. Minimize the effect of the disaster by:
< Stopping all additional transactions into the system
Waiting too long could worsen the problem
< Collecting transaction records that have to be manually reentered
3. Begin the planning process by:
< Analyzing the problem
< Fitting the disaster to your predefined scenario plans
< Modifying the plans as needed
4. Define when to initiate a disaster recovery procedure.
< What are the criteria to declare a disaster, and have they been met?
< Who will make the final decision to declare a disaster?
5. Declare the disaster.
6. Perform the system recovery.
7. Test and sign off on the recovered system.
Testing should be performed by key users, who use a criteria checklist to determine that
the system has been satisfactorily recovered.
8. Catch up with transactions that may have been handled by alternate processes during
the disaster.
Once completed, this step should require an additional sign-off.
9. Notify the users that the system is ready for normal operations.
10. Conduct a postmortem debriefing session.
Use the results from this session to improve your disaster recovery planning.
Crash Kit
What
Why
During a disaster, everything that is needed to recover the R/3 environment is contained in
one (or a few) containers. If you have to evacuate the site, you will not have the time to run
around, gathering the items at the last minute, hoping that you get everything you need.
In a major disaster you may not even have that opportunity.
When
When a change is made to a component (hardware or software) on the server, replace the
outdated items in the crash kit with updated items that have been tested.
A periodic review of the crash kit should be performed to determine if items need to be
added or changed. A service contract is a perfect example of an item that requires this type
of review.
Where to Put the Crash Kit
The crash kit should be physically separated from the servers. If it is located in the server
room, and the server room is destroyed, this kit is lost.
Some crash kit storage areas include:
< Commercial offsite data storage
< Other company sites
< Another secure section of the building
How
The following is an inventory list of some of the major items to put into the crash kit. You
will need to add or delete items for your specific environment. This inventory list is
organized into the following categories:
< Documentation
< Software
Documentation
An inventory of the crash kit should be taken by the person who seals the kit. If the seal is
broken, items may have been removed or changed, making the kit useless in a recovery.
The inventory list below must be signed and dated by the person checking the crash kit. The
following documentation must be included in the crash kit:
< Disaster recovery script
< Installation instructions for the:
Operating system
Database
R/3 System
< Special installation instructions for:
Drivers that have to be manually installed
Programs that must be installed in a specific manner
Ensure that maintenance agreements are still valid and check whether any have expired.
These checks should be part of a regularly scheduled task.
Software
< Operating system:
Installation kit
Drivers for hardware, such as a Network Interface Card (NIC) or a SCSI
controller, which are not included in the installation kit
Service packs, updates, and patches
< Database:
Installation kit
Service packs, updates, and patches
Recovery scripts, to automate the database recovery
< For R/3:
Installation kit
Currently installed kernel
System profile files
tpparam file
saprouttab file
saplogon.ini
< Other R/3 integrated programs (for example, a tax package)
< Other software for the R/3 installation:
Utilities
Backup
UPS control program
Hardware monitor
FTP client
Remote control program
System monitor
Business Continuation During Recovery
Business continuation during a recovery is an alternate process to continue doing business
while recovering from a disaster. It includes:
< Cash collection
< Order processing
< Product shipping
< Bill paying
< Payroll processing
< Alternate locations to continue doing business
Why
How
Offsite Disaster Recovery Sites
< Other company sites
< Commercial disaster recovery sites
< Share or rent space from other companies
Integration with your Company’s General Disaster Planning
Because there are many dependencies, the R/3 disaster recovery process must be integrated
with your company’s general disaster planning. This process includes telephone, network,
product deliveries, mail, etc.
When the R/3 System Returns
How will the transactions that were handled with the alternate process be entered into R/3
when it is operational?
Test your Disaster Recovery Procedure
Unless you test your recovery process, you do not know if you can actually recover
your system.
A test is a simulated disaster recovery which verifies that you can recover the system and
exercise every task outlined in the disaster recovery plan.
< Test to find out if:
Your disaster recovery procedure works
Something changed and was not documented or updated
There are steps that need clarification for others
The information that is clear to the person documenting the procedure may be
unclear to the person reading the procedure.
Older hardware is no longer available
Here, alternate planning is needed. You may have to upgrade your hardware to be
compatible with currently available equipment.
Since many factors affect recovery time, actual recovery times can only be determined by
testing. Once you have actual times (not guesses or estimates), your disaster planning
becomes more credible. If the procedure is practiced often, when a disaster occurs, everyone
will know what to do. This way, the chaos of a disaster will be reduced.
How
1. Execute your disaster recovery plan on a backup system or at an offsite location.
2. Generate a random disaster scenario.
3. Execute your disaster plan to see if it handles the scenario.
When
Where
< The disaster recovery test should be done at the same site where you expect to recover.
If you have multiple recovery sites, perform a test recovery at each site. The
equipment, facilities, and configuration may be different at each site. Document
all specific items that need to be completed for each site. You do not want
to discover that you cannot recover at a site after a disaster occurs.
< A backup onsite server
< Another company site
< At another company where you have a mutual support agreement
< A company that provides a disaster recovery site and services
Who Should Participate
< Primary and backup personnel who will do the job during a real disaster recovery
A provision should be made for some of the key personnel being unavailable during
a disaster recovery. A test procedure might involve randomly picking a name and
declaring that person unavailable to participate. This procedure duplicates a real situation
in which a key person is seriously injured or killed.
< Personnel at other sites
Integrate these people into the test, since they may be needed to perform the recovery
during an actual disaster. These people will fill in for unavailable personnel.
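Randomly declaring a person unavailable can be as simple as the following sketch. The roster names are hypothetical, and the function only illustrates the idea of drawing a name before the test starts.

```python
import random


def pick_unavailable(personnel, rng=random):
    """Choose one person at random and return (chosen, everyone else)."""
    chosen = rng.choice(personnel)
    remaining = [p for p in personnel if p != chosen]
    return chosen, remaining


# Hypothetical roster of primary personnel for the test.
team = ["Basis administrator", "DBA", "Network engineer", "Operator"]
unavailable, participants = pick_unavailable(team)
print("Unavailable for this test:", unavailable)
print("Must recover without them:", participants)
```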
Other Considerations
Other Upstream or Downstream Applications
For the company to function, other upstream (or downstream) applications also need to be
recovered with R/3. Some of these applications may be tightly associated with R/3. The
applications should be accounted for and protected in the company-wide disaster recovery
planning.
Applications located on only one person’s desktop computer must be backed up to a safe
location.
Backup Sites
Having a contract with a disaster recovery site does not guarantee that the site will be
available. In a regional disaster, such as an earthquake or flood, many other companies will
be competing for the same commercial disaster sites. In this situation, you may not have a
site to recover to, if others have booked it before you.
The emergency backup site may not have equipment of the same performance level as your
production system. Reduced performance and transaction throughput must be considered.
Examples:
< A reduced batch schedule of only critical jobs
< Only essential business tasks will be done while on the recovery system
Minimizing the Chances for a Disaster
There are many ways to minimize chances for a disaster. Some of these ideas seem obvious,
but it is these ideas that are often forgotten.
Minimize Human Error
Many disasters are caused by human error, such as a mistake or a tired operator. Do not
attempt dangerous tasks when you are tired. If you have to do a dangerous task, get a
second opinion before you start.
< Dangerous tasks should be scripted and checkpoints included to verify the steps.
Such tasks include:
Deleting the test database
Check that the delete command specifies the Test, not the
Production, database.
Moving a file
Verify that the target file (to be overwritten) is the old, not the new, file.
Formatting a new drive
Verify that the drive to be formatted is the new drive, not an existing drive with data
on it.
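A checkpoint in a script can be sketched as follows. This is an illustration only: the system ID "TST" and the `drop` callback are hypothetical stand-ins for your test database name and for whatever command actually deletes it.

```python
def confirm_target(expected, actual, description):
    """Checkpoint: stop unless the target is exactly the one the script expects."""
    if actual != expected:
        raise RuntimeError(
            f"Checkpoint failed for {description}: "
            f"expected {expected!r}, got {actual!r}"
        )


def delete_test_database(db_name, drop):
    """Delete a database only after the checkpoint confirms it is the test system.

    `drop` is a caller-supplied function that performs the real deletion.
    """
    confirm_target("TST", db_name, "database to delete")
    drop(db_name)


deleted = []
delete_test_database("TST", deleted.append)    # passes the checkpoint
try:
    delete_test_database("PRD", deleted.append)  # stopped before any deletion
except RuntimeError as error:
    print(error)
print("Deleted:", deleted)
```

The same pattern applies to moving a file or formatting a drive: state the expected target in the script and compare it with the actual target before the destructive step runs.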
Minimize Single Points of Failure
A single-point failure is when the failure of one component causes the entire system to fail.
To minimize single-point failure:
< Identify conditions where a single-point failure can occur
< Anticipate what will happen if this component or process fails
< Eliminate as many of these single points of failure as practical.
Practical is defined as the level of work involved or cost compared to the level of risk
and failure.
Types of single points of failure include:
< The backup R/3 server is located in the same data center as the production R/3 server.
If the data center is destroyed, the backup server is also destroyed.
< All the R/3 servers are on a single electrical circuit.
If the circuit breaker opens, everything on that circuit loses power, and all the servers
will crash.
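One way to look for conditions like these is to list what each server depends on and flag anything that all servers share. The sketch below assumes a hand-maintained inventory; the server names and attributes are hypothetical.

```python
def single_points_of_failure(servers):
    """Return {attribute: value} for every attribute that all servers share."""
    shared = {}
    attributes = set().union(*(s.keys() for s in servers.values()))
    for attr in sorted(attributes):
        values = {s.get(attr) for s in servers.values()}
        # One common value means one failure takes out every server.
        if len(values) == 1 and None not in values:
            shared[attr] = values.pop()
    return shared


# Hypothetical inventory of what each server depends on.
inventory = {
    "production": {"data_center": "Building A", "circuit": "C1"},
    "backup": {"data_center": "Building A", "circuit": "C2"},
}
print(single_points_of_failure(inventory))
```

In this made-up inventory the two servers are on separate circuits but share a data center, so the data center is reported.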
Cascade Failures
A cascade failure is when one failure triggers additional failures, which increases the
complexity of a problem. The recovery involves the coordinated fixing of many problems.
Example: A Cascade Failure
Contents
Overview ..................................................................................................................3–2
Restore.....................................................................................................................3–2
Backup .....................................................................................................................3–3
Tape Management.................................................................................................3–13
Performance ..........................................................................................................3–20
Useful SAP Notes..................................................................................................3–24
Overview
Restore
Strategy
Business recovery time is the result of the time needed to:
< Find the problem
< Repair the damage
< Restore the database
Factors that affect the chosen restore strategy include:
< Business cost of downtime to recover
< Operational schedule
< Global or local users
< Number of transactions an hour
< Budget
Release 4.6A/B
3–2
Chapter 3: Backup and Recovery
Backup
The actual process to restore R/3 and the database will not be covered in this book. This
critical task has specific system dependencies, and we leave it to a specialist to teach. If a
restore must be done, contact a specialist or your Basis consultant. Work with your DBA or
consultant to test and document the restore process for your system. With proper training,
you should be able to do the restore.
If the restore is not done properly and completely, it could fail and have to be restarted, or
the restored system could be missing files. There may be special data that you must record about your database to
recover it. Work with your specialist to identify and document this data.
Testing Recovery
Since the restore procedure is one of the key issues of the R/3 System, database recovery
must be regularly maintained and tested. See chapter 2, Disaster Recovery.
Backup
Backup is like insurance. You only need a backup if you need to restore your system.
What to Backup and When
There are three categories of files to back up:
< Database
< Log files
< Operating system files
Note: You may need to use different tools to back up all the files. Some tools can only
back up one or two of the three categories. For example, the DBA Planning Calendar
(transaction DB13) on Microsoft SQL Server can back up the database and the transaction
log, but not the operating system files.
Database
What
The database is the core of the R/3 System and your data. Without the database backup, you cannot
recover the system.
When
The frequency of a full database backup determines how many days back in time you must
go to begin the restore:
< If a daily full backup is done, you will need yesterday’s full backup.
Only logs since yesterday’s backup need to be applied to bring the system current.
< If a weekly full backup is done, you will need last week’s full backup.
All the logs for each day (since the full backup) must now be applied to bring the system
current.
A daily full backup reduces the number of logs that need to be applied to bring the database
current. Fewer logs also reduce the risk of being unable to bring the database current because of a
“bad” (unusable) log file.
If a daily full backup is not done, more logs would need to be applied. This step lengthens
the recovery process time and increases the risk of not being able to recover to the current
time. A point may be reached when it would take too long to restore the logs, because so
many logs need to be applied. For additional safety, we recommend that you do a full
monthly database backup in addition to the full daily backups.
Example: Weekly Backup
A restore from last week’s full backup that was done four days ago.
< There are 10 logs a day.
< A total of 40 logs (10 logs per day × 4 days) need to be restored.
< It takes 120 minutes to restore the log files from tape to disk (40 logs × 3 minutes per
log).
< It takes 200 minutes to restore the log files to the database (40 logs × 5 minutes per
log).
< The total time to do the restore, excluding database files, is 320 minutes (5.3 hours).
Example: Weekly Backup
These examples show that the time it takes to do a log restore depends on how many days
back you have to go to get to the last full backup. Increasing the frequency of the full backup
(with fewer days between full backups) reduces the recovery time.
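The arithmetic of the weekly example can be repeated for any number of days since the last full backup. The sketch below reuses the figures from that example (10 logs a day, 3 minutes per log from tape to disk, 5 minutes per log into the database) as default values; your own times must come from testing.

```python
def log_restore_minutes(days_since_full, logs_per_day=10,
                        tape_to_disk_min=3, to_database_min=5):
    """Minutes needed to copy the logs from tape and restore them to the database."""
    logs = days_since_full * logs_per_day
    return logs * tape_to_disk_min + logs * to_database_min


for days in (1, 4, 7):
    minutes = log_restore_minutes(days)
    print(f"{days} day(s) since full backup: {minutes} min ({minutes / 60:.1f} h)")
```

With these figures, one day of logs takes 80 minutes and four days take 320 minutes, excluding the restore of the database files themselves.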
Also consider maintaining two backup cycles of the logs on disk to reduce the need to
restore these logs from tape.
Transaction Logs
What
Transaction logs are critical to the database recovery. These logs contain a record of the
changes made to the database, which is used to roll forward (or back) operations. It is
critical to have a complete chain of valid log backups. If you have to restore and one log is
corrupted, you cannot restore past the corrupt log.
The transaction log is stored in a directory, which must not be allowed to become full. If the
transaction log fills the available filespace, the database will stop, and no further processing
can be done in the database and, consequently, in R/3. It is important to be proactive and
periodically back up the transaction logs. Refer to the chapter specific to your database for
more information.
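Being proactive can include a simple check of how full the log file system is. The following sketch uses only the Python standard library; the 80 percent threshold is an assumption, and "." stands in for the transaction log directory, whose location depends on your database.

```python
import shutil


def log_space_status(path, warn_at=0.80):
    """Return (fraction_used, needs_attention) for the file system holding `path`."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    return used, used >= warn_at


# "." is a placeholder for the site-specific transaction log directory.
used, alert = log_space_status(".")
print(f"{used:.0%} used", "- back up the transaction logs now" if alert else "")
```

A check like this complements, and does not replace, the regular log backups scheduled for your database.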
When
The frequency of the log backups is a business decision based on:
< Transaction volume
< Critical period(s) for the system
< Amount of data senior management is willing to lose
< Resources to perform the backups and take them offsite
Also see the examples in the database section above.
If your transaction volume is high, decrease the time interval between log backups. This
reduced time interval decreases the amount of data that could be lost in a potential data
center disaster.
How
If you do not have an offsite backup server, back up the transaction log backups to tape
after each log backup and immediately send the tape offsite.
Do not back up the logs to the tape drive in “append” mode and append multiple
backups on the same tape. If a data center disaster occurs, the tape with all these logs
will be lost.
Operating System Level Files
What
Operating system level files, which must also be backed up, are for:
< Operating environment (for example, system and network configuration)
< R/3 files
Spool files, if stored at the operating system level
(system profile: rspo/store_location = G)
Change management transport files located in /usr/sap/trans
< Other R/3 related applications
Interface or add-on products, such as those used for EDI or taxes, that store their
data or configuration outside the R/3 database.
The amount of data is small in relation to the R/3 database. Depending on how your system
is used, the above list should only require several hundred megabytes to a few gigabytes of
storage. In addition, some of the data could be “static” and may not change for months.
When
The frequency of the operating system level backup depends on the specific application. If
these application files must be kept in sync with the R/3 System, they must be backed up at
the same frequency as the log backup files. An example of this situation is a tax program
that stores its sales tax data in files external to the R/3 database. These files must be in sync
with the sales orders in the system.
A simple and fast method to back up operating system files is to copy all data file directories
to disk on a second server; from the second server, you can back up those files to tape. This
process minimizes file downtime.
Use the sample schedule below to determine your backup frequency:
Backup Types
Backup types are like a three-dimensional matrix, where any combination can be used:
< What is backed up: full database vs incremental of the logs
< How the backup is taken: online vs offline
< When the backup is made: scheduled vs nonscheduled (ad-hoc)
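The three dimensions above can be enumerated to see every combination. This is a small illustration only; the labels are taken from the list, with "on-demand" used for the nonscheduled case.

```python
from itertools import product

WHAT = ("full database", "incremental log")
HOW = ("online", "offline")
WHEN = ("scheduled", "on-demand")

# Every point in the three-dimensional matrix of backup types.
combinations = list(product(WHAT, HOW, WHEN))
for what, how, when in combinations:
    print(f"{when}, {how}, {what} backup")
```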
What Is Backed Up
< Full database backup
A backup of the entire database.
Advantages:
The entire database is backed up at once, making the restore of the database easier
and faster. There are fewer logs that need to be applied to bring the restored database
current.
Disadvantages:
Takes longer to run than an incremental log backup. Because of the longer backup
window there is more impact on the users while the backup is running.
< Incremental backup of the transaction logs
A backup of the transaction logs.
A full database backup is still required on a periodic basis. The usual arrangement is a
full backup on the weekend and incremental backups during the week.
Advantages:
Much faster than a full database backup. Because of the smaller backup window,
there is less impact to the users.
Disadvantages:
A full backup is needed, as a starting point to restore the database.
Restoring the database takes significantly longer and is more complicated than
restoring a full backup. The last full database backup must be restored, then all log
backups since the full backup. This can be many logs. For example, if the system
crashed on Friday, the logs from Monday through Friday have to be applied.
If one log cannot be restored, all the logs after that point cannot be restored.
< Differential backup
Depending on your database and operating system, you may (or may not) have a third
option. A differential backup is a backup of only what has changed since the last full
backup. A full database backup is still required on a periodic basis. The usual
arrangement is a full backup on the weekend and differential backups during the week.
Differential backup is not supported from within R/3 using DB13; you must use other
tools to perform a differential backup.
With Microsoft SQL Server, you must execute the differential backup using Microsoft
SQL Server tools.
Advantages:
The exposure to a corrupt log backup is reduced. Each differential backup is backing
up all the changes to the database since the last full backup.
Disadvantages:
Like the incremental log backup, a full backup is needed as the starting point.
The backup window for a differential is longer than a transaction log backup. It
starts as being short (just after the full backup) and gets longer as more data is
changed.
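The difference between the two restore chains can be sketched in a few lines. This is a simplified model with hypothetical backup labels: it lists only which backups are restored after the last full backup, and it leaves out any logs that would still be applied after the last differential to reach the current time.

```python
def restore_chain(backups, strategy):
    """List the backups to restore, in order, after a failure.

    `backups` is a chronological list of (label, kind) tuples, where kind is
    "full", "log", or "diff". `strategy` is "log" or "diff".
    """
    # Start from the most recent full backup.
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    later = backups[last_full + 1:]
    if strategy == "log":
        # Every log backup since the full backup is needed, in order.
        needed = [b for b in later if b[1] == "log"]
    else:
        # Each differential holds all changes since the full backup,
        # so only the most recent one is needed.
        needed = [b for b in later if b[1] == "diff"][-1:]
    return [backups[last_full]] + needed


days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
week_logs = [("Sun full", "full")] + [(f"{d} log", "log") for d in days]
week_diffs = [("Sun full", "full")] + [(f"{d} diff", "diff") for d in days]

print(restore_chain(week_logs, "log"))
print(restore_chain(week_diffs, "diff"))
```

In the log chain, one unusable log breaks everything after it. In the differential chain, an unusable differential can be replaced by the previous one, at the cost of more changes to recover by other means.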
How the Backup Is Taken
< Offline
An offline backup is taken with the database and R/3 System down.
Advantages:
An offline backup is faster than an online backup.
During the backup, there is no issue with data changing in the database.
If the files are backed up at the same time, the related operating system files will be
in sync with the R/3 database.
Disadvantages:
R/3 is unavailable during an offline backup.
Buffers for R/3 and the database are flushed.
This process will impact performance until the buffers are populated.
< Online
An online backup is taken with the database and R/3 running.
Advantages:
R/3 is available to users during a backup.
This is needed where the system is running and used 24 hours a day and seven days
a week.
The buffers are not flushed.
Since buffers are not flushed, once the backup is complete, there is no impact on
performance.
Disadvantages:
An online backup is slower than an offline backup (a longer backup time).
Backup time is increased because processes such as R/3 are running and competing
for system resources.
Online performance is degraded while the backup is running.
Data may change in the database while it is being backed up.
Therefore, the transaction logs become critical to a successful recovery.
Related operating system level files may be out of sync with the R/3 database.
If you are using online backups, the transaction logs are critical to successfully
recovering the database.
When the Backup Is Made
< Scheduled
Scheduled backups are those that are run on a regular schedule, such as daily or weekly.
For normal operations, configure a scheduled backup. Automated backups should use
the DBA Planning Calendar (transaction DB13). This calendar provides the ability to set
up and review backup cycles. It also has the ability to process essential database checks
and update statistics. You can also set up CCMS to process the backup of transaction
logs.
Depending on the operating platform, backups and other processes configured here can
be viewed in the Batch Processing Monitors (transaction SM37). In general, the status of
the backups can be viewed using Backup Logs overview (transaction DB12).
< On-demand
On-demand backup is done on an ad hoc basis. It is done before a major change to the
system, such as for an R/3 upgrade. Backups that are controlled directly by an operator,
or on-demand, can be performed either by the DBA Planning Calendar (transaction
DB13), at the database, or at operating system level.
Although the DBA Planning Calendar can schedule backups for periodic use, it can also
be used to perform an immediate backup. For an on-demand backup, it is more common
to use tools at the database level such as Enterprise Manager (Microsoft SQL Server) or
SAPDBA (Oracle and Informix).
Regardless of the chosen backup method, you should achieve the following goals:
< Provide a reliable backup that can be restored.
< Keep the backup simple.
< Reduce the number of dependencies required for operation.
< Provide the above items with little or no impact to business units.
Backup Strategy Design
SAP provides tools under CCMS-DB Administration in R/3 to assist in implementing your
strategy. The DBA Planning Calendar (transaction DB13) is designed for scheduling backups.
The other tool, the CCMS Monitoring tool (transaction DB12), provides historical
information to review backup statistics and tape management information. At the operating
system or database level, there are additional tools you can use to administer backup and
restores. These tools include SQL Enterprise Manager (Microsoft SQL Server) and SAPDBA
(Oracle and Informix).
To design your backup procedures:
1. Determine the recovery requirements based on an acceptable outage.
It is difficult to define the concept of acceptable outage, because “acceptable” is
subjective and will vary from company to company. The cost of an outage
includes lost productivity and the time, money, and other resources spent on recovery. This cost should be
evaluated in a manner similar to insurance. (The more coverage you want, the more the
insurance will cost.) Therefore, the faster the recovery time requirements, the more
expensive the solution.
2. Determine what hardware, software and process combinations can deliver the desired
solution.
Review the section on performance to decide which method is best. Follow the “Keep It
Simple” (KISS) rule, but more importantly, make sure your method is reliable.
3. Test your backup procedures by implementing the hardware and reviewing the actual
run times and test results.
Ensure that you get results from all types of backup that could be used in your
environment, not just the ones you think might be used. This information will aid
further evaluation and capacity planning decisions and provide useful comparison
information as needed.
4. Test your recovery procedures by creating various failure situations.
Document all aspects of the recovery including the process, who should perform various
tasks, who should be notified, etc. Remember that a recovery will be needed when you
least expect it, so be prepared. Testing is not a one-time event. It should occur regularly,
with additional tests as hardware or software components change.
Supplementary Backups
Supplementary backups are made on special days (month-end, year-end), so that you can
restore the database to a previous state.
General Procedures
Backup
The unattended backup is performed based on the backup frequency table. The scheduling
functionality of the R/3 CCMS is used to schedule the backup. In CCMS, the required tapes
can be listed by choosing the Volumes Needed button on the backup scheduling screen. Extra
backups, such as the monthly and yearly backup, should be performed offline.
Transaction Log Backup
If transaction log backup is performed during normal system operation, there is no user
impact. You can also find the tapes needed by choosing Volumes Needed.
No special archiving is required for offline backup. (Since the backup is performed offline,
the database remains in a consistent state.)
Verifying Backups
Backups must be verified following a regular schedule. Transaction DB13 and other backup
utilities provide buttons such as Verify Backup to perform this task. Unless the backup is
verified, you will not know that you have properly backed up everything onto tape.
Example
A backup of several files was done, but the “append” switch was not properly set for
the second and later files. Consequently, rather than appending the files one after the other,
the tape was rewound before each file was backed up. The end result was that only the
last backed up file was on the tape.
File verify has to be done after all files have been backed up. If it was done after each file, it
would not detect that the previous file was erased.
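The example can be reproduced with a toy model. The `Tape` class below is hypothetical and only mimics the behavior described above: a write without "append" rewinds the tape and overwrites what was there. The file names are made up.

```python
class Tape:
    """Toy model of a tape: writing without append rewinds and overwrites."""

    def __init__(self):
        self.files = []

    def write(self, name, append):
        if not append:
            self.files = []  # rewind: everything already on the tape is lost
        self.files.append(name)

    def contains(self, name):
        return name in self.files


def backup(tape, names, append_later_files):
    """Back up files, verifying each one right after it is written."""
    per_file_ok = []
    for i, name in enumerate(names):
        tape.write(name, append=(i > 0 and append_later_files))
        per_file_ok.append(tape.contains(name))
    return all(per_file_ok)


names = ["database.bak", "log1.bak", "log2.bak"]
tape = Tape()
verified_per_file = backup(tape, names, append_later_files=False)
verified_at_end = all(tape.contains(n) for n in names)
print("Verify after each file:", verified_per_file)
print("Verify after all files:", verified_at_end)
```

With the switch set wrongly, the per-file check passes every time, while the check made after all files are written reports the loss.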
Monitoring/Controlling
For each system, after backing up the database and finishing the archives, all logs must be
printed and placed in the folder.
Database Integrity
An integrity check of the database must be performed at least once in each retention period to ensure that
no corrupted blocks exist in the database. These blocks may go unrecognized during backup
(see the chapter written for your database for more information).
To avoid backing up a hidden, inconsistent database, the database must be checked at least
once during a retention period.
Roles and Responsibilities
Task                      Role
Backup Database           Operator
Backup Archives           Operator
Verifying Backups         Operator/DBA
Monitoring/Controlling    Operator/DBA
Database check            DBA
Design Recommendations
< Database
Assuming the size of your database and backup window permits it, we recommend a
full database backup be taken every day. For databases that are too large for daily full
database backup, a full backup should be taken weekly.
< Transaction Logs
Backing up the transaction logs is critical. If the filespace is used up, the database will
stop, which stops R/3.
Between 6:00 a.m. and 9:00 p.m., we recommend that you back up these logs at least
every three hours. A company with high transaction volume carries higher risk and
would increase the frequency accordingly, perhaps to every hour. Similarly, if you have
a Shipping department that opens at 3:00 a.m. and a Finance department that closes at
10:00 p.m., you would need to extend the start and end times.
< Operating System Level Files
The frequency of the operating system level backup depends on the application. If these
files must be kept in sync with R/3, they must be backed up with the same frequency
and at the same time as the database and log backups. An option for a non-sync-critical
situation is to back up these operating system level files once a day.
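The log backup times that follow from such a recommendation can be listed with a short sketch. The defaults reproduce the 6:00 a.m. to 9:00 p.m. window with a three-hour interval; the function name and the 24-hour time format are choices made for this illustration.

```python
from datetime import datetime, timedelta


def log_backup_times(start="06:00", end="21:00", every_hours=3):
    """Return the times of day at which a log backup should start."""
    fmt = "%H:%M"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    times = []
    while current <= last:
        times.append(current.strftime(fmt))
        current += timedelta(hours=every_hours)
    return times


print(log_backup_times())
# Extended hours and hourly backups for a high-volume system:
print(log_backup_times(start="03:00", end="22:00", every_hours=1))
```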
A Strategy Checklist
It is important to set up a proper procedure to back up the valuable system information.
Procedures should be defined as early as possible to prevent possible data loss. Resolve the
following list of backup issues before you go live:
< Decide how often to perform complete database backups
< Decide whether partial or differential backups are necessary
< Decide when to perform transaction log backups
< Have the ability to save a day’s worth of logs on the server.
< Provide ample disk space for the transaction log directory
< Consider using DBA Planning Calendar (DB13) to schedule transaction log backups
< Set the appropriate R/3, operating system, and database authorizations
< Create a volume labeling scheme to ensure smooth operations
< Decide on a backup retention period
< Determine tape pool size (tapes needed per day × retention + 20 percent)
Allow for growth and special needs.
< Initialize tapes
< Determine physical tape storage strategy
< Decide whether to use unattended operations
If using unattended operations, decide where (in CCMS or elsewhere).
< Document backup procedures in operations manual
< Train operators in backup procedures
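The tape pool size from the checklist can be worked out as follows. The sketch reads the formula as (tapes needed per day × retention period in days) plus a 20 percent reserve, rounded up to whole tapes; the sample figures are hypothetical.

```python
def tape_pool_size(tapes_per_day, retention_days, reserve_percent=20):
    """Tapes needed for the retention period plus a reserve, rounded up."""
    base = tapes_per_day * retention_days
    # Integer ceiling of base * (100 + reserve_percent) / 100.
    return -(-base * (100 + reserve_percent) // 100)


# Hypothetical example: 2 tapes a day, kept for 28 days.
print(tape_pool_size(2, 28))
```

For 2 tapes a day and a 28-day retention period, the base is 56 tapes and the reserve brings the pool to 68 tapes. Growth and special needs, such as month-end backups, come on top of this.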
Tape Management
Tracking and Documenting
To easily retrieve tapes from storage, you need to track and document them.
The issues are:
< Labeling
< Tracking
< Handling
< Retention requirement
Labeling
Tapes should be clearly labeled using one of many labeling methods. Three simple methods
are described in the examples below. Two of these methods are used by R/3 and are
important if you use DB13 to schedule your backups. Third-party backup management
software may assign its own tracking numbers for the labels. In this case, you must use the
label specified by the software.