Contents
Overview
Why Plan for a Disaster?
Planning for a Disaster
Test your Disaster Recovery Procedure
Other Considerations
Minimizing the Chances for a Disaster
Overview
The purpose of this chapter is to help you understand what we feel is the most critical job of
a system administrator: disaster recovery.
We included this chapter at the beginning of our guidebook for two reasons:
What Is a Disaster?
The goal of disaster recovery is to restore the system so that the company can continue
doing business. A disaster is anything that results in the corruption or loss of the R/3
System.
Examples include:
- Database corruption.
  For example, when test data is accidentally loaded into the production system. This happens more often than people realize.
A disaster recovery plan takes time to prepare, test, and refine. The plan could fill many volumes. This chapter helps you start
thinking about and planning for disaster recovery.
Why Plan for a Disaster?
- A system administrator should expect and plan for the worst, and then hope for the best.
- During a disaster recovery, nothing should be done for the first time. Unpleasant surprises could be fatal to the recovery process.
- How much lost revenue and cost will be incurred for each hour that the system is down?
- How long can the system be down before the company goes out of business?
- How long will it take before the R/3 System is available for use?
If you plan properly, you will be under less stress, because you will know that the system can be
recovered and how long the recovery will take.
If the recovery downtime is unacceptable, management should invest in:
Planning for a Disaster
This chapter is not a disaster recovery how-to. It is designed only to get you thinking
about and working on disaster recovery.
Creating a Plan
Creating a disaster recovery plan is a major project because:
- It can take over a year and considerable effort to develop, test, and document.
If you do not know how to plan for a disaster recovery, get the assistance of an expert. A
bad plan (that will fail) is worse than no plan, because it provides a false sense of security.
What Are the Business Requirements for Disaster Recovery?
Who will provide the requirements?
- Senior management needs to provide global (or strategic) requirements and guidelines.
Example
What: The system cannot be offline for more than three hours.
Why: The cost (an average of $25,000 per hour) is the inability to book sales.
Example
What: In the event of disaster, such as the loss of the building containing the R/3
data center, the company can only tolerate a two-day downtime.
Why: At that point, permanent customer loss begins.
Other: There must be an alternate method of continuing business.
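The arithmetic behind requirements like these is simple enough to script. The sketch below uses the sample figures from the example above (the $25,000-per-hour rate and the three-hour limit are illustrative values, not real data) to estimate lost revenue for an outage and check it against the tolerated window:

```python
# Estimate the business cost of downtime using the chapter's sample figures.
# These numbers are illustrative examples, not real data.
COST_PER_HOUR = 25_000      # average lost revenue per hour of downtime
MAX_OFFLINE_HOURS = 3       # maximum tolerable outage per the requirement

def downtime_cost(hours_down: float) -> float:
    """Estimated lost revenue for an outage of the given duration."""
    return hours_down * COST_PER_HOUR

def within_requirement(hours_down: float) -> bool:
    """True if the outage stays inside the tolerated window."""
    return hours_down <= MAX_OFFLINE_HOURS

print(downtime_cost(3))          # cost at the three-hour limit: 75000
print(within_requirement(2.5))   # True
print(within_requirement(4))     # False: requirement violated
```

Even a toy model like this makes the cost of each additional hour concrete when presenting requirements to management.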
When Should a Disaster Recovery Procedure Begin?
Ask yourself the following questions:
The person must be aware of the effect of the disaster on the company's business and the
critical nature of the recovery.
Expected Downtime or Recovery Time
Expected Downtime
Expected downtime is only part of the business cost of disaster recovery. For each defined
scenario, it is the expected minimum time before R/3 can be productive again.
Downtime may mean that no orders can be processed and no products shipped.
Management must approve this cost, so it is important that they understand that downtime
is a potential business cost.
To help business continue, it is important to find out if there are alternate processes that can
be used while the R/3 System is being recovered.
- A downed system is more expensive during the business day, when business activity would stop, than at the end of the business day, when everyone has gone home.
- The duration of acceptable downtime depends on the company and the nature of its business.
Recovery Time
Unless you test your recovery procedure, the recovery time is only an estimate, or worse, a
guess. Different disaster scenarios have different recovery times, which are based on what
needs to be done to become operational again.
The time to recover must be matched to the business requirements. If this time is greater
than the business requirements, the mismatch needs to be communicated to the appropriate
managers or executives.
Resolving this mismatch involves:
- Reducing the recovery time to an acceptable level, for example by allocating additional resources.
- Changing the business requirements to accept the longer recovery time and accepting the consequences.
An extreme (but possible) example: A company cannot afford the cost and lost revenue for
the month it would take one person to recover the system. During that time, the competition
would take away customers, payment would be due to vendors, and bills would not be
collected. In this situation, senior management needs to allocate resources to reduce the
recovery time to an acceptable level.
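One way to make such a mismatch visible to managers is to tabulate, per scenario, the tested recovery time against the business limit. A minimal sketch (the scenario names and hours below are illustrative, not measured values):

```python
# Compare tested recovery times against business requirements per scenario.
# All figures are illustrative sample values.
scenarios = {
    "corrupt database": {"tested_recovery_hours": 6, "business_limit_hours": 3},
    "hardware failure": {"tested_recovery_hours": 20, "business_limit_hours": 24},
    "facility loss": {"tested_recovery_hours": 72, "business_limit_hours": 48},
}

def mismatches(scenarios: dict) -> list:
    """Scenarios whose tested recovery time exceeds the business limit."""
    return [name for name, s in scenarios.items()
            if s["tested_recovery_hours"] > s["business_limit_hours"]]

print(mismatches(scenarios))   # the scenarios needing escalation
```

Any scenario this flags is exactly the mismatch that must be communicated to the appropriate managers or executives.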
Recovery Group and Staffing Roles
There are four key roles in a recovery group. The number of employees performing these
roles will vary depending on your company size. In a smaller company, for example, the
recovery manager and the communication liaison could be the same person. Titles and tasks
will probably differ based on your company's needs.
We defined the following key roles:
- Recovery manager
  Manages the entire technical recovery. All recovery activities and issues should be coordinated through this person.
- Communication liaison
  Handles user phone calls and keeps top management updated with the recovery status. One person handling all phone calls allows the group doing the technical recovery to proceed without interruptions.
To reduce interruption of the recovery staff, we recommend you maintain a status board.
The status board should list key points in the recovery plan and an estimate of when the
system will be recovered and available to use.
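A status board can be as simple as a generated text summary. This sketch (the step names, states, and times are illustrative) renders the key plan steps and the estimated availability time:

```python
# A minimal recovery status board: key plan steps, their state, and an ETA.
# Step names, states, and times are illustrative placeholders.
from datetime import datetime, timedelta

status_board = [
    ("Restore database from last full backup", "done"),
    ("Apply transaction logs", "in progress"),
    ("Verify R/3 instance startup", "pending"),
]
# Recovery started at 09:00; current estimate is five more hours.
estimated_available = datetime(2024, 1, 1, 9, 0) + timedelta(hours=5)

def render(board, eta):
    """Format the board as plain text suitable for posting or a wall display."""
    lines = [f"[{state:^11}] {step}" for step, state in board]
    lines.append(f"Estimated system availability: {eta:%H:%M}")
    return "\n".join(lines)

print(render(status_board, estimated_available))
```

Updating one shared summary like this, rather than answering each caller individually, is what keeps the technical staff uninterrupted.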
- If the disaster is a major geographical event (like an earthquake), your local staff will be more concerned with their families, not the company.
You should expect and plan for these situations. Plan for staff from other geographic sites
to be flown in and participate as disaster recovery team members.
A final staffing consideration is to plan for at least one staff member to be unavailable. Without this
person, the rest of the department must still be able to perform a successful recovery. This issue
may become vital during an actual disaster.
Types of Disaster Recovery
Disaster recovery scenarios can be grouped into two types:
- Onsite
- Offsite
Onsite
Onsite recovery is disaster recovery done at your site. The infrastructure usually remains
intact. The best-case scenario is a recovery done on the original hardware; the worst-case
scenario is a recovery done on a backup system.
Offsite
Offsite recovery is disaster recovery done at a disaster recovery site. In this scenario, all
hardware and infrastructure are lost as a result of facility destruction such as a fire, a flood,
or an earthquake. The new servers must be configured from scratch.
A major consideration is that once the original facility has been rebuilt and tested, a second
restore must take place back to the customer's original facility. While this second restore can
be planned and scheduled at a convenient time to disrupt as few users as possible, its timing
is just as critical as during the disaster: while the system is being restored, it is down.
Disaster Scenarios
There are an infinite number of disaster scenarios that could occur. It would take an infinite
amount of time to plan for them, and you will never account for all of them. To make this
task manageable, you should plan for at least three and no more than five scenarios. In the
event of a disaster, you would adapt the closest scenario(s) to the actual disaster.
The disaster scenarios are made up of:
Three Common Disaster Scenarios
The following three examples are ordered from best case to worst case:
The downtimes in the examples below are only samples. Your downtimes will be different.
You must replace the sample downtimes with the downtimes applicable to your
environment.
A Corrupt Database
A Hardware Failure
A Complete Loss or Destruction of the Server Facility
A complete loss of the facility can result from the following types of disasters:
- Fire
- Earthquake
- Flood
- Hurricane
- Tornado
- Man-made disasters, such as the World Trade Center bombing
Such a disaster requires:
- Replacing the facilities
- Replacing the infrastructure
- Replacing lost hardware
- Rebuilding the server and R/3 environment (hardware, operating system, database, etc.)
- Recovering the R/3 database and related files
- Two days to rebuild the NT server (one person); 16 hours actual work time
  As the hardware is procured and the server is being rebuilt, an alternate facility is obtained and an emergency (minimal) network is constructed
- One day to integrate into the emergency network
Recovery Script
What
Why
- If the primary recovery person is unavailable, a recovery script helps the backup person complete the recovery.
Creating a Recovery Script
Creating a recovery script requires:
Recovery Process
To reduce recovery time, define a process by:
Major Steps
1. During a potential disaster, anticipate a recovery by:
- Collecting facts
- Recalling the crash kit (see the Crash Kit section below for more information).
- What are the criteria to declare a disaster, and have they been met?
Crash Kit
What
- Reinstall R/3
Why
During a disaster, everything that is needed to recover the R/3 environment is contained in
one (or a few) containers. If you have to evacuate the site, you will not have time to run
around gathering items at the last minute, hoping that you get everything you need.
In a major disaster, you may not even have that opportunity.
When
When a change is made to a component (hardware or software) on the server, replace the
outdated items in the crash kit with updated items that have been tested.
A periodic review of the crash kit should be performed to determine if items need to be
added or changed. A service contract is a perfect example of an item that requires this type
of review.
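This periodic review can be partly automated by keeping a small manifest of kit items with their last-verified and expiry dates. A sketch (the item names, dates, and 90-day threshold are illustrative assumptions):

```python
# Periodic crash-kit review: flag items that are stale or expired.
# Item names, dates, and the 90-day review window are illustrative.
from datetime import date

kit = [
    {"item": "OS installation kit",
     "last_verified": date(2023, 11, 1), "expires": None},
    {"item": "Hardware service contract",
     "last_verified": date(2023, 11, 1), "expires": date(2023, 12, 31)},
]

def needs_attention(kit, today, max_age_days=90):
    """Items unverified for too long, or past their expiry date."""
    flagged = []
    for entry in kit:
        stale = (today - entry["last_verified"]).days > max_age_days
        expired = entry["expires"] is not None and entry["expires"] < today
        if stale or expired:
            flagged.append(entry["item"])
    return flagged

print(needs_attention(kit, today=date(2024, 3, 1)))
```

A flagged service contract is exactly the kind of item the review above is meant to catch before, not during, a disaster.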
Where to Put the Crash Kit
The crash kit should be physically separated from the servers. If it is located in the server
room, and the server room is destroyed, this kit is lost.
Some crash kit storage areas include:
How
The following is an inventory list of some of the major items to put into the crash kit. You
will need to add or delete items for your specific environment. This inventory list is
organized into the following categories:
- Documentation
- Software
Documentation
An inventory of the crash kit should be taken by the person who seals the kit. If the seal is
broken, items may have been removed or changed, making the kit useless in a recovery.
The inventory list below must be signed and dated by the person checking the crash kit. The
following documentation must be included in the crash kit:
- Copies of:
  - SAP license for all instances
  - Service agreements (with phone numbers) for all servers
    Ensure that maintenance agreements are still valid and have not expired. This check should be part of a regularly scheduled task.
- A parts list
  If the server is destroyed, this list should be in sufficient detail to purchase or lease replacement hardware. Over time, if original parts are no longer available, an alternate parts list will have to be prepared. At this point, you might consider upgrading the equipment.
- Hardware layout
  You need to know which:
  - Cards go in which slots
  - Cables go where (connector-by-connector)
  Labeling cables and connectors greatly reduces confusion.
Software
- Operating system:
  - Installation kit
  - Drivers for hardware, such as a Network Interface Card (NIC) or a SCSI controller, which are not included in the installation kit
  - Service packs, updates, and patches
- Database:
  - Installation kit
  - Service packs, updates, and patches
  - Recovery scripts, to automate the database recovery
- For R/3:
  - Installation kit
  - Currently installed kernel
  - System profile files
  - tpparam file
  - saprouttab file
  - saplogon.ini
  - Other R/3 integrated programs (for example, a tax package)
- Other software for the R/3 installation:
  - Utilities
  - Backup
  - UPS control program
  - Hardware monitor
  - FTP client
  - Remote control program
  - System monitor
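The "recovery scripts" item in the database list above suggests automating the database recovery. The following is only a sketch of such a driver: the step functions are stand-ins, not real R/3 or database commands, but the pattern (run each step in order, log the result, stop at the first failure for manual review) carries over to a real script:

```python
# Sketch of an automated database-recovery driver. The steps are placeholder
# functions, not real R/3 or database commands; the structure is the point:
# run steps in order, log each result, and halt on the first failure.
def restore_full_backup():
    return True   # placeholder for the real restore command

def apply_transaction_logs():
    return True   # placeholder for transaction-log replay

def check_consistency():
    return True   # placeholder for a database consistency check

STEPS = [
    ("restore last full backup", restore_full_backup),
    ("apply transaction logs", apply_transaction_logs),
    ("run consistency check", check_consistency),
]

def run_recovery(steps):
    log = []
    for name, step in steps:
        ok = step()
        log.append((name, ok))
        if not ok:
            break     # stop at the first failure; manual intervention needed
    return log

print(run_recovery(STEPS))
```

The audit log the driver produces is what lets a backup person, working from the recovery script, see exactly how far the recovery progressed.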
Business Continuation During Recovery
Business continuation during a recovery is an alternate process to continue doing business
while recovering from a disaster. It includes:
- Cash collection
- Order processing
- Product shipping
- Bill paying
- Payroll processing
Why
How
- A manual, paper-based process
Offsite Disaster Recovery Sites
Integration with your Company's General Disaster Planning
Because there are many dependencies, the R/3 disaster recovery process must be integrated
with your company's general disaster planning. This planning includes telephone, network,
product deliveries, mail, etc.
When the R/3 System Returns
How will the transactions that were handled with the alternate process be entered into R/3
when it is operational?
Test your Disaster Recovery Procedure
Unless you test your recovery process, you do not know if you can actually recover
your system.
A test is a simulated disaster recovery that verifies you can recover the system and
exercises every task outlined in the disaster recovery plan.
- The information that is clear to the person documenting the procedure may be unclear to the person reading the procedure.
- Older hardware is no longer available.
  Here, alternate planning is needed. You may have to upgrade your hardware to be compatible with currently available equipment.
Since many factors affect recovery time, actual recovery times can only be determined by
testing. Once you have actual times (not guesses or estimates), your disaster planning
becomes more credible. If the procedure is practiced often, everyone will know what to do
when a disaster occurs, and the chaos of a disaster will be reduced.
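Once tests produce actual times, it helps to record them next to the original estimates, so the credibility of the plan can be demonstrated run by run. A sketch with sample values (the scenario names, hours, and estimates are illustrative):

```python
# Keep measured recovery times per scenario and report them alongside the
# earlier estimates. All figures below are illustrative sample values.
measured = {
    "corrupt database": [5.5, 4.8, 5.1],   # hours, one entry per test run
    "hardware failure": [19.0, 17.6],
}
estimates = {"corrupt database": 4.0, "hardware failure": 24.0}

def report(measured, estimates):
    """Average the measured runs and pair each with its original estimate."""
    out = {}
    for scenario, runs in measured.items():
        avg = sum(runs) / len(runs)
        out[scenario] = {"estimate_h": estimates[scenario],
                         "measured_avg_h": round(avg, 1)}
    return out

print(report(measured, estimates))
```

Where the measured average exceeds the estimate, the plan (or the estimate) needs revising before the next test cycle.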
How
- The disaster recovery test should be done at the same site at which you expect to recover. If you have multiple recovery sites, perform a test recovery at each site. The equipment, facilities, and configuration may be different at each site. Document all site-specific items that need to be completed for each site. You do not want to discover that you cannot recover at a site after a disaster occurs.
Who Should Participate
- Primary and backup personnel who will do the job during a real disaster recovery
  A provision should be made for some of the key personnel to be unavailable during the test. A test procedure might involve randomly picking a name and declaring that person unavailable to participate. This procedure duplicates a real situation in which a key person is seriously injured or killed.
Other Considerations
Other Upstream or Downstream Applications
For the company to function, other upstream or downstream applications also need to be
recovered along with R/3. Some of these applications may be tightly associated with R/3. These
applications should be accounted for and protected in the company-wide disaster recovery
planning.
Applications located on only one person's desktop computer must be backed up to a safe
location.
Backup Sites
Having a contract with a disaster recovery site does not guarantee that the site will be
available. In a regional disaster, such as an earthquake or flood, many other companies will
be competing for the same commercial disaster sites. In this situation, you may not have a
site to recover to, if others have booked it before you.
The emergency backup site may not have equipment of the same performance level as your
production system. Reduced performance and transaction throughput must be considered.
Examples:
- Only essential business tasks will be done while on the recovery system
Minimizing the Chances for a Disaster
There are many ways to minimize chances for a disaster. Some of these ideas seem obvious,
but it is these ideas that are often forgotten.
Minimize Human Error
Many disasters are caused by human error, such as a mistake or a tired operator. Do not
attempt dangerous tasks when you are tired. If you have to do a dangerous task, get a
second opinion before you start.
- Dangerous tasks should be scripted, with checkpoints included to verify the steps. Such tasks include:
  - Deleting the test database
    Check that the delete command specifies the test database, not the production database.
  - Moving a file
    Verify that the target file (to be overwritten) is the old file, not the new one.
  - Formatting a new drive
    Verify that the drive to be formatted is the new drive, not an existing drive with data on it.
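A checkpoint of the kind recommended above can be a few lines of code. This sketch (the system IDs PRD and TST are illustrative SAP-style names, not taken from any real configuration) refuses to proceed when the deletion target is a production database:

```python
# A scripted checkpoint for a dangerous task: refuse to delete unless the
# target is explicitly a non-production system. "PRD"/"TST" are illustrative
# SAP-style system IDs, not real configuration values.
PRODUCTION_DBS = {"PRD"}

def confirm_delete(db_name: str) -> bool:
    """True only if db_name is safe to delete (not a production system)."""
    if db_name.upper() in PRODUCTION_DBS:
        return False        # hard stop: never delete a production database
    return True

print(confirm_delete("TST"))   # True: test system, deletion allowed
print(confirm_delete("PRD"))   # False: production, deletion blocked
```

The value of the checkpoint is precisely that it works even when the operator is tired; the script, not the person, verifies the step.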
Minimize Single Points of Failure
A single point of failure is a component whose failure causes the entire system to fail.
To minimize single points of failure:
- Do not locate the backup R/3 server in the same data center as the production R/3 server. If the data center is destroyed, the backup server is also destroyed.
Cascade Failures
A cascade failure occurs when one failure triggers additional failures, which increases the
complexity of the problem. The recovery involves the coordinated fixing of many problems.
Example: A Cascade Failure
1. A power failure in the air conditioning system causes an environmental (air
conditioning) failure in the server room.
2. Without cooling, the temperature in the server room rises above the equipment's
acceptable operating temperature.
3. The overheating causes a hardware failure in the server.
4. The hardware failure causes a database corruption.
In addition, overheating can damage many things, such as:
- Network equipment
- Phone system
- Other servers
In this case, a system that monitors the air conditioning system or the temperature in the
server room could alert the appropriate employees before the temperature in the server
room becomes too hot.
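Such a monitor reduces to a threshold check. This sketch (the warning and critical thresholds are illustrative, not vendor limits) classifies readings so staff can be alerted well before the critical point:

```python
# Classify server-room temperature readings for alerting. The thresholds are
# illustrative assumptions, not actual equipment operating limits.
WARN_AT_C = 30      # alert staff here, well before hardware limits
CRITICAL_AT_C = 38  # assumed upper operating temperature

def check_temperature(celsius: float) -> str:
    """Return "ok", "warning", or "critical" for one reading."""
    if celsius >= CRITICAL_AT_C:
        return "critical"
    if celsius >= WARN_AT_C:
        return "warning"
    return "ok"

readings = [24.0, 29.5, 31.0, 39.2]
print([check_temperature(t) for t in readings])
```

The "warning" band is the whole point: it buys staff time to act before the overheating cascades into hardware failure and database corruption.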