
Chapter 02

Disaster Recovery and Business Continuity

Why Business Continuity?


The Cost Of Downtime
LEGAL/REGULATORY
Contractual Requirements; Service Level Agreements (mean time between failures (MTBF), mean time to repair or mean time to recovery (MTTR)); Regulatory Requirements

REVENUE
Direct Loss; Deferred Losses; Compensatory Payments; Lost Future Revenue; Billing Losses; Investment Losses

PRODUCTIVITY
Loss of Productivity; Employees Impacted; Burdened Hourly Rate

REPUTATION
Customers; Suppliers; Financial Markets; Banks; Business Partners; Etc.

FINANCIAL PERFORMANCE
Lost Market Share; Revenue Recognition; Cash Flow; Lost Discounts; Payment Guarantees; Stock Price; Credit Rating

OTHER EXPENSES
Temporary Employees; Equipment Rental; Overtime; Extra Shipping Costs; Travel Expenses
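The MTBF/MTTR and productivity items above lend themselves to a quick calculation. The following is a minimal sketch, not a prescribed method: it uses the standard steady-state availability ratio MTBF / (MTBF + MTTR) and assumes productivity loss is simply employees impacted times burdened hourly rate times outage hours. The function names and the sample figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def productivity_loss(employees_impacted: int,
                      burdened_hourly_rate: float,
                      outage_hours: float) -> float:
    """Assumed model: every impacted employee is idle for the whole outage."""
    return employees_impacted * burdened_hourly_rate * outage_hours


# Hypothetical figures for illustration only.
uptime_fraction = availability(mtbf_hours=990.0, mttr_hours=10.0)
loss = productivity_loss(employees_impacted=200,
                         burdened_hourly_rate=50.0,
                         outage_hours=4.0)
print(f"availability: {uptime_fraction:.2%}, productivity loss: ${loss:,.0f}")
```

The other cost categories (revenue, reputation, financial performance) would need their own estimates; this sketch covers only the two items that reduce to simple arithmetic.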

Business Continuity Program Pyramid

Steering Committee

Who

Senior personnel from all key entities with a stake in the ongoing program. They have the authority to make decisions, implement new policies, and commit resources to support and implement the projects/program.

Charter

Provides strategic direction and decision making. Approves annual program objectives and ensures appropriate commitment of resources to the program.

Benefit

Builds consensus and unity of effort. Enforces project/program policies, procedures, and guidance.

Business Continuity Program Pyramid

Continuity Program Office

Who

Core dedicated staff with industry/government and business continuity expertise.

Charter

Business Continuity Program project management. Lifecycle Continuity Program oversight and management.

Benefit

Dedicated expertise and focus. Continuity of planning and operations.

Business Continuity Program Pyramid

Continuity Planning

Who

All departments/entities of the corporation/government.

Charter (read What)

The ongoing design, procurement, and use of robust systems, facilities, staffing models, and equipment to mitigate the risk of outages, or the impact of outages.

Benefit

More robust processes, systems, facilities. Less downtime.

Business Continuity Program Pyramid

Business Impact Analysis

Who

All business and support units/entities.

Charter

Identify/validate department/entity critical business and support functions. Determine Information Technology and connectivity requirements to support critical business/support functions. Determine the Recovery Time Objectives (RTO) for critical functions. Establish a Minimum Acceptable Recovery Configuration (MARC) for business and support units/entities.

Benefit

Know your business Establish recovery requirements
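One way to picture the output of a Business Impact Analysis is as a list of critical functions, each carrying its RTO and IT requirements, sorted so the tightest RTO is recovered first. The sketch below is illustrative only: the `CriticalFunction` record, its field names, and the sample entries are hypothetical, and sorting purely by RTO is an assumed prioritization rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriticalFunction:
    name: str                   # critical business or support function
    unit: str                   # owning department/entity
    rto_hours: float            # Recovery Time Objective, in hours
    it_requirements: List[str]  # IT and connectivity needed to run it


def recovery_order(functions: List[CriticalFunction]) -> List[CriticalFunction]:
    """Return functions ordered by RTO, shortest (most urgent) first."""
    return sorted(functions, key=lambda f: f.rto_hours)


# Hypothetical BIA results for illustration only.
functions = [
    CriticalFunction("Payroll", "HR", 72, ["ERP", "network"]),
    CriticalFunction("Order entry", "Sales", 4, ["web", "database"]),
    CriticalFunction("Help desk", "IT", 24, ["telephony"]),
]
ordered = [f.name for f in recovery_order(functions)]
print(ordered)
```

A Minimum Acceptable Recovery Configuration could be recorded the same way, for example as a further field listing the minimum staff, seats, and systems per unit.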

Business Continuity Program Pyramid

Disaster Recovery/ Business Resumption Planning

Disaster Recovery Planning

The strategic and detailed planning for the timely restoration of information technology, network and telephony following a disaster.

Business Resumption Planning

The strategic and detailed planning for the timely restoration of vital business/support functions following a disaster.

Business Continuity Program Pyramid

Crisis Management Program


Who

All key business and support units/entities.

Charter

Provide policies, procedures, and guidance to organize, train, equip, and manage staff, equipment, and facilities to ensure a capability to rapidly evaluate and respond to significant incidents that impact, or may impact, an organization's critical operations.

Benefits

Rapid, coordinated identification and response to incidents in an effort to prevent the incidents from becoming disasters. Protection of: life; corporate image, prestige, revenue, market share. Mitigation of incident-generated legal and regulatory risks.

Integrated Response/Recovery Plan Structure


CORPORATE CRISIS MANAGEMENT PLAN

Corporate Crisis Management Team (CMT)

Business Resumption/Disaster Recovery/Crisis Management Plans

SITE RECOVERY PLANS


Sites A-n IMT Facilities Department A Department B Department n

DATA CENTER DISASTER RECOVERY PLANS


Sites A-n IMT App Group

Server Group

Tape Group

Network Group

DEPARTMENT SPECIFIC RECOVERY CHAPTERS

GENERAL IT/DATA NETWORK BUSINESS RESUMPTION SUPPORT PLANS

Business Continuity Program Pyramid

Plan Scorecarding and Testing

What

Crisis Management Plan; Business Resumption Plans; Disaster Recovery Plans.

Charter

Scorecarding: Evaluate plan content for structure, scope, and breadth of information in preparation for testing of plan for recovery operations.
Testing: Evaluation of plan content for effectiveness/adequacy in recovery operations.

Benefits

Quality control of plans. Training of personnel. Confidence.

Business Continuity Program Pyramid

Certification Program

What

Business Resumption Plans; Disaster Recovery Plans.

Charter

Annual, formal rating of Business Resumption and Disaster Recovery Plans using scorecard results, testing results, and other criteria to assess plan readiness.

Benefits

A standardized assessment of plan quality and readiness. Targeted program planning and budgeting. Confidence in plan readiness and quality.
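The annual rating described above combines scorecard results and testing results. As a rough sketch of how such a rating could be computed, the example below blends the two percentages with a weight and maps the result to a label. The equal weighting, the 85 and 60 thresholds, and the three rating labels are all assumptions made for illustration; the source does not specify them, and "other criteria" are not modeled.

```python
def certification_rating(scorecard_pct: float,
                         test_pct: float,
                         scorecard_weight: float = 0.5):
    """Blend scorecard and test results (0-100) into a score and a label.

    Weights, thresholds, and labels are hypothetical.
    """
    combined = scorecard_weight * scorecard_pct + (1 - scorecard_weight) * test_pct
    if combined >= 85:
        label = "Certified"
    elif combined >= 60:
        label = "Conditionally certified"
    else:
        label = "Not certified"
    return combined, label


# Hypothetical plan results for illustration only.
score, label = certification_rating(scorecard_pct=90, test_pct=80)
print(score, label)
```

Tracking the label per plan from year to year is one way a program office could target planning and budgeting at the plans that score lowest.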

Continuity Program Development Cycle


Understand Your Business: Accomplish A Business Impact Analysis

Maintain: Testing Metrics/Program Maintenance Program, Change Management Program; Audit, Certification Program

CPO

Develop: IT Disaster Recovery And Business Resumption Strategies

Develop & Implement: An Enterprise Recovery Management Process

Develop: IT Disaster Recovery Plans; Business Resumption Plans; Testing and Certification Program

CPO Project Responsibilities


INITIATION PROCESSES
Feasibility; High Level Planning; Charter Definition

PLANNING PROCESSES
Plan Development; Policies, Procedures, Guidance; Communications Planning; QC Planning; Risk Management Plan; Contract/Project Change Management Planning; Deliverable Acceptance Criteria

CONTROLLING PROCESSES
Quality, Scope, Change, Risk, Schedule, Performance Control; Analysis and Reporting

EXECUTION PROCESSES
Information Coordination and Distribution; Risk Response; Risk Estimation; Resource Management; Issue Resolution

CLOSING PROCESSES

CPO Lifecycle Responsibilities


Testing: Metrics, Accomplishment, And Required Plan Changes

Program Methodology, Policies, Procedures, & Guidance

Change Control: New Systems, New Functions, New Designs, Etc.

Plan Development/ Maintenance

Crisis Management & Recovery Plan Implementation

CPO

Recovery Plan Scorecarding & Certification

Training / Awareness Program Strategy Validation/ Updates

Initial and Ongoing Critical Vendor Qualification and SLA/ Contract Review

CPO Summary Chart Of Responsibilities


CRISIS MANAGEMENT Planning & Execution Evaluate Business Needs Business Cases Project Justification Recommendations Decisions Policy Vision Strategy Direction SLAs And Access IT and Business Units Recovery Planning & Testing Policies, Procedures, Guidance Reports Requirements

Steering Committee

Contractor Database

PROGRAM PP&Gs
Methodology, plan development templates; Change control; Communications management; Crisis management; Plan scorecards and certification; Vendor qualification, SLAs; Recovery strategies; Risk management; Testing and metrics; Software

PROJECT PP&Gs
Project initiation; Project planning; Execution and control; Closure

Contractor and Corp Resource Pool


Work with CPO Customize Project WBS/Schedule Daily Project Management Identify Issues

CONTINUITY PROGRAM OFFICE


Provides Guidance/ Oversight/ Analysis Project Managers

Provides

Training, Awareness And Education Program

Incorporates Best Practices Updates Templates Evaluates Project Results Maintains Knowledge Library

Knowledge Library

CPO Staffing- Executive Sponsor


STAFFING PLANNING ROLE IMPLEMENTATION ROLE
Integration with other corporate strategic initiatives. Issue resolution. Resource commitment. Approval authority for change requests. Link to Executive Steering Committee

Executive Sponsor

Secure funding and resources. Make Go/No-Go decisions. Link to Executive Steering Committee. Provide strategic guidance to CPO.

CPO Staffing- CPO Leader


STAFFING
CPO Lead

PLANNING ROLE
Oversee development and approval of project management and Enterprise Recovery Management Process (ERMP) policies, procedures, methodology and guidance. Develop CPO & project staffing models. Project Initiation and Planning. Development of Issue Resolution Plan. Development of deliverable acceptance procedures. Design of vendor qualification program. Design and administration of the corporate Crisis Management Plan.

IMPLEMENTATION ROLE
Daily leadership, oversight, and management of CPO staff. Responsible for CPO performance and deliverables. Champion project management methodology implementation and ERMP. Project Implementation, Control, and Closure. Responsible for implementation and management of program/project Communications Plan, Certification Program, Awareness and Training Program, Change Control, and Risk Management Program. Leadership, or administration of, the corporate Crisis Management Team.

CPO Staffing- CPO Staff

STAFFING
CPO Staff

PLANNING ROLE
Develop program/project management books, forms, templates, etc. Develop plan Certification Program; Training and Awareness Program; Communications Management Plan; Risk Management Plan. Develop Change Management Program. Program budget. Business continuity risk analysis of new facilities design, hardware purchases, software, network design, business processes, vendors, etc. Project Initiation and Planning.

IMPLEMENTATION ROLE
Tracking of project and program progress against plans/schedule (project/program implementation and control). Maintain recovery plan Certification Program database. Implement Awareness and Training Program. Set-up and maintenance of program Knowledge Library. Ensure compliance with program ERMP. Issue/problem resolution. Support Executive Sponsor with presentations and reports. Vendor qualification program. Support for corporate Crisis Management Team.

Establishing a CPO

Identify and define measures of success for the CPO


Define goals and objectives of the CPO
Codify the charter of the CPO
Write a vision and mission statement for the CPO
Document the purpose of the initiative and what value is to be created
Determine how return-on-investment will be measured
Determine what other metrics and measurements should be used (e.g., quality, customer satisfaction, productivity)

Establishing a CPO

Define governance structure

Define how the CPO will be organized and staffed
Determine what rules the CPO will follow and how it will interface with corporate departments and subordinate headquarters
Codify a CPO charter

Define the change and issue management processes

Establish policies, procedures, and guidance on how changes, issues, and other events that will impact CPO projects and the program will be recorded, tracked, and resolved

Establishing a CPO

Define leadership and communications PP&Gs

Establish how information, status updates, and decisions will be communicated
Determine how key decisions will be made, and by whom

Define risks and develop mitigation strategy


Identify risks to program success
Determine how risks will be mitigated
Establish how additional risks that may arise later will be identified and mitigated

Establishing a CPO

Define program support

Identify support requirements for each CPO project and for the lifecycle functions assigned to the CPO
Identify standard methods and procedures for project and program execution, reporting, and management
Develop a process for the creation of additional standards as the need arises
Decide if the CPO should create a Disaster Recovery/Business Resumption Center of Excellence for critical technical knowledge that will be shared by multiple projects

Establishing a CPO

Define integration approach and methods

How will programs and projects that have interrelationships and dependencies be identified and integrated?
How well does the portfolio of programs and projects assigned to the CPO support the business goals and objectives of the corporation?

CPA's Role in DRP, BCP & BIA


Disaster Recovery Plan (DRP)
Business Impact Analysis (BIA)
Business Continuity Plan (BCP)
Business Continuity Management (BCM)
Business Continuity (BC)
Information Systems Audit and Control Association (ISACA)
Committee of Sponsoring Organizations (COSO)
Control Objectives for Information and related Technology (CobIT)
Information Technology (IT)
Information Systems (IS)

Disaster Recovery and Business Continuity Planning in a University Environment


Mardecia Bell Ann Harris
Copyright Mardecia Bell/Ann Harris 2005. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.

The realization that one data center serving both the central academic and administrative IT environments was a single point of failure prompted NC State University to implement a disaster recovery strategy for communications and critical applications residing on the mainframe and open systems computing environments.

History/Timeline
Years shown: 1997; 1999; 2001; 2002; 2004; 2005
Milestones: Initiated with the administrative environment; Mainframe environment recovery test; Y2K - Business Continuity concept; Acquired central repository software (LDRPS); Scheduled annual Mainframe recovery test; Included communications & academic environment; Expanded to include Enterprise Business Continuity/Disaster Recovery Planning; Successful DR test of ERP systems; Co-processing of production services began in Data Center II

Implementation Steps
Gain Sponsorship
Establish Steering Committees
Develop University Policy/Regulation
Create DR Structure/Establish Staffing
Market Program
Establish Central Repository
Review & Test Plans Regularly

Gain Sponsorship

Office of the President; University System; Chancellor; Executive Management

Present your Business Case
Identify the roles involved
Provide Executive Summary of BC/DR Program
Present Statement of Work and Project Plan
Add responsibilities to staff work plans

Establish Steering Committees

IT Steering Committee
Business/Service Steering Committee

Both committees comprise:
Vice Chancellor/Vice Provost Level Representatives from Critical Areas of the Campus
Ex Officio members from IT areas

Mission of IT Steering Committee: Provide guidance and oversight for the combined academic and administrative Disaster Recovery Plan.

Policy/Regulations/Rule

Develop a Policy or Regulation to affirm the mandate and promote cooperation

Divide Campus Into Groupings


Space/Facilities; Teaching and Academic Programs; Academic IT; Administrative IT; Environmental Health and Public Safety; Business Administration; Research Programs; Student Affairs; Extension and Engagement

Resource Projections

Hire Full-Time Business Continuity and Disaster Recovery Personnel


Director of Business Continuity (plus 1 Business Analyst)
Admin IT DR Coordinator (plus 1 Business Analyst)
Academic DR Coordinator (part-time)

Add BC/DR responsibilities to work plan of existing staff
Identify Coordinators for each business unit

Marketing
Present at campus departmental meetings
Create a Website
Utilize listservs
Campus Newspaper
Network with peer institutions
Remain abreast of industry standards
Attend conferences, workshops and seminars

Establish Central Information Repository

Continuous Implementation

Accomplishments

Disaster Recovery and Business Continuity Plan
Risk Assessments for Critical Business Units
Successful Mainframe Recovery Tests
Designed and implemented infrastructure for central computing environment (academic & administrative) in secondary data center
Implementation of recovery strategies in secondary data center
Creation of Administrative IT Disaster Recovery Unit

Illustration of Various DR Deployments


[Figure: four deployment patterns, with panel titles as follows]
Fault-tolerant cluster (file and print services)
Co-processing and load-balancing (ERP)
Distributed deployment (hosted systems)
Data replication (mainframe)

Enterprise Resource Planning (ERP) Deployment


Financial System; Human Resources (Version 8.8); Student Information System (under construction)

[Diagram labels: Campus Users; DC I; DC II; Batch Servers; Web Servers; Application Servers; DB Servers; Data; Storage Area Network]

Summary and Future Steps


[Diagram: DC I and DC II, each labeled with the same components: Novell Directory Services / Novell; Email/Calendar; Anti-SPAM; File/Print, User Home; Citrix; Backup/vaulting; Hosted systems; Active Directory / Windows; Data; Storage Area Network; Development Server; Infrastructure; Database Server; Web Server; ERP Application; Mainframe Server; ERP Web; ERP DB Server; ERP Batch]

Administrative IT Disaster Recovery Unit Mission

Ensure minimal risk of major disruptions to critical University systems and processes in the event that all or part of its computer operations are rendered inoperable.

Ensure timely recovery of infrastructure and services in the event of a disruption.


Ensure that business continuity plans are available and viable relative to its scenario.

Risk Management

Identify Mitigate Process Mapping

Risk Management (NIST SP 800-30)

Risk Assessment
System Characterization; Threat Identification; Vulnerability Identification; Control Analysis; Likelihood Determination; Impact Analysis; Risk Determination; Control Recommendations; Results Documentation

Risk Mitigation
Prioritize Actions; Evaluate Recommended Control Options; Conduct Cost-Benefit Analysis; Select Controls; Assign Responsibility; Develop Safeguard Implementation Plan; Implement Selected Controls
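The Likelihood Determination, Impact Analysis, and Risk Determination steps can be combined in a small likelihood-by-impact calculation. The sketch below is modeled loosely on the kind of 3x3 risk-level matrix used with NIST SP 800-30; the specific weights, thresholds, and the `risk_level` function name are assumptions for illustration and should be replaced with whatever scale an organization actually adopts.

```python
# Assumed weights: likelihood is stored in tenths (1, 5, 10 stand for
# 0.1, 0.5, 1.0) so that the arithmetic stays exact.
LIKELIHOOD_TENTHS = {"low": 1, "medium": 5, "high": 10}
IMPACT = {"low": 10, "medium": 50, "high": 100}


def risk_level(likelihood: str, impact: str):
    """Return (score, level) for a threat/vulnerability pair.

    Assumed thresholds: score > 50 is high, > 10 is medium, otherwise low.
    """
    score = LIKELIHOOD_TENTHS[likelihood] * IMPACT[impact] / 10
    if score > 50:
        level = "high"
    elif score > 10:
        level = "medium"
    else:
        level = "low"
    return score, level


# Hypothetical pairings for illustration only.
for pair in [("high", "high"), ("medium", "high"), ("low", "high")]:
    print(pair, risk_level(*pair))
```

The resulting levels are what the Prioritize Actions step under Risk Mitigation would consume: higher-rated risks are addressed first.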

Process Mapping

Infrastructure
Total DR through distributed high availability
Client Recovery Solutions
Application Restoration
Establish collaborative partnerships with other Universities

Client Recovery Solution(s)

Application Restoration

Event Time Scope of Impact

Infrastructure Software Hardware

Collaborative Partnerships

Vaulting

Readily accessible Secure Onsite Offsite

Critical Business Units


Advancement Services; All Campus Network; Budget Office; College of Agriculture and Life Sciences - Personnel Office; ComTech - Data Networking; ComTech - Telecommunications; Contracts and Grants; Controller's Office; Enterprise Application and Database Services; EH&S - Business Continuity; EH&S - Campus Police; EH&S - Emergency Response; EH&S - Environmental Affairs; EH&S - Health and Safety; EH&S - Industrial Hygiene; EH&S - Insurance and Risk Management; EH&S - Radiation Safety; EH&S - Transportation; EH&S - Waste Management; Enrollment Management - Admissions; Enrollment Management - Office of Scholarships & Financial Aid; Enrollment Management - Registration and Records; Enterprise Technology Services and Support; Facilities - Construction Management; Facilities - Design and Construction Services; Facilities - Operations; Facilities - University Architect; Fire Protection; Foundations Accounting & Investments; HR - Benefits; HR - Employment & Compensation; HR - Human Resource Information Management; HR - Payroll; ITD - Business Services; ITD - Computer Operations; ITD - Computer Services; ITD - Systems; Libraries - Administration; Materials Management - Materials Support; Materials Management - Purchasing; Materials Management - University Graphics; Real Estate; Student Health Services; University Cashier's Office; University Dining; University Housing

Business Continuity Planning

Communication

Consistency in plan updating; Training; Partnering; Emergency Communication standardization

Call Trees; Mobile Devices; Website; Incident Command System; Call Center; Incident Report Plan

IT Disaster Categorization

Category 1: A single person or group in a Critical Business Unit (CBU) is unable to perform their critical functions
Category 2: An entire CBU is unable to perform its critical functions
Category 3: Multiple CBUs are unable to perform their critical functions
Category 4: Non-CBUs are not able to perform their critical functions
Category 5: A widespread event that impacts the entire University
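The five categories above can be expressed as a simple classification rule. The following is a minimal sketch under stated assumptions: the `Impact` record and its field names are hypothetical, and checking from Category 5 downward (so that the highest matching category wins) is an assumed precedence that the source does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Impact:
    university_wide: bool = False  # widespread event affecting the whole University
    non_cbus_down: int = 0         # non-CBU units unable to perform critical functions
    cbus_fully_down: int = 0       # entire CBUs unable to perform critical functions
    cbus_partially_down: int = 0   # CBUs where a single person or group is affected


def categorize(impact: Impact) -> Optional[int]:
    """Map an impact description to an IT disaster category (1-5), or None."""
    if impact.university_wide:
        return 5
    if impact.non_cbus_down > 0:
        return 4
    if impact.cbus_fully_down >= 2:
        return 3
    if impact.cbus_fully_down == 1:
        return 2
    if impact.cbus_partially_down > 0:
        return 1
    return None  # nothing affected


# Hypothetical incident for illustration only.
print(categorize(Impact(cbus_fully_down=3)))
```

A category computed this way could then drive which call tree or response plan is activated, though that mapping is not given in the source.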

Goals
Total DR through distributed high availability
Standardized Emergency Communications
Immediate Client Recovery Solutions
Improved RTO

Control Objectives for Information and related Technology (CobIT)


Control Objective: Ensure Continuous Service

Managing continuous service includes the ability to recover from a disaster. Controls need to be in place to manage various disaster scenarios, from backup and recovery to full business continuity. Actions performed in this area align with the control activities and monitoring components of COSO (Committee of Sponsoring Organizations).

Deficiencies in this area could significantly impact an entity's financial reporting and disclosure. For instance, the inability to recover from a disaster after year-end could prevent the organization from producing financial reports that are supported with source documentation and details of the transactions that make up financial reporting balances.

Ensure Continuous Service

IT management, in cooperation with business process owners, has established a business continuity framework that defines the roles, responsibilities, risk-based approach/methodology to be adopted, and the
COSO Component: Control Activities

The business continuity plan identifies the critical application programs, third-party services, operating systems, personnel and supplies, data files, and time frames needed for recovery.
COSO Component: Control Activities

The IT continuity plan is aligned with the overall business continuity plan to ensure consistency.
COSO Component: Control Activities

The IT organization members responsible for disaster continuity plans have been trained regarding the procedures to be followed in case of an incident or a disaster.
COSO Component: Control Activities

IT management has ensured that the continuity plan is adequately tested, at least annually, and that any deficiencies are addressed within a reasonable period of time.
COSO Component: Control Activities

Where new risks are identified, appropriate changes are made to the business continuity and disaster recovery plans.
COSO Component: Control Activities

Offsite storage and recovery facilities are periodically assessed, at least annually, for viability, adequacy and security mechanisms.
COSO Component: Monitoring

A business impact analysis assessment has been performed that considers the impact of systems failure on the financial reporting and disclosure process.
COSO Component: Control Activities

Management has reviewed the impact assessment in determining the nature and extent of system recovery procedures necessary to support the timeliness of financial reporting and disclosure processes.
COSO Component: Control Activities

IS Auditing Guideline

1.4 Purpose of the Guideline
1.4.1 The primary objective of BCP is to protect the organization in the event that all or part of its operations and/or information systems services are rendered unusable and aid the organization to recover from the effect of such events.
1.4.2 The purpose of this guideline is to describe the recommended practices in

IS Auditing Guideline
1.4.3 The purpose of a BCP review is to identify, document, test and evaluate the controls and the associated risks relating to the process of BCP as implemented in an organization to achieve relevant control objectives.
1.4.4 These control objectives can be primary, directly related to BCP, and secondary, indirectly related to BCP

IS Auditing Guideline

1.4.5 This guideline provides guidance in applying IS auditing standard 060 (Performance of Audit Work) to obtain sufficient, reliable, relevant and useful evidence during review of the business continuity plan. The IS auditor should consider it in determining how to achieve implementation of the above standard, use professional judgment in its application and be prepared to justify any departure.

IS Auditing Guideline
1.5 Guideline Application
1.5.1 This guideline is applied when conducting a review of BCP from an IT perspective in an organization.
1.5.2 When applying this guideline, the IS auditor should consider its guidance in relation to other relevant Information Systems Audit and Control Association

IS Auditing Guideline

2.1.3 Risk assessment followed by a business impact analysis (BIA) must be performed to assess the overall financial exposures and operational effects due to a disruption in business activities. The BIA should identify and prioritize the critical business processes supported by the IS infrastructure including, but not limited to, cost-benefit analysis of controls in different

IS Auditing Guideline

2.1.4 The Disaster Recovery Plan (DRP), a key component of BCP, refers to the technological aspect of BCP: the advance planning and preparations necessary to minimize loss and ensure continuity of critical business functions in the event of a disaster. DRP comprises consistent actions to be undertaken prior to, during and subsequent to a disaster.

IS Auditing Guideline

2.1.5 A sound DRP should be built from a comprehensive planning process, involving all of the enterprise. In today's interconnected economy, organizations are more vulnerable than ever to the possibility of technical difficulties disrupting business. Any disaster, from floods or fire to viruses and cyber terrorism, can affect the availability, integrity and confidentiality

BCM Model
Process Change Management Plans/Procedures Education Testing Risk Reduction Standby Facilities
Project

Testing/Review

Ongoing Process

Create Planning Organization Recovery Strategy Business Impact Analysis Risk Analysis

Policy

Organization

Resources

Scope

Business Continuity Planning Initiation


Source: Gartner Group

BCM Program Elements


Program Justification & Authorization
Risk Analysis (RA)

Recovery Plan Development


Business Recovery Organization (BRO)

Recovery Plan Maintenance


Plan Changes & Updates

BUSINESS

RECOVERY
PLAN

Process Identification

Recovery Location(s)
Switchable Telecomm. Network(s) Data/Records Backup & OffSite Storage
Budget & Policies

Training & Exercises-Prove Plans & Teams

Business Impact Analysis (BIA)

Test Systems, Software & Environments

RECOVERY CAPABILITY

Recovery Strategies

New Processes & Procedures

Recovery Actions, Tasks & Procedures

Updated Recovery Strategies

Commitment by Executive Management

Availability & Survivability Components


- Evacuation & Life-Safety Plans
- Fire Detection, Alerting & Suppression
- Physical & Logical Security
- UPS & Emergency Generators
- Redundant Equipment Components
- Equipment Maintenance & Replacement
- Redundant Power, Telecommunications and Water

CONTINUITY

BC Phases

Continuity Planning Policy

The process is generally initiated by issuing a continuity planning policy statement that:

Establishes and documents the basic planning requirements, standards, and guidelines that responsible offices will apply in developing, implementing, and executing their respective continuity plans.
Outlines the organizational framework for continuity planning and execution.
Determines the scope (services, functions and resources subject to the continuity planning requirement).
Defines continuity planning objectives.

Continuity Planning Objectives

Organization-wide continuity planning policy objectives need to be:


Identified, prioritized, and validated by senior management.

Policy objectives ensure that continuity plans focus on achieving essential mission requirements. Objectives establish the criteria for assessing and determining critical business functions.