Disclaimer
The following project outline is provided solely as a guide. It is only intended to be “one example” of requirements for a disaster recovery project plan. It is not, by any stretch of the imagination, the only way to set up a project plan.
If you are new to disaster recovery planning, make sure that you research the subject thoroughly before embarking on a disaster recovery planning project. Consider engaging a consultant (internal or external to your organization) to help you in your project planning effort. Disaster recovery planning is not a two-month project, neither is it a project that once completed, you can forget about. An effective disaster recovery planning plan is a live recovery plan. The plan must be maintained current and tested/exercised regularly.
- Program Description
- Pre-Planning Activities (Project Initiation)
- Vulnerability Assessment and General Definition
- Requirements
- Business Impact Analysis
- Detailed Definition of Requirements
- Plan Development
- Testing Program
- Maintenance Program
- Initial Plan Testing and Plan Implementation
- Planning Scope and Plan Objectives
- Project Organization and Staffing
- Project Control
- Schedule of Deliverables
- Resource Requirements
Page Contents
Disaster Recovery Planning
The primary objective of a Business Resumption Plan is to enable an organization to survive a disaster and to reestablish normal business operations. In order to survive, the organization must assure that critical operations can resume normal processing within a reasonable time frame. Therefore, the goals of the Business Resumption Plan should be to:
- Identify weaknesses and implement a disaster prevention program;
- minimize the duration of a serious disruption to business operations;
- facilitate effective co-ordination of recovery tasks; and
- reduce the complexity of the recovery effort.
Historically, the data processing function alone has been assigned the responsibility for providing contingency disaster recovery planning. Frequently, this has led to the development of recovery plans to restore computer resources in a manner that is not fully responsive to the needs of the business supported by those resources. Contingency planning is a business issue rather than a data processing issue. In today’s environment, the effects of long-term operations outage may have a catastrophic impact. The development of a viable recovery strategy must, therefore, be a product not only of the provider’s of the organization’s data processing, communications and operations centre services, but also the users of those services and management personnel who have responsibility for the protection of the organization’s assets.
The methodology used to develop the plans, emphasize the following key points:
- Providing management with a comprehensive understanding of the total effort required to develop and maintain an effective recovery plan;
- Obtaining commitment from appropriate management to support and participate in the effort;
- Defining recovery requirements from the perspective of business functions;
- Documenting the impact of an extended loss to operations and key business functions;
- Focusing appropriately on disaster prevention and impact minimization, as well as orderly recovery;
- Selecting project teams that ensure the proper balance required for plan development;
- Developing a contingency plan that is understandable, easy to use and easy to maintain; and
- Defining how contingency planning considerations must be integrated into ongoing business planning and system development processes in order for the plan to remain viable over time.
The successful and cost effective completion of such a project requires the close cooperation of management from all areas of Information Systems as well as business areas supported by Information Systems. Senior personnel from Information Systems and user areas must be significantly involved throughout the project for the disaster recovery planning process to be successful.
In closing, it is important to keep in mind that the aim of the disaster recovery planning process is to:
- assess existing vulnerabilities;
- implement disaster avoidance and prevention procedures;
- develop a comprehensive plan that will enable the organization to react appropriately and in a timely manner if disaster strikes.
Since disaster recovery planning is a very complex and labor intensive process, it therefore requires redirection of valuable technical staff and information processing resources as well as appropriate funding. In order to minimize the impact such an undertaking would have on scarce resources, the project for the development and implementation of disaster recovery planning and business resumption plans should be part of the organization’s normal disaster recovery planning activities.
The proposed project methodology consists of eight separate phases, as described below.
Phase 1 – Disaster Recovery Planning Pre-Planning Activities (Project Initiation)
Phase 1 is used to obtain an understanding of the existing and projected computing environment of the organization. This enables the project team to: refine the scope of the project and the associated work program; develop project schedules; and identify and address any issues that could have an impact on the delivery and the success of the project.
During this phase a Steering Committee should be established. The committee should have the overall responsibility for providing direction and guidance to the Project Team. The committee should also make all decisions related to the disaster recovery planning effort. The Project Manager should work with the Steering Committee in finalizing the detailed work plan and developing interview schedules for conducting the Security Assessment and the Business Impact Analysis.
Two other key deliverables of this phase are: the development of a policy to support the recovery programs; and an awareness program to educate management and senior individuals who will be required to participate in the project.
Phase 2 – Vulnerability Assessment and General Definition of Requirements
Security and control within an organization is a continuing concern. It is preferable, from an economic and business strategy perspective, to concentrate on activities that have the effect of reducing the possibility of disaster occurrence, rather than concentrating primarily on minimizing impact of an actual disaster. This phase addresses measures to reduce the probability of occurrence by monitoring.
This phase will include the following key tasks:
- A thorough Security Assessment of the computing and communications environment including personnel practices; physical security; operating procedures; backup and contingency planning; systems development and maintenance; database security; data and voice communications security; systems and access control software security; insurance; security planning and administration; application controls; and personal computers.
- The Security Assessment will enable the project team to improve any existing emergency plans and disaster prevention measures and to implement required emergency plans and disaster prevention measures where none exist.
- Present findings and recommendations resulting from the activities of the Security Assessment to the Steering Committee so that corrective actions can be initiated in a timely manner.
- Define the scope of the planning effort.
- Analyze, recommend and purchase recovery planning and maintenance software required to support the development of the plans and to maintain the plans current following implementation.
- Develop a Plan Framework.
- Assemble Project Team and conduct awareness sessions.
Phase 3 – Business Impact Assessment (BIA)
A Business Impact Assessment (BIA) of all business units that are part of the business environment enables the project team to: identify critical systems, processes and functions; assess the economic impact of incidents and disasters that result in a denial of access to systems services and other services and facilities; and assess the “pain threshold,” that is, the length of time business units can survive without access to systems, services and facilities.
The BIA Report should be presented to the Steering Committee. This report identifies critical service functions and the timeframes in which they must be recovered after interruption. The BIA Report should then be used as a basis for identifying systems and resources required to support the critical services provided by information processing and other services and facilities.
Phase 4 – Detailed Definition of Requirements
During this phase, a profile of recovery requirements is developed. This profile is to be used as a basis for analyzing alternative recovery strategies. The profile is developed by identifying resources required to support critical functions identified in Phase 3. This profile should include hardware (mainframe, data and voice communications and personal computers), software (vendor supplied, in-house developed, etc.), documentation (DP, user, procedures), outside support (public networks, DP services, etc.), facilities (office space, office equipment, etc.) and personnel for each business unit. Recovery Strategies will be based on short term, intermediate term and long term outages.
Another key deliverable of this phase is the definition of the plan scope, objectives and assumptions.
Phase 5 – Plan Development
During this phase, recovery plans components are defined and plans are documented. This phase also includes the implementation of changes to user procedures, upgrading of existing data processing operating procedures required to support selected recovery strategies and alternatives, vendor contract negotiations (with suppliers of recovery services) and the definition of Recovery Teams, their roles and responsibilities. Recovery standards are also be developed during this phase.
Phase 6 – Testing/Exercising Program
The plan Testing/Exercising Program is developed during this phase. Testing/exercising goals are established and alternative testing strategies are evaluated. Testing strategies tailored to the environment should be selected and an on-going testing program should be established.
Phase 7 – Maintenance Program
Maintenance of the plans is critical to the success of an actual recovery. The plans must reflect changes to the environments that are supported by the plans. It is critical that existing change management processes are revised to take recovery plan maintenance into account. In areas where change management does not exist, change management procedures will be recommended and implemented. Many recovery software products take this requirement into account.
Phase 8 – Initial Plan Testing and Implementation
Once plans are developed, initial tests of the plans are conducted and any necessary modifications to the plans are made based on an analysis of the test results.
Specific activities of this phase include the following:
- Defining the test purpose/approach;
- Identifying test teams;
- Structuring the test;
- Conducting the test;
- Analyzing test results; and
- Modifying the plans as appropriate.
The approach taken to test the plans depends, in large part, on the recovery strategies selected to meet the disaster recovery planning requirements of the organization. As the recovery strategies are defined, specific testing procedures should be developed to ensure that the written plans are comprehensive and accurate.
DISASTER RECOVERY PLANNING SCOPE AND PLAN OBJECTIVES
The primary objective of recovery planning is to enable an organization to survive a disaster and to continue normal business operations. In order to survive, the organization must assure that critical operations can resume/continue normal processing. Throughout the recovery effort, the plan establishes clear lines of authority and prioritizes work efforts. The key objectives of the contingency plan should be to:
- Provide for the safety and well-being of people on the premises at the time of a disaster;
- Continue critical business operations;
- Minimize the duration of a serious disruption to operations and resources (both information processing and other resources);
- Minimize immediate damage and losses;
- Establish management succession and emergency powers;
- Facilitate effective co-ordination of recovery tasks;
- Reduce the complexity of the recovery effort;
- Identify critical lines of business and supporting functions;
Although statistically the probability of a major disaster is remote, the consequences of an occurrence could be catastrophic, both in terms of operational impact and public image. Management appreciates the implications of an occurrence, therefore, it should assign on-going responsibility for recovery planning to an employee dedicated to this essential service.
Management must make a decision to undertake a project that satisfy the following objectives:
- Determine vulnerability to significant service interruptions in the Data Centre and business facilities and define preventive measures that may be taken to minimize the probability and impact of interruptions;
- Identify and analyze the economic, service, public image and other implications of extended service interruptions in the Data Centre and other business facilities;
- Determine immediate, intermediate and extended term recovery needs and resource requirements;
- Identify the alternatives and select the most cost effective approaches for providing backup operations capability and timely service restoration; and
- Develop and implement contingency plans that address both immediate and longer-term needs for the Data Centre and other business facilities.
PROJECT ORGANIZATION AND STAFFING
The project team organization is designed to maximize the flexibility needed to deal with the implementation of a plan in the most efficient manner possible. As explained earlier in this document, disaster recovery and business resumption planning is a complex and labour intensive program. A key factor in the successful development and implementation of recovery and resumption programs in other organizations is the dedication of a full-time resource to recovery/business continuity planning.
Recovery plans should be treated as living documents. Both the information processing and the business environments are constantly changing and becoming more integrated and complex. Recovery plans must keep pace with these changes. Continuous testing/exercising of plans is essential if the organization wants to ensure that recovery capability is maintained in such an environment. The organization also must ensure that staff with recovery responsibilities are prepared to execute the plans.
This cannot be achieved without a full-time resource with responsibility for: maintaining plans; coordinating components and full plan tests; training staff with recovery responsibilities; and updating plans to reflect changes to the information processing and business environments.
Steering Committee
The Steering Committee should include representatives from key areas of the organization:
- Information Systems
- Technology Support
- Systems Development
- Network and Operations Services
- Voice Communications
- Key Business Units
Project Team
The composition of the Project Team may vary depending on the environments and business units for which plans are developed. It is important to note that the managers of environments and business units for which plans are developed will be responsible for the maintenance and testing of their respective plans. However, the Person/unit responsible for recovery/continuity planning should retain the role of co-ordinator of testing activities, major plan revisions and maintainer of the Master Plan.
The Core Project Team is automatically part of other project teams. Internal Audit should be invited to be part of all teams. The managers represented on the various teams may choose to recommend other senior individuals in their area to represent them or to join specific teams where their expertise will be required for the development of the plans.
Suggested Core Project Team Composition
- Project Manager
- Computer and Network Operations
- Systems Support
- Voice, Network and Communications
Suggested Information Systems/Technology Support Team Composition
- Network & Communications
- Facilities Management
- Network Development and Support
- Database Administration
- Information Systems Security
- Operations
- Network Support
- Network Implementation
Business Units Team
The members of the various Business Unit teams will be different for each Business Unit.
PROJECT CONTROL
The management and control for this project should be supported by project management software. The software should be used for scheduling of personnel resources to specific tasks and identification of end deliverables and their target completion dates. Recovery Planning software implemented during Phase 2 of the project will be used to document the plans.
During Phase 1 activities, detail work plans for Data Processing and user personnel identifying tasks and responsibilities along with their associated start and completion dates will be developed.
SCHEDULE OF DELIVERABLES
The following is a schedule of deliverables by phase that will be developed and delivered as part of this project.
Phase/Deliverable
Phase 1 – Pre-Planning Activities (Project Initiation)
- Revised Detail Work Plan
- Interview Schedules
- Policy Statement
- Recovery Planning Awareness Program
Phase 2 – Vulnerability Assessment
- Security Assessment Report
- Scope of Planning Effort
- Plan Framework
- Recommendation on Recovery Planning Software
- Implementation of Recovery Planning Software
Phase 3 – Business Impact Analysis
- Business Impact Assessment Report
Phase 4 – Detailed Definition of Requirements
- Recovery Needs Profile
- Plan Scope, Objectives and Assumptions
Phase 5 – Plan Development
- Data Centre Recovery Plan
- Prototype Business Units Resumption Plan
- Recovery Standards
Phase 6 – Testing Program
- Testing Goals
- Testing Strategies
- Testing Procedures
Phase 7 – Maintenance Program
- Maintenance Procedures
- Change Management Recommendations
Phase 8 – Initial Plan Testing and Implementation
- Initial Test Report
- Implementation
RESOURCE REQUIREMENTS
Organization who have tried to develop disaster and business resumption plans without dedicating the required resources to the effort have been largely unsuccessful in implementing effective recovery plans. Some organizations, after spending time and money developing recovery plans, have failed in maintaining their recovery capability. This is mostly due to a lack of commitment to keep their plans current or to do regular testing of recovery capabilities.
It is therefore essential, that management is committed to the development, implementation and maintenance of this program, that required resources are freed up during the development cycle and that a resource be dedicated to the on-going maintenance of the program.
Resource requirements can be divided into three categories, namely:
- Personnel
- Capital Costs
- On-going costs
Capital Costs
A large volume of data will be gathered during various stages of the plan development. This data will be essential to the plan and has to be maintained on an on-going basis. There are several products on the market that have been designed to support the development, testing and maintenance of recovery plans. These products are evaluated during Phase 2 of the project. The final cost depends on the product chosen.
Other one-time costs may include the purchase of equipment related to establishing redundancy in the area of voice and data communications, data processing equipment (including personal computers), data processing emergency support and backup equipment (such as UPS, diesel generators, etc.) and business equipment (photocopiers, FAX machines, etc.).
On-Going Costs
On-going costs include rentals, services contracts and maintenance contracts. Some of these costs are hard to estimate ahead of time but could include the following:
- Shell/Hot Site Contract
- Recovery Planning Software Maintenance Contract
- Service and maintenance fees relating to recovery and backup equipment and services