Disaster Recovery Planning Process

By Geoffrey H. Wold

This is the first of a three-part series that describes the planning process related to disaster recovery. Based on the various considerations addressed during the planning phase, the process itself and related methodology can be as beneficial as the final written plan.
Most businesses depend heavily on technology and automated systems, and their disruption for even a few days could cause severe financial loss and threaten survival.
The continued operations of an organization depend on management’s awareness of potential disasters, its ability to develop a plan to minimize disruptions of critical functions and its capability to recover operations expediently and successfully.
A disaster recovery plan is a comprehensive statement of consistent actions to be taken before, during and after a disaster. The plan should be documented and tested to ensure the continuity of operations and availability of critical resources in the event of a disaster.
The primary objective of disaster recovery planning is to protect the organization in the event that all or part of its operations and/or computer services are rendered unusable. Preparedness is the key. The disaster recovery planning process should minimize the disruption of operations and ensure some level of organizational stability and an orderly recovery after a disaster.
Other objectives of disaster recovery planning include:

 

  • Providing a sense of security
  • Minimizing risk of delays
  • Guaranteeing the reliability of standby systems
  • Providing a standard for testing the plan
  • Minimizing decision-making during a disaster

The three-part diagram illustrates the disaster recovery planning process. The methodology is described below.

1. Obtain top management commitment

Top management must support and be involved in the development of the disaster recovery planning process. Management should be responsible for coordinating the disaster recovery plan and ensuring its effectiveness within the organization.
Adequate time and resources must be committed to the development of an effective plan. Resources could include both financial considerations and the effort of all personnel involved.

2. Establish a planning committee

A planning committee should be appointed to oversee the development and implementation of the plan. The planning committee should include representatives from all functional areas of the organization. Key committee members should include the operations manager and the data processing manager. The committee also should define the scope of the plan.

3. Perform a risk assessment

The planning committee should prepare a risk analysis and business impact analysis that includes a range of possible disasters, including natural, technical and human threats.
Each functional area of the organization should be analyzed to determine the potential consequence and impact associated with several disaster scenarios. The risk assessment process should also evaluate the safety of critical documents and vital records.
Traditionally, fire has posed the greatest threat to an organization. Intentional human destruction, however, should also be considered. The plan should provide for the “worst case” situation: destruction of the main building.
It is important to assess the impacts and consequences resulting from loss of information and services. The planning committee should also analyze the costs related to minimizing the potential exposures.

4. Establish priorities for processing and operations

The critical needs of each department within the organization should be carefully evaluated in such areas as:

• Functional operations
• Key personnel
• Information
• Processing systems
• Service
• Documentation
• Vital records
• Policies and procedures

Processing and operations should be analyzed to determine the maximum amount of time that the department and organization can operate without each critical system.
Critical needs are defined as the necessary procedures and equipment required to continue operations should a department, computer center, main facility or a combination of these be destroyed or become inaccessible.
A method of determining the critical needs of a department is to document all the functions performed by each department. Once the primary functions have been identified, the operations and processes should be ranked in order of priority: essential, important and non-essential.
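
As a minimal sketch of this ranking, the Python below sorts a department's documented functions by priority class and then by the longest outage each can tolerate. The `DepartmentFunction` record, the function names and the hour figures are hypothetical, chosen only for illustration; real values would come from each department's own analysis.

```python
from dataclasses import dataclass

# Priority classes from the text, most urgent first.
PRIORITY_ORDER = {"essential": 0, "important": 1, "non-essential": 2}


@dataclass
class DepartmentFunction:
    name: str                # hypothetical function name
    priority: str            # one of the keys in PRIORITY_ORDER
    max_outage_hours: float  # longest the department can operate without it


def rank_functions(functions):
    """Order functions by priority class, then by shortest tolerable outage."""
    return sorted(
        functions,
        key=lambda f: (PRIORITY_ORDER[f.priority], f.max_outage_hours),
    )


# Illustrative data only.
documented = [
    DepartmentFunction("Monthly reporting", "non-essential", 720),
    DepartmentFunction("Payroll", "important", 120),
    DepartmentFunction("Order entry", "essential", 8),
    DepartmentFunction("Customer billing", "essential", 24),
]

for f in rank_functions(documented):
    print(f"{f.priority:<14}{f.name:<20}{f.max_outage_hours:>6.0f} h")
```

Sorting on two keys keeps the priority class as the main criterion and uses the tolerable outage only to order functions within the same class.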

5. Determine recovery strategies

The most practical alternatives for processing in case of a disaster should be researched and evaluated. It is important to consider all aspects of the organization such as:

• Facilities
• Hardware
• Software
• Communications
• Data files
• Customer services
• User operations
• MIS
• End-user systems
• Other processing operations

Alternatives, dependent upon the evaluation of the computer function, may include:

• Hot sites
• Warm sites
• Cold sites
• Reciprocal agreements
• Two data centers
• Multiple computers
• Service centers
• Consortium arrangement
• Vendor supplied equipment
• Combinations of the above

Written agreements for the specific recovery alternatives selected should be prepared, including the following special considerations:

• Contract duration
• Termination conditions
• Testing
• Costs
• Special security procedures
• Notification of systems changes
• Hours of operation
• Specific hardware and other equipment required for processing
• Personnel requirements
• Circumstances constituting an emergency
• Process to negotiate extension of service
• Guarantee of compatibility
• Availability
• Non-mainframe resource requirements
• Priorities
• Other contractual issues

6. Perform data collection

Recommended data gathering materials and documentation include:

• Backup position listing
• Critical telephone numbers
• Communications inventory
• Distribution register
• Documentation inventory
• Equipment inventory
• Forms inventory
• Insurance policy inventory
• Main computer hardware inventory
• Master call list
• Master vendor list
• Microcomputer hardware and software inventory
• Notification checklist
• Office supply inventory
• Off-site storage location inventory
• Software and data files backup/retention schedules
• Telephone inventory
• Temporary location specifications
• Other materials and documentation

It is extremely helpful to develop pre-formatted forms to facilitate the data gathering process.

7. Organize and document a written plan

An outline of the plan’s contents should be prepared to guide the development of the detailed procedures. Top management should review and approve the proposed plan. The outline can ultimately be used for the table of contents after final revision. Other benefits of this approach are that it:
• Helps to organize the detailed procedures
• Identifies all major steps before the writing begins
• Identifies redundant procedures that only need to be written once
• Provides a road map for developing the procedures

A standard format should be developed to facilitate the writing of detailed procedures and the documentation of other information to be included in the plan. This will help ensure that the disaster plan follows a consistent format and allows for ongoing maintenance of the plan. Standardization is especially important if more than one person is involved in writing the procedures.
The plan should be thoroughly developed, including all detailed procedures to be used before, during and after a disaster. It may not be practical to develop detailed procedures until backup alternatives have been defined.
The procedures should include methods for maintaining and updating the plan to reflect any significant internal, external or systems changes. The procedures should allow for a regular review of the plan by key personnel within the organization.
The disaster recovery plan should be structured using a team approach. Specific responsibilities should be assigned to the appropriate team for each functional area of the company.
There should be teams responsible for administrative functions, facilities, logistics, user support, computer backup, restoration and other important areas in the organization.
The structure of the contingency organization may not be the same as the existing organization chart. The contingency organization is usually structured with teams responsible for major functional areas such as:

• Administrative functions
• Facilities
• Logistics
• User support
• Computer backup
• Restoration
• Other important areas

The management team is especially important because it coordinates the recovery process. The team should assess the disaster, activate the recovery plan, and contact team managers.
The management team also oversees, documents and monitors the recovery process. Management team members should be the final decision-makers in setting priorities, policies and procedures.
Each team has specific responsibilities that must be completed to ensure successful execution of the plan. The teams should have an assigned manager and an alternate in case the team manager is not available. Other team members should also have specific assignments where possible.
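
The manager-and-alternate rule lends itself to a simple automated check. The sketch below is an assumption about how a team roster might be stored (a plain dictionary keyed by team name); the team names and position titles are illustrative only.

```python
def teams_missing_leadership(teams):
    """Return names of teams lacking a manager or an alternate, in input order."""
    return [
        name
        for name, roles in teams.items()
        if not roles.get("manager") or not roles.get("alternate")
    ]


# Position titles rather than personal names; all entries are illustrative.
roster = {
    "Management team": {"manager": "Operations Manager", "alternate": "Controller"},
    "Facilities team": {"manager": "Facilities Supervisor", "alternate": None},
    "User support team": {"manager": "Help Desk Lead"},
}

print(teams_missing_leadership(roster))
```

A check of this kind could be run whenever the plan is reviewed, so that gaps left by staff changes are caught before a disaster occurs.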

8. Develop testing criteria and procedures

It is essential that the plan be thoroughly tested and evaluated on a regular basis (at least annually). Procedures to test the plan should be documented. The tests will provide the organization with the assurance that all necessary steps are included in the plan. Other reasons for testing include:

• Determining the feasibility and compatibility of backup facilities and procedures
• Identifying areas in the plan that need modification
• Providing training to the team managers and team members
• Demonstrating the ability of the organization to recover
• Providing motivation for maintaining and updating the disaster recovery plan

9. Test the plan

After testing procedures have been completed, an initial test of the plan should be performed by conducting a structured walk-through test. The test will provide additional information regarding any further steps that may need to be included, changes in procedures that are not effective, and other appropriate adjustments. The plan should be updated to correct any problems identified during the test. Initially, testing of the plan should be done in sections and after normal business hours to minimize disruptions to the overall operations of the organization.

Types of tests include:
• Checklist tests
• Simulation tests
• Parallel tests
• Full interruption tests

10. Approve the plan

Once the disaster recovery plan has been written and tested, the plan should be approved by top management. Top management is ultimately responsible for ensuring that the organization has a documented and tested plan.
Management is responsible for:
• Establishing policies, procedures and responsibilities for comprehensive contingency planning
• Reviewing and approving the contingency plan annually, documenting such reviews in writing

If the organization receives information processing from a service bureau, management must also:

• Evaluate the adequacy of contingency plans for its service bureau
• Ensure that its contingency plan is compatible with its service bureau’s plan

Conclusion

Disaster recovery planning involves more than off-site storage or backup processing. Organizations should also develop written, comprehensive disaster recovery plans that address all the critical operations and functions of the business. The plan should include documented and tested procedures, which, if followed, will ensure the ongoing availability of critical resources and continuity of operations.
The probability of a disaster occurring in an organization is highly uncertain. A disaster plan, however, is similar to liability insurance: it provides a certain level of comfort in knowing that if a major catastrophe occurs, it will not result in financial disaster. Insurance alone is not adequate because it may not compensate for the incalculable loss of business during the interruption or the business that never returns.
Other reasons to develop a comprehensive disaster recovery plan include:

• Minimizing potential economic loss
• Decreasing potential exposures
• Reducing the probability of occurrence
• Reducing disruptions to operations
• Ensuring organizational stability
• Providing an orderly recovery
• Minimizing insurance premiums
• Reducing reliance on certain key individuals
• Protecting the assets of the organization
• Ensuring the safety of personnel and customers
• Minimizing decision-making during a disastrous event
• Minimizing legal liability

The second part of this series will describe specific methods for organizing and writing a comprehensive disaster recovery plan.

 

This is the second of a three-part series that describes specific methods for organizing and writing a comprehensive disaster recovery plan. The first part of this series described the process for developing a thorough disaster recovery plan.
A well-organized disaster recovery plan will directly affect the recovery capabilities of the organization. The contents of the plan should follow a logical sequence and be written in a standard and understandable format.
Effective documentation and procedures are extremely important in a disaster recovery plan. Considerable effort and time are necessary to develop a plan. However, most plans are difficult to use and become outdated quickly. Poorly written procedures can be extremely frustrating. Well-written plans reduce the time required to read and understand the procedures and, therefore, result in a better chance of success if the plan has to be used. Well-written plans are also brief and to the point.

Standard Format

A standard format for the procedures should be developed to facilitate consistency and conformity throughout the plan. Standardization is especially important if several people write the procedures. Two basic formats are used to write the plan: background information and instructional information.
Background information should be written using indicative sentences while the imperative style should be used for writing instructions. Indicative sentences have a direct subject-verb-predicate structure, while imperative sentences start with a verb (the pronoun “you” is assumed) and issue directions to be followed.
Recommended background information includes:
• Purpose of the procedure
• Scope of the procedure (e.g., location, equipment, personnel, and time associated with what the procedure encompasses)
• Reference materials (i.e., other manuals, information, or materials that should be consulted)
• Documentation describing the applicable forms that must be used when performing the procedures
• Authorizations listing the specific approvals required
• Particular policies applicable to the procedures
Instructions should be developed on a preprinted form. A suggested format for instructional information is to separate headings common to each page from details of procedures. Headings should include:
• Subject category number and description
• Subject subcategory number and description
• Page number
• Revision number
• Superseded date

Writing Methods

Procedures should be clearly written. Helpful methods for writing the detailed procedures include:
• Be specific. Write the plan with the assumption it will be implemented by personnel completely unfamiliar with the function and operation.
• Use short, direct sentences, and keep them simple. Long sentences can overwhelm or confuse the reader.
• Use topic sentences to start each paragraph.
• Use short paragraphs. Long paragraphs can be detrimental to reader comprehension.
• Present one idea at a time. Two thoughts normally require two sentences.
• Use active voice verbs in present tense. Passive voice sentences can be lengthy and may be misinterpreted.
• Avoid jargon.
• Use position titles (rather than personal names of individuals) to reduce maintenance and revision requirements.
• Avoid gender-specific nouns and pronouns that may cause unnecessary revision requirements.
• Develop uniformity in procedures to simplify the training process and minimize exceptions to conditions and actions.
• Identify events that occur in parallel and events that must occur sequentially.
• Use descriptive verbs. Nondescriptive verbs such as “make” and “take” can cause procedures to be excessively wordy. Examples of descriptive verbs are:
Acquire     Count       Log
Activate    Create      Move
Advise      Declare     Pay
Answer      Deliver     Print
Assist      Enter       Record
Back Up     Explain     Replace
Balance     File        Report
Compare     Inform      Review
Compile     List        Store
Contact     Locate      Type

Scope

Although most disaster recovery plans address only data processing related activities, a comprehensive plan will also include areas of operation outside data processing.
The plan should have a broad scope if it is to effectively address the many disaster scenarios that could affect the organization.
A “worst case scenario” should be the basis for developing the plan. The worst case scenario is the destruction of the main or primary facility.
Because the plan is written based on this premise, less critical situations can be handled by using only the needed portions of the plan, with minor (if any) alterations required.

Planning Assumptions

Every disaster recovery plan has a foundation of assumptions on which the plan is based. The assumptions limit the circumstances that the plan addresses.
The limits define the magnitude of the disaster the organization is preparing to address. The assumptions can often be identified by asking the following questions:

• What equipment/facilities have been destroyed?
• What is the timing of the disruption?
• What records, files and materials were protected from destruction?
• What resources are available following the disaster:

– Staffing?
– Equipment?
– Communications?
– Transportation?
– Hot site/alternate site?

Following is a list of typical planning assumptions to be considered in writing the disaster recovery plan:

• The main facility of the organization has been destroyed
• Staff is available to perform critical functions defined within the plan
• Staff can be notified and can report to the backup site(s) to perform critical processing, recovery and reconstruction activities
• Off-site storage facilities and materials survive
• The disaster recovery plan is current
• Subsets of the overall plan can be used to recover from minor interruptions
• An alternate facility is available
• An adequate supply of critical forms and supplies is stored off-site, either at an alternate facility or off-site storage
• A backup site is available for processing the organization’s work
• The necessary long distance and local communications lines are available to the organization
• Surface transportation in the local area is possible
• Vendors will perform according to their general commitments to support the organization in a disaster

This list of assumptions is not all-inclusive but is intended to provoke thought in the beginning stage of planning.
The assumptions themselves will often dictate the makeup of the plan; therefore, management should carefully review them for appropriateness.

Team Approach

The structure of the contingency organization may not be the same as the existing organization chart.
The team approach is used in developing a plan as well as in recovering from a disaster. The teams have specific responsibilities and allow for a smooth recovery.
Within each team a manager and an alternate should be designated. These persons provide the necessary leadership and direction in developing the sections of the plan and carrying out the responsibilities at the time of a disaster.

Potential teams include:
• Management team
• Business recovery team
• Departmental recovery team
• Computer recovery team
• Damage assessment team
• Security team
• Facilities support team
• Administrative support team
• Logistics support team
• User support team
• Computer backup team
• Off-site storage team
• Software team
• Communications team
• Applications team
• Computer restoration team
• Human relations team
• Marketing/Customer relations team
• Other teams

Various combinations of the above teams are possible depending on the size and requirements of the organization. The number of members assigned to a specific team can also vary depending on need.

Summary

The benefits of effective disaster recovery procedures include:

• Eliminating confusion and errors
• Providing training materials for new employees
• Reducing reliance on certain key individuals and functions

In the next issue, the third part of this series will describe specific methods and materials that can expedite the data collection process.

 

This is the third of a three-part series on disaster recovery planning. The first part of this series described the process for developing a thorough disaster recovery plan. The second article described specific methods for organizing and writing a comprehensive disaster recovery plan. This article presents particular methods and materials that can expedite the data collection process.
Disaster recovery is a concern of the entire organization, not just data processing. To develop an effective plan, all departments should be involved. Within all departments the critical needs should be identified. Critical needs include all information and equipment needed in order to continue operations should a department be destroyed or become inaccessible.

DETERMINING CRITICAL NEEDS

To determine the critical needs of the organization, each department should document all the functions performed within that department. An analysis over a period of two weeks to one month can indicate the principal functions performed inside and outside the department, and assist in identifying the necessary data requirements for the department to conduct its daily operations satisfactorily. Some of the diagnostic questions that can be asked include:

1. If a disaster occurred, how long could the department function without the existing equipment and departmental organization?
2. What are the high priority tasks including critical manual functions and processes in the department? How often are these tasks performed, e.g., daily, weekly, monthly, etc.?
3. What staffing, equipment, forms and supplies would be necessary to perform the high priority tasks?
4. How would the critical equipment, forms and supplies be replaced in a disaster situation?
5. Do any of the above items require long lead times for replacement?
6. What reference manuals and operating procedure manuals are used in the department? How would these be replaced in the event of a disaster?
7. Should any forms, supplies, equipment, procedure manuals or reference manuals from the department be stored in an off-site location?
8. Identify the storage and security of original documents. How would this information be replaced in the event of a disaster? Should any of this information be in a more protected location?
9. What are the current microcomputer backup procedures? Have the backups been restored? Should any critical backup copies be stored off-site?
10. What would the temporary operating procedures be in the event of a disaster?
11. How would other departments be affected by an interruption in the department?
12. What effect would a disaster at the main computer have on the department?
13. What outside services/vendors are relied on for normal operation?
14. Would a disaster in the department jeopardize any legal requirements for reporting?
15. Are job descriptions available and current for the department?
16. Are department personnel cross-trained?
17. Who would be responsible for maintaining the department’s contingency plan?
18. Are there other concerns related to planning for disaster recovery?

The critical needs can be obtained in a consistent manner by using a User Department Questionnaire. As illustrated, the questionnaire focuses on documenting critical activities in each department and identifying related minimum requirements for staff, equipment, forms, supplies, documentation, facilities and other resources.

SETTING PRIORITIES ON PROCESSING AND OPERATIONS

Once the critical needs have been documented, management can set priorities within departments for the overall recovery of the organization. Activities of each department could be given priorities in the following manner:
• Essential activities – A disruption in service exceeding one day would seriously jeopardize the operation of the organization.
• Recommended activities – A disruption in service exceeding one week would seriously jeopardize the operation of the organization.
• Nonessential activities – This information would be convenient to have but would not seriously detract from the operating capabilities if it were missing.
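
The three priority levels above can be sketched as a simple threshold rule. The Python below is one possible reading, not a prescribed method: it assumes each activity has been given a longest tolerable disruption in days, and it treats the one-day and one-week boundaries as inclusive. The function name and the sample activities are hypothetical.

```python
def classify_activity(tolerable_disruption_days):
    """Map the longest tolerable disruption (in days) to a priority class."""
    if tolerable_disruption_days <= 1:
        # Anything beyond one day would seriously jeopardize operations.
        return "essential"
    if tolerable_disruption_days <= 7:
        # Anything beyond one week would seriously jeopardize operations.
        return "recommended"
    return "nonessential"


# Illustrative activities with assumed tolerable disruptions, in days.
activities = {
    "Order entry": 0.5,
    "Payroll": 5,
    "Monthly statistics": 30,
}

for name, days in activities.items():
    print(f"{name}: {classify_activity(days)}")
```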

RECORD RETENTION GUIDELINES

A systematic approach to records management is an important part of a comprehensive disaster recovery plan. Additional benefits include:
• Reduced storage costs.
• Expedited customer service.
• Federal and state regulatory compliance.

Records are not only retained as proof of financial transactions, but also to verify compliance with legal and regulatory requirements. In addition, businesses must satisfy retention requirements as an organization and employer. These records are used for independent examination and verification of sound business practices. Federal and State requirements for records retention must be analyzed by each organization individually. Each organization should have its legal counsel approve its own retention schedule.
As well as retaining records, the organization should be aware of the specific record salvage techniques and procedures to follow for different types of media. Potential types of media include:
• Paper
• Magnetic
• Microfilm/Microfiche
• Image
• Photographic
• Other

OTHER DATA GATHERING TECHNIQUES

Other information that can be compiled by using preformatted data gathering forms includes:
• Equipment Inventory to document all critical equipment required by the organization. If the recovery lead time is longer than acceptable, a backup alternative should be considered.
• Master Vendor List to identify vendors that provide critical goods and services.
• Office Supply Inventory to record the critical office supply inventory to facilitate replacement. If an item has a longer lead time than is acceptable, a larger quantity should be stored off-site.
• Forms Inventory Listing to document all forms used by the organization to facilitate replacement. This list should include computer forms and non-computer forms.
• Documentation Inventory Listing to record inventory of critical documentation manuals and materials. It is important to determine whether backup copies of the critical documentation are available. They may be stored on disk or obtained from branch offices, vendors or other outside sources.
• Critical Telephone Numbers to list critical telephone numbers, contact names, and specific services for organizations and vendors important in the recovery process.
• Notification Checklist to document responsibilities for notifying personnel, vendors and other parties. Each team should be assigned specific parties to contact.
• Master Call List to document employee telephone numbers.
• Backup Position Listing to identify backup employees for each critical position within the organization. Certain key personnel may not be available in a disaster situation; therefore, backups for each critical position should be identified.
• Specifications for Off-Site Location to document the desired/required specifications of a possible alternative site for each existing location.
• Off-Site Storage Location Inventory to document all materials stored off-site.
• Hardware and Software Inventory Listing to document the inventory of hardware and software.
• Telephone Inventory Listing to document existing telephone systems used by the organization.
• Insurance Policies Listing to document insurance policies in force.
• Communications Inventory Listing to document all components of the communications network.
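
The lead-time rule in the Equipment Inventory and Office Supply Inventory items can be sketched as a small check. The Python below assumes each inventory record carries a replacement lead time and an acceptable limit, both in days; the `InventoryItem` record, its field names and the sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    replacement_lead_days: int  # time the vendor needs to replace the item
    acceptable_lead_days: int   # longest delay the organization can accept


def needs_backup_alternative(items):
    """Return items whose replacement lead time exceeds the acceptable limit."""
    return [i for i in items if i.replacement_lead_days > i.acceptable_lead_days]


# Illustrative entries only.
equipment = [
    InventoryItem("Check encoder", 30, 5),
    InventoryItem("Desktop printer", 2, 5),
    InventoryItem("Mail inserter", 21, 14),
]

for item in needs_backup_alternative(equipment):
    print(f"{item.name}: consider a backup alternative or off-site stock")
```

The same comparison applies to office supplies, where the remedy suggested in the text is to store a larger quantity off-site.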

There are several PC-based disaster recovery planning systems that can be used to facilitate the data gathering process and to develop the plan. Typically, these systems emphasize either a database application or a word processing application. The most comprehensive systems use a combination of integrated applications.
Some PC-based systems include a sample plan that can be tailored to the unique requirements of each organization. Other materials can include instructions which address the disaster recovery related issues that the organization must consider during the planning process such as disaster prevention, insurance analysis, record retention and backup strategies. Specialized consulting may also be available with the system to provide on-site installation, training and consulting on various disaster recovery planning issues.
The benefits of using a PC-based system for developing a disaster recovery plan include:

• A systematic approach to the planning process.
• Pre-designed methodologies.
• An effective method for maintenance.
• A significant reduction in time and effort in the planning and development process.
• A proven technique.

Recently, other PC-based tools have been developed to assist with the process, including disaster recovery planning tutorial systems and software to facilitate the testing process.

CONCLUSION

Disaster recovery planning involves more than off-site storage or backup processing. Organizations should also develop written, comprehensive disaster recovery plans that address all the critical operations and functions of the business. The plan should include documented and tested procedures, which, if followed, will ensure the ongoing availability of critical resources and continuity of operations.
The benefits of developing a comprehensive disaster recovery plan include:
• Minimizing potential economic loss.
• Decreasing potential exposures.
• Reducing the probability of occurrence.
• Reducing disruptions to operations.
• Ensuring organizational stability.
• Providing an orderly recovery.
• Minimizing insurance premiums.
• Reducing reliance on certain key individuals.
• Protecting the assets of the organization.
• Ensuring the safety of personnel and customers.
• Minimizing decision-making during a disastrous event.
• Minimizing legal liability.

 

Geoffrey H. Wold is the National Director of Information Systems and Technology Consulting for the CPA/Consulting firm of McGladrey & Pullen. He has written four books on disaster recovery planning.