Severe Weather throughout the South & Midwest

Posted by: Lars Anderson, Director, Public Affairs

As the risk of severe weather continues throughout parts of the Midwest and South, we want to remind everyone in areas expected to see severe weather to take necessary precautions now. We encourage all individuals in those areas to listen to NOAA Weather Radio and local news for severe weather updates and warnings, especially as we head into the evening and overnight, and to always follow the direction provided by their local officials.

Here are a few severe weather terms you should familiarize yourself with now:

  • Severe Thunderstorm Watch – Tells you when and where severe thunderstorms are likely to occur. Watch the sky and stay tuned to NOAA Weather Radio, commercial radio or television for information.
  • Severe Thunderstorm Warning – Issued when severe weather has been reported by spotters or indicated by radar. Warnings indicate imminent danger to life and property to those in the path of the storm. 
  • Tornado Watch – Tornadoes are possible. Remain alert for approaching storms. Watch the sky and stay tuned to NOAA Weather Radio, commercial radio or television for information. 
  • Tornado Warning – A tornado has been sighted or indicated by weather radar. Take shelter immediately. 

As weather conditions often change quickly, it’s important to stay updated on your local forecast conditions at weather.gov (or mobile.weather.gov on your mobile device).

If severe weather is expected in your area, keep in mind these safety tips:

  • Continue to monitor your battery-powered radio or television for emergency information. 
  • Do not touch downed power lines or objects in contact with downed lines. Report downed power lines and electrical hazards to the police and the utility company. 
  • Injury may result from the direct impact of a tornado or it may occur afterward when people walk among debris and enter damaged buildings. Wear sturdy shoes or boots, long sleeves and gloves when handling or walking on or near debris. 
  • After a tornado, be aware of possible structural, electrical or gas-leak hazards in your home. Contact your local city or county building inspectors for information on structural safety codes and standards. They may also offer suggestions on finding a qualified contractor to do work for you.

Visit www.ready.gov/severe-weather  for more tips on what to do if severe weather is expected in your area. You can also visit http://m.fema.gov for safety tips on your mobile device.

Sandy Hook Elementary School Added Security and Emergency Preparedness

BY VANESSA REMMERS (STAFF WRITER)

DINWIDDIE – Just days after a gunman killed 20 first-graders and six staff members at a Connecticut elementary school, security measures and emergency preparedness climbed higher on the Board of Supervisors’ priority list for the new year.

Supervisors on Tuesday looked to the broadened focus of the Dinwiddie Local Emergency Planning Committee as one way to prevent incidents like the Sandy Hook Elementary School shooting from happening in Dinwiddie.

A 1984 chemical disaster in Bhopal, India, that resulted in 8,000 deaths over two days prompted LEPC establishment throughout the country. The primary goal of the LEPC became educating residents about potential chemical hazards and developing emergency response plans.

But after years of natural disasters, fire, terrorism threats and school shootings, Dinwiddie’s LEPC has taken on additional roles. Since its establishment in 2007, Dinwiddie’s LEPC membership has also grown to include 22 state and local organizations, all members of the Board of Supervisors and the county administrator.

“The law says each locality has to have an LEPC, but there is nothing that’s prescribed or legislated as far as how far a local planning committee can go in terms of some emergency preparedness and the like,” said LEPC Chairman Ray Spicer.

This year, the LEPC has set its most ambitious community outreach goal yet: 30 presentations completed by June 2013.

“The premise was chemical hazards, and it has kind of evolved from there. They voted to be more inclusive of all those disasters. And why not … let’s face it, the incidences of major significance are happening more frequently,” Spicer said.

The LEPC presentation stresses knowledge of emergency plans for different local facilities as well as creating a disaster kit to help be prepared for a minimum of five days.

“We do want to get the message out that there is some amount of personal responsibility,” said County Administrator Kevin Massengill. “In thinking about the incident in Connecticut, parents need to know plans for their home, but also the school’s emergency plans.”

School Board Chairman William Haney said that school emergency plans will not see change, but there may be tightening of school security in the aftermath of the Sandy Hook Elementary School shooting.

“We are doing a survey and seeing what we can afford to do and what is cost-effective,” Haney said.

A security review several years ago implemented some remote cameras and increased lighting, but the School Board could not afford other desired changes. In the next year, Haney said that the School Board is looking at controlling school access and increasing security systems within the school’s physical facilities.

“I was in California, and a lot of the schools were open campuses. And I was thinking, oh my goodness, at least we don’t have that. They need a fence, some kind of physical barrier to keep outsiders out,” Haney said. “We are probably going to take steps in the fairly new future … to implement some kind of security system without being too restrictive.”

The Dinwiddie school system is not the only one that the Sandy Hook Elementary School shooting will affect. According to a statement, Prince George Police Chief Ed Frankenstein, School Superintendent Dr. Bobby Browder and County Administrator Percy Ashcraft are planning to incorporate additional training exercises for school personnel and public safety first responders. Officials are also in the process of re-evaluating existing school crisis policies.

On a legislative level, Dinwiddie Board members wondered what could be done to prevent incidents like the Sandy Hook shooting.

“With the young man in question, mental health and illness is a big issue. We know that there is a lot of mental illness within the jails, but there are so many walking the streets that we still don’t know about,” said Supervisor Brenda Ebron-Bonner.

Adam Lanza, the Sandy Hook Elementary School gunman, has been reported to suffer from Asperger’s Syndrome. Supervisor Daniel Lee felt that more than mental health awareness was needed to combat such shootings.

“We need to look at both gun control and mental health, and that comes from someone who owns guns,” Lee said.

 

How many lives does your data have?

By Sameer Sule

SANDY- if you live in the northeast you will not forget her name for a long time. Every CEO, business owner and home owner was holding his/her breath as Sandy blew over us. I know I was. My house is surrounded by trees and every time a 50 mph gust came, I was praying to the higher power that the branches held up. Unfortunately a tree on the adjoining street couldn’t hold up and came down, knocking the power out from our neighborhood for a day. We were the lucky ones! Others in the NY and NJ area weren’t so lucky. 

The damage to people, property and businesses in NY and NJ is unimaginable. According to early estimates, over 100,000 homes and businesses were completely destroyed or severely damaged. Many business owners have lost everything and may never recover. All their life’s work gone in the blink of an eye. My prayers go out to people who have been disastrously affected by Sandy. Could they have done more to protect their businesses? In some cases the answer is no; we are powerless in front of mother nature and despite our best preparations things can go real bad. But in many cases, I am sure business owners are cursing themselves for not being better prepared. Most businesses do not have disaster recovery plans in place. Simple things like backing up data in a secure place or having a redundant power supply such as a portable generator are not in place. Taking these simple steps can mean the difference between business recovery and business death.
Events like Hurricane Sandy remind us how close we get to losing everything. It’s just a matter of luck that one business or home gets destroyed and another doesn’t. Yet many of us thank our stars and move on without really considering what we can do to protect our family, home and business in the event of a disaster. We live in an information age and our life is practically a collection of bytes. Apart from a few hard copies, most of our information is now stored in electronic format. Now is the time for those of us lucky enough to escape unscathed from Sandy to take a look at what is important in our lives and take steps to safeguard it. Do we have all our important documents in a safe place? How about all our electronic data: our files, family pictures, legal information, financial information? Have they been backed up online, and can we recover them easily afterwards?
Knowing that we can recover our critical data after a disaster will make the recovery process relatively easier. So unless your data is a cat with nine lives, Sandy just used up one. How many more lives does your data have?

Sameer Sule is a Business Technology Consultant at Kinara Insights, a company providing contingency/disaster recovery planning services to doctors, dentists and healthcare practices. He helps his clients understand and use technology to reduce practice downtime, increase efficiency and improve quality of patient care.

 

Using the Cloud to Ensure Business Continuity

In today’s fast-paced society, companies of all sizes need affordable ways to deliver quality IT services reliably and continuously.  One of the key benefits of cloud computing, one that is also often overlooked, is how cloud computing can help ensure business continuity, as well as speedy disaster recovery.  Cloud hosting offers a low-cost disaster recovery and business continuity solution for small to midsize businesses and a more cost-effective DR alternative to larger, cost-conscious corporations.

With the cloud as your disaster recovery solution, you can use your in-house systems to run your core business and work with a cloud hosting provider for your business continuity and disaster protection.  With cloud hosting, your data and software are replicated automatically in the cloud, creating increased redundancy.  You don’t have to buy extra hardware or software to mirror your data center environment.  Instead, cloud servers can be easily partitioned to create multiple environments in the cloud, and these cloud servers can be spun up and configured in a matter of minutes.  In addition, with cloud computing and cloud storage, you only pay for the resources you use, so the cost is minimal.

A cloud-based disaster recovery/business continuity solution works well for any business with a low tolerance for downtime and data loss. Most SMBs and larger businesses today fall into this category; a local irrigation maintenance company, by contrast, may be able to survive a week without its data. Businesses like hospitals have very little tolerance for downtime and data loss due to the urgency and sensitivity of their data.

With cloud hosting from a premier cloud hosting provider like Atlantic.Net, your data and applications reside in an offsite, secure data center facility with a backup, uninterrupted power supply, and dedicated support staff to support business continuity in any situation.

Applying Root Cause Analysis (RCA) to Business Continuity

By Stacy Gardner, Avalution Consulting
Article originally posted on Avalution Consulting’s Blog

Though many business continuity standards emphasize the importance of tracking corrective actions to address identified issues, the recently published ISO 22301 (and previously BS 25999-2) also requires conducting a root cause analysis – looking not just at an issue, but its cause and how it can be prevented in the future.   Root cause analysis (RCA) is an approach that seeks to proactively prevent reoccurrences of the same adverse event or systems failure by tracing causal relationships of a failure to its most likely impactful origin, then putting measures in place to mitigate underlying causes to ultimately help prevent recurrence of the adverse event in the future.  While common in disciplines that deal with extreme precision and protection of life (e.g. quality and environmental health and safety), there’s no reason the business continuity discipline cannot benefit from a similar approach, particularly for practitioners looking to fully implement ISO 22301.  This article explains root cause analysis and identifies how organizations can benefit from implementing the concept in a business continuity context.

The concept of root cause analysis was originally developed by Sakichi Toyoda (the founder of Toyota Motor Corporation), who developed a process called the “Five Whys” to understand potential causes for problems beyond what was immediately obvious.  Root cause analysis became more formalized as it was integrated into several different fields as a performance driver, such as safety, quality, operations and information security.  In each of these areas, reactively responding to an issue was not enough – future issues needed to be prevented, and root cause analysis was the path to enable improved performance and risk mitigation by eliminating true causes, rather than just symptoms.  Incorporating root cause analysis into existing business continuity-related corrective action efforts could very well minimize the likelihood of future disruptive incidents and decrease recovery times.

At times, performing RCA is as easy as implementing the five whys, repeatedly asking “why” something occurred until it seems like you’ve reached the baseline cause of how failure occurred.  The key is a disciplined application of asking probing questions.  For example, analyzing the root cause of why an organization failed to meet a 24-hour recovery time objective for its SAP environment during a recent test could look something like this:

  1. Problem: IT recovery personnel failed to recover the organization’s SAP system within its recovery time objective of 24 hours during last week’s IT DR test   …. Why?
  2. IT recovery personnel said that SAN LUNs were not mapped correctly, which drastically delayed the start of restoration from disk   … Why?
  3. Vendor personnel responsible for prepping the equipment failed to execute the setup specifically to documented expectations   … Why?
  4. Vendor personnel indicated that the instructions seemed contradictory and did not provide the level of detail necessary to execute steps, so they used a basic default setup  …Why?
  5. Upon analysis, documentation did leave out several crucial steps necessary to enable this complex LUN mapping to occur   …Why was this not found earlier?
  6. When performing previous testing, personnel did not fully leverage existing plan documentation  … What changed this time?
  7. The individual responsible for documenting the plan and performing past testing was unavailable, and personnel who performed testing this time indicated they were not properly trained on use of the plans, nor were they instructed on how to escalate issues regarding recovery processes.

Although it might seem the root cause was reached, simply fixing the documentation does not ensure future documentation will be accurate.  Taking it deeper, the previous IT subject matter expert responsible for documenting the procedures often does onsite testing without using documentation, as he has extensive experience in this field and felt he could perform tasks more quickly by recovering based on experience as opposed to documented procedures.  Exploring the issue further revealed that newer personnel assigned to recovery tasks were far less experienced and had not yet received an appropriate level of awareness training.  Related to this point, the IT Director admitted he never required other personnel to validate documentation, as testing takes time away from production support and leveraging the “experts” in each phase lessens testing time.

Part of the solution to this could be to implement an expectation that all documented procedures be validated at least annually by another IT individual within a different area of expertise.  A second part of the solution could be to perform appropriate training up front (that emphasizes familiarity with plans and knowledge of escalation procedures) for both alternate internal individuals and any vendor resources responsible for plan execution.  Together, these efforts could help assure that all IT DR documentation can be effectively used by both internal and external resources during testing.

Although simple in theory, identifying the actual root cause and figuring out when you’ve gone far enough can be complex in practice.  To help understand primary root causes, you must repeatedly ask variants of “why” (and a few other probing questions), then look for the answer that seems most likely to have influenced the issue.  While there may not be a “hard science” to root cause analysis, the deeper you look for causes, the more likely you are to find issues to resolve.  In most cases, the biggest issue most organizations face is not exploring problems in the first place!  Our example demonstrated this problem in the recovery of SAP.  However, it’s likely this problem (the shortcuts) exists in other areas, and addressing the root cause could improve performance and recoverability elsewhere.
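As a rough illustration of the "five whys" walk-through above, the chain of questions and answers can be recorded so that each step stays visible for later review. This is only a minimal sketch in Python: the names `WhyStep` and `FiveWhys` are hypothetical and not part of any standard or tool, and the last answer in the chain is treated as a candidate root cause, not a confirmed one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyStep:
    question: str  # the probing question that was asked
    answer: str    # what the group found in response


@dataclass
class FiveWhys:
    problem: str
    steps: List[WhyStep] = field(default_factory=list)

    def ask(self, question: str, answer: str) -> None:
        """Record one question/answer pair in the chain."""
        self.steps.append(WhyStep(question, answer))

    def deepest_cause(self) -> str:
        """Return the last recorded answer (a candidate root cause only)."""
        return self.steps[-1].answer if self.steps else self.problem

    def render(self) -> str:
        """Format the chain as numbered lines for a corrective-action log."""
        lines = [f"Problem: {self.problem}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.question} -> {step.answer}")
        return "\n".join(lines)


# Condensed version of the SAP example from the text.
chain = FiveWhys("SAP not recovered within the 24-hour recovery time objective")
chain.ask("Why?", "SAN LUNs were not mapped correctly")
chain.ask("Why?", "Vendor setup did not follow documented expectations")
chain.ask("Why?", "Documentation left out several crucial steps")
chain.ask("Why was this not found earlier?",
          "Previous tests did not rely on the plan documentation")
print(chain.render())
```

Keeping every intermediate answer, instead of only the final one, makes it easier to revisit the chain when the group decides it has not yet gone deep enough.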

Within business continuity, there are several areas that can commonly be identified as root causes for risk mitigation, response and recovery performance issues, although again, it requires tracing issues back further than most professionals choose to explore.  To properly integrate root cause analysis into continuous improvement activities, each issue should be adequately documented, including source of issue, a detailed description, an identification date, and it should also have a field to capture root cause analysis.  Rather than one individual trying to identify the root cause, business continuity personnel should organize and facilitate discussions that involve subject matter experts to whom issues may be assigned or who can provide insight on an issue, and then the group should seek to trace the issue back to its origin together.
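The issue record described above could be sketched as a simple data structure. The following is an assumption-laden example in Python: the class name `ContinuityIssue`, its field names, and the helper `pending_rca` are hypothetical, chosen only to mirror the fields the paragraph lists (source of issue, detailed description, identification date, and a root cause analysis field).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ContinuityIssue:
    source: str                                # where the issue came from, e.g. a test or an incident
    description: str                           # detailed description of the issue
    identified_on: date                        # identification date
    root_cause_analysis: Optional[str] = None  # filled in after the group discussion

    @property
    def needs_rca(self) -> bool:
        """True while the root cause analysis field is still empty."""
        return not (self.root_cause_analysis or "").strip()


def pending_rca(issues: List[ContinuityIssue]) -> List[ContinuityIssue]:
    """Return the issues that still need a facilitated root cause discussion."""
    return [issue for issue in issues if issue.needs_rca]


# Illustrative records only; the dates and wording are made up for the example.
issues = [
    ContinuityIssue("IT DR test", "SAP recovery exceeded the 24-hour RTO", date(2013, 1, 15)),
    ContinuityIssue("IT DR test", "LUN mapping steps missing from the plan", date(2013, 1, 15),
                    root_cause_analysis="Documentation never validated by a second person"),
]
for issue in pending_rca(issues):
    print(issue.description)
```

A list like the one returned by `pending_rca` could serve as the agenda for the facilitated discussions with subject matter experts.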

Within business continuity, there are numerous root causes that can lead to a variety of issues or complications. The following table notes a few examples, together with likely root causes, though this is far from a complete list.  Also, it’s important to note that just like with tree roots that feed a tree’s growth, there could be more than one root cause that affects a system and results in a problem, so it is important to trace all potential paths of an issue’s origin back, rather than just pursuing one direct cause, to identify all influencing factors.

[Table not reproduced: Problem and Potential Root Cause]

Again, root cause analysis is not just solving one instance of a problem, it’s also seeking opportunities to prevent future occurrences of an issue.  Once the origin of an issue is identified, it’s important to evaluate all areas of the business to identify other at-risk areas and ensure proper risk mitigation measures are put in place.  A solution in one area may not necessarily be applicable to all other areas of an organization, but even if it’s not, the act of identifying other similar at-risk areas raises awareness and enables the organization to develop additional solutions that make sense and address these risks before they result in future issues or downtime.

As business continuity management systems continue to mature, root cause analysis will become a powerful tool for business continuity professionals to deeply examine the cause of issues and provide an opportunity to correct them before they occur again.

____________

Stacy Gardner, Managing Consultant
Avalution Consulting: Business Continuity Consulting

Our consulting team regularly publishes perspectives (shorter, independent articles) that touch on the trends currently affecting our profession and the strategic issues facing our clients. This is one of our most recent posts, but the full catalog of our perspectives – over 100 published since 2005 – can be accessed via our blog.

New psychological first aid guide to strengthen humanitarian relief

16 August 2011 – Humanitarian emergencies – like earthquakes, extreme drought, or war – not only affect people’s physical health but also their psychological and social health and well-being. A new guide ensures that best practices are consistently applied in humanitarian settings to improve the mental health of disaster-affected populations.

Data Can Be Just as Safe (If Not More So) in the Cloud

According to a survey performed earlier this year by CIO.com, 54% of all IT security professionals cite cloud computing security as their top priority.  Another 32% cite security as a middle priority for them.  However, 85% of IT professionals are confident in their cloud provider’s ability to provide a secure environment for their data. 

Security has always been a concern when sensitive data is involved and this concern is heightened when it comes to cloud services outside of the corporate wall because no longer is it under the company’s direct supervision.  It is human nature to be afraid of the unknown, but the risks of cloud computing come with a plethora of benefits as well.  For example, the cloud offers greater flexibility, scalability, and agility, allowing IT staff to complete tasks in hours rather than weeks or months.

Depending on the size and nature of your business, entrusting your data to a cloud provider may be every bit as secure as (if not more secure than) keeping it under your in-house security. This is because top-quality cloud hosting providers invest a significant amount of resources into security, much more than most small to medium-sized businesses can afford. Also, most cloud providers make an effort to keep up with the latest in security so that they can provide the best service to their customers.

Atlantic.Net, a privately-held leading cloud hosting provider, offers a secure and robust platform that is routinely and systematically inspected with focus on control objectives in the areas of organizational structure, governance, administration, physical/environmental controls, and physical/logical security.

Hurricane Checklist

Hurricane Checklist: What to do BEFORE, DURING and AFTER a Disaster

  1. Know Your Risk. Understand your hurricane checklist and check your hurricane evacuation level and FEMA flood maps to determine if your business location is vulnerable to storm surge or freshwater flooding. Have your building(s) inspected by a licensed professional to find out if your workplace is vulnerable to hurricane force winds and what is recommended to retrofit.
  2. Take the Necessary Precautions. If a storm threatens, secure your building. Cover windows. Cover and move equipment/ furniture to a secured area.
  3. Always Protect Your Data With Backup Files. If dependent on data processing, consider an alternate site. Make provisions for alternate communications and power.
  4. Make Plans To Work With Limited Cash, No Water, Sewer or Power For Two Weeks. Store emergency supplies at the office.
  5. Protect Your Employees. Employee safety comes first! Prepare, distribute and discuss your business hurricane plan for recovery. Consider providing shelter to employees and their families and helping employees with supplies after the storm. Establish a rendezvous point and time for employees in case damage is severe and communications are disrupted. Establish a call-down procedure for warning and post-storm communications. Provide photo IDs and a letter of authorization to enter the building.
  6. Contact Your Customers & Suppliers and share your communications and recovery plan in advance. Prepare a list of vendors to provide disaster recovery services.
  7. Review Your Insurance Coverage. Have your business appraised at least every five years. Inventory, document and photograph equipment, supplies and workplace. Have copies of insurance policies and customer service/home numbers. Obtain Business Interruption Insurance. Consider “Accounts Receivable” and “Valuable Papers” coverage and “Income Destruction” insurance. If you have a Business Owners Protection Package (BOPP), check the co-insurance provisions. Remember: Flood damage requires separate coverage and is NOT covered under other insurance programs.
  8. After the Storm. Use caution before entering your business. Check for power lines, gas leaks and structural damage. If any electrical equipment is wet, contact an electrician. Prepare loss information for insurance claims and get independent estimates of damages. Take pictures before cleanup. Minimize additional damage.

Hurricane Checklist:  As the Storm Approaches

  1. Listen For Weather Updates on local stations and on NOAA Weather Radio. Don’t trust rumors, and stay tuned to the latest information.
  2. Check Your Disaster Supplies Kit at work. Obtain any needed items. Contact employees and instruct them to do the same.
  3. Instruct Employees To Refill Prescriptions and to maintain at least a two week supply during hurricane season.
  4. Clear Property or tie down any items that could become flying missiles in high winds such as lawn furniture, potted plants, and trashcans.
  5. Protect Windows and Glass Doors. If you do not have impact resistant windows, install shutters or plywood to cover glass. Brace double entry and garage doors at the top and bottom.
  6. Fill Fleet Cars and Equipment Gas Tanks and check oil, water and tires. Gas pumps don’t operate without electricity.
  7. Secure Your Boat Early. Drawbridges will be closed to boat traffic after an evacuation order is issued.
  8. Obtain Sufficient Cash for business operations recognizing that banks and ATMs won’t be in operation without electricity and few stores will be able to accept credit cards or personal checks.
  9. Discuss the Business Recovery Plan With Employees to ensure that communications are up-to-date and employees are aware of their responsibilities after the storm.
  10. Back Up All Computer Data and ensure that back up is stored in a safe place off-site.
  11. Close The Office in sufficient time to allow employees to secure their homes, obtain needed supplies and evacuate if necessary.

Hurricane Checklist:  No Evacuation

If your facility is outside the evacuation area and NOT a work trailer, your facility may be able to remain open or serve as shelter for employees.

  1. Protect Windows and Doors and secure the facility.
  2. Clean Containers For Drinking Water and sinks for storing cleaning water. Plan on three gallons per person, per day for all uses.
  3. Offering Your Facility As Shelter To Employees and their families who live in vulnerable areas or mobile homes will have benefits to your operations but may also have some liability. Check first with legal representation.
  4. Check the Disaster Supplies Kit. Make sure to have at least a two-week supply of non-perishable foods. Don’t forget a non-electric can opener. Instruct any employees to augment the supply with a kit of their own.
  5. During the Storm, everyone should stay inside and away from windows, skylights and glass doors. Find a safe area in the facility (an interior reinforced room, closet or bathroom on the lower floor) if the storm becomes severe.
  6. Wait For Official Word That The Danger Is Over. Don’t be fooled by the storm’s calm “eye.”
  7. If Flooding Threatens Your Facility, electricity should be turned off at the main breaker.
  8. If Your Facility Loses Power, turn off major appliances, such as the air conditioner and water heater to reduce damage.
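The water guideline in the list above (three gallons per person, per day, for all uses) lends itself to a quick calculation when sizing storage for a facility used as shelter. The sketch below is only illustrative: the function name `water_needed_gallons` is hypothetical, and the number of days to plan for is an assumption the reader must supply, since the checklist does not state one for water.

```python
GALLONS_PER_PERSON_PER_DAY = 3  # figure taken from the checklist above


def water_needed_gallons(people: int, days: int) -> int:
    """Estimate stored water for everyone sheltering at the facility.

    `days` is a planning assumption chosen by the reader, not a value
    given in the checklist.
    """
    if people < 0 or days < 0:
        raise ValueError("people and days must be non-negative")
    return people * days * GALLONS_PER_PERSON_PER_DAY


# Example: 10 people sheltering for an assumed 3 days.
print(water_needed_gallons(people=10, days=3))  # prints 90
```

Remember to count family members who may shelter with employees when estimating the number of people.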

Hurricane Checklist: Securing Your Facility

Stay tuned to the local radio and television stations for emergency broadcasts. If ordered to evacuate, do so immediately.

  1. Ensure Important Documents, files, back up tapes, emergency contact information, etc., are taken to a safe location. See “GO BOX.”
  2. Let Employees, Customers and Vendors know your continuity plans. Make sure your employees have a safe ride.
  3. Turn Off electricity, water and gas.
  4. Lock windows and doors.

Hurricane Checklist: After the Storm

After a disaster, the business may be without power, water, food or any of the services we rely on. Immediate response may not be possible, so residents and businesses must be prepared to be self-reliant for several weeks.

RE-ENTRY

  1. Be Patient. Access to affected areas will be controlled. You won’t be able to return to your facility until search and rescue operations are complete and safety hazards, such as downed trees and power lines are cleared. It may take up to three days for emergency crews to reach your area. It may take 2-4 weeks before utilities are restored. On barrier islands, it could take much longer.
  2. Stay Tuned To Local Radio stations for advice and instructions about emergency medical aid, food and other forms of assistance.
  3. Security Operations Will Include Checkpoints. It will be critical for you and your employees to have valid identification with your current local address as well as something to prove your employment and need to get back into the area. It is recommended that businesses contact the county emergency management agency and local jurisdiction to determine what specifically would be required.
  4. Avoid Driving. Roads will have debris that will puncture tires. Don’t add to the congestion of relief workers, supply trucks, law enforcement, etc.

SAFETY CHECKLIST

  1. Avoid Downed or Dangling Utility Wires. Metal fences may have been “energized” by fallen wires. Be especially careful when cutting or clearing fallen trees. They may have power lines tangled in them.
  2. Beware of Snakes, insects or animals driven to higher ground by floods.
  3. Enter Your Facility With Caution. Open windows and doors to ventilate and dry the building.
  4. If There Has Been Flooding, have an electrician inspect the office before turning on the breaker.
  5. Be Careful With Fire. Do not strike a match until you are sure there are no breaks in gas lines. Avoid candles. Use battery-operated flashlights and lanterns instead.
  6. Use Your Telephone Only For Emergencies to keep lines open for emergency communications.

 

Spring World 2013 Features Exclusive Senior Advanced Track

Senior practitioners are invited to attend our one-day track on Monday at Spring World 2013. This exclusive track, How To Achieve True Enterprise Resiliency, will feature General Session 3 in the morning (attended by everyone) and then a separate breakout track in the afternoon. The one-day track will conclude with an exclusive “Meet the Expert” reception in the evening. There is no additional cost for this new track.

Reserve your space in this exclusive track! It is an excellent way to receive top information from some of the industry’s most experienced C-level+ executives. Learn from those who make the decisions and implement the programs! To find out qualification requirements, email [email protected]

The featured breakouts are:

SA-1: Driving an Enterprise Resiliency Partnership
Monday, March 18, 1:30 – 2:30 p.m.

SA-2: Moving your Organization from Continuity to Resiliency: Lessons Learned from Wall Street
Monday, March 18, 2:45 – 3:45 p.m.

SA-3: Roundtable Discussion: Gaining Perspective on Business Continuity Challenges and Trends
Monday, March 18, 4:15 – 5:15 p.m.

 


Applying Root Cause Analysis (RCA) to Business Continuity

By Stacy Gardner, Avalution Consulting
Article originally posted on Avalution Consulting’s Blog

Though many business continuity standards emphasize the importance of tracking corrective actions to address identified issues, the recently published ISO 22301 (and previously BS 25999-2) also requires conducting a root cause analysis: looking not just at an issue, but at its cause and how it can be prevented in the future. Root cause analysis (RCA) is an approach that seeks to prevent recurrence of an adverse event or system failure by tracing the causal relationships behind the failure back to its most likely origin, then putting measures in place to mitigate the underlying causes. While common in disciplines that deal with extreme precision and protection of life (e.g. quality and environmental health and safety), there’s no reason the business continuity discipline cannot benefit from a similar approach, particularly for practitioners looking to fully implement ISO 22301. This article explains root cause analysis and identifies how organizations can benefit from implementing the concept in a business continuity context.

The concept of root cause analysis is commonly traced to Sakichi Toyoda (the founder of Toyota Industries, the company from which Toyota Motor Corporation later grew), who developed a process called the “Five Whys” to understand potential causes for problems beyond what was immediately obvious. Root cause analysis became more formalized as it was integrated into several different fields as a performance driver, such as safety, quality, operations and information security. In each of these areas, reactively responding to an issue was not enough. Future issues needed to be prevented, and root cause analysis was the path to improved performance and risk mitigation by eliminating true causes, rather than just symptoms. Incorporating root cause analysis into existing business continuity-related corrective action efforts could very well minimize the likelihood of future disruptive incidents and decrease recovery times.

At times, performing RCA is as easy as applying the Five Whys: repeatedly asking why something occurred until you reach the baseline cause of the failure. The key is disciplined use of probing questions. For example, analyzing the root cause of why an organization failed to meet a 24-hour recovery time objective for its SAP environment during a recent test could look something like this:

  1. Problem: IT recovery personnel failed to recover the organization’s SAP system within its recovery time objective of 24 hours during last week’s IT DR test … Why?
  2. IT recovery personnel said that SAN LUNs were not mapped correctly, which drastically delayed the start of restoration from disk … Why?
  3. Vendor personnel responsible for prepping the equipment failed to execute the setup specifically to documented expectations … Why?
  4. Vendor personnel indicated that the instructions seemed contradictory and did not provide the level of detail necessary to execute steps, so they used a basic default setup … Why?
  5. Upon analysis, documentation did leave out several crucial steps necessary to enable this complex LUN mapping to occur … Why was this not found earlier?
  6. When performing previous testing, personnel did not fully leverage existing plan documentation … What changed this time?
  7. The individual responsible for documenting the plan and performing past testing was unavailable, and personnel who performed testing this time indicated they were not properly trained on use of the plans, nor were they instructed on how to escalate issues regarding recovery processes.
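
The chain of questions above can be recorded in a simple structure so that each answer stays tied to the question that produced it. The following is only a minimal sketch in Python: the `WhyChain` class and its method names are hypothetical, invented for illustration, and the answers are shortened paraphrases of the steps listed above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WhyChain:
    """A Five Whys chain: a problem plus the answer uncovered by each question."""
    problem: str
    # Each entry pairs the question asked with the answer it produced.
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def ask(self, answer: str, question: str = "Why?") -> "WhyChain":
        """Record the answer to a probing question; return self for chaining."""
        self.steps.append((question, answer))
        return self

    def deepest_cause(self) -> str:
        """Return the latest answer, or the problem if nothing was asked yet."""
        return self.steps[-1][1] if self.steps else self.problem

    def render(self) -> str:
        """Format the chain as numbered lines, one per recorded answer."""
        lines = [f"Problem: {self.problem}"]
        for number, (question, answer) in enumerate(self.steps, start=1):
            lines.append(f"{number}. {question} {answer}")
        return "\n".join(lines)


# Hypothetical, shortened version of the SAP example above.
chain = (
    WhyChain("SAP was not recovered within the 24-hour RTO")
    .ask("SAN LUNs were not mapped correctly")
    .ask("Vendor setup did not follow documented expectations")
    .ask("Instructions seemed contradictory and lacked detail")
    .ask("Documentation left out crucial LUN mapping steps")
    .ask("Previous tests did not rely on the documentation",
         question="Why was this not found earlier?")
    .ask("The usual expert was unavailable and replacements were untrained",
         question="What changed this time?")
)
print(chain.render())
```

Keeping the question next to each answer makes it easier to see where the questioning changed direction, for example from “Why?” to “What changed this time?”.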

Although it might seem the root cause was reached, simply fixing the documentation does not ensure future documentation will be accurate. Taking it deeper, the previous IT subject matter expert responsible for documenting the procedures often performed onsite testing without using documentation, as he had extensive experience in this field and felt he could perform tasks more quickly from experience than from documented procedures. Exploring the issue further revealed that newer personnel assigned to recovery tasks were far less experienced and had not yet received an appropriate level of awareness training. Related to this point, the IT Director admitted he never required other personnel to validate documentation, as testing takes time away from production support and leveraging the “experts” in each phase lessens testing time.

Part of the solution to this could be to implement an expectation that all documented procedures be validated at least annually by another IT individual within a different area of expertise.  A second part of the solution could be to perform appropriate training up front (that emphasizes familiarity with plans and knowledge of escalation procedures) for both alternate internal individuals and any vendor resources responsible for plan execution.  Together, these efforts could help assure that all IT DR documentation can be effectively used by both internal and external resources during testing.

Although simple in theory, identifying the actual root cause and figuring out when you’ve gone far enough can be complex in practice.  To help understand primary root causes, you must repeatedly ask variants of “why” (and a few other probing questions), then look for the answer that seems most likely to have influenced the issue.  While there may not be a “hard science” to root cause analysis, the deeper you look for causes, the more likely you are to find issues to resolve.  In most cases, the biggest issue most organizations face is not exploring problems in the first place!  Our example demonstrated this problem in the recovery of SAP.  However, it’s likely this problem (the shortcuts) exists in other areas, and addressing the root cause could improve performance and recoverability elsewhere.

Within business continuity, several areas are commonly identified as root causes of risk mitigation, response and recovery performance issues, although finding them requires tracing issues back further than most professionals choose to explore. To properly integrate root cause analysis into continuous improvement activities, each issue should be adequately documented, including its source, a detailed description, an identification date, and a field to capture the root cause analysis. Rather than one individual trying to identify the root cause, business continuity personnel should organize and facilitate discussions that involve subject matter experts to whom issues may be assigned or who can provide insight on an issue. The group should then trace the issue back to its origin together.
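
As an illustration of the issue record described above, here is a minimal sketch in Python. The `ContinuityIssue` class, its field names, and the sample values are assumptions made for this example and are not taken from ISO 22301 or from any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContinuityIssue:
    """One tracked issue, with a dedicated field for root cause analysis."""
    source: str                      # where the issue surfaced, e.g. a test
    description: str                 # detailed description of the issue
    identified_on: date              # identification date
    root_cause_analysis: Optional[str] = None
    # Subject matter experts who took part in the facilitated discussion.
    participants: List[str] = field(default_factory=list)

    def needs_rca(self) -> bool:
        """True while no root cause analysis has been captured."""
        return not (self.root_cause_analysis and self.root_cause_analysis.strip())


# Hypothetical sample record; the date and role names are placeholders.
issue = ContinuityIssue(
    source="IT DR test",
    description="SAP not recovered within the 24-hour recovery time objective",
    identified_on=date(2013, 1, 15),
    participants=["IT recovery lead", "Vendor representative"],
)
print(issue.needs_rca())

issue.root_cause_analysis = "Procedures were never validated by a second person"
print(issue.needs_rca())
```

A check such as `needs_rca()` gives the facilitator a simple way to list issues that still await a group discussion.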

Numerous root causes can lead to a variety of business continuity issues or complications. The following table notes a few examples, together with likely root causes, though this is far from a complete list. Just as several roots feed a tree’s growth, more than one root cause can affect a system and result in a problem, so it is important to trace all potential paths back to an issue’s origin, rather than pursuing just one direct cause, to identify all influencing factors.

Problem and Potential Root Cause
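
The point about tracing every path, rather than a single direct cause, can be sketched as a small graph walk. This is a hedged illustration in Python only: the `find_root_causes` function is hypothetical, and the cause map is a simplified reading of the SAP example from earlier in this article, not data from an actual assessment.

```python
from typing import Dict, List, Set


def find_root_causes(causes: Dict[str, List[str]], problem: str) -> Set[str]:
    """Follow every causal path from the problem and collect the end points.

    `causes` maps each issue to the issues that directly contributed to it.
    An issue with no recorded contributors is treated as a root cause.
    """
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [problem]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already explored; also guards against cycles
        seen.add(node)
        upstream = causes.get(node, [])
        if not upstream and node != problem:
            roots.add(node)
        stack.extend(upstream)
    return roots


# Simplified, illustrative cause map based on the SAP example.
cause_map = {
    "RTO missed": ["LUNs mapped incorrectly"],
    "LUNs mapped incorrectly": ["Vendor used default setup"],
    "Vendor used default setup": [
        "Documentation incomplete",
        "Personnel not trained on plans",
    ],
    "Documentation incomplete": ["No peer validation of procedures"],
}

roots = find_root_causes(cause_map, "RTO missed")
print(sorted(roots))
```

In this sketch the walk ends at two root causes, missing peer validation and missing training, which matches the two-part solution proposed for the example.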

Again, root cause analysis is not just about solving one instance of a problem; it is also about seeking opportunities to prevent future occurrences of an issue. Once the origin of an issue is identified, it’s important to evaluate all areas of the business to identify other at-risk areas and ensure proper risk mitigation measures are put in place. A solution in one area may not necessarily apply to all other areas of an organization, but even then, the act of identifying similar at-risk areas raises awareness and enables the organization to develop additional solutions that address these risks before they result in future issues or downtime.

As business continuity management systems continue to mature, root cause analysis will become a powerful tool for business continuity professionals to deeply examine the cause of issues and provide an opportunity to correct them before they occur again.

____________

Stacy Gardner, Managing Consultant
Avalution Consulting: Business Continuity Consulting 

Our consulting team regularly publishes perspectives (shorter, independent articles) that touch on the trends currently affecting our profession and the strategic issues facing our clients. This is one of our most recent posts, but the full catalog of our perspectives – over 100 published since 2005 – can be accessed via our blog.