Using the Cloud to Ensure Business Continuity

In today’s fast-paced society, companies of all sizes need affordable ways to deliver quality IT services reliably and continuously.  One key benefit of cloud computing that is often overlooked is how it can help ensure business continuity and speedy disaster recovery (DR).  Cloud hosting offers a low-cost disaster recovery and business continuity solution for small to midsize businesses and a more cost-effective DR alternative for larger, cost-conscious corporations.

With the cloud as your disaster recovery solution, you can use your in-house systems to run your core business and work with a cloud hosting provider for your business continuity and disaster protection.  With cloud hosting, your data and software are replicated automatically in the cloud, creating increased redundancy.  You don’t have to buy extra hardware or software to mirror your data center environment.  Instead, cloud servers can be easily partitioned to create multiple environments in the cloud, and these cloud servers can be spun up and configured in a matter of minutes.  In addition, with cloud computing and cloud storage, you only pay for the resources you use, so the cost is minimal.

A cloud-based disaster recovery/business continuity solution works well for any business with a low tolerance for downtime and data loss.  Most SMBs and larger businesses today fall into this category, unlike, say, a local irrigation maintenance company that may be able to survive a week without its data.  Businesses like hospitals have very little tolerance for downtime and data loss due to the urgency and sensitivity of their data.

With cloud hosting from a premier cloud hosting provider like Atlantic.Net, your data and applications reside in a secure, offsite data center facility with a backup uninterruptible power supply and dedicated support staff to support business continuity in any situation.

Applying Root Cause Analysis (RCA) to Business Continuity

By Stacy Gardner, Avalution Consulting
Article originally posted on Avalution Consulting’s Blog

Though many business continuity standards emphasize the importance of tracking corrective actions to address identified issues, the recently published ISO 22301 (and previously BS 25999-2) also requires conducting a root cause analysis: looking not just at an issue, but at its cause and how it can be prevented in the future.  Root cause analysis (RCA) is an approach that seeks to prevent recurrences of the same adverse event or systems failure by tracing the causal relationships of a failure back to its most likely origin, then putting measures in place to mitigate the underlying causes.  While common in disciplines that deal with extreme precision and protection of life (e.g. quality and environmental health and safety), there’s no reason the business continuity discipline cannot benefit from a similar approach, particularly for practitioners looking to fully implement ISO 22301.  This article explains root cause analysis and identifies how organizations can benefit from implementing the concept in a business continuity context.

The “Five Whys” technique at the heart of root cause analysis is commonly attributed to Sakichi Toyoda (founder of Toyoda Automatic Loom Works, the company from which Toyota Motor Corporation later emerged), who developed it to understand potential causes for problems beyond what was immediately obvious.  Root cause analysis became more formalized as it was integrated into several different fields as a performance driver, such as safety, quality, operations and information security.  In each of these areas, reactively responding to an issue was not enough.  Future issues needed to be prevented, and root cause analysis was the path to improved performance and risk mitigation by eliminating true causes, rather than just symptoms.  Incorporating root cause analysis into existing business continuity-related corrective action efforts could very well minimize the likelihood of future disruptive incidents and decrease recovery times.

At times, performing RCA is as easy as applying the Five Whys: repeatedly asking “why” something occurred until it seems you’ve reached the baseline cause of the failure.  The key is disciplined application of probing questions.  For example, analyzing the root cause of why an organization failed to meet a 24-hour recovery time objective for its SAP environment during a recent test could look something like this:

  1. Problem: IT recovery personnel failed to recover the organization’s SAP system within its recovery time objective of 24 hours during last week’s IT DR test … Why?
  2. IT recovery personnel said that SAN LUNs were not mapped correctly, which drastically delayed the start of restoration from disk … Why?
  3. Vendor personnel responsible for prepping the equipment failed to execute the setup specifically to documented expectations … Why?
  4. Vendor personnel indicated that the instructions seemed contradictory and did not provide the level of detail necessary to execute steps, so they used a basic default setup … Why?
  5. Upon analysis, documentation did leave out several crucial steps necessary to enable this complex LUN mapping to occur … Why was this not found earlier?
  6. When performing previous testing, personnel did not fully leverage existing plan documentation … What changed this time?
  7. The individual responsible for documenting the plan and performing past testing was unavailable, and personnel who performed testing this time indicated they were not properly trained on use of the plans, nor were they instructed on how to escalate issues regarding recovery processes.
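A chain like the one above can also be recorded in a simple structure so that each finding stays paired with the question that led to the next level.  The following is only a minimal sketch in Python; the names `WhyStep` and `root_candidate` are hypothetical and are not part of ISO 22301 or any tool mentioned in this article, and the findings are condensed from the example.

```python
from dataclasses import dataclass


@dataclass
class WhyStep:
    """One level of a Five Whys chain (hypothetical structure)."""
    finding: str            # what was observed or reported at this level
    question: str = "Why?"  # probing question that leads to the next level


def root_candidate(chain):
    """Return the deepest finding recorded so far.

    This is only the current candidate; further questioning may go deeper.
    """
    if not chain:
        raise ValueError("the chain has no steps yet")
    return chain[-1].finding


# Findings condensed from the SAP recovery example.
chain = [
    WhyStep("SAP was not recovered within the 24-hour RTO"),
    WhyStep("SAN LUNs were not mapped correctly"),
    WhyStep("Vendor setup did not follow documented expectations"),
    WhyStep("Instructions seemed contradictory and lacked detail"),
    WhyStep("Documentation omitted crucial LUN mapping steps",
            "Why was this not found earlier?"),
    WhyStep("Previous tests did not fully use the plan documentation",
            "What changed this time?"),
    WhyStep("Testers were not trained on the plans or on escalation",
            ""),  # no further question asked yet
]

for number, step in enumerate(chain, start=1):
    print(f"{number}. {step.finding} ... {step.question}".rstrip(" ."))
```

Keeping the question alongside each finding makes it easier to see where the questioning stopped and whether another “why” is still warranted.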

Although it might seem the root cause was reached, simply fixing the documentation does not ensure future documentation will be accurate.  Taking it deeper, the IT subject matter expert previously responsible for documenting the procedures often performed onsite testing without using documentation, as he had extensive experience in this field and felt he could perform tasks more quickly by recovering based on experience as opposed to documented procedures.  Exploring the issue further revealed that newer personnel assigned to recovery tasks were far less experienced and had not yet received an appropriate level of awareness training.  Related to this point, the IT Director admitted he never required other personnel to validate documentation, as testing takes time away from production support and leveraging the “experts” in each phase lessens testing time.

Part of the solution to this could be to implement an expectation that all documented procedures be validated at least annually by another IT individual within a different area of expertise.  A second part of the solution could be to perform appropriate training up front (that emphasizes familiarity with plans and knowledge of escalation procedures) for both alternate internal individuals and any vendor resources responsible for plan execution.  Together, these efforts could help assure that all IT DR documentation can be effectively used by both internal and external resources during testing.

Although simple in theory, identifying the actual root cause and figuring out when you’ve gone far enough can be complex in practice.  To help understand primary root causes, you must repeatedly ask variants of “why” (and a few other probing questions), then look for the answer that seems most likely to have influenced the issue.  While there may not be a “hard science” to root cause analysis, the deeper you look for causes, the more likely you are to find issues to resolve.  In most cases, the biggest issue most organizations face is not exploring problems in the first place!  Our example demonstrated this problem in the recovery of SAP.  However, it’s likely this problem (the shortcuts) exists in other areas, and addressing the root cause could improve performance and recoverability elsewhere.

Within business continuity, there are several areas that can commonly be identified as root causes for risk mitigation, response and recovery performance issues, although again, it requires tracing issues back further than most professionals choose to explore.  To properly integrate root cause analysis into continuous improvement activities, each issue should be adequately documented, including the source of the issue, a detailed description, an identification date, and a field to capture the root cause analysis.  Rather than one individual trying to identify the root cause, business continuity personnel should organize and facilitate discussions that involve subject matter experts to whom issues may be assigned or who can provide insight on an issue, and then the group should seek to trace the issue back to its origin together.
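The issue record described above could be captured in something as simple as the following.  This is a minimal sketch in Python, not a prescribed format; the class name `ContinuityIssue`, its field names, and the sample entries are all hypothetical, chosen only to mirror the four items listed in the paragraph.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ContinuityIssue:
    """A corrective-action record with room for root cause analysis."""
    source: str                       # where the issue came from (test, audit, incident)
    description: str                  # detailed description of the issue
    identified_on: date               # identification date
    root_cause: Optional[str] = None  # filled in once the group has traced the origin

    def needs_rca(self) -> bool:
        """True while no root cause has been recorded."""
        return not (self.root_cause and self.root_cause.strip())


def open_rca_items(issues: List[ContinuityIssue]) -> List[ContinuityIssue]:
    """Issues that still need a facilitated root cause discussion."""
    return [issue for issue in issues if issue.needs_rca()]


# Hypothetical sample data for illustration only.
issues = [
    ContinuityIssue("IT DR test", "SAP not recovered within RTO", date(2013, 1, 15)),
    ContinuityIssue("Plan review", "Call tree out of date", date(2013, 2, 1),
                    root_cause="No owner assigned for contact updates"),
]

for issue in open_rca_items(issues):
    print(f"RCA pending: {issue.description} (from {issue.source})")
```

A list of records without a root cause gives the facilitator a ready agenda for the group discussion.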

Within business continuity, there are numerous root causes that can lead to a variety of issues or complications. The following table notes a few examples, together with likely root causes, though this is far from a complete list.  Also, it’s important to note that just like with tree roots that feed a tree’s growth, there could be more than one root cause that affects a system and results in a problem, so it is important to trace all potential paths of an issue’s origin back, rather than just pursuing one direct cause, to identify all influencing factors.

Problem and Potential Root Cause

Again, root cause analysis is not just about solving one instance of a problem; it is also about seeking opportunities to prevent future occurrences of an issue.  Once the origin of an issue is identified, it’s important to evaluate all areas of the business to identify other at-risk areas and ensure proper risk mitigation measures are put in place.  A solution in one area may not necessarily be applicable to all other areas of an organization, but even if it’s not, the act of identifying other similar at-risk areas raises awareness and enables the organization to develop additional solutions that make sense and address these risks before they result in future issues or downtime.

As business continuity management systems continue to mature, root cause analysis will become a powerful tool for business continuity professionals to deeply examine the cause of issues and provide an opportunity to correct them before they occur again.

____________

Stacy Gardner, Managing Consultant
Avalution Consulting: Business Continuity Consulting

Our consulting team regularly publishes perspectives (shorter, independent articles) that touch on the trends currently affecting our profession and the strategic issues facing our clients. This is one of our most recent posts, but the full catalog of our perspectives – over 100 published since 2005 – can be accessed via our blog.

Data Can Be Just as Safe (If Not More So) in the Cloud

According to a survey performed earlier this year by CIO.com, 54% of all IT security professionals cite cloud computing security as their top priority.  Another 32% cite security as a middle priority for them.  However, 85% of IT professionals are confident in their cloud provider’s ability to provide a secure environment for their data. 

Security has always been a concern when sensitive data is involved, and this concern is heightened when it comes to cloud services outside of the corporate wall because the data is no longer under the company’s direct supervision.  It is human nature to be afraid of the unknown, but the risks of cloud computing come with a plethora of benefits as well.  For example, the cloud offers greater flexibility, scalability, and agility, allowing IT staff to complete tasks in hours rather than weeks or months.

Depending on the size and nature of your business, entrusting your data to a cloud provider may be every bit as secure as (if not more secure than) keeping it in-house.  This is because top-quality cloud hosting providers invest a significant amount of resources into security, much more than most small to medium-sized businesses can afford.  Also, most cloud providers make an effort to keep up with the latest in security so that they can provide the best service to their customers.

Atlantic.Net, a privately-held leading cloud hosting provider, offers a secure and robust platform that is routinely and systematically inspected with focus on control objectives in the areas of organizational structure, governance, administration, physical/environmental controls, and physical/logical security.

Using Cloud Hosting as Your Business’s Disaster Recovery Solution

It is easy to see why businesses put so much emphasis on backing up their data: they need their data to be secure so that their customers can rely on them.  Therefore, an effective disaster recovery plan is essential for every business that relies on stored data.  Furthermore, a successful disaster recovery solution requires additional resources identical to those used during daily operations.

While there is a wide selection of disaster recovery solutions, cloud hosting provides the most flexibility and ease of use, while remaining cost-effective.  As opposed to purchasing two physical servers (one as your day-to-day server and the other as your backup), cloud servers provide the benefit of being able to easily create multiple servers in the cloud without needing to lease/own physical servers. 

In the same way that server redundancy provides failover protection for business continuity and disaster preparedness, cloud hosting provides increased stability and security, as well as improved scalability.  The redundancy delivers a backup for anything that may occur, such as a natural disaster or a security breach that compromises data.

Due to its cost effectiveness, cloud computing provides disaster recovery methods for small businesses that were previously possible only in large enterprises.  Cloud hosting enables significantly faster recovery times in the event of a disaster, as servers can be spun up in minutes on a cloud host platform.

With cloud hosting from a premier hosting provider like Atlantic.Net, your primary IT infrastructure is also essentially a dynamic backup system, as your data and applications reside in a secure, offsite data center facility with a backup uninterruptible power supply and dedicated support staff, just in case.

Three Trends in IT Disaster Recovery

Disaster recovery is constantly being influenced by trends in the IT industry.  These trends are forcing businesses to reevaluate how they plan, test, and execute their disaster recovery plans.  The following are a few IT trends and how they are affecting the disaster recovery strategies for businesses in every industry.

Cloud Services:  As the cloud computing industry grows and businesses adopt more cloud services, they are realizing that the cloud can become part of their disaster recovery plan.  Instead of buying resources in case of a disaster, companies can use cloud computing to pay for long-term data storage on a pay-per-use basis, and pay for servers only when they need to run them for a disaster or a test.  Cloud-based disaster recovery gives businesses the potential for a lower-cost, faster, and more flexible recovery solution for backing up their data.

Virtualization:  Server virtualization has become a key component of the disaster recovery plan for many businesses because it enables greater flexibility with computing resources.  Virtualization allows businesses to create an image of an entire data center that can be quickly activated when needed, giving companies a faster recovery time at a relatively low cost.

Mobile Connectivity:  In terms of disaster recovery, the growing use of mobile devices in the workplace facilitates business continuity when disaster strikes because mobile devices give people the ability to work remotely and maintain communication in the event of a disaster.  This keeps business operations functioning and minimizes downtime.

Because natural disasters such as hurricanes, floods, fires, earthquakes, and snowstorms can put a business out of commission for a while, it is important to have an efficient, low-cost, reliable disaster recovery plan in place.  IT managers should consider how these trends in the industry can best be leveraged to improve disaster recovery strategies.

Atlantic.Net has been recognized throughout the world by disaster recovery hosting professionals and has been chosen by the Disaster Recovery Journal as their official data center!

 

How to Calculate ROI from Cloud Computing

In a business world that is embracing the cloud more and more every day, it is interesting to see that, while the cloud benefits companies in several ways, these companies seldom demonstrate their advantage from the cloud in terms of ROI (return on investment).  This may be because many of the benefits from cloud computing are intangible and may not be fully realized until further down the road.  

Therefore, to calculate returns from cloud computing, a business will most likely not employ the standard ROI calculations.  Instead, the company may use one of the following ways to determine ROI from cloud computing:

  1. Rate of adaptation in the market:  With the flexibility that the cloud offers in terms of quick transitioning of capabilities, businesses can adapt to ever-changing market trends and therefore improve their standing against competitors in the industry.  Consequently, increased revenue may be realized due to their ability to grab market share at an improved pace.
  2. Utilization and control of resources:  The scalability of cloud computing allows businesses to avoid under- or over-utilizing resources, which in turn ensures effective capacity utilization and the avoidance of waste.
  3. Cost of ownership:  With few or no barriers to entry and the low skill level needed to configure and use cloud infrastructure, businesses can save the money that would otherwise be used for staff training, installation, and maintenance of the infrastructure.
  4. Growth potential:  As a business in today’s world, it is important to have room for growth.  Traditionally, if a business demanded additional resources (in terms of infrastructure and IT personnel), it may have taken weeks to acquire the infrastructure and to train/transition the staff.  However, with cloud computing, resources can be scaled almost instantaneously to accommodate the growing demands of the business.

Depending on the specific needs of your business, you may calculate ROI in any one of these ways, or another.  As you can see, it may be hard to quantify the returns on cloud computing, even if the benefits are quite substantial. 
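For comparison, the standard ROI calculation mentioned earlier is simply net gain divided by cost, and the capacity utilization from item 2 can be expressed as used capacity over provisioned capacity.  The following is a minimal sketch in Python; the function names and every figure in it are hypothetical and serve only to show the arithmetic, not actual cloud pricing or results.

```python
def simple_roi(total_gain: float, total_cost: float) -> float:
    """Standard ROI: net gain divided by cost, as a fraction."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_gain - total_cost) / total_cost


def utilization(used_capacity: float, provisioned_capacity: float) -> float:
    """Share of provisioned capacity actually in use, as a fraction."""
    if provisioned_capacity <= 0:
        raise ValueError("provisioned_capacity must be positive")
    return used_capacity / provisioned_capacity


# Hypothetical figures for illustration only.
roi = simple_roi(total_gain=15_000, total_cost=12_000)
used = utilization(used_capacity=6, provisioned_capacity=8)

print(f"ROI: {roi:.0%}")          # ROI: 25%
print(f"Utilization: {used:.0%}")  # Utilization: 75%
```

The difficulty described above lies in the inputs rather than the formula: gains such as faster market adaptation are hard to express as a single number for `total_gain`.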

At Atlantic.Net, we want to make sure that disaster recovery professionals are aware of the best cloud hosting options available to them.  We realize that they need a solid platform designed to deliver the speed and reliability demanded by today’s businesses.  Atlantic.Net will be dedicating resources to educating business continuity and disaster recovery professionals on best practices for deploying cloud servers with a 100% uptime guarantee!

Can Cloud Technology Assist with Disaster Recovery and Resiliency?

“Civilization advances by extending the number of important operations which we can perform without thinking of them.” Alfred North Whitehead

In my last posts we talked about virtualization, the ever-growing maturity of technology, and how to select a data center.  While technology has stabilized, performance and resiliency are needed more than ever.
Cloud computing is agile, but it still needs redundancy for failover to another facility.  Many forget that the cloud allows you to expand your business quickly in peak seasons and to reduce costs and infrastructure during off seasons.

Cloud is the next wave of technology, and most companies are now ensuring that their applications are able to run in the cloud.  Cloud also makes it possible to outsource IT across the enterprise, likely leading to cost savings.  If your intent is to grow your business toward resiliency, choose your provider carefully, using the following questions as guidelines.

Are you using cloud technology?

Does your provider have a failover facility?

Does your provider have the ability to grow your business?

Please let me know your thoughts on cloud technology and disaster recovery.  How can you be ready?