Enterprise disaster recovery is a process for getting your organization back on track and resuming operations as quickly as possible following a disaster or business interruption.
Business disruptions can be exceptionally costly for enterprise organizations, especially when they impact critical assets, applications, or revenue-generating services. For these organizations, the development of an enterprise disaster recovery plan is a crucial step towards preventing business disruptions and ensuring that disrupted applications can be restored rapidly in the event of a disaster.
To help you get started, we’ve put together this guide outlining 7 steps for building your enterprise disaster recovery plan.
7 Steps for Building Your Enterprise Disaster Recovery Plan
Step One: Perform a Risk Assessment
Overview: Performing a risk assessment is the first step in building your enterprise disaster recovery plan. The purpose of a risk assessment is to identify potential hazards that could impact continuity for your business and to identify the business-specific assets that are at risk.
To perform a risk assessment, start with risk identification: list all the hazards you can think of that would disrupt your business. Here’s a short list to help you get started:
- Pandemic Disease
- Utility Outage
- Cyber Attack
- Supplier/Vendor Failure
- Hardware Failure
- Workplace Accident
- Human Error
Each risk that you identify should be assigned two relative scores: one for its probability of occurrence, and one for the magnitude of the risk and its potential to disrupt your business.
Assigning scores in this way will help you allocate resources towards the most significant sources of risk.
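One way to turn the two scores into a ranking is sketched below. It assumes a 1 to 5 scale for each score and combines them by multiplication; both the scale and the combination rule are illustrative assumptions rather than part of this guide, and the sample scores are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int  # relative likelihood: 1 (rare) to 5 (frequent), assumed scale
    magnitude: int    # relative disruptive potential: 1 (minor) to 5 (severe), assumed scale

    @property
    def priority(self) -> int:
        # Assumed combination rule: probability multiplied by magnitude.
        return self.probability * self.magnitude


def rank_risks(risks):
    """Return the risks ordered from highest to lowest priority score."""
    return sorted(risks, key=lambda r: r.priority, reverse=True)


# Illustrative scores only; real values come from your own assessment.
risks = [
    Risk("Utility Outage", probability=3, magnitude=4),
    Risk("Cyber Attack", probability=4, magnitude=5),
    Risk("Workplace Accident", probability=2, magnitude=2),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: {risk.priority}")
```

Risks at the top of the resulting list are the first candidates for prevention and mitigation spending.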
Next, list out all your most important and high-value business assets that could be impacted by the hazards you have identified. Include things like:
- Property & Critical Infrastructure
- IT Infrastructure (hardware and software)
- Systems & Equipment
- Business-Critical Operations
- Business Reputation
- Regulatory Compliance (the ability to stay compliant during the disruption, and any fees that result from failing to do so)
For each asset, look for deficiencies that could make that asset more vulnerable to the hazards you have identified.
- What are the potential hazards or disasters that could impact my business?
- What is the probability or likelihood of occurrence for each potential hazard?
- What assets (people, property, information technology, business processes, supply chain, etc.) are at risk?
- Which assets, processes, or resources might be vulnerable to hazards?
Inputs: Risk Identification, Asset Identification
Outputs: Risk Assessment Report – Identifies and documents sources of risk, their probability and disruptive potential, the assets they impact, and any vulnerabilities currently associated with those assets.
Step Two: Conduct a Business Impact Analysis (BIA)
Overview: Conducting a business impact analysis (BIA) is the second step in building your enterprise disaster recovery plan.
The purpose of a BIA is to understand the real repercussions that disruptions to specific assets or services might have for the business. Your organization must look at each asset you have documented and work to determine what the financial and operational consequences would be if that asset or business function were disrupted. Potential consequences could include things like:
- Loss of revenue or business income
- Damage to business reputation
- Contractual penalties
- Damage to property
To understand the business impacts of the disruption of an asset, organizations may conduct a survey or interview with the manager responsible for the asset.
Organizations should also consider how the timing of a business disruption could amplify financial or operational consequences.
By understanding and predicting the consequences of business disruption to your key assets, your organization can better decide which recovery strategies are appropriate for specific services and efficiently allocate resources between disaster prevention, mitigation, and recovery initiatives.
- What are the financial/operational consequences of the disruption of key business functions or assets?
- How could the timing and duration of a business disruption impact financial/operational consequences?
Inputs: Risk Assessment Report
Outputs: Business Impact Analysis Report – A BIA report documents the predicted consequences of business disruptions to assets, services, and business functions identified in the risk assessment. It uses information from the risk assessment and from management interviews to quantify the financial impact of identified risk scenarios that result in business interruption.
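As a rough illustration of how a BIA might quantify financial impact, the sketch below estimates the cost of a single disruption. The formula, the function name `estimate_disruption_cost`, and the `peak_multiplier` parameter (a stand-in for the effect of timing) are all hypothetical simplifications; a real BIA would draw its figures from management interviews.

```python
def estimate_disruption_cost(
    hourly_revenue_loss: float,
    outage_hours: float,
    contractual_penalty: float = 0.0,
    peak_multiplier: float = 1.0,
) -> float:
    """Estimate the financial impact of one disruption (simplified model).

    peak_multiplier is an assumed factor above 1.0 for outages that fall in
    a busy period, such as a seasonal sales peak.
    """
    if outage_hours < 0:
        raise ValueError("outage_hours must not be negative")
    lost_revenue = hourly_revenue_loss * outage_hours * peak_multiplier
    return lost_revenue + contractual_penalty


# Illustrative figures only: the same 4-hour outage, off-peak and at peak.
off_peak = estimate_disruption_cost(10_000, 4)
at_peak = estimate_disruption_cost(10_000, 4, contractual_penalty=5_000, peak_multiplier=2.0)
print(off_peak, at_peak)
```

Comparing the two figures shows how the same outage can carry very different consequences depending on when it occurs.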
Step Three: Define Recovery Objectives for Every Asset
Overview: Defining recovery objectives for each business asset is the next step in your enterprise disaster recovery plan.
At this point, you should have completed a risk assessment and business impact analysis. You should have identified potential risks to your business and determined which ones are the most likely to occur.
You should have listed all assets that are at risk, identified which assets are most vulnerable to disruption, and quantified the impact of disruption for each asset.
Start categorizing your applications, IT infrastructure components, and data based on their importance to your business. A simple categorization model could include:
- Mission-critical applications – Critical revenue-generating applications with minimal tolerance for downtime. An example target time to recover these applications might be 15 minutes or less. To ensure a timely response, reserve this category for a very small number of applications. If every application is classified as critical, you will overprovision your DR solution, which drives up costs, and no application will be a clear top priority during an outage.
- Business-critical applications – Critical applications that can tolerate a slightly longer outage, which allows a more cost-effective recovery approach. For example, these might be restored within an hour instead of fifteen minutes.
- Business-imperative applications – Minor revenue-generating applications with slightly more tolerance for downtime than business-critical applications.
- Non-critical applications – Rarely used or non-critical applications with low impact that can experience long downtimes without impacting business performance.
For each asset, you should establish two types of recovery objectives:
Recovery Time Objective (RTO) – RTO is the maximum acceptable length of downtime between when a disaster is declared and when normal operations are restored and available to users.
Recovery Point Objective (RPO) – RPO is the maximum acceptable amount of time between the last backup that will be restored and the service interruption. It defines the maximum allowable data loss, expressed as a span of time (for example, one hour of transactions).
More aggressive RTOs and RPOs require more expensive disaster recovery solutions, so it’s important to carefully consider and categorize applications accordingly.
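The two objectives can be checked against a backup schedule and a measured recovery time, as in the minimal sketch below. It assumes backups complete on schedule, so the worst-case data loss equals one full backup interval; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


def meets_rpo(backup_interval: timedelta, objectives: RecoveryObjectives) -> bool:
    # Worst case: the disaster strikes just before the next backup,
    # so the data lost spans one full backup interval.
    return backup_interval <= objectives.rpo


def meets_rto(measured_recovery: timedelta, objectives: RecoveryObjectives) -> bool:
    # The measured recovery time would come from a recovery test.
    return measured_recovery <= objectives.rto


# Illustrative targets for a mission-critical application.
mission_critical = RecoveryObjectives(rto=timedelta(minutes=15), rpo=timedelta(minutes=5))

print(meets_rpo(timedelta(hours=1), mission_critical))     # hourly backups
print(meets_rto(timedelta(minutes=12), mission_critical))  # 12-minute recovery
```

In this example hourly backups fail a 5-minute RPO even though a 12-minute recovery satisfies the 15-minute RTO, which is why the two objectives need to be set and verified separately.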
- Which assets are the most critical and have the least tolerance for downtime?
- What is the maximum acceptable downtime for each business asset?
- What is the maximum acceptable period of data loss for each business asset?
Inputs: Business Impact Analysis Report
Outputs: Enterprise Disaster Recovery Plan Requirements – The recovery objectives you have established for your business assets are the requirements for your disaster recovery plan. In the next two steps, you will choose a technical approach and develop a disaster recovery strategy that satisfies your specific recovery objective for each asset or asset category.
Step Four: Determine a Technical Approach to Enterprise Disaster Recovery (EDR)
Overview: The fourth step to building your enterprise disaster recovery plan is to determine a technical approach to disaster recovery that can satisfy your RTOs and RPOs for each asset.
For each asset, your organization can implement three different types of control measures to help mitigate risk:
1. Preventive measures help reduce the risk of business interruptions in the event of a disaster.
2. Detective measures help identify potential issues within the IT infrastructure that could lead to vulnerabilities or impact the successful execution of the disaster recovery plan.
3. Corrective measures restore the asset to a secondary environment in case of a business disruption.
Your organization must also determine whether to use on-premises or cloud-based IT infrastructure to support your corrective measures. You may choose to leverage public cloud services or take a hybrid cloud approach that allows for more customization.
Another option is cloud-based Disaster Recovery as a Service (DRaaS). In this model, your organization partners with a managed services provider that uses its own servers and IT infrastructure to manage all your disaster recovery needs. The key benefit of cloud-based DRaaS is that enterprise organizations can rapidly recover their most business-critical assets and consistently meet RTO and RPO targets without taking on the additional technical overhead and up-front costs associated with building and managing their own EDR solution.
Faction offers five different configurations for cloud-based DRaaS:
- Active-Active/Hot Recovery – A geographically redundant setup with automatic failover to instantly recover your most critical assets or services.
- Active-Active with Scaleup – An Active-Active configuration that requires scale-up at the second location in the event of a failure at the first location, ideal for recovering assets within 5-10 minutes.
- Warm Standby – An Active-Passive configuration where the secondary site has systems in place to immediately take over in case of a failure. Ideal for recovering assets within 10-60 minutes.
- Pilot Standby – A configuration where certain critical systems are kept on at the secondary site and other systems are only turned on in case of a failure at the primary site. This is ideal for recovering assets within 1-4 hours.
- Cold Standby – A configuration with no systems activated at the second site and the ability to restore assets with an RTO of 4 hours or more.
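The ranges above can be read as a lookup from an asset's RTO to a configuration. The sketch below is one such reading, not a Faction tool or API: it treats the lower bound of each range as the threshold, which is an assumption, and a more conservative planner might match on the upper bounds instead. The name `suggest_configuration` is hypothetical.

```python
from datetime import timedelta

# (minimum RTO for this band, configuration name), checked from slowest to fastest.
# Thresholds are the lower bounds of the ranges listed above (assumed reading).
CONFIGURATIONS = [
    (timedelta(hours=4), "Cold Standby"),
    (timedelta(hours=1), "Pilot Standby"),
    (timedelta(minutes=10), "Warm Standby"),
    (timedelta(minutes=5), "Active-Active with Scaleup"),
]


def suggest_configuration(rto: timedelta) -> str:
    """Suggest the configuration whose range contains the given RTO."""
    if rto < timedelta(0):
        raise ValueError("rto must not be negative")
    for minimum_rto, name in CONFIGURATIONS:
        if rto >= minimum_rto:
            return name
    # Anything under 5 minutes falls to the fastest option.
    return "Active-Active/Hot Recovery"


for minutes in (1, 7, 15, 120, 480):
    print(minutes, suggest_configuration(timedelta(minutes=minutes)))
```

Checking the slowest configuration first means an asset is never assigned a faster, and typically more expensive, configuration than its RTO calls for.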
Your organization’s technical approach to enterprise disaster recovery includes the control measures, vendors, technologies, and recovery models you choose for each business asset.
1. What control measures should be implemented for each business asset?
2. Which disaster recovery technologies, solutions, or vendors are the most robust and cost-effective?
3. Which enterprise disaster recovery strategy or configuration should be used for each asset?
Inputs: RTOs and RPOs
Outputs: Enterprise Disaster Recovery Strategy – Your technical approach to EDR reflects your overall strategy for reducing business disruption by matching business assets with recovery methods based on their relative importance to your organization.
Step Five: Develop and Document Your Enterprise Disaster Recovery Plan
Overview: At this point, you’re finally ready to write your disaster recovery plan. Your plan should include information from every step so far in this process. It should include your hazard and asset listings, risk assessment, business impact analysis, RTOs and RPOs for all assets, and your chosen technical approach for recovering each asset.
Your EDR plan should also:
- Identify staff members who will be responsible for executing the EDR plan and define their roles and responsibilities.
- Establish a clear Disaster Response Process for staff members to follow in case of a disaster.
- Establish a clear Recovery Plan for each individual asset, service, or application.
- Identify any resources that are required to support the disaster recovery plan.
1. Who will participate in disaster recovery? What are their roles and responsibilities?
2. What will the disaster response process look like?
3. How will services be recovered after a business interruption or disaster?
4. How will employees access services if the office or data center is physically inaccessible or unsafe in a disaster?
Inputs: Enterprise Disaster Recovery Strategy
Outputs: Enterprise Disaster Recovery Plan – At this point, your organization should have a functioning enterprise disaster recovery plan. This document should include all the information necessary to effectively recover business assets and ensure business continuity in case of a disaster.
Step Six: Practice Executing Your Disaster Recovery Plan
Overview: There are two main things that organizations must do to ensure that their enterprise disaster recovery plan is ready to deliver results:
1. Training – Conduct orientation and training exercises for members of the business continuity team to ensure their awareness and preparedness for disaster recovery protocols.
2. Testing – Practice executing your disaster recovery plan to ensure that critical business assets can be reliably recovered within their RTO and RPO targets.
Test regularly to confirm that your organization can consistently meet the RTO and RPO targets for your most important applications. Test your disaster recovery plan for critical processes at least twice a year and document the results.
Faction supports self-service disaster recovery using VMware Cloud on AWS Site Recovery Service, which allows organizations to build, test, and execute their own disaster recovery plans. Organizations can also choose a fully managed DRaaS implementation where Faction handles EDR planning, testing, and the disaster recovery process.
1. Will my EDR plan work when I need it to?
2. Is my business continuity team ready to execute the disaster recovery plan?
Step Seven: Monitor and Improve Your Disaster Recovery Plan
Overview: As you continue to test your disaster recovery plan, ask your business continuity team members to collect observations and provide feedback to help improve the process. Review the documented results of tests to ensure that your organization can consistently meet RTO and RPO targets for critical business assets.
1. How did our EDR plan perform?
2. What was the difference between target/expected performance and actual performance?
3. What actions can we take to improve our results in future tests or in a real disaster situation?
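The comparison between target and actual performance can be recorded per asset, as in the following sketch. The `DrillResult` structure, the asset names, and the sample timings are hypothetical and stand in for whatever your documented test results contain.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrillResult:
    asset: str
    rto_target: timedelta
    rto_actual: timedelta  # downtime measured during the test
    rpo_target: timedelta
    rpo_actual: timedelta  # data-loss window measured during the test

    @property
    def rto_gap(self) -> timedelta:
        # Positive gap means the recovery took longer than the target.
        return self.rto_actual - self.rto_target

    @property
    def rpo_gap(self) -> timedelta:
        # Positive gap means more data was lost than the target allows.
        return self.rpo_actual - self.rpo_target

    @property
    def passed(self) -> bool:
        return self.rto_gap <= timedelta(0) and self.rpo_gap <= timedelta(0)


def failed_assets(results):
    """Return the names of assets that missed either target."""
    return [r.asset for r in results if not r.passed]


# Illustrative results from a single test run.
results = [
    DrillResult("storefront", timedelta(minutes=15), timedelta(minutes=12),
                timedelta(minutes=5), timedelta(minutes=4)),
    DrillResult("billing", timedelta(hours=1), timedelta(minutes=80),
                timedelta(minutes=30), timedelta(minutes=20)),
]

for r in results:
    print(r.asset, "passed" if r.passed else f"missed RTO by {r.rto_gap}")
print(failed_assets(results))
```

Tracking the gaps across successive tests shows whether changes to the plan are actually closing the distance between target and actual recovery performance.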
How Faction’s Cloud-Based Disaster Recovery Can Protect Your IT Infrastructure
Faction offers powerful cloud-based disaster recovery solutions to get your critical applications back online fast and prevent disruptions to your business.
Our Hybrid Disaster Recovery-as-a-Service (HDRaaS) solution combines managed disaster recovery services with Faction Cloud Control Volumes to replicate workloads, automate failover and failback, and enable rapid, cost-effective recovery of your most important business applications.