Last Updated: 2015-11-02 00:47:18 UTC
by Tony Carothers (Version: 1)
One of the questions business executives are asking security professionals these days, prompted by both internal and external entities, is “What is the status of our Disaster Recovery plan?” The driving force behind the question varies, from “compliance” and “our business partners are asking” to “I read an article about an earthquake….” A disaster recovery plan is not something where you want to define the requirements as you go; this one is truly about the *plan*. We are going to talk about the basics of a disaster recovery plan, along with some references to assist in preparing and implementing your own. I will attempt to expand on some of these areas in future diaries, because it can be quite a bit to digest without getting a bit drowsy ;)
What is a Disaster Recovery plan?
A disaster recovery (DR) plan is simply a document that can be used as a guide to restore systems and services, in a secure state, in the event of a disaster or other unwanted event. It is typically broken up into several parts, with roles and responsibilities defined within the document.
What comprises a DR plan?
There are numerous methods of executing a DR program. In my experience, the NIST 800-34 standard works well as a guiding framework because it is easy to understand and is recognized by many organizations I work with for compliance, effectiveness, and reporting purposes. The DR plan, as defined in the NIST standard, consists of three phases:
- Activation & Notification
- Recovery
- Reconstitution
Activation & Notification Phase
The Activation & Notification phase begins when an outage occurs that may extend beyond the accepted parameters. For example, an outage such as a cut internet cable may be expected to take the business offline for 8 hours. If the business requirements mandate that the organization cannot be down for more than 4 hours, the Activation & Notification phase is initiated to assess the outage and its impact, and to report to executive management for a decision on recovering operations. Communication with impacted users also begins in this phase.
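The activation decision in the example above comes down to comparing the expected outage against the maximum downtime the business will accept. A minimal sketch of that check follows; the `Outage` class, the `should_activate` function, and the parameter names are hypothetical and are not taken from the NIST document.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical record describing an outage under assessment."""
    description: str
    expected_hours: float  # estimated time until normal service is restored


def should_activate(outage: Outage, max_tolerable_hours: float) -> bool:
    """Return True when the expected outage exceeds the accepted downtime."""
    return outage.expected_hours > max_tolerable_hours


# The example from the text: an 8-hour outage against a 4-hour limit.
cable_cut = Outage("cut internet cable", expected_hours=8)
if should_activate(cable_cut, max_tolerable_hours=4):
    print("Begin Activation & Notification: assess, report, notify users")
```

In practice the estimate is a judgment call made during the assessment, so a check like this only records the decision rule; it does not replace the report to executive management.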
Recovery Phase
The recovery phase is the period in which systems and people are brought back online, often in a temporary facility or location. Communication between IT and users is extensive during this phase, as people and systems are restored to an operational state. Restoration priorities must be well defined during the planning phase and updated on a regular basis. The business impact analysis, which should be performed during the planning phase, will help define the restoration priority. The plans and procedures for systems recovery are critical at this juncture, because application dependencies drive what needs to be restored and in what order.
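One way to derive a restoration order from application dependencies is a topological sort, which places every system after the systems it depends on. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the system names and the dependency map are invented for illustration, and a real plan would also weigh the priorities from the business impact analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it depends on.
dependencies = {
    "network": set(),
    "database": {"network"},
    "auth": {"network", "database"},
    "web_app": {"auth", "database"},
}


def restoration_order(deps):
    """Return the systems ordered so that dependencies are restored first."""
    return list(TopologicalSorter(deps).static_order())


order = restoration_order(dependencies)
for step, system in enumerate(order, start=1):
    print(f"{step}. restore {system}")
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is a useful signal during planning that the documented dependencies need to be reviewed.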
Reconstitution Phase
The reconstitution phase is the period in which operations are returned to a ‘steady state’, system data and functionality are verified as normal, and cleanup actions occur. Backups are often implemented at this stage, as well as deactivation of any assets used for recovery actions. The last component, and one of the most important, is documenting the lessons learned: compiling input from team members on their observations, and updating all documentation to reflect the current operating state.
tony d0t carothers --gmail