Friday, January 3, 2014

Incident Response and Disaster Planning

Over the last few months I have been involved in many discussions about Disaster Recovery versus Disaster Avoidance. I am surprised that I keep hearing the misconception that if we employ disaster avoidance, we no longer need disaster recovery plans or procedures. I can understand this misconception to a certain point: if I have my data and servers spread across multiple locations and datacenters, why would I need to have separate backups? I will just restore from another datacenter, right?

I believe this comes from the old mentality of hot-site / warm-site methods of disaster recovery. This is where data is replicated from a primary site to an offsite location with varying levels of equipment to restore critical systems. We are now seeing more Active-Active disaster avoidance scenarios, where the data is replicated between multiple hot sites. This allows a company to have an active hot site and actually use all the equipment it is purchasing.

http://www.vclouds.nl/2012/04/16/understanding-stretched-clustering-and-disaster-avoidance/

But there is still a real need for disaster recovery plans and procedures as well as incident response plans and procedures. Data corruption and loss are still very real and painful, and replication will faithfully copy a corrupted or deleted file to every site, so a good, solid, tested backup solution is still necessary. Data spills still happen, people still delete the wrong files, and equipment still fails. The better your documentation is, the less painful an incident will be. Equipment and technology can only take us so far. There is still the human factor to consider, and humans make mistakes.
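To make the point concrete, here is a minimal sketch of why replication alone does not protect against human error. It is a toy model, not how any real replication product works: the `Site` class, the site names, and the server names `web01` and `db01` are all made up for illustration. Replication mirrors whatever the primary currently holds, including a deletion, while a point-in-time backup keeps the earlier state.

```python
import copy


class Site:
    """A toy datacenter: a name plus a dict of server name -> data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.servers = {}


def replicate(primary, replicas):
    """Mirror the primary's current state to every replica, deletions included."""
    for site in replicas:
        site.servers = copy.deepcopy(primary.servers)


def take_backup(site):
    """Return a point-in-time copy that later changes to the site cannot touch."""
    return copy.deepcopy(site.servers)


def simulate_accidental_delete():
    """Delete the wrong server, replicate, and report where it still exists."""
    primary = Site("site-a")
    replicas = [Site("site-b"), Site("site-c")]
    primary.servers = {"web01": "web data", "db01": "customer records"}

    replicate(primary, replicas)           # all sites in sync
    nightly_backup = take_backup(primary)  # backup taken before the mistake

    del primary.servers["db01"]            # the human error
    replicate(primary, replicas)           # the mistake spreads to every site

    survives_on_replica = any("db01" in s.servers for s in replicas)
    survives_in_backup = "db01" in nightly_backup
    return survives_on_replica, survives_in_backup


if __name__ == "__main__":
    on_replica, in_backup = simulate_accidental_delete()
    print("db01 still on a replica:", on_replica)
    print("db01 still in the backup:", in_backup)
```

In this toy model the deleted server is gone from every replicated site but is still present in the backup, which is the only place left to restore it from.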

About five years ago, I was working on a customer’s virtual environment and was asked to delete a server that was no longer needed. The environment was replicated over four different sites globally, backups were done nightly, and all datacenters were hot sites with failover capacity for the alternate sites. Pretty much bulletproof, except that I deleted the wrong server. I had little knowledge of their procedures as I was just onsite performing some maintenance. Luckily, I was working with one of the company engineers, who was able to pull an incident response plan to have the server restored from backup. It contained contact information, the proper procedures on who to notify, which customers were affected, and so on. We were able to have the system restored and operational again in less than an hour. Had I been left alone to guess at it, it would have taken considerably longer.

Proper documentation with defined incident response procedures, as well as a comprehensive disaster recovery plan and policy, will make your life much easier and can ultimately save your bacon when failures occur.

http://www.7x24exchangedelval.org/pdf/What_to_protect_against_DA_Vs_DR.pdf
