I am a California girl, and as such earthquakes are the natural disasters I have the most experience with, followed by floods, especially in the El Niño years. So reading a few pieces on how the data centers in Japan fared during and after the earthquake of March 11th of this year was both interesting and very enlightening.
Overall, they handled the shaking ground fantastically, thanks to buildings that behave like giant shock absorbers and racks that also cushion the shock of abrupt movement and shearing. After the earthquake, power was a concern: getting fuel for the generators that back up the UPS systems was a headache, and persuading the government not to cap power consumption was another expected problem. Small things like simple power cords and batteries were almost impossible to come by. But all in all, every data center was online and functioning with minimal interruption.
It all got me thinking:
As trusted advisers, when and how do we guide our customers’ thinking about business continuity and disaster recovery planning at the system level, at the site level, company-wide, and globally?
Also, how frequently are these plans and contingencies reviewed and tested?
Of course these questions should come up when designing a system, during implementation, and quite possibly during a system upgrade. We write disaster recovery plans and backup plans, and if there is enough time we test them for qualification and verification. Then they are put in a binder or archived. But it is disasters, natural or not, that start companies asking the old question: ‘Are we doing enough’ to continue our business and not affect our customers?
I tend to advocate that clients review all business-critical plans, and not just the application or system plan the team has been working on, at least twice a year. I also recommend that the plans be tested regularly, even if only in a development environment. Having worked in Japan for a time, I’m pretty sure the earthquake was not the first occasion on which the data centers there had looked at their disaster plans and made sure they worked. It is interesting that when testing plans we tend to get myopic. We ask whether the application or system fails over or switches to another data center. Many times we forget to look at the larger view: does everything critical fail over, and does it all still work together afterward?
Disaster Recovery In the Cloud
With the advent of the cloud, we need to review not only the client’s disaster recovery and business continuity plans but also the plans and contingencies of the cloud provider.
It sounds like a tall order: suggesting to our clients that they take out the plans, dust them off, and test them on a regular basis, and also that they try to ‘visualize’ the risks that could affect the business. Add virtualization within the cloud, and third parties whose own plans matter are a growing norm in our world.
So perhaps it is the small things, like suggesting an overstock of power cords and other small necessities, or checking how the generators that back up the UPS systems will be refueled, that will make all the difference in a crisis.