By: Mike Carter
Are you prepared? Are you sure?
In the early minutes of a Saturday morning in April 2011, a water main broke underneath the street in Columbia, SC.
A $30B lending provider just feet from the break, with over 1,200 servers (95% of them virtualized) in its basement data center, experienced a massive subterranean flood that destroyed the entire electrical subsystem beneath the building, effectively rendering the data center useless due to power loss.
There was no physical damage to the computing systems themselves, which were safely above the flood waters; however, since the entire electrical distribution for the building was submerged under feet of water, neither commercial grid power nor UPS power could be leveraged.
An entire data center lay asleep…in the dark…with no ability to turn it on.
eGroup received the call shortly after 12:45 a.m. Saturday, requesting assistance of any kind to get critical line-of-business systems up and operational before Monday. (As a lending institution, its systems and processes certainly ran 24x7x365; however, the nationwide mob with torches wouldn’t show up until Monday morning.) We sprang into action, mobilizing an entire team that delivered a portable data center rack – complete with storage, compute, and networking, pre-configured and ready to accept workload restoration – to a colocation facility in Alpharetta, Georgia, by Sunday afternoon.
A lot of things went right with that situation – the timing of the calamity, the industry of the business affected, no loss of human life, no collateral damage to critical assets. It was just a complete loss of power, with a 48-hour opportunity to get ahead of it.
In the days, weeks, and months that followed this event, much work was done to fortify their disaster preparedness posture, a term that was used more than disaster recovery, because “who wants to recover from a disaster?” when you can be “prepared for any disaster.” Recovery was a slow, intensive, expensive process. Future preparedness was even slower, more expensive, and more complicated. Think in terms of “years” and “millions.” The building was declared uninhabitable – for months.
If you had asked this team a few days before the flood if they felt “prepared” for an event, they would have said “yes,” and they would have been right – but only for the events they had prepared for. Clearly, this scenario wasn’t on the threat list ahead of time. They had prepared for the expected – not the unexpected.
It was John Lennon who most famously sang “life is what happens to you while you’re busy making other plans,” and I believe this story emphasizes that principle from an IT perspective.
Preparedness means being “operationally ready for the unexpected” to ensure availability. My next post, on Friday, will introduce what that actually means and what’s at stake if it is ignored.