Availability = Survival

I’m certain that, as IT veterans, we all have stories like the one I shared previously – many of them without the happy ending.

It reminds me of the prime concepts of a (fascinating) survival book I read several years ago by Laurence Gonzales called “Deep Survival: Who Lives, Who Dies, and Why,” where the key themes of accident investigation are:

  • Things that have never happened before happen every day (for example, the NASA Challenger booster explosion, caused by cold weather making the seals brittle, even though the shuttle had launched successfully in cold weather many times before)
  • Accidents are almost always the result of a series of seemingly benign individual events that, taken together, create an unexpected and compounded calamity

But if things that have never happened before happen every day, why do so many organizations play the IT game as if nothing is about to happen?

Most folks invest only in the minimum – data protection based on point-in-time backups – and they rarely test those backups. Most don’t know how long it will take to completely restore a system. We’ve seen groups invest large sums in “fast backups” with no regard for “fast recovery” (not with eGroup, by the way), not understanding that a backup is only as good as your ability to recover from it quickly.

Even larger-scale Disaster Recovery capabilities are rarely tested. Compounding this shortfall, the IT team often hasn’t even been trained on the specific steps (or timelines) of a partial or full recovery.

How many businesses will pay the ransom or the consequences for their lack of preparedness?

eGroup believes that Disaster Preparedness is really an exercise in discipline over your Disaster Recovery process. You must “inspect what you expect” from it. Drilling (practicing) with the right tools regularly – often enough that everyone on the team knows what to do and when to do it, while feeling comfortable and confident – is key. Frequent execution of the plan, with the ability to improve and automate aspects of it, ensures fluidity and “muscle memory” when the unexpected happens and the team is pressed into action.
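To make “inspect what you expect” a little more concrete, here is a minimal sketch of how a team might time a recovery drill and compare it with the recovery time it is counting on. Everything in it is an assumption for illustration: the names `run_drill`, `DrillResult`, and `fake_restore` are hypothetical, and `fake_restore` is only a stand-in for whatever actually performs the restore in your environment.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    name: str
    seconds: float          # how long the restore actually took
    target_seconds: float   # how long the business expects it to take

    @property
    def met_target(self) -> bool:
        return self.seconds <= self.target_seconds


def run_drill(name: str, restore: Callable[[], None],
              target_seconds: float) -> DrillResult:
    """Time one restore step and compare it with the expected recovery time."""
    start = time.monotonic()
    restore()  # hypothetical hook: call whatever performs the real restore
    elapsed = time.monotonic() - start
    return DrillResult(name, elapsed, target_seconds)


def fake_restore() -> None:
    # Stand-in for a real restore; it only sleeps so there is something to time.
    time.sleep(0.05)


if __name__ == "__main__":
    result = run_drill("file-server restore", fake_restore, target_seconds=1.0)
    status = "within target" if result.met_target else "MISSED target"
    print(f"{result.name}: {result.seconds:.2f}s ({status})")
```

Recording a result like this for every drill gives the team a running history of how long recovery really takes, which is the number most groups don’t know.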

Furthermore, the most prepared teams embrace a philosophy of “you fight like you train” or “you play like you practice” and live in a constant state of preparedness by living in a constant state of recovery. These teams operate regularly in a state of expectation that something is offline. We’ve seen this concept executed in superb fashion through the likes of the Netflix Chaos Monkey and, more locally, a Federal Credit Union client who routinely fails their systems over, runs for a month or two, and then fails back again…rinse and repeat.

However, the fundamental challenge for most organizations is the cost – both funding and effort – of being prepared. Historically, IT has not been “core” to the business, as it might be with Netflix, so disaster preparedness, or more appropriately “IT Resilience”, has often been sidelined in favor of more profit-generating or cost-saving initiatives.

But with the advent of the “digital transformation” (a discussion for another day), IT is not just a convenience but, rather, an essential component of an organization’s value. As Alec Ross, author of “The Industries of the Future,” says: “Land was the raw material of the agricultural age. Iron was the raw material of the industrial age. Data is the raw material of the information age.”

Data (and access to it) is so vital that it is more valuable than oil.

In a world where every business will be naturally selected – at the will of its clientele – based on its ability to access its data, downtime – and the inability to avoid it – will spell certain death.

Data Availability = Survival!

In my final follow-on post, I’ll discuss why the solutions of last year are no longer appropriate, and where you can adapt to win.