Maintaining data center operations, come hell or high water
The past several months have seen an onslaught of hurricanes, earthquakes and wildfires around the world, providing yet another object lesson in how unpredictable and unforgiving natural disasters can be.
Throughout history, the Asia-Pacific region has been hit with its share of floods, earthquakes, landslides and typhoons. Regional utilities, governments and businesses can be crippled when disasters like these compromise mission-critical facilities.
So, what can data center owners and operators do to ensure 24x7x365 operations despite the ever-present threat of environmental catastrophes?
Most mission-critical facilities develop and maintain disaster recovery and business continuity plans for worst-case scenarios.
But as boxing legend Mike Tyson eloquently put it, "Everyone has a plan until they get punched in the mouth."
And getting punched in the mouth by a Category 5 hurricane or a state-wide power outage will do a lot to test the mettle of any data center's disaster readiness.
Overcoming the challenges presented by natural disasters requires more than just detailed plans.
Disaster procedures and protocols need to be practiced ad nauseam, and then revisited, tweaked and revised frequently.
For data centers, the ability to "weather the storm" is often a function of management and operations principles.
In fact, Uptime Institute's Tier Certification of Operational Sustainability describes the very management behaviors that can enable enterprises to withstand catastrophic conditions.
The SA Blackout
In 2016, South Australia (SA) suffered a statewide blackout in the wake of the worst storm to hit the continent in half a century.
Although the forecasts were well publicized and the impending crisis was hardly a surprise, large areas experienced severe flooding, and virtually the entire region lost power after the storm knocked out the high-voltage pylons that carry electricity throughout the state.
While entire cities went dark, local data centers were working to stay online.
In severe storms like these, preventable factors often contribute to the loss of data center IT computing services. The key word here is preventable.
While achieving 100% preparedness for every disaster scenario is a tall order, data centers can build, prepare and practice for the worst-case scenario, eliminating all the uncertainty and unnecessary hazards in between.
It's essential for data center managers, leadership and staff to proactively identify potential risks and address them before it's too late. To stay operational during a region-wide disaster, enterprise data centers can take basic ongoing precautions such as:
- Establishing a frequent cadence of tests, assessments and upkeep, for infrastructure and operations (including exercises to switch power from utility to engine generator).
- Auditing the position of critical infrastructure components to ensure they're safe from flood risks (this includes fuel stores and generators).
- Calculating fuel consumption rates for disaster scenarios, ensuring that adequate stores are always maintained onsite, and accounting for potential interruptions to fuel delivery schedules.
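The fuel calculation in the last precaution can be sketched in a few lines of Python. This is a minimal illustration rather than a sizing method: the class name, field names and every figure in it are hypothetical, and real burn rates depend on the generator model and the load it carries.

```python
from dataclasses import dataclass


@dataclass
class FuelPlan:
    """Hypothetical on-site fuel model; all fields are illustrative assumptions."""
    stored_litres: float          # usable diesel held on site
    burn_litres_per_hour: float   # generator consumption at the expected load
    delivery_delay_hours: float   # worst-case wait before a resupply arrives

    def __post_init__(self) -> None:
        if self.burn_litres_per_hour <= 0:
            raise ValueError("burn rate must be positive")

    def runtime_hours(self) -> float:
        """Hours the site can run on stored fuel alone."""
        return self.stored_litres / self.burn_litres_per_hour

    def margin_hours(self) -> float:
        """Runtime left over after the worst-case delivery delay (negative = shortfall)."""
        return self.runtime_hours() - self.delivery_delay_hours

    def litres_needed(self, target_hours: float) -> float:
        """Extra fuel required to reach a target runtime, never below zero."""
        required = target_hours * self.burn_litres_per_hour
        return max(0.0, required - self.stored_litres)


# Assumed figures: 24,000 L stored, 500 L/h burn, deliveries up to 72 h late.
plan = FuelPlan(stored_litres=24_000, burn_litres_per_hour=500, delivery_delay_hours=72)
print(plan.runtime_hours())    # hours of autonomy on stored fuel
print(plan.margin_hours())     # negative means fuel runs out before resupply
print(plan.litres_needed(96))  # extra litres needed to double the runtime target
```

With these assumed numbers the site has two days of fuel but could wait three days for a delivery, which is the kind of gap a review of delivery schedules is meant to surface.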
One local data center company that made it through the SA Blackout found that preparations for fuel storage and delivery were a major factor in their ability to maintain power and prevent operations disruption.
Although one of their sites had more than enough diesel storage to make it through the blackout (enough for 48 hours of operation), they plan to double fuel storage at all their facilities to minimize dependence on external sources during or after extended crises.
The company is also planning on incorporating better diesel transport between locations, so larger sites can act as distributors for smaller sites, independent of fuel supply companies.
These observations and adjustments demonstrate exactly the type of fine-tuning every data center should employ when it comes to their disaster recovery plans.
The Value of IT Resiliency
Due to the unpredictable nature of environmental disasters, the ability to shift IT workloads at a moment's notice is incredibly important for enterprises.
Many enterprises have yet to implement a multi-site resiliency strategy because of the cost, complexity and uncertainty associated with determining the number, location and type of facilities required to meet their specific business needs.
While understandable, that hesitation can be costly for a variety of reasons, including unavoidable disasters.
The good news is that according to the 2017 Uptime Institute Data Center Industry Survey, 68% of respondents have deployed some form of multi-site resiliency strategy, as more companies are beginning to deliver mission-critical IT services through distributed data centers.
Risk, performance, cost and availability are all factors that drive the adoption of resiliency schemes, but the ability to quickly redirect applications, data and traffic across geographies in the event of a natural disaster is an outstanding example of the business benefits IT resiliency can offer.
The Way Forward
The unfortunate reality is that most enterprises only learn about their flaws after unforeseen catastrophes expose them.
All organizations must regularly review and update their operational preparedness plans and practice them until they become second nature.
As natural disasters continue to occur with greater frequency, enterprises need to get proactive about identifying preventable weaknesses before a crisis exposes them.
Avoid getting punched in the mouth by the next major storm with continued investment in planning, operations and staff for your business-critical infrastructure.