DataCenterNews Asia
Specialist data center news for Asia

Anatomy of an outage: Understanding the lifecycle of downtime

By Julia Gabel
Thu 19 Apr 2018

Article by Eric Vaughn, chief revenue officer at Neverfail

No one likes dealing with an outage.

They’re bad for nearly every aspect of a business, from employee morale to productivity, competitiveness, profitability, and reputation.

It’s a common problem, though, and there are lots of noisy claims on the market from many different vendors. You’ll hear everything from “Recover in minutes” to “Avoid downtime” to “Zero data loss”.

Figuring out which claims are bogus and which ones make sense for YOUR operation is daunting. Each organization has unique requirements and, since budgets are never unlimited, IT managers can't simply apply gold-plated protection to every application and data repository.

What exactly is an outage?

Before you can decide on the optimum recovery time objectives (RTOs) and recovery point objectives (RPOs) for your business, and select the right disaster recovery (DR) and business continuity (BC) services, you need to understand what constitutes an “outage.” It’s not as obvious as it sounds.

To understand the concept clearly, let’s break it down into four components:

  • Awareness
  • Resolution
  • Failover
  • Recovery

An outage spans all four stages, and its true length is the total time it takes to get through every one of them. Failover is usually the shortest part of any outage, which is why you’ll often hear vendors focus almost exclusively on their short failover (or “recovery”) times. Fine, as far as it goes.

But ignoring the other stages will get you into hot water. When you’re making commitments to the business about how short any potential outages will be, it pays to be realistic.
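The point is easy to see if you treat the four stages as back-to-back intervals whose durations add up. The minimal Python sketch below does exactly that. The per-stage minutes are hypothetical estimates loosely based on the rough figures in this article (an hour or more to confirm an outage, at least half an hour of triage, seconds to an hour for failover, several hours for recovery), not measured data.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of an outage, with a rough (min, max) duration in minutes."""
    name: str
    min_minutes: float
    max_minutes: float


# Hypothetical estimates, loosely based on the ranges discussed in this article.
STAGES = [
    Stage("Awareness", 60, 120),   # confirming it really is an outage
    Stage("Resolution", 30, 60),   # triage: quick fix or failover?
    Stage("Failover", 1, 60),      # "a few seconds" rounded up to 1 minute
    Stage("Recovery", 120, 360),   # restoring production and re-arming DR
]


def outage_window(stages):
    """Return the (best case, worst case) total outage length in minutes.

    The stages run one after another, so their durations simply add up.
    """
    best = sum(s.min_minutes for s in stages)
    worst = sum(s.max_minutes for s in stages)
    return best, worst


if __name__ == "__main__":
    best, worst = outage_window(STAGES)
    failover = next(s for s in STAGES if s.name == "Failover")
    print(f"Advertised failover: {failover.min_minutes:g}-{failover.max_minutes:g} min")
    print(f"Whole outage:        {best:g}-{worst:g} min")
```

With these placeholder numbers, a failover that completes in about a minute still sits inside an outage of roughly 3.5 to 10 hours.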

Stage 1: Awareness

The first stage is usually the longest: Figuring out that you actually have an outage.

IT often finds out about a problem when users start calling to complain that they can’t work. At this point, the outage has been underway for some time and is already having an impact on the business.

Understanding what is really going on - is it an outage or user error, for example - can often take an hour or more. Once you’ve confirmed that you’re dealing with an outage, you can move immediately to Stage 2.

Stage 2: Resolution

Now you must triage the system and make some decisions. Is it something you can fix quickly, or do you need to fail over to backup systems?

Sometimes what looks like a system outage is highly localized. For example, a virus-ridden laptop might be causing problems for someone, or even a group of people, but looks like a problem at the server level.

This type of local problem is still an outage as far as your people are concerned, of course, but you can usually deal with it quickly and without affecting the rest of the organization.

It takes another half hour minimum to confirm you’re looking at a system-level issue that requires a failover.

Stage 3: Failover

After you’ve confirmed there really is an outage and determined that there is no quick fix, you have to fail over to your backup systems.

Depending on how well your backup systems are equipped and configured, this can take anywhere from a few seconds to over an hour.

When you fail over, you make some key assumptions in the hope of a successful recovery later. We all know what they say about “assume”, so it’s worth taking the time to make sure these are more like certainties than assumptions:

With unplanned outages, you must ensure that your data is complete and not corrupted, hardware and infrastructure are available, and that the internal resources are available to complete the failover process successfully.

If any of these assumptions is not true, your ability to recover in Stage 4 will be compromised, if recovery is even possible.

One of your biggest risk factors is right here. You must take the actions necessary to ensure that all the elements – people, equipment, communications, network connectivity, and so on – are in place and working so an unplanned failover will always result in a successful recovery.
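One way to turn those assumptions into certainties is an explicit go/no-go check that runs before anyone commits to a failover. The Python sketch below is hypothetical: the class, its fields and the host names are illustrative, and it assumes a SHA-256 digest of the replicated data was recorded at replication time, so the replica can be verified without the failed primary.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FailoverReadiness:
    """Hypothetical pre-failover checklist mirroring the assumptions above."""
    replica_bytes: bytes    # latest replicated data (stand-in for a real volume)
    expected_sha256: str    # digest assumed to be recorded at replication time
    standby_hosts_up: dict[str, bool] = field(default_factory=dict)
    on_call_staff: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return the blocking problems; an empty list means 'go'."""
        issues = []
        # 1. Data is complete and not corrupted.
        if hashlib.sha256(self.replica_bytes).hexdigest() != self.expected_sha256:
            issues.append("replica data does not match its recorded checksum")
        # 2. Hardware and infrastructure are available.
        down = [host for host, up in self.standby_hosts_up.items() if not up]
        if not self.standby_hosts_up:
            issues.append("no standby hosts registered")
        elif down:
            issues.append(f"standby hosts unavailable: {', '.join(sorted(down))}")
        # 3. People are available to carry out the failover.
        if not self.on_call_staff:
            issues.append("no staff available to run the failover")
        return issues


if __name__ == "__main__":
    data = b"orders-snapshot"  # hypothetical replicated data
    ready = FailoverReadiness(
        replica_bytes=data,
        expected_sha256=hashlib.sha256(data).hexdigest(),
        standby_hosts_up={"db-standby": True, "app-standby": False},
        on_call_staff=["alice"],
    )
    print(ready.problems())  # ['standby hosts unavailable: app-standby']
```

Returning a list of problems, rather than a single true/false flag, tells the team exactly which assumption failed and therefore what to fix before the failover proceeds.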

Stage 4: Recovery

We’re not finished! The clock is still ticking on your outage when the recovery stage starts.

At this point, you have to restore services to your production servers or site and reinstate your normal BC/DR practices. If you don’t do it now, you’re putting your operation at severe risk: what if another outage occurs before your protection is back in place?

This stage can easily consume several hours, since you want to be absolutely certain that all production systems are working as expected, and that your DR/BC systems are ready to handle it when the next outage occurs.

Most solutions require at least a little downtime during the failover process to re-synchronize data and to establish user connections to the backup environment.

The same process has to occur again in reverse (a failback) to bring your production systems back online. In effect, every IT outage produces two potential business outages: two periods of time when employees can’t do their work.

Be sure that you’re taking that cold, hard fact into account in your planning – and your presentations to management.

How much do outages really cost?

The Ponemon Institute has conducted several studies of this, with the most recent one published in 2016.

According to that study, the average cost of a data center outage increased steadily from $505,502 in 2010 to $740,357 in 2016, a rise of roughly 46%. We can safely assume outages are even more expensive today.

To make their assessment, the Institute looked at these primary factors in calculating costs:

  • Damage to mission-critical data
  • Impact of downtime on organizational productivity
  • Damage to equipment and other assets
  • Cost to detect and remediate systems and core business processes
  • Legal and regulatory impact, including litigation defense cost
  • Lost confidence and trust among key stakeholders
  • Diminishment of marketplace brand and reputation

Ponemon found that the cost of downtime varies widely by industry and many other factors, but averages about $9,000 per minute.

Another way to look at this number: 99.9% uptime may sound good, but it allows about 44 minutes of downtime per month, which at that rate would cost the average business almost $400,000 a month. Adding another 9 to that SLA, making it 99.99%, cuts the allowance to about 4 minutes per month (roughly $40,000). That saving of more than $350,000 a month provides firm justification for investing in an appropriate level of protection.
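The arithmetic behind these figures is easy to reproduce. The Python sketch below assumes an average month of 365.25 / 12 days (about 43,830 minutes) and applies Ponemon's average of roughly $9,000 per minute uniformly; real downtime costs vary widely by industry, so treat the dollar amounts as illustrative.

```python
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12  # average month, about 43,830 minutes
COST_PER_MINUTE = 9_000                    # Ponemon's approximate average, in USD


def monthly_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per month permitted by an uptime SLA."""
    return MINUTES_PER_MONTH * (1 - uptime_percent / 100)


def monthly_downtime_cost(uptime_percent: float,
                          cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Monthly cost if the SLA's full downtime allowance is used up."""
    return monthly_downtime_minutes(uptime_percent) * cost_per_minute


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        minutes = monthly_downtime_minutes(sla)
        cost = monthly_downtime_cost(sla)
        print(f"{sla}% uptime: {minutes:.1f} min/month, about ${cost:,.0f}")
```

For 99.9% this gives roughly 43.8 minutes and about $394,000 a month; for 99.99%, roughly 4.4 minutes and about $39,000.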

Putting it all together

When you understand the anatomy of an outage and all of its stages, you can plan effectively and make good decisions. You can also develop proposals to take to management that they will understand in business terms.

Don’t be fooled by a “failover in minutes” pitch. It’s easy to avoid overpromising and underdelivering when you understand all the implications - and the costs - associated with each stage of an outage.

Finding the right partner

The market for backup and recovery services is crowded, and you will find no shortage of potential vendors. The key is finding one that not only has suitable technology and can deliver what you need at a competitive price, but also has people you can depend on when the time comes.

Find a partner with a large number of clients similar to your business. Talk to some of their engineers, not just an account rep. Can you communicate easily with them?

Are they listening and truly understanding your priorities and unique challenges?

Once you’ve selected a partner, work with them to develop a plan that will fit your organization’s unique set of requirements and put together an implementation schedule that won’t disrupt your operations.

The whole process, from initial consultation through first tests to completed deployment, can take less than a month with the right partner.
