DataCenterNews Asia
Specialist data center news for Asia

Azure outage postmortem: Microsoft reveals what happened and why

By Ashton Young
Wed 12 Sep 2018

Since last week, headlines around the world have been shouting about the disruption to Microsoft’s services after a severe weather event knocked one of its data centers offline.

Essentially, the outage was blamed on severe storms in the Texas area that caused power swells and ultimately led to the temporary demise of one of the company’s South Central US data centers in San Antonio.

In a recent blog post, Microsoft Azure DevOps director of engineering Buck Hodges has released a ‘postmortem’ of what went down, why it happened, and what the company is doing to prevent similar incidents in the future.

“First, I want to apologize for the very long VSTS [now called Azure DevOps] outage for our customers hosted in the affected region and the impact it had on customers globally,” says Hodges.

“This incident was unprecedented for us. It was the longest outage for VSTS customers in our seven-year history. I've talked to customers through Twitter, email, and by phone whose teams lost a day or more of productivity. We let our customers down. It was a painful experience, and for that I apologize.”

The Azure status report reveals the data center switched from utility power to generator power following the power swells caused by the lightning. However, the mechanical cooling systems also fell victim to the power swells despite having surge suppressors in place.

While the data center was able to continue operating for a period of time, temperatures soon exceeded safe operational thresholds, which initiated an automated shutdown. While this shutdown is designed to preserve infrastructure and data integrity, in this case temperatures rose so quickly that some hardware was damaged before it could be shut down.

Many asked why VSTS didn’t simply fail over to a different region.

“We never want to lose any customer data. A key part of our data protection strategy is to store data in two regions using Azure SQL DB Point-in-time Restore (PITR) backups and Azure Geo-redundant Storage (GRS),” says Hodges.

“This enables us to replicate data within the same geography while respecting data sovereignty. Only Azure Storage can decide to fail over GRS storage accounts. If Azure Storage had failed over during this outage and there was data loss, we would still have waited on recovery to avoid data loss.

“Azure Storage provides two options for recovery in the event of an outage: wait for recovery or access data from a read-only secondary copy. Using read-only storage would degrade critical services like Git/TFVC and Build to the point of not being usable since code could neither be checked in nor the output of builds be saved (and thus not deployed). Additionally, failing over to the backed up DBs, once the backups were restored, would have resulted in data loss due to the latency of the backups.”
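The trade-off Hodges describes can be illustrated with a little arithmetic. The following is only a minimal sketch, not Microsoft's actual decision logic: the function names, the timestamps and the zero-loss tolerance are all hypothetical, chosen to show why a backup that lags behind the primary makes failing over lossy.

```python
from datetime import datetime, timedelta


def data_loss_window(outage_start: datetime, last_replicated: datetime) -> timedelta:
    """Time span of writes that never reached the secondary copy.

    Anything written after `last_replicated` and before `outage_start`
    exists only in the failed region (hypothetical model).
    """
    return max(outage_start - last_replicated, timedelta(0))


def choose_recovery(
    outage_start: datetime,
    last_replicated: datetime,
    tolerated_loss: timedelta = timedelta(0),  # assumption: no data loss is acceptable
) -> str:
    """Pick between the two options named in the quote above."""
    loss = data_loss_window(outage_start, last_replicated)
    return "fail_over" if loss <= tolerated_loss else "wait_for_recovery"


if __name__ == "__main__":
    # Hypothetical timestamps: the last backup finished 15 minutes before the outage.
    outage = datetime(2018, 9, 4, 9, 30)
    backup = datetime(2018, 9, 4, 9, 15)
    print(data_loss_window(outage, backup))
    print(choose_recovery(outage, backup))
```

Under these assumptions, any backup lag at all pushes the decision toward waiting for the primary region to recover, which matches the reasoning in the quote.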

Hodges says the team is now in the process of making a number of changes based on the learnings from the outage, including:

  1. In supported geographies, move services into regions with Azure Availability Zones to be resilient to data center failures within a region.
  2. Explore possible solutions for asynchronous replication across regions.
  3. Regularly exercise fail over across regions for VSTS services using our own organization.
  4. Add redundancy for our internal tooling to be available in more than one region.
  5. Fix the regression in Dashboards where failed calls to Marketplace made Dashboards unavailable.
  6. Review circuit breakers for service-to-service calls to ensure correct scoping (surfaced in the calls to the User service).
  7. Review gaps in our current fault injection testing exposed by this incident.
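For readers unfamiliar with the circuit breakers mentioned in item 6, the sketch below shows the general pattern. It is an illustration only and not Azure DevOps code: the class name, the thresholds and the idea of keying the breaker by a `scope` string (for example, one downstream region) are assumptions made here to show how scoping keeps a failure in one place from blocking calls elsewhere.

```python
import time
from collections import defaultdict


class CircuitOpenError(Exception):
    """Raised when calls to a scope are being short-circuited."""


class ScopedCircuitBreaker:
    """Hypothetical circuit breaker that tracks failures per scope."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self._failures = defaultdict(int)
        self._opened_at = {}

    def call(self, scope, func, *args, **kwargs):
        opened = self._opened_at.get(scope)
        if opened is not None:
            if self.clock() - opened < self.reset_after:
                raise CircuitOpenError(f"circuit open for scope {scope!r}")
            # Cool-down elapsed: allow one trial call (half-open state).
            del self._opened_at[scope]
            self._failures[scope] = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures[scope] += 1
            if self._failures[scope] >= self.failure_threshold:
                self._opened_at[scope] = self.clock()
            raise
        self._failures[scope] = 0  # a success closes the circuit again
        return result
```

With a breaker like this, repeated failures of calls tagged with one scope trip only that scope, while calls tagged with another scope continue to go through. A breaker scoped too broadly would instead cut off healthy callers as well.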

“I apologize again for the very long disruption from this incident,” concludes Hodges.
