DataCenterNews Asia Pacific
Specialist data center news for Asia Pacific

How memory failure prediction keeps data centers and the digital economy up and running

It was approximately four years ago when I wrote in an industry publication, “Today's data centers are the modern equivalent of railroad infrastructure and the world's business rides upon its rails.”

Looking back, one can't help but think how understated, if not quaint, the idea seems now.

Just consider the historic surge in digital services we've witnessed as global populations were forced to work, study, socialize, conduct retail transactions, entertain themselves and even meet with healthcare providers, all from home. As Microsoft CEO Satya Nadella famously said roughly sixty days into the global health crisis, “We've seen two years' worth of digital transformation in two months.”

Lest we forget, all that streaming and social media, video conferencing, cloud collaboration platforms, eCommerce, telehealth and online gaming rely on highly available data centers as well as reliable server hardware. Forget railroad tracks. The data center, now rightly classified by governments worldwide as essential critical infrastructure, has become for business and society what oxygen is for the ultramarathon runner.

The critical difference is that we're presently in a race where no clear finish line has yet emerged, as company after company has announced it won't be reopening its offices until mid-2021, at the earliest. Some lockdowns have returned, and much of our collective professional and personal lives remain virtual. More than ever, our data centers, and the hardware that resides there, need to stay online so that the digital economy stays up and running.

The breath of business continuity

According to the Uptime Institute's 2020 data center survey, “outages are occurring with disturbing frequency, and bigger outages are becoming more damaging and expensive” than in previous years.

In 2020, a greater percentage of outages cost more than $1 million (nearly one in six, compared with one in 10 in 2019), and a greater percentage cost between $100,000 and $1 million (40% in 2020 vs 28% in 2019).

As one of the top three hardware failures that occur in data centers, memory failures have a direct impact on server reliability. Moreover, a memory failure can have a devastating effect while giving data center operators too little warning of an impending outage to take preemptive action.

Using machine learning to analyze real-time memory health data makes it possible to predict such failures ahead of time. A method of data analysis that automates analytical model building, machine learning uses algorithms that iteratively learn from data, thus allowing computers to find hidden insights without being explicitly programmed on where to look for them.

The ability to analyze real-time memory health data and avert memory failures ultimately translates to a better experience for customers. This is especially so for organizations such as online services platforms and cloud service providers, which rely heavily on server hardware reliability, availability and serviceability. These are the very types of businesses that are experiencing soaring demand today.

By deploying a memory failure prediction solution in their data center and integrating it into their existing management systems, IT staff can analyze their server memory failures, reduce downtime, and improve their current Dual Inline Memory Module (DIMM) replacement policies.

Such a memory failure prediction solution uses machine learning to analyze server memory errors down to the DIMM, bank, column, row, and cell levels to generate a memory health score for each DIMM. Changes in the health score over time can signal issues well before impact, giving enough lead time to move a workload or take other actions.
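As a rough illustration of how a change in health score over time might be turned into an early warning, the sketch below flags any DIMM whose score has dropped sharply over its most recent samples. The function name, the 0 to 100 score scale, the lookback and the drop threshold are all assumptions made for this example; the article does not describe the actual rule the solution uses.

```python
def needs_attention(history, lookback=3, max_drop=15):
    """Return True if the score fell by more than max_drop
    over the last `lookback` samples (hypothetical rule)."""
    if len(history) <= lookback:
        return False  # not enough samples to judge a trend
    return history[-1 - lookback] - history[-1] > max_drop


# Hypothetical score histories, oldest sample first (100 = healthy).
scores_by_dimm = {
    "DIMM_A1": [100, 100, 98, 97, 97],  # stable
    "DIMM_B2": [100, 96, 90, 81, 70],   # steadily degrading
}

flagged = [dimm for dimm, hist in scores_by_dimm.items() if needs_attention(hist)]
print(flagged)  # ['DIMM_B2']
```

A DIMM flagged this way would be a candidate for workload migration or scheduled replacement while it is still running.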

To get a better picture of just how the memory health score is generated, it's essential to understand that the memory failure prediction engine is placed in firmware and receives alerts when memory errors occur. When servers have a burst of errors in a specific memory region, the DIMM Health Assessment Model (DHAM) is checked to assess if the affected DIMM's health score needs to be modified. If so, then the score is changed accordingly and passed on to the baseboard management controller (BMC). This monitoring technique has been extremely useful, resulting in strong ROI, as several case studies have documented.
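The flow described above (error alert, burst check, score update, hand-off to the BMC) can be sketched roughly as follows. This is a toy model under stated assumptions, not the DHAM itself: the class and field names, the burst definition (three errors in one region within 60 seconds), the fixed 10-point penalty and the `notify` callback standing in for the BMC are all invented for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    """One corrected memory error, located down to the row level."""
    timestamp: float  # seconds
    dimm: str
    bank: int
    row: int


class HealthTracker:
    """Toy stand-in for a DIMM health model; the scoring rule is invented."""

    def __init__(self, burst_size=3, window_s=60.0, penalty=10, notify=print):
        self.burst_size = burst_size
        self.window_s = window_s
        self.penalty = penalty
        self.notify = notify  # stands in for passing the score to the BMC
        self.scores = defaultdict(lambda: 100)  # 100 = healthy
        self.recent = defaultdict(deque)  # region -> recent error timestamps

    def on_error(self, ev):
        region = (ev.dimm, ev.bank, ev.row)
        times = self.recent[region]
        times.append(ev.timestamp)
        # Drop errors that have fallen out of the sliding window.
        while times and ev.timestamp - times[0] > self.window_s:
            times.popleft()
        if len(times) >= self.burst_size:
            # A burst in one region: lower the score and report it.
            self.scores[ev.dimm] = max(0, self.scores[ev.dimm] - self.penalty)
            times.clear()
            self.notify(ev.dimm, self.scores[ev.dimm])
        return self.scores[ev.dimm]


reports = []
tracker = HealthTracker(notify=lambda dimm, score: reports.append((dimm, score)))
for t in (0.0, 5.0, 9.0):  # three errors in the same row within 60 s
    tracker.on_error(ErrorEvent(t, "DIMM_A1", bank=2, row=417))
tracker.on_error(ErrorEvent(500.0, "DIMM_B1", bank=0, row=3))  # isolated error
print(reports)  # [('DIMM_A1', 90)]
```

Only the burst on DIMM_A1 changes a score and triggers a report; the isolated error on DIMM_B1 is recorded but leaves its score untouched, mirroring the idea that the score is modified only when the model judges it necessary.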

Memory failure prediction in action

In one case study, 'Intel Memory Failure Prediction Improves Reliability at Meituan', a Beijing-based company whose online platform connects consumers with local businesses monitored the health of its servers' memory modules by integrating the memory failure prediction solution into its existing data center management solution.

The initial test deployment indicated that if the company deployed the solution across its full server network, server crashes caused by hardware failures could be reduced by up to 40%, which ultimately would deliver a better experience for hundreds of millions of its customers and local vendors.

In another Intel case study, 'Intel Memory Failure Prediction at Tencent', a leading China-based cloud solutions provider test-deployed the memory failure prediction solution across thousands of its servers to reduce downtime caused by server memory failures. The deployment improved memory reliability through predictions based on micro-level memory failure information captured from the operating system's error detection and correction (EDAC) driver, which stores historical memory error logs.

The memory failure prediction solution also gave the cloud service provider's IT staff enough information to proactively address potential memory issues, and replace failing DIMMs before they reach a terminal stage and cause server failures, thus reducing downtime.

The cloud provider's test deployment of the memory failure prediction solution indicated a five-fold improvement in DIMM-level failure prediction. If the company were to deploy the solution across its entire data center portfolio, it would improve the effectiveness of its reliability-aware workload management and decrease the percentage of uncorrectable errors (UEs), thereby significantly reducing downtime.

Online retailing and cloud technologies have significantly disrupted the retail and consumer goods vertical, leading to increased adoption of cloud computing. Moreover, as world events drive companies to accelerate their digital transformation initiatives practically overnight, ResearchandMarkets projects the global cloud computing market size will increase at a compound annual growth rate (CAGR) of 17.5%, surging from $371.4 billion in 2020 to $832.1 billion by 2025.
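As a quick sanity check on those figures, the compound annual growth rate can be recomputed from the two market sizes, and the 2025 figure from the stated rate. The helper names below are made up for this example; the only inputs are the numbers quoted above and the five-year span from 2020 to 2025.

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


def project(start, rate, years):
    """Value of `start` after compounding at `rate` for `years`."""
    return start * (1 + rate) ** years


rate = implied_cagr(371.4, 832.1, years=5)  # market size in $ billions
print(f"{rate:.1%}")  # 17.5%

# Going the other way: compound $371.4 billion at 17.5% for five years.
print(round(project(371.4, 0.175, 5), 1))
```

The forward projection lands within a billion dollars of the quoted $832.1 billion, so the two market sizes and the 17.5% rate are consistent with one another once rounding is taken into account.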

As cloud providers and retailers, along with financial services, IT, telecom, media firms and more, navigate the 'next normal', maintaining data center uptime, the very breath of business continuity, has never been so business-critical.
