DataCenterNews Asia
Specialist data center news for Asia

How memory failure prediction keeps data centers and the digital economy up and running

By Contributor
Thu 3 Dec 2020

Article by Intel general manager of data center management solutions Jeff Klaus.
 

It was approximately four years ago when I wrote in an industry publication, “Today’s data centers are the modern equivalent of railroad infrastructure and the world’s business rides upon its rails.” 

Looking back, one can’t help but think how understated, if not quaint, the idea seems now. 

Just consider the historic surge in digital services we’ve witnessed as global populations were forced to work, study, socialize, conduct retail transactions, entertain themselves and even meet with healthcare providers, all from home. As Microsoft CEO Satya Nadella famously said roughly sixty days into the global health crisis, “We’ve seen two years’ worth of digital transformation in two months.” 

Lest we forget, all that streaming and social media, video conferencing, cloud collaboration platforms, eCommerce, telehealth and online gaming rely on highly available data centers as well as reliable server hardware. Forget railroad tracks. The data center, now rightly classified by governments worldwide as essential critical infrastructure, has become for business and society what oxygen is for the ultramarathon runner. 

The critical difference is that we’re presently in a race with no clear finish line yet in sight, as company after company has announced that it won’t reopen its offices until mid-2021 at the earliest. Some lockdowns have returned, and much of our collective professional and personal life remains virtual. More than ever, our data centers, and the hardware that resides there, need to stay online so that the digital economy stays up and running. 
 

The breath of business continuity

According to the Uptime Institute’s 2020 data center survey, “outages are occurring with disturbing frequency, and bigger outages are becoming more damaging and expensive” than in previous years.

In 2020, a greater share of outages cost more than $1 million (nearly one in six, compared with one in 10 in 2019), and a greater share cost between $100,000 and $1 million (40% in 2020, up from 28% in 2019).

Memory failures are among the top three hardware failures in data centers, so they have a direct impact on server reliability. Moreover, a memory failure can be devastating because it may strike without giving data center operators enough warning to take preemptive action before an outage.

Using machine learning to analyze real-time memory health data makes it possible to predict such failures ahead of time. Machine learning, a method of data analysis that automates analytical model building, uses algorithms that learn iteratively from data, allowing computers to find hidden insights without being explicitly told where to look for them. 
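To make "learn iteratively from data" concrete, here is a minimal Python sketch. It is not the model used by Intel’s solution, which the article does not describe; it is a plain logistic regression trained by batch gradient descent, and the two per-DIMM features and the training data are invented purely for illustration.

```python
import math
from typing import List, Sequence, Tuple

# (features, label): label is 1 if the DIMM later failed, 0 otherwise.
Sample = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Estimated failure probability; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train(samples: List[Sample], lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on the log-loss: each epoch nudges the weights
    in the direction that reduces prediction error on the training data."""
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in samples:
            err = predict(weights, features) - label
            grad[0] += err
            for i, x in enumerate(features, start=1):
                grad[i] += err * x
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights


# Hypothetical, normalised per-DIMM features:
# (recent corrected-error rate, share of rows affected). Synthetic data only.
TRAINING: List[Sample] = [
    ((0.0, 0.0), 0), ((0.1, 0.1), 0), ((0.2, 0.1), 0), ((0.0, 0.1), 0),
    ((0.8, 0.5), 1), ((1.0, 0.6), 1), ((0.9, 0.7), 1), ((0.7, 0.4), 1),
]
```

Each pass over the data adjusts the weights slightly; after `train(TRAINING)`, `predict` returns a failure probability that an operator could threshold to decide which DIMMs to watch more closely.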

The ability to analyze real-time memory health data and avert memory failures ultimately translates to a better experience for customers. This is especially so for organizations such as online services platforms and cloud service providers, which rely heavily on server hardware reliability, availability and serviceability. These are the very types of businesses that are experiencing soaring demand today.

By deploying a memory failure prediction solution in their data center and integrating it into their existing management systems, IT staff can analyze server memory failures, reduce downtime, and improve their current dual in-line memory module (DIMM) replacement policies. 

Such a memory failure prediction solution uses machine learning to analyze server memory errors down to the DIMM, bank, column, row, and cell levels to generate a memory health score for each DIMM. Changes in the health score over time can signal issues well before they have an impact, giving enough lead time to move a workload and/or take other actions. 
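The article does not publish how the health score is calculated. The Python sketch below only illustrates the general idea of scoring errors at bank, row, column and cell granularity and watching the score over time; the `CorrectedError` record, the penalty weights and the 20-point drop threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class CorrectedError:
    """One corrected memory error, located down to the cell (bank, row, column)."""
    dimm: str
    bank: int
    row: int
    column: int


# Hypothetical penalty weights: errors spreading across rows, columns and banks
# are treated as more worrying than one cell reporting repeatedly.
WEIGHTS = {"cells": 1.0, "rows": 4.0, "columns": 4.0, "banks": 10.0}


def health_scores(errors: Iterable[CorrectedError]) -> Dict[str, float]:
    """Return a toy 0-100 score per DIMM; DIMMs with no errors are absent (treat as 100)."""
    seen: Dict[str, Dict[str, Set[Tuple[int, ...]]]] = defaultdict(lambda: defaultdict(set))
    for e in errors:
        seen[e.dimm]["cells"].add((e.bank, e.row, e.column))
        seen[e.dimm]["rows"].add((e.bank, e.row))
        seen[e.dimm]["columns"].add((e.bank, e.column))
        seen[e.dimm]["banks"].add((e.bank,))
    scores: Dict[str, float] = {}
    for dimm, groups in seen.items():
        # Penalise the number of distinct failing locations, not the raw error count.
        penalty = sum(WEIGHTS[kind] * len(locations) for kind, locations in groups.items())
        scores[dimm] = max(0.0, 100.0 - penalty)
    return scores


def is_degrading(history: List[float], max_drop: float = 20.0) -> bool:
    """Flag a DIMM whose score fell by at least max_drop over the tracked window."""
    return len(history) >= 2 and history[0] - history[-1] >= max_drop
```

In this toy scheme, five errors from one cell cost a DIMM less than three errors scattered across rows, columns and banks, and `is_degrading` turns a falling score into an early warning.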

To get a better picture of just how the memory health score is generated, it’s essential to understand that the memory failure prediction engine is placed in firmware and receives alerts when memory errors occur. When servers have a burst of errors in a specific memory region, the DIMM Health Assessment Model (DHAM) is checked to assess if the affected DIMM’s health score needs to be modified. If so, then the score is changed accordingly and passed on to the baseboard management controller (BMC). This monitoring technique has been extremely useful, resulting in strong ROI, as several case studies have documented.
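The flow described above (an error alert arrives, a burst in one memory region triggers a DHAM check, and any changed score is passed to the BMC) can be modelled in a few lines. This Python sketch covers the control flow only: the real engine runs in firmware, and the region granularity, the 60-second window, the burst threshold and the `assess` and `report` callbacks standing in for DHAM and the BMC are assumptions, not Intel interfaces.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

# Hypothetical region granularity: (dimm, bank, row).
Region = Tuple[str, int, int]


class BurstMonitor:
    """Toy model of: error alert -> burst check -> DHAM assessment -> BMC update."""

    def __init__(self,
                 assess: Callable[[str, int, float], Optional[float]],
                 report: Callable[[str, float], None],
                 window_s: float = 60.0,
                 burst_threshold: int = 3) -> None:
        self.assess = assess      # stand-in for the DIMM Health Assessment Model
        self.report = report      # stand-in for passing a score to the BMC
        self.window_s = window_s
        self.burst_threshold = burst_threshold
        self.scores: Dict[str, float] = defaultdict(lambda: 100.0)
        self.recent: Dict[Region, Deque[float]] = defaultdict(deque)

    def on_error(self, region: Region, timestamp: float) -> None:
        times = self.recent[region]
        times.append(timestamp)
        # Drop alerts that fall outside the sliding time window.
        while times and timestamp - times[0] > self.window_s:
            times.popleft()
        if len(times) < self.burst_threshold:
            return  # isolated error: no reassessment
        dimm = region[0]
        new_score = self.assess(dimm, len(times), self.scores[dimm])
        if new_score is not None and new_score != self.scores[dimm]:
            self.scores[dimm] = new_score
            self.report(dimm, new_score)
```

Isolated errors never reach the assessment step, which mirrors the article’s point that only a burst of errors in a specific region prompts a re-evaluation of the DIMM’s score.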
 

Memory failure prediction in action

In one case study, 'Intel Memory Failure Prediction Improves Reliability at Meituan', Meituan, a Beijing company whose online platform connects consumers with local businesses, monitored the health of its servers’ memory modules by integrating the memory failure prediction solution into its existing data center management solution. 

The initial test deployment indicated that if the company deployed the solution across its full server network, server crashes caused by hardware failures could be reduced by up to 40%, which ultimately would deliver a better experience for hundreds of millions of its customers and local vendors.

In another Intel case study, 'Intel Memory Failure Prediction at Tencent', a leading China-based cloud solutions provider test-deployed the memory failure prediction solution across thousands of its servers to reduce downtime caused by server memory failures. The deployment improved memory reliability because its predictions were based on micro-level memory failure information captured from the operating system’s error detection and correction (EDAC) driver, which stores historical memory error logs. 
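On Linux, the EDAC subsystem exposes cumulative per-DIMM error counters through sysfs, typically under `/sys/devices/system/edac/mc/mcN/dimmM/` (some memory-controller drivers use `rankM` directories instead), in files such as `dimm_ce_count`, `dimm_ue_count` and `dimm_label`. The Python sketch below collects those counters; it is an illustrative assumption about how a collector might read EDAC data, not the code used in the Tencent deployment, and which files exist depends on the kernel version and driver.

```python
from pathlib import Path
from typing import Dict, Tuple

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_int(path: Path) -> int:
    return int(path.read_text().strip())


def dimm_error_counts(root: Path = EDAC_ROOT) -> Dict[str, Tuple[str, int, int]]:
    """Map 'mcN/dimmM' (or 'mcN/rankM') to (label, corrected count, uncorrected count)."""
    counts: Dict[str, Tuple[str, int, int]] = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        for entry in sorted([*mc.glob("dimm[0-9]*"), *mc.glob("rank[0-9]*")]):
            ce_file = entry / "dimm_ce_count"
            ue_file = entry / "dimm_ue_count"
            if not (ce_file.exists() and ue_file.exists()):
                continue  # driver does not expose per-DIMM counters here
            label_file = entry / "dimm_label"
            label = label_file.read_text().strip() if label_file.exists() else ""
            counts[f"{mc.name}/{entry.name}"] = (label, read_int(ce_file), read_int(ue_file))
    return counts
```

Because these are running totals, a collector would sample them periodically and store the differences to build the kind of error history a prediction model can learn from.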

The memory failure prediction solution also gave the cloud service provider’s IT staff enough information to proactively address potential memory issues, and replace failing DIMMs before they reach a terminal stage and cause server failures, thus reducing downtime.

The cloud provider’s test deployment indicated a fivefold improvement in DIMM-level failure prediction. If the company were to deploy the solution across its entire data center portfolio, it would improve the effectiveness of server-reliability-aware workload management and decrease the percentage of uncorrectable errors (UEs), thereby significantly reducing downtime.

Online retailing and cloud technologies have significantly disrupted the retail and consumer goods vertical, leading to increased adoption of cloud computing. Moreover, as world events drive companies to accelerate their digital transformation initiatives practically overnight, ResearchandMarkets projects the global cloud computing market size will increase at a compound annual growth rate (CAGR) of 17.5%, surging from $371.4 billion in 2020 to $832.1 billion by 2025. 
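As a quick arithmetic check on the forecast: growing from $371.4 billion in 2020 to $832.1 billion in 2025 spans five compounding years. The helper names below are illustrative.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1.0 / years) - 1.0


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding start at rate for the given number of years."""
    return start * (1.0 + rate) ** years


print(f"{cagr(371.4, 832.1, 5):.1%}")      # prints 17.5%
print(f"{project(371.4, 0.175, 5):.1f}")   # prints 831.8
```

The implied rate rounds to the quoted 17.5%, and compounding at exactly 17.5% lands within about $0.3 billion of the $832.1 billion figure, a gap explained by rounding.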

As cloud providers and retailers, along with financial services, IT, telecom, media firms and more navigate the ‘next normal’, maintaining data center uptime — the very breath of business continuity — has never been so business-critical.
