DataCenterNews Asia
Specialist data center news for Asia
Partner content

Using AI to boost data centre uptime and lower TCO

By Contributor
Tue 9 Feb 2021

Article by Intel Corporation Data Center Management Solutions senior application engineer George Clement, and Zachary Bobroff. 
 

Hardware failures are all too common in large-scale data centers and cloud service infrastructure, and these failures can cause service level agreement (SLA) violations and severe loss of revenue. 

Memory failures are among the most critical hardware failures that occur in data centers today, notorious for severely impacting system reliability, availability, and serviceability (RAS). These failures can be caused by a wide range of factors beyond normal use, including manufacturing defects, and extreme environmental or operating conditions. 

While commonly accepted techniques such as error correcting code (ECC) and predictive failure analysis (PFA) based on correctable error thresholds help overcome some correctable errors in dual inline memory modules (DIMMs), they have cost, reliability, coverage, and performance implications. 

A burst of correctable errors can degrade a server's performance and even cause denial of service. Furthermore, ECC and threshold-based PFA cannot help to overcome uncorrectable errors, the catastrophic failures that typically result in crashes.
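To make the threshold-based approach concrete, here is a minimal sketch of a sliding-window correctable error counter. The class name, the threshold and the window length are illustrative assumptions, not values taken from any vendor's firmware.

```python
from collections import deque


class ThresholdPFA:
    """Flag a DIMM once correctable errors within a sliding window reach a threshold."""

    def __init__(self, threshold: int = 10, window_seconds: float = 86400.0):
        self.threshold = threshold            # assumed value, not a vendor default
        self.window_seconds = window_seconds  # assumed value, not a vendor default
        self._timestamps = deque()            # times of recent correctable errors

    def record_error(self, timestamp: float) -> bool:
        """Record one correctable error; return True if the DIMM should be flagged."""
        self._timestamps.append(timestamp)
        # Drop errors that have aged out of the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


pfa = ThresholdPFA(threshold=3, window_seconds=60.0)
flags = [pfa.record_error(t) for t in (0.0, 10.0, 20.0, 200.0)]
print(flags)  # [False, False, True, False]
```

A counter like this reacts only to how many correctable errors arrive, not to where in the module they occur, which is one reason it gives no warning of an uncorrectable error.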

Predicting memory failures before they occur has become critical for today’s data centers. Intel Memory Failure Prediction (Intel MFP) is designed for organizations that rely heavily on server reliability, availability, and serviceability: it analyses historical error data to predict potentially catastrophic memory failures before they happen. 

The solution features several innovative and original capabilities. It predicts micro-level failures in rows, columns and cells based on historical data, using a low-overhead online learning method to improve its prediction accuracy and avoid interfering with critical compute tasks. 
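The article does not describe the internals of the Intel MFP model, so the following is only a toy illustration of what distinguishing row, column and cell faults can mean. It assumes a simplified address of (row, column) within a single bank; real DIMM addressing also involves ranks, devices and bank groups, and the function name and labels are hypothetical.

```python
from typing import Iterable, NamedTuple


class ErrorAddress(NamedTuple):
    """Simplified location of one correctable error inside a single bank."""
    row: int
    column: int


def classify_fault(errors: Iterable[ErrorAddress], min_errors: int = 2) -> str:
    """Label the error pattern as 'cell', 'row', 'column' or 'scattered'."""
    errors = list(errors)
    if len(errors) < min_errors:
        return "insufficient data"
    rows = {e.row for e in errors}
    columns = {e.column for e in errors}
    if len(rows) == 1 and len(columns) == 1:
        return "cell"      # every error hit the same row and column
    if len(rows) == 1:
        return "row"       # same row, several columns
    if len(columns) == 1:
        return "column"    # same column, several rows
    return "scattered"


print(classify_fault([ErrorAddress(7, 3), ErrorAddress(7, 3)]))   # cell
print(classify_fault([ErrorAddress(7, 1), ErrorAddress(7, 5)]))   # row
print(classify_fault([ErrorAddress(1, 4), ErrorAddress(2, 4)]))   # column
```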

This also enables Intel MFP to generate an estimated memory health score for proactive memory failure management, allowing users to take action accordingly. Intel MFP is vendor-agnostic, and works in conjunction with other data center management solutions, including Intel Data Center Manager (Intel DCM). 
 

Reduce memory failure-related server crashes by 40%

In a case study with Tencent, initial collaborative testing of the Intel MFP algorithm showed quick results, with a five-fold reduction in memory failure and system downtime. Tencent also extended this support by having the operating system intelligently avoid failing memory until the affected module was replaced. 

In a similar case study with Meituan, the company saw a 40% reduction in server crashes caused by memory errors. The company monitored the health of the memory modules of their servers by integrating Intel MFP into their existing data center management solution. By analysing data that was previously collected by their data center management software, they were able to generate prediction scores for each DRAM module, and then take appropriate action to maintain their SLAs and maximize service uptimes.

Armed with this new capability, Intel set out to extend the support to the rest of the industry, working with AMI, a global leader in powering, managing and securing the world's connected digital infrastructure through its BIOS, BMC and security solutions.

Because capturing and analysing memory errors requires close cooperation between the UEFI and BMC firmware, AMI worked to make Intel MFP easy to adopt into existing and future server platforms. 

As errors are captured, they are recorded by the BIOS, and certain metadata is then passed to the BMC firmware. The BMC firmware runs this metadata through the Intel MFP engine to calculate a health score for the memory module. As new errors are detected, the AMI solution tracks the health score of each memory module and exposes the result for analysis by system administrators. 
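The flow described above, from error metadata to a per-module health score, can be sketched roughly as follows. Everything here is hypothetical: the record fields, the 0 to 100 scale and especially `placeholder_score`, which merely stands in for the Intel MFP engine and does not reflect how that engine works.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ErrorRecord:
    """Metadata the BIOS might hand to the BMC for one error (fields assumed)."""
    dimm: str
    row: int
    column: int
    correctable: bool = True


def placeholder_score(history: List[ErrorRecord]) -> int:
    """Stand-in for the prediction engine: 100 is healthy, 0 is worst (assumed scale)."""
    penalty = sum(5 if rec.correctable else 50 for rec in history)
    return max(0, 100 - penalty)


@dataclass
class HealthTracker:
    """BMC-side bookkeeping: per-DIMM error history and the latest score."""
    score_fn: Callable[[List[ErrorRecord]], int] = placeholder_score
    history: Dict[str, List[ErrorRecord]] = field(default_factory=dict)
    scores: Dict[str, int] = field(default_factory=dict)

    def on_error(self, record: ErrorRecord) -> int:
        """Store the new record and recompute the score for that DIMM."""
        self.history.setdefault(record.dimm, []).append(record)
        self.scores[record.dimm] = self.score_fn(self.history[record.dimm])
        return self.scores[record.dimm]


tracker = HealthTracker()
tracker.on_error(ErrorRecord("DIMM_A1", row=7, column=3))
tracker.on_error(ErrorRecord("DIMM_A1", row=7, column=9))
tracker.on_error(ErrorRecord("DIMM_B2", row=1, column=1, correctable=False))
print(tracker.scores)  # {'DIMM_A1': 90, 'DIMM_B2': 50}
```

Keeping the scoring function as a swappable parameter mirrors the separation in the text: the firmware collects and tracks, while the prediction model can be updated independently.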

AMI’s default implementation provides the current memory module health score information in the Web UI for the BMC and exposes the same memory health score information via RESTful APIs following DMTF Redfish standards. 

The RESTful APIs allow for easy integration with existing data center management software. However, for those data centers less inclined to integrate with their own software, AMI offers a data management tool called AMI Composer, developed to be fully compliant with the Intel Rack Scale Design and DMTF Redfish standards, which will aggregate all information and provide it through a single web-based dashboard.
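For teams that do integrate directly, a minimal sketch of reading memory health from a Redfish Memory resource might look like this. `Status.Health` is a standard Redfish property; the location of the MFP health score is not given in this article, so `ASSUMED_OEM_PATH` is a purely hypothetical placeholder that would need to be replaced with the property documented for the BMC in use. The sample payload is hand-written for illustration; in a deployment it would come from an authenticated GET on a member of a system's Memory collection.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical location of the vendor health score inside the Redfish
# "Oem" object; the real property name must come from the BMC's schema.
ASSUMED_OEM_PATH = ("Oem", "ExampleVendor", "MemoryHealthScore")


def summarize_memory(resource: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Extract the standard health status and the assumed OEM score."""
    node: Any = resource
    for key in ASSUMED_OEM_PATH:
        # Walk down the nested objects; give up with None if a level is missing.
        node = node.get(key) if isinstance(node, dict) else None
    return {
        "id": resource.get("Id"),
        "health": resource.get("Status", {}).get("Health"),  # standard property
        "score": node,
    }


sample = json.loads("""
{
  "Id": "DIMM1",
  "MemoryDeviceType": "DDR4",
  "Status": {"State": "Enabled", "Health": "OK"},
  "Oem": {"ExampleVendor": {"MemoryHealthScore": 87}}
}
""")
print(summarize_memory(sample))  # {'id': 'DIMM1', 'health': 'OK', 'score': 87}
```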
 

Immediate benefits for data centers and cloud service providers 

Of course, a machine learning algorithm is never actually complete. The current Intel MFP model supports DDR4 memory modules running on platforms with Intel Xeon Scalable processors, and Intel continues to collect more information about memory errors and failing memory modules to improve its models. 

Additionally, when new memory module types are introduced to the industry or improvements to existing technologies are rolled out, Intel MFP will support them. 

Most importantly, all such changes will be properly analyzed for inclusion in the MFP model, and as Intel updates the model, AMI will provide easy-to-implement updates to the technologies already delivered to industry partners.

For data centers and cloud service providers, the benefits of adding Intel MFP support in Aptio V UEFI Firmware and MegaRAC BMC Firmware are clear and immediate. Data center SLAs are improved. DIMM failure rates are reduced through proactive memory health evaluation and enhanced memory page offlining policies. 

And, most importantly, higher DIMM performance and reliability improve workload and virtual machine (VM) migration decision-making, boosting efficiency and flexibility while reducing total cost of ownership.

Companies looking to take advantage of Intel MFP on systems with AMI Aptio V UEFI BIOS and MegaRAC BMC firmware are advised to ask their system manufacturer to include the AMI with Intel MFP option pack for MegaRAC BMC Firmware and the AMI with Intel Memory Failure Prediction eModule for Aptio UEFI Firmware. 
