
The ghost in the machine that maximizes uptime

01 Apr 2020

Article by Intel Data Center Management Solutions general manager Jeff Klaus.

China is the largest eCommerce market in the world, projected to reach US$1.1 trillion in 2023, up from US$572 billion in 2017, according to Statista. The combined forces of the country’s robust economic growth, a rapidly emerging middle class, a large population of computer-literate consumers, and the proliferation of smartphones are among the major factors driving the sale of physical goods and services through digital channels.

Headquartered in Beijing, China, Meituan-Dianping (Meituan) offers an online delivery and social commerce platform whose apps connect consumers with local businesses for food delivery, groceries, restaurant recommendations, hotel bookings, movie tickets, bike sharing, and health and fitness products and services. 

Named last year to Fast Company's top-50 list of the world’s most innovative companies, China-based Meituan is keenly focused on developing new ways to make its delivery platform more cost-effective and efficient. The firm also draws on a wealth of data to help local businesses find new opportunities in the marketplace, such as deciding where to expand with new restaurants or retail locations. Like any thriving eCommerce company that strives to stay competitive, Meituan must be able to rely on the health of its data center infrastructure.

Staying online and open for business

A Ponemon Institute study found that the average cost of an outage in the eCommerce sector is $758,000. For an eCommerce business, however, downtime can cause collateral damage that goes far beyond the immediate loss of revenue: brand reputation, productivity, and search engine optimization (SEO) ranking can also suffer in the wake of an outage.

Imagine pulling down the shade and locking the front door of your local business just as a customer arrives at the threshold. In effect, that is exactly what happens when an eCommerce company suffers a downtime event and customers can’t access its app.

Memory failures are among the top three hardware failures in data centers today. Intel Memory Failure Prediction (Intel MFP) is well suited to organizations operating online services platforms like Meituan, as well as to cloud service providers that depend heavily on server hardware reliability, availability and serviceability (RAS). Intel MFP helps significantly reduce memory failure events by analyzing memory error data and predicting catastrophic failures before they happen.

Recently, Meituan deployed Intel MFP in a test environment of several thousand servers based on Intel Xeon Scalable processors to help improve the performance and reliability of its server memory, which is essential in a data analytics computing environment. Intel MFP uses machine learning to analyze server memory errors down to the dual in-line memory module (DIMM), bank, column, row, and cell levels and generate a memory health score, which can be used to predict potential failures.
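Intel MFP's machine-learning model is not public, so the sketch below only illustrates the general idea of turning error locations into a per-DIMM score. It assumes a hypothetical `CorrectableError` record and replaces the model with a simple, hand-weighted rule: errors spread across many rows, columns or banks are penalized more heavily than repeated errors in a single cell. The weights are arbitrary assumptions, not Intel's.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectableError:
    """One corrected memory error, located down to the cell (hypothetical layout)."""
    dimm: str
    bank: int
    row: int
    column: int


def error_features(errors):
    """Count distinct faulty cells, rows, columns and banks per DIMM."""
    cells, rows, cols, banks = (defaultdict(set) for _ in range(4))
    for e in errors:
        cells[e.dimm].add((e.bank, e.row, e.column))  # repeats of one cell count once
        rows[e.dimm].add((e.bank, e.row))
        cols[e.dimm].add((e.bank, e.column))
        banks[e.dimm].add(e.bank)
    return {
        d: {"cells": len(cells[d]), "rows": len(rows[d]),
            "columns": len(cols[d]), "banks": len(banks[d])}
        for d in cells
    }


def health_score(f):
    """Hypothetical rule-based score in [0, 100]; higher means healthier.

    A single bad cell costs 1 point; each extra affected row or column costs 5,
    and each extra affected bank costs 20, because wider fault patterns are
    assumed to be more likely to escalate into uncorrectable errors.
    """
    penalty = (f["cells"]
               + 5 * max(f["rows"] - 1, 0)
               + 5 * max(f["columns"] - 1, 0)
               + 20 * max(f["banks"] - 1, 0))
    return max(0, 100 - penalty)
```

A DIMM whose errors stay in one row scores higher than one whose errors hit two banks, which is the kind of distinction a learned model would be expected to make with far more features.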

Maintaining SLAs and maximizing uptime

Meituan monitored the health of its servers’ memory modules by integrating Intel MFP into its existing data center management solution. By analyzing error data already collected by that management software, Meituan was able to generate a prediction score for each Dynamic Random Access Memory (DRAM) module and then take appropriate action to meet its service level agreements (SLAs) and maximize service uptime.

The memory health scores generated by Intel MFP helped Meituan make memory reliability-aware decisions in workload scheduling, such as migrating critical tasks from distressed servers to healthier ones, giving operators ample time to act and avoid critical application crashes. Moreover, by analyzing memory errors and predicting failures before they happen, Intel MFP helped Meituan improve its DIMM replacement strategy.
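The article does not describe Meituan's scheduler, so the following is only a minimal sketch of what a reliability-aware migration rule could look like. It assumes hypothetical inputs: a map of tasks to their current server and a "critical" flag, a per-server memory health score (higher is healthier), a count of free task slots per server, and an arbitrary threshold of 70 below which a server counts as distressed.

```python
from typing import Dict, List, Tuple


def plan_migrations(
    tasks: Dict[str, Tuple[str, bool]],
    health: Dict[str, float],
    free_slots: Dict[str, int],
    threshold: float = 70.0,
) -> List[Tuple[str, str, str]]:
    """Plan moves of critical tasks off servers whose memory health is below `threshold`.

    tasks:      task name -> (current server, is_critical)
    health:     server -> memory health score (higher is healthier)
    free_slots: server -> number of additional tasks it can accept
    Returns a list of (task, source server, target server) moves.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    healthy = sorted(
        (s for s, score in health.items() if score >= threshold),
        key=lambda s: health[s],
        reverse=True,  # try the healthiest targets first
    )
    moves = []
    for task, (server, critical) in sorted(tasks.items()):
        # Servers with no score are treated as healthy and left alone.
        if not critical or health.get(server, 100.0) >= threshold:
            continue
        target = next((s for s in healthy if slots.get(s, 0) > 0), None)
        if target is None:
            continue  # no spare capacity; this case would need manual action
        slots[target] -= 1
        moves.append((task, server, target))
    return moves
```

For example, with `s1` scoring 40 and `s2`/`s3` scoring 90 and 95, the critical tasks on `s1` are sent to `s3` until it is full and then to `s2`, while non-critical tasks stay put.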

Intel MFP can also help Meituan optimize operating system (OS) page offlining. When the number of errors in a specific memory region suddenly spikes, that region is likely to fail soon. By detecting such bursts early, Intel MFP can recommend disabling the faulty memory pages so that they are never used again, reducing the risk of uncorrectable errors. Page offlining has become critical for large-scale data centers.
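How Intel MFP detects error bursts is not described here. As a rough illustration only, the sketch below flags a page once a chosen number of corrected errors land on it within a sliding time window. The 4 KiB page size, one-hour window and three-error threshold are assumptions. On Linux, a flagged page could then be soft-offlined by writing its physical address to `/sys/devices/system/memory/soft_offline_page`. The sketch stops before that step and only selects candidate pages.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple

PAGE_SIZE = 4096  # assumes 4 KiB pages


def pages_to_offline(
    errors: Iterable[Tuple[float, int]],
    window_s: float = 3600.0,
    burst_threshold: int = 3,
) -> List[int]:
    """Select pages that see a burst of corrected errors.

    errors: (timestamp in seconds, physical address) pairs, in time order.
    A page is flagged once `burst_threshold` errors land on it within
    `window_s` seconds. Returns the flagged page base addresses, sorted.
    """
    recent = defaultdict(deque)  # page base address -> timestamps inside the window
    flagged: Set[int] = set()
    for ts, addr in errors:
        page = addr - addr % PAGE_SIZE  # round the address down to its page
        q = recent[page]
        q.append(ts)
        while ts - q[0] > window_s:  # forget errors older than the window
            q.popleft()
        if len(q) >= burst_threshold:
            flagged.add(page)
    return sorted(flagged)
```

Here three errors within 20 seconds on the page at `0x1000` would flag it, while three errors spread over several hours on another page would not.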

By integrating Intel MFP into its data center management solution, Meituan was able to analyze the memory health of the servers in its test environment. This helped Meituan predict failures before they happened and make informed decisions, such as offlining pages and migrating workloads and tasks to other servers.

The initial Intel MFP test deployment indicated that if Meituan deployed the solution across its full server network, server crashes caused by hardware failures could be reduced by up to 40%, delivering a better experience for hundreds of millions of its customers and local vendors. 

Worldwide, eCommerce sales amounted to $3.53 trillion last year and are projected to grow to $6.54 trillion in 2022. While desktop PCs remain the most popular device for placing online orders, mobile devices, particularly smartphones, are quickly closing the gap. In fact, in China, four out of five eCommerce dollars were generated from mobile devices last year, according to a report by eMarketer. 

As the consumers of Asia and the rest of the world continue on the path towards unfettered mobility, and as the IoT becomes more and more integrated with the apps that increasingly affect how we live and do business, the demands on data center infrastructure will only become greater. 

Hence, for organizations operating online services platforms that depend on maximum availability, solutions such as Intel MFP, which provides real-time visibility into server memory health and can predict catastrophic server memory failures before they happen, will become critical to their ongoing innovation and expansion. 
