The limitations of an outdated remote data center monitoring system

28 Feb 17

Data center monitoring services have been around for over 10 years. In that time, many of these systems have not been updated to keep pace with changing data center technologies.

As a result, the lives of systems administrators have become more complex and maintaining data center uptime has become more of a challenge.

When compared to the systems of 10 years ago, modern data center power and cooling infrastructure has become more intelligent.

With more built-in data points, these systems produce, on average, 300% more alarm notifications than they did in the past. As a result, data center staff have to deal with much more alarm support “busy work”.

The whole point of monitoring data centers is to reduce the risk of downtime by identifying and addressing a state change before an uptime-threatening incident occurs.

This becomes a challenge when alarm fatigue overwhelms the staff, when no unified monitoring platform exists (i.e., individual power and cooling devices have their own native management solution), and when administrators find themselves having to contact various vendor customer support lines for help.

Traditional remote monitoring is not an online service, so it cannot provide real-time monitoring. Instead, these older systems produce intermittent status updates, oftentimes via email. New digital remote monitoring systems are connected to a data center, usually through a gateway.

Therefore, these new systems can employ IT services such as cloud storage and data analytics to help system administrators cope with the vast increase of equipment performance data.

Simplicity = efficiency

New online monitoring systems simplify system administrator work because they employ big data analytics and machine learning techniques.

Big data analytics are supported by software tools that process the monitoring system data so that decisions can be made on which actions to take. Big data analytics are required when data volumes increase, when data becomes unstructured (e.g., emails, free-form text fields, or trouble tickets), and when data is processed in real time.

Machine learning is related to data analytics in that it uses data to make predictions. However, it also improves the overall support model by factoring in results from previous learning. That means the monitoring system gets smarter over time.

These tools also streamline how data center operators manage systems uptime. In the case of a data center remote monitoring service, event processing and prioritization of alarms can be much more efficiently managed. Network Operation Center (NOC) experts can notify and guide systems operators during an event that triggers multiple alarms. Alarm consolidation can convert multiple alarms from the same device into a single incident.
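To make the idea of alarm consolidation concrete, here is a minimal Python sketch. It is not how any particular monitoring service works; the `Alarm` and `Incident` classes, the device names, and the five-minute window are all assumptions chosen for illustration. Alarms from the same device that arrive close together are folded into one incident.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    device_id: str    # hypothetical identifier, e.g. "ups-01"
    timestamp: float  # seconds since an arbitrary start time
    message: str


@dataclass
class Incident:
    device_id: str
    first_seen: float
    last_seen: float
    messages: list = field(default_factory=list)


def consolidate(alarms, window_seconds=300.0):
    """Group alarms from the same device into incidents.

    A new incident starts when a device's alarm arrives more than
    window_seconds after that device's previous alarm (assumed rule).
    """
    open_incident = {}  # device_id -> most recent incident for that device
    incidents = []
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        current = open_incident.get(alarm.device_id)
        if current is None or alarm.timestamp - current.last_seen > window_seconds:
            current = Incident(alarm.device_id, alarm.timestamp, alarm.timestamp)
            open_incident[alarm.device_id] = current
            incidents.append(current)
        current.last_seen = alarm.timestamp
        current.messages.append(alarm.message)
    return incidents


alarms = [
    Alarm("ups-01", 0, "on battery"),
    Alarm("ups-01", 30, "low battery"),
    Alarm("crac-02", 40, "high temperature"),
    Alarm("ups-01", 1000, "on battery"),
]
incidents = consolidate(alarms)
```

In this example, the four alarms become three incidents: the first two UPS alarms are merged, while the later UPS alarm falls outside the window and opens a new incident.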

Since so many data center operators now use mobile devices as a common interface into systems, automatic trouble ticket generation can be provided through a mobile app which can track incidents via live chats and instant messages. Contextual alarms can provide administrators with useful information like the origin of the problem (e.g. data center X, data hall Y, rack 15C), who’s involved, the number of alarms generated, and what to check first.

Event correlation and root cause analysis evaluate multiple alarms, deduce possible causes, and propose possible solutions. This correlation process, performed by domain experts in a NOC, can be combined with machine learning so that future downtime incidents can be avoided.
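One simple way to picture event correlation is a rule that looks for equipment shared by several alarming devices. The Python sketch below is only an illustration under stated assumptions: the `UPSTREAM` topology map, the device names, and the `min_affected` threshold are hypothetical, and a real service would rely on NOC expertise and learned models rather than a single rule.

```python
from collections import Counter

# Hypothetical topology: each device maps to the upstream device that feeds it.
UPSTREAM = {
    "rack-15C-pdu": "ups-01",
    "rack-15D-pdu": "ups-01",
    "crac-02": "chiller-01",
}


def likely_root_causes(alarming_devices, upstream=UPSTREAM, min_affected=2):
    """Rank shared upstream devices by how many alarming devices they feed.

    An upstream device is reported only if at least min_affected
    distinct alarming devices depend on it (assumed threshold).
    """
    counts = Counter(
        upstream[device] for device in set(alarming_devices) if device in upstream
    )
    return [(device, n) for device, n in counts.most_common() if n >= min_affected]


candidates = likely_root_causes(["rack-15C-pdu", "rack-15D-pdu", "crac-02"])
```

Here two rack power units that alarm together point to their shared UPS as the first thing to check, while the single cooling alarm is not enough on its own to implicate the chiller.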

Data centers are on a path to become more reliable and efficient through the use of digital remote monitoring. However, this can only happen with platforms that interpret and leverage the data generated by the physical infrastructure in a data center.

Article by Victor Avelar, Schneider Electric Data Center Blog
