
The limitations of an outdated remote data center monitoring system

28 Feb 2017

Data center monitoring services have been around for over 10 years, yet many of these systems have not been updated to reflect changing data center technologies.

As a result, the lives of systems administrators have become more complex and maintaining data center uptime has become more of a challenge.

When compared to the systems of 10 years ago, modern data center power and cooling infrastructure has become more intelligent.

With more built-in data points, these systems produce, on average, 300% more alarm notifications than they did in the past, leaving data center staff to deal with far more alarm-support “busy work”.

The whole point of monitoring data centers is to reduce the risk of downtime by identifying and addressing a state change before an uptime-threatening incident occurs.

This becomes a challenge when alarm fatigue overwhelms the staff, when no unified monitoring platform exists (i.e., individual power and cooling devices have their own native management solution), and when administrators find themselves having to contact various vendor customer support lines for help.

Traditional remote monitoring is not an online service and therefore cannot provide real-time monitoring. Instead, these older systems produce intermittent status updates, often via email. Newer digital remote monitoring systems are connected to a data center, usually through a gateway.

Therefore, these new systems can employ IT services such as cloud storage and data analytics to help systems administrators cope with the vast increase in equipment performance data.
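For illustration, here is a minimal sketch of that gateway pattern in Python. The endpoint URL, device names, and readings below are invented for the example; they are assumptions, not any vendor's real API.

```python
import json
import time
import urllib.request

# Hypothetical cloud ingestion endpoint -- a stand-in, not a real service.
INGEST_URL = "https://example.com/telemetry/ingest"

def read_device_points():
    """Stand-in for polling local power and cooling devices."""
    return [
        {"device": "ups-01", "metric": "load_pct", "value": 42.5},
        {"device": "crac-03", "metric": "supply_temp_c", "value": 18.2},
    ]

def push_to_cloud(points):
    """Forward a batch of readings from the gateway to cloud storage."""
    payload = json.dumps({"ts": time.time(), "points": points}).encode()
    req = urllib.request.Request(
        INGEST_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    push_to_cloud(read_device_points())
```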

Simplicity = efficiency

New online monitoring systems simplify systems administrators' work because they employ big data analytics and machine learning techniques.

Big data analytics are supported by software tools that process the monitoring system's data so that decisions can be made on which actions to take. Big data analytics are required when data volumes increase, when data becomes unstructured (e.g., emails, free-form text fields, or trouble tickets), and when data must be processed in real time.

Machine learning is related to data analytics in that it uses data to make predictions. However, it also improves the overall support model by factoring in results from previous learning. That means the monitoring system gets smarter over time.
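As a rough illustration of that feedback loop, the sketch below uses scikit-learn's SGDClassifier, whose partial_fit method updates a model incrementally as newly labeled incidents arrive. The features and data are synthetic, invented purely for the example.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy features per alarm: [alarms_last_hour, device_age_years, load_pct]
# Label: 1 if the alarm preceded a real downtime incident, else 0.
X_week1 = np.array([[12, 3.0, 85.0], [1, 0.5, 40.0], [9, 5.0, 90.0]])
y_week1 = np.array([1, 0, 1])

model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X_week1, y_week1, classes=[0, 1])

# As new incidents are resolved, their outcomes feed back into the
# model, so predictions improve over time without retraining from scratch.
X_week2 = np.array([[2, 1.0, 55.0], [15, 7.0, 95.0]])
y_week2 = np.array([0, 1])
model.partial_fit(X_week2, y_week2)

print(model.predict(np.array([[10, 4.0, 88.0]])))  # risk call on a new alarm
```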

These tools also streamline how data center operators manage system uptime. In the case of a data center remote monitoring service, event processing and alarm prioritization can be managed much more efficiently. Network Operations Center (NOC) experts can notify and guide systems operators during an event that triggers multiple alarms. Alarm consolidation can convert multiple alarms from the same device into a single incident.
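A minimal sketch of that consolidation step, assuming alarms arrive as simple records (the field names here are hypothetical, chosen for illustration):

```python
from collections import defaultdict

def consolidate(alarms):
    """Collapse multiple alarms from the same device into one incident."""
    by_device = defaultdict(list)
    for alarm in alarms:
        by_device[alarm["device"]].append(alarm)

    incidents = []
    for device, group in by_device.items():
        incidents.append({
            "device": device,
            "alarm_count": len(group),
            "first_seen": min(a["ts"] for a in group),
            "messages": [a["msg"] for a in group],
        })
    return incidents

alarms = [
    {"device": "ups-01", "ts": 100, "msg": "On battery"},
    {"device": "ups-01", "ts": 101, "msg": "Low battery"},
    {"device": "crac-03", "ts": 102, "msg": "High return temp"},
]
print(consolidate(alarms))  # 3 alarms -> 2 incidents
```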

Since so many data center operators now use mobile devices as a common interface into systems, automatic trouble-ticket generation can be provided through a mobile app that tracks incidents via live chat and instant messages. Contextual alarms can provide administrators with useful information such as the origin of the problem (e.g. data center X, data hall Y, rack 15C), who’s involved, the number of alarms generated, and what to check first.
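A contextual alarm might carry a payload along these lines; the field names are illustrative, not any particular product's schema:

```python
from dataclasses import dataclass

@dataclass
class ContextualAlarm:
    """Illustrative context record mirroring the fields described above."""
    origin: str              # e.g. "Data center X / data hall Y / rack 15C"
    assigned_to: list[str]   # who's involved
    alarm_count: int         # number of alarms generated
    check_first: str         # suggested first diagnostic step

incident = ContextualAlarm(
    origin="DC-X / Hall-Y / Rack-15C",
    assigned_to=["noc-team", "facilities-oncall"],
    alarm_count=7,
    check_first="Verify UPS-01 input power feed",
)
```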

Event correlation and root-cause analysis can be performed to evaluate multiple alarms, deduce possible causes, and propose possible solutions. This correlation process, performed by domain experts in a NOC, can be combined with machine learning so that future downtime incidents can be avoided.
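As a toy illustration of correlation, the sketch below clusters incidents that occur close together in time and treats the earliest event in each cluster as a naive root-cause candidate. The time window, fields, and heuristic are assumptions for the example, a crude stand-in for what a NOC expert or learned model would do.

```python
def correlate(incidents, window_s=60):
    """Group incidents whose timestamps fall within window_s of a neighbor."""
    incidents = sorted(incidents, key=lambda i: i["ts"])
    clusters, current = [], []
    for inc in incidents:
        if current and inc["ts"] - current[-1]["ts"] > window_s:
            clusters.append(current)
            current = []
        current.append(inc)
    if current:
        clusters.append(current)
    return clusters

incidents = [
    {"ts": 100, "device": "pdu-02", "msg": "Breaker trip"},
    {"ts": 130, "device": "ups-01", "msg": "On battery"},
    {"ts": 900, "device": "crac-03", "msg": "Filter alarm"},
]
for cluster in correlate(incidents):
    # The earliest event in a cluster is a naive root-cause candidate.
    print("Possible root cause:", cluster[0]["msg"])
```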

Data centers are on a path to becoming more reliable and efficient through the use of digital remote monitoring. However, this can only happen with platforms that interpret and leverage the data generated by the physical infrastructure in a data center.

Article by Victor Avelar, Schneider Electric Data Center Blog
