
Lessons learned from running the world’s largest data centers

15 Jan 18

While managing facility operations for large data centers certainly takes specialized skills in a range of disciplines, the more you do it, the better you get at it.

Given that Schneider Electric has more than 800 people managing facility operations for some 100 large data centers around the globe, it’s fair to say we’ve learned a great deal.

In fact, I recently watched a webinar a colleague of mine presented on exactly this topic, “Lessons Learned from Running the World’s Largest Data Centers.”

In this post, I’ll pass along at least a few of those lessons (and invite you to check out the webinar for the rest).

Most of the lessons we’ve learned fall into one of five general categories:

  • Competency
  • Standardization
  • Risk management
  • Tracking and reporting
  • Operation and maintenance costs

Competency

In terms of competency, the main issue is that most companies’ expertise lies in areas other than data center management, a topic we covered in a previous post.

That’s as it should be.

If you’re in, say, retail, healthcare or manufacturing, your expertise lies in those areas; the data center is merely a supporting function.

But it becomes a problem if you want to run the data center with internal employees, because there isn’t a large pool of skilled workers to draw from. I’ve been to conferences where entire panels were dedicated to training millennials in data center operations, and universities are only now starting programs to address the gap.

As a result, we routinely see companies with data center infrastructure management (DCIM) and other tools installed, but they’re not using them to their full extent – because they simply don’t have the appropriate expertise.

Standardization

With respect to standardization, companies tend to run into trouble after mergers and acquisitions, or if they experience rapid growth.

They wind up with a series of data centers, with no common set of standards in terms of how to operate them.

Whether you’ve got two data centers or 20, you need to share learnings among all of them.

Schneider Electric’s standards and procedures are best in class in part because we are diligent about sharing what we learn at each of the 100 or so data centers we operate. We use those learnings to continually update our processes and procedures, so that when a problem occurs, we have sound emergency procedures in place to follow.

Those procedures should include back-out steps to follow if something unexpected happens after a data center change, to keep the issue from getting worse.

Risk management

Such procedures are closely tied to risk management. One of the big lessons here is to take a full-system approach to data center management.

If you need to take a component out of service to perform maintenance, for example, you need to first understand the impact and dependencies of that component with respect to the rest of the data center.

Doing so requires a thorough understanding of the data center.
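
As a rough illustration of that full-system view, the sketch below models a site's power chain as a map from each component to the upstream feeds it can draw from, then checks what loses supply, and what loses redundancy, if one component is taken out for maintenance. The component names and topology are hypothetical, and a real assessment would also cover cooling, controls and transfer-switch behavior that this sketch ignores.

```python
# Minimal sketch: maintenance impact analysis on a HYPOTHETICAL power topology.
# Each component lists the upstream feeds it can draw from; a component stays
# live if it is in service and at least one of its feeds is live.
FEEDS = {
    "utility_A": [],                # no upstream feeds: treated as a utility input
    "utility_B": [],
    "ups_1": ["utility_A"],
    "ups_2": ["utility_B"],
    "pdu_1": ["ups_1"],
    "pdu_2": ["ups_2"],
    "rack_10": ["pdu_1", "pdu_2"],  # dual-corded rack (redundant paths)
    "rack_11": ["pdu_1"],           # single-corded rack
}


def live_components(feeds, offline):
    """Return the set of components that still have a live supply path."""
    live = set()
    changed = True
    while changed:  # propagate downstream until nothing new becomes live
        changed = False
        for name, sources in feeds.items():
            if name in live or name in offline:
                continue
            if not sources or any(s in live for s in sources):
                live.add(name)
                changed = True
    return live


def maintenance_impact(feeds, component):
    """Components that lose supply, and those left with a single live feed,
    if `component` is taken out of service."""
    before = live_components(feeds, set())
    after = live_components(feeds, {component})
    lost = sorted(before - after - {component})
    degraded = sorted(
        name
        for name, sources in feeds.items()
        if name in after and len(sources) > 1
        and sum(s in after for s in sources) == 1
    )
    return lost, degraded


if __name__ == "__main__":
    print(maintenance_impact(FEEDS, "ups_1"))  # (['pdu_1', 'rack_11'], ['rack_10'])
```

Taking ups_1 offline in this made-up topology drops the single-corded rack_11 entirely and leaves the dual-corded rack_10 running on one path, which is exactly the kind of dependency you want to know about before the work is scheduled.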

For any data center we manage, Schneider Electric likes to get in on the construction phase, or as close to it as possible.

That way we gain a thorough understanding of the architectural drawings, piping, wiring and so forth, knowledge that helps mitigate the risks inherent in operating a data center.

Tracking and reporting

Tracking and reporting is an area that gets overlooked far too often, and the result is wasted operational spending.

With proper tracking and reporting, you should be able to identify stranded IT capacity – that old rack of servers over in the corner, for example, that nobody is really sure still serves a purpose. (We’ve all seen those, right?) 

Reclaiming that capacity can help you stave off a data center expansion by getting more out of the space you’ve already got.
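
Spotting stranded capacity largely comes down to joining utilization data with ownership records. The sketch below assumes a hypothetical inventory where each server carries a 30-day average CPU figure, its power draw and an owner field; the field names and the 5% idle threshold are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """One row of a HYPOTHETICAL asset inventory; field names are illustrative."""
    asset_id: str
    rack: str
    avg_cpu_pct: float      # e.g. a 30-day average pulled from monitoring
    power_watts: float      # measured (or nameplate) power draw
    owner: Optional[str]    # application or business owner on record, if any


def stranded_candidates(servers, cpu_threshold_pct=5.0):
    """Servers that are nearly idle AND have no recorded owner."""
    return [s for s in servers if s.avg_cpu_pct < cpu_threshold_pct and not s.owner]


def reclaimable_power_watts(servers, cpu_threshold_pct=5.0):
    """Total power drawn by the flagged servers."""
    return sum(s.power_watts for s in stranded_candidates(servers, cpu_threshold_pct))


# Made-up sample data for illustration only.
inventory = [
    Server("srv-001", "R12", 42.0, 350.0, "billing"),
    Server("srv-002", "R40", 1.5, 280.0, None),
    Server("srv-003", "R40", 0.8, 300.0, None),
    Server("srv-004", "R40", 2.0, 310.0, "legacy-crm"),
]

if __name__ == "__main__":
    print([s.asset_id for s in stranded_candidates(inventory)])  # ['srv-002', 'srv-003']
    print(reclaimable_power_watts(inventory))                    # 580.0
```

Flagged servers are candidates for investigation rather than automatic decommissioning; requiring both low utilization and a missing owner keeps idle-but-claimed systems like srv-004 off the list.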

Operation and maintenance costs

Which leads to the final area, operation and maintenance costs.

We’ve learned plenty of lessons about keeping these costs down, such as using condition-based and predictive maintenance to replace components only when they actually need it, rather than when a fixed schedule says they do.

And if you effectively track your assets (see previous point), then you can start determining which ones require the most maintenance – and potentially save money by replacing them. 

Article by Anthony DeSpirito, Schneider Electric Data Center Blog 
