A Ponemon report found human error accounts for nearly one quarter of all unplanned data center downtime, which Gartner says costs the average company $300,000 per hour.
To reduce the amount of human error in data center management, we would do well to learn lessons from what may seem an unlikely source: the U.S. Navy, and in particular nuclear submarines.
How nuclear subs relate to data center skills
While a nuclear submarine may seem like a completely different beast from a data center, the similarities in how the two should be managed are striking and numerous.
A nuclear sub contains a nuclear reactor plant, a steam plant, electrical and cooling plants, auxiliary systems and more – all stuffed into the back half of the sub.
You can imagine the complexity that goes into such a vessel, yet the Navy has succeeded in minimising human error in that environment by implementing detailed processes and policies – and ensuring they are consistently followed. In addition, multiple levels of system redundancy and interlocks exist, in many cases with a back-up system to the back-up system.
Still, whenever humans are involved, you can't completely eliminate the possibility of human error.
What the Navy can do is put an intense focus on the people serving on board. It starts with a competitive selection process, followed by 15 months of training before a sailor arrives on board. Once the sailor is on board, an intense training and qualification process continues indefinitely. Learning never stops.
Apply nuclear sub lessons to data center jobs
Data centers today need to be operated with this same kind of mission-critical mentality, and thus data center facilities managers should follow many of the same principles as the Navy.
It starts with hiring the right people. Schneider Electric makes no secret about the fact that it seeks out military veterans for its Data Center Facility Operations group, the folks who run some of the world's largest data centers.
We've found military veterans have the right background for success in data center careers. They understand the importance of having well-documented processes and procedures, and following them religiously.
In the data center, that means having standard operating procedures (SOPs) for everyday operations and methods of procedure (MOPs) for conducting maintenance routines.
Having an emergency operation procedure (EOP) that is easy to memorise and readily available is also priceless in a time of crisis.
Data center personnel must know exactly how to stabilise a data center should a generator fail to start or a breaker trip unexpectedly.
The U.S. Navy has formal training around the methodical sharing of information using status boards, change control processes, and documentation of all maintenance.
These are all sound practices for running any mission-critical facility, including a data center.
Finally, data center personnel, like the sailors on those nuclear subs, should always be learning.
Continuing education, through on-the-job training, formal schooling and periodic drills, is imperative to minimising human error and fostering continued process improvement.
That's why Schneider Electric has a formal Critical Environment Technician (CET) training programme in place for the folks who run our customers' data centers.
They learn data center skills, including how to use advanced monitoring and management tools such as EcoStruxure IT effectively to ensure data center uptime.
The programme is also crucial for employee retention, which is a big issue in the data center realm; so long as employees are learning, they tend to want to stay.