Google's data center security and design best practices

Tue, 4th Oct 2016

FYI, this story is more than a year old

Google's focus on security and protection of data is a key design criteria. Our physical security features a layered security model, including safeguards like custom-designed electronic access cards, alarms, vehicle access barriers, perimeter fencing, metal detectors and biometrics. The data center floor features laser beam intrusion detection.

Our data centers are monitored 24/7 by high-resolution interior and exterior cameras that can detect and track intruders. Access logs, activity records and camera footage are available in case an incident occurs.

Data centers are also routinely patrolled by experienced security guards who have undergone rigorous background checks and training (look closely and you can see a couple of them in this 360 degree data center tour).

As you get closer to the data center floor, security measures increase. Access to the data center floor is only possible via a security corridor which implements multi-factor access control using security badges and biometrics. Only approved employees with specific roles may enter. Less than one percent of Google employees will ever set foot in one of our data centers.

We employ a very strict end-to-end chain of custody for storage, tracking everything from cradle to grave, from the first time a HD goes into a machine until it's verified clean/erased or destroyed. Information security and physical security go hand-in-hand. Data is most vulnerable to unauthorised access as it travels across the Internet or within networks.

For this reason, securing data in transit is a high priority for Google. Data traveling between a customer's device and Google is encrypted using HTTPS/TLS (Transport Layer Security). Google was the first major cloud provider to enable HTTPS/TLS by default.

We build our own hardware and monitoring systems

Google servers don't include unnecessary components such as video cards, chipsets or peripheral connectors, which can introduce vulnerabilities. Our production servers run a custom-designed operating system (OS) based on a stripped-down and hardened version of Linux.

Google's servers and their OS are designed for the sole purpose of providing Google services. Server resources are dynamically allocated, allowing for flexibility in growth and the ability to adapt quickly and efficiently, adding or reallocating resources based on customer demand.

For these teams to be successful they must have advanced, real-time visibility into the status and functionality of our infrastructure. As you might know, Google is obsessed with data, which is a bit of an understatement.

To aid our teams, we've built monitoring and controls systems for all functional areas, from the servers, storage and networking systems, to the electrical distribution, mechanical cooling systems and security systems. We're monitoring all aspects of performance and operations from “chip to chiller.

Using machine learning to optimize data center operations

To help in this endeavor, we're using our machine learning / deep learning algorithms for data center operations. As you can imagine, our data centers are large and complex, with electrical, mechanical and controls systems all working together to deliver optimal performance.

Because of the sheer number of interactions and possible settings for these systems, it's impossible for mere mortals to visualise how best to optimise the data center in real time. However, it's fairly trivial for computers to crunch through these possible scenarios and find the optimal settings.

Over the past couple years we've developed this algorithm and trained it with billions of data points from our sites all over the world. We now use this machine learning model to help visualize the data so the operations teams can set up the data center electrical and cooling plants for the optimal, most efficient performance on any given day considering up to 19 independent variables that affect performance. This helps the team identify discontinuities or efficiency inflection points that aren't intuitive.

Powered by renewable energy

On the energy side, we're committed to powering our infrastructure with renewable energy. We're the world's largest private investor in renewable energy. To date we've invested more than $2 billion in renewable energy Power Purchase Agreements.

These PPA's are very important because (1) we're buying the entire output of wind and solar farms for long periods, typically 10-20 years, (2) these wind farms are on the same power grids as our data centers, and (3) wind farms and data centers sharing power grids gives the project developer the financial commitment they need to get the project built, so we know our investment is adding renewable power to the grid that wouldn't otherwise have been added.

For cooling, we've redesigned our fundamental cooling technology on average about every 12-18 months. Along the way, we've developed and pioneered innovations in water-based cooling systems such as seawater cooling, industrial canal water cooling, recycled / grey water cooling, stormwater capture and reuse, rainwater harvesting and thermal energy storage.

We've designed data centers that don't use water-based solutions, instead using 100% outside air cooling. The point is there's no "one size fits all" model here. Each data center is designed for the highest performance and highest efficiency for that specific location.

Google employees operate our data centers, not third parties

The norm in the industry is for the design and building contractor to drop off a set of owners manuals and drawings along with the keys to the front door and wish the operator of the data center good luck! All too often these operations teams aren't employed by the owner, but rather an outsourced low-bidder. This is not the case at Google.

Our employees manage and operate our data centers. If there's one certainty in data center operations, it's that problems and faults will always happen in the middle of the night - typically on Sundays - when nobody else is around to help :-)

Engineering + operation teams are combined

We also take a different approach to the people we hire and how they run our data centers. Our engineers and operations professionals come from very diverse backgrounds but all have a common trait - they're systems thinkers. Many of our team members come from mission critical environments, like the Navy nuclear submarine program, where mistakes can be catastrophic - they understand how systems interact together.

Further, we've built regional site teams at all our data center campuses comprised of the engineers responsible for the design and construction, working side-by-side with the operations teams. Together these integrated teams are responsible for building capacity, commissioning the systems and providing 7x24 operations. This gives us an unparalleled level of ownership of our infrastructure.

Article by Joe Kava, VP, Data Center Operations, Google