
Google’s scalable supercomputers now publicly available

08 May 2019

In what it says is a bid to accelerate today's largest-scale machine learning (ML) applications, Google has opened up its supercomputers to the public.

The global tech giant has created silicon chips called Tensor Processing Units (TPUs), which, when assembled into multi-rack ML supercomputers called Cloud TPU Pods, can complete in minutes or hours ML workloads that previously took days or weeks on other systems.

Now, Google Cloud TPU v2 Pods and Cloud TPU v3 Pods are publicly available in beta to help ML researchers, engineers, and data scientists iterate faster and train more capable machine learning models.

“Google Cloud is committed to providing a full spectrum of ML accelerators, including both Cloud GPUs and Cloud TPUs. Cloud TPUs offer highly competitive performance and cost, often training cutting-edge deep learning models faster while delivering significant savings,” says Google Brain Team Cloud TPUs senior product manager Zak Stone.

The benefits for ML teams building complex models and training on large data sets, Stone says, include shorter time to insight, higher accuracy, frequent model updates, and rapid prototyping.

“While some custom silicon chips can only perform a single function, TPUs are fully programmable, which means that Cloud TPU Pods can accelerate a wide range of state-of-the-art ML workloads, including many of the most popular deep learning models,” says Stone.

“Cloud TPU customers see significant speed-ups in workloads spanning visual product search, financial modeling, energy production, and other areas. In a recent case study, Recursion Pharmaceuticals iteratively tests the viability of synthesized molecules to treat rare illnesses. What took over 24 hours to train on their on-prem cluster completed in only 15 minutes on a Cloud TPU Pod.”

According to Stone, a single Cloud TPU Pod can contain more than 1,000 individual TPU chips, connected by an ultra-fast, two-dimensional toroidal mesh network. The TPU software stack then uses this mesh network to enable many racks of machines to be programmed as a single, giant ML supercomputer via a variety of flexible, high-level APIs.
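The mesh network matters because distributed training leans heavily on collective operations such as all-reduce, which combine values held across many chips so that every chip ends up with the same result. As a rough sketch of the idea — plain Python with simulated chips, and a 1-D ring rather than the pod's 2-D torus — not Google's actual TPU software stack:

```python
# Toy simulation of a ring all-reduce: each "chip" holds a list of
# per-segment partial values; after the collective, every chip holds
# the element-wise sum. Illustrative sketch only.

def ring_all_reduce(chunks):
    """chunks[i] = list of n numbers held by chip i (n chips, n segments).
    Returns a copy in which every chip holds the summed vector."""
    n = len(chunks)
    chunks = [list(c) for c in chunks]
    # Phase 1: reduce-scatter. Each chip passes one segment to its
    # right-hand neighbour per step; after n-1 steps, chip i holds the
    # complete sum for segment (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n])
                 for i in range(n)]  # snapshot before applying
        for i, seg, val in sends:
            chunks[(i + 1) % n][seg] += val
    # Phase 2: all-gather. Circulate the finished segments so every
    # chip ends up with the full summed vector.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n])
                 for i in range(n)]
        for i, seg, val in sends:
            chunks[(i + 1) % n][seg] = val
    return chunks

grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
result = ring_all_reduce(grads)
# every chip now holds the element-wise sum [12.0, 15.0, 18.0]
```

A 2-D toroidal mesh generalizes this by running such collectives along both dimensions at once, which is part of what lets many racks of chips be programmed as one machine.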

“The latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each one delivers more than 100 petaFLOPs of computing power. In terms of raw mathematical operations per second, a Cloud TPU v3 Pod is comparable with a top 5 supercomputer worldwide (though it operates at lower numerical precision),” says Stone.
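The "lower numerical precision" refers to the bfloat16 format TPUs use for matrix operations: it keeps float32's full exponent range but only 7 mantissa bits, so nearby values collapse together. A minimal sketch of the effect in plain Python (this helper truncates the low mantissa bits for simplicity, whereas real hardware rounds to nearest):

```python
import struct

def to_bfloat16(x):
    """Reduce a Python float to bfloat16 precision (7 mantissa bits).
    Simplified: truncates rather than rounding, unlike real hardware."""
    # Pack as float32, zero the low 16 bits, unpack.
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    bits &= 0xFFFF0000
    return struct.unpack('>I'.replace('I', 'f'), struct.pack('>I', bits))[0]

print(to_bfloat16(1.0))      # 1.0
print(to_bfloat16(1.001))    # 1.0 — the 0.001 is below bfloat16 resolution
print(to_bfloat16(3.14159))  # 3.140625
```

Deep learning training tolerates this loss of precision well in practice, which is why trading mantissa bits for raw throughput pays off on ML workloads.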

“It’s also possible to use smaller sections of Cloud TPU Pods called ‘slices.’ We often see ML teams develop their initial models on individual Cloud TPU devices (which are generally available) and then expand to progressively larger Cloud TPU Pod slices via both data parallelism and model parallelism to achieve greater training speed and model scale.”
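Data parallelism, as described above, replicates the model on every chip and splits each batch across them; averaging the per-shard gradients then reproduces the full-batch gradient. A minimal sketch with a hypothetical one-parameter linear model in plain Python (an illustration of the idea, not the Cloud TPU API):

```python
# Illustrative data-parallel gradient step for a 1-D linear model
# y = w * x with mean-squared-error loss. Each simulated "chip"
# computes the gradient on its shard of the batch; averaging the
# per-shard gradients (equal-sized shards) matches the full-batch
# gradient exactly.

def grad(w, xs, ys):
    """dL/dw for L = mean((w*x - y)^2) over one shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_grad(w, xs, ys, num_chips):
    shard = len(xs) // num_chips
    grads = [grad(w, xs[i * shard:(i + 1) * shard],
                     ys[i * shard:(i + 1) * shard])
             for i in range(num_chips)]
    return sum(grads) / num_chips  # all-reduce, then average

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with true w = 2
w = 0.0
full = grad(w, xs, ys)
parallel = data_parallel_grad(w, xs, ys, num_chips=2)
print(full, parallel)  # both -30.0
```

Scaling to a larger Pod slice is then mostly a matter of adding shards (and correspondingly growing the batch), which is why teams can start on a single device and expand without restructuring the model.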
