DataCenterNews Asia Pacific - Specialist news for cloud & data center decision-makers

Google Cloud unveils AI-focused updates to Kubernetes Engine

Thu, 10th Apr 2025

Google Cloud has announced a series of enhancements to Google Kubernetes Engine (GKE) aimed at helping platform teams manage artificial intelligence (AI) workloads effectively at scale.

Headlining the improvements is Cluster Director for GKE, now generally available, which allows large clusters of accelerated virtual machines (VMs) to be deployed and managed as a single unit. Gabe Monroy, Vice President of Cloud Runtimes at Google, spoke to the significance of the enhancements, acknowledging that adapting to the new AI era can be demanding. He stated, "The age of AI is now. In fact, the global AI infrastructure market is on track to increase to more than USD $200 billion by 2028."

Monroy further indicated that existing Kubernetes expertise can be capitalised upon in managing these AI-driven operations. "You don't need to start from scratch. In fact, you're well on your way — your Kubernetes skills and investments aren't just relevant, they're your AI superpower," he explained.

In response to growing AI model sizes and their processing demands, Cluster Director enables clusters to be orchestrated through standard Kubernetes application programming interfaces (APIs). It supports deploying models across multiple hosts and operating extensive clusters of Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) as a single unit, while faulty nodes are automatically repaired to ensure performance continuity.
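The "single unit" idea can be illustrated with a toy sketch (all names here are hypothetical, and this is not the Cluster Director API): the pool of accelerator VMs exposes one aggregate capacity to the scheduler, and unhealthy machines are repaired automatically rather than managed by hand.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorNode:
    """A single accelerated VM in the pool (illustrative only)."""
    name: str
    gpus: int
    healthy: bool = True


class ClusterUnit:
    """Toy model of many accelerator VMs managed as one unit."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def total_gpus(self) -> int:
        # Only healthy nodes contribute to the capacity the
        # "single unit" exposes to the scheduler.
        return sum(n.gpus for n in self.nodes if n.healthy)

    def repair_faulty(self) -> list:
        # Stand-in for automatic repair: bring unhealthy nodes back
        # so capacity recovers without operator intervention.
        repaired = [n.name for n in self.nodes if not n.healthy]
        for n in self.nodes:
            n.healthy = True
        return repaired


unit = ClusterUnit([
    AcceleratorNode("vm-0", 8),
    AcceleratorNode("vm-1", 8, healthy=False),
])
print(unit.total_gpus())      # 8 (the faulty node is excluded)
print(unit.repair_faulty())   # ['vm-1']
print(unit.total_gpus())      # 16 (full capacity restored)
```

The point of the sketch is the shape of the abstraction: consumers see one capacity number and a self-healing pool, not individual machines.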

On the AI inference front, new capabilities have been introduced. The GKE Inference Quickstart, now in public preview, aids in selecting AI models by providing benchmarked profiles that include infrastructure and accelerator configurations, along with essential resources required for model performance.

The GKE Inference Gateway, also in public preview, is designed to optimise model performance through intelligent routing and load balancing for AI inference. Google says it can reduce serving costs by up to 30%, cut tail latency by up to 60%, and increase throughput by up to 40%.
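The load-balancing idea behind such a gateway can be sketched in a few lines (a simplified illustration, not the Inference Gateway's actual algorithm): route each request to the model replica with the fewest requests in flight, rather than distributing traffic blindly.

```python
def pick_replica(in_flight: dict) -> str:
    """Choose the replica with the fewest in-flight requests.

    A minimal load-aware routing policy; a production inference
    gateway would also weigh signals such as queue depth and
    model-server state.
    """
    return min(in_flight, key=in_flight.get)


# A replica that is briefly overloaded stops receiving new traffic,
# which is what trims tail latency.
loads = {"replica-a": 5, "replica-b": 1, "replica-c": 3}
print(pick_replica(loads))  # replica-b
```

Compared with plain round-robin, this keeps slow or busy replicas from accumulating long queues, which is where inference tail latency typically comes from.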

GKE Autopilot has also received performance upgrades targeting the common problem of resource over-provisioning. The improvements cover pod scheduling, scaling reaction times, and capacity right-sizing, and build on hardware capabilities exclusive to Google Cloud. As a result, cluster capacity tracks workload demand more closely, improving the efficiency of resource utilisation.

Recognising the need for better troubleshooting tools, Google has announced the private preview of Gemini Cloud Assist Investigations. The feature provides AI-powered assistance for diagnosing pod and cluster issues, reducing time lost to debugging and clearing the way for faster innovation.

Monroy highlighted the utility of GKE in serving Google's own AI services and its potential to meet similar demands in external enterprises. "At Google, we use GKE to power our leading AI services — including Vertex AI — at scale, relying on the same technologies and best practices that we're sharing with you today."

RayTurbo on GKE, an optimised version of the Ray open-source framework, is also set to launch, delivering enhanced data processing speeds and resource efficiency for AI/ML engineers on Kubernetes clusters. Google has worked in partnership with Anyscale to deliver this offering, continuing its commitment to advancing the capabilities of Kubernetes as an effective platform for AI operations.

These updates underline Kubernetes' role as a vital infrastructure platform for AI, aligning with trends where significant entities such as IBM, Meta, NVIDIA, and Spotify are leveraging Kubernetes for their AI and machine learning workloads.
