DataCenterNews Asia Pacific - Specialist news for cloud & data center decision-makers

VAST Data unveils AI-native storage for gigascale inference

Tue, 6th Jan 2026

VAST Data has unveiled a new AI inference architecture that underpins Nvidia's Inference Context Memory Storage Platform. The move is designed to support long-lived, agent-based AI applications and large-scale inference deployments.

VAST Data is positioning the design as a new class of AI-native storage infrastructure for what it describes as "gigascale" inference. The architecture runs the VAST AI Operating System directly on Nvidia BlueField-4 data processing units (DPUs) and utilises Nvidia Spectrum-X Ethernet networking for data movement. Furthermore, VAST said the platform focuses on the way inference systems store and share key-value cache data, which holds the context of AI conversations and reasoning processes. It is designed to accelerate access to this cache, enable context sharing across multiple compute nodes, and improve power efficiency in dense AI environments.

As AI models shift from single, stateless prompts towards longer, multi-turn dialogue and collaboration between multiple agents, data handling has begun to dominate overall performance. The company said performance now depends increasingly on how well inference history is stored, restored, reused, extended and shared under sustained load, rather than on raw GPU compute throughput alone.
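The mechanism behind this shift can be illustrated with a minimal Python sketch of prefix reuse in a key-value cache. Real inference servers cache per-token attention tensors rather than strings, and the class and names below are illustrative assumptions, not VAST's or Nvidia's API; the point is that when a multi-turn conversation extends an already-cached prefix, only the new suffix needs computing.

```python
import hashlib

class PrefixKVCache:
    """Toy KV cache keyed by conversation prefix (illustrative only;
    production systems cache attention K/V tensors per token block)."""

    def __init__(self):
        self.store = {}   # prefix hash -> simulated KV state
        self.hits = 0
        self.misses = 0

    def _key(self, tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def get_or_compute(self, tokens):
        # Find the longest cached prefix so only the suffix is recomputed.
        for cut in range(len(tokens), 0, -1):
            key = self._key(tokens[:cut])
            if key in self.store:
                self.hits += 1
                state, new_tokens = self.store[key], tokens[cut:]
                break
        else:
            self.misses += 1
            state, new_tokens = [], tokens
        # "Compute" KV entries only for the uncached suffix.
        state = state + [f"kv({t})" for t in new_tokens]
        self.store[self._key(tokens)] = state
        return state, len(new_tokens)

cache = PrefixKVCache()
turn1 = ["user:", "hello", "assistant:", "hi"]
_, computed1 = cache.get_or_compute(turn1)       # first turn: full compute
turn2 = turn1 + ["user:", "what's", "new?"]
_, computed2 = cache.get_or_compute(turn2)       # second turn: suffix only
```

Under sustained multi-turn load, the cost of storing, restoring and sharing this cached state dominates, which is the behaviour the article describes.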


New data path

VAST is rebuilding the inference data path by running its AI Operating System software natively on Nvidia BlueField-4 DPUs within GPU servers. The same software also runs in a dedicated data node architecture. The company said this embeds data services closer to where inference runs and removes traditional client-server contention.

The design seeks to eliminate extra data copies and network hops that can slow the delivery of the first output token as simultaneous workloads increase. VAST is combining this approach with its Disaggregated Shared-Everything architecture, which presents a shared and globally coherent context namespace to each host.

The firm stated that this avoids the coordination overhead that can cause bottlenecks at scale. The architecture creates a direct, parallel path from GPU memory to persistent NVMe storage over RDMA-based fabrics, ensuring that access remains predictable as the volume of users and agents increases.

John Mao, Vice President, Global Technology Alliances at VAST Data, said the shift in focus is changing how customers should think about AI infrastructure.

"Inference is becoming a memory system, not a compute job. The winners won't be the clusters with the most raw compute - they'll be the ones that can move, share, and govern context at line rate," Mao said. "Continuity is the new performance frontier. If context isn't available on demand, GPUs idle and economics collapse. With the VAST AI Operating System on NVIDIA BlueField-4, we're turning context into shared infrastructure - fast by default, policy-driven when needed, and built to stay predictable as agentic AI scales."

Mao said VAST aims to treat context as an infrastructure resource that multiple AI agents and services can use, rather than a transient artefact that resides only in local GPU memory.


Policy and control

VAST is targeting AI-focused organisations and enterprises that are deploying Nvidia-based AI factories and moving from early experiments into production services. These customers often have regulatory, security and operational requirements that affect how they store and handle inference context.

The company stated that customers increasingly require policy control, isolation, auditability, lifecycle management, and optional protection for context data. At the same time, they seek a key-value cache that remains fast enough to function as a shared system resource without the need for constant rebuilding or duplication.

According to VAST, its AI Operating System provides these data services as an integrated part of the platform. The firm said this can reduce repeated cache rebuilds, lower idle GPU time and improve infrastructure efficiency as context sizes grow and more concurrent sessions run on the same infrastructure.

Kevin Deierling, Senior Vice President of Networking at Nvidia, said the change in how AI systems are used has implications for the underlying infrastructure.

"Context is the fuel of thinking. Just like humans that write things down to remember them, AI agents need to save their work so they can reuse what they've learned," said Deierling. "Multi-turn and multi-user inferencing fundamentally transforms how context memory is managed at scale. VAST Data AI OS with NVIDIA BlueField-4 enables the NVIDIA Inference Context Memory Storage Platform and a coherent data plane designed for sustained throughput and predictable performance as agentic workloads scale."

VAST intends to showcase its approach to AI and data infrastructure at its inaugural user conference, VAST Forward, taking place from 24–26 February 2026 in Salt Lake City. The event will feature technical sessions, hands-on labs, and certification programmes for both customers and partners.