Pliops & Zilliz partner to scale enterprise AI with affordable RAG
Pliops has announced a collaboration with Zilliz to enable affordable, large-scale Retrieval-Augmented Generation (RAG) for enterprise AI, targeting vector databases at multi-billion-vector scale while aiming to keep infrastructure costs down.
The partnership combines Pliops' LightningAI hardware architecture with the Milvus vector database from Zilliz, providing performance and scalability enhancements to accelerate data retrieval for generative AI workloads.
Enterprise AI demands
Enterprises are increasingly dependent on vector search and large-context retrieval to power next-generation AI applications. Scaling such workloads to billions of vectors often presents barriers due to high memory requirements and associated costs. The joint work between Pliops and Zilliz is focused on overcoming these challenges with hardware-enabled vector search and efficient memory usage, bringing the prospect of scalable AI inference within reach for more organisations.
Technology integration
Milvus, an open-source vector database from Zilliz, is used extensively for distributed and cloud-native vector search across vast AI datasets. Enhancements announced as part of the collaboration include the addition of multi-tier storage support, key-value (KV) mapping for efficient caching, and a dual-tier architecture that separates ultra-fast 'hot' flash storage from cost-efficient, globally distributed 'cold' storage accessed via the S3 interface. These advances are intended to support performance optimisation and cost efficiency at scale.
The roadmap includes introducing a Near Compute Storage (NCS) layer, providing shared 'hot storage' capacity between compute nodes and object storage. Pliops and Zilliz expect that combining Milvus with the LightningAI architecture will unlock larger context windows for AI models while reducing both infrastructure and memory costs.
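The dual-tier idea described above can be illustrated with a minimal sketch: a small, fast 'hot' tier that spills least-recently-used entries into a cheaper 'cold' tier standing in for S3-style object storage. The class name and eviction policy here are illustrative assumptions, not the actual Milvus or Pliops implementation.

```python
from collections import OrderedDict

class DualTierStore:
    """Toy dual-tier vector store: an LRU-managed 'hot' tier spilling
    evicted entries to a cheaper, slower 'cold' tier (hypothetical;
    stands in for flash vs. S3-backed object storage)."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # fast tier, bounded size
        self.cold = {}             # cheap tier, effectively unbounded

    def put(self, key, vector):
        self.hot[key] = vector
        self.hot.move_to_end(key)
        # Evict least-recently-used entries to the cold tier when full.
        while len(self.hot) > self.hot_capacity:
            old_key, old_vec = self.hot.popitem(last=False)
            self.cold[old_key] = old_vec

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)   # refresh recency
            return self.hot[key]
        # Cold hit: promote the entry back into the hot tier.
        vector = self.cold.pop(key)
        self.put(key, vector)
        return vector

store = DualTierStore(hot_capacity=2)
store.put("a", [0.1, 0.2])
store.put("b", [0.3, 0.4])
store.put("c", [0.5, 0.6])   # "a" is evicted to the cold tier
print("a" in store.cold)      # → True
print(store.get("a"))         # cold hit; "a" is promoted, "b" evicted
```

In a real deployment the cold tier would be an object store reached over the network, so the cost of a cold hit is what motivates keeping frequently accessed vectors in the hot flash layer.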
KV-cache offload
Central to these advancements is hardware-accelerated KV-Cache Offload, a feature that supports efficient retrieval and inference for large language models and other AI applications. By moving key-value processing closer to storage hardware, the solution enables more users per GPU and larger models without the typical memory limitations.
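The effect of KV-cache offload on concurrency can be sketched conceptually. In the toy manager below, each user session's attention key/value tensors would normally pin scarce GPU memory; offloading idle sessions to a storage tier frees slots for more concurrent users. All names and the eviction policy are illustrative assumptions, not the Pliops API.

```python
class KVCacheManager:
    """Hypothetical sketch: keep a bounded number of sessions' KV caches
    'GPU-resident', offloading the rest to a storage tier instead of
    rejecting new sessions or recomputing their caches."""

    def __init__(self, gpu_slots):
        self.gpu_slots = gpu_slots   # sessions resident in GPU memory
        self.gpu = {}                # session_id -> KV tensors (simulated)
        self.storage = {}            # offloaded caches (near-storage tier)

    def touch(self, session_id, kv=None):
        """Make a session's KV cache GPU-resident, offloading the oldest
        resident session if the GPU is full; returns the cache."""
        if session_id in self.storage:       # reload from storage
            self.gpu[session_id] = self.storage.pop(session_id)
        elif session_id not in self.gpu:     # brand-new session
            self.gpu[session_id] = kv if kv is not None else []
        if len(self.gpu) > self.gpu_slots:
            # Offload the oldest resident session rather than evicting
            # it outright, so its context survives for later turns.
            victim = next(iter(self.gpu))
            self.storage[victim] = self.gpu.pop(victim)
        return self.gpu[session_id]

mgr = KVCacheManager(gpu_slots=2)
mgr.touch("user-1", kv=["k1v1"])
mgr.touch("user-2", kv=["k2v2"])
mgr.touch("user-3", kv=["k3v3"])   # "user-1" offloaded to storage
print("user-1" in mgr.storage)      # → True
mgr.touch("user-1")                 # reloaded; "user-2" offloaded in turn
```

The point of the hardware-accelerated version is that the offload and reload paths run close to the storage device, so the round trip is cheap enough to serve more users per GPU without shrinking model size or context length.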
Executive perspectives
"LightningAI is designed to make AI inference scalable and affordable. Partnering with Zilliz brings the best of storage and retrieval intelligence together," said Ido Bukspan, CEO, Pliops.
Charles Xie, Founder and CEO of Zilliz, commented:
"Pliops' LightningAI introduces a breakthrough approach to scaling GenAI inference, and integrating it with Milvus unlocks truly massive context retrieval at a fraction of the traditional cost. As the creators of Milvus, we're committed to advancing what's possible in vector search. This collaboration gives enterprises a clear path to run larger models, access more knowledge, and deliver faster AI experiences - all without the memory limitations that have constrained GenAI until now."
Ecosystem expansion
Pliops is expanding its AI ecosystem through integration with server providers such as Viking Enterprise Solutions and Giga Computing. These integrations make use of the LightningAI memory architecture in turnkey platforms for KV-cache offload and large language model inference. By coupling Pliops' hardware-accelerated approaches with advanced server systems, organisations can deploy generative AI workloads in both data centre and edge environments more efficiently.
Open source and availability
Pliops and Zilliz will publish technical details and a request for comments (RFC) about the Near Compute Storage enhancements for Milvus via open-source channels. The progress of this initiative will be shared with the developer and enterprise communities as part of their broader push to increase accessibility and efficiency in AI infrastructure.