Arrcus links AINF with NVIDIA stack for AI inference

Wed, 18th Mar 2026

Arrcus has integrated its Arrcus Inference Network Fabric (AINF) with several parts of NVIDIA's AI infrastructure, aiming to improve how organisations route and secure AI inference traffic across edge, data centre, and cloud environments.

The integration spans the NVIDIA Dynamo framework, NVIDIA BlueField-3 data processing units (DPUs), NVIDIA Spectrum-X Ethernet networking, and NVIDIA GPU platforms. Arrcus describes the combined stack as an inference fabric that applies policies as requests enter the network, then routes them to the most suitable site and path.

Interest in inference infrastructure has grown as organisations shift from centralised AI training to dispersed, user-facing deployments. These deployments often span multiple regions and execution locations, amplifying the impact of network latency and site-level capacity constraints.

Inference shift

Arrcus is targeting workloads it calls "Physical and Agentic AI applications". It argues that agentic workflows increasingly involve many inference calls across models and tools within a single task, which increases the need for fast request handling and for traffic management that accounts for model availability, priority, and policy restrictions.

Shekar Ayyar, chairman and CEO of Arrcus, said networking is becoming central to the economics of large-scale inference.

"AI is entering its inference era, where networking becomes the control plane for performance and economics. By integrating AINF with NVIDIA AI technologies, we are enabling operators and enterprises to intelligently route inference traffic, maximize GPU utilisation and deliver real-time AI services at global scale."

Arrcus positions AINF as a control layer that determines which model should handle a request and where it should run. It also emphasises policy enforcement at the point of network entry, linking it to geo-aware routing and data-sovereignty controls across jurisdictions.

How it works

The design splits global routing decisions from within-site balancing: NVIDIA Dynamo handles load balancing for large language model inference inside each site, while AINF manages routing across the distributed set of sites.

In this setup, AINF selects the site for an inference request, and Dynamo selects the replica within that site. Arrcus says this is intended to improve compute utilisation across large deployments by aligning routing decisions with real-time conditions.
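
That split lends itself to a short sketch. Everything below is a hypothetical illustration of "AINF picks the site, Dynamo picks the replica"; Arrcus has not published an AINF API, and the scoring rule is invented for the example:

```python
# Minimal sketch of the two-level split described above. All names (Site,
# select_site, select_replica) are hypothetical, not a published API.

from dataclasses import dataclass


@dataclass
class Site:
    name: str
    rtt_ms: float          # network latency from the request's entry point
    free_capacity: float   # fraction of GPU capacity currently available
    replicas: list[str]    # model replicas that Dynamo balances across locally


def select_site(sites: list[Site]) -> Site:
    """Global decision (AINF's role): trade off latency against spare capacity."""
    return min(sites, key=lambda s: s.rtt_ms / max(s.free_capacity, 1e-6))


def select_replica(site: Site) -> str:
    """Site-local decision (Dynamo's role): placeholder for replica balancing."""
    return site.replicas[0]  # Dynamo uses far richer signals than this


sites = [
    Site("sin-edge", rtt_ms=8.0, free_capacity=0.2, replicas=["llm-0", "llm-1"]),
    Site("syd-dc", rtt_ms=45.0, free_capacity=0.9, replicas=["llm-0"]),
]
chosen = select_site(sites)
print(chosen.name, "->", select_replica(chosen))
```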

Arrcus says NVIDIA Dynamo provides telemetry signals, including queue depth, KV-cache pressure, and replica health. AINF ingests those signals via a "Site Agent" into its control plane, then computes model-aware routing decisions across sites based on factors such as model availability, service tier, geofencing policies, site capacity, and network health.
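
As an illustration of how those signals could drive a routing decision, the sketch below applies hard policy filters first (geofencing, model availability, replica health) and only then ranks the surviving sites on queue depth and KV-cache pressure. The SiteTelemetry fields and the weighting are assumptions, not Arrcus's actual schema:

```python
# Hypothetical routing over the telemetry signals named in the article:
# queue depth, KV-cache pressure, and replica health, reported per site.

from dataclasses import dataclass


@dataclass
class SiteTelemetry:
    site: str
    region: str
    queue_depth: int          # pending requests reported by the Site Agent
    kv_cache_pressure: float  # 0.0 (idle) .. 1.0 (saturated)
    healthy_replicas: int
    models: set[str]          # models currently served at this site


def eligible(t: SiteTelemetry, model: str, allowed_regions: set[str]) -> bool:
    """Hard filters: geofencing policy, model availability, basic health."""
    return t.region in allowed_regions and model in t.models and t.healthy_replicas > 0


def score(t: SiteTelemetry) -> float:
    """Soft ranking, lower is better. The 10x weight is illustrative only."""
    return t.queue_depth + 10.0 * t.kv_cache_pressure


def route(model: str, allowed_regions: set[str], feed: list[SiteTelemetry]) -> str:
    candidates = [t for t in feed if eligible(t, model, allowed_regions)]
    if not candidates:
        raise RuntimeError("no site satisfies policy for this request")
    return min(candidates, key=score).site


feed = [
    SiteTelemetry("sin-edge", "APAC", queue_depth=3,
                  kv_cache_pressure=0.7, healthy_replicas=2, models={"llm-a"}),
    SiteTelemetry("fra-dc", "EU", queue_depth=0,
                  kv_cache_pressure=0.1, healthy_replicas=4, models={"llm-a"}),
]
# A request geofenced to APAC lands on sin-edge even though fra-dc is idle.
print(route("llm-a", allowed_regions={"APAC"}, feed=feed))
```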

Arrcus also describes AINF as a conductor for agentic AI workflows, using intelligent classifiers running on NVIDIA AI infrastructure to select a best-fit model and route requests in real time. It says the approach supports lightweight models at the edge as well as centralised execution for more complex reasoning workloads.
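
The conductor idea can be made concrete with a toy example. The keyword classifier below stands in for the learned classifiers Arrcus says run on NVIDIA AI infrastructure, whose internals are not public; the model names and execution locations are likewise invented:

```python
# Toy version of the "conductor": classify a request, then pick a model and
# an execution location. All names here are hypothetical.

def classify(prompt: str) -> str:
    """Stand-in complexity classifier; a real one would be a learned model."""
    reasoning_markers = ("plan", "prove", "multi-step", "analyse")
    return "complex" if any(m in prompt.lower() for m in reasoning_markers) else "simple"


def choose_model(prompt: str) -> tuple[str, str]:
    """Map the request class to a (model, location) pair."""
    if classify(prompt) == "complex":
        return ("large-reasoning-model", "regional-dc")  # centralised execution
    return ("small-instruct-model", "nearest-edge")      # lightweight edge model


print(choose_model("Summarise this paragraph."))                     # edge
print(choose_model("Plan a multi-step migration of our database."))  # central
```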

Security and transport

The integration also covers security for traffic that crosses multiple locations. AINF integrates with NVIDIA BlueField-3 DPUs, which Arrcus says can encrypt traffic at line rate without consuming host CPU resources. It positions this as relevant for multi-site inference where policies require traffic to remain encrypted while maintaining low latency.

On the switching and network side, Arrcus highlighted NVIDIA Spectrum-4 Ethernet switches and NVIDIA GPU platforms as part of an end-to-end design. The focus is on traffic steering across WAN and multi-site environments, where inference requests may move between edge locations, regional data centres, and cloud sites.

Arrcus ties the integration to a set of requirements: ultra-low latency, geo-aware routing, model selection, efficient GPU utilisation, and multi-site connectivity. It also highlights latency-sensitive use cases such as voice, video, and gaming, where differentiated routing and prioritisation can be applied.
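
One hypothetical reading of that prioritisation is a dispatcher that orders inference requests by service tier before they leave the entry point, so voice and video are served ahead of batch work. The tier names and the simple priority-queue discipline below are illustrative only, not an Arrcus mechanism:

```python
# Illustrative tier-based dispatch for latency-sensitive inference traffic.

import heapq

TIER_PRIORITY = {"voice": 0, "video": 1, "gaming": 2, "batch": 9}  # lower runs first

queue: list[tuple[int, int, str]] = []
for seq, (tier, request) in enumerate([
    ("batch", "nightly-embedding-job"),
    ("voice", "live-transcription-chunk"),
    ("video", "frame-caption-request"),
]):
    heapq.heappush(queue, (TIER_PRIORITY[tier], seq, request))

while queue:
    _, _, request = heapq.heappop(queue)
    print("dispatch:", request)  # voice first, then video, then batch
```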

Operator interest

Lightstorm, which operates network infrastructure in Asia-Pacific, was cited as an example of an operator planning for distributed inference. The company pointed to the operational challenge of maintaining low-latency connectivity over long distances across the region.

"AI inferencing at scale across Asia‐Pacific demands reliable, low‐latency connectivity across vast WAN distances. Lightstorm is enabling hyperscalers, neoclouds and enterprises with the network foundation required for this shift, and by leveraging Arrcus' AINF solution powered by NVIDIA, we're excited to make real‐time, large‐scale inferencing into a deployable reality in the region," said Amajit Gupta, group CEO and managing director of Lightstorm.

Arrcus is demonstrating the Inference Network Fabric and the NVIDIA integration at NVIDIA GTC, where it is presenting the product alongside NVIDIA's AI-Grid.