AWS launches Trn3 UltraServers to boost AI speed & cut costs
AWS has unveiled Amazon EC2 Trn3 UltraServers, powered by its new Trainium3 artificial intelligence chip based on 3nm technology. The launch targets organisations seeking greater performance in AI model training and inference, while reducing associated costs and energy consumption.
Chip performance
The Trainium3 chip delivers up to 4.4 times more compute performance and up to 40% higher energy efficiency than its predecessor, Trainium2. Each Trn3 UltraServer can house up to 144 Trainium3 chips and achieve up to 362 FP8 petaFLOPs. The system also offers nearly four times more memory bandwidth, facilitating the rapid processing needed for larger and more complex AI models.
Trainium3 includes improvements in chip architecture, interconnects, and memory subsystems. These upgrades are designed to remove bottlenecks typical of large-scale AI training workloads. AWS reports energy efficiency gains of 40% compared to earlier models, which is expected to reduce operational costs and the carbon footprint of data centres.
Data throughput
The Trn3 UltraServer uses a vertically integrated approach from chip through to software layer, intended to address communication delays seen in distributed computing. Its new NeuronSwitch-v1 component doubles the bandwidth available within each UltraServer, while the Neuron Fabric networking limits communication latencies between chips to under 10 microseconds.
These networking improvements serve AI applications where low-latency responses are critical, such as reinforcement learning and advanced agentic systems. According to AWS, Trn3 UltraServers in internal benchmarking delivered throughput and responsiveness up to four times greater than the previous generation.
Scalability options
For projects requiring further scale, EC2 UltraClusters 3.0 can connect thousands of Trn3 UltraServers and support up to one million Trainium chips, a tenfold increase compared to the previous generation. This capacity enables organisations to work with trillion-token datasets and serve millions of concurrent AI inference requests.
Practical deployments
Several customers are already adopting Trainium systems. Businesses such as Anthropic, Karakuri, Metagenomics, Neto.ai, Ricoh, and Splashmusic are reporting reductions in AI training costs of up to 50% with Trainium-based infrastructure. Amazon Bedrock, AWS's managed service for foundation models, has deployed production workloads on Trainium3 systems.
Decart, an AI company focused on generative video and image models, has used Trainium3 to deliver four times faster frame generation at half the cost of using GPUs. AWS also supported Anthropic's recent AI model training by connecting over 500,000 Trainium2 chips to form what is described as the world's largest AI compute cluster to date.
Future plans
Work is underway on the next-generation chip, Trainium4, which is aimed at further elevating processing and memory performance. AWS states that Trainium4 will offer at least six times the FP4 processing power, three times the FP8 performance, and four times the memory bandwidth, supporting more demanding training and inference jobs.
Trainium4 is planned to integrate with NVIDIA NVLink Fusion, allowing resource sharing among Trainium and GPU-based systems, and supporting joint deployment within common server racks. This will enable customers to mix and match AI infrastructure resources for different project requirements.
"Trainium3 enables us to train larger models faster, serve more users, and reduce costs, all of which is critical as generative AI adoption accelerates across industries," said Swami Sivasubramanian, Vice President of Data and AI, AWS.