CDNA 3 · Data Center

MI325X GPU

The AMD Instinct MI325X is a CDNA 3-based data-center accelerator built on a 5 nm process, featuring 256 GB of HBM3E memory (6 TB/s peak bandwidth), 304 compute units (19,456 stream processors), and 1,216 matrix cores. It supports PCIe Gen 5 and AMD Infinity Fabric for coherent multi-GPU scaling, and is optimized for large-model AI training/inference as well as HPC workloads.

VRAM 256GB
Tensor Cores 1,216
Process 5nm
From $2.00/hr across 2 providers

Cloud Pricing

Cheapest on Vultr 6% below avg
Provider     GPUs     Price / hr   Updated    Source
Vultr        1× GPU   $2.00        4/2/2026   Direct from provider
(unlisted)   1× GPU   $2.25        4/7/2026   Via marketplace

Prices updated daily. Last check: 4/8/2026
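Hourly cloud rates translate into monthly budgets with simple arithmetic. A minimal sketch: the $2.00 and $2.25 rates come from the table above, while the 730-hour month and the utilization figures are illustrative assumptions.

```python
# Estimate monthly cost of renting a single MI325X at the listed hourly rates.
# Rates come from the pricing table; hours/month and utilization are assumptions.
HOURS_PER_MONTH = 730  # average hours in a calendar month

def monthly_cost(rate_per_hr: float, utilization: float = 1.0) -> float:
    """Cost of one GPU for a month at the given fractional utilization."""
    return rate_per_hr * HOURS_PER_MONTH * utilization

for provider, rate in [("Vultr", 2.00), ("second provider", 2.25)]:
    full = monthly_cost(rate)
    half = monthly_cost(rate, utilization=0.5)
    print(f"{provider}: ${full:,.2f}/mo at 100% use, ${half:,.2f}/mo at 50%")
```

At full utilization the $0.25/hr spread between the two listings compounds to roughly $180/month per GPU, which is why the per-provider comparison matters at scale.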

Performance

FP16
1300 TFLOPS
BF16
1307.4 TFLOPS
FP8
2614.9 TFLOPS
Bandwidth
6000 GB/s
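The throughput and bandwidth figures above jointly determine whether a kernel is compute-bound or memory-bound. A minimal roofline-style sketch using the listed BF16 and bandwidth numbers; the example arithmetic intensities are illustrative assumptions, not measured kernels.

```python
# Roofline-style estimate: a kernel is memory-bound when its arithmetic
# intensity (FLOPs per byte moved) falls below peak_flops / peak_bandwidth.
PEAK_BF16_FLOPS = 1307.4e12  # 1307.4 TFLOPS, from the performance table
PEAK_BW_BYTES = 6000e9       # 6000 GB/s HBM3E bandwidth

# FLOPs/byte needed to saturate the compute units ("ridge point")
ridge_point = PEAK_BF16_FLOPS / PEAK_BW_BYTES

def attainable_tflops(arithmetic_intensity: float) -> float:
    """Upper bound on throughput for a kernel with the given FLOPs/byte."""
    return min(PEAK_BF16_FLOPS, arithmetic_intensity * PEAK_BW_BYTES) / 1e12

print(f"ridge point: {ridge_point:.0f} FLOPs/byte")
print(f"intensity  10 -> {attainable_tflops(10):.1f} TFLOPS (memory-bound)")
print(f"intensity 500 -> {attainable_tflops(500):.1f} TFLOPS (compute-bound)")
```

The ridge point lands around 218 FLOPs/byte, which is why low-intensity workloads such as inference decoding benefit more from the 6 TB/s bandwidth than from peak TFLOPS.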

Strengths & Limitations

Strengths

  • 256 GB HBM3E memory capacity supports large-model training and inference without memory constraints
  • 6 TB/s memory bandwidth enables efficient data movement for memory-intensive workloads
  • 2.61 PFLOPS FP8 performance, reaching 5.22 PFLOPS with structured sparsity
  • AMD ROCm ecosystem provides an open-source software stack without vendor lock-in
  • Infinity Fabric interconnect with 8 links at 128 GB/s enables multi-GPU scaling
  • OAM form factor designed for high-density server deployments
  • Native FP64 support at 81.7 TFLOPS for scientific computing workloads

Limitations

  • 1000 W peak power draw requires robust data-center power and cooling infrastructure
  • ROCm ecosystem has fewer pre-optimized AI frameworks than NVIDIA CUDA
  • Passive cooling design limits deployment to servers with adequate airflow
  • 54 V UBB external power connectors require compatible server power delivery
  • CDNA 3 architecture lacks hardware support for some newer AI model architectures
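The memory-capacity point above can be sanity-checked with parameter-count arithmetic: weights occupy params × bytes-per-parameter, plus working memory for KV-cache and activations. A rough sketch; the model sizes and the 1.2× overhead factor are illustrative assumptions, not measured footprints.

```python
# Rough check of which model sizes fit in 256 GB of HBM3E for inference.
# Weights-only footprint is params * bytes_per_param; KV-cache and activation
# memory are approximated here by a flat overhead factor (an assumption).
VRAM_GB = 256

def fits(params_billion: float, bytes_per_param: int, overhead: float = 1.2) -> bool:
    """True if the model's weights (plus overhead) fit on one MI325X."""
    needed_gb = params_billion * bytes_per_param * overhead
    return needed_gb <= VRAM_GB

# 70B in FP16 (2 B/param): 70 * 2 * 1.2 = 168 GB  -> fits on one GPU
# 180B in FP16:           180 * 2 * 1.2 = 432 GB  -> needs multiple GPUs
# 180B in FP8 (1 B/param): 180 * 1 * 1.2 = 216 GB -> fits on one GPU
for params, bpp in [(70, 2), (180, 2), (180, 1)]:
    verdict = "fits" if fits(params, bpp) else "too large"
    print(f"{params}B @ {bpp} B/param: {verdict}")
```

The same arithmetic explains the H100 comparison later in this page: models that spill past 80 GB and require sharding there can run on a single 256 GB MI325X.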

Key Features

AMD CDNA 3 Architecture
HBM3E Memory Technology
AMD ROCm Software Stack
AMD Infinity Architecture
Structured Sparsity Support
Infinity Fabric Interconnect
Multi-Precision Compute Support
OAM Form Factor Design

About MI325X

The AMD MI325X is a server-focused compute accelerator built on AMD's CDNA 3 architecture, manufactured using TSMC's 5nm/6nm FinFET process technology. Positioned as AMD's answer to high-end data-center AI workloads, the MI325X features 256 GB of HBM3E memory and delivers 1.3 PFLOPS of FP16 performance. The GPU comes in an OAM (Open Accelerator Module) form factor with a passive cooling design, targeting enterprise deployments where thermal management and rack density are critical considerations.

Key technical specifications include 304 compute units with 1,216 matrix cores, 6 TB/s of memory bandwidth delivered through an 8,192-bit memory interface, and support for multiple precision formats including FP64, FP32, FP16, and FP8. The MI325X achieves 2.61 PFLOPS of FP8 performance, extending to 5.22 PFLOPS when utilizing structured sparsity optimizations. Interconnect capabilities include PCIe 5.0 x16 and AMD's Infinity Fabric with 8 links operating at 128 GB/s, enabling multi-GPU scaling in distributed computing environments.

In cloud deployments, the MI325X targets large-scale AI training workloads, high-performance computing applications, and inference tasks requiring substantial memory capacity. The 256 GB HBM3E configuration makes it suitable for large language models and datasets that exceed the memory limits of consumer-oriented GPUs, while the 1000 W power envelope requires appropriate data-center power and cooling infrastructure.

Common Use Cases

The MI325X is designed for large-scale AI training and inference workloads that require substantial memory capacity, making it suitable for training large language models, computer vision networks, and other memory-intensive AI applications. The 256 GB HBM3E memory enables processing of datasets and models that exceed typical GPU memory limits, while the high FP8 and FP16 performance supports both training and inference phases. Scientific computing and high-performance computing workloads benefit from the native FP64 support delivering 81.7 TFLOPS, making it applicable to computational fluid dynamics, molecular modeling, and financial simulations. The Infinity Fabric interconnect and OAM form factor make it well-suited for multi-GPU deployments in enterprise data centers requiring high computational density.

Full Specifications

Hardware

Manufacturer
AMD
Architecture
CDNA 3
Tensor Cores
1,216
Compute Units
304
Generation
Gen 3
Process Node
5nm
Max Power
1000W

Memory & Performance

VRAM
256GB
Memory Interface
8192-bit
Memory Bandwidth
6000 GB/s
FP16
1300 TFLOPS
BF16
1307.4 TFLOPS
FP8
2614.9 TFLOPS
Release
2024

Frequently Asked Questions

How much does a MI325X cost per hour in the cloud?

MI325X pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the MI325X best used for?

The MI325X excels at large-scale AI training and inference workloads requiring substantial memory capacity, scientific computing applications needing FP64 precision, and high-performance computing tasks. The 256 GB HBM3E memory makes it particularly suitable for large language model training and processing datasets that exceed typical GPU memory limits.

How does the MI325X compare to NVIDIA's H100 for AI workloads?

The MI325X offers 256 GB of HBM3E memory compared to the H100's 80 GB of HBM3, a clear advantage for memory-intensive workloads such as large-model inference. Both GPUs target similar AI training and inference applications, but the MI325X uses AMD's ROCm software ecosystem while the H100 relies on NVIDIA's CUDA platform. Performance varies by workload: the MI325X delivers 2.61 PFLOPS of dense FP8 compute, while realized throughput on either platform depends heavily on how well the software stack is optimized for the model in question.