MI325X GPU
The AMD Instinct MI325X is a CDNA 3-based data-center accelerator built on a 5 nm process, featuring 256 GB of HBM3E memory (6 TB/s peak bandwidth), 304 compute units (19,456 stream processors), and 1,216 matrix cores. It supports PCIe Gen 5 and AMD Infinity Fabric for coherent multi-GPU scaling, and is optimized for large-model AI training/inference as well as HPC workloads.

Cloud Pricing
Cheapest on Vultr (6% below average). Prices updated daily; last check: 4/8/2026.
Performance
Strengths & Limitations
Strengths:
- 256 GB of HBM3E supports large-model training and inference without memory constraints
- 6 TB/s memory bandwidth enables efficient data movement for memory-intensive workloads
- 2.61 PFLOPS dense FP8 performance, reaching 5.22 PFLOPS with structured sparsity
- AMD ROCm ecosystem provides an open-source software stack without vendor lock-in
- Infinity Fabric interconnect with 8 links at 128 GB/s each enables multi-GPU scaling
- OAM form factor designed for high-density server deployments
- Native FP64 support at 81.7 TFLOPS for scientific computing workloads
Limitations:
- 1,000 W peak power draw requires robust data-center power and cooling infrastructure
- AMD ROCm ecosystem has fewer pre-optimized AI frameworks than NVIDIA CUDA
- Passive cooling limits deployment to servers with adequate airflow
- 54 V UBB external power connectors require compatible server power delivery
- CDNA 3 lacks hardware support for some newer low-precision formats (e.g., FP4/FP6)
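The headline figures in the list above are internally consistent and can be sanity-checked with a few lines of arithmetic. A minimal sketch (all input values come from this page; the 2x sparsity multiplier reflects the usual structured-sparsity doubling of peak math rate):

```python
# Sanity-check the headline figures from the strengths list above.
DENSE_FP8_TFLOPS = 2614.9   # dense FP8 peak throughput (TFLOPS)
SPARSE_MULTIPLIER = 2       # structured sparsity doubles the peak math rate
IF_LINKS = 8                # Infinity Fabric links per GPU
IF_LINK_GBPS = 128          # GB/s per link

sparse_fp8_pflops = DENSE_FP8_TFLOPS * SPARSE_MULTIPLIER / 1000
fabric_agg_gbps = IF_LINKS * IF_LINK_GBPS

print(f"Sparse FP8 peak: {sparse_fp8_pflops:.2f} PFLOPS")          # ~5.23 PFLOPS
print(f"Aggregate Infinity Fabric bandwidth: {fabric_agg_gbps} GB/s")  # 1024 GB/s
```

The computed 5.23 PFLOPS matches the quoted 5.22 PFLOPS up to rounding (2.61 x 2 = 5.22), and the eight 128 GB/s links aggregate to about 1 TB/s of peer-to-peer bandwidth.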
About MI325X
Common Use Cases
The MI325X is designed for large-scale AI training and inference workloads that require substantial memory capacity, making it suitable for training large language models, computer vision networks, and other memory-intensive AI applications. Its 256 GB of HBM3E allows models and datasets that exceed typical GPU memory limits to run on a single device, while the high FP8 and FP16 throughput supports both training and inference. Scientific and high-performance computing workloads benefit from native FP64 support at 81.7 TFLOPS, covering applications such as computational fluid dynamics, molecular modeling, and financial simulation. The Infinity Fabric interconnect and OAM form factor make it well suited to multi-GPU deployments in enterprise data centers requiring high computational density.
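Whether a model fits in the 256 GB of HBM3E comes down to a back-of-envelope weight-memory estimate. A minimal sketch using common rule-of-thumb figures (the bytes-per-parameter values and the 70B model size are illustrative assumptions, not vendor guidance):

```python
# Rough weight-memory estimate: n_params * bytes_per_param.
# Ignores activations, KV cache, and optimizer state, which add substantially more.
HBM_GB = 256  # MI325X memory capacity

def weights_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB for a model with n_params_billion parameters."""
    return n_params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

fp16_70b = weights_gb(70, 2)  # 140 GB of FP16 weights
fp8_70b = weights_gb(70, 1)   # 70 GB of FP8 weights

print(f"70B FP16 weights: {fp16_70b:.0f} GB -> fits in {HBM_GB} GB: {fp16_70b <= HBM_GB}")
print(f"70B FP8 weights:  {fp8_70b:.0f} GB -> fits in {HBM_GB} GB: {fp8_70b <= HBM_GB}")
```

By this estimate a 70B-parameter model fits comfortably in a single MI325X for FP16 or FP8 inference, which is the practical consequence of the large memory capacity highlighted above.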
Full Specifications
Hardware
| Spec | Value |
| --- | --- |
| Manufacturer | AMD |
| Architecture | CDNA 3 |
| Matrix Cores | 1,216 |
| Compute Units | 304 |
| Generation | Gen 3 |
| Process Node | 5 nm |
| Max Power | 1,000 W |
Memory & Performance
| Spec | Value |
| --- | --- |
| VRAM | 256 GB HBM3E |
| Memory Interface | 8,192-bit |
| Memory Bandwidth | 6,000 GB/s (6 TB/s) |
| FP16 | 1,307.4 TFLOPS |
| BF16 | 1,307.4 TFLOPS |
| FP8 | 2,614.9 TFLOPS |
| Release | 2024 |
Frequently Asked Questions
How much does a MI325X cost per hour in the cloud?
MI325X pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
What is the MI325X best used for?
The MI325X excels at large-scale AI training and inference workloads requiring substantial memory capacity, scientific computing applications needing FP64 precision, and high-performance computing tasks. The 256 GB HBM3E memory makes it particularly suitable for large language model training and processing datasets that exceed typical GPU memory limits.
How does the MI325X compare to NVIDIA's H100 for AI workloads?
The MI325X offers 256 GB of HBM3E compared to the H100's 80 GB of HBM3, a significant advantage for memory-intensive workloads. Both GPUs target similar AI training and inference applications, but the MI325X uses AMD's ROCm ecosystem while the H100 leverages NVIDIA's CUDA platform. On paper the MI325X delivers 2.61 PFLOPS of dense FP8 versus roughly 1.98 PFLOPS for the H100 SXM, but real-world throughput depends heavily on the workload and the maturity of the software stack.
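One concrete consequence of the memory gap is the minimum number of GPUs needed just to hold a model's weights. A minimal sketch (the 70B FP16 model is an illustrative assumption; real deployments also need room for activations and KV cache):

```python
import math

def min_gpus(weights_gb: float, vram_gb: int) -> int:
    """Minimum GPUs needed just to hold the weights (ignores activations/KV cache)."""
    return math.ceil(weights_gb / vram_gb)

FP16_70B_GB = 70 * 2  # 140 GB of FP16 weights for a hypothetical 70B-parameter model
print("MI325X (256 GB):", min_gpus(FP16_70B_GB, 256))  # 1
print("H100 (80 GB):   ", min_gpus(FP16_70B_GB, 80))   # 2
```

By this weights-only estimate, one MI325X holds the model where two H100s (plus tensor-parallel communication overhead) would be required.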