MI325X GPU
The AMD Instinct MI325X is a CDNA 3-based data-center accelerator built on a 5 nm process, featuring 256 GB of HBM3E memory (6 TB/s peak bandwidth), 304 compute units (19,456 stream processors), and 1,216 matrix cores. It supports PCIe Gen 5 and AMD Infinity Fabric for coherent multi-GPU scaling, and is optimized for large-model AI training/inference as well as HPC workloads.

Cloud Pricing
Cheapest on Vultr (6% below average). Prices updated daily; last check: 4/8/2026.
Performance
Strengths & Limitations
Strengths:
- 256 GB of HBM3E supports large-model training and inference without memory constraints
- 6 TB/s memory bandwidth enables efficient data movement for memory-intensive workloads
- 2.61 PFLOPS dense FP8 performance, reaching 5.22 PFLOPS with structured sparsity
- AMD ROCm ecosystem provides an open-source software stack without vendor lock-in
- Infinity Fabric interconnect with 8 links at 128 GB/s each enables multi-GPU scaling
- OAM form factor designed for high-density server deployments
- Native FP64 support at 81.7 TFLOPS for scientific computing workloads
Limitations:
- 1,000 W peak power draw requires robust data-center power and cooling infrastructure
- AMD ROCm ecosystem has fewer pre-optimized AI frameworks than NVIDIA CUDA
- Passive cooling limits deployment to servers with adequate airflow
- 54 V UBB external power connectors require compatible server power delivery
- CDNA 3 lacks hardware support for some newer low-precision formats (e.g., FP4/FP6)
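The headline figures in the list above are internally consistent and can be sanity-checked with a few lines of arithmetic. A minimal sketch (all input values come from this page; the 2x sparsity multiplier reflects the usual structured-sparsity doubling of peak math rate):

```python
# Sanity-check the headline figures from the strengths list above.
DENSE_FP8_TFLOPS = 2614.9   # dense FP8 peak throughput (TFLOPS)
SPARSE_MULTIPLIER = 2       # structured sparsity doubles the peak math rate
IF_LINKS = 8                # Infinity Fabric links per GPU
IF_LINK_GBPS = 128          # GB/s per link

sparse_fp8_pflops = DENSE_FP8_TFLOPS * SPARSE_MULTIPLIER / 1000
fabric_agg_gbps = IF_LINKS * IF_LINK_GBPS

print(f"Sparse FP8 peak: {sparse_fp8_pflops:.2f} PFLOPS")          # ~5.23 PFLOPS
print(f"Aggregate Infinity Fabric bandwidth: {fabric_agg_gbps} GB/s")  # 1024 GB/s
```

The computed 5.23 PFLOPS matches the quoted 5.22 PFLOPS up to rounding (2.61 x 2 = 5.22), and the eight 128 GB/s links aggregate to about 1 TB/s of peer-to-peer bandwidth.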
About MI325X
Common Use Cases
The MI325X is designed for large-scale AI training and inference workloads that require substantial memory capacity, making it suitable for training large language models, computer vision networks, and other memory-intensive AI applications. Its 256 GB of HBM3E allows models and datasets that exceed typical GPU memory limits to run on a single device, while the high FP8 and FP16 throughput supports both training and inference. Scientific and high-performance computing workloads benefit from native FP64 support at 81.7 TFLOPS, covering applications such as computational fluid dynamics, molecular modeling, and financial simulation. The Infinity Fabric interconnect and OAM form factor make it well suited to multi-GPU deployments in enterprise data centers requiring high computational density.
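Whether a model fits in the 256 GB of HBM3E comes down to a back-of-envelope weight-memory estimate. A minimal sketch using common rule-of-thumb figures (the bytes-per-parameter values and the 70B model size are illustrative assumptions, not vendor guidance):

```python
# Rough weight-memory estimate: n_params * bytes_per_param.
# Ignores activations, KV cache, and optimizer state, which add substantially more.
HBM_GB = 256  # MI325X memory capacity

def weights_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB for a model with n_params_billion parameters."""
    return n_params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

fp16_70b = weights_gb(70, 2)  # 140 GB of FP16 weights
fp8_70b = weights_gb(70, 1)   # 70 GB of FP8 weights

print(f"70B FP16 weights: {fp16_70b:.0f} GB -> fits in {HBM_GB} GB: {fp16_70b <= HBM_GB}")
print(f"70B FP8 weights:  {fp8_70b:.0f} GB -> fits in {HBM_GB} GB: {fp8_70b <= HBM_GB}")
```

By this estimate a 70B-parameter model fits comfortably in a single MI325X for FP16 or FP8 inference, which is the practical consequence of the large memory capacity highlighted above.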
Full Specifications
Hardware
| Spec | Value |
| --- | --- |
| Manufacturer | AMD |
| Architecture | CDNA 3 |
| Matrix Cores | 1,216 |
| Compute Units | 304 |
| Generation | Gen 3 |
| Process Node | 5 nm |
| Max Power | 1,000 W |
Memory & Performance
| Spec | Value |
| --- | --- |
| VRAM | 256 GB HBM3E |
| Memory Interface | 8,192-bit |
| Memory Bandwidth | 6,000 GB/s (6 TB/s) |
| FP16 | 1,307.4 TFLOPS |
| BF16 | 1,307.4 TFLOPS |
| FP8 | 2,614.9 TFLOPS |
| Release | 2024 |
Frequently Asked Questions
How much does a MI325X cost per hour in the cloud?
MI325X pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
What is the MI325X best used for?
The MI325X excels at large-scale AI training and inference workloads requiring substantial memory capacity, scientific computing applications needing FP64 precision, and high-performance computing tasks. The 256 GB HBM3E memory makes it particularly suitable for large language model training and processing datasets that exceed typical GPU memory limits.
How does the MI325X compare to NVIDIA's H100 for AI workloads?
The MI325X offers 256 GB of HBM3E compared to the H100's 80 GB of HBM3, a significant advantage for memory-intensive workloads. Both GPUs target similar AI training and inference applications, but the MI325X uses AMD's ROCm ecosystem while the H100 leverages NVIDIA's CUDA platform. On paper the MI325X delivers 2.61 PFLOPS of dense FP8 versus roughly 1.98 PFLOPS for the H100 SXM, but real-world throughput depends heavily on the workload and the maturity of the software stack.
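One concrete consequence of the memory gap is the minimum number of GPUs needed just to hold a model's weights. A minimal sketch (the 70B FP16 model is an illustrative assumption; real deployments also need room for activations and KV cache):

```python
import math

def min_gpus(weights_gb: float, vram_gb: int) -> int:
    """Minimum GPUs needed just to hold the weights (ignores activations/KV cache)."""
    return math.ceil(weights_gb / vram_gb)

FP16_70B_GB = 70 * 2  # 140 GB of FP16 weights for a hypothetical 70B-parameter model
print("MI325X (256 GB):", min_gpus(FP16_70B_GB, 256))  # 1
print("H100 (80 GB):   ", min_gpus(FP16_70B_GB, 80))   # 2
```

By this weights-only estimate, one MI325X holds the model where two H100s (plus tensor-parallel communication overhead) would be required.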