
MI355X GPU

The AMD Instinct MI355X is the next-generation CDNA 4 accelerator with 288GB HBM3E memory, targeting AI training and large-scale inference.

VRAM 288GB
TDP 1400W
From $2.59/hr across 3 providers

Cloud Pricing

Cheapest on Vultr, 11% below average
| Provider | GPUs | Price / hr | Updated | Source |
|----------|------|------------|---------|--------|
|          | 1× GPU | $2.59 | 4/2/2026 |  |
|          | 8× GPU | $2.59 | 4/7/2026 |  |
|          | 1× GPU | $2.95 | 4/7/2026 |  |
|          | 1× GPU | $3.45 | 4/2/2026 |  |

Source column: direct from provider, or via marketplace.

Prices updated daily. Last check: 4/8/2026
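As a rough guide, the hourly rates above translate into job-level budgets; a minimal sketch (illustrative only: real bills also include storage, egress, and any committed-use discounts):

```python
# Rough on-demand cost for a training job at a listed per-GPU rate.
def job_cost(price_per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total cost, ignoring storage, networking, and discounts."""
    return price_per_gpu_hr * gpus * hours

# One week on an 8-GPU node at the cheapest listed rate, $2.59/GPU-hr:
print(f"${job_cost(2.59, gpus=8, hours=24 * 7):,.2f}")  # $3,480.96
```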

Performance

FP16
1800 TFLOPS
Bandwidth
8000 GB/s
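Dividing the listed peak FP16 throughput by the memory bandwidth gives the breakpoint of a simple roofline model: kernels performing fewer FLOPs per byte of memory traffic than this ratio are bandwidth-bound rather than compute-bound. A quick sketch from the figures above:

```python
# Roofline breakpoint from the listed specs.
peak_fp16_flops = 1800e12   # 1800 TFLOPS
bandwidth_bytes = 8000e9    # 8000 GB/s (8 TB/s)

flops_per_byte = peak_fp16_flops / bandwidth_bytes
print(flops_per_byte)  # 225.0
```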

Strengths & Limitations

Strengths

  • 288GB HBM3E memory capacity supports large model training without partitioning
  • 8TB/s memory bandwidth enables efficient data movement for memory-intensive workloads
  • MXFP4 support delivers 10.1 PetaFLOPs peak performance for AI inference
  • Enhanced sparse matrix calculation performance benefits certain AI and scientific workloads
  • FP64 performance of 78.6 TFLOPs serves HPC applications requiring double precision
  • 7 Infinity Fabric links enable high-bandwidth multi-GPU scaling
  • CDNA 4 architecture provides dedicated AI and HPC optimizations

Limitations

  • 1400W typical board power requires substantial cooling infrastructure
  • OAM form factor limits deployment to specialized server platforms
  • AMD ROCm ecosystem has fewer pre-optimized frameworks than CUDA
  • Ultra-tier pricing makes it cost-prohibitive for smaller workloads
  • More limited cloud availability than NVIDIA alternatives

Key Features

4th Gen AMD CDNA architecture
288GB HBM3E memory
MXFP6 and MXFP4 datatype support
Enhanced sparse matrix calculations
7 Infinity Fabric links
PCIe 5.0 x16 connectivity
OAM module form factor
ROCm software stack compatibility

About MI355X

The AMD MI355X is a datacenter accelerator built on the 4th generation CDNA architecture, positioned as AMD's ultra-high-performance GPU for AI and HPC workloads. As part of the MI350 series released in 2025, it represents AMD's answer to competing ultra-tier accelerators in the enterprise market. The MI355X features 288GB of HBM3E memory with 8TB/s of bandwidth, designed for workloads requiring substantial memory capacity and throughput.

Key technical specifications include peak FP16 performance of 1.8 PetaFLOPs, FP32 matrix performance of 157.3 TFLOPs, and FP64 performance of 78.6 TFLOPs. The GPU supports expanded MXFP6 and MXFP4 datatypes, with MXFP4 performance reaching 10.1 PetaFLOPs. The architecture includes enhanced sparse matrix calculation capabilities and connects via PCIe 5.0 x16, with 7 Infinity Fabric links for multi-GPU scaling.

In cloud deployments, the MI355X targets large-scale AI training, particularly for transformer models and other memory-intensive neural networks where the 288GB memory capacity provides advantages. The high memory bandwidth and sparse matrix optimizations make it suitable for both dense and sparse AI workloads, as well as memory-bound HPC simulations requiring substantial compute precision.
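The capacity claim can be made concrete with back-of-envelope arithmetic; a sketch assuming FP16 weights at 2 bytes per parameter, and roughly 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and optimizer moments), ignoring activations and framework overhead:

```python
# Back-of-envelope model sizing for 288 GB of on-package memory.
VRAM_BYTES = 288e9

def max_params_billions(bytes_per_param: float) -> float:
    """Largest model (billions of parameters) that fits, weight state only."""
    return VRAM_BYTES / bytes_per_param / 1e9

print(max_params_billions(2))   # 144.0  (FP16 inference, weights only)
print(max_params_billions(16))  # 18.0   (mixed-precision Adam training)
```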

Common Use Cases

The MI355X is designed for large-scale AI training and inference workloads that benefit from its 288GB memory capacity, including training transformer models, large language models, and computer vision networks that exceed the memory limits of smaller accelerators. The 8TB/s memory bandwidth and MXFP4 support make it effective for high-throughput inference serving. In HPC environments, the 78.6 TFLOPs FP64 performance and sparse matrix optimizations suit computational fluid dynamics, molecular dynamics simulations, and other scientific computing applications requiring both high memory capacity and compute precision.
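For inference serving, memory is often dominated by the KV cache rather than the weights; a sketch of the standard per-layer KV-cache formula in FP16, using a hypothetical model configuration (the layer count, head counts, and context length below are illustrative, not taken from any specific model):

```python
# KV-cache size for a transformer served in FP16 (2 bytes per element).
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 covers the separate K and V tensors per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

# Hypothetical: 80 layers, 8 KV heads of dim 128, 32k context, batch of 16
print(round(kv_cache_gb(80, 8, 128, 32_768, 16), 1))  # 171.8
```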

Full Specifications

Hardware

Manufacturer
AMD
Architecture
CDNA 4
TDP
1400W

Memory & Performance

VRAM
288GB
Memory Bandwidth
8000 GB/s
FP16
1800 TFLOPS
Release
2025

Frequently Asked Questions

How much does an MI355X cost per hour in the cloud?

MI355X pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the MI355X best used for?

The MI355X excels at large-scale AI training and inference workloads requiring substantial memory capacity, particularly transformer models and LLMs that benefit from the 288GB HBM3E memory. It also serves HPC applications needing high FP64 performance and memory bandwidth for scientific simulations.

How does the MI355X compare to NVIDIA's H100 for AI workloads?

The MI355X offers 288GB of memory versus the H100's 80GB, providing advantages for large model training. However, the H100 benefits from broader CUDA ecosystem support and more mature AI framework optimizations. Performance comparisons depend on specific workloads and framework optimization levels.