
MI355X GPU

The AMD Instinct MI355X is the next-generation CDNA 4 accelerator with 288GB HBM3E memory, targeting AI training and large-scale inference.

VRAM 288GB
TDP 1400W
From $2.59/hr across 3 providers

Cloud Pricing

Cheapest on Vultr, 11% below average
| Provider | GPUs | Price / hr | Updated | Source |
|----------|------|------------|---------|--------|
|          | 1× GPU | $2.59 | 4/2/2026 |  |
|          | 8× GPU | $2.59 | 4/7/2026 |  |
|          | 1× GPU | $2.95 | 4/7/2026 |  |
|          | 1× GPU | $3.45 | 4/2/2026 |  |

Source column: direct from provider, or via marketplace.

Prices updated daily. Last check: 4/8/2026
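As a rough guide, the hourly rates above translate into job-level budgets; a minimal sketch (illustrative only: real bills also include storage, egress, and any committed-use discounts):

```python
# Rough on-demand cost for a training job at a listed per-GPU rate.
def job_cost(price_per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total cost, ignoring storage, networking, and discounts."""
    return price_per_gpu_hr * gpus * hours

# One week on an 8-GPU node at the cheapest listed rate, $2.59/GPU-hr:
print(f"${job_cost(2.59, gpus=8, hours=24 * 7):,.2f}")  # $3,480.96
```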

Performance

FP16
1800 TFLOPS
Bandwidth
8000 GB/s
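Dividing the listed peak FP16 throughput by the memory bandwidth gives the breakpoint of a simple roofline model: kernels performing fewer FLOPs per byte of memory traffic than this ratio are bandwidth-bound rather than compute-bound. A quick sketch from the figures above:

```python
# Roofline breakpoint from the listed specs.
peak_fp16_flops = 1800e12   # 1800 TFLOPS
bandwidth_bytes = 8000e9    # 8000 GB/s (8 TB/s)

flops_per_byte = peak_fp16_flops / bandwidth_bytes
print(flops_per_byte)  # 225.0
```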

Strengths & Limitations

Strengths

  • 288GB HBM3E memory capacity supports large model training without partitioning
  • 8TB/s memory bandwidth enables efficient data movement for memory-intensive workloads
  • MXFP4 support delivers 10.1 PetaFLOPs peak performance for AI inference
  • Enhanced sparse matrix calculation performance benefits certain AI and scientific workloads
  • FP64 performance of 78.6 TFLOPs serves HPC applications requiring double precision
  • 7 Infinity Fabric links enable high-bandwidth multi-GPU scaling
  • CDNA 4 architecture provides dedicated AI and HPC optimizations

Limitations

  • 1400W typical board power requires substantial cooling infrastructure
  • OAM form factor limits deployment to specialized server platforms
  • AMD ROCm ecosystem has fewer pre-optimized frameworks than CUDA
  • Ultra-tier pricing makes it cost-prohibitive for smaller workloads
  • More limited cloud availability than NVIDIA alternatives

Key Features

4th Gen AMD CDNA architecture
288GB HBM3E memory
MXFP6 and MXFP4 datatype support
Enhanced sparse matrix calculations
7 Infinity Fabric links
PCIe 5.0 x16 connectivity
OAM module form factor
ROCm software stack compatibility

About MI355X

The AMD MI355X is a datacenter accelerator built on the 4th generation CDNA architecture, positioned as AMD's ultra-high-performance GPU for AI and HPC workloads. As part of the MI350 series released in 2025, it represents AMD's answer to competing ultra-tier accelerators in the enterprise market. The MI355X features 288GB of HBM3E memory with 8TB/s of bandwidth, designed for workloads requiring substantial memory capacity and throughput.

Key technical specifications include peak FP16 performance of 1.8 PetaFLOPs, FP32 matrix performance of 157.3 TFLOPs, and FP64 performance of 78.6 TFLOPs. The GPU supports expanded MXFP6 and MXFP4 datatypes, with MXFP4 performance reaching 10.1 PetaFLOPs. The architecture includes enhanced sparse matrix calculation capabilities and connects via PCIe 5.0 x16, with 7 Infinity Fabric links for multi-GPU scaling.

In cloud deployments, the MI355X targets large-scale AI training, particularly for transformer models and other memory-intensive neural networks where the 288GB memory capacity provides advantages. The high memory bandwidth and sparse matrix optimizations make it suitable for both dense and sparse AI workloads, as well as memory-bound HPC simulations requiring substantial compute precision.
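The capacity claim can be made concrete with back-of-envelope arithmetic; a sketch assuming FP16 weights at 2 bytes per parameter, and roughly 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and optimizer moments), ignoring activations and framework overhead:

```python
# Back-of-envelope model sizing for 288 GB of on-package memory.
VRAM_BYTES = 288e9

def max_params_billions(bytes_per_param: float) -> float:
    """Largest model (billions of parameters) that fits, weight state only."""
    return VRAM_BYTES / bytes_per_param / 1e9

print(max_params_billions(2))   # 144.0  (FP16 inference, weights only)
print(max_params_billions(16))  # 18.0   (mixed-precision Adam training)
```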

Common Use Cases

The MI355X is designed for large-scale AI training and inference workloads that benefit from its 288GB memory capacity, including training transformer models, large language models, and computer vision networks that exceed the memory limits of smaller accelerators. The 8TB/s memory bandwidth and MXFP4 support make it effective for high-throughput inference serving. In HPC environments, the 78.6 TFLOPs FP64 performance and sparse matrix optimizations suit computational fluid dynamics, molecular dynamics simulations, and other scientific computing applications requiring both high memory capacity and compute precision.
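For inference serving, memory is often dominated by the KV cache rather than the weights; a sketch of the standard per-layer KV-cache formula in FP16, using a hypothetical model configuration (the layer count, head counts, and context length below are illustrative, not taken from any specific model):

```python
# KV-cache size for a transformer served in FP16 (2 bytes per element).
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 covers the separate K and V tensors per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

# Hypothetical: 80 layers, 8 KV heads of dim 128, 32k context, batch of 16
print(round(kv_cache_gb(80, 8, 128, 32_768, 16), 1))  # 171.8
```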

Full Specifications

Hardware

Manufacturer
AMD
Architecture
CDNA 4
TDP
1400W

Memory & Performance

VRAM
288GB
Memory Bandwidth
8000 GB/s
FP16
1800 TFLOPS
Release
2025

Frequently Asked Questions

How much does an MI355X cost per hour in the cloud?

MI355X pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the MI355X best used for?

The MI355X excels at large-scale AI training and inference workloads requiring substantial memory capacity, particularly transformer models and LLMs that benefit from the 288GB HBM3E memory. It also serves HPC applications needing high FP64 performance and memory bandwidth for scientific simulations.

How does the MI355X compare to NVIDIA's H100 for AI workloads?

The MI355X offers 288GB of memory versus the H100's 80GB, providing advantages for large model training. However, the H100 benefits from broader CUDA ecosystem support and more mature AI framework optimizations. Performance comparisons depend on specific workloads and framework optimization levels.