H200 GPU
The H200 is designed for large-scale AI training and inference, offering high memory capacity and throughput for transformer models. It handles complex AI workloads with better energy efficiency than previous generations.

Cloud Pricing
Cheapest on Verda at 69% below the average price. Prices are updated daily; last check: May 12, 2026.
Strengths & Limitations
Strengths
- 141GB HBM3E memory provides substantial capacity for large language models
- 4.8TB/s memory bandwidth enables high-throughput data processing (see the bandwidth-vs-compute sketch after these lists)
- 528 fourth-generation Tensor Cores with Transformer Engine acceleration
- NVLink interconnect delivers 900GB/s for multi-GPU scaling
- 990 TFLOPS FP16 performance for AI training and inference workloads
- First GPU implementation of HBM3E memory technology
- 1,979 TOPS INT8 performance for optimized inference deployments
Limitations
- 700W maximum power consumption requires substantial cooling infrastructure
- Built on the Hopper architecture, one generation behind NVIDIA's newest GPU releases
- High memory capacity may be excessive for smaller AI models
- Limited to PCIe Gen5 connectivity in PCIe form factor variants
- Specialized data center focus limits general-purpose computing applications
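The bandwidth and FP16 figures above interact through a simple roofline relationship. The sketch below is a rough illustration using the headline numbers quoted on this page rather than measured performance; it estimates where a kernel flips from memory-bound to compute-bound on the H200.

```python
# Rough roofline check, assuming the dense-FP16 and bandwidth figures quoted
# above (990 TFLOPS, 4.8 TB/s). Illustrative only, not measured performance.

PEAK_FP16_FLOPS = 990e12      # dense FP16 Tensor Core throughput, FLOP/s
PEAK_BANDWIDTH = 4.8e12       # HBM3E bandwidth, bytes/s

# Arithmetic intensity (FLOPs per byte moved) at which the GPU shifts from
# memory-bound to compute-bound.
crossover = PEAK_FP16_FLOPS / PEAK_BANDWIDTH
print(f"Compute-bound above ~{crossover:.0f} FLOP/byte")  # ~206 FLOP/byte

def attainable_tflops(intensity_flop_per_byte: float) -> float:
    """Upper bound on sustained TFLOPS for a kernel with the given intensity."""
    return min(PEAK_FP16_FLOPS, intensity_flop_per_byte * PEAK_BANDWIDTH) / 1e12

# Batch-1 LLM decode is dominated by matrix-vector products: roughly 2 FLOPs
# per 2-byte FP16 weight read, i.e. ~1 FLOP/byte, so it is firmly memory-bound.
print(f"Decode-like kernel (~1 FLOP/byte): {attainable_tflops(1.0):.1f} TFLOPS cap")
# Large-batch training GEMMs reach hundreds of FLOPs/byte and approach peak.
print(f"Training GEMM (~300 FLOP/byte): {attainable_tflops(300.0):.1f} TFLOPS cap")
```

This is why the memory subsystem matters as much as raw Tensor Core throughput: inference-style kernels sit far below the ~206 FLOP/byte crossover and are limited by the 4.8TB/s of bandwidth, not by the 990 TFLOPS peak.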
About H200
Common Use Cases
The H200 is designed for memory-intensive AI workloads, particularly large language model training and inference where the 141GB HBM3E memory capacity enables handling of models that exceed the memory limits of previous generations. Its high memory bandwidth of 4.8TB/s makes it suitable for generative AI applications, recommendation systems, and natural language processing tasks that require rapid access to large datasets. The substantial Tensor Core count and FP16 performance capabilities also position it for AI training workflows, scientific computing applications, and high-performance computing tasks that can leverage its memory subsystem advantages.
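To make the capacity figure concrete, the sketch below estimates weights plus KV cache for a hypothetical 70B-parameter model; the layer count, head shape, and context length are illustrative assumptions rather than any specific model's published configuration, and real deployments add framework and activation overhead.

```python
# Minimal sketch: does a given LLM fit in the H200's 141GB of HBM3E?
# Model shape and context length below are illustrative assumptions.

H200_MEMORY_GB = 141

def footprint_gb(params_billion: float, bytes_per_param: int,
                 layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, kv_bytes: int = 2) -> float:
    """Weights plus KV cache for one sequence, in GB (1 GB = 1e9 bytes)."""
    weights = params_billion * 1e9 * bytes_per_param
    # K and V per layer: 2 * kv_heads * head_dim values per token.
    kv_cache = 2 * layers * kv_heads * head_dim * kv_bytes * context_tokens
    return (weights + kv_cache) / 1e9

# A 70B-parameter model with grouped-query attention (hypothetical shape).
fp16 = footprint_gb(70, 2, layers=80, kv_heads=8, head_dim=128, context_tokens=32_768)
fp8  = footprint_gb(70, 1, layers=80, kv_heads=8, head_dim=128, context_tokens=32_768)

for label, gb in [("FP16 weights", fp16), ("FP8 weights", fp8)]:
    fits = "fits" if gb < H200_MEMORY_GB else "does not fit"
    print(f"{label}: ~{gb:.0f} GB -> {fits} in {H200_MEMORY_GB} GB")
```

Under these assumptions, the FP16 variant lands at roughly 151GB and spills past a single GPU, while the FP8 variant comes in around 81GB with ample headroom for longer contexts or larger batches.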
Full Specifications
Hardware
- Manufacturer: NVIDIA
- Architecture: Hopper
- CUDA Cores: 16,896
- Tensor Cores: 528
- Process Node: 4nm
- TDP: 700W
Memory & Performance
- VRAM: 141GB
- Memory Bandwidth: 4,800 GB/s
- FP32: 67 TFLOPS
- FP16: 990 TFLOPS
- BF16: 989.5 TFLOPS
- FP8: 1,979 TFLOPS
- FP64: 34 TFLOPS
- INT8: 1,979 TOPS
- Release: 2023
Frequently Asked Questions
How much does an H200 cost per hour in the cloud?
H200 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
What is the H200 best used for?
The H200 excels at memory-intensive AI workloads, particularly large language model training and inference, generative AI applications, and high-performance computing tasks that require substantial memory capacity and bandwidth.
How does the H200 compare to the H100 for LLM inference?
The H200 provides 141GB of HBM3E memory compared to the H100's 80GB, along with higher memory bandwidth at 4.8TB/s. This enables the H200 to hold larger language models on a single GPU and deliver improved inference throughput for memory-bound workloads.
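As a rough sketch of the bandwidth argument: for strictly memory-bound, batch-1 decode, tokens per second are capped by how fast the weights can be streamed from HBM. The figures below assume the SXM variants' published bandwidth numbers and a hypothetical 70B-parameter model stored at one byte per weight (FP8); real throughput is lower once KV-cache reads, kernel overheads, and batching effects are included.

```python
# Hedged sketch: upper bound on single-GPU decode throughput when token
# generation is limited purely by streaming the model weights from HBM.

BANDWIDTH_TBPS = {"H100 SXM": 3.35, "H200 SXM": 4.8}

def decode_tokens_per_s_cap(params_billion: float, bytes_per_param: int,
                            bandwidth_tbps: float) -> float:
    """Each generated token must read every weight once: tokens/s <= BW / model bytes."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / model_bytes

for gpu, bw in BANDWIDTH_TBPS.items():
    cap = decode_tokens_per_s_cap(70, 1, bw)   # 70B model, FP8 weights (assumed)
    print(f"{gpu}: <= {cap:.0f} tokens/s at batch 1")

# The bandwidth ratio, ~4.8 / 3.35 ~= 1.43x, is the rough ceiling on the H200's
# advantage for strictly memory-bound decode of a model that fits on both GPUs.
```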