
H200 GPU

The H200 is designed for large-scale AI training and inference, offering greater memory capacity, bandwidth, and low-precision throughput for transformer models. It handles complex AI workloads with better energy efficiency than previous generations.

VRAM 141GB
CUDA Cores 16,896
Tensor Cores 528
TDP 700W
Process 4nm
From $1.19/hr across 21 providers

Cloud Pricing

Cheapest on Verda, 69% below the average listed price
Config | Price / hr | Commitment | Updated
2×/4×/8× | $1.19/hr | – | 5/12/2026
1× | $1.19/hr | – | 5/12/2026
1× | $1.50/hr | – | 4/26/2026
4× | $2.09/hr | – | 5/12/2026
1× | $2.19/hr | – | 5/11/2026
1×/2×/4× | $2.29/hr | – | 5/12/2026
4× | $2.31/hr | – | 5/12/2026
8× | $2.31/hr | – | 5/12/2026
1× | $2.40/hr | – | 5/12/2026
8× | $2.48/hr | – | 5/12/2026
1× | $2.70/hr | – | 5/12/2026
1× | $2.80/hr | – | 5/2/2026
1×/2× | $2.83/hr | – | 5/12/2026
1× | $2.89/hr | 4 mo | 5/6/2026
4×/8× | $2.89/hr | – | 5/12/2026
8× | $2.93/hr | – | 4/18/2026
1×/8× | $2.99/hr | 36 mo | 5/9/2026
1× | $3.06/hr | – | 5/12/2026
1×/8× | $3.09/hr | 24 mo | 5/9/2026
4× | $3.15/hr | – | 4/21/2026
1× | $3.19/hr | 2 mo | 5/6/2026
1×/8× | $3.19/hr | 12 mo | 5/9/2026
1× | $3.29/hr | 6 mo | 5/12/2026
1×/8× | $3.29/hr | 6 mo | 5/9/2026
2× | $3.39/hr | – | 5/12/2026
1×/2×/4×/8× | $3.39/hr | – | 5/12/2026
1×/8× | $3.44/hr | – | 5/12/2026
1×/8× | $3.49/hr | – | 5/9/2026
8× | $3.56/hr | – | 5/12/2026
2× | $3.58/hr | – | 4/19/2026
1×/2×/4×/8× | $3.59/hr | – | 5/12/2026
1× | $3.65/hr | – | 5/12/2026
1× | $3.65/hr | 3 mo | 5/12/2026
1× | $3.80/hr | – | 5/7/2026
1× | $3.99/hr | 1 mo | 5/12/2026
1× | $4.29/hr | – | 5/10/2026
1× | $4.44/hr | – | 4/20/2026
1×/2×/4×/8× | $5.49/hr | – | 5/12/2026
1× | $6.27/hr | – | 5/12/2026
8× | $6.31/hr | – | 4/25/2026
4× | $6.77/hr | – | 5/12/2026
8× | $7.91/hr | – | 5/12/2026
2× | $8.13/hr | – | 5/12/2026
1× | $10.00/hr | – | 5/12/2026
8× | $20.64/hr | – | 5/9/2026

Prices updated daily. Last check: May 12, 2026
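
To put these rates in monthly terms, here is a minimal sketch of the hourly-to-monthly arithmetic, assuming per-GPU pricing (the usual convention for listings like these) and roughly 730 hours per month; the example rates are picked from the table above:

```python
# Rough monthly cost from listed per-GPU hourly rates.
# ~730 hours approximates one month of continuous use.

def monthly_cost(rate_per_gpu_hr: float, gpus: int, hours: float = 730.0) -> float:
    """Estimated cost of running `gpus` H200s continuously for one month."""
    return rate_per_gpu_hr * gpus * hours

print(f"1x @ $1.19/hr: ${monthly_cost(1.19, 1):,.0f}/mo")  # ~$869
print(f"1x @ $3.39/hr: ${monthly_cost(3.39, 1):,.0f}/mo")  # ~$2,475
print(f"8x @ $2.31/hr: ${monthly_cost(2.31, 8):,.0f}/mo")  # ~$13,490
```

Given the spread from $1.19/hr to $20.64/hr in the table, the same job can differ by well over 8× in cost depending on where it runs.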

Performance

FP16
990 TFLOPS
FP32
67 TFLOPS
BF16
989.5 TFLOPS
FP8
1979 TFLOPS
INT8
1979 TOPS
Bandwidth
4800 GB/s

Strengths & Limitations

Strengths

  • 141GB HBM3E memory provides substantial capacity for large language models
  • 4.8TB/s memory bandwidth enables high-throughput data processing
  • 528 fourth-generation Tensor Cores with Transformer Engine acceleration
  • NVLink interconnect delivers 900GB/s for multi-GPU scaling
  • 990 TFLOPS FP16 performance for AI training and inference workloads
  • First GPU implementation of HBM3E memory technology
  • 1,979 TOPS INT8 performance for optimized inference deployments

Limitations

  • 700W maximum power consumption requires substantial cooling infrastructure
  • Built on the previous-generation Hopper architecture, superseded by newer GPU releases
  • High memory capacity may be excessive for smaller AI models
  • Limited to PCIe Gen5 connectivity in PCIe form factor variants
  • Specialized data center focus limits general-purpose computing applications

Key Features

HBM3E Memory Technology
Fourth-Generation Tensor Cores
Transformer Engine with FP8 Support
NVLink Interconnect
PCIe Gen5 Interface
CUDA Compute Capability
Multi-Instance GPU (MIG)
NVIDIA AI Enterprise Support
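
To verify that a cloud instance actually exposes these capabilities, a small sketch that reads the device properties back, assuming a Python environment with CUDA-enabled PyTorch installed:

```python
# Sanity-check a cloud H200 allocation against the specs on this page.

import torch

assert torch.cuda.is_available(), "No CUDA device visible"
props = torch.cuda.get_device_properties(0)
print(f"Name:               {props.name}")                       # e.g. 'NVIDIA H200'
print(f"Compute capability: {props.major}.{props.minor}")        # 9.0 for Hopper
print(f"Total memory:       {props.total_memory / 1e9:.0f} GB")  # ~141 GB
print(f"SM count:           {props.multi_processor_count}")      # 132 SMs x 128 = 16,896 cores
```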

About H200

The NVIDIA H200 is a data center GPU built on the Hopper architecture, positioned as an enhanced variant of the H100 within NVIDIA's previous-generation lineup. As the first GPU to incorporate HBM3E memory technology, the H200 delivers significantly expanded memory capacity and bandwidth compared to its predecessors. The H200 features 141GB of HBM3E memory with 4.8TB/s of memory bandwidth, paired with 16,896 CUDA cores and 528 fourth-generation Tensor Cores. Key technical differentiators include its substantial memory subsystem, NVLink interconnect technology providing 900GB/s of bandwidth, and Transformer Engine capabilities optimized for large language model workloads. The GPU delivers 990 TFLOPS of FP16 performance and 1,979 TOPS of INT8 throughput, with a maximum power consumption of 700W. In cloud deployments, the H200 serves workloads requiring substantial memory capacity for large AI models, particularly generative AI applications and high-performance computing tasks where memory bandwidth and capacity are limiting factors.

Common Use Cases

The H200 is designed for memory-intensive AI workloads, particularly large language model training and inference where the 141GB HBM3E memory capacity enables handling of models that exceed the memory limits of previous generations. Its high memory bandwidth of 4.8TB/s makes it suitable for generative AI applications, recommendation systems, and natural language processing tasks that require rapid access to large datasets. The substantial Tensor Core count and FP16 performance capabilities also position it for AI training workflows, scientific computing applications, and high-performance computing tasks that can leverage its memory subsystem advantages.
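
To make the capacity claim concrete, here is a rough fit check counting weights only (parameter count × bytes per parameter) and ignoring activations, KV cache, and framework overhead; the model sizes are illustrative:

```python
# Weights-only VRAM estimate: params x bytes/param. Real deployments also
# need room for KV cache and activations, so these are lower bounds.

H200_VRAM_GB = 141

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    # (params_b * 1e9 params) * bytes, divided by 1e9 bytes/GB
    return params_billions * bytes_per_param

for params_b in (8, 70, 180):
    for dtype, width in (("FP16", 2), ("FP8", 1)):
        gb = weights_gb(params_b, width)
        verdict = "fits" if gb < H200_VRAM_GB else "needs multiple GPUs"
        print(f"{params_b}B @ {dtype}: ~{gb:.0f} GB -> {verdict}")
# A 70B model in FP16 (~140 GB of weights) is marginal on a single H200 and
# impossible on an 80GB-class card without sharding; in FP8 it fits easily.
```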

Full Specifications

Hardware

Manufacturer
NVIDIA
Architecture
Hopper
CUDA Cores
16,896
Tensor Cores
528
Process Node
4nm
TDP
700W

Memory & Performance

VRAM
141GB
Memory Bandwidth
4800 GB/s
FP32
67 TFLOPS
FP16
990 TFLOPS
BF16
989.5 TFLOPS
FP8
1979 TFLOPS
FP64
34 TFLOPS
INT8
1979 TOPS
Release
2023
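
The vector FP32 figure can be cross-checked against the core count: FP32 throughput ≈ CUDA cores × 2 FLOPs per cycle (one FMA) × clock. The ~1.98 GHz boost clock below is an assumption taken from NVIDIA's published Hopper SXM figures, not from this table:

```python
# Consistency check: 16,896 cores x 2 FLOPs/cycle (FMA) x boost clock.

CUDA_CORES = 16_896
BOOST_GHZ = 1.98  # assumed SXM boost clock; not listed in the specs above

fp32_tflops = CUDA_CORES * 2 * BOOST_GHZ / 1000
print(f"Estimated vector FP32: {fp32_tflops:.1f} TFLOPS")  # ~66.9, matching the 67 listed
```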

Frequently Asked Questions

How much does an H200 cost per hour in the cloud?

H200 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the H200 best used for?

The H200 excels at memory-intensive AI workloads, particularly large language model training and inference, generative AI applications, and high-performance computing tasks that require substantial memory capacity and bandwidth.

How does the H200 compare to the H100 for LLM inference?

The H200 provides 141GB of HBM3E memory compared to the H100's 80GB of HBM3, along with higher memory bandwidth at 4.8TB/s versus the H100's 3.35TB/s. This enables the H200 to hold larger language models on a single GPU and deliver improved inference throughput for memory-bound workloads.
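
For memory-bound decoding, throughput scales roughly with bandwidth, so first-order ratios fall straight out of the specs. A minimal sketch; the H100 SXM figures (80GB, 3.35TB/s) are NVIDIA's published numbers rather than values from this page:

```python
# First-order H200 vs. H100 (SXM) comparison for memory-bound LLM inference.

h200_bw_tbs, h100_bw_tbs = 4.8, 3.35  # memory bandwidth, TB/s
h200_mem_gb, h100_mem_gb = 141, 80    # VRAM, GB

print(f"Bandwidth ratio: {h200_bw_tbs / h100_bw_tbs:.2f}x")  # ~1.43x decode throughput
print(f"Capacity ratio:  {h200_mem_gb / h100_mem_gb:.2f}x")  # ~1.76x model/KV-cache headroom
```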