GH200 GPU
The GH200 Grace Hopper Superchip combines an NVIDIA Grace CPU and a Hopper GPU in a single package, connected by a coherent high-bandwidth interconnect, to improve performance and efficiency for large AI models. By bringing processing and memory closer together, it aims to simplify infrastructure and reduce data-movement overhead.

Cloud Pricing
Cheapest on Lambda Labs, about 50% below the average listed rate.

| Provider | Config | Price / hr | Updated | Source |
|---|---|---|---|---|
| Lambda Labs | 1× | $1.49/hr | 5/13/2026 | |
| | 1× | $1.64/hr | 5/13/2026 | |
| | 1× | $1.99/hr | 5/2/2026 | |
| | 1× | $1.99/hr (36-mo commitment) | 4/25/2026 | |
| | 1× | $4.23/hr | 5/13/2026 | |
| | 1× | $6.50/hr | 5/9/2026 | |
Prices updated daily. Last check: May 13, 2026
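The "50% below average" figure can be sanity-checked directly from the six 1× rates listed above; a minimal sketch:

```python
# Check the "cheapest is ~50% below average" claim against the listed rates.
prices = [1.49, 1.64, 1.99, 1.99, 4.23, 6.50]  # $/hr for 1x GH200 listings

average = sum(prices) / len(prices)
cheapest = min(prices)
discount = 1 - cheapest / average  # fraction below the average rate

print(f"average: ${average:.2f}/hr, cheapest: ${cheapest:.2f}/hr, "
      f"discount: {discount:.0%}")
```

With these six listings the cheapest rate comes out roughly 50% below the average, matching the headline figure.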
Performance
Strengths & Limitations
Strengths
- 96GB of HBM3/HBM3e memory enables processing of large datasets without memory constraints
- 4 TB/s memory bandwidth supports memory-intensive workloads like large language models
- NVLink-C2C coherent interface at 900 GB/s eliminates CPU-GPU memory transfer overhead
- 528 Tensor cores with Transformer Engine provide hardware acceleration for AI training and inference
- CPU+GPU coherent memory model simplifies programming for heterogeneous computing
- 990 TFLOPS FP16 performance handles demanding neural network computations
- 4nm manufacturing process delivers power efficiency for data center deployments
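The interplay between the 990 TFLOPS FP16 peak and the 4 TB/s bandwidth can be summarized with a rough roofline estimate: kernels whose arithmetic intensity falls below the break-even point are memory-bound, which is why the bandwidth figure matters so much for LLM inference. A sketch using only the spec-sheet numbers above:

```python
# Rough roofline break-even point from the spec-sheet numbers:
# kernels below this arithmetic intensity (FLOP/byte) are memory-bound.
fp16_peak = 990e12   # FLOP/s, dense FP16 peak from the spec table
bandwidth = 4.0e12   # bytes/s, 4 TB/s HBM memory bandwidth

break_even = fp16_peak / bandwidth
print(f"break-even arithmetic intensity: {break_even:.1f} FLOP/byte")
```

Memory-bandwidth-bound operations such as batch-1 token generation sit far below this intensity, so they benefit more from the 4 TB/s bandwidth than from raw TFLOPS.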
Limitations
- 700W TDP with 1000W maximum power consumption requires robust cooling infrastructure
- Hopper architecture is a previous generation compared to current GB300 Blackwell Ultra GPUs
- High memory capacity may be overkill for smaller AI models or traditional graphics workloads
- Superchip form factor limits deployment flexibility compared to discrete GPU options
- Premium positioning makes it cost-prohibitive for development or smaller-scale applications
Common Use Cases
The GH200 is designed for giant-scale AI applications that require processing terabytes of data, particularly large language models, retrieval-augmented generation systems, and graph neural networks where its 96GB unified memory and coherent CPU-GPU architecture eliminate data movement bottlenecks. Its high memory bandwidth of 4 TB/s makes it suitable for HPC simulations, scientific computing, and data analytics workloads that are memory-bound rather than compute-bound. The superchip architecture is particularly effective for applications that benefit from tight CPU-GPU integration, such as complex AI pipelines that combine traditional computing with neural network inference.
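A quick way to see which model sizes the 96GB capacity covers is to size the weights alone. This sketch assumes FP16 weights (2 bytes per parameter) and ignores activation and KV-cache overhead, so real headroom is smaller:

```python
# Rough check of which LLM sizes fit in the GH200's 96 GB of GPU memory,
# counting weights only (FP16 = 2 bytes/parameter); activations and KV cache
# are ignored, so actual headroom is smaller.
GPU_MEMORY_GB = 96

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GB for a model of the given size."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (7, 13, 34, 70):
    gb = weights_gb(size)
    verdict = "fits" if gb <= GPU_MEMORY_GB else "exceeds 96 GB"
    print(f"{size}B params @ FP16: {gb:.0f} GB -> {verdict}")
```

By this estimate, models up to roughly 40B parameters fit in FP16, while a 70B model needs quantization or multi-GPU sharding.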
Full Specifications
Hardware
- Manufacturer: NVIDIA
- Architecture: Hopper
- CUDA Cores: 16,896
- Tensor Cores: 528
- Process Node: 4nm
- TDP: 700W
- Max Power: 1000W
Memory & Performance
- VRAM: 96GB
- Memory Bandwidth: 4,000 GB/s (4 TB/s)
- FP32: 67 TFLOPS
- FP16: 990 TFLOPS
- BF16: 989.5 TFLOPS
- FP8: 1,979 TFLOPS
- FP64: 34 TFLOPS
- INT8: 1,979 TOPS
- Release: 2023
Frequently Asked Questions
How much does a GH200 cost per hour in the cloud?
GH200 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
What is the GH200 best used for?
The GH200 excels at giant-scale AI applications requiring large memory capacity, particularly large language models, retrieval-augmented generation, and graph neural networks. Its 96GB unified memory and coherent CPU-GPU architecture make it ideal for memory-intensive HPC workloads and scientific computing applications that process terabytes of data.
How does the GH200 compare to the H100 for AI workloads?
The GH200 offers significantly more memory (96GB vs 80GB) and integrates a Grace CPU with coherent memory access, eliminating CPU-GPU transfer bottlenecks. Both use the Hopper architecture, but the GH200's unified memory model and higher memory bandwidth (4 TB/s vs 3.35 TB/s) give it an advantage for memory-intensive AI applications. The H100 may be more cost-effective for compute-bound workloads that don't need the additional memory capacity.
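The headline ratios behind this comparison follow directly from the two spec sheets (H100 SXM figures of 80GB and 3.35 TB/s, as quoted above):

```python
# Spec-sheet ratios behind the GH200 vs. H100 comparison
# (H100 SXM: 80 GB HBM3, 3.35 TB/s memory bandwidth).
gh200 = {"memory_gb": 96, "bandwidth_tbs": 4.0}
h100 = {"memory_gb": 80, "bandwidth_tbs": 3.35}

mem_ratio = gh200["memory_gb"] / h100["memory_gb"]
bw_ratio = gh200["bandwidth_tbs"] / h100["bandwidth_tbs"]

print(f"memory: {mem_ratio:.2f}x, bandwidth: {bw_ratio:.2f}x")
```

That is a 1.20× memory and roughly 1.19× bandwidth advantage for the GH200, before accounting for the coherent Grace CPU link.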