
GH200 GPU

The GH200 integrates GPU and CPU resources in a single superchip to improve performance and efficiency for large AI models, simplifying infrastructure by bringing compute and memory closer together.

VRAM: 96GB
CUDA Cores: 16,896
Tensor Cores: 528
TDP: 700W
Process: 4nm
From $1.49/hr across 5 providers

Cloud Pricing

Cheapest on Lambda Labs (50% below average)

Provider        Config   Price / hr                       Updated
Lambda Labs     1×       $1.49/hr                         5/13/2026
                1×       $1.64/hr                         5/13/2026
                1×       $1.99/hr                         5/2/2026
                1×       $1.99/hr (36-month commitment)   4/25/2026
                1×       $4.23/hr                         5/13/2026
                1×       $6.50/hr                         5/9/2026

Prices updated daily. Last check: May 13, 2026
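
For rough budgeting, the hourly rates above translate directly into monthly figures. A minimal sketch, assuming 24/7 utilization and ignoring storage, egress, and any marketplace fees:

```python
# Rough monthly-cost comparison from the on-demand GH200 rates listed above.
# Assumes 24/7 utilization; storage, egress, and marketplace fees are ignored.
HOURS_PER_MONTH = 730  # average hours in a month

rates = [1.49, 1.64, 1.99, 4.23, 6.50]  # $/hr, on-demand rows from the table

for rate in sorted(rates):
    monthly = rate * HOURS_PER_MONTH
    print(f"${rate:.2f}/hr -> ${monthly:,.0f}/month")

# The cheapest listed rate ($1.49/hr) works out to roughly $1,088/month,
# versus about $4,745/month at the most expensive ($6.50/hr).
```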

Performance

FP16: 990 TFLOPS
FP32: 67 TFLOPS
BF16: 989.5 TFLOPS
FP8: 1979 TFLOPS
INT8: 1979 TOPS
Memory Bandwidth: 4000 GB/s
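
Whether a workload can approach these peaks depends on its arithmetic intensity. A back-of-the-envelope roofline sketch using the FP16 and bandwidth figures above (the ridge-point math is standard roofline analysis, not a vendor benchmark):

```python
# Back-of-the-envelope roofline check using the peak numbers above.
PEAK_FP16_TFLOPS = 990.0  # FP16 Tensor-core peak
PEAK_BW_GBS = 4000.0      # HBM bandwidth

# Ridge point: FLOPs per byte needed to saturate compute rather than memory.
ridge = PEAK_FP16_TFLOPS * 1e12 / (PEAK_BW_GBS * 1e9)
print(f"ridge point ~ {ridge:.0f} FLOPs/byte")  # ~ 248 FLOPs/byte

def attainable_tflops(flops_per_byte: float) -> float:
    """Attainable throughput for a kernel with the given arithmetic intensity."""
    return min(PEAK_FP16_TFLOPS, flops_per_byte * PEAK_BW_GBS / 1000.0)

# Example: batch-1 LLM decode is roughly 1 FLOP/byte -> memory-bound,
# capped near 4 TFLOPS regardless of the 990 TFLOPS compute peak.
print(f"{attainable_tflops(1.0):.1f} TFLOPS at 1 FLOP/byte")
```

Kernels below the ridge point are limited by the 4 TB/s of bandwidth, which is why the memory-bound workloads discussed below benefit most from this part.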

Strengths & Limitations

Strengths

  • 96GB of HBM3/HBM3e memory enables processing of large datasets with far fewer memory constraints
  • 4 TB/s memory bandwidth supports memory-intensive workloads like large language models
  • NVLink-C2C coherent interface at 900 GB/s eliminates CPU-GPU memory transfer overhead
  • 528 Tensor cores with Transformer Engine provide hardware acceleration for AI training and inference
  • CPU+GPU coherent memory model simplifies programming for heterogeneous computing
  • 990 TFLOPS FP16 performance handles demanding neural network computations
  • 4nm manufacturing process delivers power efficiency for data center deployments

Limitations

  • 700W TDP with 1000W maximum power consumption requires robust cooling infrastructure
  • Hopper is a generation behind NVIDIA's current Blackwell lineup, such as the GB300 Blackwell Ultra
  • High memory capacity may be overkill for smaller AI models or traditional graphics workloads
  • Superchip form factor limits deployment flexibility compared to discrete GPU options
  • Premium positioning makes it cost-prohibitive for development or smaller-scale applications

Key Features

NVIDIA NVLink-C2C coherent interconnect
HBM3 and HBM3e GPU memory
CPU+GPU coherent memory model
Transformer Engine with FP8 support (see the sketch after this list)
528 fourth-generation Tensor cores
16,896 CUDA cores
Grace CPU integration
4nm manufacturing process
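
In software, the Transformer Engine is typically driven through NVIDIA's transformer-engine library. A minimal sketch of FP8 execution on a single layer, assuming a recent transformer-engine release; the layer sizes are illustrative and recipe options vary by version:

```python
# Minimal sketch of FP8 execution via NVIDIA's Transformer Engine
# (pip install transformer-engine). Layer sizes are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe; exact options vary by library version.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

# Inside this context, supported ops run on the FP8 Tensor cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

print(y.shape, y.dtype)
```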

About GH200

The GH200 is NVIDIA's superchip that combines the Grace CPU with a Hopper architecture GPU, creating a unified compute platform designed for large-scale AI and HPC workloads. As part of NVIDIA's previous-generation Hopper lineup, the GH200 integrates 16,896 CUDA cores and 528 Tensor cores with 96GB of HBM3/HBM3e memory, connected via the NVLink-C2C coherent interface at 900 GB/s. The superchip delivers 4 TB/s of memory bandwidth and features a CPU+GPU coherent memory model that eliminates traditional CPU-GPU memory transfer bottlenecks. With 990 TFLOPS of FP16 performance and 1,979 TOPS for INT8 operations, the GH200 targets memory-intensive AI applications that require processing terabytes of data with minimal latency overhead between compute elements.
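
A minimal sketch of what that coherent programming style looks like, here using Numba's managed (unified) memory rather than any GH200-specific API; the array size and kernel are illustrative. On a GH200, NVLink-C2C keeps such an allocation coherent between the Grace CPU and the GPU without explicit copies:

```python
# Minimal sketch of the coherent CPU+GPU programming style, using Numba's
# managed (unified) memory. On GH200, NVLink-C2C keeps CPU and GPU views
# of this allocation coherent without explicit memcpy-style transfers.
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, factor):
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= factor

# One allocation, visible to both CPU and GPU code.
data = cuda.managed_array(1_000_000, dtype=np.float32)
data[:] = 1.0                      # CPU writes directly

threads = 256
blocks = (data.size + threads - 1) // threads
scale[blocks, threads](data, 2.0)  # GPU reads/writes the same memory
cuda.synchronize()

print(data[:4])                    # CPU reads the results, no copy back
```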

Common Use Cases

The GH200 is designed for giant-scale AI applications that process terabytes of data, particularly large language models, retrieval-augmented generation systems, and graph neural networks, where its 96GB unified memory and coherent CPU-GPU architecture eliminate data movement bottlenecks. Its 4 TB/s of memory bandwidth also suits HPC simulations, scientific computing, and data analytics workloads that are memory-bound rather than compute-bound. The superchip architecture is particularly effective for applications that benefit from tight CPU-GPU integration, such as AI pipelines that interleave traditional CPU processing with neural network inference.

Full Specifications

Hardware

Manufacturer: NVIDIA
Architecture: Hopper
CUDA Cores: 16,896
Tensor Cores: 528
Process Node: 4nm
TDP: 700W
Max Power: 1000W

Memory & Performance

VRAM: 96GB
Memory Bandwidth: 4000 GB/s
FP32: 67 TFLOPS
FP16: 990 TFLOPS
BF16: 989.5 TFLOPS
FP8: 1979 TFLOPS
FP64: 34 TFLOPS
INT8: 1979 TOPS
Release: 2023

Frequently Asked Questions

How much does a GH200 cost per hour in the cloud?

GH200 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the GH200 best used for?

The GH200 excels at giant-scale AI applications requiring large memory capacity, particularly large language models, retrieval-augmented generation, and graph neural networks. Its 96GB unified memory and coherent CPU-GPU architecture make it ideal for memory-intensive HPC workloads and scientific computing applications that process terabytes of data.

How does the GH200 compare to the H100 for AI workloads?

The GH200 offers significantly more memory (96GB vs 80GB) and integrates a Grace CPU with coherent memory access, eliminating CPU-GPU transfer bottlenecks. While both use Hopper architecture, the GH200's unified memory model and higher memory bandwidth (4 TB/s vs 3.35 TB/s) provide advantages for memory-intensive AI applications, though the H100 may be more cost-effective for compute-bound workloads that don't require the additional memory capacity.
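
A quick sanity check of those deltas, computed from the figures quoted above:

```python
# Ratios for the GH200 vs H100 comparison, from the figures quoted above.
gh200 = {"vram_gb": 96, "bw_tbs": 4.00}
h100  = {"vram_gb": 80, "bw_tbs": 3.35}

print(f"memory:    {gh200['vram_gb'] / h100['vram_gb']:.2f}x")  # 1.20x
print(f"bandwidth: {gh200['bw_tbs'] / h100['bw_tbs']:.2f}x")    # 1.19x
```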