GPU Pricing

Compare hourly rates across 53+ providers and 66+ GPU models

Updated daily


Price Trends

Moving average across providers (latest price per provider per day)

Top of the Line: NVIDIA GB300

Provider Details

Select a provider from the table to see information about pricing, features, and more.

GPU Specifications

Pick a GPU from the table to check out its specs and see how it performs.

GPU Selection Guide

Key specs and tiers to help you pick the right cloud GPU for your workload

VRAM (Memory)

The most critical spec for large models. More VRAM means larger models and bigger batch sizes. Entry-level AI work needs 24 GB; serious training and large inference start at 80 GB.
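As a back-of-the-envelope check, the VRAM needed to serve a model is roughly parameter count times bytes per parameter, plus overhead for activations and the KV cache. A minimal sketch (the 2-byte FP16 weights and 20% overhead figure are illustrative assumptions, not measurements):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM (GB) to serve a model: weights at the given precision,
    plus ~20% headroom for activations and KV cache (illustrative)."""
    return params_billion * bytes_per_param * overhead

# A 7B model in FP16 (2 bytes/param) needs ~16.8 GB, so it fits in 24 GB.
print(round(estimate_vram_gb(7), 1))   # 16.8
# A 70B model in FP16 needs ~168 GB: multi-GPU territory even on 80 GB cards.
print(round(estimate_vram_gb(70), 1))  # 168.0
```

The same arithmetic explains why quantization helps: dropping to 4-bit weights (0.5 bytes per parameter) cuts the estimate by 4x.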

Architecture

Newer architectures deliver more performance per watt. NVIDIA Blackwell (GB200, B200) succeeds Hopper (H100, H200), which succeeded Ampere (A100). AMD CDNA 3/4 (MI300X, MI355X) competes at the high end.

Price-to-Performance

Hourly cost matters, but cost per token or cost per training step matters more. Compare the pricing table above to find the best value for your workload and budget.
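To make cost per token concrete: divide the hourly rate by hourly token throughput. A minimal sketch (the throughput numbers below are hypothetical placeholders, not benchmarks):

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """Convert an hourly GPU rate into dollars per million generated tokens."""
    return hourly_rate_usd / (tokens_per_sec * 3600) * 1_000_000

# Hypothetical throughputs: a pricier GPU can still win on $/token.
h100 = cost_per_million_tokens(4.00, 2500)  # $4/hr, 2,500 tok/s
a100 = cost_per_million_tokens(2.00, 900)   # $2/hr, 900 tok/s
print(f"H100 ${h100:.2f}/M tok vs A100 ${a100:.2f}/M tok")  # $0.44 vs $0.62
```

Under these assumed throughputs the GPU with double the hourly rate is still ~30% cheaper per token, which is why hourly price alone can mislead.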

Popular GPU Tiers

Entry-Level AI

Good for fine-tuning smaller models, inference, and experimentation.

Professional

Handles mid-size models and moderate training runs.

Enterprise

Current workhorses for large-scale training and high-throughput inference.

Frontier

NVIDIA Blackwell Ultra generation. Maximum memory and compute for frontier model training.

We track pricing across 39 providers. Use the comparison table above to find current rates for any GPU.

Frequently Asked Questions

Common questions about GPU cloud pricing, LLM inference APIs, and specifications

What's the difference between spot and on-demand GPU pricing?

Spot instances offer 60-91% discounts compared to on-demand pricing, but can be interrupted with 30 seconds to 2 minutes of notice when capacity is reclaimed. On-demand instances provide guaranteed availability and persistent data storage at a premium. Spot is ideal for training workloads with checkpointing, while on-demand is better for production and critical workloads that can't be interrupted.
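The spot-versus-on-demand tradeoff can be sketched by pricing in the work you redo after interruptions (the discount, interruption count, and checkpoint interval below are illustrative assumptions):

```python
def effective_spot_cost(on_demand_rate, spot_discount, run_hours,
                        interruptions, checkpoint_interval_h):
    """Spot cost including re-run time: worst case, each interruption
    loses one full checkpoint interval of work."""
    spot_rate = on_demand_rate * (1 - spot_discount)
    return spot_rate * (run_hours + interruptions * checkpoint_interval_h)

on_demand = 4.00 * 100                       # 100 GPU-hours at $4/hr -> $400
spot = effective_spot_cost(4.00, 0.70, 100,
                           interruptions=5, checkpoint_interval_h=0.5)
print(f"${spot:.2f} vs ${on_demand:.2f} on-demand")  # $123.00 vs $400.00
```

Even after paying to redo lost work, frequent checkpointing keeps the spot run far cheaper, which is the core of the "spot for training" advice.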
How much VRAM do I need for machine learning?

VRAM requirements depend on your model size and task. Small models (basic CNNs) need 4-8 GB, medium models (BERT, ResNet-50) need 12-16 GB, and large language models start at 24 GB; 70B+ parameter models require multiple GPUs or aggressive quantization. Training typically requires 2-4x more VRAM than inference. You can reduce requirements using quantization, gradient checkpointing, or smaller batch sizes.
Should I choose an H100, A100, or RTX 4090?

The H100 (Hopper architecture) is the newer data-center GPU, offering 3-6x faster LLM training than the A100 at $4-8/hr. The A100 (Ampere) provides excellent AI performance for research at $2-4/hr. The RTX 4090 is a consumer GPU ideal for prototyping and small-scale ML at $0.18-0.35/hr. The H100 and A100 have enterprise features like ECC memory and NVLink for multi-GPU scaling, while the RTX 4090 is more cost-effective for individual workloads.
What hidden costs should I watch for?

Hidden costs can add 60-80% to your total spend. Watch for data egress fees ($0.08-$0.12 per GB), storage costs for datasets and checkpoints ($0.10-$0.30 per GB monthly), idle GPU time (teams waste 30-50% of spend on unused instances), cross-region transfer fees, and premium GPU surcharges. Always calculate total cost of ownership, not just hourly rates.
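The total-cost-of-ownership point can be sketched as a quick calculation (the fee defaults are illustrative midpoints of the ranges above, and the usage numbers are hypothetical, not quotes from any provider):

```python
def total_monthly_cost(useful_gpu_hours, gpu_rate, idle_fraction,
                       egress_gb, storage_gb,
                       egress_per_gb=0.10, storage_per_gb_month=0.20):
    """TCO: compute (inflated by hours billed while idle), data egress,
    and monthly storage. Defaults are illustrative midpoint fees."""
    billed_hours = useful_gpu_hours / (1 - idle_fraction)
    return (billed_hours * gpu_rate
            + egress_gb * egress_per_gb
            + storage_gb * storage_per_gb_month)

# 200 useful H100-hours at $4/hr, 40% idle time, 500 GB egress, 1 TB storage:
print(round(total_monthly_cost(200, 4.00, 0.40, 500, 1000), 2))  # 1583.33
# versus the naive estimate of 200 * $4 = $800: nearly double.
```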
Do training and inference need different GPUs?

Training requires powerful GPUs (A100, H100) with high VRAM and runs for hours or days, making it suitable for spot instances. Inference uses lighter GPUs (L4, A10, RTX series) with less VRAM and runs continuously, requiring on-demand reliability. Inference is also more memory-efficient per request because it skips the gradients and optimizer states that training must keep in VRAM.
Should I use a hyperscaler or a specialized GPU provider?

Hyperscalers offer global availability, enterprise compliance, and integrated ecosystems, but cost 2-3x more with complex setup. Specialized providers like RunPod, Lambda, and CoreWeave are 40-60% cheaper with simpler setup but smaller ecosystems. Choose hyperscalers for enterprise compliance needs; choose specialized providers for cost-efficient ML workloads with simpler requirements.
How much do cloud GPUs cost in 2025?

2025 pricing ranges: H100 at $1.49-$6.98/hr (specialized providers $2-4, hyperscalers $4-8), H200 at $2.15-$6.00/hr, A100 80GB at $0.75-$4.00/hr, RTX 4090 at $0.18-$0.35/hr, and budget GPUs (L4, A10) at $0.33-$1.00/hr. Prices vary significantly by provider, region, and pricing model. Use ComputePrices to compare current rates across all providers.
How can I reduce my GPU cloud bill?

Key strategies: right-size instances (use L4/A10 instead of H100 when sufficient), use spot instances with checkpointing for training (60-91% savings), optimize models with quantization and pruning, batch inference requests to improve utilization from 20-30% to 70-80%, auto-shutdown idle instances to eliminate 30-50% waste, and choose specialized providers over hyperscalers for 40-60% savings on comparable hardware.
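The batching point can be quantified: at a fixed hourly rate, cost per request scales inversely with utilization. A minimal sketch with hypothetical throughput numbers:

```python
def cost_per_1k_requests(hourly_rate, peak_requests_per_hour, utilization):
    """Effective cost per 1,000 requests at a given GPU utilization."""
    return hourly_rate / (peak_requests_per_hour * utilization) * 1000

RATE, PEAK = 2.00, 10_000   # hypothetical: $2/hr GPU, 10k req/hr at full load
unbatched = cost_per_1k_requests(RATE, PEAK, 0.25)  # one request at a time
batched = cost_per_1k_requests(RATE, PEAK, 0.75)    # batched requests
print(f"${unbatched:.2f} vs ${batched:.2f} per 1k requests")  # $0.80 vs $0.27
```

Tripling utilization through batching cuts the effective per-request cost by 3x without touching the hourly rate, which is why utilization improvements often beat shopping for a cheaper GPU.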