GPU Pricing

Compare hourly rates across 39+ providers and 56+ GPU models

Updated daily

Frequently Asked Questions

Common questions about GPU cloud pricing and specifications

What's the difference between spot and on-demand GPU instances?

Spot instances offer 60-91% discounts compared to on-demand pricing, but they can be interrupted with 30 seconds to 2 minutes' notice when the provider needs the capacity back. On-demand instances provide guaranteed availability and persistent data storage at a premium. Spot is ideal for training workloads with checkpointing, while on-demand is better for production and critical workloads that can't be interrupted.
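
For checkpointed training jobs, the spot discount usually outweighs the time lost to interruptions. A back-of-the-envelope sketch, using illustrative rates and an assumed 10% restart overhead rather than measured figures:

```python
# Rough spot vs. on-demand comparison for a checkpointed training job.
# Rates and the 10% interruption overhead are illustrative assumptions.

def training_cost(gpu_hours, hourly_rate, interruption_overhead=0.0):
    """interruption_overhead = extra fraction of GPU-hours lost to restarts
    and re-loading checkpoints when spot capacity is reclaimed."""
    return gpu_hours * (1 + interruption_overhead) * hourly_rate

on_demand = training_cost(200, hourly_rate=3.00)
spot = training_cost(200, hourly_rate=1.00, interruption_overhead=0.10)

print(f"on-demand: ${on_demand:,.2f}")  # $600.00
print(f"spot:      ${spot:,.2f}")       # $220.00 -> ~63% cheaper despite restarts
```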

How much VRAM do I need?

VRAM requirements depend on your model size and task. Small models (basic CNNs) need 4-8GB, medium models (BERT, ResNet-50) need 12-16GB, and large language models need 24GB or more, with 70B+ parameter models typically requiring multiple GPUs or aggressive quantization. Training typically requires 2-4x more VRAM than inference. You can reduce requirements using quantization, gradient checkpointing, or smaller batch sizes.
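
A quick way to size VRAM is from parameter count and precision. The sketch below uses common rules of thumb (2 bytes per fp16 parameter, roughly 4x weight memory for gradients and optimizer state during training, ~20% overhead for activations and buffers); treat the outputs as estimates, not guarantees:

```python
# Back-of-the-envelope VRAM estimate from parameter count.
# The 4x training multiplier and 20% overhead are rough heuristics.

def vram_gb(params_billions, bytes_per_param=2.0, training=False):
    gb = params_billions * bytes_per_param   # fp16/bf16 weights = 2 bytes/param
    if training:
        gb *= 4                              # + gradients and Adam optimizer states
    return gb * 1.2                          # + activations, buffers, fragmentation

print(f"7B inference (fp16): ~{vram_gb(7):.0f} GB")                 # ~17 GB -> 24 GB card
print(f"7B inference (int4): ~{vram_gb(7, 0.5):.0f} GB")            # ~4 GB after quantization
print(f"7B training  (fp16): ~{vram_gb(7, training=True):.0f} GB")  # ~67 GB -> A100 80GB or multi-GPU
```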

What's the difference between the H100, A100, and RTX 4090?

The H100 (Hopper architecture) is the newest of the three, offering 3-6x faster LLM training than the A100 at $4-8/hr. The A100 (Ampere) provides excellent AI performance for research at $2-4/hr. The RTX 4090 is a consumer GPU ideal for prototyping and small-scale ML at $0.18-0.35/hr. The H100 and A100 have enterprise features like ECC memory and NVLink for multi-GPU scaling, while the RTX 4090 is more cost-effective for individual workloads.
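
When choosing between tiers, cost per job usually matters more than cost per hour: a faster GPU at a higher rate can still be cheaper overall. A hedged illustration using mid-range rates from the ranges above and the low end of the quoted 3-6x speedup:

```python
# Cost per training run rather than per hour; speedup and rates are assumptions.
a100_rate, h100_rate = 3.00, 6.00   # $/hr, mid-range examples
a100_hours = 120                    # hypothetical job length on an A100
h100_hours = a100_hours / 3         # assuming a 3x H100 speedup

print(f"A100 run: ${a100_rate * a100_hours:.0f}")  # $360
print(f"H100 run: ${h100_rate * h100_hours:.0f}")  # $240 -> cheaper per run at 2x the hourly rate
```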

What hidden costs should I watch for?

Hidden costs can add 60-80% to your total spend. Watch for: data egress fees ($0.08-$0.12 per GB), storage costs for datasets and checkpoints ($0.10-$0.30 per GB monthly), idle GPU time (teams waste 30-50% of spend on unused instances), cross-region transfer fees, and premium GPU surcharges. Always calculate total cost of ownership, not just hourly rates.
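
To see how quickly the extras add up, the sketch below totals an illustrative month; every rate is an assumed example within the ranges quoted above, not a live quote:

```python
# Illustrative monthly total cost of ownership for a single GPU workload.
gpu_hours    = 300      # billed GPU-hours
gpu_rate     = 2.50     # $/hr, example mid-range A100 offer
utilization  = 0.6      # 40% of billed hours sat idle
egress_gb    = 500
egress_rate  = 0.10     # $/GB
storage_gb   = 2_000    # datasets + checkpoints
storage_rate = 0.20     # $/GB-month

compute = gpu_hours * gpu_rate
idle_waste = compute * (1 - utilization)   # part of the compute bill doing nothing
egress = egress_gb * egress_rate
storage = storage_gb * storage_rate
total = compute + egress + storage

print(f"compute ${compute:.0f} (idle ${idle_waste:.0f}), "
      f"egress ${egress:.0f}, storage ${storage:.0f}, total ${total:.0f}")
# compute $750 (idle $300), egress $50, storage $400, total $1200
```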

Do training and inference need different GPUs?

Training requires powerful GPUs (A100, H100) with high VRAM and runs for hours or days, making it suitable for spot instances. Inference uses lighter GPUs (L4, A10, RTX series) with less VRAM and runs continuously, requiring on-demand reliability. Inference also doesn't need to keep activations around for a backward pass, so it is far more memory-efficient per request than training.

Should I use a hyperscaler or a specialized GPU provider?

Hyperscalers offer global availability, enterprise compliance, and integrated ecosystems, but cost 2-3x more with complex setup. Specialized providers like RunPod, Lambda, and CoreWeave are 40-60% cheaper with simpler setup but smaller ecosystems. Choose hyperscalers for enterprise compliance needs; choose specialized providers for cost-efficient ML workloads with simpler requirements.

How much do cloud GPUs cost in 2025?

2025 pricing ranges: H100 at $1.49-$6.98/hr (specialized providers $2-4, hyperscalers $4-8), H200 at $2.15-$6.00/hr, A100 80GB at $0.75-$4.00/hr, RTX 4090 at $0.18-$0.35/hr, and budget GPUs (L4, A10) at $0.33-$1.00/hr. Prices vary significantly by provider, region, and pricing model. Use ComputePrices to compare current rates across all providers.

How can I reduce GPU cloud costs?

Key strategies: right-size instances (use an L4 or A10 instead of an H100 when it's sufficient), use spot instances with checkpointing for training (60-91% savings), optimize models with quantization and pruning, batch inference requests to lift utilization from 20-30% to 70-80%, automatically shut down idle instances to eliminate the 30-50% of spend that's typically wasted, and choose specialized providers over hyperscalers for 40-60% savings on comparable hardware.
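
Idle time is the easiest item on that list to automate away. A minimal watchdog sketch; the thresholds and shutdown command are assumptions to adapt to your provider (many also offer built-in auto-stop policies):

```python
# Shut the instance down after a sustained period of GPU idleness.
import subprocess
import time

IDLE_THRESHOLD = 5    # % utilization treated as idle
IDLE_LIMIT = 30 * 60  # seconds of idleness before shutting down

def gpu_utilization() -> int:
    """Highest utilization across all GPUs on the box, via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"], text=True)
    return max(int(line) for line in out.splitlines())

idle_since = None
while True:
    if gpu_utilization() < IDLE_THRESHOLD:
        idle_since = idle_since or time.time()
        if time.time() - idle_since > IDLE_LIMIT:
            # Stopping the VM ends per-hour GPU billing on most providers.
            subprocess.run(["sudo", "shutdown", "-h", "now"])
            break
    else:
        idle_since = None
    time.sleep(60)
```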