GPU Models
Browse 66 GPU models — compare specs, pricing, and cloud availability
H100 SXM
ultra · 80GB · Hopper · The H100 SXM targets large-scale AI training workloads, particularly for language models up to 70 billion parameters where its 80GB memory capacity and high memory bandwidth prove essential. Its 990 TFLOPS FP16 performance and Transformer Engine make it well-suited for training and fine-tuning transformer-based models, while the substantial CUDA core count supports traditional HPC simulations and scientific computing. The MIG capability enables cloud providers to partition the GPU for multiple concurrent workloads, making it valuable for multi-tenant AI inference serving and development environments.
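As a quick check of whether a cloud instance has MIG enabled, the mode can be queried at runtime through NVML. A minimal sketch using the pynvml bindings (the nvidia-ml-py package is an assumed dependency; any NVML wrapper would work):

```python
import pynvml

# Probe MIG mode on GPU 0. MIG is only reported on supporting parts
# (A100/H100 class); other GPUs raise "not supported".
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
    print("MIG enabled" if current else "MIG disabled")
except pynvml.NVMLError_NotSupported:
    print("This GPU does not support MIG")
finally:
    pynvml.nvmlShutdown()
```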
A100 SXM
ultra · 80GB · Ampere · The A100 SXM is well-suited for AI training workloads requiring substantial memory capacity, particularly large language models and computer vision tasks that benefit from the 80GB memory configuration. Deep learning inference applications with high throughput requirements can leverage the 312 TFLOPS FP16 performance and 624 TOPS INT8 capability. High-performance computing applications in scientific research, financial modeling, and data analytics benefit from the combination of CUDA cores and memory bandwidth. Multi-tenant cloud environments can utilize MIG technology to partition the GPU into smaller instances, maximizing resource utilization while maintaining workload isolation.
H200
ultra · 141GB · Hopper · The H200 is designed for memory-intensive AI workloads, particularly large language model training and inference where the 141GB HBM3E memory capacity enables handling of models that exceed the memory limits of previous generations. Its high memory bandwidth of 4.8TB/s makes it suitable for generative AI applications, recommendation systems, and natural language processing tasks that require rapid access to large datasets. The substantial Tensor Core count and FP16 performance capabilities also position it for AI training workflows, scientific computing applications, and high-performance computing tasks that can leverage its memory subsystem advantages.
L40S
high · 48GB · Ada Lovelace · The L40S is well-suited for organizations requiring combined AI and graphics capabilities in cloud environments. Its 48GB memory capacity and Transformer Engine make it effective for large language model inference, generative AI applications, and medium-scale training workloads. The inclusion of RT Cores and DLSS 3 support enables professional rendering, architectural visualization, and content creation workflows. The GPU's 24/7 data center design makes it appropriate for production AI inference services, while its dual-purpose nature serves environments running NVIDIA Omniverse for collaborative 3D workflows alongside AI applications.
B200
ultra · 192GB · Blackwell · The B200 is designed for large-scale AI training and inference workloads that require substantial memory capacity and compute throughput. Its 192 GB VRAM makes it suitable for training large language models, processing extensive recommendation system datasets, and running memory-intensive scientific computing applications. The high INT8 performance and FP8 Tensor Core support optimize it for AI inference scenarios, while the substantial FP16 capability handles training workloads effectively. Organizations deploying chatbots, large language models, or complex AI pipelines benefit from the B200's combination of memory capacity and computational performance, particularly when workloads exceed the capabilities of lower-tier accelerators.
RTX A6000
high · 48GB · Ampere · The RTX A6000 addresses professional workloads requiring substantial memory capacity and mixed graphics-compute capabilities. Its 48GB GDDR6 memory and ECC support make it suitable for large-scale CAD modeling, architectural visualization, and scientific simulation where data integrity matters. The combination of RT cores and Tensor cores enables real-time ray tracing in content creation pipelines alongside AI-accelerated rendering workflows. In machine learning contexts, the substantial memory capacity supports training of moderately sized models or inference on large models that exceed the memory limits of consumer GPUs, while NVLink allows two cards to be paired for a combined 96GB for even larger workloads.
A100 PCIe
ultra · 40GB · Ampere · The A100 PCIe is suited for AI training workloads requiring substantial memory capacity, particularly for natural language processing models, computer vision training, and recommendation systems that benefit from the 40GB memory buffer. Its Multi-Instance GPU capability makes it effective for inference serving scenarios where multiple smaller models can run simultaneously on partitioned resources. High-performance computing applications including scientific simulations, computational fluid dynamics, and molecular modeling leverage its FP32 and FP64 compute capabilities, while the PCIe form factor ensures compatibility with existing data center infrastructure without requiring specialized NVLink fabric investments.
L40
high · 48GB · Ada Lovelace · The L40 is well-suited for professional workloads that demand large memory capacity and a mix of graphics and compute. Its 48GB of ECC memory makes it appropriate for training medium to large AI models, running inference on memory-intensive models, and supporting virtualized workstation environments where multiple users share GPU resources. The combination of RT Cores and substantial VRAM supports 3D rendering, architectural visualization, and content creation workflows. Data science applications benefit from the large memory when processing extensive datasets, while the enterprise-grade design supports 24/7 cloud deployment scenarios requiring reliability and security features.
RTX 6000 Ada
high · 48GB · Ada Lovelace · The RTX 6000 Ada targets professional visualization, 3D rendering, and content creation workflows that benefit from its 48GB memory capacity and graphics acceleration features. Its combination of CUDA cores, RT Cores, and substantial VRAM makes it well-suited for architectural visualization, product design, video editing, and moderate-scale AI development. The ECC memory support and professional drivers also make it appropriate for technical computing applications requiring data integrity, while the AV1 encoding capabilities support modern video streaming and content creation pipelines.
L4
entry · 24GB · Ada Lovelace · The L4 is well-suited for AI inference deployments requiring substantial memory capacity within power-constrained environments. Its 24GB memory buffer makes it appropriate for deploying medium-sized language models, computer vision applications processing high-resolution imagery, and video analytics workloads that benefit from keeping large datasets in GPU memory. The low 72-watt power envelope and compact form factor make it suitable for edge AI deployments, telecommunications infrastructure, and cloud providers seeking to maximize inference throughput per rack unit while minimizing cooling costs.
Browse by Category
Explore GPUs grouped by what matters for your workload
By Manufacturer
Filter by GPU maker
By Architecture
Filter by chip generation
By VRAM
Filter by memory capacity
By Performance
Filter by compute tier
By Type
Datacenter vs consumer
Frequently Asked Questions
What is a cloud GPU and how does GPU cloud computing work?
A cloud GPU is a graphics processing unit hosted in a remote datacenter that you can rent by the hour. Instead of purchasing hardware, you access GPU compute through a cloud provider's API or virtual machine. This lets you scale up for training large AI models or running inference workloads, and scale down when you're done — paying only for the time you use.
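The billing model is simple multiplication: price per GPU-hour, times GPU count, times wall-clock hours. A back-of-the-envelope sketch (the rates below are hypothetical placeholders, not quotes from any provider):

```python
# Hypothetical on-demand rates in USD per GPU-hour (placeholders only).
HOURLY_RATE_USD = {"H100": 3.50, "A100-80GB": 1.80, "L4": 0.40}

def run_cost(gpu: str, num_gpus: int, hours: float) -> float:
    """On-demand cost of a run: rate per GPU-hour x GPUs x hours."""
    return HOURLY_RATE_USD[gpu] * num_gpus * hours

print(f"${run_cost('H100', 8, 12):,.2f}")  # 8x H100 for 12 hours -> $336.00
```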
How many GPU models are available for cloud rental?
We track 66 GPU models across 39 cloud providers. These span NVIDIA, AMD, and Intel hardware, covering architectures including Ampere, Blackwell, Hopper, Ada Lovelace, and Turing. Each GPU has different VRAM, compute throughput, and pricing — use the filters above to narrow down what fits your workload.
What GPU do I need for machine learning training?
It depends on model size. For training models under 7B parameters, a 24 GB GPU like the RTX 4090 or A10 is sufficient. For 7B–70B parameter models, you need 48–80 GB VRAM — the L40S, A100, or H100 are common choices. For training models above 70B parameters, you need multiple ultra-tier GPUs (H100, H200, MI300X) with NVLink interconnects for multi-GPU communication.
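Those rules of thumb are easy to encode. A small helper, purely as a sketch of the thresholds above:

```python
def suggest_gpu(params_billion: float) -> str:
    """Map model size (billions of parameters) to a training GPU class,
    following the rough thresholds described above."""
    if params_billion < 7:
        return "24 GB card (e.g. RTX 4090 or A10)"
    if params_billion <= 70:
        return "48-80 GB card (e.g. L40S, A100, or H100)"
    return "multiple ultra-tier GPUs with NVLink (H100, H200, MI300X)"

print(suggest_gpu(13))   # 48-80 GB card (e.g. L40S, A100, or H100)
print(suggest_gpu(180))  # multiple ultra-tier GPUs with NVLink (...)
```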
What is the difference between NVIDIA, AMD, and Intel GPUs for AI?
NVIDIA dominates with the CUDA ecosystem and widest cloud availability. Their Hopper (H100/H200) and Blackwell (B200/GB200) architectures lead in training throughput. AMD Instinct accelerators (MI300X) compete on VRAM capacity (192 GB) and price-performance using the ROCm software stack. Intel offers Gaudi accelerators optimized for deep learning and Max Series (Xe HPC) for scientific computing, both at competitive price points.
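In practice, the vendor split matters less at the framework level than it looks: PyTorch's ROCm build exposes AMD GPUs through the same torch.cuda namespace, so one script can detect either vendor. A minimal sketch (Intel Gaudi uses its own framework plugin and is not covered here):

```python
import torch

# PyTorch's ROCm build reuses the torch.cuda API, so this runs unchanged
# on NVIDIA (CUDA) and AMD (ROCm) installs.
if torch.cuda.is_available():
    backend = "ROCm" if torch.version.hip else "CUDA"
    print(f"{backend} device: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA/ROCm-visible GPU")
```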
What does GPU architecture mean and why does it matter?
GPU architecture refers to the chip design generation — it determines compute capabilities, supported precision formats, memory type, and power efficiency. For example, NVIDIA's Hopper architecture introduced FP8 Transformer Engines for faster AI training, while Blackwell added FP4 for doubled inference throughput. Newer architectures generally offer better performance per watt and per dollar. Browse by architecture to compare GPUs within the same generation.
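One practical consequence: the architecture generation is visible at runtime as the CUDA compute capability, which is a quick way to confirm which generation a cloud instance actually gave you. A minimal sketch with PyTorch:

```python
import torch

# Compute capability -> architecture generation (per NVIDIA's CUDA docs):
# 8.0/8.6 Ampere, 8.9 Ada Lovelace, 9.0 Hopper.
ARCH = {(8, 0): "Ampere", (8, 6): "Ampere",
        (8, 9): "Ada Lovelace", (9, 0): "Hopper"}

major, minor = torch.cuda.get_device_capability(0)
print(f"sm_{major}{minor}: {ARCH.get((major, minor), 'unknown or newer')}")
```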
How much VRAM do I need?
VRAM determines how large a model you can fit in GPU memory. At FP16 precision, each billion parameters requires roughly 2 GB of VRAM. So a 7B model needs ~14 GB, a 13B model needs ~26 GB, and a 70B model needs ~140 GB. Quantization (INT8, INT4) can halve or quarter these requirements. For inference with quantized models, 24 GB handles up to 30B parameters, 48 GB handles up to 70B, and 80 GB+ is needed for larger models at full precision.
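That arithmetic is worth encoding as a sanity check. A rough estimator for weight memory only (activations, KV cache, and optimizer state add more on top):

```python
def weight_vram_gb(params_billion: float, bits: int = 16) -> float:
    """Rough GPU memory for model weights alone:
    parameters x (bits / 8) bytes per parameter."""
    return params_billion * bits / 8

print(weight_vram_gb(7))      # 14.0 GB at FP16
print(weight_vram_gb(70))     # 140.0 GB at FP16
print(weight_vram_gb(70, 4))  # 35.0 GB at INT4
```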
What are GPU performance tiers (entry, mid, high, ultra)?
Performance tiers group GPUs by their compute capability and typical use case. Entry-tier GPUs (T4, RTX 4060) suit prototyping and light inference. Mid-tier GPUs (A10G, RTX 3080) handle production inference and fine-tuning. High-tier GPUs (L40S, RTX 4090) serve demanding inference and medium-scale training. Ultra-tier GPUs (H100, A100 80GB, MI300X) are built for large-scale distributed training and high-throughput serving.
What is the difference between server GPUs and consumer GPUs?
Server GPUs (A100, H100, L40S) use ECC memory for data integrity, support NVLink for multi-GPU scaling, have higher sustained power budgets, and include enterprise driver support. Consumer GPUs (RTX 4090, RTX 3090) use GDDR memory, lack NVLink, and are designed for desktop use — but they can still be cost-effective for inference and smaller training jobs where ECC and multi-GPU interconnects aren't required.
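The ECC difference is observable at runtime: server parts report an ECC mode through NVML, while consumer GeForce cards typically return "not supported". A minimal sketch using the pynvml bindings (nvidia-ml-py assumed installed):

```python
import pynvml

# Probe ECC mode on GPU 0; consumer cards usually raise NotSupported.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    current, _pending = pynvml.nvmlDeviceGetEccMode(handle)
    print("ECC enabled" if current else "ECC supported but disabled")
except pynvml.NVMLError_NotSupported:
    print("No ECC mode (likely a consumer GPU)")
finally:
    pynvml.nvmlShutdown()
```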