
A40 GPU

The A40 combines professional visualization and AI acceleration in a single GPU, supporting virtual workstations and rendering workloads alongside inference tasks. It offers a good balance of memory and compute for mixed graphics and AI use cases.

VRAM: 48GB
CUDA Cores: 10,752
Tensor Cores: 336
TDP: 300W
Process: 8nm

From $0.29/hr across 7 providers

Cloud Pricing

Cheapest on Vast.ai (72% below the average price)
| Provider | Config | Price / hr | Updated |
|---|---|---|---|
| Vast.ai | 2× | $0.29 | 5/13/2026 |
| — | 1× | $0.35 | 5/13/2026 |
| — | 1× | $0.39 | 4/18/2026 |
| — | 1× | $0.40 | 4/17/2026 |
| — | 1× | $0.41 | 5/13/2026 |
| — | 1×, 2×, 4×, 8× | $1.10 | 5/13/2026 |
| — | 1×, 2×, 4×, 8× | $1.21 | 5/13/2026 |
| — | 1× | $1.51 | 4/20/2026 |
| — | 1×, 4× | $1.86 | 5/13/2026 |

Prices updated daily. Last check: May 13, 2026
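
The hourly rates above translate into monthly bills roughly as follows. This is a minimal sketch using two rates from the table ($0.29/hr marketplace, $1.10/hr on-demand); 24/7 utilization for 30 days is an assumption, and real bills depend on commitment level and uptime.

```python
# Rough cloud-cost estimator for A40 rentals. Rates come from the
# pricing table above; duration and utilization are assumptions.

def monthly_cost(rate_per_gpu_hr: float, num_gpus: int = 1,
                 hours_per_day: float = 24.0, days: int = 30) -> float:
    """Estimated cost of renting `num_gpus` A40s for a month."""
    return rate_per_gpu_hr * num_gpus * hours_per_day * days

# Cheapest marketplace rate vs. a higher on-demand rate, per GPU.
cheap = monthly_cost(0.29)                  # single A40, 24/7
on_demand = monthly_cost(1.10, num_gpus=4)  # 4x A40 node, 24/7

print(f"1x A40 @ $0.29/hr for 30 days: ${cheap:,.2f}")    # $208.80
print(f"4x A40 @ $1.10/hr for 30 days: ${on_demand:,.2f}")  # $3,168.00
```

The spread between the cheapest and most expensive listings ($0.29 vs. $1.86) is large, so comparing providers before committing to long runs pays off quickly.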

Performance

FP16: 149.7 TFLOPS
FP32: 37.4 TFLOPS
INT8: 299.3 TOPS
Memory Bandwidth: 696 GB/s
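
These peak figures imply a roofline balance point: a kernel needs roughly 215 FP16 operations per byte moved from memory before compute, rather than the 696 GB/s bandwidth, becomes the bottleneck. A quick sketch from the spec numbers above:

```python
# Back-of-envelope roofline balance points for the A40, computed from
# the published peak numbers. Workloads with lower arithmetic intensity
# (FLOPs per byte of memory traffic) are bandwidth-bound on this GPU;
# higher-intensity workloads are compute-bound.

FP16_FLOPS = 149.7e12   # peak FP16 Tensor Core throughput, FLOP/s
FP32_FLOPS = 37.4e12    # peak FP32 throughput, FLOP/s
BANDWIDTH = 696e9       # memory bandwidth, bytes/s

fp16_balance = FP16_FLOPS / BANDWIDTH   # ~215 FLOPs/byte
fp32_balance = FP32_FLOPS / BANDWIDTH   # ~54 FLOPs/byte

print(f"FP16 balance point: {fp16_balance:.0f} FLOPs/byte")
print(f"FP32 balance point: {fp32_balance:.0f} FLOPs/byte")
```

In practice this means large matrix multiplications can approach the FP16 peak, while low-intensity operations such as elementwise kernels or single-token LLM decoding are limited by the 696 GB/s of memory bandwidth.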

Strengths & Limitations

Strengths

  • 48GB GDDR6 memory with ECC enables training of large AI models and complex simulations
  • 336 third-generation Tensor Cores provide 149.7 TFLOPS of FP16 performance for AI workloads
  • Second-generation RT Cores deliver hardware-accelerated ray tracing for professional graphics
  • 696 GB/s memory bandwidth supports memory-intensive applications
  • Third-generation NVLink at 112.5 GB/s enables multi-GPU scaling
  • PCIe Gen 4 support provides modern system compatibility
  • NVIDIA vGPU software support enables virtualization and multi-user scenarios

Limitations

  • 300W power consumption requires robust cooling and power infrastructure
  • Ampere is an older architecture, two generations behind NVIDIA's current Hopper and Blackwell offerings
  • Dual-slot form factor may limit density in space-constrained deployments
  • Lacks specialized features found in newer data center GPUs like Transformer Engine
  • May be overkill for basic inference workloads that don't require 48GB VRAM
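
Whether a given model actually needs the full 48GB is easy to estimate. A minimal sketch for FP16 inference, assuming 2 bytes per parameter plus a 20% overhead factor for activations and KV cache (the overhead factor is an assumption, not a measured value):

```python
# Rough check of whether a model fits in the A40's 48 GB of VRAM for
# FP16 inference: weights at 2 bytes/parameter, plus an assumed 20%
# overhead for activations, KV cache, and framework buffers.

def fits_in_vram(params_billions: float, vram_gb: float = 48.0,
                 bytes_per_param: int = 2, overhead: float = 1.2) -> bool:
    """True if the estimated footprint fits in `vram_gb`."""
    needed_gb = params_billions * bytes_per_param * overhead
    return needed_gb <= vram_gb

print(fits_in_vram(13))   # 13B params -> ~31 GB -> True
print(fits_in_vram(30))   # 30B params -> ~72 GB -> False
```

By this estimate, models up to roughly 20B parameters fit comfortably for FP16 inference; smaller models leave much of the 48GB idle, which is where cheaper, lower-VRAM GPUs become more cost-effective.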

Key Features

NVIDIA Ampere Architecture
Third-Generation Tensor Cores
Second-Generation RT Cores
Third-Generation NVIDIA NVLink
NVIDIA vGPU software support
Secure and Measured Boot with Hardware Root of Trust
NVIDIA Mosaic multi-display technology
NVIDIA Quadro Sync technology

About A40

The NVIDIA A40 is a professional workstation GPU based on the Ampere architecture, positioned as a high-performance solution for AI training, data science, and professional graphics workloads. Built on an 8nm manufacturing process, the A40 features 10,752 CUDA cores, 336 third-generation Tensor Cores, and 48GB of GDDR6 memory with ECC support, providing substantial compute and memory resources for demanding applications.

The A40 delivers 37.4 TFLOPS of FP32 performance and 149.7 TFLOPS of FP16 performance, with 696 GB/s of memory bandwidth across a 384-bit interface. Key technical differentiators include second-generation RT Cores for ray tracing acceleration, third-generation NVLink interconnect at 112.5 GB/s bidirectional bandwidth, and PCIe Gen 4 support. The GPU operates at a 300W TDP in a dual-slot form factor.

In cloud deployments, the A40 serves workloads requiring substantial VRAM capacity and mixed compute requirements, including AI model training with large datasets, professional visualization applications, virtual workstations, and data science workflows that benefit from the combination of traditional compute performance and AI acceleration capabilities.

Common Use Cases

The A40 is well-suited for AI model training and data science workflows that require substantial VRAM capacity, particularly models with large parameter counts or extensive datasets that benefit from the 48GB memory buffer. Professional graphics applications, including CAD, content creation, and scientific visualization, leverage the RT Cores and high memory bandwidth. Virtual workstation deployments benefit from vGPU software support, enabling multiple concurrent users. The combination of traditional compute performance and AI acceleration makes it appropriate for mixed workloads in research environments and development workflows that span both graphics and machine learning requirements.
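
For the training use case, the 48GB buffer bounds model size more tightly than for inference, because optimizer state is held alongside the weights. A minimal sketch using the common mixed-precision rule of thumb of ~16 bytes per parameter (FP16 weights and gradients, FP32 master weights, and two FP32 Adam moments); this is an illustrative estimate, and activation memory is deliberately ignored:

```python
# Rough upper bound on trainable model size for a single 48 GB A40,
# using a common mixed-precision accounting: 2 B FP16 weights +
# 2 B FP16 gradients + 4 B FP32 master weights + 4 B + 4 B Adam
# moments = 16 bytes/parameter. Activations are not counted.

BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes/param

def max_trainable_params_billions(vram_gb: float = 48.0) -> float:
    """Parameter budget in billions, ignoring activations/buffers."""
    return vram_gb / BYTES_PER_PARAM

print(f"~{max_trainable_params_billions():.1f}B params fit in 48 GB")
```

Activations typically push the practical limit well below this bound, which is why larger training runs on A40s lean on the NVLink-connected multi-GPU configurations listed in the pricing table.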

Full Specifications

Hardware

Manufacturer: NVIDIA
Architecture: Ampere
CUDA Cores: 10,752
Tensor Cores: 336
RT Cores: 84
Process Node: 8nm
TDP: 300W

Memory & Performance

VRAM: 48GB
Memory Interface: 384-bit
Memory Bandwidth: 696 GB/s
FP32: 37.4 TFLOPS
FP16: 149.7 TFLOPS
INT8: 299.3 TOPS
Release: 2020

Frequently Asked Questions

How much does an A40 cost per hour in the cloud?

A40 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the A40 best used for?

The A40 excels at AI model training requiring large memory capacity, professional graphics workloads with ray tracing, virtual workstations, and data science applications. Its 48GB VRAM and Tensor Cores make it particularly suitable for training large neural networks, while RT Cores accelerate professional visualization tasks.

How does the A40 compare to modern data center GPUs like the H100?

The A40 offers 48GB VRAM compared to the H100's 80GB, and lacks the H100's Transformer Engine and fourth-generation Tensor Cores. The H100 provides significantly higher AI performance with specialized features for transformer models, while the A40 combines AI capabilities with professional graphics features like RT Cores that the H100 omits.