A2 GPU
The NVIDIA A2 is an entry-level data center GPU for efficient AI inference at the edge and in compact servers.

Cloud Pricing
Cheapest on RunPod, 73% below the average provider price.

| GPUs | Price / hr | Updated |
|---|---|---|
| 1× GPU | $0.06 | 3/26/2026 |
| 1× GPU | $0.07 | 3/22/2026 |
| 1× GPU | $0.12 | 3/26/2026 |
| 1× GPU | $0.42 | 4/6/2026 |
| 2× GPU | $0.42 | 4/5/2026 |
Prices updated daily. Last check: 4/8/2026
Performance
Strengths & Limitations
Strengths:
- Low 60W maximum power draw enables deployment in power-constrained environments
- Single-slot, low-profile PCIe form factor maximizes server density
- 16GB GDDR6 memory accommodates models with substantial memory needs without stepping up to higher-power GPUs
- 40 third-generation Tensor cores provide hardware acceleration for AI inference workloads
- PCIe Gen4 x8 interface offers modern host connectivity
- Configurable TDP between 40W and 60W allows tuning for specific deployments (see the sketch after this list)
- 200 GB/s memory bandwidth is sufficient for entry-level inference applications

Limitations:
- 1,280 CUDA cores restrict performance for compute-intensive workloads
- 4.5 TFLOPS of FP32 performance is inadequate for training large models
- Entry-level positioning leaves it underpowered for high-throughput inference scenarios
- Released in 2021 on the Ampere architecture, a generation behind NVIDIA's newer GPU lines
- The 60W power budget limits sustained performance under continuous load
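Where a deployment needs the lower end of that TDP range, the cap can be set at runtime. A minimal sketch, assuming a host with the NVIDIA driver installed and administrator privileges; it shells out to nvidia-smi, whose -pl flag sets the board power limit in watts:

```python
import subprocess

# Show the current and supported power limits for GPU 0.
subprocess.run(["nvidia-smi", "-i", "0", "-q", "-d", "POWER"], check=True)

# Cap GPU 0 at 40W, the bottom of the A2's configurable range.
# Needs administrator privileges; the driver rejects values outside
# the board's supported 40-60W window.
subprocess.run(["nvidia-smi", "-i", "0", "-pl", "40"], check=True)
```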
About A2
Common Use Cases
The A2 is optimized for AI inference applications in edge computing environments and scenarios requiring dense GPU deployments. Its 16GB memory capacity and Tensor core acceleration make it suitable for computer vision models, natural language processing inference, and text-to-speech applications that don't require the computational power of higher-tier GPUs. The low power consumption and compact form factor enable deployment in edge servers, retail environments, and distributed inference architectures where space and power are constrained. Organizations running multiple concurrent inference workloads can deploy several A2 GPUs in a single server due to their minimal thermal and power requirements.
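A minimal sketch of the kind of computer-vision inference workload described above, assuming PyTorch with CUDA and torchvision 0.13+ are available; running the model in FP16 engages the A2's Tensor cores:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Load a pretrained classifier and move it to the GPU in half
# precision, which maps inference onto the A2's Tensor cores.
device = torch.device("cuda")
model = resnet50(weights=ResNet50_Weights.DEFAULT).half().to(device).eval()

# A dummy batch standing in for preprocessed camera frames; a batch
# of 8 fits comfortably within the A2's 16GB of VRAM.
frames = torch.randn(8, 3, 224, 224, dtype=torch.float16, device=device)

with torch.inference_mode():
    logits = model(frames)
    top1 = logits.argmax(dim=1)

print(top1.tolist())  # predicted class index per frame
```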
Full Specifications
Hardware
- Manufacturer: NVIDIA
- Architecture: Ampere
- CUDA Cores: 1,280
- Tensor Cores: 40
- TDP: 60W (configurable 40-60W)
Memory & Performance
- VRAM: 16GB GDDR6
- Memory Bandwidth: 200 GB/s
- FP32: 4.5 TFLOPS
- FP16 (Tensor): 18 TFLOPS (36 TFLOPS with sparsity)
- Release: 2021
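The memory figures support a useful back-of-the-envelope check: single-stream LLM decoding is typically memory-bound, because every generated token streams the full weight set from VRAM, so throughput is capped near bandwidth divided by model size. A rough sketch with illustrative ceilings, not benchmarks:

```python
# Decode-speed ceiling for memory-bound LLM inference: each generated
# token streams the full weight set from VRAM once, so throughput is
# capped at memory bandwidth divided by the size of the weights.
BANDWIDTH_GBS = 200  # A2 memory bandwidth in GB/s

def max_tokens_per_sec(params_billions: float, bytes_per_param: float) -> float:
    weights_gb = params_billions * bytes_per_param  # model weights in GB
    return BANDWIDTH_GBS / weights_gb

# 7B parameters in FP16 (2 bytes/param) is ~14 GB of weights, which
# fits the A2's 16GB only with slim headroom for KV cache/activations.
print(f"7B FP16: ~{max_tokens_per_sec(7, 2):.0f} tok/s ceiling")  # ~14
print(f"7B INT8: ~{max_tokens_per_sec(7, 1):.0f} tok/s ceiling")  # ~29
```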
Frequently Asked Questions
How much does an A2 cost per hour in the cloud?
A2 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
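As a rough illustration of the math, using the lowest rate in the table above:

```python
# Rough on-demand cost for a single A2, using the lowest listed rate.
hourly_rate = 0.06      # $/hr, cheapest rate in the table above
hours_per_month = 730   # average hours in a month (8,760 / 12)
print(f"~${hourly_rate * hours_per_month:.2f} per month")  # ~$43.80
```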
What is the A2 best used for?
The A2 excels at AI inference workloads including computer vision, natural language processing, and text-to-speech applications. Its 16GB memory and low power consumption make it ideal for edge computing scenarios and dense deployment environments where space and power are constrained.
How does the A2 compare to other entry-level inference GPUs?
The A2 offers 16GB GDDR6 memory and 40 Tensor cores in a 60W power envelope, providing more memory capacity than many entry-level options while maintaining a compact single-slot form factor. Its Ampere architecture brings third-generation Tensor cores and PCIe Gen4 connectivity, though newer GPU generations offer better performance per watt.