
A100 PCIe GPU

The A100 PCIe delivers strong AI training and HPC performance in a standard PCIe form factor, with Multi-Instance GPU partitioning for flexible workload sharing. It enables large-scale AI and scientific computing with better performance per watt than the prior Volta generation.

VRAM 40GB
CUDA Cores 6,912
Tensor Cores 432
TDP 250W
Process 7nm
From $0.25/hr across 10 providers

Cloud Pricing

Cheapest on Verda at $0.25/hr (75% below the average rate)
Config            Price / hr   Commitment   Updated
1×, 8×            $0.25/hr     -            4/21/2026
1×                $0.29/hr     -            3/30/2026
1×, 4×            $0.60/hr     -            4/21/2026
8×                $0.60/hr     -            4/21/2026
1×, 4×, 8×        $0.69/hr     36 mo        4/14/2026
1×, 8×            $0.72/hr     -            4/21/2026
1×, 2×, 4×, 8×    $0.79/hr     24 mo        4/14/2026
1×, 2×, 4×, 8×    $0.89/hr     12 mo        4/14/2026
1×, 2×, 4×, 8×    $0.99/hr     6 mo         4/14/2026
1×                $1.00/hr     -            3/31/2026
1×, 2×, 4×, 8×    $1.09/hr     -            4/14/2026
1×, 2×, 4×        $1.19/hr     -            4/21/2026
2×                $1.20/hr     -            4/21/2026
1×                $1.29/hr     -            4/21/2026
1×                $1.45/hr     -            4/17/2026
1×                $1.50/hr     -            4/18/2026
1×                $1.50/hr     1 mo         3/31/2026
1×                $1.50/hr     -            3/30/2026
1×                $1.50/hr     -            4/16/2026
2×                $2.08/hr     -            4/18/2026
1×                $2.40/hr     -            4/16/2026

Prices updated daily. Last check: 4/21/2026
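
Total job cost from the table is simply rate × GPU count × hours. Below is a minimal Python sketch comparing a few rates from the table above for a hypothetical 200-hour run on 8 GPUs; the run length and rate labels are illustrative assumptions, not provider quotes.

    # Rough cost comparison for a hypothetical 200-hour, 8-GPU training run,
    # using per-GPU hourly rates taken from the pricing table above.
    rates = {
        "cheapest on-demand": 0.25,     # $/GPU-hr
        "12-month commitment": 0.89,
        "mid-range on-demand": 1.09,
    }

    gpus, hours = 8, 200  # hypothetical job size

    for label, rate in rates.items():
        total = rate * gpus * hours
        print(f"{label:>22}: ${total:,.0f} total at ${rate:.2f}/GPU-hr")

Note that the cheapest on-demand rate undercuts the commitment rates here because the rows come from different providers; commitment discounts only apply within a single provider's price ladder.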

Performance

FP16
312 TFLOPS
FP32
19.5 TFLOPS
BF16
312 TFLOPS
INT8
624 TOPS
Bandwidth
1,555 GB/s
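
The FP16 and bandwidth peaks above give a quick roofline rule of thumb: a kernel is memory-bound when it performs fewer floating-point operations per byte moved than peak FLOPS divided by peak bandwidth (about 200 here). A minimal Python sketch using the spec-sheet numbers, with an idealized GEMM as the example workload:

    # Back-of-envelope roofline check with the A100 PCIe peaks quoted above.
    PEAK_FP16_FLOPS = 312e12   # 312 TFLOPS (Tensor Core FP16)
    PEAK_BANDWIDTH = 1555e9    # 1,555 GB/s HBM2e

    ridge = PEAK_FP16_FLOPS / PEAK_BANDWIDTH   # ~200 FLOPs per byte

    def bound(flops: float, bytes_moved: float) -> str:
        return "compute-bound" if flops / bytes_moved >= ridge else "memory-bound"

    # Example: FP16 GEMM, C = A @ B with M = N = K = 4096, ideal data reuse.
    m = n = k = 4096
    flops = 2 * m * n * k                       # multiply-adds
    bytes_moved = 2 * (m * k + k * n + m * n)   # fp16 = 2 bytes per element
    print(f"ridge point: {ridge:.0f} FLOPs/byte")
    print(f"4096^3 GEMM: {bound(flops, bytes_moved)}")   # compute-bound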

Strengths & Limitations

Strengths

  • 40GB of HBM2e memory supports large model training and inference workloads
  • Multi-Instance GPU (MIG) technology partitions the card into up to seven isolated GPU instances
  • Third-generation Tensor Cores with TF32, BF16, FP16, and INT8 precision support
  • 1,555 GB/s of memory bandwidth keeps memory-bound workloads fed
  • PCIe Gen4 form factor fits standard server architectures
  • 312 TFLOPS of FP16 throughput for mixed-precision AI training
  • Strong performance per watt within a 250W TDP

Limitations

  • Limited to PCIe Gen4 bandwidth, without the NVLink fabric connectivity of SXM variants
  • 250W power draw still requires adequate cooling and power delivery
  • Older Ampere architecture trails current H100 and GB300 offerings
  • 40GB of memory may be insufficient for the largest language models
  • Higher cost per instance than consumer GPUs for development workloads

Key Features

Third-generation Tensor Cores with mixed-precision support
Multi-Instance GPU (MIG) with up to 7 GPU partitions (see the sketch after this list)
TF32 precision for AI training acceleration
PCIe Gen4 interface with 64 GB/s bidirectional bandwidth
HBM2e memory with ECC support
NVIDIA NVDEC video decode engines (no NVENC encoder)
Ampere architecture streaming multiprocessors
CUDA Compute Capability 8.0
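
MIG partitioning is driven from the nvidia-smi CLI; the sketch below wraps the real commands in Python for readability. It assumes GPU 0 is an A100 with no active processes, root privileges, and a MIG-capable driver; profile ID 19 is the 1g.5gb slice on the 40GB A100.

    # Minimal sketch: split an A100 PCIe into seven 1g.5gb MIG partitions.
    import subprocess

    def run(cmd: str) -> str:
        print(f"$ {cmd}")
        return subprocess.run(cmd.split(), check=True,
                              capture_output=True, text=True).stdout

    run("nvidia-smi -i 0 -mig 1")    # enable MIG mode on GPU 0 (may need a reset)
    run("nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C")  # 7x 1g.5gb + compute instances
    print(run("nvidia-smi -L"))      # list the resulting MIG device UUIDs

A process is then pinned to one partition by setting CUDA_VISIBLE_DEVICES to a MIG UUID from that listing, which is what gives each tenant an isolated slice.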

About the A100 PCIe

The NVIDIA A100 PCIe is a data center GPU built on the Ampere architecture, NVIDIA's previous-generation platform for AI training and high-performance computing workloads. As the PCIe variant of the A100 family, it makes Ampere's capabilities available in standard server configurations without requiring specialized SXM form factors or NVLink fabric infrastructure.

The card pairs 40GB of HBM2e memory and 1,555 GB/s of memory bandwidth with 6,912 CUDA cores and 432 third-generation Tensor Cores. Key capabilities include Multi-Instance GPU (MIG) technology for resource partitioning, TF32 precision for AI training acceleration, and PCIe Gen4 connectivity providing 64 GB/s of host interface bandwidth. The GPU operates within a 250W TDP envelope while delivering 312 TFLOPS of FP16 performance and 19.5 TFLOPS of FP32 compute.

In cloud deployments, the A100 PCIe serves workloads that need substantial memory capacity and compute performance, particularly medium to large-scale AI model training, inference serving, and scientific computing, where the PCIe form factor's broader server compatibility is an advantage over SXM variants.
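
The TF32 and mixed-precision features described above map directly onto framework switches. Below is a minimal PyTorch sketch, assuming a CUDA build of PyTorch on the A100; the linear model and synthetic batch are placeholders, not a recommended training setup.

    import torch

    # TF32 lets FP32 matmuls run on Ampere Tensor Cores with a reduced mantissa.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    model = torch.nn.Linear(1024, 1024).cuda()   # placeholder model
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(64, 1024, device="cuda")     # synthetic batch

    for _ in range(10):
        opt.zero_grad(set_to_none=True)
        # BF16 autocast: Ampere runs BF16 at the same Tensor Core rate as FP16
        # and, unlike FP16, it usually needs no loss scaling.
        with torch.autocast("cuda", dtype=torch.bfloat16):
            loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()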

Common Use Cases

The A100 PCIe is suited for AI training workloads requiring substantial memory capacity, particularly for natural language processing models, computer vision training, and recommendation systems that benefit from the 40GB memory buffer. Its Multi-Instance GPU capability makes it effective for inference serving scenarios where multiple smaller models can run simultaneously on partitioned resources. High-performance computing applications including scientific simulations, computational fluid dynamics, and molecular modeling leverage its FP32 and FP64 compute capabilities, while the PCIe form factor ensures compatibility with existing data center infrastructure without requiring specialized NVLink fabric investments.
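
For the multi-model inference pattern, each worker process can be confined to one MIG partition via CUDA_VISIBLE_DEVICES. A hypothetical sketch, assuming MIG partitions have already been created; serve_model.py stands in for whatever inference server entrypoint you run.

    # Launch one inference worker per MIG partition on the A100.
    import os
    import subprocess

    listing = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True).stdout

    mig_uuids = [line.split("UUID: ")[1].rstrip(")")
                 for line in listing.splitlines()
                 if "MIG" in line and "UUID:" in line]

    for uuid in mig_uuids:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=uuid)
        # serve_model.py is a placeholder for your serving entrypoint.
        subprocess.Popen(["python", "serve_model.py"], env=env)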

Full Specifications

Hardware

Manufacturer
NVIDIA
Architecture
Ampere
CUDA Cores
6,912
Tensor Cores
432
RT Cores
0
Process Node
7nm
TDP
250W

Memory & Performance

VRAM
40GB
Memory Interface
5120-bit
Memory Bandwidth
1,555 GB/s
FP32
19.5 TFLOPS
FP16
312 TFLOPS
BF16
312 TFLOPS
FP64
9.7 TFLOPS
INT8
624 TOPS
Release
2020

Frequently Asked Questions

How much does an A100 PCIe cost per hour in the cloud?

A100 PCIe pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the A100 PCIe best used for?

The A100 PCIe excels at AI model training and inference for medium to large-scale models, particularly those requiring substantial memory capacity up to 40GB. Its Multi-Instance GPU capability makes it effective for serving multiple inference workloads simultaneously, while its compute performance supports scientific computing and high-performance computing applications.

How does the A100 PCIe compare to the A100 SXM variant?

The A100 PCIe operates at a 250W TDP versus the SXM variant's 400W, trading some sustained performance for better power efficiency. The PCIe variant uses standard PCIe Gen4 connectivity (64 GB/s) instead of NVLink (600 GB/s), limiting multi-GPU bandwidth but providing broader server compatibility. Both variants share the same 40GB HBM2e memory capacity and Ampere architecture features.
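
That bandwidth gap is easiest to see with a back-of-envelope data-parallel number. The sketch below estimates one gradient all-reduce for a hypothetical 1-billion-parameter FP16 model across 8 GPUs, using the aggregate interconnect figures quoted above; it ignores latency, protocol overhead, and compute overlap, so treat the ratio, not the absolute times, as the takeaway.

    # Ring all-reduce moves ~2*(N-1)/N of the gradient buffer per GPU.
    GRAD_BYTES = 1e9 * 2    # 1B parameters x 2 bytes (FP16)
    N = 8                   # GPUs in the ring

    traffic = 2 * (N - 1) / N * GRAD_BYTES   # bytes each GPU sends and receives

    for name, bw in [("PCIe Gen4 (64 GB/s)", 64e9),
                     ("NVLink (600 GB/s)", 600e9)]:
        print(f"{name}: ~{traffic / bw * 1e3:.0f} ms per all-reduce")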