Tesla T4 GPU
The T4 is a low-power data center GPU optimized for AI inference, offering mixed-precision performance in a compact design. It's commonly used in cloud deployments for cost-efficient scaling of NLP and recommendation systems.

Cloud Pricing
Cheapest on Google Cloud (70% below average)

| Provider | GPUs | Price / hr | Updated | Source |
|---|---|---|---|---|
| | 1× GPU | $0.16 (3-mo commitment) | 3/31/2026 | |
| | 1× GPU | $0.22 (1-mo commitment) | 3/31/2026 | |
| | 1× GPU | $0.35 | 3/31/2026 | |
| | 1× GPU | $0.53 | 4/8/2026 | |
| | 4× GPU | $0.98 | 4/8/2026 | |
| | 8× GPU | $0.98 | 4/8/2026 | |

Prices updated daily. Last check: 4/8/2026
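As a rough guide, an hourly rate translates to a monthly cost by multiplying by the hours in a month. A minimal sketch, using sample rates from the table above and assuming about 730 hours per month of continuous use:

```python
# Rough monthly cost from hourly T4 rates (730 hours ~ one month of 24/7 use).
# Rates here are examples from the pricing table; actual prices vary by
# provider, region, and commitment level.
HOURS_PER_MONTH = 730

def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Return the cost of running one instance continuously for a month."""
    return hourly_rate * hours

for rate in (0.16, 0.22, 0.35):
    print(f"${rate}/hr -> ${monthly_cost(rate):.2f}/month")
```

Committed-use discounts compound quickly at this scale: the gap between $0.16/hr and $0.35/hr is over $130 per month per GPU.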
Performance
Strengths & Limitations
- Low 70-watt TDP enables deployment in power-constrained environments
- 16 GB GDDR6 memory provides adequate capacity for moderate model sizes
- Turing Tensor Cores support multi-precision computing including FP16 and INT8
- Dedicated hardware transcoding engines handle video processing workloads
- Single-slot PCIe form factor fits in space-constrained server configurations
- 320 GB/s memory bandwidth supports inference workloads efficiently
- 65 TFLOPS FP16 performance accelerates mixed-precision AI tasks
- Turing architecture lacks modern features found in Hopper and Blackwell generations
- 8.1 TFLOPS FP32 performance insufficient for large-scale training workloads
- 320 tensor cores provide limited throughput for transformer-based models
- No NVLink support restricts multi-GPU scaling capabilities
- PCIe Gen3 x16 interface may bottleneck high-bandwidth applications
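The 16 GB capacity and multi-precision points above can be made concrete: whether a model's weights fit is roughly parameter count times bytes per parameter. A back-of-envelope sketch (the parameter counts are illustrative, not T4 benchmarks, and the check ignores activations, KV caches, and framework overhead):

```python
# Rough check of whether a model's weights fit in the T4's 16 GB VRAM.
# Ignores activation memory and runtime overhead, so a near-limit result
# should be read as "probably too tight" in practice.
T4_VRAM_GB = 16

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weights_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight footprint in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]

for params, dtype in [(7, "fp32"), (7, "fp16"), (7, "int8")]:
    size = weights_gb(params, dtype)
    fits = "fits" if size < T4_VRAM_GB else "does not fit"
    print(f"{params}B params @ {dtype}: {size:.0f} GB -> {fits}")
```

This is why the T4's INT8 and FP16 support matters: a 7B-parameter model does not fit at FP32 but does at FP16 or INT8.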

About Tesla T4
Common Use Cases
The Tesla T4 is well-suited for AI inference workloads that require moderate computational power, such as computer vision applications, natural language processing with smaller models, and real-time recommendation systems. Its dedicated transcoding engines make it effective for video analytics pipelines that combine AI inference with media processing. The 16 GB memory capacity accommodates models up to medium complexity, while the 70-watt power envelope enables deployment in edge computing scenarios or data centers with strict power budgets. Organizations using the T4 typically run inference-focused workloads rather than training, leveraging its multi-precision capabilities for INT8 and FP16 optimized models.
Full Specifications
Hardware
- Manufacturer: NVIDIA
- Architecture: Turing
- CUDA Cores: 2,560
- Tensor Cores: 320
- RT Cores: 40
- Process Node: 12nm
- TDP: 70W
Memory & Performance
- VRAM: 16GB
- Memory Interface: 256-bit
- Memory Bandwidth: 320 GB/s
- FP32: 8.1 TFLOPS
- FP16: 65 TFLOPS
- FP64: 0.25 TFLOPS
- INT8: 130 TOPS
- Release: 2018
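The FP16 and bandwidth figures above imply a roofline balance point: the arithmetic intensity (FLOPs per byte moved) at which a kernel stops being memory-bound and becomes compute-bound. A quick sketch from the spec-sheet numbers:

```python
# Roofline balance point for the T4: kernels with arithmetic intensity
# below this value are limited by memory bandwidth; above it, by compute.
FP16_TFLOPS = 65       # peak tensor-core FP16 throughput, from specs
BANDWIDTH_GBPS = 320   # peak memory bandwidth, from specs

balance = (FP16_TFLOPS * 1e12) / (BANDWIDTH_GBPS * 1e9)
print(f"Balance point: {balance:.0f} FLOPs/byte")
```

With these figures the balance point is about 203 FLOPs/byte, which is one reason small-batch inference on the T4 tends to be bandwidth-bound rather than compute-bound.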
Frequently Asked Questions
How much does a Tesla T4 cost per hour in the cloud?
Tesla T4 pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
What is the Tesla T4 best used for?
The Tesla T4 excels at AI inference workloads, particularly computer vision, natural language processing with smaller models, and video analytics. Its 70-watt power efficiency makes it ideal for edge computing and cost-conscious deployments where moderate AI acceleration is sufficient.
How does the Tesla T4 compare to newer NVIDIA data center GPUs?
The T4's Turing architecture lacks modern features found in Hopper and Blackwell generations, such as advanced tensor formats and higher memory bandwidth. However, its 70-watt TDP and 16 GB memory make it more power-efficient and cost-effective for inference workloads that don't require the computational density of H100 or GB200 series GPUs.