
L40S GPU

The NVIDIA L40S provides multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications. It is built on NVIDIA's Ada Lovelace architecture.

VRAM: 48GB
CUDA Cores: 18,176
Tensor Cores: 568
TDP: 350W
Process: 4nm

From $0.32/hr across 23 providers

Cloud Pricing

Cheapest on Verda (75% below the average price)
Config               Price/hr   Term    Updated
1×, 2×, 4×, 8×       $0.32      -       5/3/2026
1×, 2×, 4×           $0.40      -       5/3/2026
1×, 3×, 4×, 5×       $0.48      -       5/3/2026
1×                   $0.50      -       4/17/2026
1×, 2×, 4×           $0.63      -       4/22/2026
1×                   $0.74      -       5/3/2026
8×                   $0.76      -       5/3/2026
1×, 2×, 4×           $0.79      -       5/3/2026
1×                   $0.81      -       5/3/2026
1×                   $0.87      -       4/18/2026
8×                   $0.87      -       5/3/2026
1×, 2×               $0.88      -       5/3/2026
2×, 4×               $0.88      -       5/3/2026
1×, 2×, 4×, 8×       $0.89      36 mo   5/2/2026
1×, 4×, 8×           $0.91      -       5/3/2026
2×                   $0.91      -       5/3/2026
1×                   $0.92      -       5/1/2026
2×                   $0.95      -       4/26/2026
2×, 4×               $0.97      -       5/3/2026
1×, 2×, 4×, 8×       $0.99      24 mo   5/2/2026
1×                   $0.99      -       5/3/2026
2×                   $1.05      -       5/3/2026
1×, 2×, 4×, 8×       $1.09      12 mo   5/2/2026
2×                   $1.16      -       5/2/2026
1×, 2×, 4×, 8×       $1.19      6 mo    5/2/2026
2×                   $1.22      -       5/1/2026
1×, 2×, 4×, 8×       $1.29      -       5/2/2026
1×                   $1.30      -       5/1/2026
3×                   $1.43      -       4/27/2026
8×                   $1.45      -       4/16/2026
1×, 2×, 4×, 8×, 10×  $1.45      -       5/3/2026
1×                   $1.55      -       5/1/2026
1×                   $1.57      -       5/3/2026
10×                  $1.60      -       5/3/2026
1×, 2×, 4×, 8×       $1.64      -       5/3/2026
1×                   $1.67      36 mo   4/25/2026
1×                   $1.67      -       5/2/2026
1×                   $1.86      -       5/3/2026
1×, 4×, 8×           $2.10      -       5/3/2026
8×                   $2.25      -       4/25/2026
1×                   $2.27      -       5/3/2026
4×                   $2.41      -       5/3/2026
4×                   $2.62      -       5/3/2026
1×                   $3.50      -       5/3/2026
1×, 2×, 4×, 8×       $3.51      -       4/16/2026
8×                   $3.77      -       5/3/2026

Prices updated daily. Last check: May 3, 2026
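
For quick budgeting, an hourly rate converts to a monthly figure by simple arithmetic. Below is a minimal sketch using a few rates from the table above; the labels are illustrative, 24/7 usage is assumed, and no provider-specific discounts are modeled:

```python
# Rough monthly-cost estimates for L40S cloud instances.
# Rates are taken from the pricing table above; assumes 24/7 usage
# and no discounts beyond the listed hourly rate.

HOURS_PER_MONTH = 24 * 30  # ~720 billable hours

rates = {
    "cheapest on-demand": 0.32,
    "36-month commitment": 0.89,
    "mid-range on-demand": 0.99,
    "most expensive listed": 3.77,
}

for label, rate in rates.items():
    print(f"{label}: ${rate:.2f}/hr -> ${rate * HOURS_PER_MONTH:,.0f}/month")
```

At these rates a single L40S runs from roughly $230 to $2,700 per month, which is why the spread between providers matters more than the absolute hourly figures suggest.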

Performance

FP16: 362.05 TFLOPS
FP32: 91.6 TFLOPS
BF16: 362.05 TFLOPS
FP8: 733 TFLOPS
INT8: 733 TOPS
Memory Bandwidth: 864 GB/s
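
For single-stream LLM inference, the 864 GB/s bandwidth figure is usually the binding constraint: each generated token must stream the full set of model weights from memory, so bandwidth divided by weight bytes gives a hard ceiling on decode throughput. A back-of-the-envelope sketch follows; the model sizes are illustrative assumptions, not benchmarks:

```python
# Decode-throughput ceiling for an L40S, from memory bandwidth alone.
# Autoregressive decoding streams all weights per generated token, so:
#   tokens/s <= bandwidth / weight_bytes
# 864 GB/s follows from the 384-bit bus at 18 Gbps GDDR6: 384/8 * 18 = 864.

BANDWIDTH_BYTES = 864e9

models = {  # illustrative sizes, not benchmarks
    "7B @ FP16":  7e9 * 2,   # 2 bytes per parameter
    "13B @ FP16": 13e9 * 2,
    "13B @ FP8":  13e9 * 1,  # 1 byte per parameter
    "34B @ FP8":  34e9 * 1,
}

for name, weight_bytes in models.items():
    ceiling = BANDWIDTH_BYTES / weight_bytes
    print(f"{name}: <= {ceiling:.0f} tokens/s (single stream, ideal)")
```

Measured throughput lands well below these ceilings once KV-cache traffic and kernel overheads are counted; batching trades the per-stream ceiling for aggregate throughput, which is where the FP8 tensor rate becomes the relevant number.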

Strengths & Limitations

Strengths:

  • Large 48GB GDDR6 memory capacity with ECC supports memory-intensive AI models and datasets
  • Fourth-generation Tensor Cores with Transformer Engine optimize large language model inference performance
  • Third-generation RT Cores deliver 212 TFLOPS of ray tracing performance for graphics workloads
  • 733 TFLOPS of FP8 tensor performance (1,466 TFLOPS with sparsity) enables efficient AI inference acceleration
  • Ada Lovelace architecture built on a 4nm process provides improved power efficiency
  • Dual-slot form factor fits standard server configurations
  • Designed for 24/7 data center operation, with secure boot and root-of-trust security features

Limitations:

  • 350W power consumption requires robust cooling and power infrastructure
  • Lacks NVLink and Multi-Instance GPU (MIG) support found in server GPUs such as the H100
  • Ada Lovelace architecture is a generation behind current Blackwell Ultra designs
  • May be overkill for basic inference tasks that don't need the full 48GB memory capacity
  • PCIe Gen4 x16 interface can become a bottleneck in high-throughput multi-GPU configurations

Key Features

Fourth-Generation Tensor Cores
Third-Generation RT Cores
Transformer Engine with FP8 support
DLSS 3 acceleration
GDDR6 memory with ECC protection
Secure boot with root of trust technology
CUDA Cores for general compute workloads
PCIe Gen4 x16 interface
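
When renting an L40S from one of the providers in the pricing table, a quick runtime check can confirm the allocated GPU matches these specifications. Below is a minimal sketch using PyTorch; the only value not taken from this page is the compute capability, which is 8.9 for Ada Lovelace parts:

```python
import torch

# Sanity-check that the allocated cloud GPU is actually an L40S.
props = torch.cuda.get_device_properties(0)

print(f"name:    {props.name}")
print(f"memory:  {props.total_memory / 1024**3:.1f} GiB")  # 48 GB is ~44.7 GiB
print(f"compute: {props.major}.{props.minor}")             # 8.9 = Ada Lovelace
print(f"SMs:     {props.multi_processor_count}")           # 142 on the L40S

assert "L40S" in props.name, "expected an L40S instance"
assert (props.major, props.minor) == (8, 9), "expected Ada Lovelace (sm_89)"
```

If either assertion fails, the instance was provisioned with a different GPU than advertised, and the pricing comparison above no longer applies.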

About L40S

The NVIDIA L40S is a high-performance data center GPU built on the Ada Lovelace architecture, positioned as a workstation-class accelerator for enterprise AI and visual computing workloads. Released in 2023, it sits below NVIDIA's server-focused H100 and newer GB300 series in the data center hierarchy, targeting organizations that need substantial AI inference capability combined with graphics rendering performance in a single card.

The L40S features 48GB of GDDR6 memory with ECC protection, 18,176 CUDA cores, and 568 fourth-generation Tensor Cores that deliver 362 TFLOPS of FP16 performance and 733 TFLOPS of FP8 tensor performance (1,466 TFLOPS with sparsity). Its 864 GB/s of memory bandwidth supports memory-intensive workloads, while third-generation RT Cores provide 212 TFLOPS of ray tracing performance. The GPU incorporates the Transformer Engine for optimized large language model processing and supports DLSS 3 for accelerated graphics workloads.

In cloud deployments, the L40S serves dual-purpose scenarios where both AI inference and graphics rendering are needed. Its design for 24/7 data center operation with secure boot and root-of-trust technology makes it suitable for enterprise environments running mixed workloads, including generative AI applications, LLM inference, and professional visualization through platforms like NVIDIA Omniverse.
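
The Transformer Engine mentioned above is exposed through NVIDIA's transformer-engine library, which manages FP8 scaling factors automatically. Here is a minimal sketch of an FP8 forward pass, assuming a recent transformer-engine release with Ada support is installed; the layer sizes are arbitrary:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Run a linear layer through the FP8 tensor cores via Transformer Engine.
# HYBRID uses E4M3 for forward activations/weights and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()  # arbitrary sizes
x = torch.randn(16, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

print(y.shape, y.dtype)
```

Inside the autocast region, matmuls execute on the FP8 tensor cores, which is where the 733 TFLOPS figure comes into play; outside it, the same module runs in standard precision.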

Common Use Cases

The L40S is well-suited for organizations requiring combined AI and graphics capabilities in cloud environments. Its 48GB memory capacity and Transformer Engine make it effective for large language model inference, generative AI applications, and medium-scale training workloads. The inclusion of RT Cores and DLSS 3 support enables professional rendering, architectural visualization, and content creation workflows. The GPU's 24/7 data center design makes it appropriate for production AI inference services, while its dual-purpose nature serves environments running NVIDIA Omniverse for collaborative 3D workflows alongside AI applications.
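
Whether a given model actually fits in the 48GB comes down to weights plus KV cache. A rough sketch follows; the usable-memory fraction and the model geometry are assumptions, and real serving stacks add their own overheads:

```python
# Rough check of whether an LLM fits on a 48 GB L40S for inference.
# Activations and framework overhead are folded into USABLE_FRACTION,
# which is an assumption, not a measured value.

GPU_MEM_GB = 48
USABLE_FRACTION = 0.9

def fits(params_b, bytes_per_param, n_layers, kv_heads, head_dim,
         context, batch, kv_bytes=2):
    weights = params_b * 1e9 * bytes_per_param
    # KV cache: K and V tensors per layer, per cached token
    kv_cache = 2 * n_layers * kv_heads * head_dim * context * batch * kv_bytes
    total_gb = (weights + kv_cache) / 1e9
    return total_gb, total_gb <= GPU_MEM_GB * USABLE_FRACTION

# Illustrative 13B-class model with grouped-query attention (assumed shape)
total, ok = fits(params_b=13, bytes_per_param=2, n_layers=40,
                 kv_heads=8, head_dim=128, context=8192, batch=4)
print(f"~{total:.1f} GB needed -> {'fits' if ok else 'does not fit'}")
```

By the same arithmetic, a 70B-parameter model at FP16 (~140 GB of weights) needs multi-GPU sharding even before the KV cache, which is where the 2× and 4× configurations in the pricing table come in.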

Full Specifications

Hardware

Manufacturer: NVIDIA
Architecture: Ada Lovelace
CUDA Cores: 18,176
Tensor Cores: 568
RT Cores: 142
Process Node: 4nm
TDP: 350W

Memory & Performance

VRAM: 48GB
Memory Interface: 384-bit
Memory Bandwidth: 864 GB/s
FP32: 91.6 TFLOPS
FP16: 362.05 TFLOPS
BF16: 362.05 TFLOPS
FP8: 733 TFLOPS
INT8: 733 TOPS
Release: 2023

Frequently Asked Questions

How much does an L40S cost per hour in the cloud?

L40S pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the L40S best used for?

The L40S excels at combined AI inference and graphics workloads, particularly large language model inference with its 48GB memory and Transformer Engine, generative AI applications, professional rendering with RT Core acceleration, and mixed enterprise workloads requiring both compute and visualization capabilities.

How does the L40S compare to the H100 for AI workloads?

The H100 offers superior AI training performance with HBM3 memory and higher tensor throughput, while the L40S provides a balance of AI inference capabilities and graphics rendering with its RT Cores and DLSS 3 support. The L40S's 48GB GDDR6 memory is sufficient for most inference tasks, while the H100's 80GB HBM3 better serves large-scale training workloads.