L40S GPU
The NVIDIA L40S provides multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications. It is based on the Ada Lovelace architecture.

Cloud Pricing
Cheapest on Verda, at 75% below the average price. Prices are updated daily; last checked May 3, 2026.
Strengths & Limitations
- Large 48GB GDDR6 memory capacity with ECC supports memory-intensive AI models and datasets
- Fourth-generation Tensor Cores with Transformer Engine optimize large language model inference performance
- Third-generation RT Cores deliver 212 TFLOPS ray tracing performance for graphics workloads
- 1,466 TFLOPS FP8 tensor performance (with sparsity) enables efficient AI inference acceleration
- Ada Lovelace architecture built on 4nm process provides improved power efficiency
- Dual-slot form factor fits standard server configurations
- Designed for 24/7 data center operation, with secure boot and root-of-trust security features
- 350W power consumption requires robust cooling and power infrastructure
- Workstation-class positioning means it lacks some enterprise features found in server GPUs like the H100, such as NVLink and HBM memory
- Ada Lovelace architecture is older than current-generation Blackwell Ultra designs
- May be overkill for basic inference tasks that don't require 48GB memory capacity
- PCIe Gen4 x16 interface may become a bottleneck for high-throughput multi-GPU configurations
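As a quick sanity check on the 48GB capacity, a back-of-the-envelope sizing like the sketch below shows which model classes fit on a single card. The model sizes, bytes-per-parameter values, and the ~10% runtime overhead factor are illustrative assumptions, not figures from the spec.

```python
# Rough check of whether an LLM's weights fit in the L40S's 48 GB of VRAM.
# Overhead factor (activations, KV cache, fragmentation) is an assumption.

VRAM_GB = 48

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

def fits(params_billions: float, bytes_per_param: float, overhead: float = 1.10) -> bool:
    """True if weights plus a rough 10% overhead fit in VRAM."""
    return weights_gb(params_billions, bytes_per_param) * overhead <= VRAM_GB

# A 70B model at FP16 (2 bytes/param) needs ~140 GB -> does not fit.
print(fits(70, 2.0))   # False
# The same model quantized to FP8 (1 byte/param) needs ~70 GB -> still too large.
print(fits(70, 1.0))   # False
# A 13B model at FP16 needs ~26 GB -> fits with room for the KV cache.
print(fits(13, 2.0))   # True
```

This is also why the "may be overkill" point above cuts both ways: small models leave much of the 48GB idle, while the largest open models still need multiple cards or heavier quantization.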
About L40S
Common Use Cases
The L40S is well-suited for organizations requiring combined AI and graphics capabilities in cloud environments. Its 48GB memory capacity and Transformer Engine make it effective for large language model inference, generative AI applications, and medium-scale training workloads. The inclusion of RT Cores and DLSS 3 support enables professional rendering, architectural visualization, and content creation workflows. The GPU's 24/7 data center design makes it appropriate for production AI inference services, while its dual-purpose nature serves environments running NVIDIA Omniverse for collaborative 3D workflows alongside AI applications.
Full Specifications
Hardware
- Manufacturer: NVIDIA
- Architecture: Ada Lovelace
- CUDA Cores: 18,176
- Tensor Cores: 568
- RT Cores: 142
- Process Node: 4nm
- TDP: 350W
Memory & Performance
- VRAM: 48GB GDDR6 with ECC
- Memory Interface: 384-bit
- Memory Bandwidth: 864 GB/s
- FP32: 91.6 TFLOPS
- FP16 Tensor: 362.05 TFLOPS (with sparsity)
- BF16 Tensor: 362.05 TFLOPS (with sparsity)
- FP8 Tensor: 733 TFLOPS
- INT8 Tensor: 733 TOPS
- Release: 2023
Frequently Asked Questions
How much does an L40S cost per hour in the cloud?
L40S pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.
What is the L40S best used for?
The L40S excels at combined AI inference and graphics workloads, particularly large language model inference with its 48GB memory and Transformer Engine, generative AI applications, professional rendering with RT Core acceleration, and mixed enterprise workloads requiring both compute and visualization capabilities.
How does the L40S compare to the H100 for AI workloads?
The H100 offers superior AI training performance with HBM3 memory and higher tensor throughput, while the L40S provides a balance of AI inference capabilities and graphics rendering with its RT Cores and DLSS 3 support. The L40S's 48GB GDDR6 memory is sufficient for most inference tasks, while the H100's 80GB HBM3 better serves large-scale training workloads.