high

L40S GPU

Name: L40S
Brand: NVIDIA
Price: 1.55 USD
Availability: InStock

The Nvidia L40S provides multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications. Based on the latest Ada Lovelace architecture.

VRAM 48GB

CUDA Cores 18,176

Tensor Cores 568

TDP 350W

Process 4nm

From

$0.32/hr

across 23 providers

Compare Prices Specs →

Cloud Pricing

Cheapest on Verda — 75% below avg

Provider	Config	Price / hr	Updated
Verda	1×2×4×8×	$0.32/hr	5/3/2026
RunPod	1×2×4×	$0.40/hr	5/3/2026
TensorDock	1×3×4×5×	$0.48/hr	5/3/2026
Crusoe	1×	$0.50/hr	4/17/2026
IO.NET	1×2×4×	$0.63/hr	4/22/2026
Latitude.sh	1×	$0.74/hr	5/3/2026
Sesterce	8×	$0.76/hr	5/3/2026
RunPod	1×2×4×	$0.79/hr	5/3/2026
Sesterce	1×	$0.81/hr	5/3/2026
Cudo Compute	1×	$0.87/hr	4/18/2026
Latitude.sh	8×	$0.87/hr	5/3/2026
Massed Compute	1×2×	$0.88/hr	5/3/2026
Latitude.sh	2×4×	$0.88/hr	5/3/2026
Civo	1×2×4×8×	$0.89/hr36mo	5/2/2026
Verda	1×4×8×	$0.91/hr	5/3/2026
Verda	2×	$0.91/hr	5/3/2026
Packet AI	1×	$0.92/hr	5/1/2026
TensorDock	2×	$0.95/hr	4/26/2026
Sesterce	2×4×	$0.97/hr	5/3/2026
Civo	1×2×4×8×	$0.99/hr24mo	5/2/2026
Seeweb	1×	$0.99/hr	5/3/2026
Together AI	2×	$1.05/hr	5/3/2026
Civo	1×2×4×8×	$1.09/hr12mo	5/2/2026
AceCloud	2×	$1.16/hr	5/2/2026
Civo	1×2×4×8×	$1.19/hr6mo	5/2/2026
UpCloud	2×	$1.22/hr	5/1/2026
Civo	1×2×4×8×	$1.29/hr	5/2/2026
UpCloud	1×	$1.30/hr	5/1/2026
UpCloud	3×	$1.43/hr	4/27/2026
IO.NET	8×	$1.45/hr	4/16/2026
Crusoe	1×2×4×8×10×	$1.45/hr	5/3/2026
Nebius	1×	$1.55/hr	5/1/2026
DigitalOcean	1×	$1.57/hr	5/3/2026
Sesterce	10×	$1.60/hr	5/3/2026
Scaleway	1×2×4×8×	$1.64/hr	5/3/2026
Vultr	1×	$1.67/hr36mo	4/25/2026
Vultr	1×	$1.67/hr	5/2/2026
Amazon AWS	1×	$1.86/hr	5/3/2026
Together AI	1×4×8×	$2.10/hr	5/3/2026
CoreWeave	8×	$2.25/hr	4/25/2026
AceCloud	1×	$2.27/hr	5/3/2026
AceCloud	4×	$2.41/hr	5/3/2026
Amazon AWS	4×	$2.62/hr	5/3/2026
Oracle Cloud	1×	$3.50/hr	5/3/2026
Replicate	1×2×4×8×	$3.51/hr	4/16/2026
Amazon AWS	8×	$3.77/hr	5/3/2026

Direct from providerVia marketplace

Prices updated daily. Last check: May 3, 2026

Performance

FP16

362.05 TFLOPS

FP32

91.6 TFLOPS

BF16

362.05 TFLOPS

FP8

733 TFLOPS

INT8

733 TOPS

Bandwidth

864 GB/s

Strengths & Limitations

Large 48GB GDDR6 memory capacity with ECC supports memory-intensive AI models and datasets
Fourth-generation Tensor Cores with Transformer Engine optimize large language model inference performance
Third-generation RT Cores deliver 212 TFLOPS ray tracing performance for graphics workloads
1,466 TFLOPS FP8 tensor performance enables efficient AI inference acceleration
Ada Lovelace architecture built on 4nm process provides improved power efficiency
Dual-slot form factor fits standard server configurations
24/7 data center operation design with secure boot and root of trust security features

350W power consumption requires robust cooling and power infrastructure
Workstation-class positioning lacks some enterprise features found in server GPUs like H100
Ada Lovelace architecture is older than current-generation Blackwell Ultra designs
May be overkill for basic inference tasks that don't require 48GB memory capacity
PCIe Gen4 x16 interface may become a bottleneck for high-throughput multi-GPU configurations

Key Features

•Fourth-Generation Tensor Cores

•Third-Generation RT Cores

•Transformer Engine with FP8 support

•DLSS 3 acceleration

•GDDR6 memory with ECC protection

•Secure boot with root of trust technology

•CUDA Cores for general compute workloads

•PCIe Gen4 x16 interface

About L40S

The NVIDIA L40S is a high-performance data center GPU built on the Ada Lovelace architecture, positioned as a workstation-class accelerator for enterprise AI and visual computing workloads. Released in 2023, it sits below NVIDIA's server-focused H100 and newer GB300 series in the data center hierarchy, targeting organizations that need substantial AI inference capabilities combined with graphics rendering performance in a single card. The L40S features 48GB of GDDR6 memory with ECC protection, 18,176 CUDA cores, and 568 fourth-generation Tensor Cores that deliver 362 TFLOPS of FP16 performance and 1,466 TFLOPS of FP8 tensor performance. Its 864 GB/s memory bandwidth supports memory-intensive workloads, while the inclusion of third-generation RT Cores provides 212 TFLOPS of ray tracing performance. The GPU incorporates the Transformer Engine for optimized large language model processing and supports DLSS 3 for accelerated graphics workloads. In cloud deployments, the L40S serves dual-purpose scenarios where both AI inference and graphics rendering capabilities are needed. Its design for 24/7 data center operations with secure boot and root of trust technology makes it suitable for enterprise environments running mixed workloads including generative AI applications, LLM inference, and professional visualization tasks through platforms like NVIDIA Omniverse.

Common Use Cases

The L40S is well-suited for organizations requiring combined AI and graphics capabilities in cloud environments. Its 48GB memory capacity and Transformer Engine make it effective for large language model inference, generative AI applications, and medium-scale training workloads. The inclusion of RT Cores and DLSS 3 support enables professional rendering, architectural visualization, and content creation workflows. The GPU's 24/7 data center design makes it appropriate for production AI inference services, while its dual-purpose nature serves environments running NVIDIA Omniverse for collaborative 3D workflows alongside AI applications.

Full Specifications

Hardware

Manufacturer: NVIDIA
Architecture: Ada Lovelace
CUDA Cores: 18,176
Tensor Cores: 568
RT Cores: 142
Process Node: 4nm
TDP: 350W

Memory & Performance

VRAM: 48GB
Memory Interface: 384-bit
Memory Bandwidth: 864 GB/s
FP32: 91.6 TFLOPS
FP16: 362.05 TFLOPS
BF16: 362.05 TFLOPS
FP8: 733 TFLOPS
INT8: 733 TOPS
Release: 2023

Frequently Asked Questions

How much does an L40S cost per hour in the cloud?

L40S pricing varies by provider, region, and commitment level. Check the pricing table above for current rates across all providers.

What is the L40S best used for?

The L40S excels at combined AI inference and graphics workloads, particularly large language model inference with its 48GB memory and Transformer Engine, generative AI applications, professional rendering with RT Core acceleration, and mixed enterprise workloads requiring both compute and visualization capabilities.

How does the L40S compare to the H100 for AI workloads?

The H100 offers superior AI training performance with HBM3 memory and higher tensor throughput, while the L40S provides a balance of AI inference capabilities and graphics rendering with its RT Cores and DLSS 3 support. The L40S's 48GB GDDR6 memory is sufficient for most inference tasks, while the H100's 80GB HBM3 better serves large-scale training workloads.