
Modal
Serverless GPUs for AI workloads with per-second billing
Last reviewed May 8, 2026
Modal is a serverless GPU platform that lets developers run Python functions, jobs and inference endpoints on NVIDIA GPUs with per-second billing and scale-to-zero.
We're actively tracking prices for Modal. Check back soon, or browse other providers with current pricing.
Pros & Cons
Advantages
- Serverless model removes idle-instance costs
- Per-second billing across the full GPU range
- Strong fit for inference, batch jobs and ML pipelines
Limitations
- Long-running, fixed-instance training is not the primary use case
- Cold starts and storage limits need to be accounted for in application design
- No bare-metal access; workloads run inside Modal's runtime
Key Features
Serverless GPUs
Run Python functions on NVIDIA GPUs without provisioning instances; cold starts in seconds
Per-Second Billing
Pay for actual GPU runtime at sub-minute granularity, with scale-to-zero by default
Container-Native
Define environments in code, with automatic image building and caching
Wide GPU Catalog
From T4 and L4 through A100, L40S, H100, H200 and B200
Pricing Options
| Option | Details |
|---|---|
| Per-Second GPU Billing | Charged per second of GPU runtime, with scale-to-zero when idle |
| Free Tier | Monthly free credits for experimentation and personal projects |
| Team and Enterprise Plans | Volume commitments and enterprise support for production deployments |
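The per-second, scale-to-zero model above can be sketched as a simple cost calculation. The rates below are hypothetical placeholders for illustration only, not Modal's published prices:

```python
# Illustrative cost model for per-second GPU billing with scale-to-zero.
# Rates are HYPOTHETICAL placeholders, not Modal's actual pricing.
HYPOTHETICAL_RATES_PER_SECOND = {
    "T4": 0.000164,    # placeholder $/s
    "A100": 0.001036,  # placeholder $/s
    "H100": 0.001644,  # placeholder $/s
}

def estimate_cost(gpu: str, active_seconds: float) -> float:
    """You pay only for seconds of actual GPU runtime; idle time costs
    nothing because containers scale to zero between invocations."""
    return HYPOTHETICAL_RATES_PER_SECOND[gpu] * active_seconds

# Example workload: 90 seconds of H100 time per request, 400 requests/day.
daily = estimate_cost("H100", 90 * 400)
print(f"${daily:.2f}/day")
```

The point of the model: with scale-to-zero, cost tracks request volume rather than provisioned capacity, so a bursty workload that would leave a fixed instance mostly idle is billed only for the seconds it actually runs.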
Availability & Support
Regions
Multi-region availability across North America and Europe
Support
Documentation, community forum, and enterprise support for paid plans
Getting Started
1. Install the SDK — Run `pip install modal` and authenticate via the CLI
2. Define a function — Decorate a Python function with the desired GPU and image specification
3. Run or deploy — Invoke locally or deploy as a long-lived endpoint or scheduled job
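The steps above can be sketched as a minimal app definition, following Modal's documented `App`/`Image`/decorator pattern. The app name, GPU choice, and installed package are illustrative assumptions, not requirements:

```python
import modal

app = modal.App("example-inference")  # app name is illustrative

# Container-native: the environment is defined in code, and Modal
# builds and caches the image automatically.
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="H100", image=image)
def infer(prompt: str) -> str:
    # Runs inside Modal's runtime on the requested GPU;
    # scales to zero when there are no invocations.
    ...

@app.local_entrypoint()
def main():
    # `modal run <file>.py` invokes this locally, calling infer remotely;
    # `modal deploy <file>.py` publishes the app as a long-lived deployment.
    print(infer.remote("hello"))
```

This file is a deployment definition rather than a locally runnable script: the function body executes remotely, so `modal run` and `modal deploy` (with an authenticated CLI) are the entry points, not `python <file>.py`.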