fal.ai
Serverless inference platform optimized for generative media
Last reviewed May 8, 2026
fal.ai is a serverless inference platform focused on generative media (image, video, audio) with a hosted model catalog and on-demand GPU runtimes for custom deployments.
Available GPUs
Hourly on-demand pricing.
Prices last updated: June 14, 2026
LLM API Pricing
Pay-per-use pricing. Prices are shown per 1M tokens unless a row states another unit, such as per megapixel (MP) or per image (img).
Prices last updated: June 16, 2026
| Model | Creator | Context | Input/1M | Output/1M | Updated |
|---|---|---|---|---|---|
| | Alibaba | — | $0.020/MP | - | 6/16/2026 |
| | Black Forest Labs | — | $0.040/img | - | 6/16/2026 |
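The two rows above use different billing units, so a rough comparison needs a small conversion. The sketch below is illustrative only: it assumes 1 MP means 1,000,000 pixels and that fractional megapixels are billed without rounding, neither of which is stated on this page. The function names are hypothetical.

```python
def megapixels(width: int, height: int) -> float:
    """Image area in megapixels, assuming 1 MP = 1,000,000 pixels."""
    return width * height / 1_000_000


def cost_per_megapixel(width: int, height: int, rate_per_mp: float, images: int = 1) -> float:
    """Estimated cost when an endpoint bills per megapixel generated.

    Assumption: no rounding up to whole megapixels.
    """
    return megapixels(width, height) * rate_per_mp * images


def cost_per_image(rate_per_image: float, images: int = 1) -> float:
    """Estimated cost when an endpoint bills a flat rate per image."""
    return rate_per_image * images


# Rates taken from the table above: $0.020/MP and $0.040/img.
per_mp_total = cost_per_megapixel(1000, 1000, rate_per_mp=0.020, images=100)
per_img_total = cost_per_image(rate_per_image=0.040, images=100)
print(f"per-MP endpoint:    ${per_mp_total:.2f} for 100 images at 1000x1000")
print(f"per-image endpoint: ${per_img_total:.2f} for 100 images")
```

Under these assumptions, the per-megapixel rate becomes the more expensive of the two once an image exceeds 2 MP, so output resolution matters when comparing the rows.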
Pros & Cons
Advantages
- Strong catalog of generative media models behind a single API
- Per-second billing for serverless GPU deployments
- Specialized inference optimizations for diffusion and audio
Limitations
- Less suited to long-running fixed-instance training
- B200 access requires sales engagement
- Lower-tier consumer GPUs are not part of the catalog
Key Features
Hosted Model Catalog
Production endpoints for image, video and audio models billed per call or per second
Custom GPU Deployments
Run private models on dedicated NVIDIA GPUs with autoscaling and scale-to-zero
Optimized Runtimes
Inference engines tuned for diffusion and audio workloads
Pricing Options
| Option | Details |
|---|---|
| Per-Call Pricing | Hosted model endpoints billed per request or per generated unit |
| Per-Second GPU Pricing | Custom deployments billed per second of GPU runtime, with scale-to-zero |
| Enterprise Contracts | Volume commitments and dedicated capacity for high-throughput customers |
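To see what per-second billing with scale-to-zero means in practice, the sketch below compares a deployment billed only while busy against one billed around the clock. The hourly rate is a hypothetical placeholder, not a fal.ai price, and the model ignores cold-start time and any minimum billing increment.

```python
# Hypothetical rate for illustration only; substitute the real hourly price.
HYPOTHETICAL_HOURLY_RATE_USD = 3.60


def per_second_cost(hourly_rate_usd: float, busy_seconds: float) -> float:
    """Cost when only seconds of active GPU runtime are billed."""
    return hourly_rate_usd / 3600 * busy_seconds


def always_on_cost(hourly_rate_usd: float, hours: float) -> float:
    """Cost of keeping the same GPU running for the whole period."""
    return hourly_rate_usd * hours


# Example: 30 minutes of actual inference spread across one day.
busy = per_second_cost(HYPOTHETICAL_HOURLY_RATE_USD, busy_seconds=30 * 60)
fixed = always_on_cost(HYPOTHETICAL_HOURLY_RATE_USD, hours=24)
print(f"scale-to-zero: ${busy:.2f}/day, always-on: ${fixed:.2f}/day")
```

The gap narrows as utilization rises, which is why a fixed instance can be the better fit for sustained, near-constant load.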
Availability & Support
Regions
Multi-region serverless infrastructure
Support
Documentation, community channels and enterprise support for paid customers
Getting Started
1. Create an account: Sign up and generate an API key
2. Pick a hosted model or upload your own: Choose from the catalog or define a custom GPU-backed deployment
3. Call the API: Invoke endpoints from any language using the REST or SDK clients
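As a rough illustration of the last step, the sketch below builds a REST request using only the Python standard library. Several details are assumptions not confirmed on this page and should be checked against the official documentation: the `https://fal.run` base URL, the `Authorization: Key ...` header format, the placeholder model id, and the `prompt` input field. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: base URL for synchronous calls; confirm in the provider's docs.
BASE_URL = "https://fal.run"


def build_request(model_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hosted endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the API key is sent as "Key <value>".
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical model id and input fields, for illustration only.
req = build_request("your-org/your-model", "YOUR_API_KEY", {"prompt": "a red fox"})
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response. The response schema depends on the model, so it is not shown here.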