fal.ai

Serverless inference platform optimized for generative media

Inference specialist, 🇺🇸 US

Last reviewed May 8, 2026

fal.ai is a serverless inference platform focused on generative media (image, video, audio) with a hosted model catalog and on-demand GPU runtimes for custom deployments.

GPU Models: 6, from $0.99 / hour
LLM Models: 3 (no "from / 1M input" price listed)

Available GPUs

Hourly on-demand pricing.

Prices last updated: June 25, 2026

GPU Model       Memory    GPUs   Price / hr   Updated
A100 PCIE       40 GB     1×     $0.99        6/7/2026
B200            192 GB    1×     $6.25        6/25/2026
H100 SXM        80 GB     1×     $1.89        6/20/2026
H200            141 GB    1×     $2.10        6/20/2026
HGX B300        288 GB    1×     $8.50        6/25/2026
RTX PRO 6000    96 GB     1×     $2.99        6/25/2026
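
One way to compare the GPUs listed above is to normalize the hourly price by memory size. The following is a minimal sketch in Python that uses only the figures from the table; the `GPUS` dictionary and the function names are illustrative, and price per GB-hour is just one possible yardstick (it ignores compute throughput and bandwidth).

```python
# Memory (GB) and hourly on-demand price (USD), copied from the table above.
GPUS = {
    "A100 PCIE": (40, 0.99),
    "B200": (192, 6.25),
    "H100 SXM": (80, 1.89),
    "H200": (141, 2.10),
    "HGX B300": (288, 8.50),
    "RTX PRO 6000": (96, 2.99),
}


def price_per_gb_hour(name: str) -> float:
    """Hourly price divided by GPU memory in GB."""
    memory_gb, hourly_usd = GPUS[name]
    return hourly_usd / memory_gb


def cheapest_per_gb() -> str:
    """Name of the GPU with the lowest price per GB of memory per hour."""
    return min(GPUS, key=price_per_gb_hour)


# Print the GPUs from cheapest to most expensive per GB-hour.
for name in sorted(GPUS, key=price_per_gb_hour):
    print(f"{name:<13} ${price_per_gb_hour(name):.4f} per GB-hour")
```

On these numbers the H200 comes out lowest per GB of memory, which may matter for memory-bound workloads such as large video models.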

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: June 24, 2026

Model          Creator              Context        Input/1M      Output/1M   Updated
(not listed)   Alibaba              (not listed)   $0.020/MP     -           6/24/2026
(not listed)   Alibaba              (not listed)   $0.020/MP     -           6/19/2026
(not listed)   Black Forest Labs    (not listed)   $0.040/img    -           6/24/2026
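
The rates above are quoted per "MP" and per "img" instead of per token. Assuming "MP" means megapixels (the page does not say so explicitly), a rough per-image estimate can be sketched as below. The function name is illustrative, and any rounding the provider applies to the megapixel count is not modeled.

```python
def image_cost_usd(width: int, height: int, usd_per_megapixel: float) -> float:
    """Estimated cost of one image, assuming 1 MP = 1,000,000 pixels
    and no rounding of the megapixel count (both are assumptions)."""
    megapixels = (width * height) / 1_000_000
    return megapixels * usd_per_megapixel


# A 1024x1024 image at the $0.020/MP rate listed above.
estimate = image_cost_usd(1024, 1024, 0.020)
print(f"${estimate:.5f} per image")
```

At this rate a 1024×1024 image is about 1.05 MP, so the estimate lands slightly above two cents, roughly half the flat $0.040/img rate shown for the third entry.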

Pros & Cons

Advantages

  • Strong catalog of generative media models behind a single API
  • Per-second billing for serverless GPU deployments
  • Specialized inference optimizations for diffusion and audio

Limitations

  • Less suited to long-running fixed-instance training
  • B200 access requires sales engagement
  • Lower-tier consumer GPUs are not part of the catalog

Key Features

Hosted Model Catalog

Production endpoints for image, video and audio models billed per call or per second

Custom GPU Deployments

Run private models on dedicated NVIDIA GPUs with autoscaling and scale-to-zero

Optimized Runtimes

Inference engines tuned for diffusion and audio workloads

Pricing Options

Option                    Details
Per-Call Pricing          Hosted model endpoints billed per request or per generated unit
Per-Second GPU Pricing    Custom deployments billed per second of GPU runtime, with scale-to-zero
Enterprise Contracts      Volume commitments and dedicated capacity for high-throughput customers
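
To make the per-second option concrete, the sketch below converts an hourly rate into a per-second charge and compares a deployment that scales to zero with one that stays up all day. It assumes the per-second rate is simply the hourly rate divided by 3600 and that idle time at zero replicas costs nothing; cold-start time and any minimum billing increment are not modeled. The function names are illustrative.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def gpu_cost_usd(hourly_usd: float, busy_seconds: float) -> float:
    """Charge for busy_seconds of GPU runtime at a given hourly rate."""
    return hourly_usd / SECONDS_PER_HOUR * busy_seconds


def daily_cost_usd(hourly_usd: float, busy_seconds: float, scale_to_zero: bool) -> float:
    """Daily cost: only busy time if scaling to zero, otherwise all 24 hours."""
    billed = busy_seconds if scale_to_zero else SECONDS_PER_DAY
    return gpu_cost_usd(hourly_usd, billed)


# H100 SXM at the listed $1.89/hr, busy for two hours per day in total.
busy = 2 * SECONDS_PER_HOUR
print(f"scale-to-zero: ${daily_cost_usd(1.89, busy, True):.2f} per day")
print(f"always on:     ${daily_cost_usd(1.89, busy, False):.2f} per day")
```

Under these assumptions the gap between the two figures is the cost of idle capacity, which is the main reason bursty inference workloads tend to favor per-second billing.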

Availability & Support

Regions

Multi-region serverless infrastructure

Support

Documentation, community channels and enterprise support for paid customers

Getting Started

  1. Create an account
     Sign up and generate an API key.

  2. Pick a hosted model or upload your own
     Choose from the catalog or define a custom GPU-backed deployment.

  3. Call the API
     Invoke endpoints from any language using the REST or SDK clients.
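
As a rough illustration of the last step, the sketch below builds (but does not send) a JSON POST request using only the Python standard library. Everything provider-specific here is an assumption for illustration: the `API_BASE` placeholder host, the `some-org/some-model` path, the `Key` authorization scheme, and the `prompt` field. The actual endpoint, auth header and input schema should be taken from the provider's documentation.

```python
import json
import os
import urllib.request

# Placeholder host, NOT a real endpoint (assumption for illustration).
API_BASE = "https://api.example.com"


def build_request(model_path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for a hosted model endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_path}",
        data=body,
        headers={
            # Assumed auth scheme; check the official docs.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the key from the environment rather than hard-coding it.
api_key = os.environ.get("API_KEY", "demo-key")
req = build_request("some-org/some-model", {"prompt": "a lighthouse at dusk"}, api_key)
print(req.get_method(), req.full_url)
# Sending is deliberately left out; urllib.request.urlopen(req) would perform the call.
```

Keeping request construction separate from sending makes the call easy to inspect and test before any billed request is made.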

Compare Providers

Find the best prices for the same GPUs from other providers

  • RunPod: 6 shared GPUs with fal.ai
  • Nebius: 5 shared GPUs with fal.ai
  • CoreWeave: 5 shared GPUs with fal.ai