Skip to main content
Fireworks AI logo

Fireworks AI

The fastest platform for open-source AI

Inference specialist🇺🇸 USinferenceopen-sourcefast

Last reviewed Mar 14, 2026

Fireworks AI is a high-performance inference platform specializing in open-source generative AI models. Founded by former Meta PyTorch team members, they provide the fastest inference engine for LLMs, vision, and audio models with enterprise-grade reliability.

25
LLM Models
$0.07
From / 1M input

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: June 11, 2026

ModelCreatorContextInput/1MOutput/1MUpdated
Segmind$0.0001/unit-5/21/2026
Playground$0.0001/unit-5/20/2026
Stability AI$0.0001/unit-5/20/2026
Playground$0.0001/unit-5/20/2026
Black Forest Labs$0.0004/unit-5/20/2026
Black Forest Labs$0.0005/unit-5/20/2026
OpenAI$0.0009/min-5/20/2026
Black Forest Labs$0.040/img-5/20/2026
OpenAI128K$0.070$0.3006/11/2026
Black Forest Labs$0.080/img-5/20/2026

Pros & Cons

Advantages

  • Lightning-fast inference with industry-leading response times
  • Easy-to-use API with excellent OpenAI compatibility
  • Wide variety of optimized open-source models
  • Competitive pricing with 50% off cached tokens and batch processing
  • Enterprise reliability with 99.99% uptime SLA
  • Up to 100 fine-tuned models deployable without extra costs

Limitations

  • Limited capacity with some serverless model limits
  • Primarily focused on language models over image/video generation
  • BYOC only available for major enterprise customers
  • Feature-rich interface can have steep learning curve

Key Features

100+ Open-Source Models

Instant access to latest models like Kimi K2.5, DeepSeek V3.2, GLM-5.1, Qwen3.6 Plus, FLUX.1 Kontext Pro, Whisper V3 Large, and more

Blazing Fast Inference

Industry-leading throughput and latency with fast inference engine

Fine-Tuning Suite

SFT, DPO, and reinforcement fine-tuning of models up to 1T+ parameters with LoRA efficiency

OpenAI-Compatible API

Drop-in replacement - just change the base URL for easy migration

On-Demand GPUs

H100, H200, B200, and B300 deployments with per-second billing and autoscaling

Batch Processing

50% discount for async bulk inference workloads

Complete Model Lifecycle

Build, tune, and scale AI models without managing infrastructure

Pricing Options

OptionDetails
Serverless InferencePay-per-token pricing with parameter-based tiers from $0.10 to $0.90 per 1M tokens, plus premium models
Cached tokens50% discount on cached input tokens for supported models
Batch processing50% discount on async bulk inference for both input and output tokens
Fine TuningPer-training-token pricing for SFT, DPO, and reinforcement learning with LoRA and full parameter options
On-demand GPUsPer-second billing for H100, H200, B200, and B300 GPU deployments with no startup charges

Availability & Support

Regions

18+ global regions across 8 cloud providers with multi-region deployments and BYOC support for enterprise

Support

Documentation, Discord community, status page, email support, and dedicated enterprise support with SLAs

Getting Started

  1. 1

    Explore Model Library

    Browse 400+ models at fireworks.ai/models

  2. 2

    Test in Playground

    Experiment with prompts interactively without coding

  3. 3

    Generate API Key

    Create an API key from user settings in your account

  4. 4

    Make first API call

    Use OpenAI-compatible endpoints or Fireworks SDK

  5. 5

    Scale to production

    Transition to on-demand GPU deployments for production workloads