
Replicate

Run open-source models at scale

Model marketplace | 🇺🇸 US | inference, models, marketplace

Last reviewed Mar 14, 2026

Replicate is a platform for running machine learning models in the cloud, offering thousands of open-source models with simple API access and pay-per-use pricing.

GPU models: 4 (from $0.81 / hour)
LLM models: 2 (from $3.00 / 1M input tokens)

Available GPUs

Hourly on-demand pricing.

Prices last updated: April 16, 2026

GPU Model   Memory   GPUs             vCPUs   RAM      Price / hr   Updated
A100 SXM    80 GB    1×, 2×, 4×, 8×   10      144 GB   $5.04/hr     4/16/2026
H100 SXM    80 GB    1×, 2×, 4×, 8×   13      72 GB    $5.49/hr     4/16/2026
L40S        48 GB    1×, 2×, 4×, 8×   10      65 GB    $3.51/hr     4/16/2026
Tesla T4    16 GB    1×               4       16 GB    $0.81/hr     4/16/2026
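
To make the hourly figures easier to apply, here is a minimal sketch that converts them into an estimated cost for a single run. The prices are copied from the table above. The function name `estimate_run_cost` is hypothetical, and the sketch assumes that the listed price is per GPU, that cost scales linearly with GPU count, and that usage is prorated by the second; none of these assumptions is confirmed by the listing.

```python
# Hourly on-demand prices in USD, copied from the table above.
HOURLY_PRICE_USD = {
    "A100 SXM": 5.04,
    "H100 SXM": 5.49,
    "L40S": 3.51,
    "Tesla T4": 0.81,
}


def estimate_run_cost(gpu: str, seconds: float, gpu_count: int = 1) -> float:
    """Estimate the USD cost of one run.

    Assumptions (not confirmed by the listing): the table price is per GPU,
    cost scales linearly with gpu_count, and billing is prorated per second.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return HOURLY_PRICE_USD[gpu] * gpu_count * seconds / 3600.0


# Example: a 30-second run on a single Tesla T4.
t4_cost = estimate_run_cost("Tesla T4", 30)
```

Under these assumptions a 30-second run on one Tesla T4 works out to well under one cent, which is why short predictions on small hardware tend to dominate only at high volume.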

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: April 6, 2026

Model   Creator     Context   Input / 1M   Output / 1M   Updated
-       Anthropic   200K      $3.00        $0.015        4/6/2026
-       DeepSeek    64K       $3.75        $0.010        3/31/2026
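
As a rough illustration of per-token billing, the sketch below estimates the cost of one request from the rates listed above, used exactly as shown. The rows are keyed by creator because the listing does not show model names; `TokenPricing` and `estimate_token_cost` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates copied as listed above, keyed by creator.
PRICING = {
    "Anthropic": TokenPricing(input_per_million=3.00, output_per_million=0.015),
    "DeepSeek": TokenPricing(input_per_million=3.75, output_per_million=0.010),
}


def estimate_token_cost(creator: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = PRICING[creator]
    total = (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    )
    return total / 1_000_000


# Example: 2,000 input tokens and 500 output tokens on the Anthropic row.
request_cost = estimate_token_cost("Anthropic", 2_000, 500)
```

Check the rates against the provider's current price list before using an estimate like this for budgeting.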

Pros & Cons

Advantages

  • Largest selection of open-source models on one platform
  • Simple pay-per-prediction pricing with no minimum
  • Easy deployment of custom models via Cog
  • Active community contributing new models daily

Limitations

  • Cold start latency for less popular models
  • Pricing can be unpredictable for high-volume use
  • Less optimized than specialized inference providers

Key Features

Vast Model Library

Access thousands of open-source models including LLMs, image generators, and more

Simple API

Consistent REST API across all models with webhooks for async processing
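
A minimal sketch of what an asynchronous request with a webhook might look like, using only the Python standard library. The endpoint URL, the `Bearer` authorization header, and the `version`, `input`, `webhook`, and `webhook_events_filter` fields reflect the public Replicate HTTP API as best understood here and should be verified against the official API reference. The helper name `build_prediction_request` and the example values are hypothetical. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint for creating predictions; verify against the API reference.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(
    token: str,
    version: str,
    inputs: dict,
    webhook_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    payload = {"version": version, "input": inputs}
    if webhook_url:
        # Ask the service to call back when the prediction finishes,
        # so the client does not have to poll for the result.
        payload["webhook"] = webhook_url
        payload["webhook_events_filter"] = ["completed"]
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (sends a real, billable request, so it is left commented out):
# req = build_prediction_request(token, "<model-version-id>", {"prompt": "hello"},
#                                webhook_url="https://example.com/replicate-hook")
# with urllib.request.urlopen(req) as resp:
#     prediction = json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect and test without network access.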

Custom Model Hosting

Deploy your own models using Cog containerization

Serverless Scaling

Automatic scaling with cold-start optimization

Pricing Options

Option               Details
Pay-per-prediction   Charged per model run based on compute time and hardware
Free tier            Limited free predictions for new users

Availability & Support

Regions

US-based infrastructure with global CDN

Support

Documentation, Discord community, email support

Getting Started

  1. Create an account

    Sign up at replicate.com with GitHub or email

  2. Get API token

    Copy your API token from account settings

  3. Run a prediction

    Use the API or Python client to run any model
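
The steps above can be sketched with the Python client as follows. This assumes the `replicate` package is installed (`pip install replicate`) and that the API token from step 2 is exported as the `REPLICATE_API_TOKEN` environment variable, which is where the client looks for it. The wrapper `run_model` and the model reference in the usage comment are hypothetical placeholders, not a specific model from this listing.

```python
import os


def run_model(model_ref: str, inputs: dict):
    """Run a model through the Replicate Python client and return its output.

    Fails early with a clear message if the API token is not configured.
    """
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError(
            "Set the REPLICATE_API_TOKEN environment variable to your API token"
        )
    # Imported lazily so this module can be loaded without the package installed.
    import replicate

    return replicate.run(model_ref, input=inputs)


# Usage (performs a real, billable prediction, so it is left commented out):
# output = run_model("owner/model-name", {"prompt": "an astronaut riding a horse"})
# print(output)
```

The accepted input keys differ per model, so check the model's page for its input schema before calling it.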

Compare Providers

Find the best prices for the same GPUs from other providers

Amazon AWS

4 shared GPUs with Replicate

IO.NET

4 shared GPUs with Replicate

Microsoft Azure

3 shared GPUs with Replicate