
Replicate
Run open-source models at scale
Replicate is a platform for running machine learning models in the cloud, offering thousands of open-source models with simple API access and pay-per-use pricing.
Available GPUs
Hourly on-demand pricing. Click column headers to sort.
Prices last updated: January 28, 2026
Pros & Cons
Advantages
- Largest selection of open-source models on one platform
- Simple pay-per-prediction pricing with no minimum
- Easy deployment of custom models via Cog
- Active community contributing new models daily
Limitations
- Cold start latency for less popular models
- Pricing can be unpredictable for high-volume use
- Less optimized than specialized inference providers
Key Features
Vast Model Library
Access thousands of open-source models including LLMs, image generators, and more
Simple API
Consistent REST API across all models with webhooks for async processing
Custom Model Hosting
Deploy your own models using Cog containerization
Serverless Scaling
Automatic scaling with cold-start optimization
Pricing Options
| Option | Details |
|---|---|
| Pay-per-prediction | Charged per model run based on compute time and hardware |
| Free tier | Limited free predictions for new users |
Availability & Support
Regions
US-based infrastructure with global CDN
Support
Documentation, Discord community, email support
Getting Started
- 1
Create an account
Sign up at replicate.com with GitHub or email
- 2
Get API token
Copy your API token from account settings
- 3
Run a prediction
Use the API or Python client to run any model
Compare Providers
Find the best prices for the same GPUs from other providers