
Together AI
The AI Native Cloud
Last reviewed Mar 14, 2026
Together AI is an AI-native cloud platform built for developers working with open-source and frontier AI models. It provides serverless inference, fine-tuning, and GPU clusters with industry-leading performance optimizations.
Available GPUs
Hourly on-demand pricing.
Prices last updated: April 25, 2026
LLM API Pricing
Pay-per-token pricing. Prices shown per 1M tokens.
Prices last updated: April 25, 2026
| Model | Creator | Context | Input/1M | Output/1M | Updated |
|---|---|---|---|---|---|
| — | OpenAI | 128K | $0.050 | $0.200 | 4/25/2026 |
| — | NVIDIA | 131K | $0.060 | $0.250 | 4/25/2026 |
| — | Meta | 128K | $0.060 | $0.060 | 4/25/2026 |
| — | — | 128K | $0.060 | $0.120 | 4/25/2026 |
| — | Alibaba | 256K | $0.100 | $0.150 | 4/25/2026 |
| — | Mistral | 32K | $0.100 | $0.300 | 4/25/2026 |
| — | Alibaba | 128K | $0.100 | $0.100 | 4/17/2026 |
| — | OpenAI | 128K | $0.150 | $0.600 | 4/25/2026 |
| — | Alibaba | 262K | $0.150 | $1.500 | 4/25/2026 |
| — | Meta | 328K | $0.180 | $0.590 | 4/25/2026 |
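As a worked example of the per-1M-token pricing above, a request's cost is (input tokens / 1M) × input rate + (output tokens / 1M) × output rate. A minimal sketch, with rates taken from one row of the table and illustrative token counts:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_per_m: float, output_per_m: float) -> float:
    """Cost in USD for one request at per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# Example: 50K input + 10K output tokens at $0.060 in / $0.250 out per 1M
cost = token_cost(50_000, 10_000, 0.060, 0.250)
print(f"${cost:.4f}")  # → $0.0055
```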
Pros & Cons
Advantages
- 3.5x faster inference and 2.3x faster training than alternatives
- Competitive pricing with 50% batch API discount
- Wide selection of 100+ open-source models
- OpenAI-compatible APIs for easy migration
- Research leadership with FlashAttention contributions
- Global data center network across 25+ cities
Limitations
- Primarily focused on open-source models
- GPU cluster pricing requires custom quotes for reserved capacity
- Smaller ecosystem compared to major cloud providers
Key Features
100+ Open-Source Models
Access to Llama, DeepSeek, Qwen, and other leading open-source models
Serverless Inference
Pay-per-token API with OpenAI-compatible endpoints
Fine-Tuning Platform
LoRA and full fine-tuning with proprietary optimizations
GPU Clusters
Instant self-service or reserved dedicated clusters with H100, H200, B200, GB200, GB300 access
Batch API
50% cost reduction for non-urgent inference workloads
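The 50% batch discount simply halves the per-token rates, so a non-urgent workload costs half of its real-time price. A quick sketch of the savings (token volume and rate are illustrative):

```python
BATCH_DISCOUNT = 0.50  # 50% off per-token rates for batch workloads

def batch_cost(realtime_cost: float) -> float:
    """Cost of the same workload routed through the Batch API."""
    return realtime_cost * (1 - BATCH_DISCOUNT)

# 10M input tokens at $0.150/1M cost $1.50 in real time...
realtime = 10 * 0.150
print(batch_cost(realtime))  # → 0.75, half price via the Batch API
```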
Code Interpreter
Execute LLM-generated code in sandboxed environments
AI Factory
Custom infrastructure at frontier scale
Sandbox
Build development environments for AI
Managed Storage
Store model weights & data securely
Dedicated Inference
Deploy models on custom hardware with guaranteed performance
Evaluations
Measure model quality
Pricing Options
| Option | Details |
|---|---|
| Serverless pay-per-token | Per-token pricing scales based on model size, from small open-source models to 405B parameter frontier models |
| Batch API | 50% discount for non-urgent inference workloads |
| Fine-tuning | Per-token pricing for LoRA and full fine-tuning based on model size and dataset |
| GPU Clusters - On-demand | Hourly GPU pricing for instant self-service clusters |
| GPU Clusters - Reserved | Custom pricing for reserved capacity with significant discounts for longer commitments |
| Dedicated Inference | Single-tenant GPU instances with guaranteed performance |
Availability & Support
Regions
Global data center network across 25+ cities with frontier hardware including GB300, GB200, B200, H200, H100
Support
Documentation, community Discord, email support, and expert support for reserved cluster customers
Getting Started
1. Create an account: Sign up at together.ai
2. Get an API key: Generate an API key from your dashboard
3. Choose a model: Browse 100+ models for chat, code, images, video, and audio
4. Make API calls: Use OpenAI-compatible endpoints or the Together SDK
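Because the endpoints follow the OpenAI chat-completions schema, step 4 needs nothing beyond an HTTP client. A minimal stdlib sketch that builds (but does not send) a request; the model name is illustrative, so substitute any model from the catalog:

```python
import json
import os
import urllib.request

# Together's serverless API mirrors the OpenAI chat-completions schema.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str, model: str,
                  api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for Together AI."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Illustrative model name; read the key from the environment.
req = build_request("Hello!", "meta-llama/Llama-3.3-70B-Instruct-Turbo",
                    os.environ.get("TOGETHER_API_KEY", ""))
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```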