
Together AI

The AI Native Cloud

Inference specialist · 🇺🇸 US · inference · open-source · training

Last reviewed Mar 14, 2026

Together AI is an AI Native Cloud platform built for developers working with open-source and frontier AI models. It provides serverless inference, fine-tuning, and GPU clusters with industry-leading performance optimizations.

  • 6 GPU models, from $1.05/hour
  • 47 LLM models, from $0.05 per 1M input tokens

Available GPUs

Hourly on-demand pricing.

Prices last updated: April 25, 2026

| GPU Model | Memory | GPUs | Price / hr | Updated |
|---|---|---|---|---|
| A100 SXM | 80GB | 1×, 2×, 4×, 8× | $2.59 | 4/25/2026 |
| B200 | 192GB | 1×, 2×, 4×, 8× | $9.95 | 4/25/2026 |
| H100 SXM | 80GB | 1×, 2×, 4×, 8× | $3.99 | 4/25/2026 |
| H200 | 141GB | 1×, 2×, 4×, 8× | $5.49 | 4/25/2026 |
| L40 | 40GB | 1×, 2×, 4×, 8× | $1.49 | 4/25/2026 |
| L40S | 48GB | 1×, 2×, 4×, 8× | $2.10 | 4/25/2026 |
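Cluster costs scale linearly with GPU count and hours. A quick estimator using the on-demand rates above (the rates are hard-coded from the table and purely illustrative; actual billing may differ):

```python
# Hourly on-demand rates from the table above (USD per GPU-hour).
GPU_HOURLY = {
    "A100 SXM": 2.59,
    "B200": 9.95,
    "H100 SXM": 3.99,
    "H200": 5.49,
    "L40": 1.49,
    "L40S": 2.10,
}

def cluster_cost(gpu: str, count: int, hours: float) -> float:
    """On-demand cost for `count` GPUs running for `hours`."""
    return GPU_HOURLY[gpu] * count * hours

# Example: an 8x H100 SXM node for a 730-hour month.
monthly = cluster_cost("H100 SXM", 8, 730)
print(f"${monthly:,.2f}")  # ~$23,301.60 per month
```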

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: April 25, 2026

| Creator | Context | Input / 1M | Output / 1M | Updated |
|---|---|---|---|---|
| OpenAI | 128K | $0.050 | $0.200 | 4/25/2026 |
| NVIDIA | 131K | $0.060 | $0.250 | 4/25/2026 |
| Meta | 128K | $0.060 | $0.060 | 4/25/2026 |
| Google | 128K | $0.060 | $0.120 | 4/25/2026 |
| Alibaba | 256K | $0.100 | $0.150 | 4/25/2026 |
| Mistral | 32K | $0.100 | $0.300 | 4/25/2026 |
| Alibaba | 128K | $0.100 | $0.100 | 4/17/2026 |
| OpenAI | 128K | $0.150 | $0.600 | 4/25/2026 |
| Alibaba | 262K | $0.150 | $1.50 | 4/25/2026 |
| Meta | 328K | $0.180 | $0.590 | 4/25/2026 |
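Per-request cost follows directly from the per-1M-token rates: (input_tokens × input_rate + output_tokens × output_rate) / 1,000,000. A small helper, with the 50% Batch API discount applied optionally (rates in the example are taken from the cheapest row above; illustrative only):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float,
                 batch: bool = False) -> float:
    """USD cost of one request at per-1M-token rates.

    `batch=True` applies the 50% Batch API discount.
    """
    cost = (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000
    return cost * 0.5 if batch else cost

# 10k input + 2k output tokens at $0.050 / $0.200 per 1M tokens:
print(f"{request_cost(10_000, 2_000, 0.050, 0.200):.6f}")              # 0.000900
print(f"{request_cost(10_000, 2_000, 0.050, 0.200, batch=True):.6f}")  # 0.000450
```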

Pros & Cons

Advantages

  • Claims 3.5x faster inference and 2.3x faster training than alternatives
  • Competitive pricing with 50% batch API discount
  • Wide selection of 100+ open-source models
  • OpenAI-compatible APIs for easy migration
  • Research leadership with FlashAttention contributions
  • Global data center network across 25+ cities

Limitations

  • Primarily focused on open-source models
  • GPU cluster pricing requires custom quotes for reserved capacity
  • Smaller ecosystem compared to major cloud providers

Key Features

100+ Open-Source Models

Access to Llama, DeepSeek, Qwen, and other leading open-source models

Serverless Inference

Pay-per-token API with OpenAI-compatible endpoints

Fine-Tuning Platform

LoRA and full fine-tuning with proprietary optimizations

GPU Clusters

Instant self-service or reserved dedicated clusters with H100, H200, B200, GB200, GB300 access

Batch API

50% cost reduction for non-urgent inference workloads

Code Interpreter

Execute LLM-generated code in sandboxed environments

AI Factory

Custom infrastructure at frontier scale

Sandbox

Build development environments for AI

Managed Storage

Store model weights & data securely

Dedicated Inference

Deploy models on custom hardware with guaranteed performance

Evaluations

Measure model quality

Pricing Options

| Option | Details |
|---|---|
| Serverless pay-per-token | Per-token pricing scales with model size, from small open-source models to 405B-parameter frontier models |
| Batch API | 50% discount for non-urgent inference workloads |
| Fine-tuning | Per-token pricing for LoRA and full fine-tuning, based on model size and dataset |
| GPU Clusters - On-demand | Hourly GPU pricing for instant self-service clusters |
| GPU Clusters - Reserved | Custom pricing for reserved capacity, with significant discounts for longer commitments |
| Dedicated Inference | Single-tenant GPU instances with guaranteed performance |

Availability & Support

Regions

Global data center network across 25+ cities with frontier hardware including GB300, GB200, B200, H200, H100

Support

Documentation, community Discord, email support, and expert support for reserved cluster customers

Getting Started

  1. Create an account: sign up at together.ai.

  2. Get an API key: generate one from your dashboard.

  3. Choose a model: browse 100+ models for chat, code, images, video, and audio.

  4. Make API calls: use the OpenAI-compatible endpoints or the Together SDK.
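The steps above can be sketched in a few lines. This is a minimal, dependency-free sketch against the OpenAI-compatible chat completions endpoint using only Python's standard library; the base URL follows Together's documented `api.together.xyz/v1` convention, and the model name is illustrative:

```python
import json
import os
import urllib.request

# Together's OpenAI-compatible API base (per their docs).
BASE_URL = "https://api.together.xyz/v1"

def chat(prompt: str,
         model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo") -> str:
    """Send one chat message and return the reply text.

    The model name above is an example; browse the model
    catalog for current identifiers.
    """
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (requires TOGETHER_API_KEY in the environment):
#   print(chat("Say hello in one word."))
```

The official Together SDK or the `openai` Python package pointed at the same base URL works equivalently, since the request and response shapes mirror OpenAI's.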

Compare Providers

Find the best prices for the same GPUs from other providers

  • CoreWeave: 6 GPUs in common with Together AI
  • RunPod: 6 GPUs in common with Together AI
  • IO.NET: 6 GPUs in common with Together AI