Together AI

The AI Native Cloud

Inference specialist๐Ÿ‡บ๐Ÿ‡ธ USinferenceopen-sourcetraining

Together AI is the AI Native Cloud platform engineered for developers building with open-source and frontier AI models. They provide serverless inference, fine-tuning, and GPU clusters with industry-leading performance optimizations.

6
GPU Models
$2.10
From / hour

Available GPUs

Hourly on-demand pricing. Click column headers to sort.

Prices last updated: March 9, 2026

GPU Modelโ†‘
Memoryโ†‘
Price / hrโ†‘
A100 PCIE40GB$2.40/hr
A100 SXM80GB$2.40/hr
B200192GB$7.49/hr
B200192GB$5.49/hr
H10080GB$3.49/hr
H10080GB$2.69/hr
H200141GB$4.19/hr
H200141GB$3.19/hr
L40S48GB$2.10/hr

Pros & Cons

Advantages

  • 3.5x faster inference and 2.3x faster training than alternatives
  • Competitive pricing with 50% batch API discount
  • Wide selection of 100+ open-source models
  • OpenAI-compatible APIs for easy migration
  • Research leadership with FlashAttention contributions
  • Global data center network across 25+ cities

Limitations

  • Primarily focused on open-source models
  • GPU cluster pricing requires custom quotes for reserved capacity
  • Smaller ecosystem compared to major cloud providers

Key Features

100+ Open-Source Models

Access to Llama, DeepSeek, Qwen, and other leading open-source models

Serverless Inference

Pay-per-token API with OpenAI-compatible endpoints

Fine-Tuning Platform

LoRA and full fine-tuning with proprietary optimizations

GPU Clusters

Instant self-service or reserved dedicated clusters with H100, H200, B200 access

Batch API

50% cost reduction for non-urgent inference workloads

Code Interpreter

Execute LLM-generated code in sandboxed environments

Pricing Options

OptionDetails
Serverless pay-per-tokenStarting at $0.06/1M tokens for small models up to $3.50/1M for 405B models
Batch API50% discount for non-urgent inference workloads
Fine-tuning$0.48-$3.20 per 1M tokens depending on model size
GPU Clusters$2.20-$5.50/hour per GPU for instant clusters, custom pricing for reserved

Availability & Support

Regions

Global data center network across 25+ cities with frontier hardware including GB200, B200, H200, H100

Support

Documentation, community Discord, email support, and expert support for reserved cluster customers

Getting Started

  1. 1

    Create an account

    Sign up at together.ai

  2. 2

    Get API key

    Generate an API key from your dashboard

  3. 3

    Choose a model

    Browse 100+ models for chat, code, images, video, and audio

  4. 4

    Make API calls

    Use OpenAI-compatible endpoints or Together SDK

Compare Providers

Find the best prices for the same GPUs from other providers

RunPod logo

RunPod

6 shared GPUs with Together AI

Vast.ai logo

Vast.ai

6 shared GPUs with Together AI

Civo logo

Civo

6 shared GPUs with Together AI