Loading Comparison
Fetching pricing data and provider information...
Loading Comparison
Fetching pricing data and provider information...
Compare GPU and LLM inference API pricing between Google Cloud and Together AI. Find the best rates for AI training, inference, and ML workloads.
Provider 1
Provider 2
| GPU Model ↑ | Google Cloud Price | Together AI Price | Price Diff ↕ | Sources |
|---|---|---|---|---|
A100 SXM 80GB VRAM • Together AI | Not Available | 2x GPU | — | |
A100 SXM 80GB VRAM • Not Available $1.30/hour 2x GPU configuration Updated: 4/18/2026 ★Best Price | ||||
B200 192GB VRAM • Together AI | Not Available | — | ||
B200 192GB VRAM • | ||||
H100 SXM 80GB VRAM • Together AI | Not Available | 2x GPU | — | |
H100 SXM 80GB VRAM • Not Available $2.00/hour 2x GPU configuration Updated: 4/18/2026 ★Best Price | ||||
H200 141GB VRAM • Together AI | Not Available | — | ||
H200 141GB VRAM • | ||||
L40 40GB VRAM • Together AI | Not Available | — | ||
L40 40GB VRAM • | ||||
L40S 48GB VRAM • Together AI | Not Available | — | ||
L40S 48GB VRAM • | ||||
Tesla T4 16GB VRAM • Google Cloud | Not Available | — | ||
Tesla T4 16GB VRAM • | ||||
Tesla V100 32GB VRAM • Google Cloud | Not Available | — | ||
Tesla V100 32GB VRAM • | ||||
A100 SXM 80GB VRAM • Together AI | Not Available | 2x GPU | — | |
A100 SXM 80GB VRAM • Not Available $1.30/hour 2x GPU configuration Updated: 4/18/2026 ★Best Price | ||||
B200 192GB VRAM • Together AI | Not Available | — | ||
B200 192GB VRAM • | ||||
H100 SXM 80GB VRAM • Together AI | Not Available | 2x GPU | — | |
H100 SXM 80GB VRAM • Not Available $2.00/hour 2x GPU configuration Updated: 4/18/2026 ★Best Price | ||||
H200 141GB VRAM • Together AI | Not Available | — | ||
H200 141GB VRAM • | ||||
L40 40GB VRAM • Together AI | Not Available | — | ||
L40 40GB VRAM • | ||||
L40S 48GB VRAM • Together AI | Not Available | — | ||
L40S 48GB VRAM • | ||||
Tesla T4 16GB VRAM • Google Cloud | Not Available | — | ||
Tesla T4 16GB VRAM • | ||||
Tesla V100 32GB VRAM • Google Cloud | Not Available | — | ||
Tesla V100 32GB VRAM • | ||||
Explore how these providers compare to other popular GPU cloud services
Compare Google Cloud with another leading provider
Compare Google Cloud with another leading provider
Compare Google Cloud with another leading provider
Compare Google Cloud with another leading provider
Compare Google Cloud with another leading provider
Compare Google Cloud with another leading provider
Scalable virtual machines with a wide range of machine types, including GPUs.
Managed Kubernetes service for deploying and managing containerized applications.
Event-driven serverless compute platform.
Fully managed serverless platform for containerized applications.
Unified ML platform for building, deploying, and managing ML models.
Short-lived compute instances at a significant discount, suitable for fault-tolerant workloads.
Access to Llama, DeepSeek, Qwen, and other leading open-source models
Pay-per-token API with OpenAI-compatible endpoints
LoRA and full fine-tuning with proprietary optimizations
Instant self-service or reserved dedicated clusters with H100, H200, B200, GB200, GB300 access
50% cost reduction for non-urgent inference workloads
Execute LLM-generated code in sandboxed environments
Offers customizable virtual machines running in Google's data centers.
Managed Kubernetes service for running containerized applications.
Serverless compute platform for running code in response to events.
Pay for compute capacity per hour or per second, with no long-term commitments.
Automatic discounts for running instances for a significant portion of the month.
Save up to 57% with a 1-year or 3-year commitment to a minimum level of resource usage.
Save up to 80% for fault-tolerant workloads that can be interrupted.
Per-token pricing scales based on model size, from small open-source models to 405B parameter frontier models
50% discount for non-urgent inference workloads
Per-token pricing for LoRA and full fine-tuning based on model size and dataset
Hourly GPU pricing for instant self-service clusters
Custom pricing for reserved capacity with significant discounts for longer commitments
Single-tenant GPU instances with guaranteed performance
Set up a project in the Google Cloud Console.
Set up a billing account to pay for resource usage.
Select Compute Engine, GKE, Cloud Functions, or Cloud Run based on your needs.
Launch a VM instance, configure a Kubernetes cluster, or deploy a function/application.
Use the Cloud Console, command-line tools, or APIs to manage your resources.
Sign up at together.ai
Generate an API key from your dashboard
Browse 100+ models for chat, code, images, video, and audio
Use OpenAI-compatible endpoints or Together SDK
40+ regions and 120+ zones worldwide.
Role-based (free), Standard, Enhanced and Premium support plans. Comprehensive documentation, community forums, and training resources.
Global data center network across 25+ cities with frontier hardware including GB300, GB200, B200, H200, H100
Documentation, community Discord, email support, and expert support for reserved cluster customers