LLM Pricing

Compare per-token rates across 10 providers and 45 LLM models

Updated daily

💡 Top Picks Right Now


Price Trends

[Chart: moving average price across providers, per 1M tokens, input and output]

Cheapest Option: Llama 3.1 8B
Most Popular: Llama 3.1 405B
Top Performance: Claude Opus 4.5

Understanding LLM Pricing

Input vs Output Tokens

LLM APIs charge separately for input tokens (your prompts) and output tokens (model responses). Output tokens are usually 2-5x more expensive than input tokens.
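A minimal sketch of how this split billing adds up per request. The rates below are hypothetical placeholders, not quotes from the table:

```python
def request_cost(input_tokens, output_tokens,
                 input_rate_per_m, output_rate_per_m):
    """Cost in dollars for one API call, given per-1M-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Hypothetical rates: $0.50 per 1M input tokens, $1.50 per 1M output
# tokens (a 3x output markup). A 2,000-token prompt with a 500-token
# response costs (2000*0.50 + 500*1.50) / 1e6 = $0.00175.
cost = request_cost(2_000, 500, 0.50, 1.50)
```

Note that even with a short response, the output side can dominate the bill once the markup is factored in.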

Context Windows

The context window determines how much text a model can process at once. Larger context windows allow for longer conversations and document analysis.
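A rough way to estimate whether a document fits a given window. The ~4-characters-per-token ratio is a common heuristic for English text, not an exact figure; real tokenizers vary by model and language:

```python
def fits_context(text, context_window_tokens, chars_per_token=4):
    """Rough fit check using the ~4 chars/token English-text heuristic.
    Treat the result as an estimate, not a guarantee."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_window_tokens

# A 400,000-character document is roughly 100k tokens, so it fits
# a 128k-token window but not a 32k one.
fits_context("x" * 400_000, 128_000)
```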

Open vs Proprietary Models

Open-source models (Llama, Mistral) are often cheaper but may require more tuning. Proprietary models (GPT-4, Claude) typically offer better out-of-box performance.

Batch Discounts

Many providers offer 50% discounts for batch/async API usage. Consider batch APIs for non-time-sensitive workloads to reduce costs.
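The savings are easy to estimate up front. A sketch with hypothetical numbers, assuming the 50% batch discount mentioned above and that only part of your traffic can tolerate async turnaround:

```python
def batch_savings(monthly_cost, batch_discount=0.50, batch_share=1.0):
    """Dollars saved per month by routing `batch_share` of spend
    through a batch API priced at `batch_discount` off list rates."""
    return monthly_cost * batch_share * batch_discount

# Hypothetical: $1,200/month spend, 60% of it batchable, 50% off:
# 1200 * 0.60 * 0.50 = $360 saved per month.
savings = batch_savings(1_200, batch_share=0.60)
```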