LLM Pricing
Compare per-token rates across 16 providers and 171 LLM models
Updated daily
Top Picks Right Now
Cheapest Option: Granite 4.0 H Micro
Most Popular: GPT-OSS-120B
Top Performance: Claude Opus 4.5
Price Trends: moving average input/output price across providers (per 1M tokens)
Understanding LLM Pricing
Input vs Output Tokens
LLM APIs charge separately for input tokens (your prompts) and output tokens (model responses). Output tokens are usually 2-5x more expensive than input tokens.
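Since input and output are billed at different per-1M-token rates, per-request cost is a simple weighted sum. A minimal sketch — the rates below ($3/M input, $15/M output, a 5x multiplier) are hypothetical, not any specific provider's pricing:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars, given rates quoted per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A long prompt with a short answer: input dominates the token count,
# but the pricier output tokens still account for most of the cost.
cost = request_cost(2_000, 500, input_rate=3.00, output_rate=15.00)
print(f"${cost:.4f}")  # $0.0135 (input: $0.0060, output: $0.0075)
```

This asymmetry is why prompt-heavy workloads (RAG, long documents) and generation-heavy workloads (summarization, code generation) can cost very differently on the same model.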
Context Windows
The context window determines how much text a model can process at once. Larger context windows allow for longer conversations and document analysis.
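Before sending a request, it is worth checking that the prompt plus the expected response fits the model's window. A rough sketch using the common ~4-characters-per-token heuristic for English text (an approximation, not a real tokenizer):

```python
def fits_in_context(prompt: str, context_window: int,
                    reserved_output: int = 1_024) -> bool:
    """Rough pre-flight check: estimate prompt tokens at ~4 chars/token
    and leave room for the response. Use the provider's tokenizer for
    exact counts; this heuristic only catches obvious overruns."""
    estimated_tokens = len(prompt) // 4
    return estimated_tokens + reserved_output <= context_window

fits_in_context("a" * 8_000, context_window=4_096)    # ~2,000 tokens: fits
fits_in_context("a" * 40_000, context_window=8_192)   # ~10,000 tokens: does not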
Open vs Proprietary Models
Open-source models (Llama, Mistral) are often cheaper but may require more tuning. Proprietary models (GPT-4, Claude) typically offer better out-of-box performance.
Batch Discounts
Many providers offer 50% discounts for batch/async API usage. Consider batch APIs for non-time-sensitive workloads to reduce costs.
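The savings compound at scale. A minimal sketch comparing real-time and batch cost for a monthly workload — the volumes and rates are hypothetical, and the 50% discount is a common figure, not universal:

```python
def monthly_cost(requests: int, in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float,
                 batch_discount: float = 0.0) -> float:
    """Total monthly spend in dollars; rates are per 1M tokens and
    batch_discount is a fraction (0.5 = 50% off)."""
    per_request = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return requests * per_request * (1 - batch_discount)

# 100k requests/month, 1,500 input + 400 output tokens each
realtime = monthly_cost(100_000, 1_500, 400, 3.00, 15.00)
batched = monthly_cost(100_000, 1_500, 400, 3.00, 15.00, batch_discount=0.5)
print(f"real-time: ${realtime:.2f}, batched: ${batched:.2f}")
```

The trade-off is latency: batch APIs typically return results within hours rather than seconds, so they suit offline jobs like evaluation runs, embedding backfills, or nightly report generation.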