
Kimi K2

Kimi K2 is Moonshot's flagship text model. It has a 128K-token context window, supports tool calling, and is optimized for chat and code generation tasks.

Context: 128K
Tier: Flagship
Tools: Supported
Input from $0.300 / 1M tokens across 3 providers

API Pricing

Cheapest on Amazon AWS (55% below average)

Provider      Input / 1M   Output / 1M   Speed      TTFT    Updated
Amazon AWS    $0.300       $1.25        81.6 t/s   856ms   4/14/2026
—             $0.570       $2.30        81.6 t/s   856ms   4/14/2026
—             $0.600       $2.50        81.6 t/s   856ms   4/14/2026
—             $1.20        $4.00        81.6 t/s   856ms   4/14/2026

Prices updated daily. Last check: 4/14/2026
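The per-1M-token rates above translate to request cost linearly. A minimal sketch, using the cheapest rates from the pricing table; the token counts are hypothetical examples:

```python
# Estimate per-request cost from per-1M-token rates.
# Rates are the cheapest tier from the pricing table above;
# token counts below are hypothetical.

INPUT_PER_M = 0.300   # USD per 1M input tokens
OUTPUT_PER_M = 1.25   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for one request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 100K-token prompt (near the 128K window) with a 2K-token reply.
# 100_000 * 0.30 / 1e6 + 2_000 * 1.25 / 1e6 = 0.03 + 0.0025 = $0.0325
print(f"${request_cost(100_000, 2_000):.4f}")
```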

Model Details

General

Creator: Moonshot
Family: Kimi
Tier: Flagship
Context Window: 128K
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion, Code Generation
Aliases: kimi-k2-instruct, kimi-k2-thinking, moonshotai-kimi-k2

Strengths & Limitations

Strengths

  • 128K token context window supports long document processing and extended conversations
  • Tool calling capability enables structured API interactions and function execution
  • 44.92 tokens per second output speed for responsive generation
  • Code generation specialization alongside general chat completion
  • Multiple model variants (instruct, thinking) for different use cases
  • Flagship-tier capabilities from Moonshot for complex reasoning tasks
  • 676ms time-to-first-token provides reasonable response latency

Limitations

  • Text-only modality with no image or multimodal input support
  • Proprietary model with no open-source weights available
  • 128K context window smaller than some competing flagship models
  • Limited benchmark data compared to more established model families
  • Newer model family with less ecosystem support than established alternatives

Key Features

128K token context window
Tool calling with structured execution
Chat completion interface
Code generation capabilities
Multiple model variants (instruct and thinking modes)
Streaming response support
API-based deployment
Function calling integration
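Tool calling is exposed through the chat-completions interface. The request shape below follows the OpenAI-compatible format commonly used for this model (an assumption here; check your provider's docs), and `get_weather` is a hypothetical example function, not part of the model's API:

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Assemble a chat-completions request body with one declared tool.

    The format follows the widely used OpenAI-compatible schema; the
    get_weather tool is a made-up example for illustration.
    """
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    return {
        "model": "kimi-k2-instruct",  # alias listed above
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_tool_call_request("What's the weather in Beijing?")
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response contains a `tool_calls` entry with the function name and JSON-encoded arguments, which your code executes before sending the result back as a `tool` role message.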

About Kimi K2

Kimi K2 is Moonshot's flagship model in the Kimi family, representing the company's most capable offering for text-based AI applications. Developed by the Chinese AI company Moonshot, this proprietary model sits at the top of their model hierarchy and targets enterprise and developer use cases requiring sophisticated language understanding. The model operates with a 128K token context window and supports both chat completion and code generation workflows. Kimi K2 includes tool calling functionality, enabling it to interact with external APIs and execute structured tasks.

Performance benchmarks show the model generates approximately 45 tokens per second with a time-to-first-token latency of 676 milliseconds, indicating balanced throughput and responsiveness characteristics. Kimi K2 serves applications requiring extended context processing and structured interactions, positioning itself among flagship models from other providers. The model is available through multiple aliases including kimi-k2-instruct and kimi-k2-thinking, suggesting different operational modes or fine-tuning approaches for specific use cases.

Common Use Cases

Kimi K2 is designed for flagship-tier applications requiring sophisticated text processing and reasoning capabilities. Its 128K context window makes it suitable for long document analysis, extended coding sessions, and multi-turn conversations that require maintaining context over thousands of tokens. The tool calling functionality enables agentic workflows, API integrations, and structured data processing tasks. Code generation capabilities position it for software development assistance, while the thinking variant suggests optimization for reasoning-intensive applications like mathematical problem solving or complex analysis. The model targets enterprise developers and organizations needing reliable performance for production text processing workloads.

Frequently Asked Questions

How much does Kimi K2 cost per million tokens?

Kimi K2 pricing varies by provider and may differ between input and output tokens. Check the pricing table above for current rates across all available providers offering this model.

What is Kimi K2 best used for?

Kimi K2 excels at long-form text processing with its 128K context window, code generation tasks, and applications requiring tool calling capabilities. It's particularly suited for extended conversations, document analysis, software development assistance, and agentic workflows that need structured API interactions.
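For long-document work, a rough character-count heuristic can serve as a first pass on whether a prompt fits the 128K window; the ~4 characters-per-token ratio below is an assumption for English text, so use the provider's tokenizer for exact counts:

```python
# Rough check that a document fits Kimi K2's 128K-token window.
# The ~4 chars-per-token ratio is a common English-text heuristic,
# not an exact figure; a real tokenizer gives precise counts.

CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # heuristic approximation

def fits_in_context(text: str, reply_budget: int = 4_000) -> bool:
    """True if the estimated prompt tokens leave reply_budget tokens free."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reply_budget <= CONTEXT_WINDOW

print(fits_in_context("x" * 400_000))  # ~100K tokens + 4K reply: fits
print(fits_in_context("x" * 600_000))  # ~150K tokens: too large
```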

What's the difference between kimi-k2-instruct and kimi-k2-thinking variants?

While both are Kimi K2 variants, the specific differences aren't detailed in available specifications. The naming suggests kimi-k2-instruct may be optimized for following instructions and general chat, while kimi-k2-thinking could be specialized for reasoning and analytical tasks, but you should test both variants for your specific use case.