Kimi K2
Kimi K2 is Moonshot's flagship text model. It offers a 128K-token context window, supports tool calling, and is optimized for chat and code generation tasks.
API Pricing
Cheapest on Amazon AWS (55% below average)

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $0.30 | $1.25 | 81.6 t/s | 856 ms | 4/14/2026 |
| | $0.57 | $2.30 | 81.6 t/s | 856 ms | 4/14/2026 |
| | $0.60 | $2.50 | 81.6 t/s | 856 ms | 4/14/2026 |
| | $1.20 | $4.00 | 81.6 t/s | 856 ms | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
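As a quick illustration of how these per-million-token rates translate into per-request cost, the arithmetic is simply token count times rate (a sketch; the figures below use the cheapest row in the table above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Cheapest row above: $0.30 input / $1.25 output per 1M tokens.
# A 100K-token prompt with a 2K-token completion:
cost = request_cost(100_000, 2_000, 0.30, 1.25)
print(f"${cost:.4f}")  # → $0.0325
```

The same function works for any provider row; only the two rates change.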
Model Details
General
- Creator: Moonshot
- Family: Kimi
- Tier: Flagship
- Context Window: 128K tokens
- Modalities: Text
Capabilities
- Tool Calling: Yes
- Open Source: No
- Subtypes: Chat Completion, Code Generation
- Aliases: kimi-k2-instruct, kimi-k2-thinking, moonshotai-kimi-k2
Strengths & Limitations
Strengths:
- 128K-token context window supports long-document processing and extended conversations
- Tool calling enables structured API interactions and function execution
- Measured output speed of 44.92 tokens per second for responsive generation (throughput varies by provider; see the pricing table above)
- Code generation specialization alongside general chat completion
- Multiple model variants (instruct, thinking) for different use cases
- Flagship-tier capabilities from Moonshot for complex reasoning tasks
- Measured time-to-first-token of 676 ms provides reasonable response latency (latency also varies by provider)

Limitations:
- Text-only modality with no image or multimodal input support
- Proprietary model with no open-source weights available
- 128K context window is smaller than some competing flagship models
- Limited benchmark data compared to more established model families
- Newer model family with less ecosystem support than established alternatives
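When working near the 128K limit, it can help to estimate whether a prompt will fit before sending it. The sketch below uses a rough 4-characters-per-token heuristic, which is an assumption for illustration only; Kimi K2's actual tokenizer will produce different counts.

```python
CONTEXT_WINDOW = 128_000   # Kimi K2 context window, in tokens
CHARS_PER_TOKEN = 4        # rough heuristic, NOT the model's real tokenizer

def fits_in_context(text: str, reserved_for_output: int = 4_000) -> bool:
    """Heuristic check that a prompt leaves room for the completion."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("x" * 400_000))  # ~100K estimated tokens → True
print(fits_in_context("x" * 600_000))  # ~150K estimated tokens → False
```

For production use, count tokens with the provider's tokenizer rather than a character heuristic.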
Common Use Cases
Kimi K2 is designed for flagship-tier applications requiring sophisticated text processing and reasoning capabilities. Its 128K context window makes it suitable for long document analysis, extended coding sessions, and multi-turn conversations that require maintaining context over thousands of tokens. The tool calling functionality enables agentic workflows, API integrations, and structured data processing tasks. Code generation capabilities position it for software development assistance, while the thinking variant suggests optimization for reasoning-intensive applications like mathematical problem solving or complex analysis. The model targets enterprise developers and organizations needing reliable performance for production text processing workloads.
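To make the tool-calling workflow concrete, the sketch below builds a chat-completion request body in the OpenAI-compatible format that Moonshot's API follows. The `get_weather` tool and its schema are hypothetical, included only to show the shape of the payload; nothing is sent over the network here.

```python
import json

# Illustrative request body; "get_weather" is a hypothetical tool.
payload = {
    "model": "kimi-k2-instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

When the model elects to use a tool, the response contains a `tool_calls` entry with the function name and JSON-encoded arguments, which your code executes before sending the result back in a follow-up message.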
Frequently Asked Questions
How much does Kimi K2 cost per million tokens?
Kimi K2 pricing varies by provider and may differ between input and output tokens. Check the pricing table above for current rates across all available providers offering this model.
What is Kimi K2 best used for?
Kimi K2 excels at long-form text processing with its 128K context window, code generation tasks, and applications requiring tool calling capabilities. It's particularly suited for extended conversations, document analysis, software development assistance, and agentic workflows that need structured API interactions.
What's the difference between kimi-k2-instruct and kimi-k2-thinking variants?
While both are Kimi K2 variants, the specific differences aren't detailed in available specifications. The naming suggests kimi-k2-instruct may be optimized for following instructions and general chat, while kimi-k2-thinking could be specialized for reasoning and analytical tasks, but you should test both variants for your specific use case.