
Kimi K2

Kimi K2 is Moonshot's flagship text model. It has a 128K-token context window, supports tool calling, and is optimized for chat and code generation tasks.

Context: 128K
Tier: Flagship
Tools: Supported
Input from $0.300 / 1M tokens across 3 providers

API Pricing

Cheapest on Amazon AWS (55% below average)

Provider      Input / 1M   Output / 1M   Speed      TTFT    Updated
Amazon AWS    $0.300       $1.25        81.6 t/s   856ms   4/14/2026
—             $0.570       $2.30        81.6 t/s   856ms   4/14/2026
—             $0.600       $2.50        81.6 t/s   856ms   4/14/2026
—             $1.20        $4.00        81.6 t/s   856ms   4/14/2026

Prices updated daily. Last check: 4/14/2026
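The per-1M-token rates above translate to request cost linearly. A minimal sketch, using the cheapest rates from the pricing table; the token counts are hypothetical examples:

```python
# Estimate per-request cost from per-1M-token rates.
# Rates are the cheapest tier from the pricing table above;
# token counts below are hypothetical.

INPUT_PER_M = 0.300   # USD per 1M input tokens
OUTPUT_PER_M = 1.25   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for one request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 100K-token prompt (near the 128K window) with a 2K-token reply.
# 100_000 * 0.30 / 1e6 + 2_000 * 1.25 / 1e6 = 0.03 + 0.0025 = $0.0325
print(f"${request_cost(100_000, 2_000):.4f}")
```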

Model Details

General

Creator: Moonshot
Family: Kimi
Tier: Flagship
Context Window: 128K
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion, Code Generation
Aliases: kimi-k2-instruct, kimi-k2-thinking, moonshotai-kimi-k2

Strengths & Limitations

Strengths

  • 128K token context window supports long document processing and extended conversations
  • Tool calling capability enables structured API interactions and function execution
  • 44.92 tokens per second output speed for responsive generation
  • Code generation specialization alongside general chat completion
  • Multiple model variants (instruct, thinking) for different use cases
  • Flagship-tier capabilities from Moonshot for complex reasoning tasks
  • 676ms time-to-first-token provides reasonable response latency

Limitations

  • Text-only modality with no image or multimodal input support
  • Proprietary model with no open-source weights available
  • 128K context window smaller than some competing flagship models
  • Limited benchmark data compared to more established model families
  • Newer model family with less ecosystem support than established alternatives

Key Features

128K token context window
Tool calling with structured execution
Chat completion interface
Code generation capabilities
Multiple model variants (instruct and thinking modes)
Streaming response support
API-based deployment
Function calling integration
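Tool calling is exposed through the chat-completions interface. The request shape below follows the OpenAI-compatible format commonly used for this model (an assumption here; check your provider's docs), and `get_weather` is a hypothetical example function, not part of the model's API:

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Assemble a chat-completions request body with one declared tool.

    The format follows the widely used OpenAI-compatible schema; the
    get_weather tool is a made-up example for illustration.
    """
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    return {
        "model": "kimi-k2-instruct",  # alias listed above
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_tool_call_request("What's the weather in Beijing?")
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response contains a `tool_calls` entry with the function name and JSON-encoded arguments, which your code executes before sending the result back as a `tool` role message.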

About Kimi K2

Kimi K2 is Moonshot's flagship model in the Kimi family, representing the company's most capable offering for text-based AI applications. Developed by the Chinese AI company Moonshot, this proprietary model sits at the top of their model hierarchy and targets enterprise and developer use cases requiring sophisticated language understanding. The model operates with a 128K token context window and supports both chat completion and code generation workflows. Kimi K2 includes tool calling functionality, enabling it to interact with external APIs and execute structured tasks.

Performance benchmarks show the model generates approximately 45 tokens per second with a time-to-first-token latency of 676 milliseconds, indicating balanced throughput and responsiveness characteristics. Kimi K2 serves applications requiring extended context processing and structured interactions, positioning itself among flagship models from other providers. The model is available through multiple aliases including kimi-k2-instruct and kimi-k2-thinking, suggesting different operational modes or fine-tuning approaches for specific use cases.

Common Use Cases

Kimi K2 is designed for flagship-tier applications requiring sophisticated text processing and reasoning capabilities. Its 128K context window makes it suitable for long document analysis, extended coding sessions, and multi-turn conversations that require maintaining context over thousands of tokens. The tool calling functionality enables agentic workflows, API integrations, and structured data processing tasks. Code generation capabilities position it for software development assistance, while the thinking variant suggests optimization for reasoning-intensive applications like mathematical problem solving or complex analysis. The model targets enterprise developers and organizations needing reliable performance for production text processing workloads.

Frequently Asked Questions

How much does Kimi K2 cost per million tokens?

Kimi K2 pricing varies by provider and may differ between input and output tokens. Check the pricing table above for current rates across all available providers offering this model.

What is Kimi K2 best used for?

Kimi K2 excels at long-form text processing with its 128K context window, code generation tasks, and applications requiring tool calling capabilities. It's particularly suited for extended conversations, document analysis, software development assistance, and agentic workflows that need structured API interactions.
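For long-document work, a rough character-count heuristic can serve as a first pass on whether a prompt fits the 128K window; the ~4 characters-per-token ratio below is an assumption for English text, so use the provider's tokenizer for exact counts:

```python
# Rough check that a document fits Kimi K2's 128K-token window.
# The ~4 chars-per-token ratio is a common English-text heuristic,
# not an exact figure; a real tokenizer gives precise counts.

CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # heuristic approximation

def fits_in_context(text: str, reply_budget: int = 4_000) -> bool:
    """True if the estimated prompt tokens leave reply_budget tokens free."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reply_budget <= CONTEXT_WINDOW

print(fits_in_context("x" * 400_000))  # ~100K tokens + 4K reply: fits
print(fits_in_context("x" * 600_000))  # ~150K tokens: too large
```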

What's the difference between kimi-k2-instruct and kimi-k2-thinking variants?

While both are Kimi K2 variants, the specific differences aren't detailed in available specifications. The naming suggests kimi-k2-instruct may be optimized for following instructions and general chat, while kimi-k2-thinking could be specialized for reasoning and analytical tasks, but you should test both variants for your specific use case.