Mistral Small
Mistral Small is Mistral's lightweight model optimized for speed and efficiency, featuring a 32K token context window and tool calling capabilities.
API Pricing
Cheapest on OpenRouter, 83% below the average provider price.

| Provider | Input / 1M | Output / 1M | Updated |
|---|---|---|---|
| | $0.050 | $0.080 | 4/14/2026 |
| | $0.075 | $0.200 | 4/4/2026 |
| | $0.100 | $0.300 | 4/1/2026 |
| | $0.100 | $0.300 | 4/14/2026 |
| | $0.176 | $0.410 | 4/13/2026 |
| | $0.400 | $0.400 | 4/1/2026 |
| | $0.500 | $1.50 | 4/14/2026 |
| | $1.00 | $3.00 | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
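As a worked example of reading the table, per-request cost is simply tokens times the per-million rate. A minimal sketch, assuming the cheapest listed rates ($0.050 input / $0.080 output per 1M tokens); substitute your provider's actual rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 0.050, output_rate: float = 0.080) -> float:
    """Return the USD cost of one request, given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 10,000-token prompt with a 1,000-token reply:
cost = estimate_cost(10_000, 1_000)
print(f"${cost:.6f}")  # $0.000580
```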
Model Details
General
- Creator: Mistral
- Family: Mistral
- Tier: Lightweight
- Context Window: 32K
- Knowledge Cutoff: Sep 2023
- Modalities: Text
Capabilities
- Tool Calling: Yes
- Open Source: No
- Subtypes: Chat Completion
Strengths & Limitations
Strengths
- Fast inference at 123.31 output tokens per second
- Low latency, with a 337 ms time to first token
- 32,000-token context window for substantial document processing
- Tool calling support for function execution
- Optimized for efficiency while retaining reasonable capability
- Chat completion interface for conversational applications
Limitations
- Text-only modality, with no image or multimodal support
- Proprietary model, with no open-source weights available
- September 2023 knowledge cutoff, older than some competing models
- Lightweight tier, with reduced capabilities compared to flagship models
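The two latency figures above combine into a rough end-to-end response-time estimate: time to first token plus output length divided by throughput. A simple sketch using the published numbers:

```python
TTFT_S = 0.337           # published time to first token, in seconds
THROUGHPUT_TPS = 123.31  # published output tokens per second

def estimated_response_time(output_tokens: int) -> float:
    """Approximate seconds to stream a full response of the given length."""
    return TTFT_S + output_tokens / THROUGHPUT_TPS

# A 250-token reply takes roughly 2.4 seconds end to end:
print(round(estimated_response_time(250), 2))  # 2.36
```

Real deployments will vary with provider, load, and prompt length; this is a back-of-the-envelope model, not a benchmark.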
Common Use Cases
Mistral Small is designed for applications requiring fast, efficient language processing where speed and cost-effectiveness take priority over maximum capability. Its combination of 32K context window and tool calling makes it suitable for customer service chatbots, content moderation, text classification, and basic coding assistance. The model's optimized inference speed makes it particularly valuable for high-volume production deployments, real-time applications, and scenarios where response latency directly impacts user experience. Its lightweight nature also makes it appropriate for developers building applications that need reliable language understanding without the computational costs of larger flagship models.
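For the chatbot and function-execution use cases above, a tool-calling request can be sketched as follows. This is a minimal illustration of the common OpenAI-style chat completions request shape that Mistral's API accepts; the `mistral-small-latest` model alias and the `get_order_status` tool are assumptions for the example, not details from this page:

```python
import json

# Illustrative tool-calling chat completion request body.
payload = {
    "model": "mistral-small-latest",  # assumed alias for Mistral Small
    "messages": [
        {"role": "user", "content": "Where is order 1234?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_order_status",  # hypothetical tool
                "description": "Look up the status of an order by its ID.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"}
                    },
                    "required": ["order_id"],
                },
            },
        }
    ],
}

# POST this JSON to your provider's chat completions endpoint with your
# API key as a Bearer token; the model may reply with a tool call whose
# arguments your application executes before returning the result.
print(json.dumps(payload, indent=2))
```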
Frequently Asked Questions
How much does Mistral Small cost per million tokens?
Mistral Small pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.
What is Mistral Small best used for?
Mistral Small excels at high-volume applications requiring fast response times, such as customer service chatbots, text classification, content moderation, and basic coding assistance. Its 32K context window and tool calling capabilities make it suitable for document processing and function execution while maintaining efficient inference speeds.
How does Mistral Small compare to other lightweight models?
Mistral Small offers competitive inference speed at 123.31 tokens per second with a substantial 32K context window, which is larger than many lightweight alternatives. Its tool calling support and 337ms time to first token provide a balance of capability and efficiency, though it's limited to text-only processing unlike some multimodal lightweight options.