
Mistral Small

Mistral Small is Mistral's lightweight model optimized for speed and efficiency, featuring a 32K token context window and tool calling capabilities.

Context: 32K
Tier: Lightweight
Knowledge: Sep 2023
Tools: Supported
Input: from $0.050 / 1M tokens across 7 providers

API Pricing

Cheapest on OpenRouter: 83% below average

Provider   Input / 1M   Output / 1M   Updated
-          $0.050       $0.080        4/14/2026
-          $0.075       $0.200        4/4/2026
-          $0.100       $0.300        4/1/2026
-          $0.100       $0.300        4/14/2026
-          $0.176       $0.410        4/13/2026
-          $0.400       $0.400        4/1/2026
-          $0.500       $1.50         4/14/2026
-          $1.00        $3.00         4/14/2026

Prices updated daily. Last check: 4/14/2026
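
To make the per-token rates concrete, the sketch below estimates the cost of a single request at the cheapest rate listed above ($0.050 per 1M input tokens, $0.080 per 1M output tokens). The token counts in the example are hypothetical.

```python
# Rough cost estimate for one Mistral Small request at the cheapest listed rate.
# Rates come from the pricing table above; the example token counts are made up.

INPUT_PER_1M = 0.050   # USD per 1M input tokens
OUTPUT_PER_1M = 0.080  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PER_1M + output_tokens * OUTPUT_PER_1M) / 1_000_000

# Example: a 2,000-token prompt that produces a 500-token reply
print(f"${request_cost(2_000, 500):.6f}")  # ~$0.000140
```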

Model Details

General

Creator: Mistral
Family: Mistral
Tier: Lightweight
Context Window: 32K
Knowledge Cutoff: Sep 2023
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion

Strengths & Limitations

Strengths

  • Fast inference speed at 123.31 output tokens per second
  • Low latency with a 337ms time to first token
  • 32,000 token context window for substantial document processing
  • Tool calling support for function execution
  • Optimized for efficiency while maintaining reasonable capabilities
  • Chat completion interface for conversational applications

Limitations

  • Text-only modality with no image or multimodal support
  • Proprietary model with no open source weights available
  • Knowledge cutoff of September 2023, older than some competing models
  • Positioned in the lightweight tier, with reduced capabilities compared to flagship models

Key Features

32,000 token context window
Tool calling with function execution (see the sketch after this list)
Chat completion interface
Streaming response support
Fast inference optimization
Text-based language processing
API-based deployment model
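
The sketch below shows how the tool calling and chat completion features listed above are typically exercised from Python, assuming an OpenAI-compatible chat completions endpoint. The base URL, model identifier, and the get_weather tool are illustrative assumptions, not values taken from this page.

```python
# Minimal tool calling sketch against an OpenAI-compatible chat completions
# endpoint. The base_url, model id, and tool definition are assumptions;
# substitute the values for your provider.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.mistral.ai/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistral-small-latest",  # assumed model id; check your provider's catalog
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# When the model decides to call the tool, the arguments arrive as a JSON string.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```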

About Mistral Small

Mistral Small is a lightweight model developed by Mistral, positioned as the company's efficient option for applications requiring fast inference with reasonable capability. Within Mistral's model family, it serves as the speed-optimized tier below the more capable flagship offerings. The model supports text-based chat completion with a 32,000 token context window and includes tool calling functionality. It demonstrates strong inference speed, with 123.31 output tokens per second and a 337ms time to first token, making it well suited for applications where response latency matters. The model has a knowledge cutoff of September 2023.

Mistral Small targets use cases where developers need reliable language model capabilities without the computational overhead of larger models. It competes with other lightweight offerings by balancing performance with efficiency, making it suitable for production deployments that require consistent speed.
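
Taking the performance figures above at face value (337ms time to first token, 123.31 output tokens per second), a back-of-the-envelope estimate of end-to-end response time for a given reply length looks like the sketch below; the reply lengths are hypothetical.

```python
# Back-of-the-envelope response-time estimate from the figures quoted above:
# 337 ms to the first token, then roughly 123.31 output tokens per second.
TTFT_S = 0.337
TOKENS_PER_S = 123.31

def estimated_response_time(output_tokens: int) -> float:
    """Estimated seconds from sending a request to receiving the full reply."""
    return TTFT_S + output_tokens / TOKENS_PER_S

for n in (50, 200, 500):  # hypothetical reply lengths
    print(f"{n:>4} tokens -> ~{estimated_response_time(n):.2f} s")
# 50 tokens -> ~0.74 s, 200 tokens -> ~1.96 s, 500 tokens -> ~4.39 s
```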

Common Use Cases

Mistral Small is designed for applications requiring fast, efficient language processing where speed and cost-effectiveness take priority over maximum capability. Its combination of 32K context window and tool calling makes it suitable for customer service chatbots, content moderation, text classification, and basic coding assistance. The model's optimized inference speed makes it particularly valuable for high-volume production deployments, real-time applications, and scenarios where response latency directly impacts user experience. Its lightweight nature also makes it appropriate for developers building applications that need reliable language understanding without the computational costs of larger flagship models.
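
For latency-sensitive use cases such as support chatbots or ticket classification, a common pattern is a short system prompt combined with streaming so the first tokens reach the user as soon as possible. The sketch below again assumes an OpenAI-compatible endpoint and model identifier; the ticket labels and prompt are illustrative.

```python
# Streaming classification sketch over an OpenAI-compatible endpoint.
# The base_url and model id are assumptions; adjust for your provider.
from openai import OpenAI

client = OpenAI(base_url="https://api.mistral.ai/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="mistral-small-latest",  # assumed model id
    messages=[
        {"role": "system",
         "content": "Classify the support ticket as one of: billing, bug, other. Reply with one word."},
        {"role": "user",
         "content": "I was charged twice for my subscription this month."},
    ],
    stream=True,   # emit tokens as they are generated
    max_tokens=5,  # the label should only be a single word
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```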

Frequently Asked Questions

How much does Mistral Small cost per million tokens?

Mistral Small pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is Mistral Small best used for?

Mistral Small excels at high-volume applications requiring fast response times, such as customer service chatbots, text classification, content moderation, and basic coding assistance. Its 32K context window and tool calling capabilities make it suitable for document processing and function execution while maintaining efficient inference speeds.

How does Mistral Small compare to other lightweight models?

Mistral Small offers competitive inference speed at 123.31 tokens per second with a substantial 32K context window, which is larger than many lightweight alternatives. Its tool calling support and 337ms time to first token provide a balance of capability and efficiency, though it's limited to text-only processing unlike some multimodal lightweight options.