
Devstral Small

Devstral Small is Mistral's lightweight coding model optimized for fast code generation and completion tasks, with a 128K token context window.

Context: 131K
Tier: Lightweight
Input from: $0.100 / 1M tokens (across 1 provider)

API Pricing

Provider    Input / 1M    Output / 1M    Updated
—           $0.100        $0.300        4/14/2026

Prices updated daily. Last check: 4/14/2026
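The per-token rates in the table above translate directly into a per-request cost. A minimal sketch, using the listed $0.100 / 1M input and $0.300 / 1M output rates (actual rates vary by provider and may change; check the table for current figures):

```python
# Estimate the USD cost of one Devstral Small request from the listed
# per-million-token rates. Rates are from the pricing table above and
# are illustrative; providers may charge differently.

INPUT_PER_M = 0.100   # USD per 1M input tokens
OUTPUT_PER_M = 0.300  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion
# costs 4000 * 0.100/1M + 1000 * 0.300/1M = $0.0007.
```

At these rates, even high-frequency workloads like IDE completion plugins stay inexpensive: a million such requests would cost about $700.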

Model Details

General

Creator: Mistral
Family: Devstral
Tier: Lightweight
Context Window: 131K
Modalities: Text

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

  • Fast token generation at approximately 206 tokens per second
  • Quick response initiation with a 393 ms time to first token
  • Large 128K-token context window for substantial code analysis
  • Optimized specifically for coding and development tasks
  • Lightweight architecture reduces computational requirements
  • Suitable for high-frequency coding assistance workflows

Limitations

  • No tool calling or function execution capabilities
  • Proprietary model with no open-source availability
  • Text-only modality without image or multimodal support
  • Lightweight tier may limit complex reasoning capabilities
  • Smaller model size compared to flagship coding models
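The two speed figures above combine into a back-of-envelope latency estimate: total wall time is roughly time to first token plus output length divided by throughput. A minimal sketch, assuming the 393 ms TTFT and 206 tokens/second cited above (real latency varies with provider, load, and prompt size):

```python
# Rough end-to-end latency model for a Devstral Small completion,
# using the benchmark figures cited above. Illustrative only:
# actual numbers depend on provider, network, and load.

TTFT_S = 0.393        # time to first token, in seconds
TOKENS_PER_S = 206.0  # observed generation throughput

def estimated_latency(output_tokens: int) -> float:
    """Estimated wall-clock seconds for a completion of the given length."""
    return TTFT_S + output_tokens / TOKENS_PER_S

# Example: a 500-token completion takes about 0.393 + 500/206 ≈ 2.82 s.
```

This is why the model suits interactive workflows: short completions (the common case for inline suggestions) finish in well under a second.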

Key Features

128K token context window
Text-based code generation and completion
Optimized inference for coding tasks
Fast token generation (206 tokens/second)
Quick response times (393ms TTFT)
Streaming response support
Multi-language programming support
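The streaming support listed above typically means server-sent events, where each event carries an incremental text delta. A minimal sketch of reassembling a streamed reply, assuming an OpenAI-style `data: {json}` wire format with text under `choices[0].delta.content` (the actual format depends on the provider; the field names and sample lines here are illustrative):

```python
import json

def extract_delta(sse_line: str):
    """Pull the incremental text out of one OpenAI-style SSE line.

    Assumes lines of the form 'data: {json}'; returns None for
    keep-alives and the terminal 'data: [DONE]' sentinel.
    """
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Reassemble a completion from a (hypothetical) raw event stream:
lines = [
    'data: {"choices": [{"delta": {"content": "def "}}]}',
    'data: {"choices": [{"delta": {"content": "add(a, b):"}}]}',
    "data: [DONE]",
]
text = "".join(d for line in lines if (d := extract_delta(line)) is not None)
# text == "def add(a, b):"
```

Rendering deltas as they arrive is what makes the 393 ms time to first token matter: the user sees output long before the full completion is done.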

About Devstral Small

Devstral Small is Mistral's lightweight coding-focused language model, positioned as the smaller variant in the Devstral family. It is designed for developers who need efficient code generation and completion without the computational overhead of larger models, and it operates with a 128K-token (131,072) context window focused exclusively on text-based interactions.

Performance benchmarks show it generates approximately 206 tokens per second with a time to first token of 393 milliseconds, speeds responsive enough for interactive coding workflows. As a proprietary model, its weights are not publicly available.

Devstral Small targets use cases where speed and efficiency matter more than handling the most complex coding challenges. It serves developers working on standard programming tasks, code completion, and scenarios where quick turnaround is prioritized over maximum capability.

Common Use Cases

Devstral Small is well-suited for developers who need efficient code completion, debugging assistance, and code generation for routine programming tasks. Its fast generation speed and large context window make it effective for analyzing substantial codebases, providing real-time coding suggestions in IDEs, and handling repetitive development workflows. The lightweight nature makes it practical for applications requiring frequent API calls or where cost efficiency is important, such as code completion plugins, automated code review assistance, or educational coding platforms where quick feedback is valued over handling the most complex algorithmic challenges.

Frequently Asked Questions

How much does Devstral Small cost per million tokens?

Devstral Small pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is Devstral Small best used for?

Devstral Small excels at code completion, routine code generation, and debugging assistance where speed matters. Its 206 tokens/second generation rate and 128K context window make it ideal for IDE integrations, real-time coding suggestions, and analyzing substantial codebases efficiently.

How does Devstral Small compare to larger coding models?

Devstral Small prioritizes speed and efficiency over maximum capability. While larger models may handle more complex algorithmic challenges, Devstral Small's fast 393ms response time and high throughput make it better suited for interactive coding workflows and high-frequency assistance tasks.