
Devstral Small

Devstral Small is Mistral's lightweight coding model optimized for fast code generation and completion tasks, with a 128K token context window.

Context: 131K
Tier: Lightweight
Input from: $0.100 / 1M tokens (across 1 provider)

API Pricing

Provider    Input / 1M    Output / 1M    Updated
—           $0.100        $0.300        4/14/2026

Prices updated daily. Last check: 4/14/2026
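The per-token rates in the table above translate directly into a per-request cost. A minimal sketch, using the listed $0.100 / 1M input and $0.300 / 1M output rates (actual rates vary by provider and may change; check the table for current figures):

```python
# Estimate the USD cost of one Devstral Small request from the listed
# per-million-token rates. Rates are from the pricing table above and
# are illustrative; providers may charge differently.

INPUT_PER_M = 0.100   # USD per 1M input tokens
OUTPUT_PER_M = 0.300  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion
# costs 4000 * 0.100/1M + 1000 * 0.300/1M = $0.0007.
```

At these rates, even high-frequency workloads like IDE completion plugins stay inexpensive: a million such requests would cost about $700.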

Model Details

General

Creator: Mistral
Family: Devstral
Tier: Lightweight
Context Window: 131K
Modalities: Text

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

  • Fast token generation at approximately 206 tokens per second
  • Quick response initiation with a 393 ms time to first token
  • Large 128K-token context window for substantial code analysis
  • Optimized specifically for coding and development tasks
  • Lightweight architecture reduces computational requirements
  • Suitable for high-frequency coding assistance workflows

Limitations

  • No tool calling or function execution capabilities
  • Proprietary model with no open-source availability
  • Text-only modality without image or multimodal support
  • Lightweight tier may limit complex reasoning capabilities
  • Smaller model size compared to flagship coding models
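The two speed figures above combine into a back-of-envelope latency estimate: total wall time is roughly time to first token plus output length divided by throughput. A minimal sketch, assuming the 393 ms TTFT and 206 tokens/second cited above (real latency varies with provider, load, and prompt size):

```python
# Rough end-to-end latency model for a Devstral Small completion,
# using the benchmark figures cited above. Illustrative only:
# actual numbers depend on provider, network, and load.

TTFT_S = 0.393        # time to first token, in seconds
TOKENS_PER_S = 206.0  # observed generation throughput

def estimated_latency(output_tokens: int) -> float:
    """Estimated wall-clock seconds for a completion of the given length."""
    return TTFT_S + output_tokens / TOKENS_PER_S

# Example: a 500-token completion takes about 0.393 + 500/206 ≈ 2.82 s.
```

This is why the model suits interactive workflows: short completions (the common case for inline suggestions) finish in well under a second.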

Key Features

128K token context window
Text-based code generation and completion
Optimized inference for coding tasks
Fast token generation (206 tokens/second)
Quick response times (393ms TTFT)
Streaming response support
Multi-language programming support
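The streaming support listed above typically means server-sent events, where each event carries an incremental text delta. A minimal sketch of reassembling a streamed reply, assuming an OpenAI-style `data: {json}` wire format with text under `choices[0].delta.content` (the actual format depends on the provider; the field names and sample lines here are illustrative):

```python
import json

def extract_delta(sse_line: str):
    """Pull the incremental text out of one OpenAI-style SSE line.

    Assumes lines of the form 'data: {json}'; returns None for
    keep-alives and the terminal 'data: [DONE]' sentinel.
    """
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Reassemble a completion from a (hypothetical) raw event stream:
lines = [
    'data: {"choices": [{"delta": {"content": "def "}}]}',
    'data: {"choices": [{"delta": {"content": "add(a, b):"}}]}',
    "data: [DONE]",
]
text = "".join(d for line in lines if (d := extract_delta(line)) is not None)
# text == "def add(a, b):"
```

Rendering deltas as they arrive is what makes the 393 ms time to first token matter: the user sees output long before the full completion is done.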

About Devstral Small

Devstral Small is Mistral's lightweight coding-focused language model, positioned as the smaller variant in the Devstral family. It is designed for developers who need efficient code generation and completion without the computational overhead of larger models, and it operates with a 128K-token (131,072) context window focused exclusively on text-based interactions.

Performance benchmarks show it generates approximately 206 tokens per second with a time to first token of 393 milliseconds, speeds responsive enough for interactive coding workflows. As a proprietary model, its weights are not publicly available.

Devstral Small targets use cases where speed and efficiency matter more than handling the most complex coding challenges. It serves developers working on standard programming tasks, code completion, and scenarios where quick turnaround is prioritized over maximum capability.

Common Use Cases

Devstral Small is well-suited for developers who need efficient code completion, debugging assistance, and code generation for routine programming tasks. Its fast generation speed and large context window make it effective for analyzing substantial codebases, providing real-time coding suggestions in IDEs, and handling repetitive development workflows. The lightweight nature makes it practical for applications requiring frequent API calls or where cost efficiency is important, such as code completion plugins, automated code review assistance, or educational coding platforms where quick feedback is valued over handling the most complex algorithmic challenges.

Frequently Asked Questions

How much does Devstral Small cost per million tokens?

Devstral Small pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is Devstral Small best used for?

Devstral Small excels at code completion, routine code generation, and debugging assistance where speed matters. Its 206 tokens/second generation rate and 128K context window make it ideal for IDE integrations, real-time coding suggestions, and analyzing substantial codebases efficiently.

How does Devstral Small compare to larger coding models?

Devstral Small prioritizes speed and efficiency over maximum capability. While larger models may handle more complex algorithmic challenges, Devstral Small's fast 393ms response time and high throughput make it better suited for interactive coding workflows and high-frequency assistance tasks.