FlagshipMistral

Devstral 2 123B

Name: Devstral 2 123B
Availability: InStock
Author: Mistral

Devstral 2 123B is Mistral's flagship code-specialized model with 123 billion parameters, offering advanced code generation and chat capabilities with a 128K token context window.

Context 128K

Tier Flagship

Tools Supported

Input from

$0.400 / 1M tokens

across 2 providers

Compare Prices

API Pricing

Cheapest on Amazon AWS — 7% below avg

Provider	Input / 1M	Output / 1M	Speed	TTFT	Updated
Amazon AWS	$0.400	$2.00	67.7 t/s	627ms	7/13/2026
Scaleway	$0.457	$2.29	67.7 t/s	627ms	7/12/2026

Prices updated daily. Last check: Jul 13, 2026

Performance & Benchmarks

Source: Artificial Analysis →

Intelligence

19.2 / 100

Coding

31.3 / 100

Math

36.7 / 100

Output Speed

67.7 t/s

Latency (TTFT)

627ms

Reasoning & Knowledge

MMLU-Pro76.2%
GPQA Diamond59.4%
Humanity's Last Exam3.6%

Coding

LiveCodeBench44.8%
SciCode33.1%

Math

AIME 202536.7%

Agentic & Tool Use

Terminal-Bench Hard18.9%
Terminal-Bench v2.130.3%
τ²-bench24.9%
τ-bench Banking10.3%

Instruction & Long Context

IFBench38.1%
Long-Context Reasoning30.0%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Mistral
Family: Devstral
Tier: Flagship
Context Window: 128K
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion, Code Generation
Aliases: devstral-2-123b-instruct-2512

Strengths & Limitations

Strengths

123 billion parameters provide substantial model capacity for complex coding tasks
Tool calling support enables integration with development environments and APIs
128K token context window accommodates large codebases and extended conversations
Generates 76.04 tokens per second for responsive code generation
Time to first token of 423ms provides quick response initiation
Specialized training for code generation and technical tasks
Flagship tier positioning within Mistral's Devstral family

Limitations

Proprietary model with no open source weights available
Limited to text-only interactions without image or multimodal support
Smaller context window compared to some competing frontier models
Higher computational requirements due to 123B parameter size
No batch processing capabilities mentioned in available specifications

Key Features

•123 billion parameter language model

•128,000 token context window

•Tool calling with structured output

•Chat completion API

•Code generation capabilities

•Streaming response support

•Multi-language programming support

•Technical documentation processing

About Devstral 2 123B

Devstral 2 123B is Mistral's flagship coding model, representing the company's most capable offering in the Devstral family. As a 123 billion parameter model, it sits at the top of Mistral's code-specialized lineup, designed specifically for complex programming tasks and technical conversations. The model is proprietary and available through API access. The model operates with a 128,000 token context window and supports both chat completion and code generation tasks. It includes tool calling capabilities, enabling integration with external systems and APIs. Performance benchmarks show the model generates tokens at 76.04 tokens per second with a time to first token of 423 milliseconds, indicating solid inference speed for a model of this scale. Devstral 2 123B targets professional development workflows where code quality and technical accuracy are priorities. Its large parameter count and specialized training position it for complex coding tasks that require understanding of multiple programming languages, large codebases, and intricate technical requirements.

Common Use Cases

Devstral 2 123B is designed for professional software development workflows requiring sophisticated code generation and technical problem-solving. Its 123 billion parameters and code specialization make it suitable for complex programming tasks like architectural planning, code review and optimization, debugging across large codebases, and generating production-quality code in multiple programming languages. The 128K context window supports analyzing substantial code repositories, while tool calling capabilities enable integration with IDEs, version control systems, and development APIs. Organizations use this flagship model for technical documentation generation, code migration projects, and building AI-powered developer tools where accuracy and technical depth are essential.

Frequently Asked Questions

How much does Devstral 2 123B cost per million tokens?

Devstral 2 123B pricing varies by provider and may include different rates for input and output tokens. Check the pricing table above for current rates across all available providers offering this model.

What is Devstral 2 123B best used for?

Devstral 2 123B excels at complex code generation, technical problem-solving, and professional development workflows. Its 123B parameters and code specialization make it ideal for architectural planning, multi-language code generation, large codebase analysis, and building sophisticated developer tools that require high technical accuracy.

How does Devstral 2 123B compare to other coding models?

Devstral 2 123B offers 123 billion parameters specifically trained for coding tasks, with tool calling support and a 128K context window. It generates 76.04 tokens per second with 423ms time to first token, providing a balance of model capability and inference speed for professional development use cases.