
Mistral Medium 3

Mistral Medium 3 is Mistral's flagship multimodal model supporting text and image inputs with a 131K token context window.

Context: 131K
Tier: Flagship
Modalities: text, image
Input from: $0.400 / 1M tokens (across 1 provider)

API Pricing

Provider    Input / 1M   Output / 1M   Speed      TTFT     Updated
(unnamed)   $0.400       $2.00         79.8 t/s   479 ms   4/14/2026

Prices updated daily. Last check: 4/14/2026
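
To make the per-token rates concrete, here is a minimal sketch of a cost estimate using the input and output prices from the table above. The helper name `estimate_cost_usd` and the example token counts are hypothetical, and the rates are only as current as the last price check.

```python
# Prices from the pricing table (USD per 1M tokens, last check 4/14/2026).
INPUT_PRICE_PER_M = 0.400
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Illustrative request: a 100,000-token document summarized into 2,000 tokens.
cost = estimate_cost_usd(100_000, 2_000)
print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens cost five times as much as input tokens at these rates, long generations tend to dominate the bill more than long prompts do.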

Model Details

General

Creator: Mistral
Family: Mistral
Tier: Flagship
Context Window: 131K
Modalities: Text, Image

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

  • 131,072-token context window supports processing of lengthy documents
  • Multimodal capabilities handle both text and image inputs
  • Flagship-tier model designed for complex reasoning and analysis tasks
  • 50.62 tokens per second generation speed for responsive applications (the pricing table lists 79.8 t/s as of 4/14/2026)
  • 435 ms time to first token provides quick response initiation (the pricing table lists 479 ms as of 4/14/2026)
  • European-developed model offering data sovereignty options

Limitations

  • No tool calling or function execution capabilities
  • Proprietary model with no open source weights available
  • Limited to text and image modalities without audio or video support
  • Smaller context window than some competing flagship models

Key Features

131,072 token context window
Text input and generation
Image input processing
Multimodal understanding
Streaming response support
API-only access
Flagship-tier reasoning capabilities
European AI development

About Mistral Medium 3

Mistral Medium 3 is the flagship model from the French AI company Mistral and the most capable offering in the Mistral family, designed for demanding applications that require sophisticated reasoning and multimodal understanding. It accepts text and image inputs and has a 131,072-token context window, which lets it process lengthy documents and conversations while maintaining context. Performance benchmarks show generation at 50.62 tokens per second with a time to first token of 435 milliseconds.

The model does not include tool calling; it focuses on direct text generation and image understanding. It is proprietary and available only through API access, which positions it as a European-developed option for enterprise and research applications that need high-quality multimodal processing.
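
As a rough illustration of what the speed and time-to-first-token figures mean in practice, the sketch below estimates end-to-end response time as the time to first token plus the output length divided by generation speed. This simple model is an assumption (it ignores network overhead and load variation), and the helper name `estimate_response_seconds` is hypothetical.

```python
def estimate_response_seconds(
    output_tokens: int,
    ttft_ms: float = 435.0,           # time to first token quoted in the text
    tokens_per_second: float = 50.62,  # generation speed quoted in the text
) -> float:
    """Approximate total response time: TTFT plus streaming time (assumed model)."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second


# A 500-token answer at the benchmark figures quoted in the text.
benchmark_estimate = estimate_response_seconds(500)

# The same answer at the figures listed in the pricing table (79.8 t/s, 479 ms).
table_estimate = estimate_response_seconds(500, ttft_ms=479.0, tokens_per_second=79.8)

print(f"Benchmark figures: {benchmark_estimate:.1f} s")
print(f"Pricing-table figures: {table_estimate:.1f} s")
```

For anything beyond a short reply, generation speed matters far more than time to first token, which is why streaming is useful for longer answers.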

Common Use Cases

Mistral Medium 3 is designed for sophisticated applications requiring multimodal understanding and complex reasoning. Its combination of text and image processing makes it suitable for document analysis, research assistance, content analysis involving visual elements, and advanced chatbot applications. The 131K context window enables analysis of lengthy reports, academic papers, and extended conversations. Organizations requiring European-developed AI solutions may prefer this model for data sovereignty considerations. The flagship positioning makes it appropriate for demanding enterprise applications, research projects, and scenarios where high-quality reasoning is prioritized over tool integration.
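
Before sending a long report or paper, it can help to check whether it is likely to fit in the 131,072-token window. The sketch below uses a rough rule of about four characters per token, which is an assumption for English text and not the model's actual tokenizer; the helper name `fits_in_context` and the default output reserve are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 131_072  # context window stated on this page
CHARS_PER_TOKEN = 4              # assumption: rough heuristic, not the real tokenizer


def fits_in_context(text: str, reserved_output_tokens: int = 2_000) -> bool:
    """Roughly check whether text plus a reserved reply budget fits the window."""
    estimated_input_tokens = len(text) // CHARS_PER_TOKEN + 1
    return estimated_input_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


# Illustrative inputs: about 100K and about 150K estimated tokens.
print(fits_in_context("a" * 400_000))
print(fits_in_context("a" * 600_000))
```

An exact count requires the provider's tokenizer, so treat the result as a screening check and leave some margin near the limit.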

Frequently Asked Questions

How much does Mistral Medium 3 cost per million tokens?

As of the last price check on 4/14/2026, the one listed provider charges $0.400 per 1M input tokens and $2.00 per 1M output tokens. Pricing can vary by provider and pricing type (standard vs batch), so check the pricing table above for current rates.

What is Mistral Medium 3 best used for?

Mistral Medium 3 excels at complex reasoning tasks involving both text and images, such as document analysis, research assistance, and multimodal content understanding. Its 131K context window makes it particularly suitable for processing lengthy documents and maintaining extended conversations.

Does Mistral Medium 3 support tool calling or function execution?

No, Mistral Medium 3 does not include tool calling capabilities. It focuses on direct text generation and multimodal understanding rather than external tool integration. For applications requiring function calling, consider other models in the Mistral family or alternative providers.