FlagshipMistral

Pixtral Large

Name: Pixtral Large
Author: Mistral

Pixtral Large is Mistral's flagship multimodal model supporting text and image inputs with a 131K token context window.

Context 131K

Tier Flagship

Modalities text, image

Contact providers for pricing

Compare Prices

API Pricing

No pricing data available for this model at the moment.

Prices updated daily. Last check: Jul 13, 2026

Performance & Benchmarks

Source: Artificial Analysis →

Intelligence

8.1 / 100

Math

2.3 / 100

Reasoning & Knowledge

MMLU-Pro70.1%
GPQA Diamond50.5%
Humanity's Last Exam3.6%

Coding

LiveCodeBench26.1%
SciCode29.2%

Math

AIME 20252.3%
AIME7.0%
MATH-50071.4%

Agentic & Tool Use

τ²-bench36.5%

Instruction & Long Context

IFBench34.5%
Long-Context Reasoning10.3%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Mistral
Family: Pixtral
Tier: Flagship
Context Window: 131K
Modalities: Text, Image

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

Flagship-tier capabilities from Mistral for complex multimodal tasks
131,072 token context window supports lengthy documents with images
Multimodal input support for both text and image analysis
52.68 tokens per second output speed for responsive inference
446ms time to first token provides quick response initiation
Positions Mistral competitively in the flagship multimodal model segment

Limitations

No tool calling support limits agentic applications
Proprietary model with no open source weights available
Limited to text and image modalities without audio or video support
Smaller context window than some competing flagship models

Key Features

•131,072 token context window

•Text and image input processing

•Multimodal understanding and reasoning

•52.68 tokens per second output generation

•446ms time to first token

•Flagship-tier model capabilities

•Streaming response support

•Cross-modal document analysis

About Pixtral Large

Pixtral Large is Mistral's flagship multimodal model, representing the company's most capable offering in the Pixtral family. As a top-tier model from Mistral, it positions the company competitively in the multimodal AI space alongside other major providers. The model supports both text and image inputs with a 131,072 token context window, enabling analysis of lengthy documents alongside visual content. Performance benchmarks show it generates 52.68 output tokens per second with a time to first token of 446 milliseconds, indicating solid inference speed for a flagship multimodal model. Pixtral Large targets applications requiring sophisticated understanding of both text and visual information, competing with other flagship multimodal models in the market. Organizations use it for complex document analysis, visual reasoning tasks, and applications where high-quality multimodal understanding justifies the computational cost of a flagship-tier model.

Common Use Cases

Pixtral Large serves applications requiring sophisticated multimodal understanding, particularly where both visual and textual analysis are critical. Its flagship-tier capabilities make it suitable for complex document processing involving charts, diagrams, and text, visual content moderation and analysis, multimodal research applications, and detailed image captioning or visual question answering. The 131K context window enables analysis of lengthy reports with embedded images or processing multiple images alongside extensive text. Organizations typically deploy it for high-value use cases where the superior multimodal reasoning capabilities justify the cost of a flagship model, rather than high-volume applications better suited for lighter alternatives.

Frequently Asked Questions

How much does Pixtral Large cost per million tokens?

Pixtral Large pricing varies by provider and may differ for input versus output tokens, as well as text versus image processing. Check the pricing table above for current rates across all providers offering this model.

What is Pixtral Large best used for?

Pixtral Large excels at complex multimodal tasks requiring analysis of both text and images, such as processing documents with charts and diagrams, visual content analysis, detailed image captioning, and applications where sophisticated cross-modal reasoning is needed. Its flagship-tier capabilities and 131K context window make it suitable for high-value applications rather than high-volume use cases.

Does Pixtral Large support tool calling or function execution?

No, Pixtral Large does not support tool calling or function execution capabilities. It focuses on multimodal understanding and generation tasks with text and image inputs, but cannot interact with external tools or APIs through structured function calls.