FlagshipMistral

Pixtral Large

Pixtral Large is Mistral's flagship multimodal model supporting text and image inputs with a 131K token context window.

Context 131K
Tier Flagship
Modalities text, image
Input from
$2.00 / 1M tokens
across 2 providers

API Pricing

ProviderInput / 1MOutput / 1MSpeedTTFTUpdated
$2.00$6.0059.0 t/s498ms4/14/2026
$2.00$6.0059.0 t/s498ms4/14/2026

Prices updated daily. Last check: 4/14/2026

Model Details

General

Creator
Mistral
Family
Pixtral
Tier
Flagship
Context Window
131K
Modalities
Text, Image

Capabilities

Tool Calling
No
Open Source
No

Strengths & Limitations

  • Flagship-tier capabilities from Mistral for complex multimodal tasks
  • 131,072 token context window supports lengthy documents with images
  • Multimodal input support for both text and image analysis
  • 52.68 tokens per second output speed for responsive inference
  • 446ms time to first token provides quick response initiation
  • Positions Mistral competitively in the flagship multimodal model segment
  • No tool calling support limits agentic applications
  • Proprietary model with no open source weights available
  • Limited to text and image modalities without audio or video support
  • Smaller context window than some competing flagship models

Key Features

131,072 token context window
Text and image input processing
Multimodal understanding and reasoning
52.68 tokens per second output generation
446ms time to first token
Flagship-tier model capabilities
Streaming response support
Cross-modal document analysis

About Pixtral Large

Pixtral Large is Mistral's flagship multimodal model, representing the company's most capable offering in the Pixtral family. As a top-tier model from Mistral, it positions the company competitively in the multimodal AI space alongside other major providers. The model supports both text and image inputs with a 131,072 token context window, enabling analysis of lengthy documents alongside visual content. Performance benchmarks show it generates 52.68 output tokens per second with a time to first token of 446 milliseconds, indicating solid inference speed for a flagship multimodal model. Pixtral Large targets applications requiring sophisticated understanding of both text and visual information, competing with other flagship multimodal models in the market. Organizations use it for complex document analysis, visual reasoning tasks, and applications where high-quality multimodal understanding justifies the computational cost of a flagship-tier model.

Common Use Cases

Pixtral Large serves applications requiring sophisticated multimodal understanding, particularly where both visual and textual analysis are critical. Its flagship-tier capabilities make it suitable for complex document processing involving charts, diagrams, and text, visual content moderation and analysis, multimodal research applications, and detailed image captioning or visual question answering. The 131K context window enables analysis of lengthy reports with embedded images or processing multiple images alongside extensive text. Organizations typically deploy it for high-value use cases where the superior multimodal reasoning capabilities justify the cost of a flagship model, rather than high-volume applications better suited for lighter alternatives.

Frequently Asked Questions

How much does Pixtral Large cost per million tokens?

Pixtral Large pricing varies by provider and may differ for input versus output tokens, as well as text versus image processing. Check the pricing table above for current rates across all providers offering this model.

What is Pixtral Large best used for?

Pixtral Large excels at complex multimodal tasks requiring analysis of both text and images, such as processing documents with charts and diagrams, visual content analysis, detailed image captioning, and applications where sophisticated cross-modal reasoning is needed. Its flagship-tier capabilities and 131K context window make it suitable for high-value applications rather than high-volume use cases.

Does Pixtral Large support tool calling or function execution?

No, Pixtral Large does not support tool calling or function execution capabilities. It focuses on multimodal understanding and generation tasks with text and image inputs, but cannot interact with external tools or APIs through structured function calls.