LightweightMistral

Voxtral Small 24B

Voxtral Small 24B is Mistral's lightweight multimodal model that processes both text and audio with a 32K token context window.

Context 32K
Tier Lightweight
Modalities text, audio
Input from
$0.100 / 1M tokens
across 2 providers

API Pricing

Cheapest on OpenRouter 27% below avg
ProviderInput / 1MOutput / 1MUpdated
$0.100$0.3004/14/2026
$0.176$0.4104/13/2026

Prices updated daily. Last check: 4/14/2026

Model Details

General

Creator
Mistral
Family
Voxtral
Tier
Lightweight
Context Window
32K
Modalities
Text, Audio

Capabilities

Tool Calling
No
Open Source
No

Strengths & Limitations

  • Supports both text and audio input modalities
  • 32,000 token context window for substantial input capacity
  • 24B parameter size balances capability with efficiency
  • Lightweight tier positioning for cost-effective deployment
  • Part of Mistral's Voxtral family with consistent API interfaces
  • Enables audio transcription and analysis workflows
  • Suitable for high-volume multimodal processing
  • No tool calling or function execution support
  • Proprietary model with weights not publicly available
  • Smaller parameter count than flagship multimodal alternatives
  • Limited to text and audio modalities only
  • Lightweight tier may have reduced reasoning capabilities compared to larger models

Key Features

Text and audio input processing
32,000 token context window
24 billion parameter architecture
Streaming response support
Cross-modal understanding between text and audio
Audio transcription capabilities
Batch processing support
RESTful API access

About Voxtral Small 24B

Voxtral Small 24B is Mistral's lightweight entry in the Voxtral family, designed to handle both text and audio processing tasks. As a 24 billion parameter model, it sits in the lightweight tier of Mistral's model lineup, offering multimodal capabilities at a smaller scale than flagship alternatives. The model features a 32,000 token context window and supports both text and audio inputs, enabling applications that require understanding of spoken content alongside written text. This dual-modality approach allows for transcription, audio analysis, and cross-modal reasoning tasks without requiring separate specialized models. Voxtral Small 24B targets use cases where audio processing is needed but computational efficiency and cost control are priorities. It provides an alternative to larger multimodal models when the specific combination of text and audio understanding is required at scale.

Common Use Cases

Voxtral Small 24B is well-suited for applications requiring efficient audio and text processing at scale. Common use cases include audio transcription services, voice-to-text applications, podcast analysis, customer service call processing, and content moderation for audio platforms. Its lightweight architecture makes it appropriate for high-volume scenarios where audio understanding is needed but computational budgets are constrained. The model works well for building voice interfaces, analyzing recorded meetings, and processing multimedia content where both spoken and written elements need to be understood together.

Frequently Asked Questions

How much does Voxtral Small 24B cost per million tokens?

Voxtral Small 24B pricing varies by provider and usage patterns. As a lightweight multimodal model, costs will differ for text versus audio processing. Check the pricing table above for current rates across all providers offering this model.

What is Voxtral Small 24B best used for?

Voxtral Small 24B excels at audio transcription, voice analysis, and applications requiring both text and audio understanding. Its lightweight design makes it ideal for high-volume scenarios like customer service call analysis, podcast processing, and voice interface development where efficiency is important.

Does Voxtral Small 24B support tool calling or function execution?

No, Voxtral Small 24B does not support tool calling or function execution capabilities. It focuses specifically on text and audio processing tasks. If you need function calling alongside multimodal capabilities, you would need to consider other models in Mistral's lineup or alternative providers.