
Voxtral Small 24B

Author: Mistral

Voxtral Small 24B is Mistral's lightweight multimodal model that processes both text and audio with a 32K token context window.

Context: 32K
Tier: Lightweight
Modalities: Text, Audio

Input from $0.050 / 1M tokens across 3 providers

API Pricing

Cheapest on Amazon AWS (Batch): 53% below average

Provider	Input / 1M	Output / 1M	Cached / 1M	Updated
Amazon AWS (Batch)	$0.050	$0.150	-	7/13/2026
OpenRouter	$0.100	$0.300	$0.010	7/13/2026
Amazon AWS	$0.100	$0.300	-	7/13/2026
Scaleway	$0.172	$0.400	-	7/12/2026

Prices updated daily. Last check: Jul 13, 2026
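
The "53% below avg" figure can be reproduced from the input prices in the table. The page does not say how the average is computed, so this minimal sketch assumes it is the unweighted mean of the four listed input prices. The function name is illustrative only.

```python
# Input prices in USD per 1M tokens, copied from the pricing table
# (the first entry is the Amazon AWS batch row).
input_prices = {
    "Amazon AWS (Batch)": 0.050,
    "OpenRouter": 0.100,
    "Amazon AWS": 0.100,
    "Scaleway": 0.172,
}


def cheapest_vs_average(prices):
    """Return (name, price, percent below the mean) for the cheapest entry.

    Assumption: "average" means the unweighted mean of all listed rows.
    """
    average = sum(prices.values()) / len(prices)
    name = min(prices, key=prices.get)
    pct_below = (average - prices[name]) / average * 100
    return name, prices[name], pct_below


name, price, pct = cheapest_vs_average(input_prices)
print(f"{name}: ${price:.3f} per 1M input tokens, {pct:.0f}% below average")
```

Under that assumption the mean input price is about $0.106 per 1M tokens, and the batch rate comes out roughly 53% below it, matching the figure shown above the table.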

Model Details

General

Creator: Mistral
Family: Voxtral
Tier: Lightweight
Context Window: 32K
Modalities: Text, Audio

Capabilities

Tool Calling: No
Open Source: No
Aliases: Voxtral Small 1.0, Voxtral Small

Strengths & Limitations

Strengths

Supports both text and audio input modalities
32,000 token context window for substantial input capacity
24B parameter size balances capability with efficiency
Lightweight tier positioning for cost-effective deployment
Part of Mistral's Voxtral family with consistent API interfaces
Enables audio transcription and analysis workflows
Suitable for high-volume multimodal processing

Limitations

No tool calling or function execution support
Proprietary model with weights not publicly available
Smaller parameter count than flagship multimodal alternatives
Limited to text and audio modalities
Lightweight tier may have reduced reasoning capabilities compared to larger models

Key Features

• Text and audio input processing
• 32,000 token context window
• 24 billion parameter architecture
• Streaming response support
• Cross-modal understanding between text and audio
• Audio transcription capabilities
• Batch processing support
• RESTful API access

About Voxtral Small 24B

Voxtral Small 24B is the lightweight entry in Mistral's Voxtral family, built for both text and audio processing. At 24 billion parameters, it offers multimodal capabilities at a smaller scale than flagship alternatives. Its 32,000 token context window accepts text and audio inputs, so a single model can handle transcription, audio analysis, and cross-modal reasoning without separate specialized models.

The model targets use cases that need audio processing while keeping compute and cost under control. It is an alternative to larger multimodal models when combined text and audio understanding is required at scale.

Common Use Cases

Voxtral Small 24B is well-suited for applications requiring efficient audio and text processing at scale. Common use cases include audio transcription services, voice-to-text applications, podcast analysis, customer service call processing, and content moderation for audio platforms. Its lightweight architecture makes it appropriate for high-volume scenarios where audio understanding is needed but computational budgets are constrained. The model works well for building voice interfaces, analyzing recorded meetings, and processing multimedia content where both spoken and written elements need to be understood together.
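
For long text inputs such as meeting transcripts, the 32K context window sets an upper bound on how much can be sent in one request. The sketch below splits a transcript into pieces that should fit. It assumes a rough heuristic of 4 characters per token, which is not the model's actual tokenizer, and the reserved-token figure is a hypothetical allowance for the prompt and the reply. It does not cover how audio input is counted.

```python
CONTEXT_WINDOW = 32_000  # tokens, from the model details on this page
CHARS_PER_TOKEN = 4      # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text):
    """Estimate the token count of text using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def split_to_fit(text, reserved_tokens=2_000):
    """Split text into chunks whose estimated size fits the context window.

    reserved_tokens is a hypothetical allowance for instructions and output.
    """
    budget_chars = (CONTEXT_WINDOW - reserved_tokens) * CHARS_PER_TOKEN
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]


transcript = "a" * 300_000  # stand-in for a long transcript
chunks = split_to_fit(transcript)
print(len(chunks), [estimate_tokens(c) for c in chunks])
```

A production version would count tokens with the provider's tokenizer and split on sentence or speaker boundaries rather than fixed character offsets.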

Frequently Asked Questions

How much does Voxtral Small 24B cost per million tokens?

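Pricing for Voxtral Small 24B varies by provider. As of the Jul 13, 2026 check, input ranges from $0.050 (Amazon AWS batch) to $0.172 (Scaleway) per 1M tokens, and output from $0.150 to $0.400 per 1M tokens. OpenRouter also lists cached input at $0.010 per 1M tokens. Costs may differ for text versus audio processing, so check the pricing table above for current rates across all providers offering this model.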
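
To turn per-million rates into a budget figure, multiply each token count by its rate and divide by one million. The sketch below uses the input and output prices from the pricing table. The workload of 20M input tokens and 2M output tokens is a hypothetical example, and the function name is illustrative.

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, from the pricing table.
PRICES = {
    "Amazon AWS (Batch)": (Decimal("0.050"), Decimal("0.150")),
    "OpenRouter": (Decimal("0.100"), Decimal("0.300")),
    "Amazon AWS": (Decimal("0.100"), Decimal("0.300")),
    "Scaleway": (Decimal("0.172"), Decimal("0.400")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(provider, input_tokens, output_tokens):
    """Return the estimated USD cost for the given token counts."""
    in_price, out_price = PRICES[provider]
    return (in_price * input_tokens + out_price * output_tokens) / MILLION


# Hypothetical monthly workload: 20M input tokens, 2M output tokens.
for provider in PRICES:
    cost = estimate_cost(provider, 20_000_000, 2_000_000)
    print(f"{provider}: ${cost:.2f}")
```

The estimate ignores cached-input discounts and any provider-specific rules for billing audio, so treat it as a rough comparison rather than a quote.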

What is Voxtral Small 24B best used for?

Voxtral Small 24B excels at audio transcription, voice analysis, and applications requiring both text and audio understanding. Its lightweight design makes it ideal for high-volume scenarios like customer service call analysis, podcast processing, and voice interface development where efficiency is important.

Does Voxtral Small 24B support tool calling or function execution?

No, Voxtral Small 24B does not support tool calling or function execution capabilities. It focuses specifically on text and audio processing tasks. If you need function calling alongside multimodal capabilities, you would need to consider other models in Mistral's lineup or alternative providers.