
Mistral Small

Mistral Small is Mistral's lightweight model optimized for speed and efficiency, featuring a 32K token context window and tool calling capabilities.

Context: 32K
Tier: Lightweight
Knowledge: Sep 2023
Tools: Supported
Input: from $0.050 / 1M tokens across 7 providers

API Pricing

Cheapest on OpenRouter: 83% below average

Provider   Input / 1M   Output / 1M   Updated
-          $0.050       $0.080        4/14/2026
-          $0.075       $0.200        4/4/2026
-          $0.100       $0.300        4/1/2026
-          $0.100       $0.300        4/14/2026
-          $0.176       $0.410        4/13/2026
-          $0.400       $0.400        4/1/2026
-          $0.500       $1.50         4/14/2026
-          $1.00        $3.00         4/14/2026

Prices updated daily. Last check: 4/14/2026
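
To make the per-token rates concrete, the sketch below estimates the cost of a single request at the cheapest rate listed above ($0.050 per 1M input tokens, $0.080 per 1M output tokens). The token counts in the example are hypothetical.

```python
# Rough cost estimate for one Mistral Small request at the cheapest listed rate.
# Rates come from the pricing table above; the example token counts are made up.

INPUT_PER_1M = 0.050   # USD per 1M input tokens
OUTPUT_PER_1M = 0.080  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PER_1M + output_tokens * OUTPUT_PER_1M) / 1_000_000

# Example: a 2,000-token prompt that produces a 500-token reply
print(f"${request_cost(2_000, 500):.6f}")  # ~$0.000140
```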

Model Details

General

Creator: Mistral
Family: Mistral
Tier: Lightweight
Context Window: 32K
Knowledge Cutoff: Sep 2023
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion

Strengths & Limitations

Strengths

  • Fast inference speed at 123.31 output tokens per second
  • Low latency with a 337ms time to first token
  • 32,000 token context window for substantial document processing
  • Tool calling support for function execution
  • Optimized for efficiency while maintaining reasonable capabilities
  • Chat completion interface for conversational applications

Limitations

  • Text-only modality with no image or multimodal support
  • Proprietary model with no open source weights available
  • Knowledge cutoff of September 2023, older than some competing models
  • Positioned in the lightweight tier, with reduced capabilities compared to flagship models

Key Features

32,000 token context window
Tool calling with function execution (see the sketch after this list)
Chat completion interface
Streaming response support
Fast inference optimization
Text-based language processing
API-based deployment model
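
The sketch below shows how the tool calling and chat completion features listed above are typically exercised from Python, assuming an OpenAI-compatible chat completions endpoint. The base URL, model identifier, and the get_weather tool are illustrative assumptions, not values taken from this page.

```python
# Minimal tool calling sketch against an OpenAI-compatible chat completions
# endpoint. The base_url, model id, and tool definition are assumptions;
# substitute the values for your provider.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.mistral.ai/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistral-small-latest",  # assumed model id; check your provider's catalog
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# When the model decides to call the tool, the arguments arrive as a JSON string.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```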

About Mistral Small

Mistral Small is a lightweight model developed by Mistral, positioned as the company's efficient option for applications requiring fast inference with reasonable capability. Within Mistral's model family, it serves as the speed-optimized tier below the more capable flagship offerings. The model supports text-based chat completion with a 32,000 token context window and includes tool calling functionality. It demonstrates strong inference speed, with 123.31 output tokens per second and a 337ms time to first token, making it well suited for applications where response latency matters. The model has a knowledge cutoff of September 2023.

Mistral Small targets use cases where developers need reliable language model capabilities without the computational overhead of larger models. It competes with other lightweight offerings by balancing performance with efficiency, making it suitable for production deployments that require consistent speed.
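
Taking the performance figures above at face value (337ms time to first token, 123.31 output tokens per second), a back-of-the-envelope estimate of end-to-end response time for a given reply length looks like the sketch below; the reply lengths are hypothetical.

```python
# Back-of-the-envelope response-time estimate from the figures quoted above:
# 337 ms to the first token, then roughly 123.31 output tokens per second.
TTFT_S = 0.337
TOKENS_PER_S = 123.31

def estimated_response_time(output_tokens: int) -> float:
    """Estimated seconds from sending a request to receiving the full reply."""
    return TTFT_S + output_tokens / TOKENS_PER_S

for n in (50, 200, 500):  # hypothetical reply lengths
    print(f"{n:>4} tokens -> ~{estimated_response_time(n):.2f} s")
# 50 tokens -> ~0.74 s, 200 tokens -> ~1.96 s, 500 tokens -> ~4.39 s
```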

Common Use Cases

Mistral Small is designed for applications requiring fast, efficient language processing where speed and cost-effectiveness take priority over maximum capability. Its combination of 32K context window and tool calling makes it suitable for customer service chatbots, content moderation, text classification, and basic coding assistance. The model's optimized inference speed makes it particularly valuable for high-volume production deployments, real-time applications, and scenarios where response latency directly impacts user experience. Its lightweight nature also makes it appropriate for developers building applications that need reliable language understanding without the computational costs of larger flagship models.
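
For latency-sensitive use cases such as support chatbots or ticket classification, a common pattern is a short system prompt combined with streaming so the first tokens reach the user as soon as possible. The sketch below again assumes an OpenAI-compatible endpoint and model identifier; the ticket labels and prompt are illustrative.

```python
# Streaming classification sketch over an OpenAI-compatible endpoint.
# The base_url and model id are assumptions; adjust for your provider.
from openai import OpenAI

client = OpenAI(base_url="https://api.mistral.ai/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="mistral-small-latest",  # assumed model id
    messages=[
        {"role": "system",
         "content": "Classify the support ticket as one of: billing, bug, other. Reply with one word."},
        {"role": "user",
         "content": "I was charged twice for my subscription this month."},
    ],
    stream=True,   # emit tokens as they are generated
    max_tokens=5,  # the label should only be a single word
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```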

Frequently Asked Questions

How much does Mistral Small cost per million tokens?

Mistral Small pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is Mistral Small best used for?

Mistral Small excels at high-volume applications requiring fast response times, such as customer service chatbots, text classification, content moderation, and basic coding assistance. Its 32K context window and tool calling capabilities make it suitable for document processing and function execution while maintaining efficient inference speeds.

How does Mistral Small compare to other lightweight models?

Mistral Small offers competitive inference speed at 123.31 tokens per second with a substantial 32K context window, which is larger than many lightweight alternatives. Its tool calling support and 337ms time to first token provide a balance of capability and efficiency, though it's limited to text-only processing unlike some multimodal lightweight options.