Gemini 2.5 Flash
Gemini 2.5 Flash is Google's lightweight multimodal model supporting text, image, video, and audio inputs with a 1M token context window.
API Pricing
| Provider | Input / 1M tokens | Output / 1M tokens | Updated |
|---|---|---|---|
| | $0.300 | $2.50 | 4/4/2026 |
| | $0.300 | $2.50 | 4/12/2026 |
| | $0.300 | $2.50 | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
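To turn the per-million rates above into a per-request figure, a minimal Python sketch follows. It assumes the $0.300 input and $2.50 output rates listed in the table; the function name `estimate_cost_usd` and the example token counts are illustrative only and are not part of any provider SDK.

```python
# Rates copied from the pricing table (USD per 1M tokens).
# They are a snapshot and may change; update them before relying on the result.
INPUT_USD_PER_MILLION = 0.30
OUTPUT_USD_PER_MILLION = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a 200,000-token document summarized into 5,000 output tokens.
print(f"${estimate_cost_usd(200_000, 5_000):.4f}")  # prints $0.0725
```

Because output tokens cost roughly eight times as much as input tokens at these rates, long generations can dominate the bill even when the prompt is much larger than the response.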
Model Details
General
- Creator: Google
- Family: Gemini
- Tier: Lightweight
- Context Window: 1M tokens
- Modalities: Text, Image, Video, Audio
Capabilities
- Tool Calling: Yes
- Open Source: No
- Subtypes: Chat Completion
- Aliases: gemini-2-5-flash-image, gemini-2-5-flash-live-api
Strengths & Limitations
Strengths
- Supports four modalities: text, image, video, and audio input processing
- 1 million token context window for processing large documents and conversations
- Fast inference speed at 186.54 output tokens per second
- Low latency with 435 ms time to first token
- Tool calling support with structured output capabilities
- Multiple API variants including live audio and image-specific endpoints
- Lightweight tier optimized for cost-efficient high-volume processing
Limitations
- Proprietary model with no access to weights or local deployment
- Lightweight tier with reduced reasoning capabilities compared to Gemini Pro models
- No benchmark scores provided for reasoning or coding tasks
- Limited to Google's API ecosystem and pricing structure
About Gemini 2.5 Flash
Common Use Cases
Gemini 2.5 Flash is designed for applications requiring fast multimodal processing at scale, such as content moderation systems that need to analyze text, images, and videos quickly. Its large context window and speed make it suitable for processing lengthy multimodal documents, customer service chatbots handling mixed media inputs, and real-time applications like live audio transcription or image analysis. The model's lightweight positioning makes it cost-effective for high-volume batch processing tasks, automated content classification, and applications where rapid response times matter more than maximum reasoning depth. Its video and audio capabilities enable use cases like media analysis, educational content processing, and accessibility applications.
Frequently Asked Questions
How much does Gemini 2.5 Flash cost per million tokens?
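Input and output tokens are billed at different rates. Every entry in the pricing table above lists $0.300 per million input tokens and $2.50 per million output tokens, as of the last check on 4/14/2026. Rates can vary by provider, so check the table for current figures.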
What is Gemini 2.5 Flash best used for?
Gemini 2.5 Flash excels at fast multimodal processing tasks requiring text, image, video, and audio analysis. Its speed and large context window make it ideal for content moderation, customer service applications, media analysis, and high-volume batch processing where cost efficiency matters more than maximum reasoning capability.
How does Gemini 2.5 Flash compare to other lightweight models?
Among lightweight models, Gemini 2.5 Flash is notable for accepting all four input modalities (text, image, video, and audio) alongside a 1M token context window. Its reported throughput of 186.54 output tokens per second and 435 ms time to first token make it a candidate for real-time applications. No benchmark scores or speed figures for other lightweight models are listed here, so a direct comparison requires checking those models' own measurements.
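The two speed figures quoted above can be combined into a rough estimate of how long a full response takes: time to first token plus output length divided by throughput. The Python sketch below assumes throughput stays constant for the whole response and ignores network overhead and prompt length, so treat the result as a ballpark. The helper name `estimate_response_seconds` is hypothetical.

```python
# Figures reported on this page; real-world values vary by provider and load.
TIME_TO_FIRST_TOKEN_S = 0.435       # 435 ms
OUTPUT_TOKENS_PER_SECOND = 186.54


def estimate_response_seconds(output_tokens: int) -> float:
    """Rough wall-clock estimate for streaming a response of the given length."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TIME_TO_FIRST_TOKEN_S + output_tokens / OUTPUT_TOKENS_PER_SECOND


# Example: a 500-token answer.
print(f"{estimate_response_seconds(500):.2f} s")  # prints 3.12 s
```

For short replies the fixed 435 ms dominates, while for long generations the per-token term does, which is worth keeping in mind when sizing responses for interactive use.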