Lightweight · Google

Gemini 2.5 Flash

Gemini 2.5 Flash is Google's lightweight multimodal model supporting text, image, video, and audio inputs with a 1M token context window.

Context: 1.0M tokens
Tier: Lightweight
Tools: Supported
Modalities: text, image, video, audio
Input from $0.300 / 1M tokens across 3 providers

API Pricing

Provider      Input / 1M   Output / 1M   Updated
(not shown)   $0.300       $2.50         4/4/2026
(not shown)   $0.300       $2.50         4/12/2026
(not shown)   $0.300       $2.50         4/14/2026

Prices updated daily. Last check: 4/14/2026
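
To turn the listed rates into a per-request figure, the arithmetic can be sketched as below. This is a minimal illustration that assumes the $0.300 input and $2.50 output rates shown in the pricing table; the function name `estimate_cost_usd` and the example token counts are hypothetical, and real bills may differ by provider.

```python
# Rates copied from the pricing table (USD per 1M tokens, last check 4/14/2026).
INPUT_USD_PER_M = 0.300
OUTPUT_USD_PER_M = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost_usd(200_000, 2_000):.4f}")  # prints $0.0650
```

Output tokens cost roughly eight times as much as input tokens at these rates, so long generations tend to dominate the bill even when prompts are large.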

Model Details

General

Creator: Google
Family: Gemini
Tier: Lightweight
Context Window: 1.0M tokens
Modalities: Text, Image, Video, Audio

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion
Aliases: gemini-2-5-flash-image, gemini-2-5-flash-live-api

Strengths & Limitations

Strengths

  • Supports four input modalities: text, image, video, and audio
  • 1 million token context window for processing large documents and long conversations
  • Fast inference at 186.54 output tokens per second
  • Low latency with a 435 ms time to first token
  • Tool calling support with structured output capabilities
  • Multiple API variants, including live audio and image-specific endpoints
  • Lightweight tier optimized for cost-efficient, high-volume processing

Limitations

  • Proprietary model with no access to weights or local deployment
  • Reduced reasoning capability compared to Gemini Pro models
  • No benchmark scores listed for reasoning or coding tasks
  • Limited to Google's API ecosystem and pricing structure

Key Features

1 million token context window
Multimodal input support (text, image, video, audio)
Tool calling with structured output
Live API for real-time audio processing
Image-specific API endpoint
Chat completion interface
Streaming response support
Fast inference optimization

About Gemini 2.5 Flash

Gemini 2.5 Flash is the lightweight model in Google's Gemini 2.5 lineup, positioned as a fast and efficient option for multimodal applications. It sits below the flagship Gemini Pro models in capability but is optimized for high-throughput use cases. The model accepts text, image, video, and audio inputs within a 1 million token context window, which makes it suitable for processing large multimodal documents and long conversations. It supports tool calling and, according to Artificial Analysis benchmarks, delivers 186.54 output tokens per second with a 435 ms time to first token. It is available through multiple API variants, including specialized endpoints for image processing and live audio interactions.

Gemini 2.5 Flash targets applications that need fast multimodal processing at scale and competes with other lightweight models such as Claude Haiku and GPT-4o Mini. Its combination of speed, multimodal support, and large context window makes it practical for real-time applications and batch processing workflows where cost efficiency takes priority over maximum reasoning capability.
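
The two speed figures quoted in this section can be combined into a rough response-time estimate. The sketch below assumes a simple linear model (time to first token plus output length divided by throughput) and ignores network overhead and any variation with prompt size; `estimate_response_seconds` is a hypothetical helper, not part of any SDK.

```python
# Figures quoted above from Artificial Analysis benchmarks.
TTFT_SECONDS = 0.435               # time to first token
OUTPUT_TOKENS_PER_SECOND = 186.54  # output throughput


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate time to stream `output_tokens` tokens (linear model, an assumption)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


# Hypothetical response lengths for a quick comparison.
for n in (100, 500, 2_000):
    print(f"{n:>5} tokens: ~{estimate_response_seconds(n):.2f} s")
```

Under this model, short replies are dominated by the time to first token, while longer generations are dominated by throughput.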

Common Use Cases

Gemini 2.5 Flash is designed for applications requiring fast multimodal processing at scale, such as content moderation systems that need to analyze text, images, and videos quickly. Its large context window and speed make it suitable for processing lengthy multimodal documents, customer service chatbots handling mixed media inputs, and real-time applications like live audio transcription or image analysis. The model's lightweight positioning makes it cost-effective for high-volume batch processing tasks, automated content classification, and applications where rapid response times matter more than maximum reasoning depth. Its video and audio capabilities enable use cases like media analysis, educational content processing, and accessibility applications.
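
For batch workloads, one practical question is how many documents fit into a single request. The following is a minimal sketch of greedy, order-preserving packing against the 1M token context window. It assumes token counts have already been measured with a tokenizer, and the `reserve_tokens` headroom for instructions and the response is a hypothetical value, not a documented requirement.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window stated on this page


def pack_into_batches(doc_token_counts, reserve_tokens=8_192,
                      limit=CONTEXT_WINDOW_TOKENS):
    """Greedily group documents, in order, so each batch fits the token budget.

    Returns a list of batches; each batch is a list of indices into
    doc_token_counts. `reserve_tokens` is assumed headroom, not an API rule.
    """
    budget = limit - reserve_tokens
    if budget <= 0:
        raise ValueError("reserve_tokens must be smaller than limit")
    batches, current, used = [], [], 0
    for index, count in enumerate(doc_token_counts):
        if count > budget:
            raise ValueError(
                f"document {index} ({count} tokens) exceeds the budget of {budget}"
            )
        if used + count > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += count
    if current:
        batches.append(current)
    return batches


# Hypothetical token counts for four documents.
print(pack_into_batches([400_000, 500_000, 300_000, 90_000]))  # prints [[0, 1], [2, 3]]
```

Greedy packing keeps documents in their original order, which matters when later documents refer to earlier ones; a bin-packing heuristic could use fewer requests at the cost of reordering.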

Frequently Asked Questions

How much does Gemini 2.5 Flash cost per million tokens?

As of the last check on 4/14/2026, all three listed providers charge $0.300 per 1M input tokens and $2.50 per 1M output tokens. Prices can vary by provider and change over time, so check the pricing table above for current rates.

What is Gemini 2.5 Flash best used for?

Gemini 2.5 Flash excels at fast multimodal processing tasks requiring text, image, video, and audio analysis. Its speed and large context window make it ideal for content moderation, customer service applications, media analysis, and high-volume batch processing where cost efficiency matters more than maximum reasoning capability.

How does Gemini 2.5 Flash compare to other lightweight models?

Gemini 2.5 Flash accepts four input modalities (text, image, video, and audio) and offers a 1M token context window. Artificial Analysis benchmarks put it at 186.54 output tokens per second with a 435 ms time to first token, which suits real-time applications. This page lists no reasoning or coding benchmark scores, so compare it against other lightweight models such as Claude Haiku and GPT-4o Mini using current measurements for your own workload.