
Gemini 2.5 Flash

Name: Gemini 2.5 Flash
Author: Google

Gemini 2.5 Flash is Google's lightweight multimodal model supporting text, image, video, and audio inputs with a 1M token context window.

Context: 1.0M tokens

Tier: Lightweight

Tools: Supported

Modalities: text, image, video, audio

Input from $0.150 / 1M tokens across 3 providers


API Pricing

Cheapest on Google Cloud (Batch): 43% below the average input price

Provider	Input ($/1M tokens)	Output ($/1M tokens)	Cached input ($/1M tokens)	Updated
Google Cloud (Batch)	$0.150	$1.25	$0.030	6/20/2026
Deep Infra	$0.300	$2.50	-	7/13/2026
Google Cloud	$0.300	$2.50	$0.030	6/22/2026
OpenRouter	$0.300	$2.50	$0.030	7/13/2026

Prices updated daily. Last check: Jul 13, 2026
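To turn the per-million rates into a budget, multiply each token count by its rate and divide by 1,000,000. The sketch below does that for a hypothetical workload (50M input tokens, 5M output tokens; invented for illustration) and reproduces the "43% below average" figure, assuming the average is a simple mean of the four input prices in the table. It ignores cached-input pricing and any provider-specific minimums or fees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    provider: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Rates copied from the pricing table (last check: Jul 13, 2026).
RATES = [
    Rate("Google Cloud (Batch)", 0.150, 1.25),
    Rate("Deep Infra", 0.300, 2.50),
    Rate("Google Cloud", 0.300, 2.50),
    Rate("OpenRouter", 0.300, 2.50),
]


def job_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one workload; cached-input discounts are not modeled."""
    return (input_tokens * rate.input_per_m
            + output_tokens * rate.output_per_m) / 1_000_000


def pct_below_average_input(rates: list[Rate], provider: str) -> float:
    """How far a provider's input price sits below the simple mean, in %."""
    avg = sum(r.input_per_m for r in rates) / len(rates)
    price = next(r.input_per_m for r in rates if r.provider == provider)
    return (avg - price) / avg * 100


if __name__ == "__main__":
    # Hypothetical workload: 50M input tokens, 5M output tokens.
    for r in RATES:
        print(f"{r.provider:22s} ${job_cost(r, 50_000_000, 5_000_000):.2f}")
    pct = pct_below_average_input(RATES, "Google Cloud (Batch)")
    print(f"Batch input price: {pct:.0f}% below average")
```

For this workload the Batch row comes to $13.75 versus $27.50 on the standard endpoints, so the 50% Batch discount applies to both input and output tokens.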

Performance & Benchmarks

Source: Artificial Analysis

Intelligence: 18.8 / 100

Math: 56.7 / 100

Reasoning & Knowledge

MMLU-Pro: 83.6%
GPQA Diamond: 76.6%
Humanity's Last Exam: 7.8%

Coding

LiveCodeBench: 62.5%
SciCode: 37.5%

Math

AIME 2025: 56.7%

Agentic & Tool Use

Terminal-Bench Hard: 14.4%
τ²-bench: 28.4%

Instruction & Long Context

IFBench: 43.5%
Long-Context Reasoning: 56.7%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Google
Family: Gemini
Tier: Lightweight
Context Window: 1.0M
Modalities: Text, Image, Video, Audio

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion
Aliases: gemini-2-5-flash-image, gemini-2-5-flash-live-api

Strengths & Limitations

Strengths

Supports four modalities: text, image, video, and audio input processing
1 million token context window for processing large documents and conversations
Fast inference speed at 186.54 output tokens per second
Low latency with 435ms time to first token
Tool calling support with structured output capabilities
Multiple API variants including live audio and image-specific endpoints
Lightweight tier optimized for cost-efficient high-volume processing
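The throughput and time-to-first-token figures above can be combined into a rough end-to-end estimate: total time ≈ time to first token + output tokens / throughput. This is a minimal sketch under simplifying assumptions (constant decode rate, no network or queueing overhead, TTFT unaffected by prompt length); real latency will vary with prompt size, load, and provider.

```python
TTFT_S = 0.435         # time to first token, seconds (Artificial Analysis figure)
TOKENS_PER_S = 186.54  # output throughput, tokens/second (Artificial Analysis figure)


def estimated_response_time(output_tokens: int,
                            ttft_s: float = TTFT_S,
                            tokens_per_s: float = TOKENS_PER_S) -> float:
    """Approximate seconds until a response of `output_tokens` completes.

    Assumes a constant decode rate and ignores network and queueing delays.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(f"{n:5d} tokens -> ~{estimated_response_time(n):.2f} s")
```

Under this model a 500-token answer takes roughly 3.1 seconds, most of it spent decoding rather than waiting for the first token, which is why streaming output matters for interactive use.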

Limitations

Proprietary model with no access to weights or local deployment
Lightweight tier with reduced reasoning capabilities compared to Gemini Pro models
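Low scores on the hardest agentic and knowledge benchmarks (Terminal-Bench Hard 14.4%, τ²-bench 28.4%, Humanity's Last Exam 7.8%)
Little price variation across providers: Deep Infra and OpenRouter list the same standard rates as Google Cloud, and the discounted Batch rate is listed only on Google Cloud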

Key Features

•1 million token context window

•Multimodal input support (text, image, video, audio)

•Tool calling with structured output

•Live API for real-time audio processing

•Image-specific API endpoint

•Chat completion interface

•Streaming response support

•Fast inference optimization
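The page confirms tool calling but does not show the request format. As a provider-agnostic sketch of the client side only: the application declares each tool with a JSON-schema-style parameter description, the model proposes a call as a name plus arguments, and the application validates and executes it locally before returning the result. The tool `get_order_status`, the `{"name": ..., "args": ...}` call shape, and the `dispatch` helper are all hypothetical; the exact envelope the Gemini API uses is not reproduced here and should be taken from its API docs.

```python
import json
from typing import Any, Callable


# Hypothetical tool; a real application would query its own backend here.
def get_order_status(order_id: str) -> dict[str, Any]:
    return {"order_id": order_id, "status": "shipped"}


# JSON-schema-style declarations (illustrative, not the exact API envelope).
TOOLS: dict[str, dict[str, Any]] = {
    "get_order_status": {
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}

# Maps declared tool names to the local functions that implement them.
REGISTRY: dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}


def dispatch(call: dict[str, Any]) -> str:
    """Validate a model-proposed call and run it, returning a JSON result.

    Rejects unknown tool names and calls missing required arguments, so the
    model's output is never executed blindly.
    """
    name = call.get("name")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name!r}")
    args = call.get("args", {})
    required = TOOLS[name]["parameters"]["required"]
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f"missing required args: {missing}")
    return json.dumps(REGISTRY[name](**args))


if __name__ == "__main__":
    # Simulated model output; in practice this comes from the API response.
    proposed = {"name": "get_order_status", "args": {"order_id": "A-1001"}}
    print(dispatch(proposed))
```

Keeping declarations and implementations in separate tables lets the same schema be sent to the model and reused for validation on the way back.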

About Gemini 2.5 Flash

Gemini 2.5 Flash is Google's lightweight model in the Gemini family, positioned as a fast and efficient option for multimodal applications. Within the Gemini 2.5 lineup it sits below the flagship Gemini 2.5 Pro in capability, trading some reasoning depth for speed and lower cost in high-throughput workloads.

The model accepts text, image, video, and audio inputs within a 1 million token context window, so large multimodal documents and long conversations can often be handled in a single request. It supports tool calling and, according to Artificial Analysis measurements, generates 186.54 output tokens per second with a 435 ms time to first token. It is also offered through variant endpoints, including an image-specific variant and a Live API for real-time audio interactions.

Gemini 2.5 Flash targets applications that need fast multimodal processing at scale and competes with other lightweight models such as Claude Haiku and GPT-4o mini. Its combination of speed, multimodal support, and a large context window makes it practical for real-time applications and batch workflows where cost efficiency matters more than maximum reasoning capability.
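A 1M token window is large, but a pipeline that sends many documents may still need to group them into several requests. A minimal sketch, assuming text-only inputs and a rough 4-characters-per-token heuristic (an assumption: images, video, and audio are tokenized differently, and exact counts should come from the provider's token counter). It treats the window as 1,000,000 tokens per the "1.0M" figure (the exact limit may differ slightly) and greedily packs documents into requests while reserving room for the output. The names `pack_documents` and `reserve_for_output`, and the 8,192-token reserve, are hypothetical.

```python
CONTEXT_TOKENS = 1_000_000  # "1.0M" from this page; exact limit may differ
CHARS_PER_TOKEN = 4         # rough English-text heuristic (assumption)


def estimate_tokens(text: str) -> int:
    # Ceiling division so any non-empty string counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_documents(docs: list[str],
                   reserve_for_output: int = 8_192,
                   budget: int = CONTEXT_TOKENS) -> list[list[int]]:
    """Greedily group document indices into requests that fit the window.

    Each group's estimated input stays within budget - reserve_for_output.
    A single document larger than that limit is rejected, not split.
    """
    limit = budget - reserve_for_output
    groups: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, doc in enumerate(docs):
        n = estimate_tokens(doc)
        if n > limit:
            raise ValueError(f"document {i} (~{n} tokens) exceeds the limit")
        if used + n > limit:  # start a new request when this one is full
            groups.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Four ~250k-token documents: three fit in one request, one spills over.
    print(pack_documents(["x" * 1_000_000] * 4))
```

Greedy packing keeps documents in their original order, which matters when later documents refer to earlier ones; a bin-packing approach could use fewer requests at the cost of reordering.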

Common Use Cases

Gemini 2.5 Flash is designed for applications requiring fast multimodal processing at scale, such as content moderation systems that need to analyze text, images, and videos quickly. Its large context window and speed make it suitable for processing lengthy multimodal documents, customer service chatbots handling mixed media inputs, and real-time applications like live audio transcription or image analysis. The model's lightweight positioning makes it cost-effective for high-volume batch processing tasks, automated content classification, and applications where rapid response times matter more than maximum reasoning depth. Its video and audio capabilities enable use cases like media analysis, educational content processing, and accessibility applications.

Frequently Asked Questions

How much does Gemini 2.5 Flash cost per million tokens?

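On standard endpoints (Google Cloud, Deep Infra, OpenRouter), Gemini 2.5 Flash costs $0.30 per 1M input tokens and $2.50 per 1M output tokens. Google Cloud's Batch option halves that to $0.15 input and $1.25 output, and Google Cloud and OpenRouter list cached input at $0.03 per 1M tokens. These rates are from the Jul 13, 2026 check; see the pricing table above for current figures.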

What is Gemini 2.5 Flash best used for?

Gemini 2.5 Flash excels at fast multimodal processing tasks requiring text, image, video, and audio analysis. Its speed and large context window make it ideal for content moderation, customer service applications, media analysis, and high-volume batch processing where cost efficiency matters more than maximum reasoning capability.

How does Gemini 2.5 Flash compare to other lightweight models?

Among lightweight models, Gemini 2.5 Flash stands out for accepting four input modalities, including video and audio, within a 1M token context window. Artificial Analysis measures it at 186.54 output tokens per second with a 435 ms time to first token, which suits latency-sensitive and real-time applications.