Gemini 3 Flash

Author: Google

Gemini 3 Flash is Google's lightweight multimodal model with 1M token context window, supporting text, image, video, and audio inputs for high-speed applications.

Context 1.0M

Tier Lightweight

Knowledge Jun 2025

Tools Supported

Modalities text, image, video, audio

Input from $0.250 / 1M tokens across 3 providers

API Pricing

Cheapest on Google Cloud (Batch): 64% below average

Provider	Input / 1M	Output / 1M	Cached / 1M	Speed	TTFT	Updated
Google Cloud (Batch)	$0.250	$1.50	-	205 t/s	733ms	7/11/2026
OpenRouter	$0.500	$3.00	$0.050	205 t/s	733ms	7/13/2026
Google Cloud	$0.500	$3.00	-	205 t/s	733ms	7/11/2026
Deep Infra	$1.50	$9.00	-	205 t/s	733ms	6/18/2026

Prices updated daily. Last check: Jul 13, 2026
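
The per-token rates above translate into per-request costs with simple arithmetic. The following is a minimal sketch, not a provider SDK call: the prices are copied from the table, while the function names and the example workload (100,000 input tokens, 2,000 output tokens) are hypothetical. It also assumes the "64% below average" figure is taken over the input prices of the four listed rows.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "Google Cloud (Batch)": {"input": 0.250, "output": 1.50},
    "OpenRouter": {"input": 0.500, "output": 3.00},
    "Google Cloud": {"input": 0.500, "output": 3.00},
    "Deep Infra": {"input": 1.50, "output": 9.00},
}


def request_cost(provider, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-1M-token rates."""
    rate = PRICES[provider]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


def percent_below_average_input(provider):
    """Percent by which a provider's input price undercuts the mean of all rows.

    Assumption: the average is taken over the four listed rows.
    """
    average = sum(p["input"] for p in PRICES.values()) / len(PRICES)
    return (1 - PRICES[provider]["input"] / average) * 100


if __name__ == "__main__":
    # Hypothetical workload: a long document in, a short summary out.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 100_000, 2_000):.4f}")
    cheapest = "Google Cloud (Batch)"
    print(f"{cheapest}: {percent_below_average_input(cheapest):.0f}% below average")
```

For this workload the batch rate comes to about $0.028 per request, against about $0.168 at the highest listed rate.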

Performance & Benchmarks

Source: Artificial Analysis

Intelligence: 27.4 / 100
Math: 55.7 / 100
Output Speed: 205 t/s
Latency (TTFT): 733ms
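
Output speed and time to first token (TTFT) combine into a rough end-to-end response time: the wait for the first token plus the time to stream the rest. The sketch below uses the 205 t/s and 733ms figures from this section. The function name is hypothetical, and the model assumes a constant generation rate with no network or queueing overhead, so treat the result as a lower-bound estimate rather than a measurement.

```python
# Figures taken from the benchmark summary above.
TTFT_SECONDS = 0.733              # latency before the first output token
OUTPUT_TOKENS_PER_SECOND = 205.0  # sustained output speed


def estimated_response_seconds(output_tokens,
                               ttft=TTFT_SECONDS,
                               speed=OUTPUT_TOKENS_PER_SECOND):
    """Estimate total response time as TTFT plus streaming time.

    Assumes a constant generation rate; real requests will vary.
    """
    return ttft + output_tokens / speed


if __name__ == "__main__":
    for tokens in (100, 500, 2_000):
        seconds = estimated_response_seconds(tokens)
        print(f"{tokens} output tokens: ~{seconds:.2f} s")
```

Under these assumptions TTFT dominates short replies, while for long outputs the streaming term takes over.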

Reasoning & Knowledge

MMLU-Pro: 88.2%
GPQA Diamond: 81.2%
Humanity's Last Exam: 14.1%

Coding

LiveCodeBench: 79.7%
SciCode: 49.9%

Math

AIME 2025: 55.7%

Agentic & Tool Use

Terminal-Bench Hard: 31.8%
τ²-bench: 43.3%

Instruction & Long Context

IFBench: 55.1%
Long-Context Reasoning: 48.0%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Google
Family: Gemini
Tier: Lightweight
Context Window: 1.0M
Knowledge Cutoff: Jun 2025
Modalities: Text, Image, Video, Audio

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion

Strengths & Limitations

Strengths

1 million token context window enables processing of very long documents and conversations
Multimodal support for text, image, video, and audio inputs in a single model
High output speed of 205 tokens per second (per the benchmarks above) for rapid-response applications
Tool calling functionality with structured outputs
Recent knowledge cutoff of June 2025 for current information
Lightweight architecture optimized for cost-effectiveness
Chat completion interface for conversational applications

Limitations

Time to first token of 733ms (per the benchmarks above) is slower than some competitors
Proprietary model with no open source weights available
Lightweight tier means reduced reasoning capability compared to flagship models
Limited to Google's ecosystem and approved API providers

Key Features

• 1 million token context window
• Multimodal inputs (text, image, video, audio)
• Tool calling with structured output
• Chat completion interface
• Streaming responses
• High-speed text generation (180+ tokens/second)
• June 2025 knowledge cutoff
• Batch processing support

About Gemini 3 Flash

Gemini 3 Flash is Google's lightweight model in the Gemini family, positioned as a fast, cost-effective option for applications that need multimodal capabilities without the computational overhead of flagship models. As part of Google's third-generation Gemini lineup, it sits below the more capable Gemini 3.1 Pro in reasoning power but offers practical advantages for high-throughput scenarios.

The model has a 1 million token context window and accepts text, image, video, and audio inputs. With a measured output speed of 205 tokens per second, it prioritizes speed and efficiency. It supports tool calling and has a knowledge cutoff of June 2025.

Gemini 3 Flash serves applications where rapid response times and multimodal processing matter more than maximum reasoning capability. Its combination of speed, large context window, and broad modality support makes it suitable for content analysis, customer service applications, and real-time multimodal tasks where the full power of flagship models isn't required.

Common Use Cases

Gemini 3 Flash is designed for applications requiring fast multimodal processing without the cost overhead of flagship models. Its large context window and multimodal capabilities make it suitable for content analysis workflows, document processing with mixed media, customer support chatbots that handle images and documents, and real-time applications where response speed is critical. The model works well for high-volume use cases like content moderation, automated social media responses, and educational applications that need to process various media types quickly. Its lightweight nature makes it cost-effective for startups and businesses that need capable multimodal AI without premium pricing.

Frequently Asked Questions

How much does Gemini 3 Flash cost per million tokens?

Gemini 3 Flash pricing varies by provider and usage type (standard vs. batch processing), and input and output tokens are priced separately. In the pricing table above, input ranges from $0.250 per 1M tokens (Google Cloud batch) to $1.50 (Deep Infra), and output ranges from $1.50 to $9.00 per 1M tokens.

What is Gemini 3 Flash best used for?

Gemini 3 Flash excels at high-speed multimodal applications where cost efficiency matters. Its 1M token context window and support for text, image, video, and audio make it ideal for content analysis, document processing, customer support with media attachments, and real-time applications requiring fast responses across multiple content types.

How does Gemini 3 Flash compare to other lightweight models for multimodal tasks?

Gemini 3 Flash stands out with its 1 million token context window, which is larger than most lightweight competitors, and native support for four modalities including video and audio. Its output speed of 205 tokens per second is competitive, though its 733ms time to first token is slower than some alternatives optimized purely for text generation.