Gemini 3 Flash
Gemini 3 Flash is Google's lightweight multimodal model with a 1M-token context window. It accepts text, image, video, and audio inputs and targets high-speed applications.
API Pricing
Cheapest on Google Cloud: 40% below average

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| Google Cloud | $0.250 | $1.50 | 188 t/s | 2.0s | 4/13/2026 |
| (not specified) | $0.500 | $3.00 | 188 t/s | 2.0s | 4/14/2026 |
| (not specified) | $0.500 | $3.00 | 188 t/s | 2.0s | 4/13/2026 |
Prices updated daily. Last check: 4/14/2026
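The "40% below average" figure can be checked against the table with simple arithmetic. The sketch below assumes the banner compares the lowest listed price with the plain mean of all listed prices; the page does not state how it is actually computed, and the function name `discount_vs_average` is illustrative.

```python
def discount_vs_average(prices: list[float]) -> float:
    """Return how far the lowest price sits below the mean, as a fraction."""
    average = sum(prices) / len(prices)
    return (average - min(prices)) / average


# Prices per 1M tokens, copied from the three rows of the table above.
input_prices = [0.250, 0.500, 0.500]
output_prices = [1.50, 3.00, 3.00]

print(f"input:  {discount_vs_average(input_prices):.0%} below average")
print(f"output: {discount_vs_average(output_prices):.0%} below average")
```

Under that assumption, both the input and the output price of the cheapest listing come out 40% below the mean, which matches the banner.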
Model Details
General
- Creator: Google
- Family: Gemini
- Tier: Lightweight
- Context Window: 1.0M tokens
- Knowledge Cutoff: Jun 2025
- Modalities: Text, Image, Video, Audio
Capabilities
- Tool Calling: Yes
- Open Source: No
- Subtypes: Chat Completion
Strengths & Limitations
Strengths
- 1 million token context window for processing very long documents and conversations
- Multimodal input: text, image, video, and audio in a single model
- High output speed, measured at 180.04 tokens per second
- Tool calling with structured outputs
- Knowledge cutoff of June 2025
- Lightweight architecture optimized for cost-effectiveness
- Chat completion interface for conversational applications
Limitations
- Slow time to first token, measured at 5,127 ms, compared to some competitors
- Proprietary model with no open-source weights available
- Lightweight tier means reduced reasoning capability compared to flagship models
- Available only through Google's ecosystem and approved API providers
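The output speed and time-to-first-token figures in the list above pull in opposite directions, so it helps to combine them into one end-to-end estimate. The sketch below is a simplified model and not a benchmark: it assumes generation proceeds at a constant rate once the first token arrives, and it ignores network and queueing variance. The function name `estimated_response_seconds` is illustrative.

```python
def estimated_response_seconds(
    output_tokens: int,
    ttft_ms: float = 5127.0,           # time to first token quoted above
    tokens_per_second: float = 180.04, # output speed quoted above
) -> float:
    """Estimate total response time as TTFT plus steady-state generation time."""
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


for tokens in (50, 500, 2000):
    total = estimated_response_seconds(tokens)
    ttft_share = 5.127 / total
    print(f"{tokens:>5} output tokens: ~{total:.1f}s total, {ttft_share:.0%} spent waiting for the first token")
```

For short replies the wait for the first token dominates the total, while for long generations the high output speed matters more. This is why the same model can feel slow in a chat widget and fast in a batch summarization job.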
About Gemini 3 Flash
Common Use Cases
Gemini 3 Flash is designed for applications requiring fast multimodal processing without the cost overhead of flagship models. Its large context window and multimodal capabilities make it suitable for content analysis workflows, document processing with mixed media, customer support chatbots that handle images and documents, and real-time applications where response speed is critical. The model works well for high-volume use cases like content moderation, automated social media responses, and educational applications that need to process various media types quickly. Its lightweight nature makes it cost-effective for startups and businesses that need capable multimodal AI without premium pricing.
Frequently Asked Questions
How much does Gemini 3 Flash cost per million tokens?
Gemini 3 Flash pricing varies by provider and usage type (standard vs. batch processing), and input and output tokens are priced separately. As of the 4/14/2026 check, the providers in the pricing table above list $0.25 to $0.50 per 1M input tokens and $1.50 to $3.00 per 1M output tokens.
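Because input and output tokens are billed at different rates, the cost of a request depends on its mix of the two. The following is a minimal sketch that uses the lowest and highest rates from the pricing table above. The `Rate` class, the `request_cost` function, and the example token counts are illustrative, and batch discounts or other provider-specific billing rules are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD prices per 1M tokens for one provider listing."""
    input_per_million: float
    output_per_million: float


def request_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the given per-million rates."""
    return (
        input_tokens * rate.input_per_million
        + output_tokens * rate.output_per_million
    ) / 1_000_000


lowest = Rate(input_per_million=0.250, output_per_million=1.50)
highest = Rate(input_per_million=0.500, output_per_million=3.00)

# Hypothetical workload: a long document in, a short summary out.
input_tokens, output_tokens = 200_000, 1_000
print(f"lowest listing:  ${request_cost(lowest, input_tokens, output_tokens):.4f}")
print(f"highest listing: ${request_cost(highest, input_tokens, output_tokens):.4f}")
```

For input-heavy workloads such as long-document analysis, the input rate drives almost all of the cost, so the per-output-token price matters less than it might first appear.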
What is Gemini 3 Flash best used for?
Gemini 3 Flash excels at high-speed multimodal applications where cost efficiency matters. Its 1M token context window and support for text, image, video, and audio make it ideal for content analysis, document processing, customer support with media attachments, and real-time applications requiring fast responses across multiple content types.
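Before sending a large batch of documents, it can be useful to check roughly whether they fit in the 1M token window. The sketch below is an approximation only: the four-characters-per-token ratio is a common rule of thumb for English text and not the model's real tokenizer, the reserved output headroom is a conservative assumption, and the helper names are illustrative. An exact count would require the provider's own token-counting facility.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window stated on this page
CHARS_PER_TOKEN = 4                # assumption: rough heuristic, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserved_output_tokens: int = 8_000) -> bool:
    """Check whether the combined texts likely fit, leaving headroom for the reply."""
    budget = CONTEXT_WINDOW_TOKENS - reserved_output_tokens
    return sum(estimate_tokens(t) for t in texts) <= budget


documents = ["first report " * 1_000, "second report " * 2_000]
print(fits_in_context(documents))
```

Image, video, and audio inputs are tokenized differently from text, so this estimate applies to the text portion of a prompt only.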
How does Gemini 3 Flash compare to other lightweight models for multimodal tasks?
Gemini 3 Flash stands out with its 1 million token context window, which is larger than most lightweight competitors, and native support for four modalities including video and audio. Its 180+ tokens per second output speed is competitive, though the 5+ second time to first token is slower than some alternatives optimized purely for text generation.