LightweightGoogle

Gemini 3 Flash

Gemini 3 Flash is Google's lightweight multimodal model with 1M token context window, supporting text, image, video, and audio inputs for high-speed applications.

Context 1.0M
Tier Lightweight
Knowledge Jun 2025
Tools Supported
Modalities text, image, video, audio
Input from
$0.250 / 1M tokens
across 2 providers

API Pricing

Cheapest on Google Cloud 40% below avg
ProviderInput / 1MOutput / 1MSpeedTTFTUpdated
$0.250$1.50188 t/s2.0s4/13/2026
$0.500$3.00188 t/s2.0s4/14/2026
$0.500$3.00188 t/s2.0s4/13/2026

Prices updated daily. Last check: 4/14/2026

Model Details

General

Creator
Google
Family
Gemini
Tier
Lightweight
Context Window
1.0M
Knowledge Cutoff
Jun 2025
Modalities
Text, Image, Video, Audio

Capabilities

Tool Calling
Yes
Open Source
No
Subtypes
Chat Completion

Strengths & Limitations

  • 1 million token context window enables processing of very long documents and conversations
  • Multimodal support for text, image, video, and audio inputs in a single model
  • High output speed at 180.04 tokens per second for rapid response applications
  • Tool calling functionality with structured outputs
  • Recent knowledge cutoff of June 2025 for current information
  • Lightweight architecture optimized for cost-effectiveness
  • Chat completion interface for conversational applications
  • Slower time to first token at 5127ms compared to some competitors
  • Proprietary model with no open source weights available
  • Lightweight tier means reduced reasoning capability compared to flagship models
  • Limited to Google's ecosystem and approved API providers

Key Features

1 million token context window
Multimodal inputs (text, image, video, audio)
Tool calling with structured output
Chat completion interface
Streaming responses
High-speed text generation (180+ tokens/second)
June 2025 knowledge cutoff
Batch processing support

About Gemini 3 Flash

Gemini 3 Flash is Google's lightweight model in the Gemini family, positioned as a fast, cost-effective option for applications requiring multimodal capabilities without the computational overhead of flagship models. As part of Google's third-generation Gemini lineup, it sits below the more capable Gemini 3.1 Pro in terms of reasoning power but offers practical advantages for high-throughput scenarios. The model features a substantial 1 million token context window and supports multimodal inputs including text, images, video, and audio content. With benchmark performance showing 180.04 output tokens per second, Gemini 3 Flash prioritizes speed and efficiency. It includes tool calling capabilities and maintains a knowledge cutoff of June 2025, making it current for recent information retrieval tasks. Gemini 3 Flash serves applications where rapid response times and multimodal processing matter more than maximum reasoning capability. Its combination of speed, large context window, and broad modality support makes it suitable for content analysis, customer service applications, and real-time multimodal tasks where the full power of flagship models isn't required.

Common Use Cases

Gemini 3 Flash is designed for applications requiring fast multimodal processing without the cost overhead of flagship models. Its large context window and multimodal capabilities make it suitable for content analysis workflows, document processing with mixed media, customer support chatbots that handle images and documents, and real-time applications where response speed is critical. The model works well for high-volume use cases like content moderation, automated social media responses, and educational applications that need to process various media types quickly. Its lightweight nature makes it cost-effective for startups and businesses that need capable multimodal AI without premium pricing.

Frequently Asked Questions

How much does Gemini 3 Flash cost per million tokens?

Gemini 3 Flash pricing varies by provider and usage type (standard vs batch processing). Input and output tokens are typically priced differently for multimodal models. Check the pricing table above for current rates across all providers offering Gemini 3 Flash access.

What is Gemini 3 Flash best used for?

Gemini 3 Flash excels at high-speed multimodal applications where cost efficiency matters. Its 1M token context window and support for text, image, video, and audio make it ideal for content analysis, document processing, customer support with media attachments, and real-time applications requiring fast responses across multiple content types.

How does Gemini 3 Flash compare to other lightweight models for multimodal tasks?

Gemini 3 Flash stands out with its 1 million token context window, which is larger than most lightweight competitors, and native support for four modalities including video and audio. Its 180+ tokens per second output speed is competitive, though the 5+ second time to first token is slower than some alternatives optimized purely for text generation.