
Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google's lightweight multimodal model offering fast inference across text, image, audio, and video with a 1M token context window.

Context 1.0M
Tier Lightweight
Modalities text, image, audio, video
Input from $0.250 / 1M tokens across 2 providers

API Pricing

Provider       Input / 1M   Output / 1M   Speed     TTFT   Updated
(not listed)   $0.250       $1.50         199 t/s   7.8s   4/10/2026
(not listed)   $0.250       $1.50         199 t/s   7.8s   4/10/2026

Prices updated daily. Last check: 4/14/2026
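
To make the listed rates concrete, the following minimal sketch estimates the cost of a single request. The two rates are taken from the table above; the helper name `estimate_cost_usd` and the example token counts are illustrative assumptions, and actual billing may differ by provider or input type.

```python
from decimal import Decimal

# Rates taken from the pricing table above (USD per 1M tokens).
INPUT_USD_PER_MILLION = Decimal("0.250")
OUTPUT_USD_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_USD_PER_MILLION * input_tokens / TOKENS_PER_MILLION
    output_cost = OUTPUT_USD_PER_MILLION * output_tokens / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 200,000-token document summarized into 1,000 tokens.
    print(estimate_cost_usd(200_000, 1_000))
```

Under these rates the example request costs roughly five cents, almost all of it for input tokens.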

Model Details

General

Creator: Google
Family: Gemini
Tier: Lightweight
Context Window: 1.0M
Modalities: Text, Image, Audio, Video

Capabilities

Tool Calling: No
Open Source: No
Aliases: gemini-3-1-flash-lite-preview

Strengths & Limitations

Strengths:

  • Supports four input modalities: text, image, audio, and video
  • Large 1 million token context window for processing lengthy documents
  • Fast output generation at 199 tokens per second
  • Lightweight design optimized for speed and efficiency
  • Part of Google's current-generation Gemini 3.1 model family
  • Multimodal capabilities in a cost-optimized package

Limitations:

  • No tool calling or function execution capabilities
  • Positioned as lightweight tier with reduced reasoning compared to Gemini 3.1 Pro
  • Proprietary model with no open-source availability
  • Time to first token of 7.8 seconds is slower than some competitors
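
The speed and latency figures above interact: total response time is roughly the time to first token plus the generation time. The sketch below is a back-of-the-envelope model under that assumption, using the two figures quoted on this page. The helper name `estimate_response_seconds` is hypothetical, and real latency will vary with provider, load, and prompt size.

```python
# Figures quoted on this page; treat them as rough averages, not guarantees.
TTFT_SECONDS = 7.8
OUTPUT_TOKENS_PER_SECOND = 199.0


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time for a full response (hypothetical helper).

    Assumes a fixed wait before the first token, then a constant
    generation rate for every output token.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    for tokens in (199, 1_990):
        print(tokens, round(estimate_response_seconds(tokens), 1))
```

For short replies the fixed wait dominates: under this model a 199-token answer takes about 8.8 seconds, of which only one second is generation.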

Key Features

1 million token context window
Text input and generation
Image input processing
Audio input processing
Video input processing
Streaming response support
Batch processing capabilities
REST API access
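
As an illustration of the image input and REST features listed above, this sketch builds a JSON request body that pairs a text prompt with an inline image. It assumes the model is reached through the `generateContent` method of Google's Gemini API and that the alias listed on this page is accepted as the model ID. Neither is confirmed by this page, and other providers may expect a different request shape. The helper name `build_request` is hypothetical, and nothing is sent over the network.

```python
import base64
import json

# Assumption: Google's Gemini API base URL; other providers will differ.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
# Assumption: the alias from this page is valid as a model ID.
MODEL_ID = "gemini-3-1-flash-lite-preview"


def build_request(prompt: str, image_bytes: bytes, mime_type: str = "image/png"):
    """Return (url, json_body) for a text-plus-image request (hypothetical helper)."""
    url = f"{API_BASE}/models/{MODEL_ID}:generateContent"
    encoded_image = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    # Binary media is sent inline as base64 text.
                    {"inline_data": {"mime_type": mime_type, "data": encoded_image}},
                ]
            }
        ]
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_request("Describe this image.", b"placeholder-bytes")
    print(url)
    print(body)
```

With the Gemini API, a body like this would typically be sent as an HTTP POST with an API key in the `x-goog-api-key` header. Check the provider's own documentation before relying on any of these details.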

About Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is the lightweight tier of Google's Gemini 3.1 generation. It sits below the flagship Gemini 3.1 Pro in Google's model hierarchy and is optimized for speed and cost rather than maximum capability. The model accepts text, image, audio, and video input within a 1 million token context window. Listed output speed is approximately 199 tokens per second, with a time to first token of around 7.8 seconds, so generation is fast once it starts but the initial wait is noticeable.

The model does not support tool calling, which distinguishes it from more feature-complete models in Google's lineup. It targets use cases where multimodal throughput and context length matter more than reasoning depth, such as quick processing of mixed media content. Users who need advanced reasoning or tool integration would typically choose a higher-tier model.

Common Use Cases

Gemini 3.1 Flash Lite is designed for applications that need fast multimodal processing but not tool calling or maximum reasoning capability. Its speed, large context window, and broad modality support suit content analysis workflows, document processing with mixed media, rapid prototyping of multimodal applications, and high-throughput scenarios where cost efficiency matters. Typical tasks include summarizing long documents with embedded images, basic analysis of video content, and returning quick responses across multiple input types. Organizations that need multimodal capabilities at scale, rather than complex reasoning or agentic workflows, are the intended audience.
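
For the long-document workloads described above, it helps to check whether a set of documents fits in one request. The sketch below greedily packs documents into as few requests as possible. It assumes a limit of exactly 1,000,000 tokens (the page lists "1.0M", so the precise figure may differ), a rough heuristic of four characters per token, and a reserve of 8,000 tokens for the prompt and the response. All names are hypothetical, and a real tokenizer should be used where accuracy matters.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # "1.0M" as listed; exact limit may differ
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string by ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_into_requests(doc_token_counts, reserved_tokens=8_000):
    """Group document indices into requests that fit the context window.

    Documents are kept in order; a new request starts whenever the next
    document would exceed the remaining budget.
    """
    budget = CONTEXT_WINDOW_TOKENS - reserved_tokens
    batches, current, used = [], [], 0
    for index, count in enumerate(doc_token_counts):
        if count > budget:
            raise ValueError(f"document {index} alone exceeds the budget")
        if current and used + count > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += count
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(pack_into_requests([600_000, 300_000, 200_000]))
```

In the example, the first two documents share a request and the third needs its own, because adding it would push the total past the budget.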

Frequently Asked Questions

How much does Gemini 3.1 Flash Lite cost per million tokens?

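Both providers listed in the pricing table above charge $0.250 per 1M input tokens and $1.50 per 1M output tokens (as of 4/10/2026). Output tokens cost six times as much as input tokens, so the length of generated responses has a large effect on total cost. Check the table for current rates, since prices are refreshed daily.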

What is Gemini 3.1 Flash Lite best used for?

Gemini 3.1 Flash Lite excels at fast multimodal processing tasks including document analysis with images, basic video content processing, and high-volume applications where speed and cost efficiency are priorities over maximum reasoning capability.

Does Gemini 3.1 Flash Lite support tool calling and function execution?

No, Gemini 3.1 Flash Lite does not include tool calling capabilities. For applications requiring function execution or API integrations, consider Gemini 3.1 Pro or other models that specifically support structured tool calling.
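
One way to act on this in an application is to check a task's requirements against the model's listed capabilities before sending a request. The sketch below encodes only the facts stated on this page (no tool calling; text, image, audio, and video input). The `ModelProfile` type and `can_handle` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Capabilities of a model as listed on a catalog page (hypothetical type)."""
    name: str
    tool_calling: bool
    modalities: frozenset


# Values taken from this page's Model Details section.
FLASH_LITE = ModelProfile(
    name="gemini-3-1-flash-lite-preview",
    tool_calling=False,
    modalities=frozenset({"text", "image", "audio", "video"}),
)


def can_handle(profile: ModelProfile, needs_tools: bool, input_modalities) -> bool:
    """Return True if the model covers the task's requirements."""
    if needs_tools and not profile.tool_calling:
        return False
    return set(input_modalities) <= profile.modalities


if __name__ == "__main__":
    print(can_handle(FLASH_LITE, needs_tools=False, input_modalities={"text", "video"}))
    print(can_handle(FLASH_LITE, needs_tools=True, input_modalities={"text"}))
```

A task that fails this check would be routed to a model that supports tool calling instead.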