LightweightGoogle

Gemini 3.1 Flash Lite

Name: Gemini 3.1 Flash Lite
Availability: InStock
Author: Google

Gemini 3.1 Flash Lite is Google's lightweight multimodal model offering fast inference across text, image, audio, and video with a 1M token context window.

Context 1.0M

Tier Lightweight

Modalities text, image, audio, video

Input from

$0.125 / 1M tokens

across 4 providers

Compare Prices

API Pricing

Cheapest on Google Cloud — 44% below avg

Provider	Input / 1M	Output / 1M	Cached / 1M	Speed	TTFT	Updated
Google CloudBatch	$0.125	$0.750	-	310 t/s	5.0s	7/13/2026
OpenRouter	$0.250	$1.50	$0.025	310 t/s	5.0s	7/13/2026
Google Cloud	$0.250	$1.50	-	310 t/s	5.0s	7/13/2026
Perplexity	$0.250	$1.50	-	310 t/s	5.0s	7/6/2026
Deep Infra	$0.250	$1.50	-	310 t/s	5.0s	7/13/2026

Prices updated daily. Last check: Jul 13, 2026

Performance & Benchmarks

Source: Artificial Analysis →

Intelligence

25.0 / 100

Coding

34.7 / 100

Output Speed

310 t/s

Latency (TTFT)

5.0s

Reasoning & Knowledge

GPQA Diamond82.2%
Humanity's Last Exam16.2%

Coding

SciCode41.9%

Agentic & Tool Use

Terminal-Bench Hard24.2%
Terminal-Bench v2.131.1%
τ²-bench31.3%
τ-bench Banking8.7%

Instruction & Long Context

IFBench77.2%
Long-Context Reasoning65.3%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Google
Family: Gemini
Tier: Lightweight
Context Window: 1.0M
Modalities: Text, Image, Audio, Video

Capabilities

Tool Calling: No
Open Source: No
Aliases: gemini-3-1-flash-lite-preview

Strengths & Limitations

Strengths

Supports four modalities: text, image, audio, and video input
Large 1 million token context window for processing lengthy documents
Fast output generation at 199 tokens per second
Lightweight design optimized for speed and efficiency
Part of Google's current-generation Gemini 3.1 model family
Multimodal capabilities in a cost-optimized package

Limitations

No tool calling or function execution capabilities
Positioned as lightweight tier with reduced reasoning compared to Gemini 3.1 Pro
Proprietary model with no open-source availability
Time to first token of 7.8 seconds is slower than some competitors

Key Features

•1 million token context window

•Text input and generation

•Image input processing

•Audio input processing

•Video input processing

•Streaming response support

•Batch processing capabilities

•REST API access

About Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google's lightweight tier model within the Gemini family, positioned as a fast and efficient option for multimodal applications. As part of the Gemini 3.1 generation, it sits below the flagship Gemini 3.1 Pro in Google's model hierarchy, optimized for speed and cost-effectiveness rather than maximum capability. The model supports multimodal input across text, image, audio, and video modalities with a substantial 1 million token context window. Performance benchmarks show output speeds of approximately 199 tokens per second with a time to first token of around 7.8 seconds. However, the model does not include tool calling capabilities, distinguishing it from more feature-complete models in Google's lineup. Gemini 3.1 Flash Lite targets use cases where multimodal processing speed and context length matter more than maximum reasoning capability. Its combination of broad modality support and fast inference makes it suitable for applications requiring quick processing of mixed media content, though users needing advanced reasoning or tool integration would typically choose higher-tier models.

Common Use Cases

Gemini 3.1 Flash Lite is designed for applications requiring fast multimodal processing without the complexity of tool calling or maximum reasoning capability. Its combination of speed, large context window, and broad modality support makes it suitable for content analysis workflows, document processing with mixed media, rapid prototyping of multimodal applications, and high-throughput scenarios where cost efficiency matters. The model works well for summarizing long documents with embedded images, processing video content for basic analysis, and applications needing quick responses across multiple input types. Organizations looking for multimodal capabilities at scale, rather than complex reasoning or agentic workflows, will find this model appropriate for their needs.

Frequently Asked Questions

How much does Gemini 3.1 Flash Lite cost per million tokens?

Gemini 3.1 Flash Lite pricing varies by provider and usage type. Check the pricing table above for current rates across all available providers and compare input vs output token costs.

What is Gemini 3.1 Flash Lite best used for?

Gemini 3.1 Flash Lite excels at fast multimodal processing tasks including document analysis with images, basic video content processing, and high-volume applications where speed and cost efficiency are priorities over maximum reasoning capability.

Does Gemini 3.1 Flash Lite support tool calling and function execution?

No, Gemini 3.1 Flash Lite does not include tool calling capabilities. For applications requiring function execution or API integrations, consider Gemini 3.1 Pro or other models that specifically support structured tool calling.