
Gemma 4 31B

Gemma 4 31B is Google's lightweight multimodal model supporting text, image, and video inputs with a 262K token context window.

Context 262K
Tier Lightweight
Modalities text, image, video
Input from $0.130 / 1M tokens across 2 providers

API Pricing

Cheapest on OpenRouter 21% below avg
Provider      Input / 1M   Output / 1M   Updated
OpenRouter    $0.130       $0.380        4/14/2026
(not listed)  $0.200       $0.500        4/14/2026

Prices updated daily. Last check: 4/14/2026
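Using the cheapest listed rates above ($0.130 input / $0.380 output per 1M tokens), a minimal sketch of estimating the cost of a single request:

```python
# Estimate per-request cost from per-million-token rates.
# Rates taken from the pricing table above (cheapest provider, as of 4/14/2026);
# actual billing may differ for multimodal inputs.
INPUT_PER_1M = 0.130   # USD per 1M input tokens
OUTPUT_PER_1M = 0.380  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_PER_1M + output_tokens * OUTPUT_PER_1M) / 1_000_000

# e.g. a 10K-token prompt with a 1K-token response
print(f"${estimate_cost(10_000, 1_000):.6f}")  # $0.001680
```

At these rates, even a full 262K-token prompt costs only a few cents of input, which is the point of the lightweight tier.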

Model Details

General

Creator
Google
Family
Gemma
Tier
Lightweight
Context Window
262K
Modalities
Text, Image, Video

Capabilities

Tool Calling
No
Open Source
No

Strengths & Limitations

Strengths

  • Supports multimodal inputs including text, images, and video
  • 262K token context window for processing lengthy documents
  • 31B parameter size offers efficiency while maintaining capability
  • Part of Google's established Gemma model family
  • Lightweight tier positioning reduces computational requirements
  • Video input support enables temporal visual understanding

Limitations

  • No tool calling or function calling support
  • Proprietary model with no open-source weights available
  • Less capable than flagship models for complex reasoning
  • Smaller parameter count than premium models in Google's lineup

Key Features

262K token context window
Text input and generation
Image input processing
Video input analysis
Cross-modal understanding of combined text and visual inputs
31 billion parameter architecture
Streaming response support
Batch processing compatibility

About Gemma 4 31B

Gemma 4 31B is Google's lightweight model in the Gemma family, designed for efficient multimodal processing across text, image, and video inputs. As a 31 billion parameter model, it sits in the lightweight tier, offering a balance between capability and computational efficiency compared to larger flagship models in Google's lineup. The model features a 262K token context window, enabling processing of substantial documents and extended conversations. Its multimodal capabilities allow it to analyze and understand text alongside visual content including both static images and video sequences, making it suitable for applications requiring cross-modal understanding without the computational overhead of larger models.

Gemma 4 31B serves applications where multimodal processing is needed but computational resources or latency requirements favor a more efficient model over maximum capability. It provides an alternative to flagship models when the use case doesn't require the most advanced reasoning capabilities but still benefits from visual understanding and substantial context length.
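Before sending a long document, it can be useful to sanity-check that it fits within the 262K-token window. The sketch below uses a crude ~4 characters/token heuristic for English text; this ratio is an assumption, so use the provider's tokenizer for exact counts.

```python
# Rough check of whether a document fits Gemma 4 31B's 262K-token window.
CONTEXT_WINDOW = 262_000
CHARS_PER_TOKEN = 4  # rough English-text average (an assumption, not exact)

def fits_in_context(text: str, reserved_for_output: int = 4_000) -> bool:
    """Estimate token count from character length and leave room for output."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

# ~500K characters ≈ 125K tokens: comfortably within the window
print(fits_in_context("word " * 100_000))  # True
```

Reserving some budget for the model's output matters because input and output tokens share the same context window.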

Common Use Cases

Gemma 4 31B is well-suited for applications requiring multimodal analysis where efficiency is important, such as content moderation across text and visual media, educational tools that process mixed-media content, and customer support systems handling images or videos alongside text queries. The model's lightweight nature makes it appropriate for high-volume scenarios like document analysis with embedded images, social media content processing, or applications where consistent low latency is preferred over maximum reasoning capability. Its video input support enables use cases like surveillance analysis, educational video summarization, and media content categorization where real-time or high-throughput processing is valued.
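For the moderation-style use cases above, a request would mix text and image content parts in the OpenAI-compatible message format that OpenRouter accepts. This is a sketch only: the model slug "google/gemma-4-31b" and the image URL are assumptions, so check the provider's model list for the exact identifier.

```python
import json

# OpenAI-compatible chat payload combining a text instruction with an image.
# Model slug and URL are hypothetical placeholders.
payload = {
    "model": "google/gemma-4-31b",  # assumed slug; verify with your provider
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe any policy violations in this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/post.jpg"}},
            ],
        }
    ],
}
print(json.dumps(payload, indent=2))
```

The same structure extends to multiple images per message; since the model has no tool calling, all instructions must be carried in the message content itself.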

Frequently Asked Questions

How much does Gemma 4 31B cost per million tokens?

Gemma 4 31B pricing varies by provider and may differ for text versus multimodal inputs. Check the pricing table above for current rates across all providers offering this model.

What is Gemma 4 31B best used for?

Gemma 4 31B excels at multimodal tasks requiring text, image, and video understanding where efficiency is important. It's ideal for content analysis, document processing with visuals, media moderation, and applications needing substantial context length without the computational cost of flagship models.

Does Gemma 4 31B support tool calling or function calling?

No, Gemma 4 31B does not support tool calling or function calling capabilities. For applications requiring API integrations or external tool use, you would need to consider other models that include these features.