Gemma 4 26B

Gemma 4 26B is Google's lightweight multimodal model supporting text, image, and video inputs with a 262K token context window.

Context: 262K
Tier: Lightweight
Modalities: text, image, video
Input from $0.080 / 1M tokens across 1 provider

API Pricing

Provider    Input / 1M    Output / 1M    Updated
            $0.080        $0.350         4/14/2026

Prices updated daily. Last check: 4/14/2026
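
The listed rates can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, the rates are copied from the table above as of 4/14/2026, and it assumes image and video inputs are billed as ordinary input tokens, which this page does not confirm.

```python
# Rates copied from the pricing table above (USD per 1M tokens, checked 4/14/2026).
INPUT_PRICE_PER_M = 0.080
OUTPUT_PRICE_PER_M = 0.350


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates.

    Assumption: every input token, whatever its modality, is billed at the
    same input rate. Check the provider's billing rules before relying on this.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 200,000-token prompt with a 2,000-token reply.
cost = estimate_cost(200_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0167
```

At these rates output tokens cost roughly four times as much as input tokens, so long generations dominate the bill faster than long prompts do.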

Model Details

General

Creator: Google
Family: Gemma
Tier: Lightweight
Context Window: 262K
Modalities: Text, Image, Video

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

  • Supports multimodal inputs including text, images, and video
  • Large 262,144 token context window for processing lengthy content
  • Lightweight 26B parameter design for efficient inference
  • Part of Google's established Gemma model family
  • Suitable for high-throughput multimodal applications
  • Lower computational requirements compared to frontier models

Limitations

  • No tool calling or function calling support
  • Proprietary model with no open source weights available
  • Lightweight tier may have reduced reasoning capabilities versus larger models
  • Limited structured output capabilities without tool calling

Key Features

262,144 token context window
Text input and generation
Image input processing
Video input analysis
Multimodal content understanding
Streaming response support
Batch processing capabilities

About Gemma 4 26B

Gemma 4 26B is the lightweight multimodal model in Google's Gemma family, an efficient option for applications that process text, images, and video. At 26 billion parameters it sits in the lightweight tier, balancing capability against computational efficiency. Its 262,144 token context window can hold lengthy documents, long conversations, or extended multimodal sequences, and its capabilities span text generation, image understanding, and video analysis. The model does not support tool calling, which limits its use in agentic applications that depend on structured API interactions. It fits workloads that need multimodal understanding without the computational overhead of larger frontier models, which helps organizations manage inference cost and latency.
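
To make the 262,144 token context window concrete, here is a minimal sketch of a budget check and a greedy batching helper. The names `fits_in_context` and `split_into_batches` are hypothetical, token counts are assumed to come from whatever tokenizer the provider uses, and the sketch assumes the reply shares the window with the prompt, which this page does not state.

```python
CONTEXT_WINDOW = 262_144  # tokens, from the model details on this page


def fits_in_context(prompt_tokens: int, reserved_output_tokens: int = 4_096) -> bool:
    """True if the prompt plus a reserved reply budget fits in the window.

    Assumption: output tokens count against the same window as input tokens.
    """
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW


def split_into_batches(
    segment_tokens: list[int], reserved_output_tokens: int = 4_096
) -> list[list[int]]:
    """Greedily group consecutive segments so each batch fits the window."""
    budget = CONTEXT_WINDOW - reserved_output_tokens
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for n in segment_tokens:
        if n > budget:
            raise ValueError(f"segment of {n} tokens exceeds the budget of {budget}")
        if used + n > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        batches.append(current)
    return batches


# Example: three 100,000-token segments need two requests.
print(split_into_batches([100_000, 100_000, 100_000]))
# prints [[100000, 100000], [100000]]
```

Reserving part of the window for the reply is a design choice; the 4,096 default is arbitrary and should be tuned to the expected output length.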

Common Use Cases

Gemma 4 26B is well-suited for multimodal content analysis, document processing with embedded images, video content summarization, and educational applications requiring visual understanding. Its lightweight design makes it practical for customer service chatbots that need to process images or videos, content moderation across multiple media types, and automated media cataloging. The large context window enables processing of lengthy multimodal documents or extended video content, while the efficient parameter count keeps inference costs manageable for high-volume applications.

Frequently Asked Questions

How much does Gemma 4 26B cost per million tokens?

As of 4/14/2026, the one listed provider charges $0.080 per 1M input tokens and $0.350 per 1M output tokens. Pricing varies by provider and may differ for text versus multimodal inputs, so check the pricing table above for current rates.

What is Gemma 4 26B best used for?

Gemma 4 26B excels at multimodal content processing including image analysis, video understanding, and document processing with visual elements. Its lightweight design makes it ideal for high-volume applications requiring multimodal capabilities without the computational overhead of larger frontier models.

Does Gemma 4 26B support tool calling and function calling?

No, Gemma 4 26B does not support tool calling or function calling capabilities. For applications requiring structured API interactions or agent-like behavior, consider models with built-in tool calling support.
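
Where tool calling is unavailable, a common client-side workaround is to ask for JSON in the prompt and parse it out of the free-text reply. The sketch below shows only the parsing half, using the Python standard library. The function `extract_json` and the sample reply are hypothetical, and nothing on this page guarantees that the model will follow a requested format, so callers should handle the failure case.

```python
import json


def extract_json(reply: str) -> dict:
    """Return the first JSON object embedded in a free-text model reply.

    Tries each '{' in turn as a possible start of a JSON object and
    returns the first one that decodes cleanly.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(reply, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this position; try the next brace
        start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")


# Hypothetical reply text, for illustration only.
sample_reply = 'Here is the result:\n{"label": "invoice", "pages": 3}\nAnything else?'
print(extract_json(sample_reply))  # prints {'label': 'invoice', 'pages': 3}
```

This recovers structured data but not the schema validation or call routing that built-in tool calling provides, so validate the parsed fields before acting on them.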