
Gemma 4 26B

Author: Google

Gemma 4 26B is Google's lightweight multimodal model supporting text, image, and video inputs with a 262K token context window.

Context: 262K

Tier: Lightweight

Modalities: text, image, video

Input from $0.060 / 1M tokens across 3 providers


API Pricing

Cheapest on OpenRouter: input price about 30% below the average across providers

Provider	Input / 1M	Output / 1M	Cached / 1M	Speed	TTFT	Updated
OpenRouter	$0.060	$0.330	-	117 t/s	639ms	7/13/2026
Deep Infra	$0.070	$0.340	-	117 t/s	639ms	7/13/2026
IO.NET	$0.126	$0.420	$0.063	117 t/s	639ms	7/13/2026

Prices updated daily. Last check: Jul 13, 2026
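As a rough illustration of how the table above translates into per-request cost, the following sketch copies the listed input and output prices and computes the cost of a request, plus how far a provider sits below the average. The workload size (100,000 input tokens, 2,000 output tokens) and the function names are hypothetical, and cached-token pricing is ignored.

```python
# Per-1M-token prices copied from the pricing table (USD, as of Jul 13, 2026).
PRICES = {
    "OpenRouter": {"input": 0.060, "output": 0.330},
    "Deep Infra": {"input": 0.070, "output": 0.340},
    "IO.NET": {"input": 0.126, "output": 0.420},
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, ignoring cached-token discounts."""
    p = PRICES[provider]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def percent_below_average(provider: str, kind: str = "input") -> float:
    """Return how far a provider's price sits below the mean across providers."""
    avg = sum(p[kind] for p in PRICES.values()) / len(PRICES)
    return (avg - PRICES[provider][kind]) / avg * 100


if __name__ == "__main__":
    # Hypothetical workload: 100,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 100_000, 2_000):.5f}")
    print(f"OpenRouter input: {percent_below_average('OpenRouter'):.1f}% below average")
```

With these figures the OpenRouter input price works out to roughly 29.7% below the three-provider mean, which matches the "30% below avg" label when that label is read as referring to input pricing.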

Performance & Benchmarks

Source: Artificial Analysis

Intelligence: 20.1 / 100

Output Speed: 117 t/s

Latency (TTFT): 639 ms
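The output speed and time-to-first-token figures above can be combined into a back-of-the-envelope response time. The sketch below assumes a simple model (TTFT plus output tokens divided by a constant generation speed); real latency varies with provider, load, and prompt size, and the function name is hypothetical.

```python
OUTPUT_SPEED_TPS = 117.0  # tokens per second, from the figures above
TTFT_SECONDS = 0.639      # time to first token, from the figures above


def estimated_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time as TTFT plus steady-state generation time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / OUTPUT_SPEED_TPS


if __name__ == "__main__":
    # A 500-token reply comes out at roughly 4.9 seconds under this model.
    print(f"{estimated_response_seconds(500):.2f} s")
```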

Reasoning & Knowledge

GPQA Diamond: 71.4%
Humanity's Last Exam: 10.7%

Coding

SciCode: 37.3%

Agentic & Tool Use

Terminal-Bench Hard: 25.0%
τ²-bench: 40.4%

Instruction & Long Context

IFBench: 45.4%
Long-Context Reasoning: 39.7%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Google
Family: Gemma
Tier: Lightweight
Context Window: 262K
Modalities: Text, Image, Video

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

Supports multimodal inputs including text, images, and video
Large 262,144 token context window for processing lengthy content
Lightweight 26B parameter design for efficient inference
Part of Google's established Gemma model family
Suitable for high-throughput multimodal applications
Lower computational requirements compared to frontier models

Limitations

No tool calling or function calling support
Proprietary model with no open source weights available
Reduced reasoning capability versus larger models (20.1 / 100 on the Intelligence index, 10.7% on Humanity's Last Exam)
Limited structured output capabilities without tool calling
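Where structured output is needed despite the last limitation above, one common workaround (not specific to this model) is to request JSON in the prompt and validate the reply in application code. The following is a minimal sketch under that assumption; the function name, the example reply, and the required keys are all hypothetical.

```python
import json


def parse_json_reply(reply: str, required_keys: set[str]) -> dict:
    """Extract the outermost {...} span from a model reply and validate it."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad JSON.
    data = json.loads(reply[start:end + 1])
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    # Hypothetical model reply with chatter around the JSON payload.
    reply = 'Sure! {"label": "invoice", "confidence": 0.92} Hope that helps.'
    print(parse_json_reply(reply, {"label", "confidence"}))
```

Callers would typically retry the request when a `ValueError` is raised, since nothing forces the model to emit valid JSON.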

Key Features

• 262,144-token context window
• Text input and generation
• Image input processing
• Video input analysis
• Multimodal content understanding
• Streaming response support
• Batch processing capabilities

About Gemma 4 26B

Gemma 4 26B is the lightweight multimodal model in Google's Gemma family, aimed at applications that need text, image, and video processing without the compute cost of a frontier model. At 26 billion parameters, it balances capability against computational efficiency. Its 262,144-token context window accommodates long documents, extended conversations, and multimodal content sequences, and it handles text generation, image understanding, and video analysis. It does not support tool calling, which limits its use in agentic applications that depend on structured API interactions. The model therefore fits organizations that want multimodal understanding while keeping inference cost and latency under control.
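To make the context window figure concrete, the sketch below checks whether a request fits. It assumes that prompt tokens and generated tokens share the same 262,144-token window, which is common but not confirmed here for this model; the function names are hypothetical, and token counts must come from whatever tokenizer the provider uses.

```python
CONTEXT_WINDOW_TOKENS = 262_144  # context window stated for Gemma 4 26B


def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Assumption: prompt and generated tokens share one context window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW_TOKENS


def max_prompt_tokens(max_output_tokens: int) -> int:
    """Return the largest prompt that still leaves room for the requested output."""
    return max(CONTEXT_WINDOW_TOKENS - max_output_tokens, 0)


if __name__ == "__main__":
    print(fits_in_context(260_000, 2_000))   # True: 262,000 tokens in total
    print(fits_in_context(260_000, 4_096))   # False: 264,096 tokens in total
    print(max_prompt_tokens(4_096))          # 258048
```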

Common Use Cases

Gemma 4 26B is well-suited for multimodal content analysis, document processing with embedded images, video content summarization, and educational applications requiring visual understanding. Its lightweight design makes it practical for customer service chatbots that need to process images or videos, content moderation across multiple media types, and automated media cataloging. The large context window enables processing of lengthy multimodal documents or extended video content, while the efficient parameter count keeps inference costs manageable for high-volume applications.

Frequently Asked Questions

How much does Gemma 4 26B cost per million tokens?

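Across the three providers in the pricing table above, input prices range from $0.060 to $0.126 per 1M tokens and output prices from $0.330 to $0.420 per 1M tokens (as of Jul 13, 2026). OpenRouter is the cheapest on both. Rates vary by provider and may differ for text versus multimodal inputs.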

What is Gemma 4 26B best used for?

Gemma 4 26B excels at multimodal content processing including image analysis, video understanding, and document processing with visual elements. Its lightweight design makes it ideal for high-volume applications requiring multimodal capabilities without the computational overhead of larger frontier models.

Does Gemma 4 26B support tool calling and function calling?

No, Gemma 4 26B does not support tool calling or function calling. For applications that require structured API interactions or agent-like behavior, consider a model with built-in tool calling support.