Google · Lightweight

Gemma 3 4B

Gemma 3 4B is Google's lightweight multimodal model supporting text and image inputs with a 131K token context window.

Context 131K
Tier Lightweight
Modalities text, image
Input from $0.020 / 1M tokens (across 3 providers)

API Pricing

Cheapest on Amazon AWS (43% below average)

Provider      Input / 1M   Output / 1M   Speed      TTFT   Updated
Amazon AWS    $0.020       $0.040       31.1 t/s   1.2s   4/14/2026
—             $0.040       $0.080       31.1 t/s   1.2s   4/4/2026
—             $0.040       $0.080       31.1 t/s   1.2s   4/14/2026
—             $0.040       $0.080       31.1 t/s   1.2s   4/14/2026

Prices updated daily. Last check: 4/14/2026
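To make the per-million-token rates concrete, here is a minimal cost estimate using the cheapest listed tier ($0.020 input / $0.040 output per 1M tokens). The token counts are illustrative, not from the source.

```python
# Estimate request cost from per-token rates in the table above.
# Rates are USD per 1M tokens; the cheapest listed tier is used here.
INPUT_RATE = 0.020   # $ / 1M input tokens (cheapest provider)
OUTPUT_RATE = 0.040  # $ / 1M output tokens (cheapest provider)

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = INPUT_RATE,
                 output_rate: float = OUTPUT_RATE) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A request with a 10K-token prompt and a 2K-token response:
print(f"${request_cost(10_000, 2_000):.5f}")  # $0.00028
```

At these rates, even a million such requests would cost under $300, which is what makes the lightweight tier attractive for high-volume workloads.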

Model Details

General

Creator
Google
Family
Gemma
Tier
Lightweight
Context Window
131K
Modalities
Text, Image

Capabilities

Tool Calling
No
Open Source
No

Strengths & Limitations

Strengths

  • Supports multimodal inputs including text and images
  • Large 131K token context window for a lightweight model
  • Fast inference at 30.32 output tokens per second
  • Efficient 4B parameter size reduces computational requirements
  • Part of Google's established Gemma model family
  • Reasonable time to first token at 1095ms
  • Suitable for resource-constrained multimodal applications

Limitations

  • No tool calling or function execution capabilities
  • Not open source by OSI standards: weights are released under Google's restrictive Gemma license
  • Smaller parameter count limits complex reasoning compared to larger models
  • No audio or video input, unlike some competing multimodal models

Key Features

131K token context window
Text input and generation
Image input processing
Multimodal understanding
Streaming response support
4 billion parameter architecture
Lightweight inference profile
Google Gemma family integration
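The streaming support listed above is typically exposed as server-sent events. The sketch below assembles streamed text from SSE lines in the widely used OpenAI-compatible chat-completions chunk format; this format is an assumption, and actual provider payloads may differ.

```python
import json

def collect_stream(sse_lines):
    """Concatenate content deltas from 'data: {...}' SSE lines.

    Assumes OpenAI-compatible streaming chunks; provider formats vary.
    """
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Simulated stream as a provider might send it:
events = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"A cat"}}]}',
    'data: {"choices":[{"delta":{"content":" on a mat."}}]}',
    'data: [DONE]',
]
print(collect_stream(events))  # A cat on a mat.
```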

About Gemma 3 4B

Gemma 3 4B is Google's lightweight model in the Gemma family, designed for efficient multimodal processing at reduced computational cost. As a 4 billion parameter model, it sits in the lightweight tier, offering a balance between capability and resource efficiency. The model supports both text and image inputs with a 131K token context window, enabling multimodal applications while maintaining fast inference speeds.

Benchmark data shows it achieves 30.32 output tokens per second with a time to first token of 1095ms. The model does not include tool calling capabilities, focusing instead on core text generation and image understanding tasks. Gemma 3 4B targets use cases where multimodal capability is needed but computational resources or latency requirements favor a smaller model over larger alternatives. It competes with other lightweight multimodal models in scenarios requiring efficient image and text processing.

Common Use Cases

Gemma 3 4B is well-suited for applications requiring multimodal processing with efficiency constraints, such as image captioning, visual question answering, document analysis with mixed text and images, and content moderation systems. Its lightweight architecture makes it appropriate for high-volume scenarios where fast inference is prioritized over maximum capability, including mobile applications, edge deployments, or services processing large numbers of image-text pairs. The 131K context window enables processing of longer documents with embedded images while maintaining reasonable computational costs.
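For the image-plus-text use cases above, a request typically pairs a prompt with a base64-encoded image. The sketch below builds such a request body in the common OpenAI-compatible chat format; the model id `gemma-3-4b-it` and the message shape are assumptions, so check your provider's documentation for exact names. No network call is made.

```python
import base64
import json

def build_caption_request(image_bytes: bytes, prompt: str,
                          model: str = "gemma-3-4b-it") -> str:
    """Return a JSON request body pairing a text prompt with an image.

    Uses the OpenAI-compatible multimodal message shape (an assumption);
    the model id is also a placeholder -- providers name it differently.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "max_tokens": 128,
    }
    return json.dumps(body)

# Example: build a captioning request for a (placeholder) image.
payload = build_caption_request(b"\x89PNG...", "Describe this image.")
print(json.loads(payload)["model"])  # gemma-3-4b-it
```

Embedding the image as a data URL keeps the request self-contained, which suits the high-volume, stateless batch scenarios described above.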

Frequently Asked Questions

How much does Gemma 3 4B cost per million tokens?

Gemma 3 4B pricing varies by provider and may differ for text versus image tokens. Check the pricing table above for current rates across all available providers.

What is Gemma 3 4B best used for?

Gemma 3 4B excels at multimodal tasks requiring both text and image processing where efficiency is important, such as image captioning, visual question answering, document analysis, and content classification. Its lightweight architecture makes it ideal for high-volume applications or resource-constrained environments.

Does Gemma 3 4B support function calling or tool use?

No, Gemma 3 4B does not support tool calling or function execution capabilities. It focuses on core text generation and image understanding tasks without external tool integration.