Gemma 3 4B
Gemma 3 4B is Google's lightweight multimodal model supporting text and image inputs with a 131K token context window.
API Pricing
Cheapest on Amazon AWS — 43% below avg| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| $0.020 | $0.040 | 31.1 t/s | 1.2s | 4/14/2026 | |
| $0.040 | $0.080 | 31.1 t/s | 1.2s | 4/4/2026 | |
| $0.040 | $0.080 | 31.1 t/s | 1.2s | 4/14/2026 | |
| $0.040 | $0.080 | 31.1 t/s | 1.2s | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
Model Details
General
- Creator
- Family
- Gemma
- Tier
- Lightweight
- Context Window
- 131K
- Modalities
- Text, Image
Capabilities
- Tool Calling
- No
- Open Source
- No
Strengths & Limitations
- Supports multimodal inputs including text and images
- Large 131K token context window for a lightweight model
- Fast inference at 30.32 output tokens per second
- Efficient 4B parameter size reduces computational requirements
- Part of Google's established Gemma model family
- Reasonable time to first token at 1095ms
- Suitable for resource-constrained multimodal applications
- No tool calling or function execution capabilities
- Proprietary model - weights not publicly available
- Smaller parameter count limits complex reasoning compared to larger models
- Limited modality support compared to models with audio or video input
Key Features
About Gemma 3 4B
Common Use Cases
Gemma 3 4B is well-suited for applications requiring multimodal processing with efficiency constraints, such as image captioning, visual question answering, document analysis with mixed text and images, and content moderation systems. Its lightweight architecture makes it appropriate for high-volume scenarios where fast inference is prioritized over maximum capability, including mobile applications, edge deployments, or services processing large numbers of image-text pairs. The 131K context window enables processing of longer documents with embedded images while maintaining reasonable computational costs.
Frequently Asked Questions
How much does Gemma 3 4B cost per million tokens?
Gemma 3 4B pricing varies by provider and may differ for text versus image tokens. Check the pricing table above for current rates across all available providers.
What is Gemma 3 4B best used for?
Gemma 3 4B excels at multimodal tasks requiring both text and image processing where efficiency is important, such as image captioning, visual question answering, document analysis, and content classification. Its lightweight architecture makes it ideal for high-volume applications or resource-constrained environments.
Does Gemma 3 4B support function calling or tool use?
No, Gemma 3 4B does not support tool calling or function execution capabilities. It focuses on core text generation and image understanding tasks without external tool integration.