Llama 4 Maverick 17B
Llama 4 Maverick 17B is Meta's lightweight multimodal model supporting text and image inputs with a 128K token context window and tool calling capabilities.
API Pricing
Cheapest on Amazon AWS: 35% below average.

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $0.120 | $0.485 | 118 t/s | 641ms | 4/14/2026 |
| | $0.150 | $0.600 | 118 t/s | 641ms | 4/14/2026 |
| | $0.150 | $0.600 | 118 t/s | 641ms | 4/4/2026 |
| | $0.240 | $0.970 | 118 t/s | 641ms | 4/14/2026 |
| | $0.270 | $0.850 | 118 t/s | 641ms | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
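Given per-million-token rates like those above, estimating the cost of a single request is simple arithmetic. A minimal sketch in Python, using the cheapest listed rates; the token counts are hypothetical:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request at per-1M-token rates."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# Cheapest listed rates: $0.120 input, $0.485 output per 1M tokens.
cost = request_cost(input_tokens=4_000, output_tokens=1_000,
                    input_per_m=0.120, output_per_m=0.485)
print(f"${cost:.6f}")  # $0.000965
```

At these rates, even a million such requests per day stays under $1,000/day, which is why the pricing spread between providers matters mostly at volume.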
Model Details

General
- Creator: Meta
- Family: Llama
- Tier: Lightweight
- Context Window: 128K
- Modalities: Text, Image

Capabilities
- Tool Calling: Yes
- Open Source: Yes
- Subtypes: Chat Completion
- Aliases: llama-4-maverick-17b-128e, meta-llama-llama-4-maverick-17b-128e
Strengths & Limitations

Strengths
- Open source with full model weights available for customization
- Multimodal support for both text and image inputs
- Fast inference at roughly 118 output tokens per second (see pricing table above)
- Low latency, with time to first token around 641ms
- Native tool calling functionality
- 128K token context window for substantial document processing
- Lightweight design with 17B active parameters for efficient deployment

Limitations
- Lightweight tier positioning limits complex reasoning capabilities
- Smaller active parameter count than flagship models in the Llama 4 family
- No video or audio input support beyond images
- Performance may lag behind larger models on specialized tasks
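Because the 128K window is a hard budget shared by the prompt and the completion, long-document workflows benefit from a pre-flight check before sending a request. A rough sketch, assuming the common 131,072-token reading of "128K" and a crude 4-characters-per-token estimate (a real tokenizer should replace this heuristic in production):

```python
CONTEXT_WINDOW = 131_072  # assumed: 128K = 128 * 1024 tokens

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    """True if the prompt plus the reserved completion budget fits the window."""
    return estimate_tokens(prompt) + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("hello " * 100, max_output_tokens=2048))   # True
print(fits_in_context("x" * 1_000_000, max_output_tokens=2048))  # False
```

Reserving the output budget up front avoids truncated completions, which otherwise surface only as silently cut-off responses.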
About Llama 4 Maverick 17B
Common Use Cases
Llama 4 Maverick 17B is designed for applications requiring multimodal processing at scale, such as content moderation systems that analyze both text and images, customer support chatbots handling visual queries, and document processing workflows involving charts and diagrams. Its lightweight architecture makes it suitable for high-volume deployments where cost efficiency matters, while the open-source licensing enables custom fine-tuning for domain-specific applications. The fast inference speed and tool calling capabilities support interactive applications and automated workflows that need to process visual content alongside text with minimal latency.
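In practice, the tool-calling workflows described above mean sending the model a function schema alongside the conversation. A minimal sketch of such a request payload, assuming an OpenAI-compatible chat-completions provider; the endpoint and the `get_order_status` tool are hypothetical, and only the model alias comes from this page:

```python
import json

# Hypothetical tool the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order by id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

payload = {
    "model": "llama-4-maverick-17b-128e",  # alias listed above
    "messages": [
        {"role": "user", "content": "Where is order 4123?"},
    ],
    "tools": tools,
}

print(json.dumps(payload, indent=2))
```

When the model decides a tool is needed, the response carries the function name and JSON arguments instead of plain text; your application executes the call and sends the result back as a follow-up message.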
Frequently Asked Questions
How much does Llama 4 Maverick 17B cost per million tokens?
Llama 4 Maverick 17B pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.
What is Llama 4 Maverick 17B best used for?
Llama 4 Maverick 17B excels at high-volume applications requiring multimodal processing, such as content moderation, customer support with visual elements, and document analysis. Its lightweight design and fast inference make it ideal for interactive applications where response speed matters more than maximum reasoning capability.
Can I fine-tune Llama 4 Maverick 17B for my specific use case?
Yes, Llama 4 Maverick 17B is open source with full model weights available, allowing complete customization through fine-tuning. This makes it suitable for domain-specific applications where proprietary models cannot be modified, though you'll need appropriate computational resources for the fine-tuning process.