LightweightOpen SourceMeta

Llama 4 Maverick 17B

Name: Llama 4 Maverick 17B
Availability: InStock
Author: Meta

Llama 4 Maverick 17B is Meta's lightweight multimodal model supporting text and image inputs with a 128K token context window and tool calling capabilities.

Context 128K

Tier Lightweight

Tools Supported

License Open Source

Modalities text, image

Input from

$0.120 / 1M tokens

across 5 providers

Compare Prices Model Page →Paper

API Pricing

Cheapest on Amazon AWS — 44% below avg

Provider	Input / 1M	Output / 1M	Cached / 1M	Speed	TTFT	Updated
Amazon AWSBatch	$0.120	$0.485	-	110 t/s	642ms	5/29/2026
Deep Infra	$0.150	$0.600	-	110 t/s	642ms	5/29/2026
OpenRouter	$0.150	$0.600	-	110 t/s	642ms	5/28/2026
Amazon AWS	$0.240	$0.970	-	110 t/s	642ms	5/29/2026
Together AI	$0.270	$0.850	-	110 t/s	642ms	5/29/2026
IO.NET	$0.350	$1.06	$0.175	110 t/s	642ms	5/29/2026

Prices updated daily. Last check: May 29, 2026

Model Details

General

Creator: Meta
Family: Llama
Tier: Lightweight
Context Window: 128K
Modalities: Text, Image

Capabilities

Tool Calling: Yes
Open Source: Yes
Subtypes: Chat Completion
Aliases: llama-4-maverick-17b-128e, meta-llama-llama-4-maverick-17b-128e

Strengths & Limitations

Strengths

Open source with full model weights available for customization
Multimodal support for both text and image inputs
Fast inference speed at 114.05 output tokens per second
Low latency with 481ms time to first token
Native tool calling functionality
128K token context window for substantial document processing
Lightweight 17B parameter design for efficient deployment

Limitations

Lightweight tier positioning limits complex reasoning capabilities
Smaller parameter count than flagship models in Llama 4 family
No video or audio input support beyond images
Performance may lag behind larger models on specialized tasks

Key Features

•128,000 token context window

•Text and image input processing

•Tool calling with structured outputs

•Open source model weights

•Chat completion interface

•Streaming response support

•Fast inference optimization

•Multi-provider API compatibility

About Llama 4 Maverick 17B

Llama 4 Maverick 17B is Meta's lightweight tier model in the Llama 4 family, designed to balance capability with efficiency for high-throughput applications. As an open-source model, it provides developers with full access to model weights while maintaining competitive performance in its size class. The model supports multimodal inputs including text and images within a 128,000 token context window. It includes native tool calling functionality and demonstrates strong inference speed with 114.05 output tokens per second and 481ms time to first token. These performance characteristics make it suitable for applications requiring responsive interaction while processing both textual and visual content. Llama 4 Maverick 17B serves applications where developers need multimodal capabilities without the computational overhead of larger models. Its open-source nature allows for fine-tuning and deployment flexibility that proprietary alternatives cannot match, while its 17B parameter count provides a practical balance between model capability and resource requirements.

Common Use Cases

Llama 4 Maverick 17B is designed for applications requiring multimodal processing at scale, such as content moderation systems that analyze both text and images, customer support chatbots handling visual queries, and document processing workflows involving charts and diagrams. Its lightweight architecture makes it suitable for high-volume deployments where cost efficiency matters, while the open-source licensing enables custom fine-tuning for domain-specific applications. The fast inference speed and tool calling capabilities support interactive applications and automated workflows that need to process visual content alongside text with minimal latency.

Frequently Asked Questions

How much does Llama 4 Maverick 17B cost per million tokens?

Llama 4 Maverick 17B pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is Llama 4 Maverick 17B best used for?

Llama 4 Maverick 17B excels at high-volume applications requiring multimodal processing, such as content moderation, customer support with visual elements, and document analysis. Its lightweight design and fast inference make it ideal for interactive applications where response speed matters more than maximum reasoning capability.

Can I fine-tune Llama 4 Maverick 17B for my specific use case?

Yes, Llama 4 Maverick 17B is open source with full model weights available, allowing complete customization through fine-tuning. This makes it suitable for domain-specific applications where proprietary models cannot be modified, though you'll need appropriate computational resources for the fine-tuning process.