FlagshipMeta

Llama 4 Scout

Name: Llama 4 Scout
Availability: InStock
Author: Meta

Llama 4 Scout is Meta's flagship multimodal model with text and image input capabilities, featuring a 327K token context window for complex reasoning tasks.

Context 328K

Tier Flagship

Modalities text, image

Input from

$0.085 / 1M tokens

across 5 providers

Compare Prices Model Page →Paper

API Pricing

Cheapest on Amazon AWS — 32% below avg

Provider	Input / 1M	Output / 1M	Speed	TTFT	Updated
Amazon AWSBatch	$0.085	$0.330	107 t/s	604ms	7/13/2026
OpenRouter	$0.100	$0.300	107 t/s	604ms	7/13/2026
Deep Infra	$0.100	$0.300	107 t/s	604ms	7/13/2026
Groq	$0.110	$0.340	107 t/s	604ms	7/13/2026
Amazon AWS	$0.170	$0.660	107 t/s	604ms	7/13/2026
Together AI	$0.180	$0.590	107 t/s	604ms	7/13/2026

Prices updated daily. Last check: Jul 13, 2026

Performance & Benchmarks

Source: Artificial Analysis →

Intelligence

10.0 / 100

Coding

8.2 / 100

Math

14.0 / 100

Output Speed

107 t/s

Latency (TTFT)

604ms

Reasoning & Knowledge

MMLU-Pro75.2%
GPQA Diamond58.7%
Humanity's Last Exam4.3%

Coding

LiveCodeBench29.9%
SciCode17.0%

Math

AIME 202514.0%
AIME28.3%
MATH-50084.4%

Agentic & Tool Use

Terminal-Bench Hard1.5%
Terminal-Bench v2.13.7%
τ²-bench15.5%
τ-bench Banking3.3%

Instruction & Long Context

IFBench39.5%
Long-Context Reasoning25.8%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Meta
Family: Llama
Tier: Flagship
Context Window: 328K
Modalities: Text, Image

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

Large 327,680 token context window supports extensive document processing
Native multimodal support for both text and image inputs
Fast token generation at 133.36 tokens per second
Relatively quick time-to-first-token at 473 milliseconds
Flagship-tier reasoning capabilities across modalities
Suitable for complex document analysis with visual components
Meta's most advanced model architecture to date

Limitations

No tool calling or function execution capabilities
Proprietary model with no open-source weights available
Limited to text and image modalities only
Does not support structured output modes like JSON
Newer model family with less deployment history than established alternatives

Key Features

•327,680 token context window

•Multimodal text and image input processing

•Streaming response generation

•High-speed token generation (133+ tokens/second)

•Cross-modal reasoning capabilities

•Batch processing support

•Vision-language understanding

•Extended context document analysis

About Llama 4 Scout

Llama 4 Scout is Meta's flagship model in the Llama family, representing the company's most capable offering for multimodal AI applications. Unlike previous Llama generations, Scout combines text and image processing capabilities in a single model architecture, positioning it as Meta's answer to other flagship multimodal models in the market. The model features a substantial 327,680 token context window, enabling it to process extensive documents, codebases, and image collections in a single session. Its multimodal architecture allows simultaneous processing of text and visual inputs, making it suitable for tasks requiring understanding of both textual content and visual information. Performance benchmarks show the model generates tokens at 133.36 tokens per second with a time-to-first-token of 473 milliseconds. Llama 4 Scout targets enterprise and research applications requiring sophisticated reasoning across text and visual modalities. As a flagship model, it competes directly with other leading multimodal models, though it notably lacks tool calling capabilities that some competing flagship models provide. The model represents Meta's shift toward more capable, multimodal AI systems while maintaining focus on reasoning and comprehension tasks.

Common Use Cases

Llama 4 Scout is designed for sophisticated multimodal applications requiring deep understanding of both textual and visual content. Its large context window makes it particularly effective for analyzing lengthy documents that include charts, diagrams, or images, such as research papers, technical manuals, or financial reports. The model excels at tasks like visual question answering, document summarization with image content, educational content creation, and complex reasoning across multiple data types. Its flagship-tier capabilities make it suitable for enterprise applications requiring nuanced understanding of mixed-media content, though the absence of tool calling limits its effectiveness for agentic workflows that require external API integration.

Frequently Asked Questions

How much does Llama 4 Scout cost per million tokens?

Llama 4 Scout pricing varies by provider and service type. Check the pricing table above for current rates across all available providers offering this model.

What is Llama 4 Scout best used for?

Llama 4 Scout excels at multimodal tasks requiring understanding of both text and images, such as document analysis with visual components, research paper summarization, educational content creation, and complex reasoning across mixed media. Its large 327K context window makes it particularly effective for processing extensive documents containing charts, diagrams, or other visual elements.

Can Llama 4 Scout call functions or use tools?

No, Llama 4 Scout does not support tool calling or function execution capabilities. It focuses on multimodal reasoning and content generation but cannot integrate with external APIs or execute structured function calls, limiting its use in agentic workflows that require external tool integration.