Llama 3.1 405B
Llama 3.1 405B is Meta's flagship open-source language model with 405 billion parameters, offering advanced reasoning and coding capabilities with a 128K token context window.
API Pricing
Cheapest on OpenRouter: 63% below average

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $1.00 | $1.00 | 32.5 t/s | 733ms | 4/14/2026 |
| | $1.20 | $1.20 | 32.5 t/s | 733ms | 4/14/2026 |
| | $3.50 | $3.50 | 32.5 t/s | 733ms | 4/14/2026 |
| | $5.00 | $16.00 | 32.5 t/s | 733ms | 3/30/2026 |
Prices updated daily. Last check: 4/14/2026
Model Details
General
- Creator: Meta
- Family: Llama
- Tier: Flagship
- Context Window: 128K
- Knowledge Cutoff: Dec 2023
- Modalities: Text
Capabilities
- Tool Calling: Yes
- Open Source: Yes
- Subtypes: Chat Completion, Code Generation
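Tool calling on most hosted providers follows the OpenAI-style chat-completions schema. A minimal sketch of a request body; the model slug (`meta-llama/llama-3.1-405b-instruct`) and the `get_weather` tool are illustrative assumptions, not values from this page:

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Build an OpenAI-style chat-completions payload with one tool defined.

    The model slug and tool name below are assumptions for illustration.
    """
    return {
        "model": "meta-llama/llama-3.1-405b-instruct",  # assumed slug
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("What's the weather in Paris?")
print(json.dumps(payload, indent=2))
```

The same payload shape works across providers that expose an OpenAI-compatible endpoint; only the model slug changes.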
Strengths & Limitations
Strengths:
- Open-source with full model weights available for download and customization
- 405 billion parameters make it one of the largest publicly available language models
- 128,000 token context window supports long-form conversations and documents
- Tool calling support enables function execution and structured interactions
- Strong performance on coding and mathematical reasoning tasks
- Can be deployed locally for complete data privacy and control
- No usage restrictions for research and commercial applications
Limitations:
- Text-only input, with no support for images, audio, or other modalities
- Knowledge cutoff of December 2023 is older than some competing models
- Requires significant computational resources due to its 405B parameter size
- Inference speed of roughly 32.5 tokens/second is slower than some optimized alternatives
- Time to first token of roughly 733ms is higher latency than faster commercial models
About Llama 3.1 405B
Common Use Cases
Llama 3.1 405B is suited for organizations and researchers requiring maximum open-source language model capability. Its large parameter count and open nature make it ideal for complex reasoning tasks, advanced code generation, mathematical problem solving, and research applications where model transparency is important. The model excels in scenarios requiring fine-tuning for specialized domains, local deployment for data privacy, or applications where avoiding vendor lock-in is critical. Its tool calling capabilities enable sophisticated agent workflows, while the 128K context window supports long-form document analysis and multi-turn conversations.
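Before sending a long document, it is worth checking that it fits the 128K window with room left for the response. A minimal sketch using the common ~4 characters/token heuristic for English text (a real deployment should count tokens with the model's actual tokenizer):

```python
def fits_in_context(text: str, context_window: int = 128_000,
                    reserved_for_output: int = 4_096) -> bool:
    """Rough check that a document fits a 128K-token context window.

    Uses the ~4 characters per token heuristic for English text;
    exact counts require the model's own tokenizer.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_for_output <= context_window

doc = "word " * 50_000  # ~250,000 characters, roughly 62,500 tokens
print(fits_in_context(doc))  # True: well under 128K minus the output budget
```

The `reserved_for_output` margin matters: a document that exactly fills the window leaves no tokens for the model's answer.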
Frequently Asked Questions
How much does Llama 3.1 405B cost per million tokens?
Llama 3.1 405B pricing varies significantly by provider, with some offering it through cloud APIs and others providing compute-based pricing for self-hosting. Check the pricing table above for current rates across all providers offering this model.
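Per-million-token pricing converts to per-request cost with simple arithmetic; a small helper, using the rates from the table above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# At the cheapest listed rate ($1.00 in / $1.00 out per 1M tokens),
# a 10K-token prompt with a 2K-token response costs:
print(request_cost(10_000, 2_000, 1.00, 1.00))  # 0.012
```

The same call with the most expensive listed rates ($5.00 / $16.00) gives $0.082 for that request, a 7x spread, which is why comparing providers matters for this model.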
What is Llama 3.1 405B best used for?
Llama 3.1 405B excels at complex reasoning tasks, advanced code generation, mathematical problem solving, and research applications. Its open-source nature makes it particularly valuable for organizations requiring model customization, local deployment, or full control over their AI infrastructure.
How does Llama 3.1 405B compare to other models in the Llama family?
As the flagship model in the Llama 3.1 family, the 405B variant offers the highest capability with 405 billion parameters compared to the smaller 8B and 70B versions. It provides superior performance on complex tasks but requires significantly more computational resources for inference and fine-tuning.