Flagship · Open Source · Meta

Llama 3.1 405B

Llama 3.1 405B is Meta's flagship open-source language model with 405 billion parameters, offering advanced reasoning and coding capabilities with a 128K token context window.

Context 128K
Tier Flagship
Knowledge Dec 2023
Tools Supported
License Open Source
Input from $1.00 / 1M tokens across 4 providers

API Pricing

Cheapest on OpenRouter: 63% below average

| Provider   | Input / 1M | Output / 1M | Speed    | TTFT  | Updated   |
|------------|------------|-------------|----------|-------|-----------|
| OpenRouter | $1.00      | $1.00       | 32.5 t/s | 733ms | 4/14/2026 |
|            | $1.20      | $1.20       | 32.5 t/s | 733ms | 4/14/2026 |
|            | $3.50      | $3.50       | 32.5 t/s | 733ms | 4/14/2026 |
|            | $5.00      | $16.00      | 32.5 t/s | 733ms | 3/30/2026 |

Prices updated daily. Last check: 4/14/2026
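As a sketch of what these per-million-token rates mean in practice, the cost of a single request can be worked out as below. The default rates match the cheapest listed tier ($1.00 / 1M for both input and output); the token counts are illustrative.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = 1.00, output_per_m: float = 1.00) -> float:
    """Estimate the USD cost of one request at a per-million-token rate."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# A 10,000-token prompt with a 2,000-token reply at the cheapest listed rate:
print(round(request_cost(10_000, 2_000), 4))  # 0.012
```

At the most expensive listed tier ($5.00 in / $16.00 out), the same request would cost about seven times more, which is why long-output workloads are especially sensitive to the output rate.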

Model Details

General

Creator
Meta
Family
Llama
Tier
Flagship
Context Window
128K
Knowledge Cutoff
Dec 2023
Modalities
Text

Capabilities

Tool Calling
Yes
Open Source
Yes
Subtypes
Chat Completion, Code Generation

Strengths & Limitations

Strengths

  • Open-source, with full model weights available for download and customization
  • 405 billion parameters make it one of the largest publicly available language models
  • 128,000-token context window supports long-form conversations and documents
  • Tool calling support enables function execution and structured interactions
  • Strong performance on coding and mathematical reasoning tasks
  • Can be deployed locally for complete data privacy and control
  • No usage restrictions for research and commercial applications

Limitations

  • Text-only input: no support for images, audio, or other modalities
  • Knowledge cutoff of December 2023 is older than some competing models
  • Requires significant computational resources due to its 405B parameter size
  • Inference speed of 33.9 tokens/second is slower than some optimized alternatives
  • 689ms time to first token is higher latency than faster commercial models
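The hardware requirement noted above can be made concrete with back-of-the-envelope arithmetic: the raw weights alone, before any KV cache, activations, or framework overhead, scale linearly with bytes per parameter.

```python
PARAMS = 405e9  # 405 billion parameters

def weight_memory_gb(bytes_per_param: float) -> float:
    """Memory for the raw weights alone, in GB (decimal), at a given precision."""
    return PARAMS * bytes_per_param / 1e9

print(f"fp16/bf16: {weight_memory_gb(2):.1f} GB")   # 810.0 GB
print(f"int8:      {weight_memory_gb(1):.1f} GB")   # 405.0 GB
print(f"int4:      {weight_memory_gb(0.5):.1f} GB") # 202.5 GB
```

Even at 4-bit quantization the weights exceed the memory of any single accelerator, so multi-GPU or multi-node inference is effectively required.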

Key Features

405 billion parameter architecture
128,000 token context window
Tool calling with structured output
Open-source model weights and architecture
Chat completion interface
Code generation capabilities
Streaming response support
Fine-tuning compatibility
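To illustrate the tool-calling feature listed above: most hosts expose Llama models through an OpenAI-compatible chat-completions interface, where a tool-calling request body looks roughly like the following sketch. The model identifier and the `get_weather` tool are illustrative assumptions, not details from this page; check your provider's documentation for the exact model name.

```python
import json

payload = {
    "model": "meta-llama/llama-3.1-405b-instruct",  # assumed identifier; varies by host
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# The request body serializes to plain JSON for the HTTP POST:
print(json.dumps(payload, indent=2))
```

If the model decides to use the tool, the response contains a `tool_calls` entry with the function name and JSON-encoded arguments, which your code executes before sending the result back in a follow-up message.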

About Llama 3.1 405B

Llama 3.1 405B is Meta's largest and most capable model in the Llama 3.1 family, representing the flagship tier of their open-source language model lineup. With 405 billion parameters, it stands as one of the largest openly available language models, competing directly with proprietary frontier models from other providers.

The model features a 128,000 token context window and supports text-based interactions including chat completion and code generation. It includes tool calling capabilities and demonstrates strong performance across reasoning, mathematics, and programming tasks. The model's knowledge cutoff is December 2023, and it processes text at approximately 33.9 tokens per second with a time to first token of 689 milliseconds.

Llama 3.1 405B is designed for applications requiring maximum capability within the open-source ecosystem, including complex reasoning tasks, advanced code generation, and research applications. Its open-source nature allows for fine-tuning, local deployment, and full control over the model, distinguishing it from proprietary alternatives while offering competitive performance on many benchmarks.

Common Use Cases

Llama 3.1 405B is suited for organizations and researchers requiring maximum open-source language model capability. Its large parameter count and open nature make it ideal for complex reasoning tasks, advanced code generation, mathematical problem solving, and research applications where model transparency is important. The model excels in scenarios requiring fine-tuning for specialized domains, local deployment for data privacy, or applications where avoiding vendor lock-in is critical. Its tool calling capabilities enable sophisticated agent workflows, while the 128K context window supports long-form document analysis and multi-turn conversations.
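One practical consequence of the 128K window for long-form document analysis is a simple admission check: will a document fit, with room left for the reply? The sketch below uses the common rough heuristic of about 4 characters per token; the heuristic and the output reserve are assumptions, and an exact count requires the model's tokenizer.

```python
CONTEXT_WINDOW = 128_000  # Llama 3.1 token limit
CHARS_PER_TOKEN = 4       # rough heuristic, not an exact tokenizer count

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Rough check: does the prompt leave room for the reply within 128K tokens?"""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

# ~250,000 characters is roughly 62,500 tokens, well inside the window:
print(fits_in_context("word " * 50_000))  # True
```

For production use, replace the heuristic with the model's actual tokenizer, since character-to-token ratios vary widely across languages and code.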

Frequently Asked Questions

How much does Llama 3.1 405B cost per million tokens?

Llama 3.1 405B pricing varies significantly by provider, with some offering it through cloud APIs and others providing compute-based pricing for self-hosting. Check the pricing table above for current rates across all providers offering this model.

What is Llama 3.1 405B best used for?

Llama 3.1 405B excels at complex reasoning tasks, advanced code generation, mathematical problem solving, and research applications. Its open-source nature makes it particularly valuable for organizations requiring model customization, local deployment, or full control over their AI infrastructure.

How does Llama 3.1 405B compare to other models in the Llama family?

As the flagship model in the Llama 3.1 family, the 405B variant offers the highest capability with 405 billion parameters compared to the smaller 8B and 70B versions. It provides superior performance on complex tasks but requires significantly more computational resources for inference and fine-tuning.