Llama 3.1 405B
Llama 3.1 405B is Meta's flagship open-source language model with 405 billion parameters, offering advanced reasoning and coding capabilities with a 128K token context window.
API Pricing
Cheapest on OpenRouter: 63% below average

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $1.00 | $1.00 | 32.5 t/s | 733ms | 4/14/2026 |
| | $1.20 | $1.20 | 32.5 t/s | 733ms | 4/14/2026 |
| | $3.50 | $3.50 | 32.5 t/s | 733ms | 4/14/2026 |
| | $5.00 | $16.00 | 32.5 t/s | 733ms | 3/30/2026 |
Prices updated daily. Last check: 4/14/2026
Model Details
General
- Creator: Meta
- Family: Llama
- Tier: Flagship
- Context Window: 128K
- Knowledge Cutoff: Dec 2023
- Modalities: Text
Capabilities
- Tool Calling: Yes
- Open Source: Yes
- Subtypes: Chat Completion, Code Generation
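Tool calling on most hosted providers follows the OpenAI-style chat-completions schema. A minimal sketch of a request body; the model slug (`meta-llama/llama-3.1-405b-instruct`) and the `get_weather` tool are illustrative assumptions, not values from this page:

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Build an OpenAI-style chat-completions payload with one tool defined.

    The model slug and tool name below are assumptions for illustration.
    """
    return {
        "model": "meta-llama/llama-3.1-405b-instruct",  # assumed slug
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("What's the weather in Paris?")
print(json.dumps(payload, indent=2))
```

The same payload shape works across providers that expose an OpenAI-compatible endpoint; only the model slug changes.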
Strengths & Limitations
Strengths:
- Open-source with full model weights available for download and customization
- 405 billion parameters make it one of the largest publicly available language models
- 128,000 token context window supports long-form conversations and documents
- Tool calling support enables function execution and structured interactions
- Strong performance on coding and mathematical reasoning tasks
- Can be deployed locally for complete data privacy and control
- No usage restrictions for research and commercial applications
Limitations:
- Text-only input, with no support for images, audio, or other modalities
- Knowledge cutoff of December 2023 is older than some competing models
- Requires significant computational resources due to its 405B parameter size
- Inference speed of roughly 32.5 tokens/second is slower than some optimized alternatives
- Time to first token of roughly 733ms is higher latency than faster commercial models
About Llama 3.1 405B
Common Use Cases
Llama 3.1 405B is suited for organizations and researchers requiring maximum open-source language model capability. Its large parameter count and open nature make it ideal for complex reasoning tasks, advanced code generation, mathematical problem solving, and research applications where model transparency is important. The model excels in scenarios requiring fine-tuning for specialized domains, local deployment for data privacy, or applications where avoiding vendor lock-in is critical. Its tool calling capabilities enable sophisticated agent workflows, while the 128K context window supports long-form document analysis and multi-turn conversations.
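Before sending a long document, it is worth checking that it fits the 128K window with room left for the response. A minimal sketch using the common ~4 characters/token heuristic for English text (a real deployment should count tokens with the model's actual tokenizer):

```python
def fits_in_context(text: str, context_window: int = 128_000,
                    reserved_for_output: int = 4_096) -> bool:
    """Rough check that a document fits a 128K-token context window.

    Uses the ~4 characters per token heuristic for English text;
    exact counts require the model's own tokenizer.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_for_output <= context_window

doc = "word " * 50_000  # ~250,000 characters, roughly 62,500 tokens
print(fits_in_context(doc))  # True: well under 128K minus the output budget
```

The `reserved_for_output` margin matters: a document that exactly fills the window leaves no tokens for the model's answer.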
Frequently Asked Questions
How much does Llama 3.1 405B cost per million tokens?
Llama 3.1 405B pricing varies significantly by provider, with some offering it through cloud APIs and others providing compute-based pricing for self-hosting. Check the pricing table above for current rates across all providers offering this model.
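Per-million-token pricing converts to per-request cost with simple arithmetic; a small helper, using the rates from the table above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# At the cheapest listed rate ($1.00 in / $1.00 out per 1M tokens),
# a 10K-token prompt with a 2K-token response costs:
print(request_cost(10_000, 2_000, 1.00, 1.00))  # 0.012
```

The same call with the most expensive listed rates ($5.00 / $16.00) gives $0.082 for that request, a 7x spread, which is why comparing providers matters for this model.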
What is Llama 3.1 405B best used for?
Llama 3.1 405B excels at complex reasoning tasks, advanced code generation, mathematical problem solving, and research applications. Its open-source nature makes it particularly valuable for organizations requiring model customization, local deployment, or full control over their AI infrastructure.
How does Llama 3.1 405B compare to other models in the Llama family?
As the flagship model in the Llama 3.1 family, the 405B variant offers the highest capability with 405 billion parameters compared to the smaller 8B and 70B versions. It provides superior performance on complex tasks but requires significantly more computational resources for inference and fine-tuning.