
Nemotron 3 Super 120B

Nemotron 3 Super 120B is NVIDIA's flagship 120-billion parameter language model with a 262K token context window for complex text processing tasks.

Context: 262K
Tier: Flagship
Input from $0.100 / 1M tokens across 3 providers

API Pricing

Cheapest on Deep Infra (14% below avg)

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
| --- | --- | --- | --- | --- | --- |
| — | $0.100 | $0.500 | 154 t/s | 687ms | 4/4/2026 |
| — | $0.100 | $0.500 | 154 t/s | 687ms | 4/14/2026 |
| — | $0.150 | $0.650 | 154 t/s | 687ms | 4/14/2026 |

Prices updated daily. Last check: 4/14/2026
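To translate the per-million-token rates above into a per-request cost, multiply each token count by its rate. A minimal sketch (the token counts in the example are illustrative, not benchmark figures):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """Cost of one request given per-million-token prices in USD."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# Example: a 50K-token prompt with a 2K-token completion at the cheapest
# listed rate ($0.100 input / $0.500 output per 1M tokens).
cost = request_cost_usd(50_000, 2_000, 0.100, 0.500)
print(f"${cost:.4f}")  # $0.0060
```

The same function with the $0.150 / $0.650 rates shows how quickly provider choice compounds at high volume.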

Model Details

General

Creator: NVIDIA
Family: Nemotron
Tier: Flagship
Context Window: 262K
Modalities: Text

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

  • Large 262K token context window supports extensive document processing
  • 120 billion parameters provide substantial model capacity
  • Output rate of 164.34 tokens per second for consistent generation speed
  • Developed by NVIDIA with potential optimization for their hardware ecosystem
  • Flagship tier positioning within the Nemotron model family
  • Time to first token of 744ms enables responsive initial output
  • Extended context enables processing of lengthy conversations and documents

Limitations

  • No tool calling or function execution capabilities
  • Text-only modality limits use cases compared to multimodal alternatives
  • Proprietary model with no open source availability
  • No image, audio, or video input support
  • Limited API features compared to models with structured output modes
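The throughput and TTFT figures above combine into a rough wall-clock estimate for a response: time to first token plus output tokens divided by generation rate. A minimal sketch using the benchmark numbers quoted in the list (744ms TTFT, 164.34 t/s):

```python
def estimated_response_seconds(ttft_ms: float, tokens_per_s: float,
                               output_tokens: int) -> float:
    """Rough end-to-end latency estimate for one response.

    Ignores network jitter and provider-side queuing, so treat the
    result as a lower bound rather than a guarantee.
    """
    return ttft_ms / 1000 + output_tokens / tokens_per_s

# A 1,000-token completion at the benchmarked rates:
t = estimated_response_seconds(744, 164.34, 1_000)
print(f"{t:.1f}s")  # ~6.8s
```

Per-provider speeds in the pricing table may differ from the aggregate benchmark, so plug in the numbers for the provider you actually use.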

Key Features

262,144 token context window
120 billion parameter architecture
Text input and output processing
Streaming response generation
Extended document processing capabilities
High-throughput text generation
Large-scale language understanding
Long-form content generation

About Nemotron 3 Super 120B

Nemotron 3 Super 120B is NVIDIA's flagship language model in the Nemotron family, featuring 120 billion parameters designed for sophisticated text processing and generation tasks. As NVIDIA's top-tier offering in this model line, it represents the company's entry into large-scale language modeling alongside their established GPU and AI infrastructure products. The model operates with a 262,144 token context window, enabling processing of lengthy documents and extended conversations.

It focuses exclusively on text modalities and delivers an output rate of 164.34 tokens per second with a time to first token of 744 milliseconds, according to Artificial Analysis benchmarks. The model does not include tool calling capabilities, positioning it as a pure language processing solution.

Nemotron 3 Super 120B serves users requiring substantial language understanding and generation capabilities within NVIDIA's ecosystem. While newer flagship models from other providers have emerged since its release, it remains NVIDIA's primary large language model offering for complex text-based applications where extended context and substantial model capacity are priorities.

Common Use Cases

Nemotron 3 Super 120B is designed for applications requiring extensive language processing capabilities and long context understanding. Its 262K token context window makes it suitable for document analysis, legal document review, academic research processing, and lengthy technical documentation tasks. The model's flagship tier positioning and 120B parameter count enable complex reasoning over extended text, making it appropriate for content summarization, research synthesis, and detailed text analysis workflows. Organizations working within NVIDIA's ecosystem may find it particularly suitable for text-heavy AI applications that benefit from the model's substantial capacity and extended context capabilities.
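Before sending a long document, it helps to check whether it plausibly fits in the 262K window with room left for the completion. A minimal sketch, assuming the common ~4-characters-per-token rule of thumb for English text (the model's actual tokenizer will differ, so use a real tokenizer for exact counts):

```python
def fits_in_context(text: str, context_tokens: int = 262_144,
                    reserved_for_output: int = 4_096,
                    chars_per_token: float = 4.0) -> bool:
    """Heuristic check that `text` fits in the prompt budget.

    chars_per_token ~= 4 is a rough English-text estimate, not the
    model's tokenizer; reserved_for_output holds back room for the
    completion so the prompt cannot consume the whole window.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens - reserved_for_output

doc = "word " * 100_000      # ~500K characters, roughly 125K tokens
print(fits_in_context(doc))  # True: well under the 262K window
```

Documents that fail the check can be split into window-sized chunks and summarized in passes, a common pattern for long-document workflows.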

Frequently Asked Questions

How much does Nemotron 3 Super 120B cost per million tokens?

Nemotron 3 Super 120B pricing varies by provider and usage patterns. Check the pricing table above for current rates across all available providers and pricing tiers.

What is Nemotron 3 Super 120B best used for?

Nemotron 3 Super 120B excels at tasks requiring extensive context understanding and complex text processing. Its 262K token context window makes it ideal for document analysis, research synthesis, legal document review, and processing lengthy technical materials where maintaining context across extended passages is crucial.

Does Nemotron 3 Super 120B support tool calling or multimodal inputs?

No, Nemotron 3 Super 120B focuses exclusively on text processing and does not support tool calling, function execution, or multimodal inputs like images or audio. It is designed as a pure language model for text-based applications requiring substantial context and processing capacity.