
Nemotron 3 Super 120B

Nemotron 3 Super 120B is NVIDIA's flagship 120-billion parameter language model with a 262K token context window for complex text processing tasks.

Context: 262K
Tier: Flagship
Input from $0.100 / 1M tokens across 3 providers

API Pricing

Cheapest on Deep Infra (14% below avg)

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
| --- | --- | --- | --- | --- | --- |
| — | $0.100 | $0.500 | 154 t/s | 687ms | 4/4/2026 |
| — | $0.100 | $0.500 | 154 t/s | 687ms | 4/14/2026 |
| — | $0.150 | $0.650 | 154 t/s | 687ms | 4/14/2026 |

Prices updated daily. Last check: 4/14/2026
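To translate the per-million-token rates above into a per-request cost, multiply each token count by its rate. A minimal sketch (the token counts in the example are illustrative, not benchmark figures):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """Cost of one request given per-million-token prices in USD."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# Example: a 50K-token prompt with a 2K-token completion at the cheapest
# listed rate ($0.100 input / $0.500 output per 1M tokens).
cost = request_cost_usd(50_000, 2_000, 0.100, 0.500)
print(f"${cost:.4f}")  # $0.0060
```

The same function with the $0.150 / $0.650 rates shows how quickly provider choice compounds at high volume.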

Model Details

General

Creator: NVIDIA
Family: Nemotron
Tier: Flagship
Context Window: 262K
Modalities: Text

Capabilities

Tool Calling: No
Open Source: No

Strengths & Limitations

Strengths

  • Large 262K token context window supports extensive document processing
  • 120 billion parameters provide substantial model capacity
  • Output rate of 164.34 tokens per second for consistent generation speed
  • Developed by NVIDIA with potential optimization for their hardware ecosystem
  • Flagship tier positioning within the Nemotron model family
  • Time to first token of 744ms enables responsive initial output
  • Extended context enables processing of lengthy conversations and documents

Limitations

  • No tool calling or function execution capabilities
  • Text-only modality limits use cases compared to multimodal alternatives
  • Proprietary model with no open source availability
  • No image, audio, or video input support
  • Limited API features compared to models with structured output modes
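The throughput and TTFT figures above combine into a rough wall-clock estimate for a response: time to first token plus output tokens divided by generation rate. A minimal sketch using the benchmark numbers quoted in the list (744ms TTFT, 164.34 t/s):

```python
def estimated_response_seconds(ttft_ms: float, tokens_per_s: float,
                               output_tokens: int) -> float:
    """Rough end-to-end latency estimate for one response.

    Ignores network jitter and provider-side queuing, so treat the
    result as a lower bound rather than a guarantee.
    """
    return ttft_ms / 1000 + output_tokens / tokens_per_s

# A 1,000-token completion at the benchmarked rates:
t = estimated_response_seconds(744, 164.34, 1_000)
print(f"{t:.1f}s")  # ~6.8s
```

Per-provider speeds in the pricing table may differ from the aggregate benchmark, so plug in the numbers for the provider you actually use.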

Key Features

262,144 token context window
120 billion parameter architecture
Text input and output processing
Streaming response generation
Extended document processing capabilities
High-throughput text generation
Large-scale language understanding
Long-form content generation

About Nemotron 3 Super 120B

Nemotron 3 Super 120B is NVIDIA's flagship language model in the Nemotron family, featuring 120 billion parameters designed for sophisticated text processing and generation tasks. As NVIDIA's top-tier offering in this model line, it represents the company's entry into large-scale language modeling alongside their established GPU and AI infrastructure products. The model operates with a 262,144 token context window, enabling processing of lengthy documents and extended conversations.

It focuses exclusively on text modalities and delivers an output rate of 164.34 tokens per second with a time to first token of 744 milliseconds, according to Artificial Analysis benchmarks. The model does not include tool calling capabilities, positioning it as a pure language processing solution.

Nemotron 3 Super 120B serves users requiring substantial language understanding and generation capabilities within NVIDIA's ecosystem. While newer flagship models from other providers have emerged since its release, it remains NVIDIA's primary large language model offering for complex text-based applications where extended context and substantial model capacity are priorities.

Common Use Cases

Nemotron 3 Super 120B is designed for applications requiring extensive language processing capabilities and long context understanding. Its 262K token context window makes it suitable for document analysis, legal document review, academic research processing, and lengthy technical documentation tasks. The model's flagship tier positioning and 120B parameter count enable complex reasoning over extended text, making it appropriate for content summarization, research synthesis, and detailed text analysis workflows. Organizations working within NVIDIA's ecosystem may find it particularly suitable for text-heavy AI applications that benefit from the model's substantial capacity and extended context capabilities.
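Before sending a long document, it helps to check whether it plausibly fits in the 262K window with room left for the completion. A minimal sketch, assuming the common ~4-characters-per-token rule of thumb for English text (the model's actual tokenizer will differ, so use a real tokenizer for exact counts):

```python
def fits_in_context(text: str, context_tokens: int = 262_144,
                    reserved_for_output: int = 4_096,
                    chars_per_token: float = 4.0) -> bool:
    """Heuristic check that `text` fits in the prompt budget.

    chars_per_token ~= 4 is a rough English-text estimate, not the
    model's tokenizer; reserved_for_output holds back room for the
    completion so the prompt cannot consume the whole window.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens - reserved_for_output

doc = "word " * 100_000      # ~500K characters, roughly 125K tokens
print(fits_in_context(doc))  # True: well under the 262K window
```

Documents that fail the check can be split into window-sized chunks and summarized in passes, a common pattern for long-document workflows.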

Frequently Asked Questions

How much does Nemotron 3 Super 120B cost per million tokens?

Nemotron 3 Super 120B pricing varies by provider and usage patterns. Check the pricing table above for current rates across all available providers and pricing tiers.

What is Nemotron 3 Super 120B best used for?

Nemotron 3 Super 120B excels at tasks requiring extensive context understanding and complex text processing. Its 262K token context window makes it ideal for document analysis, research synthesis, legal document review, and processing lengthy technical materials where maintaining context across extended passages is crucial.

Does Nemotron 3 Super 120B support tool calling or multimodal inputs?

No, Nemotron 3 Super 120B focuses exclusively on text processing and does not support tool calling, function execution, or multimodal inputs like images or audio. It is designed as a pure language model for text-based applications requiring substantial context and processing capacity.