GPT-OSS-120B
GPT-OSS-120B is OpenAI's lightweight open-source model with 120 billion parameters, offering fast inference and a 128K-token context window for developers.
API Pricing
Cheapest on OpenRouter (67% below average).

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $0.039 | $0.190 | 212 t/s | 535ms | 4/14/2026 |
| | $0.075 | $0.300 | 212 t/s | 535ms | 4/14/2026 |
| | $0.090 | $0.360 | 212 t/s | 535ms | 4/3/2026 |
| | $0.100 | $0.400 | 212 t/s | 535ms | 4/1/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/14/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/14/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/14/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/11/2026 |
| | $0.176 | $0.703 | 212 t/s | 535ms | 4/13/2026 |
Prices updated daily. Last check: 4/14/2026
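To see what these per-million-token rates mean in practice, the cost of a single request can be estimated by scaling the input and output token counts. A minimal sketch using the cheapest rates listed above ($0.039 input / $0.190 output per 1M tokens); the token counts are hypothetical:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Estimate the dollar cost of one request at per-1M-token rates."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# Example: a 50K-token prompt with a 2K-token reply at the cheapest listed rate.
cost = request_cost(input_tokens=50_000, output_tokens=2_000,
                    input_per_m=0.039, output_per_m=0.190)
print(f"${cost:.5f}")  # $0.00233
```

The same function works for any row of the table, which makes it easy to compare providers for a given workload shape.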
Model Details
General
- Creator: OpenAI
- Family: GPT-OSS
- Tier: Lightweight
- Context Window: 128K
- Knowledge Cutoff: Jan 2025
- Modalities: Text
Capabilities
- Tool Calling: No
- Open Source: Yes
- Subtypes: Chat Completion
Strengths & Limitations
- Open-source model weights available for local deployment and customization
- Fast inference speed at roughly 212 output tokens per second
- 128K token context window supports long document processing
- Knowledge cutoff of January 2025 provides recent training data
- Lightweight 120B parameter count balances performance with efficiency
- No API dependency required for inference
- Can be fine-tuned for domain-specific applications
- No tool calling or function execution capabilities
- Text-only modality with no image or multimodal support
- Smaller parameter count than frontier models limits complex reasoning
- Requires local infrastructure and technical expertise to deploy
- Time to first token of around 535ms is slower than that of some specialized inference providers
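The throughput and latency figures above combine into a simple end-to-end estimate: for a streamed response, total wall-clock time is roughly the time to first token plus the output length divided by the generation speed. A sketch using the table's figures (212 t/s, 535ms); real latency varies by provider and load:

```python
def response_time_s(output_tokens: int, tokens_per_s: float = 212.0,
                    ttft_ms: float = 535.0) -> float:
    """Rough wall-clock time for a streamed response: TTFT + generation time."""
    return ttft_ms / 1000.0 + output_tokens / tokens_per_s

print(f"{response_time_s(1_000):.2f}s")  # ~5.25s for a 1,000-token reply
```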
About GPT-OSS-120B
Common Use Cases
GPT-OSS-120B is well-suited for applications requiring fast, reliable text generation without the complexity of tool use or multimodal capabilities. Its open-source nature makes it ideal for organizations needing local deployment for data privacy, custom fine-tuning for domain-specific tasks, or integration into products without API dependencies. Common use cases include content generation, document summarization, customer service chatbots, code commenting, and batch text processing workflows where the 128K context window enables handling of long documents. The model's lightweight design and fast inference make it particularly valuable for high-throughput applications or resource-constrained environments where deploying larger frontier models would be impractical.
Frequently Asked Questions
How much does GPT-OSS-120B cost per million tokens?
GPT-OSS-120B pricing varies by provider and deployment method. Since it's open-source, you can also run it locally without per-token costs. Check the pricing table above for current rates across API providers.
What is GPT-OSS-120B best used for?
GPT-OSS-120B excels at text generation tasks requiring fast inference and long context handling, such as content creation, document summarization, and chatbot applications. Its open-source nature makes it ideal for organizations needing local deployment, custom fine-tuning, or applications with data privacy requirements.
How does GPT-OSS-120B compare to OpenAI's proprietary GPT models?
GPT-OSS-120B trades some advanced capabilities for accessibility and control. Unlike proprietary GPT models, it lacks tool calling and multimodal features but offers open-source weights for local deployment and customization. It's designed for use cases where fast, reliable text generation is needed without the full feature set of frontier models.