
GPT-OSS-120B

GPT-OSS-120B is OpenAI's lightweight open-source model with 120 billion parameters, offering fast inference and a 128K-token context window for developers.

Context 128K
Tier Lightweight
Knowledge Jan 2025
License Open Source
Input from
$0.039 / 1M tokens
across 8 providers

API Pricing

Cheapest on OpenRouter: 67% below average
Provider   Input / 1M   Output / 1M   Speed     TTFT     Updated
—          $0.039       $0.190        212 t/s   535 ms   4/14/2026
—          $0.075       $0.300        212 t/s   535 ms   4/14/2026
—          $0.090       $0.360        212 t/s   535 ms   4/3/2026
—          $0.100       $0.400        212 t/s   535 ms   4/1/2026
—          $0.150       $0.600        212 t/s   535 ms   4/14/2026
—          $0.150       $0.600        212 t/s   535 ms   4/14/2026
—          $0.150       $0.600        212 t/s   535 ms   4/14/2026
—          $0.150       $0.600        212 t/s   535 ms   4/11/2026
—          $0.176       $0.703        212 t/s   535 ms   4/13/2026

Prices updated daily. Last check: 4/14/2026
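Per-request cost at these rates is simple to estimate. A minimal sketch in Python, using the cheapest listed rates ($0.039 input / $0.190 output per 1M tokens); the token counts in the example are illustrative:

```python
# Estimate API cost for GPT-OSS-120B at given per-million-token rates.
# Rates below are the cheapest listed in the table; adjust per provider.
INPUT_RATE = 0.039   # USD per 1M input tokens
OUTPUT_RATE = 0.190  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the configured rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: a 10K-token prompt with a 1K-token completion.
cost = request_cost(10_000, 1_000)
print(f"${cost:.6f}")  # → $0.000580
```

At these rates, even a million such requests would cost roughly $580, which is why the model is positioned for high-throughput, cost-sensitive workloads.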

Model Details

General

Creator
OpenAI
Family
GPT-OSS
Tier
Lightweight
Context Window
128K
Knowledge Cutoff
Jan 2025
Modalities
Text

Capabilities

Tool Calling
No
Open Source
Yes
Subtypes
Chat Completion

Strengths & Limitations

Strengths

  • Open-source model weights available for local deployment and customization
  • Fast inference at 207.02 output tokens per second
  • 128K token context window supports long-document processing
  • January 2025 knowledge cutoff provides relatively recent training data
  • Lightweight 120B parameter count balances performance with efficiency
  • No API dependency required for inference
  • Can be fine-tuned for domain-specific applications

Limitations

  • No tool calling or function execution capabilities
  • Text-only modality; no image or multimodal support
  • Smaller parameter count than frontier models limits complex reasoning
  • Requires local infrastructure and technical expertise to deploy
  • Time to first token of 502 ms is slower than some specialized inference models

Key Features

128K token context window
Chat completion interface
Open-source model weights
Text-only input and output
Streaming response capability
Local deployment support
Fine-tuning compatibility
January 2025 knowledge cutoff
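
Since the model exposes a standard chat-completion interface, a request body can be assembled like any OpenAI-compatible payload. A minimal sketch, assuming a provider that serves the model under the identifier `gpt-oss-120b` (the exact model name and endpoint vary by provider):

```python
import json

def build_chat_request(prompt: str, stream: bool = True) -> dict:
    """Build an OpenAI-style chat-completion request body.

    The model name "gpt-oss-120b" is a placeholder; providers may
    register this model under a different identifier.
    """
    return {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,      # streaming responses are supported
        "max_tokens": 512,     # cap the completion length
    }

payload = build_chat_request("Summarize the attached report in three bullets.")
print(json.dumps(payload, indent=2))
```

POSTing this body to a provider's chat-completions endpoint (with that provider's API key) returns either a full completion or, with `stream` set to true, incremental chunks.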

About GPT-OSS-120B

GPT-OSS-120B is OpenAI's open-source language model in the GPT-OSS family, positioned as a lightweight alternative to the company's flagship GPT models. With 120 billion parameters, this model represents OpenAI's entry into the open-source model space, making their technology accessible to developers who want to run models locally or modify them for specific use cases.

The model features a 128K token context window and focuses on text-only chat completion tasks. It delivers strong inference performance with 207.02 output tokens per second and a time to first token of 502 milliseconds. The model has a knowledge cutoff of January 1, 2025, providing relatively current training data. Unlike OpenAI's proprietary models, GPT-OSS-120B does not include tool calling capabilities, reflecting its streamlined design for core language tasks.

GPT-OSS-120B serves applications requiring fast, cost-effective text generation where the full capabilities of frontier models are unnecessary. Its open-source nature allows for local deployment, fine-tuning, and integration into custom applications without API dependencies, making it suitable for organizations with data privacy requirements or those seeking to reduce operational costs while maintaining quality text generation capabilities.

Common Use Cases

GPT-OSS-120B is well-suited for applications requiring fast, reliable text generation without the complexity of tool use or multimodal capabilities. Its open-source nature makes it ideal for organizations needing local deployment for data privacy, custom fine-tuning for domain-specific tasks, or integration into products without API dependencies. Common use cases include content generation, document summarization, customer service chatbots, code commenting, and batch text processing workflows where the 128K context window enables handling of long documents. The model's lightweight design and fast inference make it particularly valuable for high-throughput applications or resource-constrained environments where deploying larger frontier models would be impractical.
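
Even a 128K-token window needs budgeting when batch-processing long documents. A rough sketch, using the common approximation of ~4 characters per token (exact counts require the model's tokenizer; the output-reserve and chunk sizes are illustrative):

```python
CONTEXT_WINDOW = 128_000   # tokens
CHARS_PER_TOKEN = 4        # rough heuristic, not a real tokenizer

def approx_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(document: str, reserved_for_output: int = 4_000) -> bool:
    """Check whether a document plus an output budget fits the window."""
    return approx_tokens(document) + reserved_for_output <= CONTEXT_WINDOW

def chunk_document(document: str, chunk_tokens: int = 100_000) -> list[str]:
    """Split a document into chunks of roughly chunk_tokens tokens each."""
    step = chunk_tokens * CHARS_PER_TOKEN
    return [document[i:i + step] for i in range(0, len(document), step)]

doc = "x" * 1_000_000  # ~250K tokens: too long for one pass
print(fits_in_context(doc))      # → False
print(len(chunk_document(doc)))  # → 3
```

For summarization pipelines, each chunk can be summarized independently and the partial summaries combined in a final pass that easily fits the window.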

Frequently Asked Questions

How much does GPT-OSS-120B cost per million tokens?

GPT-OSS-120B pricing varies by provider and deployment method. Since it's open-source, you can also run it locally without per-token costs. Check the pricing table above for current rates across API providers.

What is GPT-OSS-120B best used for?

GPT-OSS-120B excels at text generation tasks requiring fast inference and long context handling, such as content creation, document summarization, and chatbot applications. Its open-source nature makes it ideal for organizations needing local deployment, custom fine-tuning, or applications with data privacy requirements.

How does GPT-OSS-120B compare to OpenAI's proprietary GPT models?

GPT-OSS-120B trades some advanced capabilities for accessibility and control. Unlike proprietary GPT models, it lacks tool calling and multimodal features but offers open-source weights for local deployment and customization. It's designed for use cases where fast, reliable text generation is needed without the full feature set of frontier models.