GPT-OSS-120B
GPT-OSS-120B is OpenAI's lightweight open-source model with 120 billion parameters, offering fast inference and a 128K-token context window for developers.
API Pricing
Cheapest on OpenRouter (67% below average).

| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $0.039 | $0.190 | 212 t/s | 535ms | 4/14/2026 |
| | $0.075 | $0.300 | 212 t/s | 535ms | 4/14/2026 |
| | $0.090 | $0.360 | 212 t/s | 535ms | 4/3/2026 |
| | $0.100 | $0.400 | 212 t/s | 535ms | 4/1/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/14/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/14/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/14/2026 |
| | $0.150 | $0.600 | 212 t/s | 535ms | 4/11/2026 |
| | $0.176 | $0.703 | 212 t/s | 535ms | 4/13/2026 |
Prices updated daily. Last check: 4/14/2026
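To see what these per-million-token rates mean in practice, the cost of a single request can be estimated by scaling the input and output token counts. A minimal sketch using the cheapest rates listed above ($0.039 input / $0.190 output per 1M tokens); the token counts are hypothetical:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Estimate the dollar cost of one request at per-1M-token rates."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# Example: a 50K-token prompt with a 2K-token reply at the cheapest listed rate.
cost = request_cost(input_tokens=50_000, output_tokens=2_000,
                    input_per_m=0.039, output_per_m=0.190)
print(f"${cost:.5f}")  # $0.00233
```

The same function works for any row of the table, which makes it easy to compare providers for a given workload shape.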
Model Details
General
- Creator: OpenAI
- Family: GPT-OSS
- Tier: Lightweight
- Context Window: 128K
- Knowledge Cutoff: Jan 2025
- Modalities: Text
Capabilities
- Tool Calling: No
- Open Source: Yes
- Subtypes: Chat Completion
Strengths & Limitations
- Open-source model weights available for local deployment and customization
- Fast inference speed at roughly 212 output tokens per second
- 128K token context window supports long document processing
- Knowledge cutoff of January 2025 provides recent training data
- Lightweight 120B parameter count balances performance with efficiency
- No API dependency required for inference
- Can be fine-tuned for domain-specific applications
- No tool calling or function execution capabilities
- Text-only modality with no image or multimodal support
- Smaller parameter count than frontier models limits complex reasoning
- Requires local infrastructure and technical expertise to deploy
- Time to first token of around 535ms is slower than that of some specialized inference providers
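The throughput and latency figures above combine into a simple end-to-end estimate: for a streamed response, total wall-clock time is roughly the time to first token plus the output length divided by the generation speed. A sketch using the table's figures (212 t/s, 535ms); real latency varies by provider and load:

```python
def response_time_s(output_tokens: int, tokens_per_s: float = 212.0,
                    ttft_ms: float = 535.0) -> float:
    """Rough wall-clock time for a streamed response: TTFT + generation time."""
    return ttft_ms / 1000.0 + output_tokens / tokens_per_s

print(f"{response_time_s(1_000):.2f}s")  # ~5.25s for a 1,000-token reply
```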
About GPT-OSS-120B
Common Use Cases
GPT-OSS-120B is well-suited for applications requiring fast, reliable text generation without the complexity of tool use or multimodal capabilities. Its open-source nature makes it ideal for organizations needing local deployment for data privacy, custom fine-tuning for domain-specific tasks, or integration into products without API dependencies. Common use cases include content generation, document summarization, customer service chatbots, code commenting, and batch text processing workflows where the 128K context window enables handling of long documents. The model's lightweight design and fast inference make it particularly valuable for high-throughput applications or resource-constrained environments where deploying larger frontier models would be impractical.
Frequently Asked Questions
How much does GPT-OSS-120B cost per million tokens?
GPT-OSS-120B pricing varies by provider and deployment method. Since it's open-source, you can also run it locally without per-token costs. Check the pricing table above for current rates across API providers.
What is GPT-OSS-120B best used for?
GPT-OSS-120B excels at text generation tasks requiring fast inference and long context handling, such as content creation, document summarization, and chatbot applications. Its open-source nature makes it ideal for organizations needing local deployment, custom fine-tuning, or applications with data privacy requirements.
How does GPT-OSS-120B compare to OpenAI's proprietary GPT models?
GPT-OSS-120B trades some advanced capabilities for accessibility and control. Unlike proprietary GPT models, it lacks tool calling and multimodal features but offers open-source weights for local deployment and customization. It's designed for use cases where fast, reliable text generation is needed without the full feature set of frontier models.