
Groq
The fastest AI inference
Last reviewed Mar 14, 2026
Groq provides ultra-fast LLM inference powered by its custom LPU (Language Processing Unit) hardware, offering the fastest token generation speeds in the industry.
LLM API Pricing
Pay-per-token pricing. Prices shown per 1M tokens.
Prices last updated: April 27, 2026
| Model | Creator | Context | Input/1M | Output/1M | Updated |
|---|---|---|---|---|---|
|  | Meta | 128K | $0.050 | $0.080 | 4/12/2026 |
|  | OpenAI | 128K | $0.075 | $0.300 | 4/27/2026 |
|  | Meta | 328K | $0.110 | $0.340 | 4/27/2026 |
|  | OpenAI | 128K | $0.150 | $0.600 | 4/27/2026 |
|  | Alibaba | 41K | $0.290 | $0.590 | 4/27/2026 |
|  | Meta | 128K | $0.590 | $0.790 | 4/27/2026 |
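As a quick sanity check on the per-token math: a request costs (input tokens / 1M) × input rate plus (output tokens / 1M) × output rate. The sketch below uses the $0.075 / $0.300 row from the table; the token counts are illustrative values, not Groq defaults.

```python
# Rough per-request cost estimate under pay-per-token pricing.
# Prices are per 1M tokens; the token counts below are made-up examples.
INPUT_PRICE_PER_1M = 0.075   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_1M = 0.300  # USD per 1M output tokens (from the table above)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the rates above."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_1M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_1M

# Example: a 2,000-token prompt with a 500-token completion
print(f"${request_cost(2_000, 500):.6f}")  # -> $0.000300
```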
Pros & Cons
Advantages
- Fastest inference speeds in the industry (500+ tokens/second)
- OpenAI-compatible API for easy integration
- Competitive pricing for open-source models
- Free tier available for testing
Limitations
- Limited model selection compared to larger providers
- Focus on inference only - no training capabilities
- Newer platform with less ecosystem maturity
Key Features
LPU-Powered Inference
Custom Language Processing Units deliver industry-leading inference speeds
OpenAI-Compatible API
Drop-in replacement for the OpenAI API with minimal code changes (see the sketch after this list)
Free Tier Available
Generous free tier for experimentation and small projects
Ultra-Low Latency
Sub-second time-to-first-token for interactive applications
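To illustrate the "minimal code changes" point: with the standard OpenAI Python SDK, pointing the client at Groq's OpenAI-compatible base URL is typically the only change. This is a sketch, not official sample code; the base URL and the `llama-3.1-8b-instant` model ID are assumptions to verify against the current Groq docs.

```python
import os
from openai import OpenAI  # standard OpenAI Python SDK

# Only the base_url and API key differ from a stock OpenAI setup.
# Base URL and model ID are assumptions; confirm them in Groq's documentation.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # example model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```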
Pricing Options
| Option | Details |
|---|---|
| Pay-per-token | Simple token-based pricing with separate input/output rates |
| Free tier | Rate-limited free access for development and testing |
Availability & Support
Regions
Global availability via cloud infrastructure
Support
Documentation, Discord community, email support
Getting Started
1. Create an account: Sign up at console.groq.com with email or OAuth
2. Get API key: Generate an API key from the console dashboard
3. Make API calls: Use the OpenAI-compatible endpoint with your preferred model (see the example below)
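As a concrete version of step 3, the sketch below posts a chat completion directly to the OpenAI-compatible endpoint using the API key from step 2. The base URL and model ID are assumptions and may change; any OpenAI-compatible SDK can be substituted for the raw HTTP call.

```python
import os
import requests  # plain HTTP; any OpenAI-compatible client also works

# Endpoint follows the OpenAI chat-completions format; the base URL is an
# assumption here - confirm it in the Groq console or docs.
url = "https://api.groq.com/openai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "llama-3.1-8b-instant",  # example model ID; use any listed model
    "messages": [{"role": "user", "content": "What is an LPU?"}],
}

resp = requests.post(url, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```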