
Groq

The fastest AI inference

Inference specialist · 🇺🇸 US · Tags: inference, llm, fast

Last reviewed Mar 14, 2026

Groq provides ultra-fast LLM inference powered by its custom LPU (Language Processing Unit) hardware, offering the fastest token-generation speeds in the industry.

6 LLM models · from $0.05 per 1M input tokens

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: April 27, 2026

| Model | Creator | Context | Input /1M | Output /1M | Updated |
| --- | --- | --- | --- | --- | --- |
| — | Meta | 128K | $0.050 | $0.080 | 04/12/2026 |
| — | OpenAI | 128K | $0.075 | $0.300 | 04/27/2026 |
| — | Meta | 328K | $0.110 | $0.340 | 04/27/2026 |
| — | OpenAI | 128K | $0.150 | $0.600 | 04/27/2026 |
| — | Alibaba | 41K | $0.290 | $0.590 | 04/27/2026 |
| — | Meta | 128K | $0.590 | $0.790 | 04/27/2026 |
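To see how the per-1M-token rates translate into a bill, here is a minimal cost calculator. The function and the example rates below are illustrative (the rates are taken from the cheapest row of the table), not an official Groq billing formula:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """Cost in USD, given separate per-1M-token input/output rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: 2M input + 0.5M output tokens at the $0.050 / $0.080 row
print(round(cost_usd(2_000_000, 500_000, 0.050, 0.080), 3))  # 0.14
```

Because input and output are billed at different rates, chat workloads with long prompts and short replies are dominated by the input rate, while generation-heavy workloads are dominated by the output rate.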

Pros & Cons

Advantages

  • Fastest inference speeds in the industry (500+ tokens/second)
  • OpenAI-compatible API for easy integration
  • Competitive pricing for open-source models
  • Free tier available for testing

Limitations

  • Limited model selection compared to larger providers
  • Focus on inference only - no training capabilities
  • Newer platform with less ecosystem maturity

Key Features

LPU-Powered Inference

Custom Language Processing Units deliver industry-leading inference speeds

OpenAI-Compatible API

Drop-in replacement for OpenAI API with minimal code changes
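"OpenAI-compatible" means the request shape matches the OpenAI Chat Completions API, so only the base URL and API key change. A standard-library sketch of what such a request looks like, assuming Groq's documented base URL `https://api.groq.com/openai/v1` and a placeholder model id:

```python
import json
import os

# Groq exposes an OpenAI-compatible endpoint under this base URL
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, messages: list, stream: bool = False):
    """Assemble the URL, headers, and JSON body for a chat-completion call.

    The payload shape follows the OpenAI Chat Completions API, which is why
    existing OpenAI client code works with only the base URL and key swapped.
    """
    url = f"{GROQ_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return url, headers, body

url, headers, body = build_chat_request(
    "llama-3.1-8b-instant",  # placeholder model id; check the console for current ids
    [{"role": "user", "content": "Hello"}],
)
print(url)  # https://api.groq.com/openai/v1/chat/completions
```

In practice you would point the official `openai` Python SDK at the same base URL rather than building requests by hand; this sketch just makes the compatible wire format explicit.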

Free Tier Available

Generous free tier for experimentation and small projects

Ultra-Low Latency

Sub-second time-to-first-token for interactive applications

Pricing Options

| Option | Details |
| --- | --- |
| Pay-per-token | Simple token-based pricing with separate input/output rates |
| Free tier | Rate-limited free access for development and testing |

Availability & Support

Regions

Global availability via cloud infrastructure

Support

Documentation, Discord community, email support

Getting Started

  1. Create an account

     Sign up at console.groq.com with email or OAuth

  2. Get API key

     Generate an API key from the console dashboard

  3. Make API calls

     Use the OpenAI-compatible endpoint with your preferred model
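The three steps above fit in a short script. A standard-library sketch, assuming the API key is in the `GROQ_API_KEY` environment variable and using a placeholder model id (pick a real one from the console):

```python
import json
import os
import urllib.request

API_URL = "https://api.groq.com/openai/v1/chat/completions"

def extract_reply(response_json: str) -> str:
    """Pull the assistant's message out of an OpenAI-style response body."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]

def ask(prompt: str, model: str) -> str:
    """Send one chat-completion request to Groq and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(resp.read().decode())

# Only makes a network call when a key is actually configured
if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    print(ask("Say hello in five words.", "llama-3.1-8b-instant"))
```

The same call works through the official `openai` SDK by setting its `base_url` to Groq's endpoint, which is the usual route for existing OpenAI codebases.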