
Groq

The fastest AI inference

Inference specialist · 🇺🇸 US · Tags: inference, llm, fast

Last reviewed Mar 14, 2026

Groq provides ultra-fast LLM inference powered by its custom LPU (Language Processing Unit) hardware, offering the fastest token-generation speeds in the industry.

6 LLM models · from $0.05 per 1M input tokens

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: April 27, 2026

| Model | Creator | Context | Input /1M | Output /1M | Updated |
| --- | --- | --- | --- | --- | --- |
| — | Meta | 128K | $0.050 | $0.080 | 04/12/2026 |
| — | OpenAI | 128K | $0.075 | $0.300 | 04/27/2026 |
| — | Meta | 328K | $0.110 | $0.340 | 04/27/2026 |
| — | OpenAI | 128K | $0.150 | $0.600 | 04/27/2026 |
| — | Alibaba | 41K | $0.290 | $0.590 | 04/27/2026 |
| — | Meta | 128K | $0.590 | $0.790 | 04/27/2026 |
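To see how the per-1M-token rates translate into a bill, here is a minimal cost calculator. The function and the example rates below are illustrative (the rates are taken from the cheapest row of the table), not an official Groq billing formula:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """Cost in USD, given separate per-1M-token input/output rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: 2M input + 0.5M output tokens at the $0.050 / $0.080 row
print(round(cost_usd(2_000_000, 500_000, 0.050, 0.080), 3))  # 0.14
```

Because input and output are billed at different rates, chat workloads with long prompts and short replies are dominated by the input rate, while generation-heavy workloads are dominated by the output rate.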

Pros & Cons

Advantages

  • Fastest inference speeds in the industry (500+ tokens/second)
  • OpenAI-compatible API for easy integration
  • Competitive pricing for open-source models
  • Free tier available for testing

Limitations

  • Limited model selection compared to larger providers
  • Focus on inference only - no training capabilities
  • Newer platform with less ecosystem maturity

Key Features

LPU-Powered Inference

Custom Language Processing Units deliver industry-leading inference speeds

OpenAI-Compatible API

Drop-in replacement for OpenAI API with minimal code changes
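"OpenAI-compatible" means the request shape matches the OpenAI Chat Completions API, so only the base URL and API key change. A standard-library sketch of what such a request looks like, assuming Groq's documented base URL `https://api.groq.com/openai/v1` and a placeholder model id:

```python
import json
import os

# Groq exposes an OpenAI-compatible endpoint under this base URL
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, messages: list, stream: bool = False):
    """Assemble the URL, headers, and JSON body for a chat-completion call.

    The payload shape follows the OpenAI Chat Completions API, which is why
    existing OpenAI client code works with only the base URL and key swapped.
    """
    url = f"{GROQ_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return url, headers, body

url, headers, body = build_chat_request(
    "llama-3.1-8b-instant",  # placeholder model id; check the console for current ids
    [{"role": "user", "content": "Hello"}],
)
print(url)  # https://api.groq.com/openai/v1/chat/completions
```

In practice you would point the official `openai` Python SDK at the same base URL rather than building requests by hand; this sketch just makes the compatible wire format explicit.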

Free Tier Available

Generous free tier for experimentation and small projects

Ultra-Low Latency

Sub-second time-to-first-token for interactive applications

Pricing Options

| Option | Details |
| --- | --- |
| Pay-per-token | Simple token-based pricing with separate input/output rates |
| Free tier | Rate-limited free access for development and testing |

Availability & Support

Regions

Global availability via cloud infrastructure

Support

Documentation, Discord community, email support

Getting Started

  1. Create an account

     Sign up at console.groq.com with email or OAuth

  2. Get API key

     Generate an API key from the console dashboard

  3. Make API calls

     Use the OpenAI-compatible endpoint with your preferred model
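The three steps above fit in a short script. A standard-library sketch, assuming the API key is in the `GROQ_API_KEY` environment variable and using a placeholder model id (pick a real one from the console):

```python
import json
import os
import urllib.request

API_URL = "https://api.groq.com/openai/v1/chat/completions"

def extract_reply(response_json: str) -> str:
    """Pull the assistant's message out of an OpenAI-style response body."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]

def ask(prompt: str, model: str) -> str:
    """Send one chat-completion request to Groq and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(resp.read().decode())

# Only makes a network call when a key is actually configured
if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    print(ask("Say hello in five words.", "llama-3.1-8b-instant"))
```

The same call works through the official `openai` SDK by setting its `base_url` to Groq's endpoint, which is the usual route for existing OpenAI codebases.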