LightweightZhipu

GLM-4.5 Air

GLM-4.5 Air is Zhipu's lightweight text model with a 128K token context window, optimized for speed with 82.55 tokens/second output.

Context 128K
Tier Lightweight
Tools Supported
Input from
$0.130 / 1M tokens
across 2 providers

API Pricing

Cheapest on OpenRouter 21% below avg
ProviderInput / 1MOutput / 1MSpeedTTFTUpdated
$0.130$0.85072.0 t/s612ms4/14/2026
$0.200$1.1072.0 t/s612ms4/14/2026

Prices updated daily. Last check: 4/14/2026

Model Details

General

Creator
Zhipu
Family
GLM
Tier
Lightweight
Context Window
128K
Modalities
Text

Capabilities

Tool Calling
Yes
Open Source
No
Subtypes
Chat Completion
Aliases
glm-4-5v

Strengths & Limitations

  • Fast output generation at 82.55 tokens per second
  • 128K token context window for processing long documents
  • Tool calling support with structured interactions
  • Quick response initiation with 644ms time to first token
  • Lightweight architecture optimized for efficiency
  • Part of established GLM model family ecosystem
  • No image or multimodal input support
  • Proprietary model with no open source availability
  • Lightweight tier may limit complex reasoning capabilities
  • Smaller context window compared to some flagship models
  • Limited to text-only chat completion tasks

Key Features

128K token context window
Tool calling functionality
Chat completion API
Text-only input and output
High-speed token generation
Structured response formatting
API-based deployment model

About GLM-4.5 Air

GLM-4.5 Air is a lightweight text model developed by Zhipu as part of their GLM model family. Positioned as an efficient option within the GLM lineup, this model prioritizes speed and responsiveness for applications requiring quick text generation and processing. The model operates with a 128,000 token context window and supports tool calling functionality. Performance benchmarks show an output speed of 82.55 tokens per second with a time to first token of 644 milliseconds. GLM-4.5 Air focuses exclusively on text-based interactions through chat completion capabilities, without multimodal support for images or other input types. As a lightweight model, GLM-4.5 Air targets use cases where response speed and efficiency are prioritized over maximum capability. It competes in the tier of models designed for high-throughput applications rather than complex reasoning tasks that would typically require larger, more capable models.

Common Use Cases

GLM-4.5 Air is designed for applications requiring fast, efficient text processing where response speed is crucial. Its lightweight architecture makes it suitable for high-volume chat applications, customer service automation, content generation workflows, and real-time text analysis tasks. The 128K context window enables document summarization and analysis, while tool calling support allows integration with external systems and APIs. Organizations prioritizing cost efficiency and speed over maximum model capability will find GLM-4.5 Air appropriate for production deployments requiring consistent, quick responses rather than complex reasoning or creative tasks.

Frequently Asked Questions

How much does GLM-4.5 Air cost per million tokens?

GLM-4.5 Air pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is GLM-4.5 Air best used for?

GLM-4.5 Air excels at high-volume text processing tasks requiring fast response times, such as customer service automation, content generation, and real-time chat applications. Its 128K context window and tool calling capabilities make it suitable for document analysis and API integration workflows where efficiency is prioritized over complex reasoning.

How fast is GLM-4.5 Air compared to other lightweight models?

GLM-4.5 Air generates output at 82.55 tokens per second with a 644ms time to first token. This positions it as a speed-optimized option within the lightweight model category, though specific comparisons depend on the particular models and providers being evaluated.