GLM-4.5 Air
GLM-4.5 Air is Zhipu's lightweight text model with a 128K token context window, optimized for speed with 82.55 tokens/second output.
API Pricing
Cheapest on OpenRouter — 21% below avg| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| $0.130 | $0.850 | 72.0 t/s | 612ms | 4/14/2026 | |
| $0.200 | $1.10 | 72.0 t/s | 612ms | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
Model Details
General
- Creator
- Zhipu
- Family
- GLM
- Tier
- Lightweight
- Context Window
- 128K
- Modalities
- Text
Capabilities
- Tool Calling
- Yes
- Open Source
- No
- Subtypes
- Chat Completion
- Aliases
- glm-4-5v
Strengths & Limitations
- Fast output generation at 82.55 tokens per second
- 128K token context window for processing long documents
- Tool calling support with structured interactions
- Quick response initiation with 644ms time to first token
- Lightweight architecture optimized for efficiency
- Part of established GLM model family ecosystem
- No image or multimodal input support
- Proprietary model with no open source availability
- Lightweight tier may limit complex reasoning capabilities
- Smaller context window compared to some flagship models
- Limited to text-only chat completion tasks
Key Features
About GLM-4.5 Air
Common Use Cases
GLM-4.5 Air is designed for applications requiring fast, efficient text processing where response speed is crucial. Its lightweight architecture makes it suitable for high-volume chat applications, customer service automation, content generation workflows, and real-time text analysis tasks. The 128K context window enables document summarization and analysis, while tool calling support allows integration with external systems and APIs. Organizations prioritizing cost efficiency and speed over maximum model capability will find GLM-4.5 Air appropriate for production deployments requiring consistent, quick responses rather than complex reasoning or creative tasks.
Frequently Asked Questions
How much does GLM-4.5 Air cost per million tokens?
GLM-4.5 Air pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.
What is GLM-4.5 Air best used for?
GLM-4.5 Air excels at high-volume text processing tasks requiring fast response times, such as customer service automation, content generation, and real-time chat applications. Its 128K context window and tool calling capabilities make it suitable for document analysis and API integration workflows where efficiency is prioritized over complex reasoning.
How fast is GLM-4.5 Air compared to other lightweight models?
GLM-4.5 Air generates output at 82.55 tokens per second with a 644ms time to first token. This positions it as a speed-optimized option within the lightweight model category, though specific comparisons depend on the particular models and providers being evaluated.