LightweightZhipu

GLM-4.5 Air

Name: GLM-4.5 Air
Availability: InStock
Author: Zhipu

GLM-4.5 Air is Zhipu's lightweight text model with a 128K token context window, optimized for speed with 82.55 tokens/second output.

Context 128K

Tier Lightweight

Tools Supported

Input from

$0.130 / 1M tokens

across 3 providers

Compare Prices

API Pricing

Cheapest on OpenRouter — 20% below avg

Provider	Input / 1M	Output / 1M	Cached / 1M	Speed	TTFT	Updated
OpenRouter	$0.130	$0.850	$0.025	67.8 t/s	1.3s	7/13/2026
IO.NET	$0.157	$0.937	$0.078	67.8 t/s	1.3s	7/13/2026
Together AI	$0.200	$1.10	-	67.8 t/s	1.3s	7/13/2026

Prices updated daily. Last check: Jul 13, 2026

Performance & Benchmarks

Source: Artificial Analysis →

Intelligence

16.5 / 100

Math

80.7 / 100

Output Speed

67.8 t/s

Latency (TTFT)

1.3s

Reasoning & Knowledge

MMLU-Pro81.5%
GPQA Diamond73.3%
Humanity's Last Exam6.8%

Coding

LiveCodeBench68.4%
SciCode30.6%

Math

AIME 202580.7%
AIME67.3%
MATH-50096.5%

Agentic & Tool Use

Terminal-Bench Hard20.5%
τ²-bench46.5%

Instruction & Long Context

IFBench37.6%
Long-Context Reasoning43.7%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Zhipu
Family: GLM
Tier: Lightweight
Context Window: 128K
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion
Aliases: glm-4-5v

Strengths & Limitations

Strengths

Fast output generation at 82.55 tokens per second
128K token context window for processing long documents
Tool calling support with structured interactions
Quick response initiation with 644ms time to first token
Lightweight architecture optimized for efficiency
Part of established GLM model family ecosystem

Limitations

No image or multimodal input support
Proprietary model with no open source availability
Lightweight tier may limit complex reasoning capabilities
Smaller context window compared to some flagship models
Limited to text-only chat completion tasks

Key Features

•128K token context window

•Tool calling functionality

•Chat completion API

•Text-only input and output

•High-speed token generation

•Structured response formatting

•API-based deployment model

About GLM-4.5 Air

GLM-4.5 Air is a lightweight text model developed by Zhipu as part of their GLM model family. Positioned as an efficient option within the GLM lineup, this model prioritizes speed and responsiveness for applications requiring quick text generation and processing. The model operates with a 128,000 token context window and supports tool calling functionality. Performance benchmarks show an output speed of 82.55 tokens per second with a time to first token of 644 milliseconds. GLM-4.5 Air focuses exclusively on text-based interactions through chat completion capabilities, without multimodal support for images or other input types. As a lightweight model, GLM-4.5 Air targets use cases where response speed and efficiency are prioritized over maximum capability. It competes in the tier of models designed for high-throughput applications rather than complex reasoning tasks that would typically require larger, more capable models.

Common Use Cases

GLM-4.5 Air is designed for applications requiring fast, efficient text processing where response speed is crucial. Its lightweight architecture makes it suitable for high-volume chat applications, customer service automation, content generation workflows, and real-time text analysis tasks. The 128K context window enables document summarization and analysis, while tool calling support allows integration with external systems and APIs. Organizations prioritizing cost efficiency and speed over maximum model capability will find GLM-4.5 Air appropriate for production deployments requiring consistent, quick responses rather than complex reasoning or creative tasks.

Frequently Asked Questions

How much does GLM-4.5 Air cost per million tokens?

GLM-4.5 Air pricing varies by provider and pricing type (standard vs batch). Check the pricing table above for current rates across all providers.

What is GLM-4.5 Air best used for?

GLM-4.5 Air excels at high-volume text processing tasks requiring fast response times, such as customer service automation, content generation, and real-time chat applications. Its 128K context window and tool calling capabilities make it suitable for document analysis and API integration workflows where efficiency is prioritized over complex reasoning.

How fast is GLM-4.5 Air compared to other lightweight models?

GLM-4.5 Air generates output at 82.55 tokens per second with a 644ms time to first token. This positions it as a speed-optimized option within the lightweight model category, though specific comparisons depend on the particular models and providers being evaluated.