
GLM-4.6

GLM-4.6 is Zhipu's lightweight text model with tool-calling support and a 128K-token context window, optimized for efficiency.

Context: 128K
Tier: Lightweight
Tools: Supported
Input from: $0.100 / 1M tokens (across 2 providers)

API Pricing

Cheapest on OpenRouter: 71% below average.

Provider      Input / 1M   Output / 1M   Speed      TTFT   Updated
(not shown)   $0.100       $0.100       27.3 t/s   1.2s   4/14/2026
(not shown)   $0.600       $2.20        27.3 t/s   1.2s   4/14/2026

Prices updated daily. Last check: 4/14/2026
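Per-request cost at these per-1M-token rates is straightforward to estimate from token counts. A minimal sketch, using the cheapest listed rates ($0.100 input / $0.100 output per 1M tokens) as defaults:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_1m: float = 0.100,
                  output_per_1m: float = 0.100) -> float:
    """Estimate request cost in USD from per-1M-token rates."""
    return (input_tokens * input_per_1m
            + output_tokens * output_per_1m) / 1_000_000

# e.g. a 50K-token prompt with a 2K-token completion at the cheapest rates
cost = estimate_cost(50_000, 2_000)  # 0.0052 USD
```

At the second provider's rates ($0.600 / $2.20), the same request would cost roughly $0.0344, so provider choice matters at volume.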

Model Details

General

Creator: Zhipu
Family: GLM
Tier: Lightweight
Context Window: 128K
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion
Aliases: glm-4-32b

Strengths & Limitations

Strengths

  • Tool calling support enables API integrations and structured workflows
  • 128K token context window accommodates substantial document processing
  • 28.25 tokens per second output speed for responsive applications
  • Lightweight design reduces computational requirements compared to flagship models
  • Chat completion format supports conversational applications
  • Time to first token of 1,193 milliseconds

Limitations

  • Text-only modality lacks image or multimodal input support
  • Proprietary model with no open source availability
  • Lightweight tier may offer reduced reasoning capability compared to flagship models
  • Limited to chat completion format without other generation modes

Key Features

128K token context window
Tool calling with API integration
Chat completion interface
Text-only input processing
Streaming response capability
28.25 tokens/second output speed
Alias support as glm-4-32b
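Chat completion with streaming is typically requested via an OpenAI-compatible JSON payload. A minimal sketch of building such a request body; the model id string "glm-4.6" and any field beyond what this page lists (messages, stream) are assumptions, not values confirmed by the provider:

```python
import json

# Hypothetical OpenAI-style chat completion request body. The model id
# "glm-4.6" is an assumption; check your provider's model list for the
# exact identifier (the page also notes the alias glm-4-32b).
payload = {
    "model": "glm-4.6",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this document in one sentence."},
    ],
    "stream": True,   # the model lists streaming response capability
    "max_tokens": 512,
}

body = json.dumps(payload)  # ready to POST to a chat-completions endpoint
```

With `"stream": True`, responses arrive as incremental chunks, which pairs well with the ~1.2s time to first token: the user sees output begin quickly instead of waiting for the full completion.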

About GLM-4.6

GLM-4.6 is a lightweight text model developed by Zhipu as part of their GLM family. Positioned as an efficient option within the GLM lineup, this model focuses on delivering capable performance for standard language tasks without the computational overhead of larger flagship models.

The model features a 128K token context window and supports tool calling functionality, enabling it to interact with external APIs and structured workflows. GLM-4.6 processes text-only inputs and generates chat completions, with benchmark performance showing 28.25 output tokens per second and a time to first token of 1,193 milliseconds. The model is also known by the alias glm-4-32b.

As a lightweight tier model, GLM-4.6 serves applications requiring efficient language processing without the complexity demands that necessitate frontier-class models. It provides a balance between capability and performance for developers building applications that need reliable text generation with tool integration features.
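Tool calling in OpenAI-compatible APIs is driven by a JSON schema that describes each function the model may invoke. A hedged sketch of one such declaration; the function name, description, and parameters below are illustrative examples, not part of Zhipu's documentation:

```python
# Illustrative OpenAI-style tool declaration. Everything here
# (get_weather, its fields) is a hypothetical example; the page only
# confirms that GLM-4.6 supports tool calling, not a specific schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Such declarations are usually passed as a list alongside the messages,
# e.g. {"model": ..., "messages": [...], "tools": [weather_tool]}.
tools = [weather_tool]
```

When the model decides a tool is needed, the response contains a structured tool call (function name plus JSON arguments) for the application to execute, rather than plain text.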

Common Use Cases

GLM-4.6 is well-suited for applications requiring efficient text processing with tool integration capabilities. Its lightweight design makes it appropriate for high-volume chat applications, customer service bots, and automated workflows that need to call external APIs. The 128K context window supports document analysis, content summarization, and multi-turn conversations with substantial history. Organizations building cost-effective language applications that don't require the advanced reasoning of frontier models can leverage GLM-4.6 for reliable performance in production environments where response speed and resource efficiency are priorities.

Frequently Asked Questions

How much does GLM-4.6 cost per million tokens?

GLM-4.6 pricing varies by provider and pricing type. Check the pricing table above for current rates across all providers offering this model.

What is GLM-4.6 best used for?

GLM-4.6 excels at efficient text processing tasks including chat applications, customer service automation, and workflows requiring tool calling capabilities. Its lightweight design and 128K context window make it suitable for document processing and multi-turn conversations where performance efficiency is important.

Does GLM-4.6 support image inputs or multimodal capabilities?

No, GLM-4.6 is a text-only model that supports chat completions but does not process images or other multimodal inputs. It focuses on text generation with tool calling functionality for API integrations.