
GLM-4.6

GLM-4.6 is Zhipu's lightweight text model with tool-calling support and a 128K-token context window, optimized for efficiency.

Context: 128K
Tier: Lightweight
Tools: Supported
Input from: $0.100 / 1M tokens (across 2 providers)

API Pricing

Cheapest on OpenRouter: 71% below average.

Provider      Input / 1M   Output / 1M   Speed      TTFT   Updated
(not shown)   $0.100       $0.100       27.3 t/s   1.2s   4/14/2026
(not shown)   $0.600       $2.20        27.3 t/s   1.2s   4/14/2026

Prices updated daily. Last check: 4/14/2026
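Per-request cost at these per-1M-token rates is straightforward to estimate from token counts. A minimal sketch, using the cheapest listed rates ($0.100 input / $0.100 output per 1M tokens) as defaults:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_1m: float = 0.100,
                  output_per_1m: float = 0.100) -> float:
    """Estimate request cost in USD from per-1M-token rates."""
    return (input_tokens * input_per_1m
            + output_tokens * output_per_1m) / 1_000_000

# e.g. a 50K-token prompt with a 2K-token completion at the cheapest rates
cost = estimate_cost(50_000, 2_000)  # 0.0052 USD
```

At the second provider's rates ($0.600 / $2.20), the same request would cost roughly $0.0344, so provider choice matters at volume.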

Model Details

General

Creator: Zhipu
Family: GLM
Tier: Lightweight
Context Window: 128K
Modalities: Text

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion
Aliases: glm-4-32b

Strengths & Limitations

Strengths

  • Tool calling support enables API integrations and structured workflows
  • 128K token context window accommodates substantial document processing
  • 28.25 tokens per second output speed for responsive applications
  • Lightweight design reduces computational requirements compared to flagship models
  • Chat completion format supports conversational applications
  • Time to first token of 1,193 milliseconds

Limitations

  • Text-only modality lacks image or multimodal input support
  • Proprietary model with no open source availability
  • Lightweight tier may offer reduced reasoning capability compared to flagship models
  • Limited to chat completion format without other generation modes

Key Features

128K token context window
Tool calling with API integration
Chat completion interface
Text-only input processing
Streaming response capability
28.25 tokens/second output speed
Alias support as glm-4-32b
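Chat completion with streaming is typically requested via an OpenAI-compatible JSON payload. A minimal sketch of building such a request body; the model id string "glm-4.6" and any field beyond what this page lists (messages, stream) are assumptions, not values confirmed by the provider:

```python
import json

# Hypothetical OpenAI-style chat completion request body. The model id
# "glm-4.6" is an assumption; check your provider's model list for the
# exact identifier (the page also notes the alias glm-4-32b).
payload = {
    "model": "glm-4.6",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this document in one sentence."},
    ],
    "stream": True,   # the model lists streaming response capability
    "max_tokens": 512,
}

body = json.dumps(payload)  # ready to POST to a chat-completions endpoint
```

With `"stream": True`, responses arrive as incremental chunks, which pairs well with the ~1.2s time to first token: the user sees output begin quickly instead of waiting for the full completion.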

About GLM-4.6

GLM-4.6 is a lightweight text model developed by Zhipu as part of their GLM family. Positioned as an efficient option within the GLM lineup, this model focuses on delivering capable performance for standard language tasks without the computational overhead of larger flagship models.

The model features a 128K token context window and supports tool calling functionality, enabling it to interact with external APIs and structured workflows. GLM-4.6 processes text-only inputs and generates chat completions, with benchmark performance showing 28.25 output tokens per second and a time to first token of 1,193 milliseconds. The model is also known by the alias glm-4-32b.

As a lightweight tier model, GLM-4.6 serves applications requiring efficient language processing without the complexity demands that necessitate frontier-class models. It provides a balance between capability and performance for developers building applications that need reliable text generation with tool integration features.
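Tool calling in OpenAI-compatible APIs is driven by a JSON schema that describes each function the model may invoke. A hedged sketch of one such declaration; the function name, description, and parameters below are illustrative examples, not part of Zhipu's documentation:

```python
# Illustrative OpenAI-style tool declaration. Everything here
# (get_weather, its fields) is a hypothetical example; the page only
# confirms that GLM-4.6 supports tool calling, not a specific schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Such declarations are usually passed as a list alongside the messages,
# e.g. {"model": ..., "messages": [...], "tools": [weather_tool]}.
tools = [weather_tool]
```

When the model decides a tool is needed, the response contains a structured tool call (function name plus JSON arguments) for the application to execute, rather than plain text.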

Common Use Cases

GLM-4.6 is well-suited for applications requiring efficient text processing with tool integration capabilities. Its lightweight design makes it appropriate for high-volume chat applications, customer service bots, and automated workflows that need to call external APIs. The 128K context window supports document analysis, content summarization, and multi-turn conversations with substantial history. Organizations building cost-effective language applications that don't require the advanced reasoning of frontier models can leverage GLM-4.6 for reliable performance in production environments where response speed and resource efficiency are priorities.

Frequently Asked Questions

How much does GLM-4.6 cost per million tokens?

GLM-4.6 pricing varies by provider and pricing type. Check the pricing table above for current rates across all providers offering this model.

What is GLM-4.6 best used for?

GLM-4.6 excels at efficient text processing tasks including chat applications, customer service automation, and workflows requiring tool calling capabilities. Its lightweight design and 128K context window make it suitable for document processing and multi-turn conversations where performance efficiency is important.

Does GLM-4.6 support image inputs or multimodal capabilities?

No, GLM-4.6 is a text-only model that supports chat completions but does not process images or other multimodal inputs. It focuses on text generation with tool calling functionality for API integrations.