Flagship · Alibaba

Qwen 3.5 122B

Qwen 3.5 122B is Alibaba's flagship multimodal model supporting text, image, and video inputs with a 262K token context window.

Context 262K
Tier Flagship
Modalities text, image, video
Input from $0.260 / 1M tokens across 1 provider

API Pricing

Provider    Input / 1M    Output / 1M    Speed      TTFT    Updated
            $0.260        $1.56          151 t/s    1.1s    4/14/2026

Prices updated daily. Last check: 4/14/2026

Model Details

General

Creator
Alibaba
Family
Qwen
Tier
Flagship
Context Window
262K
Modalities
Text, Image, Video

Capabilities

Tool Calling
No
Open Source
No
Aliases
qwen3-5-plus-02-15

Strengths & Limitations

Strengths

  • Supports text, image, and video input modalities
  • Large 262,144-token context window for extensive content processing
  • Output speed of 139.78 tokens per second for responsive applications
  • 122-billion-parameter scale for complex reasoning tasks
  • Native multimodal processing without separate model calls
  • Flagship-tier positioning with comprehensive capabilities
  • Video understanding beyond static image analysis

Limitations

  • No function calling or tool use capabilities
  • Proprietary model with no open-source weights available
  • Time to first token of 1,024 ms may impact real-time applications
  • Limited to inference API access only
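As a quick sanity check against the 262,144-token context window listed above, the sketch below uses the common rough heuristic of about 4 characters per token. The heuristic is an assumption for illustration; the model's actual tokenizer is not described on this page and may count differently.

```python
# Rough check of whether text fits the 262,144-token context window.
# CHARS_PER_TOKEN is a heuristic, not the model's real tokenizer.

CONTEXT_WINDOW = 262_144
CHARS_PER_TOKEN = 4  # approximation for English-like text

def fits_in_context(text: str, reserved_output_tokens: int = 4_096) -> bool:
    """Estimate whether `text` plus a reserved output budget fits the window."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_output_tokens <= CONTEXT_WINDOW

# ~600,000 characters ≈ 150,000 tokens: comfortably inside the window
print(fits_in_context("hello " * 100_000))  # True
```

For production use, replace the heuristic with the provider's tokenizer or token-counting endpoint.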

Key Features

262,144 token context window
Text input and generation
Image input processing
Video content analysis
Multimodal understanding
122-billion parameter architecture
Streaming response support
Batch processing capabilities

About Qwen 3.5 122B

Qwen 3.5 122B is Alibaba's flagship model in the Qwen family, representing the company's most capable offering for complex multimodal tasks. Positioned as a comprehensive solution for enterprises and developers requiring advanced AI capabilities across multiple content types, this 122-billion-parameter model supports text, image, and video inputs with a substantial 262,144-token context window, enabling processing of lengthy documents, extensive conversations, and complex multimodal workflows.

Performance benchmarks show an output speed of 139.78 tokens per second with a time to first token of 1,024 milliseconds. The model handles multiple modalities natively, allowing users to combine text instructions with visual content in a single request. Qwen 3.5 122B targets use cases requiring sophisticated understanding across text and visual media, competing with other flagship multimodal models in enterprise and research environments. Its large context window and multimodal capabilities make it suitable for applications that combine long-form content with visual elements.
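A single request combining a text instruction with visual content might be shaped as below. This is a sketch in the OpenAI-style chat-completions format that many inference providers expose; the wire format and the use of the alias `qwen3-5-plus-02-15` as the model identifier are assumptions, so check your provider's documentation for the actual API.

```python
# Build a combined text + image request payload in the OpenAI-style
# chat-completions format. The payload shape and model identifier are
# assumptions (see lead-in); nothing here is sent over the network.

def build_multimodal_request(prompt: str, image_url: str,
                             model: str = "qwen3-5-plus-02-15") -> dict:
    """Return one request that pairs a text instruction with an image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_multimodal_request(
    "Summarize the chart in two sentences.",
    "https://example.com/chart.png",
)
```

Video input, where supported, typically follows the same pattern with a video content part in place of the image part.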

Common Use Cases

Qwen 3.5 122B is designed for complex multimodal applications requiring analysis of text, images, and video content within a single workflow. Its large context window makes it suitable for document analysis combined with visual elements, content moderation across multiple media types, educational applications involving multimedia materials, and research tasks requiring comprehensive understanding of mixed content formats. The model's flagship positioning and video capabilities make it appropriate for media analysis, content creation workflows, and enterprise applications where multimodal understanding is essential for business processes.

Frequently Asked Questions

How much does Qwen 3.5 122B cost per million tokens?

Qwen 3.5 122B pricing varies by provider and may include different rates for text and image tokens. Check the pricing table above for current rates across all available providers.
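For a back-of-the-envelope estimate using the rates listed above ($0.260 input and $1.56 output per 1M tokens), a small helper like this works. Treat it as an illustration, not a quote: rates vary by provider and over time.

```python
# Estimate request cost from the per-1M-token rates listed on this page.
INPUT_PER_1M = 0.260   # USD per 1M input tokens
OUTPUT_PER_1M = 1.56   # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PER_1M
            + output_tokens * OUTPUT_PER_1M) / 1_000_000

# e.g. a 250K-token document plus a 50K-token summary:
print(round(estimate_cost(250_000, 50_000), 3))  # 0.143
```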

What is Qwen 3.5 122B best used for?

Qwen 3.5 122B excels at multimodal tasks involving text, image, and video analysis. Its large context window and video understanding capabilities make it well-suited for content analysis, document processing with visual elements, educational applications, and enterprise workflows requiring comprehensive multimedia understanding.

Does Qwen 3.5 122B support function calling?

No, Qwen 3.5 122B does not support function calling or tool use capabilities. It focuses on multimodal understanding and generation tasks rather than agentic workflows that require external tool integration.