Qwen 3 VL 235B
Qwen 3 VL 235B is Alibaba's flagship multimodal model with vision and text capabilities, featuring a 262K token context window for complex reasoning tasks.
API Pricing
Cheapest on OpenRouter — 39% below avg| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| $0.200 | $0.880 | 58.0 t/s | 1.2s | 4/14/2026 | |
| $0.260 | $1.33 | 58.0 t/s | 1.2s | 4/14/2026 | |
| $0.530 | $2.66 | 58.0 t/s | 1.2s | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
Model Details
General
- Creator
- Alibaba
- Family
- Qwen
- Tier
- Flagship
- Context Window
- 262K
- Modalities
- Text, Image
Capabilities
- Tool Calling
- No
- Open Source
- No
Strengths & Limitations
- Large 262K token context window supports extensive multimodal conversations
- 235B parameter scale provides substantial reasoning capabilities
- Multimodal support handles both text and image inputs natively
- Output speed of 60.22 tokens per second for responsive generation
- Flagship-tier model from Alibaba with latest architectural improvements
- Extended context enables analysis of long documents with embedded visuals
- No function calling or tool use capabilities
- Proprietary model with no open-source weights available
- Time to first token of over 1 second may impact latency-sensitive applications
- Limited to text and image modalities without audio or video support
- Larger model size may result in higher computational costs
Key Features
About Qwen 3 VL 235B
Common Use Cases
Qwen 3 VL 235B is designed for complex multimodal applications requiring sophisticated reasoning across text and visual content. Its large context window makes it particularly suitable for analyzing lengthy documents that contain charts, diagrams, or images, such as research papers, technical manuals, or financial reports. The model excels at visual question answering, content moderation involving images, educational applications requiring diagram explanation, and enterprise document processing workflows. The flagship-tier capabilities enable advanced reasoning tasks like comparing multiple images, extracting information from complex visual layouts, and maintaining context across extended multimodal conversations.
Frequently Asked Questions
How much does Qwen 3 VL 235B cost per million tokens?
Qwen 3 VL 235B pricing varies by provider and may have different rates for text versus image tokens. Check the pricing table above for current rates across all available providers.
What is Qwen 3 VL 235B best used for?
Qwen 3 VL 235B excels at complex multimodal tasks requiring analysis of both text and images, particularly document understanding with visual elements, visual question answering, and reasoning over mixed media content within its 262K token context window.
Does Qwen 3 VL 235B support function calling or tool use?
No, Qwen 3 VL 235B does not currently support function calling or tool use capabilities. It focuses on direct text and image analysis rather than external tool integration.