ERNIE 4.5 VL 28B
ERNIE 4.5 VL 28B is Baidu's lightweight multimodal model with vision capabilities and a 30K token context window for efficient text and image processing.
API Pricing
| Provider | Input / 1M | Output / 1M | Updated |
|---|---|---|---|
| $0.140 | $0.560 | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
Model Details
General
- Creator
- Baidu
- Family
- ERNIE
- Tier
- Lightweight
- Context Window
- 30K
- Modalities
- Text, Image
Capabilities
- Tool Calling
- No
- Open Source
- No
Strengths & Limitations
- Multimodal support for both text and image inputs
- 30,000 token context window for processing longer documents and conversations
- 28 billion parameter architecture balances capability with efficiency
- Part of Baidu's established ERNIE model family with Chinese language optimization
- Lightweight tier positioning enables faster inference compared to flagship models
- Vision-language capabilities for image analysis and multimodal reasoning
- No tool calling or function execution capabilities
- Proprietary model with weights not publicly available
- Smaller context window compared to flagship models in the 100K+ range
- Limited to text and image modalities without audio or video support
- Lightweight tier may have reduced reasoning complexity versus flagship alternatives
Key Features
About ERNIE 4.5 VL 28B
Common Use Cases
ERNIE 4.5 VL 28B is well-suited for applications requiring efficient multimodal processing, particularly in scenarios involving Chinese language content and visual analysis. Its lightweight architecture makes it appropriate for document analysis with images, e-commerce product description generation, content moderation involving both text and images, educational applications that need to process textbooks with diagrams, and customer service scenarios where visual context is important. The 30K context window supports moderate-length conversations and document processing while maintaining cost efficiency. Organizations needing vision-language capabilities at scale, particularly in Chinese markets or multilingual applications, can benefit from its balanced performance-to-efficiency ratio without requiring the computational resources of flagship multimodal models.
Frequently Asked Questions
How much does ERNIE 4.5 VL 28B cost per million tokens?
ERNIE 4.5 VL 28B pricing varies by provider and may have different rates for text and image processing. Check the pricing table above for current rates across all available providers.
What is ERNIE 4.5 VL 28B best used for?
ERNIE 4.5 VL 28B excels at multimodal tasks requiring both text and image understanding, including document analysis with visual elements, image description, visual question answering, and content moderation. Its lightweight architecture makes it particularly suitable for high-volume applications and scenarios where Chinese language support is important.
Does ERNIE 4.5 VL 28B support tool calling or function execution?
No, ERNIE 4.5 VL 28B does not support tool calling or function execution capabilities. The model is focused on text and image understanding tasks rather than agentic workflows that require external tool integration.