FlagshipBaidu

ERNIE 4.5 VL 424B

ERNIE 4.5 VL 424B is Baidu's flagship multimodal model with 424 billion parameters, supporting text and image inputs with a 123K token context window.

Context 123K
Tier Flagship
Modalities text, image
Input from
$0.420 / 1M tokens
across 1 provider

API Pricing

ProviderInput / 1MOutput / 1MUpdated
$0.420$1.254/14/2026

Prices updated daily. Last check: 4/14/2026

Model Details

General

Creator
Baidu
Family
ERNIE
Tier
Flagship
Context Window
123K
Modalities
Text, Image

Capabilities

Tool Calling
No
Open Source
No

Strengths & Limitations

  • 424 billion parameters provide substantial model capacity
  • Multimodal support for both text and image inputs
  • 123,000 token context window handles lengthy documents
  • Flagship tier positioning within ERNIE model family
  • Developed by Baidu with focus on Chinese language capabilities
  • Large parameter count suitable for complex reasoning tasks
  • Vision-language understanding in single model
  • No tool calling or function execution capabilities
  • Proprietary model with no open-source availability
  • Smaller context window compared to some competing flagship models
  • Limited to text and image modalities only

Key Features

424 billion parameter architecture
123,000 token context window
Text input and processing
Image input and understanding
Multimodal reasoning capabilities
Chinese and multilingual support
Vision-language integration
Large-scale model inference

About ERNIE 4.5 VL 424B

ERNIE 4.5 VL 424B is Baidu's flagship model in the ERNIE family, featuring 424 billion parameters and multimodal capabilities. As a proprietary model from one of China's leading AI companies, it represents the top tier of Baidu's language model offerings and demonstrates the company's advancement in large-scale model development. The model supports both text and image inputs with a 123,000 token context window, enabling it to process lengthy documents alongside visual content. With its substantial 424B parameter count, it is designed to handle complex reasoning tasks, multimodal understanding, and sophisticated language processing across various domains. ERNIE 4.5 VL 424B is positioned for enterprise applications requiring advanced multimodal AI capabilities, particularly in Chinese language contexts where Baidu's models have shown strong performance. The model competes with other flagship multimodal models in the market, offering organizations an alternative for complex AI workloads that involve both text and visual understanding.

Common Use Cases

ERNIE 4.5 VL 424B is suited for complex enterprise applications requiring multimodal AI capabilities, particularly those involving both text and visual content analysis. Its 424B parameter count and flagship positioning make it appropriate for sophisticated reasoning tasks, document analysis with visual elements, content generation, and applications requiring deep understanding of both textual and visual information. The model is especially valuable for organizations needing advanced AI capabilities in Chinese language contexts or those requiring a single model to handle diverse multimodal workloads without tool integration.

Frequently Asked Questions

How much does ERNIE 4.5 VL 424B cost per million tokens?

ERNIE 4.5 VL 424B pricing varies by provider and may differ for text versus image inputs. Check the pricing table above for current rates across all available providers.

What is ERNIE 4.5 VL 424B best used for?

ERNIE 4.5 VL 424B excels at complex multimodal tasks requiring both text and image understanding, such as document analysis with visual elements, content creation involving images, and sophisticated reasoning tasks. Its 424B parameters make it suitable for enterprise applications requiring advanced AI capabilities.

Does ERNIE 4.5 VL 424B support tool calling or function execution?

No, ERNIE 4.5 VL 424B does not support tool calling or function execution capabilities. It focuses on text and image processing without external tool integration features.