Flagship · Google

Gemini 2.5 Pro

Author: Google

Gemini 2.5 Pro is Google's flagship multimodal model supporting text, image, video, and audio inputs with a 1M token context window.

Context: 1.0M tokens
Tier: Flagship
Tools: Supported
Modalities: text, image, video, audio

Input from $0.625 / 1M tokens across 3 providers


API Pricing

Cheapest: Google Cloud (Batch), with an input price 43% below the average of the listed offerings

Provider	Input / 1M	Output / 1M	Cached / 1M	Speed	TTFT	Updated
Google Cloud (Batch)	$0.625	$5.00	$0.050	140 t/s	17.6s	6/28/2026
OpenRouter	$1.25	$10.00	$0.125	140 t/s	17.6s	7/13/2026
Google Cloud	$1.25	$10.00	$0.130	140 t/s	17.6s	6/28/2026
Deep Infra	$1.25	$10.00	-	140 t/s	17.6s	7/13/2026

Prices updated daily. Last check: Jul 13, 2026
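
As a rough illustration of how the table translates into a per-request cost, the sketch below hard-codes the input and output prices listed above. The helper names and the example token counts are assumptions made for illustration and are not part of any provider API; cached-token pricing is ignored. The second helper reproduces the 43% figure as the gap between the batch input price and the mean of the four listed input prices.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Google Cloud (Batch)": (0.625, 5.00),
    "OpenRouter": (1.25, 10.00),
    "Google Cloud": (1.25, 10.00),
    "Deep Infra": (1.25, 10.00),
}


def request_cost(provider, input_tokens, output_tokens):
    """USD cost of one request (hypothetical helper; ignores cached tokens)."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def percent_below_average(provider):
    """How far a provider's input price sits below the mean listed input price."""
    input_prices = [prices[0] for prices in PRICES.values()]
    average = sum(input_prices) / len(input_prices)
    return (1 - PRICES[provider][0] / average) * 100


if __name__ == "__main__":
    # Example request: 200k input tokens, 2k output tokens (made-up sizes).
    print(f"{request_cost('Google Cloud', 200_000, 2_000):.4f}")          # 0.2700
    print(f"{request_cost('Google Cloud (Batch)', 200_000, 2_000):.4f}")  # 0.1350
    print(round(percent_below_average("Google Cloud (Batch)")))           # 43
```

For long-context workloads the input side dominates the bill, which is why the batch rate roughly halves the cost in this example.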

Performance & Benchmarks

Source: Artificial Analysis

Intelligence: 25.8 / 100
Coding: 33.3 / 100
Math: 87.7 / 100
Output Speed: 140 t/s
Latency (TTFT): 17.6s
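
The speed and latency figures above can be combined into a back-of-the-envelope estimate of how long a full response takes. This is a minimal sketch that assumes a constant generation rate after the first token, which real deployments will not match exactly; the function name is hypothetical.

```python
OUTPUT_SPEED_TPS = 140.0  # tokens per second, from the figures above
TTFT_SECONDS = 17.6       # time to first token, from the figures above


def estimated_response_seconds(output_tokens,
                               ttft=TTFT_SECONDS,
                               tokens_per_second=OUTPUT_SPEED_TPS):
    """Approximate wall-clock time: wait for the first token, then stream the rest.

    ASSUMPTION: throughput stays constant for the whole response.
    """
    return ttft + output_tokens / tokens_per_second


if __name__ == "__main__":
    # A 1,000-token answer: 17.6 s of latency plus about 7.1 s of streaming.
    print(f"{estimated_response_seconds(1_000):.1f}")  # 24.7
```

For short answers the time to first token dominates, so the measured latency matters more than output speed in interactive use.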

Reasoning & Knowledge

MMLU-Pro: 86.2%
GPQA Diamond: 84.4%
Humanity's Last Exam: 21.1%

Coding

LiveCodeBench: 80.1%
SciCode: 42.8%

Math

AIME 2025: 87.7%
AIME: 88.7%
MATH-500: 96.7%

Agentic & Tool Use

Terminal-Bench Hard: 26.5%
Terminal-Bench v2.1: 28.5%
τ²-bench: 54.1%
τ-bench Banking: 9.3%

Instruction & Long Context

IFBench: 48.7%
Long-Context Reasoning: 66.0%

Benchmarks measured Jul 2026. Scores are independent evaluations, not vendor-reported.

Model Details

General

Creator: Google
Family: Gemini
Tier: Flagship
Context Window: 1.0M
Modalities: Text, Image, Video, Audio

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion, Code Generation
Aliases: gemini-2-5-pro-computer-use

Strengths & Limitations

Strengths

1 million token context window for processing very long documents and conversations
Full multimodal support across text, image, video, and audio inputs
Tool calling with structured output capabilities
Code generation and programming assistance
Handles complex reasoning across multiple media types simultaneously
Computer use capabilities for interacting with software interfaces
Large context enables comprehensive document analysis and synthesis

Limitations

Proprietary model with no open-source weights available
Weak results on some agentic benchmarks (for example, τ-bench Banking at 9.3% and Terminal-Bench Hard at 26.5%)
High latency before the first token (17.6s measured)
Available only through hosted APIs (Google Cloud, OpenRouter, Deep Infra)

Key Features

• 1 million token context window
• Multimodal input support (text, image, video, audio)
• Tool calling with structured outputs
• Code generation and analysis
• Computer use and interface interaction
• Chat completion API
• Function calling with parallel execution
• Long-form document processing

About Gemini 2.5 Pro

Gemini 2.5 Pro is the flagship model in Google's Gemini family, developed by Google DeepMind for demanding enterprise and research workloads. It has a 1 million token context window and accepts text, image, video, and audio inputs, so it can reason across different media types within a single conversation. Tool calling and code generation let it interact with external systems and return structured outputs. Gemini 2.5 Pro competes directly with other flagship models such as Claude Opus 4.6 and GPT-5.4. Its standout feature is the combination of a very large context window with full multimodal input, which suits it to long-form document analysis, video understanding, and reasoning tasks that span multiple media types.

Common Use Cases

Gemini 2.5 Pro is designed for complex enterprise and research applications that require multimodal reasoning and extensive context handling. Its 1M token context window makes it ideal for comprehensive document analysis, legal contract review, and research synthesis across multiple sources. The multimodal capabilities enable use cases like video content analysis, audio transcription with visual context, and educational applications that combine text, images, and video. The computer use functionality supports automated workflow tasks and software testing scenarios. Organizations use it for complex coding projects, data analysis across multiple formats, and building sophisticated AI agents that need to process diverse media types while maintaining context across very long interactions.

Frequently Asked Questions

How much does Gemini 2.5 Pro cost per million tokens?

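Pricing depends on the provider and on whether you use standard or batch processing. In the pricing table above, standard rates are $1.25 per 1M input tokens and $10.00 per 1M output tokens on Google Cloud, OpenRouter, and Deep Infra, while Google Cloud batch processing costs $0.625 per 1M input tokens and $5.00 per 1M output tokens.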

What is Gemini 2.5 Pro best used for?

Gemini 2.5 Pro excels at complex multimodal tasks requiring long context, such as comprehensive document analysis, video content understanding, and building AI agents that work across multiple media types. Its 1M token context window and computer use capabilities make it particularly strong for enterprise workflows involving extensive data processing.

How does Gemini 2.5 Pro's context window compare to other flagship models?

Gemini 2.5 Pro's 1 million token context window is among the largest available in flagship models, enabling it to process very long documents, maintain extended conversations, and analyze comprehensive datasets that would exceed the limits of models with smaller context windows.
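
To make the context limit concrete, the sketch below checks whether a set of documents is likely to fit in a 1M token window while leaving room for the reply. The four-characters-per-token ratio is a crude rule of thumb assumed here for illustration, not the model's actual tokenizer, and the function names and the reserved output budget are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window stated on this page
CHARS_PER_TOKEN = 4                # ASSUMPTION: rough heuristic, not the real tokenizer


def estimate_tokens(text):
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents, reserved_output_tokens=8_000):
    """True if the estimated prompt plus a reserved output budget fits the window."""
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    small = ["a" * 400_000]    # about 100k estimated tokens
    large = ["a" * 4_000_000]  # about 1M estimated tokens, no room left for output
    print(fits_in_context(small))  # True
    print(fits_in_context(large))  # False
```

For anything close to the limit, an exact count from the provider's own token counting facility is a safer basis than this estimate.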