Flagship · Google

Gemini 3 Pro

Gemini 3 Pro is Google's flagship multimodal model supporting text, image, video, and audio inputs with a 1M token context window.

Context: 1.0M
Tier: Flagship
Knowledge: Jun 2025
Tools: Supported
Modalities: text, image, video, audio
Input from $1.00 / 1M tokens across 2 providers

API Pricing

Cheapest on Google Cloud (40% below average)

Provider     Input / 1M   Output / 1M   Speed     TTFT    Updated
(not shown)  $1.00        $6.00         134 t/s   28.7s   4/13/2026
(not shown)  $2.00        $12.00        134 t/s   28.7s   4/14/2026
(not shown)  $2.00        $12.00        134 t/s   28.7s   4/13/2026

Prices updated daily. Last check: 4/14/2026
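
As a rough illustration of how the per-1M-token rates translate into a per-request cost, the sketch below uses the three price rows listed above. The function names are hypothetical, and the "40% below average" check assumes the average is the plain mean of the three listed input prices, which the page does not state.

```python
# Per-1M-token prices in USD, copied from the pricing table.
# Provider names are not shown in the table, so rows are kept by position.
PRICES = [
    {"input": 1.00, "output": 6.00},
    {"input": 2.00, "output": 12.00},
    {"input": 2.00, "output": 12.00},
]


def request_cost(input_tokens, output_tokens, price):
    """Return the USD cost of one request at the given per-1M-token rates."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def percent_below_average(prices, key="input"):
    """Return how far the cheapest price sits below the mean, in percent.

    ASSUMPTION: the average is an unweighted mean of the listed rows.
    """
    values = [p[key] for p in prices]
    average = sum(values) / len(values)
    return (average - min(values)) / average * 100


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply on the cheapest row.
    print(f"cost: ${request_cost(100_000, 2_000, PRICES[0]):.3f}")
    print(f"cheapest input price is {percent_below_average(PRICES):.0f}% below the mean")
```

Under that assumption, the cheapest input price of $1.00 against a mean of about $1.67 is consistent with the "40% below average" label.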

Model Details

General

Creator: Google
Family: Gemini
Tier: Flagship
Context Window: 1.0M
Knowledge Cutoff: Jun 2025
Modalities: Text, Image, Video, Audio

Capabilities

Tool Calling: Yes
Open Source: No
Subtypes: Chat Completion, Code Generation

Strengths & Limitations

  • Supports four modalities: text, image, video, and audio input
  • 1 million token context window for processing extensive documents
  • Tool calling functionality with structured outputs
  • Output speed of 139 tokens per second
  • Knowledge cutoff of June 2025
  • Code generation capabilities alongside chat completion
  • Video input support distinguishes it from many competing models
  • Proprietary model with no open-source weights available
  • Time to first token of 20.4 seconds is slower than some competitors
  • No longer the newest model in Google's lineup with Gemini 3.1 Pro available
  • Multimodal capabilities may come with higher computational costs
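
To see what the listed speed and latency figures mean for a single response, here is a minimal sketch that adds the time to first token to the time needed to stream the remaining output. It assumes a constant decoding rate after the first token, which real deployments will not match exactly, and the function name is hypothetical.

```python
# Figures quoted in the list above.
OUTPUT_TOKENS_PER_SECOND = 139.0
TIME_TO_FIRST_TOKEN_S = 20.4


def estimated_response_seconds(
    output_tokens,
    ttft_s=TIME_TO_FIRST_TOKEN_S,
    tokens_per_second=OUTPUT_TOKENS_PER_SECOND,
):
    """Estimate wall-clock seconds for a response of `output_tokens` tokens.

    ASSUMPTION: tokens stream at a steady rate once the first token arrives.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} tokens: about {estimated_response_seconds(n):.1f} s")
```

For short replies the time to first token dominates the total, so the 20.4-second figure matters more than raw throughput in interactive use. The pricing table lists 134 t/s and 28.7s instead; those values can be passed as `tokens_per_second` and `ttft_s` to model that measurement.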

Key Features

1 million token context window
Multimodal input (text, image, video, audio)
Tool calling with structured output
Chat completion
Code generation
Streaming responses
Video processing capabilities
Function calling API

About Gemini 3 Pro

Gemini 3 Pro is a flagship-tier model in Google's Gemini family, developed by Google DeepMind for complex multimodal reasoning and generation. It has a 1 million token context window, accepts text, image, video, and audio inputs, supports tool calling, and covers chat completion and code generation. Measured output speed is 139 tokens per second, with a time to first token of 20.4 seconds. Its knowledge cutoff is June 2025. Gemini 3 Pro competes directly with other flagship models such as Claude Opus 4.6 and GPT-5.4, and Gemini 3.1 Pro is now the newer model in Google's own lineup. The large context window and broad modality support suit complex document analysis, multimedia content processing, and applications that must reason across different input types.

Common Use Cases

Gemini 3 Pro is designed for complex multimodal applications requiring flagship-level performance across diverse input types. Its video and audio processing capabilities make it suitable for multimedia content analysis, educational applications involving varied media formats, and research tasks requiring understanding of visual and auditory information. The 1 million token context window enables processing of extensive documents, lengthy conversations, and large codebases. Organizations use it for sophisticated AI agents that need to interpret and reason about multiple data types simultaneously, complex content moderation involving video and audio, and applications requiring deep understanding of multimedia educational or training materials.
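
Before sending a large document set or codebase, it can help to check roughly whether it fits in the 1 million token window. The sketch below is only an approximation: it assumes about 4 characters per token, which is a rule of thumb and not the model's actual tokenizer, and it assumes space for the reply should be reserved inside the same window. All names are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # from the model details on this page
CHARS_PER_TOKEN = 4                # ASSUMPTION: rough heuristic, not a real tokenizer


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Roughly estimate the token count of a string from its length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts, reserved_output_tokens=8_000,
                    context_window=CONTEXT_WINDOW_TOKENS):
    """Return (fits, estimated_input_tokens) for a list of text inputs.

    ASSUMPTION: output tokens are budgeted within the same context window.
    """
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserved_output_tokens <= context_window, used


if __name__ == "__main__":
    documents = ["lorem ipsum " * 50_000, "def f():\n    return 1\n" * 10_000]
    fits, used = fits_in_context(documents)
    print(f"estimated input tokens: {used}, fits: {fits}")
```

For a real budget, the estimate should be replaced with a count from the provider's own token-counting facility, since character-based heuristics can be far off for code, non-English text, and non-text inputs.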

Frequently Asked Questions

How much does Gemini 3 Pro cost per million tokens?

Gemini 3 Pro pricing varies by provider and pricing type (standard vs batch). In the pricing table above, input rates range from $1.00 to $2.00 per 1M tokens and output rates from $6.00 to $12.00 per 1M tokens. Multimodal inputs may be priced separately. Check the pricing table for current rates across all providers.

What is Gemini 3 Pro best used for?

Gemini 3 Pro excels at complex multimodal tasks requiring understanding of text, images, video, and audio. Its 1M token context window makes it ideal for extensive document analysis, multimedia content processing, educational applications with varied media types, and AI agents that need to reason across multiple input modalities simultaneously.

How does Gemini 3 Pro compare to Gemini 3.1 Pro?

Gemini 3.1 Pro is the newer model in Google's flagship tier and likely offers improved capabilities over Gemini 3 Pro. Both models share similar multimodal capabilities and large context windows. The choice depends on whether you need the latest capabilities or whether Gemini 3 Pro's features already meet your requirements.