Google: Multimodal Understanding Pro

Model Overview

Property	Value
Model ID	`google/multimodal-understanding-pro`
Name	Multimodal Understanding Pro
Status	Preview
Released	2025-10-20

Description

Google: Multimodal Understanding Pro is a multimodal model from Google, currently in preview. It accepts text, image, audio, video, and document input, with a 500,000-token context window and up to 4,096 output tokens per response.

Specifications

Spec	Value
Context Window	500,000 tokens
Max Output	4,096 tokens
Modalities	text, image, audio, video, docs
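
The limits in the table above can be checked before sending a request. The following is a minimal sketch, assuming the context window covers the prompt and the generated output together (this page does not say so explicitly); `fits_in_context` is a hypothetical helper name, and the prompt token count must come from whatever tokenizer or usage report you have available.

```python
CONTEXT_WINDOW = 500_000  # tokens, from the Specifications table
MAX_OUTPUT = 4_096        # tokens, from the Specifications table


def fits_in_context(prompt_tokens: int, max_tokens: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt plus the requested output fits the limits.

    ASSUMPTION: input and output tokens share the same 500,000-token window.
    """
    if prompt_tokens < 0 or max_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if max_tokens > MAX_OUTPUT:
        # Requesting more output than the model can produce.
        return False
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(fits_in_context(100_000))        # a mid-sized document
    print(fits_in_context(499_000, 4_096)) # prompt leaves too little room
```

Under this assumption, a prompt of up to 495,904 tokens still leaves room for the full 4,096-token output.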

Pricing

Type	Price
Input	$2.00 per 1M tokens
Output	$6.00 per 1M tokens
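
To make the rates above concrete, here is a small cost estimator. It is a sketch under one assumption: that all input, regardless of modality, is billed at the listed per-token input rate, which this page does not confirm. `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Rates from the Pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("2.0")
OUTPUT_PRICE_PER_M = Decimal("6.0")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 100,000 input tokens and a full 4,096-token reply.
    print(estimate_cost(100_000, 4_096))
```

For example, 100,000 input tokens plus 4,096 output tokens comes to $0.20 + $0.024576 = $0.224576, and a prompt that fills the entire 500,000-token window costs $1.00 in input alone.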

Capabilities

Text: Yes
Image: Yes
Audio: Yes
Video: Yes
Tool Use: Yes
JSON Mode: Yes
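
This page lists JSON Mode as supported but does not show how to enable it. The sketch below assumes the endpoint follows the OpenAI-style convention of a `response_format` field, which is an unconfirmed assumption; check the platform documentation before relying on it. `build_json_mode_payload` and `parse_json_reply` are hypothetical helper names.

```python
import json

MODEL_ID = "google/multimodal-understanding-pro"


def build_json_mode_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a chat request body that asks for a JSON object reply."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # ASSUMPTION: OpenAI-style field name; not confirmed by this page.
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(text: str) -> dict:
    """Parse the reply text, failing loudly if it is not a JSON object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object in the reply")
    return data


if __name__ == "__main__":
    payload = build_json_mode_payload("List three colors as JSON.")
    print(json.dumps(payload, indent=2))
```

Validating the reply on the client side is worthwhile even with JSON Mode enabled, since the 4,096-token output cap can truncate a long object mid-way.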

Key Features

Multimodal Support - Text, images, audio, video, and documents
Large Context - Up to 500,000 tokens
Tool Use - Supported
JSON Mode - Supported
Streaming - Real-time generation
Cost Effective - $2.00 input and $6.00 output per 1M tokens

Best For

Document analysis
Multimedia processing
Enterprise apps
Research documents

Data & Usage Policies

Policy	Status
Training Data	Not used for training
Prompt Retention	Does not retain prompts
Data Processing	Google Cloud privacy compliant

Status & Availability

Status: Preview
Free Tier: No
Provider: Google

API Usage Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "google/multimodal-understanding-pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 4096
  }'
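
The same request can be built in Python with only the standard library. This sketch mirrors the curl call above (same URL, headers, and body) and deliberately stops short of sending it; `build_request` is a hypothetical helper, and the response format is not described on this page, so no parsing of the reply is shown.

```python
import json
import urllib.request

API_URL = "https://api.langmart.ai/v1/chat/completions"
MODEL_ID = "google/multimodal-understanding-pro"


def build_request(
    api_key: str, prompt: str, max_tokens: int = 4096
) -> urllib.request.Request:
    """Build the POST request equivalent to the curl example."""
    body = json.dumps(
        {
            "model": MODEL_ID,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# To send it, uncomment the lines below and supply a real key:
# with urllib.request.urlopen(build_request("YOUR_API_KEY", "Hello")) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before any tokens are billed.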

Related Models

google/gemini-3-pro-preview - Latest flagship
google/gemini-2.5-pro - Advanced 2.5 model
google/gemini-2.0-flash - Fast multimodal
google/gemma-3-27b-it - Open-source alternative

Source

Generated for LangMart AI Platform on 2025-12-28
