Google: Multimodal Understanding Pro

Provider: Google
Tags: Vision, Tools
Context: 500K tokens
Input: $2.00 per 1M tokens
Output: $6.00 per 1M tokens
Max Output: 4K tokens

Model Overview

Property  Value
Model ID  google/multimodal-understanding-pro
Name      Multimodal Understanding Pro
Status    Preview
Released  2025-10-20

Description

Multimodal Understanding Pro is an advanced multimodal model from Google. It accepts text, image, audio, video, and document input, supports tool use and JSON mode, and has a context window of 500,000 tokens.

Specifications

Spec            Value
Context Window  500,000 tokens
Max Output      4,096 tokens
Modalities      text, image, audio, video, documents
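
The two token limits above can be turned into a pre-flight check before a request is sent. The following is a minimal sketch, not part of the platform: `plan_request` is a hypothetical helper, and it assumes that prompt and output tokens count against the same 500,000-token window, which the source does not state. The prompt token count has to come from whatever tokenizer you use.

```python
CONTEXT_WINDOW = 500_000  # tokens, from the Specifications table
MAX_OUTPUT = 4_096        # tokens, from the Specifications table


def plan_request(prompt_tokens: int, requested_output: int = MAX_OUTPUT) -> int:
    """Return a max_tokens value that respects both limits.

    ASSUMPTION: prompt and output tokens share one context window.
    Raises ValueError if the inputs are invalid or the prompt leaves
    no room for any output.
    """
    if prompt_tokens < 0 or requested_output <= 0:
        raise ValueError("prompt_tokens must be >= 0 and requested_output > 0")
    room = CONTEXT_WINDOW - prompt_tokens
    if room <= 0:
        raise ValueError("prompt does not fit in the context window")
    # Never ask for more than the model cap or the remaining window.
    return min(requested_output, MAX_OUTPUT, room)
```

Under that assumption, a 499,000-token prompt leaves room for only 1,000 output tokens, so the helper lowers `max_tokens` rather than letting the request exceed the window.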

Pricing

Type    Price
Input   $2.00 per 1M tokens
Output  $6.00 per 1M tokens
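
To make the per-token rates concrete, here is a small sketch that estimates the cost of a single request from the listed prices. `estimate_cost` is a hypothetical helper name; it covers only the two rates shown here and ignores any other charges the platform may apply.

```python
from decimal import Decimal

INPUT_PRICE_PER_M = Decimal("2.00")   # USD per 1M input tokens (Pricing table)
OUTPUT_PRICE_PER_M = Decimal("6.00")  # USD per 1M output tokens (Pricing table)
ONE_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request at the listed rates."""
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / ONE_MILLION


# A full 500,000-token prompt with a 4,096-token reply.
print(estimate_cost(500_000, 4_096))
```

At these rates the largest possible request, a full context window plus the maximum output, costs about $1.02, almost all of it for input tokens.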

Capabilities

  • Text: Yes
  • Image: Yes
  • Audio: Yes
  • Video: Yes
  • Tool Use: Yes
  • JSON Mode: Yes

Key Features

  1. Multimodal Support - Text, images, audio, and video
  2. Large Context - Up to 500,000 tokens
  3. Tool Use - Supported
  4. JSON Mode - Supported
  5. Streaming - Real-time generation
  6. Cost Effective - $2.00 input and $6.00 output per 1M tokens
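
The features listed above would normally be switched on through fields in the request body. The sketch below shows one plausible shape, assuming the `/v1/chat/completions` endpoint follows the widely used OpenAI-style convention. The `tools`, `response_format`, and `stream` field names are assumptions; the source confirms only `model`, `messages`, and `max_tokens`. `build_payload` and the `get_weather` tool are hypothetical.

```python
import json

MODEL_ID = "google/multimodal-understanding-pro"


def build_payload(prompt, *, tools=None, json_mode=False, stream=False,
                  max_tokens=4096):
    """Build a chat-completions request body.

    ASSUMPTION: `tools`, `response_format`, and `stream` follow the
    OpenAI-style convention; check the platform documentation first.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if tools:
        payload["tools"] = tools
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    if stream:
        payload["stream"] = True
    return payload


# Hypothetical tool definition, for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_payload("Weather in Paris, as JSON?",
                     tools=[weather_tool], json_mode=True)
print(json.dumps(body, indent=2))
```

Optional fields are added only when requested, so the default payload matches the three fields the source confirms.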

Best For

  • Document analysis
  • Multimedia processing
  • Enterprise apps
  • Research documents

Data & Usage Policies

Policy            Status
Training Data     Not used for training
Prompt Retention  Does not retain prompts
Data Processing   Google Cloud privacy compliant

Status & Availability

  • Status: Preview
  • Free Tier: No
  • Provider: Google

API Usage Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "google/multimodal-understanding-pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 4096
  }'
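
The same call can be made from Python using only the standard library. This is a sketch that mirrors the curl request above: the URL, headers, and body come from that example, while `build_request`, `send`, and the `LANGMART_API_KEY` environment variable are hypothetical names chosen for illustration. The response format is not documented here, so the reply is returned as decoded JSON without further parsing.

```python
import json
import os
import urllib.request

API_URL = "https://api.langmart.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Mirror the curl call: same URL, headers, and JSON body."""
    body = {
        "model": "google/multimodal-understanding-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (makes a network call)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # LANGMART_API_KEY is a hypothetical variable name for this sketch.
    key = os.environ.get("LANGMART_API_KEY")
    if key:
        print(send(build_request(key, "Hello")))
```

Building the request separately from sending it lets you inspect the headers and body without touching the network.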

Related Models

  • google/gemini-3-pro-preview - Latest flagship
  • google/gemini-2.5-pro - Advanced 2.5 model
  • google/gemini-2.0-flash - Fast multimodal
  • google/gemma-3-27b-it - Open-source alternative

Source

Generated for LangMart AI Platform on 2025-12-28