Google: Gemini Audio Understanding

Model Overview

Property   Value
Model ID   google/gemini-audio-understanding
Name       Gemini Audio Understanding
Status     Preview
Released   2025-11-01

Description

Google: Gemini Audio Understanding is an audio analysis model from Google. It works with audio and text, and is suited to tasks such as transcription, speech analysis, meeting summarization, and audio classification.

Specifications

Spec             Value
Context Window   50,000 tokens
Max Output       4,096 tokens
Modalities       audio, text
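
The limits above can be applied before a request is sent. The following is a minimal sketch, assuming the 50,000-token context window is shared between the prompt and the generated output (this page does not say so explicitly); `clamp_max_tokens` is a hypothetical helper name, not part of any API.

```python
CONTEXT_WINDOW = 50_000  # tokens, from the Specifications table
MAX_OUTPUT = 4_096       # tokens, from the Specifications table


def clamp_max_tokens(prompt_tokens, requested=MAX_OUTPUT):
    """Return a max_tokens value that respects both documented limits.

    Assumption: prompt and output share the same context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    # Never ask for more than the model cap or the space left in the window.
    return min(requested, MAX_OUTPUT, remaining)
```

For a 48,000-token prompt this leaves room for 2,000 output tokens, which is below the 4,096-token cap.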

Pricing

Type     Price
Input    $0.05/1M tokens
Output   $0.15/1M tokens
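
As a worked example of the pricing above, the sketch below estimates the cost of a single request from its token counts. The function name `estimate_cost` is illustrative, and the sketch assumes billing is strictly linear in tokens with no minimum charge.

```python
INPUT_PRICE_PER_M = 0.05   # USD per 1M input tokens, from the Pricing table
OUTPUT_PRICE_PER_M = 0.15  # USD per 1M output tokens, from the Pricing table


def estimate_cost(input_tokens, output_tokens):
    """Estimated request cost in USD, assuming linear per-token billing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000
```

A request that fills the full 50,000-token context and produces the maximum 4,096 output tokens comes to roughly $0.0031 under these assumptions.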

Capabilities

  • Text: Yes
  • Image: No
  • Audio: Yes
  • Video: No
  • Tool Use: No
  • JSON Mode: No
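
Because JSON Mode is listed as unavailable, callers who prompt the model for structured output may need to pull a JSON object out of free-form text themselves. The following is one possible sketch, not a documented feature of the model; `extract_json` is a hypothetical helper built only on Python's standard `json` module.

```python
import json


def extract_json(text):
    """Return the first valid JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, index)
            return obj
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
    return None
```

This tolerates surrounding prose in the reply, but it cannot guarantee that the model produces valid JSON in the first place.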

Key Features

  1. Multimodal Support - Text and audio
  2. Large Context - Up to 50,000 tokens
  3. Tool Use - Not supported
  4. JSON Mode - Not available
  5. Streaming - Real-time generation
  6. Cost Effective - $0.05 input / $0.15 output per 1M tokens

Best For

  • Audio transcription
  • Speech analysis
  • Meeting summarization
  • Audio classification

Data & Usage Policies

Policy             Status
Training Data      Not used for training
Prompt Retention   Does not retain prompts
Data Processing    Google Cloud privacy compliant

Status & Availability

  • Status: PREVIEW
  • Free Tier: No
  • Provider: Google

API Usage Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "google/gemini-audio-understanding",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 4096
  }'
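
The same request can be built from Python. This is a minimal sketch using only the standard library; it mirrors the URL, headers, and body of the curl call above, and `build_request` is an illustrative name. The format for attaching audio is not documented on this page, so the sketch sends a text prompt only.

```python
import json
import urllib.request

API_URL = "https://api.langmart.ai/v1/chat/completions"
MODEL_ID = "google/gemini-audio-understanding"


def build_request(api_key, prompt, max_tokens=4096):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. The response schema is not described here, so parsing the reply is left to the caller.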

Related Models

  • google/gemini-3-pro-preview - Latest flagship
  • google/gemini-2.5-pro - Advanced 2.5 model
  • google/gemini-2.0-flash - Fast multimodal
  • google/gemma-3-27b-it - Open-source alternative

Source

Generated for LangMart AI Platform on 2025-12-28