Trinity Mini - API, Providers, Stats

Overview

Trinity Mini is a 26B-parameter sparse mixture-of-experts language model engineered by Arcee AI for efficient reasoning over extended contexts with robust function calling and multi-step agent capabilities.

Key Specifications:

Architecture: Sparse Mixture of Experts (MoE)
Total Parameters: 26B
Active Parameters: 3B (effective)
Expert Configuration: 128 experts with 8 active per token
Context Length: 131,072 tokens
Input/Output Modalities: Text-only
Release Date: December 1, 2025

Pricing

Metric	Cost
Input	$0.04 per 1M tokens
Output	$0.15 per 1M tokens

Performance

Latency & Throughput (via Together):

Latency: 0.26s
Throughput: 243.8 tokens per second
Uptime: 100.0%

Recent Usage Metrics:

Peak daily requests: ~58,500 (Dec 10, 2025)
Significant reasoning token usage patterns
Active tool calling functionality
Error rate: <1% on tool calls
Consistent performance across deployments

Information about related models is not provided on the Trinity Mini model page.

Providers

Provider	Quantization	Status
Clarifai	BF16	Active

Model Endpoint ID: arcee_ai/AFM/models/trinity-mini

Parameters

Supported Parameters:

Max Tokens
Temperature (default: 0.15)
Top P (default: 0.75)
Stop Sequences
Frequency Penalty
Presence Penalty
Top K
Repetition Penalty
Logit Bias
Min P

Advanced Features:

Reasoning mode with <think> tags for internal reasoning
Tool/Function calling with tool choice selection
Structured outputs & response formatting
Include reasoning flag
Response format specification

Tool Capabilities:

Robust function calling support
Multi-step agent workflow capability
Tool choice parameters available

Trinity Mini - API, Providers, Stats

Trinity Mini - API, Providers, Stats

Overview

Pricing

Performance

Related Models

Providers

Parameters