Trinity Mini - API, Providers, Stats
Overview
Trinity Mini is a 26B-parameter sparse mixture-of-experts language model engineered by Arcee AI for efficient reasoning over extended contexts with robust function calling and multi-step agent capabilities.
Key Specifications:
- Architecture: Sparse Mixture of Experts (MoE)
- Total Parameters: 26B
- Active Parameters: 3B (effective)
- Expert Configuration: 128 experts with 8 active per token
- Context Length: 131,072 tokens
- Input/Output Modalities: Text-only
- Release Date: December 1, 2025
Pricing
| Metric | Cost |
|---|---|
| Input | $0.04 per 1M tokens |
| Output | $0.15 per 1M tokens |
Performance
Latency & Throughput (via Together):
- Latency: 0.26s
- Throughput: 243.8 tokens per second
- Uptime: 100.0%
Recent Usage Metrics:
- Peak daily requests: ~58,500 (Dec 10, 2025)
- Significant reasoning token usage patterns
- Active tool calling functionality
- Error rate: <1% on tool calls
- Consistent performance across deployments
Related Models
Information about related models is not provided on the Trinity Mini model page.
Providers
| Provider | Quantization | Status |
|---|---|---|
| Clarifai | BF16 | Active |
Model Endpoint ID: arcee_ai/AFM/models/trinity-mini
Parameters
Supported Parameters:
- Max Tokens
- Temperature (default: 0.15)
- Top P (default: 0.75)
- Stop Sequences
- Frequency Penalty
- Presence Penalty
- Top K
- Repetition Penalty
- Logit Bias
- Min P
Advanced Features:
- Reasoning mode with
<think>tags for internal reasoning - Tool/Function calling with tool choice selection
- Structured outputs & response formatting
- Include reasoning flag
- Response format specification
Tool Capabilities:
- Robust function calling support
- Multi-step agent workflow capability
- Tool choice parameters available