Connection Pools for Inference Providers
Connection pools enable high availability, load balancing, and better rate limit handling by distributing requests across multiple API keys.
What are Connection Pools?
A connection pool groups multiple connections to the same provider, allowing:
- Load Balancing: Distribute requests across multiple API keys
- Failover: Automatic routing when one connection fails
- Rate Limit Handling: Avoid rate limits by spreading requests
- Redundancy: No single point of failure
When to Use Connection Pools
| Scenario | Recommendation |
|---|---|
| Low volume (<100 req/min) | Single connection |
| Medium volume (100-1000 req/min) | 2-3 connections in pool |
| High volume (>1000 req/min) | Multiple pools, load balancer |
| Mission critical | Pool with failover |
Creating a Connection Pool
Via Web Interface
- Go to Connections page
- Click Create Pool
- Configure pool settings:
  - Name: Descriptive name (e.g., "Production OpenAI")
  - Provider: Select provider
  - Strategy: Load balancing method
- Add connections to the pool
- Save the pool
Via API
# Step 1: Create individual connections
curl -X POST https://api.langmart.ai/api/connections \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"name": "OpenAI Key 1",
"provider_id": "openai",
"api_key": "sk-key-1...",
"scope": "organization"
}'
curl -X POST https://api.langmart.ai/api/connections \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"name": "OpenAI Key 2",
"provider_id": "openai",
"api_key": "sk-key-2...",
"scope": "organization"
}'
# Step 2: Create pool with connections
curl -X POST https://api.langmart.ai/api/connection-pools \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"name": "Production OpenAI Pool",
"provider_id": "openai",
"scope": "organization",
"billing_mode": "org_pays",
"strategy": "round_robin",
"connection_ids": ["<connection-1-id>", "<connection-2-id>"]
}'
Load Balancing Strategies
Round Robin
Requests alternate between connections in order:
Request 1 → Connection A
Request 2 → Connection B
Request 3 → Connection A
Request 4 → Connection B
Best for: Equal capacity connections
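The rotation above can be sketched locally in a few lines. This is only an illustration of the round-robin idea, not the platform's implementation; the `RoundRobinPool` class and the `conn-a`/`conn-b` IDs are hypothetical names chosen for the example.

```python
from itertools import cycle


class RoundRobinPool:
    """Hands out connection IDs in a fixed, repeating order (illustrative only)."""

    def __init__(self, connection_ids):
        if not connection_ids:
            raise ValueError("a pool needs at least one connection")
        # cycle() repeats the sequence forever: a, b, a, b, ...
        self._cycle = cycle(list(connection_ids))

    def next_connection(self):
        return next(self._cycle)


pool = RoundRobinPool(["conn-a", "conn-b"])
order = [pool.next_connection() for _ in range(4)]
print(order)  # ['conn-a', 'conn-b', 'conn-a', 'conn-b']
```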
Random
Requests randomly select a connection:
Request 1 → Connection B
Request 2 → Connection A
Request 3 → Connection A
Request 4 → Connection B
Best for: Similar capacity, unpredictable patterns
Least Used
Requests route to the connection with lowest recent usage:
Connection A: 100 requests this minute
Connection B: 50 requests this minute
→ Next request goes to Connection B
Best for: Unequal capacity or rate limits
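One way to picture "lowest recent usage" is a sliding one-minute window of request timestamps per connection. The sketch below assumes that interpretation; the actual window length and tie-breaking used by the service are not documented here, and `LeastUsedPool` is a hypothetical name.

```python
import time
from collections import deque


class LeastUsedPool:
    """Picks the connection with the fewest requests in the last window (illustrative only)."""

    def __init__(self, connection_ids, window_seconds=60.0):
        if not connection_ids:
            raise ValueError("a pool needs at least one connection")
        self._window = window_seconds
        self._history = {cid: deque() for cid in connection_ids}

    def record(self, connection_id, now=None):
        """Note that a request was sent on this connection."""
        now = time.monotonic() if now is None else now
        self._history[connection_id].append(now)

    def usage(self, now=None):
        """Return per-connection request counts inside the window."""
        now = time.monotonic() if now is None else now
        cutoff = now - self._window
        for stamps in self._history.values():
            # Timestamps are appended in order, so expired ones sit at the left.
            while stamps and stamps[0] <= cutoff:
                stamps.popleft()
        return {cid: len(stamps) for cid, stamps in self._history.items()}

    def next_connection(self, now=None):
        now = time.monotonic() if now is None else now
        counts = self.usage(now)
        # min() keeps the first minimum, so ties go to the earliest-listed connection.
        chosen = min(counts, key=counts.get)
        self.record(chosen, now)
        return chosen


pool = LeastUsedPool(["conn-a", "conn-b"])
for _ in range(100):
    pool.record("conn-a", now=0.0)
for _ in range(50):
    pool.record("conn-b", now=0.0)
choice = pool.next_connection(now=1.0)
print(choice)  # conn-b
```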
Weighted
Distribute requests based on assigned weights:
{
"connections": [
{"id": "conn-a", "weight": 3}, // 60% of traffic
{"id": "conn-b", "weight": 2} // 40% of traffic
]
}
Best for: Different tier API keys
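The percentages in the comments follow from each weight divided by the sum of all weights (3 / 5 and 2 / 5). A minimal sketch of that arithmetic and of a weighted random pick, assuming selection is proportional to weight; `traffic_shares` and `pick_weighted` are hypothetical helper names.

```python
import random


def traffic_shares(weights):
    """Convert raw weights into the fraction of traffic each connection should receive."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {cid: w / total for cid, w in weights.items()}


def pick_weighted(weights, rng=random):
    """Choose one connection ID with probability proportional to its weight."""
    ids = list(weights)
    return rng.choices(ids, weights=[weights[cid] for cid in ids], k=1)[0]


weights = {"conn-a": 3, "conn-b": 2}
shares = traffic_shares(weights)
print(shares)  # {'conn-a': 0.6, 'conn-b': 0.4}
```

Over many requests the observed split approaches these shares; over a handful of requests it can deviate noticeably.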
Priority (Failover)
Use primary until failure, then fallback:
{
"connections": [
{"id": "conn-a", "priority": 1}, // Primary
{"id": "conn-b", "priority": 2} // Fallback
]
}
Best for: Primary/backup scenarios
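The primary/fallback ordering can be illustrated as "try connections in ascending priority until one succeeds". The sketch below assumes a lower number means higher priority, as in the example above, and treats any raised exception as a failure; `send_with_failover` and the `send` callback are hypothetical.

```python
def send_with_failover(connections, send):
    """Try each connection in priority order; return (connection_id, result) on first success."""
    errors = {}
    for conn in sorted(connections, key=lambda c: c["priority"]):
        try:
            return conn["id"], send(conn["id"])
        except Exception as exc:  # sketch: any exception counts as a connection failure
            errors[conn["id"]] = exc
    raise RuntimeError(f"all connections failed: {sorted(errors)}")


connections = [
    {"id": "conn-b", "priority": 2},  # fallback
    {"id": "conn-a", "priority": 1},  # primary
]


def flaky_send(conn_id):
    """Stand-in for a real request: the primary is down, the fallback works."""
    if conn_id == "conn-a":
        raise ConnectionError("primary is down")
    return "ok"


used, result = send_with_failover(connections, flaky_send)
print(used, result)  # conn-b ok
```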
Failover Configuration
Automatic Failover
Configure how the pool handles connection failures:
curl -X PUT https://api.langmart.ai/api/connection-pools/<pool_id> \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"failover": {
"enabled": true,
"retry_count": 3,
"retry_delay_ms": 1000,
"circuit_breaker": {
"enabled": true,
"failure_threshold": 5,
"reset_timeout_ms": 30000
}
}
}'
Failover Behavior
| Scenario | Behavior |
|---|---|
| Single request fails | Retry on next connection |
| Connection rate limited | Skip that connection for retry_delay_ms |
| Multiple failures | Circuit breaker opens |
| Circuit open | Use other connections only |
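The circuit breaker rows can be modeled as a small state machine driven by the `failure_threshold` and `reset_timeout_ms` settings. This is a sketch under two assumptions that the text does not confirm: the breaker counts consecutive failures, and it opens once the count reaches the threshold. The `CircuitBreaker` class and its method names are hypothetical.

```python
import time


class CircuitBreaker:
    """Per-connection breaker with CLOSED, OPEN and HALF_OPEN states (illustrative only)."""

    def __init__(self, failure_threshold=5, reset_timeout_ms=30000, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout_ms / 1000.0
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.state = "CLOSED"

    def allow_request(self):
        """Return True if the pool may route a request to this connection."""
        if self.state == "OPEN" and self._clock() - self._opened_at >= self.reset_timeout:
            # Timeout expired: let a trial request through.
            self.state = "HALF_OPEN"
        return self.state != "OPEN"

    def record_success(self):
        self._failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self._failures += 1
        # A failed trial request, or too many failures in a row, opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()


now = [0.0]  # fake clock so the example does not have to wait 30 seconds
breaker = CircuitBreaker(failure_threshold=5, reset_timeout_ms=30000, clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
print(breaker.state, breaker.allow_request())  # OPEN False
now[0] = 30.0
print(breaker.allow_request(), breaker.state)  # True HALF_OPEN
breaker.record_success()
print(breaker.state)  # CLOSED
```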
Circuit Breaker States
CLOSED → (failures exceed threshold) → OPEN
                                         ↓
                                 (timeout expires)
                                         ↓
                                     HALF_OPEN
                                    ↓         ↓
                               (success)  (failure)
                                    ↓         ↓
                                 CLOSED      OPEN
Health Monitoring
Pool Health Status
curl -X GET https://api.langmart.ai/api/connection-pools/<pool_id>/health \
-H "Authorization: Bearer <your-api-key>"
Response:
{
"pool_id": "pool-123",
"status": "healthy",
"connections": [
{
"id": "conn-a",
"status": "healthy",
"last_success": "2024-01-15T10:30:00Z",
"requests_per_minute": 45,
"error_rate": 0.01
},
{
"id": "conn-b",
"status": "degraded",
"last_success": "2024-01-15T10:28:00Z",
"requests_per_minute": 30,
"error_rate": 0.05
}
]
}
Health Statuses
| Status | Description | Action |
|---|---|---|
| healthy | Operating normally | None |
| degraded | Elevated errors | Monitor closely |
| unhealthy | Frequent failures | Investigate |
| offline | Not responding | Check provider |
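A client could combine the health response with the status table to list which connections need attention. The sketch below parses the sample response shown earlier and maps each non-healthy status to the table's suggested action; the `connections_needing_attention` helper is hypothetical, and the response shape is assumed to match the sample.

```python
import json

# Suggested actions, copied from the Health Statuses table.
ACTIONS = {
    "healthy": "None",
    "degraded": "Monitor closely",
    "unhealthy": "Investigate",
    "offline": "Check provider",
}


def connections_needing_attention(health):
    """Return (id, status, action) for every connection that is not healthy."""
    return [
        (conn["id"], conn["status"], ACTIONS.get(conn["status"], "Unknown status"))
        for conn in health["connections"]
        if conn["status"] != "healthy"
    ]


sample = json.loads("""
{
  "pool_id": "pool-123",
  "status": "healthy",
  "connections": [
    {"id": "conn-a", "status": "healthy", "error_rate": 0.01},
    {"id": "conn-b", "status": "degraded", "error_rate": 0.05}
  ]
}
""")

flagged = connections_needing_attention(sample)
print(flagged)  # [('conn-b', 'degraded', 'Monitor closely')]
```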
Health Check Configuration
{
"health_check": {
"enabled": true,
"interval_ms": 60000,
"timeout_ms": 5000,
"unhealthy_threshold": 3,
"healthy_threshold": 2
}
}
Pool Management
Adding Connections to Pool
curl -X POST https://api.langmart.ai/api/connection-pools/<pool_id>/connections \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"connection_id": "<new-connection-id>"
}'
Removing Connections
curl -X DELETE https://api.langmart.ai/api/connection-pools/<pool_id>/connections/<connection_id> \
-H "Authorization: Bearer <your-api-key>"
Updating Pool Settings
curl -X PUT https://api.langmart.ai/api/connection-pools/<pool_id> \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"strategy": "least_used",
"billing_mode": "org_pays"
}'
Deleting a Pool
curl -X DELETE https://api.langmart.ai/api/connection-pools/<pool_id> \
-H "Authorization: Bearer <your-api-key>"
Best Practices
Connection Distribution
| Provider | Recommended Pool Size | Rationale |
|---|---|---|
| OpenAI | 2-5 keys | Tier-based rate limits |
| Anthropic | 2-3 keys | Organization limits |
| Groq | 3-5 keys | Request-based limits |
Rate Limit Management
- Know your limits: Check provider documentation
- Monitor usage: Track requests per key
- Spread load: Use round_robin or least_used
- Plan capacity: Total pool capacity > peak demand
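The capacity rule can be turned into a quick sizing calculation. The sketch below assumes every key has the same per-minute limit and adds a safety margin on top of peak demand; the 20% default headroom and the example numbers are illustrative, not provider figures.

```python
import math


def keys_needed(peak_rpm, per_key_rpm_limit, headroom=0.2):
    """Smallest number of keys whose combined limit covers peak demand plus headroom."""
    if per_key_rpm_limit <= 0:
        raise ValueError("per-key limit must be positive")
    required = peak_rpm * (1 + headroom)
    return max(1, math.ceil(required / per_key_rpm_limit))


def pool_has_capacity(per_key_limits, peak_rpm):
    """Check the rule: total pool capacity must exceed peak demand."""
    return sum(per_key_limits) > peak_rpm


# Hypothetical numbers: 1200 req/min at peak, 500 req/min allowed per key.
print(keys_needed(1200, 500))                   # 3
print(pool_has_capacity([500, 500], 1200))      # False
print(pool_has_capacity([500, 500, 500], 1200)) # True
```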
High Availability Setup
For mission-critical deployments:
┌─────────────────────────────────────────┐
│              Primary Pool               │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐    │
│  │  Key 1  │ │  Key 2  │ │  Key 3  │    │
│  └─────────┘ └─────────┘ └─────────┘    │
└─────────────────┬───────────────────────┘
                  │
                  ▼ (failover)
┌─────────────────────────────────────────┐
│               Backup Pool               │
│  ┌─────────┐ ┌─────────┐                │
│  │  Key 4  │ │  Key 5  │                │
│  └─────────┘ └─────────┘                │
└─────────────────────────────────────────┘
Cost Optimization
- Use weighted distribution to prefer cheaper tiers
- Reserve high-tier keys for overflow
- Monitor per-key costs
- Rotate keys to spread credits
Troubleshooting
All Connections Failing
| Cause | Solution |
|---|---|
| Provider outage | Check provider status page |
| All keys expired | Refresh API keys |
| Network issue | Check connectivity |
| Billing issue | Verify provider payment |
Uneven Load Distribution
| Cause | Solution |
|---|---|
| Round robin with failures | Check connection health |
| Weight misconfiguration | Review weights |
| One key much slower | Use least_used strategy |
High Error Rates
| Cause | Solution |
|---|---|
| Rate limits | Add more connections |
| Invalid keys | Remove/replace bad keys |
| Model deprecated | Update model selection |
Monitoring & Alerts
Pool Alerts
Set up alerts for pool health:
curl -X POST https://api.langmart.ai/api/alerts \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"name": "Pool Health Alert",
"type": "connection_pool_health",
"pool_id": "<pool_id>",
"condition": "status == unhealthy",
"action": "email",
"recipients": ["<recipient-email>"]
}'
Key Metrics to Monitor
| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| Error rate | >5% | >10% |
| Latency p95 | >5s | >10s |
| Active connections | <2 | <1 |
| Rate limit hits | >10/min | >50/min |
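If these metrics are collected outside the platform, the thresholds in the table can be checked with a small classifier. The metric key names below are hypothetical, and the error rate is assumed to be a fraction (0.05 for 5%), matching the `error_rate` field in the health response.

```python
# (direction, warning threshold, critical threshold), taken from the table above.
THRESHOLDS = {
    "error_rate": (">", 0.05, 0.10),
    "latency_p95_s": (">", 5.0, 10.0),
    "active_connections": ("<", 2, 1),
    "rate_limit_hits_per_min": (">", 10, 50),
}


def classify(metric, value):
    """Return 'critical', 'warning' or 'ok' for one metric reading."""
    direction, warning, critical = THRESHOLDS[metric]

    def breached(limit):
        return value > limit if direction == ">" else value < limit

    # Check the critical threshold first because it implies the warning one.
    if breached(critical):
        return "critical"
    if breached(warning):
        return "warning"
    return "ok"


print(classify("error_rate", 0.07))       # warning
print(classify("latency_p95_s", 11.0))    # critical
print(classify("active_connections", 1))  # warning
```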
Next Steps
- Usage Analytics - Monitor pool performance
- Billing Models - Configure pool billing
- Setup Organization - Advanced settings