Connection Pools for Inference Providers

Connection pools enable high availability, load balancing, and better rate limit handling by distributing requests across multiple API keys.

What are Connection Pools?

A connection pool groups multiple connections to the same provider, allowing:

  • Load Balancing: Distribute requests across multiple API keys
  • Failover: Automatic routing when one connection fails
  • Rate Limit Handling: Reduce rate-limit errors by spreading requests across keys
  • Redundancy: No single point of failure

When to Use Connection Pools

Scenario                         | Recommendation
Low volume (<100 req/min)        | Single connection
Medium volume (100-1000 req/min) | 2-3 connections in pool
High volume (1000+ req/min)      | Multiple pools, load balancer
Mission critical                 | Pool with failover

Creating a Connection Pool

Via Web Interface

  1. Go to the Connections page
  2. Click Create Pool
  3. Configure pool settings:
    • Name: Descriptive name (e.g., "Production OpenAI")
    • Provider: Select provider
    • Strategy: Load balancing method
  4. Add connections to the pool
  5. Save the pool

Via API

# Step 1: Create individual connections
curl -X POST https://api.langmart.ai/api/connections \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI Key 1",
    "provider_id": "openai",
    "api_key": "sk-key-1...",
    "scope": "organization"
  }'

curl -X POST https://api.langmart.ai/api/connections \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI Key 2",
    "provider_id": "openai",
    "api_key": "sk-key-2...",
    "scope": "organization"
  }'

# Step 2: Create pool with connections
curl -X POST https://api.langmart.ai/api/connection-pools \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production OpenAI Pool",
    "provider_id": "openai",
    "scope": "organization",
    "billing_mode": "org_pays",
    "strategy": "round_robin",
    "connection_ids": ["<connection-1-id>", "<connection-2-id>"]
  }'
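
The same pool-creation call can be scripted. The sketch below builds the request with Python's standard library but does not send it. The endpoint and body fields are copied from the curl example above; the helper name build_pool_request is hypothetical, and the connection IDs remain placeholders because the response format of Step 1 is not documented here.

```python
import json
import urllib.request

API_BASE = "https://api.langmart.ai/api"  # base URL taken from the curl examples


def build_pool_request(api_key, name, provider_id, connection_ids,
                       strategy="round_robin"):
    """Build (but do not send) the POST request that creates a pool."""
    body = {
        "name": name,
        "provider_id": provider_id,
        "scope": "organization",
        "billing_mode": "org_pays",
        "strategy": strategy,
        "connection_ids": list(connection_ids),
    }
    return urllib.request.Request(
        f"{API_BASE}/connection-pools",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_pool_request(
    api_key="<your-api-key>",
    name="Production OpenAI Pool",
    provider_id="openai",
    connection_ids=["<connection-1-id>", "<connection-2-id>"],
)
# To send it for real: urllib.request.urlopen(request)
```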

Load Balancing Strategies

Round Robin

Requests cycle through the connections in a fixed order:

Request 1 → Connection A
Request 2 → Connection B
Request 3 → Connection A
Request 4 → Connection B

Best for: Equal capacity connections
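
As a rough illustration of the rotation shown above, the following sketch cycles through a list of connections. It is not the platform's implementation; the function name round_robin is hypothetical.

```python
from itertools import cycle


def round_robin(connections):
    """Return an iterator that yields connections in a fixed, repeating order."""
    return cycle(connections)


picker = round_robin(["Connection A", "Connection B"])

# Route four requests; each call to next() selects the connection for one request.
first_four = [next(picker) for _ in range(4)]
```

With two connections, first_four reproduces the A, B, A, B sequence listed above.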

Random

Each request goes to a randomly selected connection:

Request 1 → Connection B
Request 2 → Connection A
Request 3 → Connection A
Request 4 → Connection B

Best for: Similar capacity, unpredictable patterns
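
A minimal sketch of random selection, assuming every connection is equally likely to be chosen (the document does not say how the platform draws its random choice). The name pick_random is hypothetical, and the fixed seed is only there to make the demonstration repeatable.

```python
import random
from collections import Counter


def pick_random(connections, rng=random):
    """Choose one connection uniformly at random."""
    return rng.choice(connections)


rng = random.Random(42)  # seeded only so the demo is repeatable
counts = Counter(
    pick_random(["Connection A", "Connection B"], rng) for _ in range(10_000)
)
```

Over many requests the counts converge toward an even split, although any short run can be lopsided, as in the four-request example above.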

Least Used

Requests route to the connection with the lowest recent usage:

Connection A: 100 requests this minute
Connection B: 50 requests this minute
→ Next request goes to Connection B

Best for: Unequal capacity or rate limits
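
The selection rule above can be sketched as a minimum over per-connection counters. This assumes usage is tracked as requests in the current minute, as in the example; the function name pick_least_used is hypothetical.

```python
def pick_least_used(usage):
    """Return the connection with the fewest requests in the current window.

    `usage` maps connection name -> request count. Ties go to the
    connection that appears first in the mapping.
    """
    return min(usage, key=usage.get)


usage = {"Connection A": 100, "Connection B": 50}

chosen = pick_least_used(usage)
usage[chosen] += 1  # count the request that was just routed
```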

Weighted

Distribute requests based on assigned weights:

{
  "connections": [
    {"id": "conn-a", "weight": 3},
    {"id": "conn-b", "weight": 2}
  ]
}

Traffic is split in proportion to the weights: conn-a receives 3/5 (60%) and conn-b receives 2/5 (40%).

Best for: Different tier API keys
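
To make the weight arithmetic concrete, the sketch below computes each connection's expected share and then samples with those weights. It assumes weights are applied as simple proportions of traffic; the helper names expected_share and pick_weighted are hypothetical.

```python
import random
from collections import Counter

weights = {"conn-a": 3, "conn-b": 2}  # weights from the example above


def expected_share(weights):
    """Return each connection's fraction of traffic: weight / sum of weights."""
    total = sum(weights.values())
    return {conn: weight / total for conn, weight in weights.items()}


def pick_weighted(weights, rng=random):
    """Choose one connection with probability proportional to its weight."""
    ids = list(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


shares = expected_share(weights)

rng = random.Random(0)  # seeded only so the demo is repeatable
observed = Counter(pick_weighted(weights, rng) for _ in range(10_000))
```

Here shares works out to 0.6 for conn-a and 0.4 for conn-b, and the observed counts approach that split as the number of requests grows.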

Priority (Failover)

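Send all traffic to the primary connection until it fails, then fall back to the next connection in priority order:

{
  "connections": [
    {"id": "conn-a", "priority": 1},
    {"id": "conn-b", "priority": 2}
  ]
}

The lowest priority number is tried first: conn-a is the primary and conn-b is the fallback.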

Best for: Primary/backup scenarios
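
A small sketch of priority-based selection, assuming that a lower priority number means "try first" (as in the example, where priority 1 is the primary) and that the caller knows which connections are currently healthy. The function name pick_by_priority is hypothetical.

```python
def pick_by_priority(connections, healthy):
    """Return the ID of the healthy connection with the lowest priority number."""
    candidates = [c for c in connections if c["id"] in healthy]
    if not candidates:
        raise RuntimeError("no healthy connection available")
    return min(candidates, key=lambda c: c["priority"])["id"]


connections = [
    {"id": "conn-a", "priority": 1},  # primary
    {"id": "conn-b", "priority": 2},  # fallback
]

normal = pick_by_priority(connections, healthy={"conn-a", "conn-b"})
after_failure = pick_by_priority(connections, healthy={"conn-b"})
```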

Failover Configuration

Automatic Failover

Configure how the pool handles connection failures:

curl -X PUT https://api.langmart.ai/api/connection-pools/<pool_id> \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "failover": {
      "enabled": true,
      "retry_count": 3,
      "retry_delay_ms": 1000,
      "circuit_breaker": {
        "enabled": true,
        "failure_threshold": 5,
        "reset_timeout_ms": 30000
      }
    }
  }'

Failover Behavior

Scenario                | Behavior
Single request fails    | Retry on the next connection
Connection rate limited | Skip the connection for retry_delay_ms
Multiple failures       | Circuit breaker opens
Circuit open            | Use other connections only
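
The "retry on next connection" behavior can be sketched as a loop over the pool. This is an illustration under assumptions: retry_count is treated as the number of extra attempts after the first, the delay between attempts is omitted, and the names send_with_failover and flaky_send are hypothetical.

```python
def send_with_failover(connections, send, retry_count=3):
    """Try send(conn) on successive connections, up to 1 + retry_count attempts.

    Returns (connection, result) for the first attempt that succeeds.
    """
    errors = []
    for attempt in range(1 + retry_count):
        conn = connections[attempt % len(connections)]  # move to the next connection
        try:
            return conn, send(conn)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append((conn, exc))
    raise RuntimeError(f"all {len(errors)} attempts failed")


def flaky_send(conn):
    """Stand-in for a provider call: conn-a always fails, others succeed."""
    if conn == "conn-a":
        raise ConnectionError("simulated failure")
    return "ok"


used, result = send_with_failover(["conn-a", "conn-b"], flaky_send)
```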

Circuit Breaker States

CLOSED → (failures exceed threshold) → OPEN
                                        ↓
                               (timeout expires)
                                        ↓
                                   HALF_OPEN
                                 ↙           ↘
                         (success)         (failure)
                             ↓                 ↓
                          CLOSED             OPEN
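
The state transitions above can be expressed as a small class. This is a sketch, not the platform's implementation: it assumes the circuit opens once the failure count reaches failure_threshold, and that reset_timeout_ms is measured from the moment the circuit opened. The class name CircuitBreaker and its methods are hypothetical; the clock is injectable so the demo does not have to wait 30 seconds.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_ms=30000,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_ms / 1000
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        """Return True if a request may use this connection right now."""
        if (self.state == "OPEN"
                and self.clock() - self.opened_at >= self.reset_timeout_s):
            self.state = "HALF_OPEN"  # timeout expired: allow a probe request
        return self.state != "OPEN"

    def record_success(self):
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed probe, or too many failures while closed, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


now = [0.0]  # fake clock, in seconds
breaker = CircuitBreaker(clock=lambda: now[0])

for _ in range(5):
    breaker.record_failure()
state_after_failures = breaker.state
blocked_while_open = not breaker.allow_request()

now[0] = 30.0  # reset timeout has elapsed
probe_allowed = breaker.allow_request()
state_during_probe = breaker.state

breaker.record_success()
state_after_probe = breaker.state
```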

Health Monitoring

Pool Health Status

curl -X GET https://api.langmart.ai/api/connection-pools/<pool_id>/health \
  -H "Authorization: Bearer <your-api-key>"

Response:

{
  "pool_id": "pool-123",
  "status": "healthy",
  "connections": [
    {
      "id": "conn-a",
      "status": "healthy",
      "last_success": "2024-01-15T10:30:00Z",
      "requests_per_minute": 45,
      "error_rate": 0.01
    },
    {
      "id": "conn-b",
      "status": "degraded",
      "last_success": "2024-01-15T10:28:00Z",
      "requests_per_minute": 30,
      "error_rate": 0.05
    }
  ]
}

Health Statuses

Status    | Description        | Action
healthy   | Operating normally | None
degraded  | Elevated errors    | Monitor closely
unhealthy | Frequent failures  | Investigate
offline   | Not responding     | Check provider
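
A short sketch of how a client might act on the health response: it parses the sample payload shown above and lists every connection whose status is not healthy. The function name connections_needing_attention is hypothetical.

```python
import json

# Sample response copied from the health endpoint example above.
health_json = """
{
  "pool_id": "pool-123",
  "status": "healthy",
  "connections": [
    {"id": "conn-a", "status": "healthy",
     "last_success": "2024-01-15T10:30:00Z",
     "requests_per_minute": 45, "error_rate": 0.01},
    {"id": "conn-b", "status": "degraded",
     "last_success": "2024-01-15T10:28:00Z",
     "requests_per_minute": 30, "error_rate": 0.05}
  ]
}
"""


def connections_needing_attention(health):
    """Return (id, status) for every connection that is not 'healthy'."""
    return [
        (conn["id"], conn["status"])
        for conn in health["connections"]
        if conn["status"] != "healthy"
    ]


flagged = connections_needing_attention(json.loads(health_json))
```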

Health Check Configuration

{
  "health_check": {
    "enabled": true,
    "interval_ms": 60000,
    "timeout_ms": 5000,
    "unhealthy_threshold": 3,
    "healthy_threshold": 2
  }
}
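
The two thresholds are not explained here. A common interpretation, assumed in the sketch below, is that a connection is marked unhealthy after unhealthy_threshold consecutive failed checks and marked healthy again after healthy_threshold consecutive passed checks. The class name HealthTracker is hypothetical, and the platform's actual semantics may differ.

```python
class HealthTracker:
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.status = "healthy"
        self.streak = 0  # consecutive checks that contradict the current status

    def record(self, check_passed):
        """Record one health check result and return the resulting status."""
        is_healthy = self.status == "healthy"
        contradicts = check_passed != is_healthy
        self.streak = self.streak + 1 if contradicts else 0

        limit = self.unhealthy_threshold if is_healthy else self.healthy_threshold
        if self.streak >= limit:
            self.status = "unhealthy" if is_healthy else "healthy"
            self.streak = 0
        return self.status


tracker = HealthTracker(unhealthy_threshold=3, healthy_threshold=2)

# Three failed checks followed by two passed checks.
history = [tracker.record(passed) for passed in (False, False, False, True, True)]
```

Requiring a streak in both directions keeps a single slow or failed check from flipping the status back and forth.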

Pool Management

Adding Connections to Pool

curl -X POST https://api.langmart.ai/api/connection-pools/<pool_id>/connections \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "connection_id": "<new-connection-id>"
  }'

Removing Connections

curl -X DELETE https://api.langmart.ai/api/connection-pools/<pool_id>/connections/<connection_id> \
  -H "Authorization: Bearer <your-api-key>"

Updating Pool Settings

curl -X PUT https://api.langmart.ai/api/connection-pools/<pool_id> \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "least_used",
    "billing_mode": "org_pays"
  }'

Deleting a Pool

curl -X DELETE https://api.langmart.ai/api/connection-pools/<pool_id> \
  -H "Authorization: Bearer <your-api-key>"

Best Practices

Connection Distribution

Provider  | Recommended Pool Size | Rationale
OpenAI    | 2-5 keys              | Tier-based rate limits
Anthropic | 2-3 keys              | Organization limits
Groq      | 3-5 keys              | Request-based limits

Rate Limit Management

  1. Know your limits: Check provider documentation
  2. Monitor usage: Track requests per key
  3. Spread load: Use round_robin or least_used
  4. Plan capacity: Total pool capacity > peak demand
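
The capacity rule in step 4 is simple arithmetic. The sketch below estimates how many keys a pool needs so that total capacity exceeds peak demand with some headroom. The numbers are made up for illustration, the 20% headroom is an assumption, and the function name keys_needed is hypothetical; use the per-key limits from your provider's documentation.

```python
def keys_needed(peak_rpm, per_key_rpm, headroom_pct=20):
    """Return the smallest key count whose combined limit covers peak demand.

    peak_rpm:     peak requests per minute the pool must serve
    per_key_rpm:  rate limit of a single key, in requests per minute
    headroom_pct: extra capacity to keep in reserve, as a percentage
    """
    required = peak_rpm * (100 + headroom_pct)   # demand, scaled by 100
    available = per_key_rpm * 100                # one key, same scale
    return -(-required // available)             # integer ceiling division


# Hypothetical workload: 1200 req/min peak, 500 req/min per key.
needed = keys_needed(peak_rpm=1200, per_key_rpm=500)
exact_fit = keys_needed(peak_rpm=1000, per_key_rpm=500, headroom_pct=0)
```

With these example figures, 1200 req/min plus 20% headroom is 1440 req/min, which requires 3 keys at 500 req/min each.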

High Availability Setup

For mission-critical deployments:

┌─────────────────────────────────────────┐
│            Primary Pool                  │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐   │
│  │  Key 1  │ │  Key 2  │ │  Key 3  │   │
│  └─────────┘ └─────────┘ └─────────┘   │
└─────────────────┬───────────────────────┘
                  │
                  ▼ (failover)
┌─────────────────────────────────────────┐
│           Backup Pool                    │
│  ┌─────────┐ ┌─────────┐               │
│  │  Key 4  │ │  Key 5  │               │
│  └─────────┘ └─────────┘               │
└─────────────────────────────────────────┘

Cost Optimization

  • Use weighted distribution to prefer cheaper tiers
  • Reserve high-tier keys for overflow
  • Monitor per-key costs
  • Rotate keys to spread credits

Troubleshooting

All Connections Failing

Cause            | Solution
Provider outage  | Check provider status page
All keys expired | Refresh API keys
Network issue    | Check connectivity
Billing issue    | Verify provider payment

Uneven Load Distribution

Cause                     | Solution
Round robin with failures | Check connection health
Weight misconfiguration   | Review weights
One key much slower       | Use least_used strategy

High Error Rates

Cause            | Solution
Rate limits      | Add more connections
Invalid keys     | Remove/replace bad keys
Model deprecated | Update model selection

Monitoring & Alerts

Pool Alerts

Set up alerts for pool health:

curl -X POST https://api.langmart.ai/api/alerts \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Pool Health Alert",
    "type": "connection_pool_health",
    "pool_id": "<pool_id>",
    "condition": "status == unhealthy",
    "action": "email",
    "recipients": ["<recipient-email>"]
  }'

Key Metrics to Monitor

Metric             | Warning Threshold | Critical Threshold
Error rate         | >5%               | >10%
Latency p95        | >5s               | >10s
Active connections | <2                | <1
Rate limit hits    | >10/min           | >50/min
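
The thresholds above can be encoded as data and checked mechanically. The sketch below is one way to do that; the metric keys and the function name classify are hypothetical and do not correspond to any documented API fields.

```python
# (warning, critical, direction) per metric, taken from the table above.
# "above" means larger values are worse; "below" means smaller values are worse.
THRESHOLDS = {
    "error_rate": (0.05, 0.10, "above"),
    "latency_p95_s": (5, 10, "above"),
    "active_connections": (2, 1, "below"),
    "rate_limit_hits_per_min": (10, 50, "above"),
}


def classify(metric, value):
    """Return 'critical', 'warning', or 'ok' for a metric reading."""
    warning, critical, direction = THRESHOLDS[metric]

    def breached(limit):
        return value > limit if direction == "above" else value < limit

    if breached(critical):
        return "critical"
    if breached(warning):
        return "warning"
    return "ok"


readings = {
    "error_rate": 0.07,
    "latency_p95_s": 12,
    "active_connections": 1,
    "rate_limit_hits_per_min": 10,
}
levels = {metric: classify(metric, value) for metric, value in readings.items()}
```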

Next Steps