Connection Pools for Inference Providers

Connection pools enable high availability, load balancing, and better rate limit handling by distributing requests across multiple API keys.

What are Connection Pools?

A connection pool groups multiple connections to the same provider, allowing:

Load Balancing: Distribute requests across multiple API keys
Failover: Automatic routing when one connection fails
Rate Limit Handling: Stay under per-key rate limits by spreading requests across keys
Redundancy: No single point of failure

When to Use Connection Pools

Scenario	Recommendation
Low volume (<100 req/min)	Single connection
Medium volume (100-1000 req/min)	2-3 connections in pool
High volume (1000+ req/min)	Multiple pools, load balancer
Mission critical	Pool with failover

Creating a Connection Pool

Via Web Interface

1. Go to the Connections page
2. Click Create Pool
3. Configure pool settings:
   - Name: Descriptive name (e.g., "Production OpenAI")
   - Provider: Select provider
   - Strategy: Load balancing method
4. Add connections to the pool
5. Save the pool

Via API

# Step 1: Create individual connections
curl -X POST https://api.langmart.ai/api/connections \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI Key 1",
    "provider_id": "openai",
    "api_key": "sk-key-1...",
    "scope": "organization"
  }'

curl -X POST https://api.langmart.ai/api/connections \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI Key 2",
    "provider_id": "openai",
    "api_key": "sk-key-2...",
    "scope": "organization"
  }'

# Step 2: Create pool with connections
curl -X POST https://api.langmart.ai/api/connection-pools \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production OpenAI Pool",
    "provider_id": "openai",
    "scope": "organization",
    "billing_mode": "org_pays",
    "strategy": "round_robin",
    "connection_ids": ["<connection-1-id>", "<connection-2-id>"]
  }'

Load Balancing Strategies

Round Robin

Requests cycle through the connections in order:

Request 1 → Connection A
Request 2 → Connection B
Request 3 → Connection A
Request 4 → Connection B

Best for: Equal capacity connections
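
To make the rotation concrete, here is a minimal Python sketch of round-robin selection. It is an illustration only, not LangMart's implementation; the class name `RoundRobinSelector` and the connection IDs are hypothetical.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hands out connection IDs in a fixed, repeating order."""

    def __init__(self, connection_ids):
        if not connection_ids:
            raise ValueError("pool needs at least one connection")
        # cycle() repeats the sequence forever: a, b, a, b, ...
        self._cycle = cycle(list(connection_ids))

    def next_connection(self):
        return next(self._cycle)


selector = RoundRobinSelector(["conn-a", "conn-b"])
order = [selector.next_connection() for _ in range(4)]
print(order)
```

With two connections, the four picks follow the same A, B, A, B pattern shown above.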

Random

Requests randomly select a connection:

Request 1 → Connection B
Request 2 → Connection A
Request 3 → Connection A
Request 4 → Connection B

Best for: Similar capacity, unpredictable patterns

Least Used

Requests route to the connection with lowest recent usage:

Connection A: 100 requests this minute
Connection B: 50 requests this minute
→ Next request goes to Connection B

Best for: Unequal capacity or rate limits
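
The selection rule above can be sketched in a few lines of Python. This assumes the pool tracks a per-connection request count for the current minute; the function name `pick_least_used` and the counter dictionary are hypothetical, and the platform may measure "recent usage" differently.

```python
def pick_least_used(requests_this_minute):
    """Return the connection ID with the fewest recent requests."""
    if not requests_this_minute:
        raise ValueError("pool needs at least one connection")
    # min() with key= compares the counts, not the IDs.
    return min(requests_this_minute, key=requests_this_minute.get)


usage = {"conn-a": 100, "conn-b": 50}
chosen = pick_least_used(usage)
usage[chosen] += 1  # record the routed request so the next pick sees it
print(chosen, usage)
```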

Weighted

Distribute requests in proportion to assigned weights. With the weights below, conn-a receives 3/5 (60%) of traffic and conn-b receives 2/5 (40%):

{
  "connections": [
    {"id": "conn-a", "weight": 3},
    {"id": "conn-b", "weight": 2}
  ]
}

Best for: Different tier API keys
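
The following Python sketch shows how weights of 3 and 2 translate into traffic shares, and one way a weighted pick could be made. The helper names are hypothetical and the random draw is an assumption; the platform may use a deterministic weighted rotation instead.

```python
import random

weights = {"conn-a": 3, "conn-b": 2}


def expected_share(weights):
    """Each connection's share is its weight divided by the total weight."""
    total = sum(weights.values())
    return {conn: w / total for conn, w in weights.items()}


def pick_weighted(weights, rng):
    """Draw one connection ID with probability proportional to its weight."""
    ids = list(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


shares = expected_share(weights)

rng = random.Random(0)  # seeded so the simulation is repeatable
picks = [pick_weighted(weights, rng) for _ in range(10_000)]
observed_a = picks.count("conn-a") / len(picks)
print(shares, round(observed_a, 2))
```

Over many requests the observed share for conn-a settles close to the expected 3/5.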

Priority (Failover)

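Send all traffic to the primary connection and fall back to the next connection only when the primary fails. In the example below, conn-a (priority 1) is the primary and conn-b (priority 2) is the fallback:

{
  "connections": [
    {"id": "conn-a", "priority": 1},
    {"id": "conn-b", "priority": 2}
  ]
}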

Best for: Primary/backup scenarios
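
As a rough illustration of priority routing, the Python sketch below picks the available connection with the lowest priority number. The function `pick_by_priority` and the `is_available` callback are hypothetical stand-ins for whatever availability check the pool performs.

```python
def pick_by_priority(connections, is_available):
    """Return the ID of the available connection with the lowest priority number."""
    for conn in sorted(connections, key=lambda c: c["priority"]):
        if is_available(conn["id"]):
            return conn["id"]
    return None  # every connection is unavailable


connections = [
    {"id": "conn-a", "priority": 1},  # primary
    {"id": "conn-b", "priority": 2},  # fallback
]

normal = pick_by_priority(connections, lambda cid: True)
during_outage = pick_by_priority(connections, lambda cid: cid != "conn-a")
print(normal, during_outage)
```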

Failover Configuration

Automatic Failover

Configure how the pool handles connection failures:

curl -X PUT https://api.langmart.ai/api/connection-pools/<pool_id> \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "failover": {
      "enabled": true,
      "retry_count": 3,
      "retry_delay_ms": 1000,
      "circuit_breaker": {
        "enabled": true,
        "failure_threshold": 5,
        "reset_timeout_ms": 30000
      }
    }
  }'

Failover Behavior

Scenario	Behavior
Single request fails	Retry on next connection
Connection rate limited	Skip that connection for retry_delay_ms
Multiple failures	Circuit breaker opens
Circuit open	Use other connections only
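
The "retry on next connection" behavior can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `retry_count` is treated as the number of retries after the first attempt, the delay is applied between every attempt, and `send_with_failover`, `fake_send`, and `AllConnectionsFailed` are hypothetical names.

```python
import time


class AllConnectionsFailed(Exception):
    """Raised when every attempt in the retry budget has failed."""


def send_with_failover(connection_ids, send, retry_count=3,
                       retry_delay_ms=1000, sleep=time.sleep):
    """Try `send` on successive connections until one succeeds."""
    if not connection_ids:
        raise ValueError("pool needs at least one connection")
    last_error = None
    for attempt in range(retry_count + 1):
        # Move to the next connection on each attempt, wrapping around.
        conn = connection_ids[attempt % len(connection_ids)]
        try:
            return conn, send(conn)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < retry_count:
                sleep(retry_delay_ms / 1000)
    raise AllConnectionsFailed(f"{retry_count + 1} attempts failed") from last_error


def fake_send(conn):
    # Hypothetical transport: conn-a is down, conn-b answers.
    if conn == "conn-a":
        raise ConnectionError("conn-a unavailable")
    return "ok"


# A no-op sleep keeps the demonstration instant.
used, result = send_with_failover(["conn-a", "conn-b"], fake_send,
                                  sleep=lambda seconds: None)
print(used, result)
```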

Circuit Breaker States

CLOSED → (failures exceed threshold) → OPEN
                                        ↓
                               (timeout expires)
                                        ↓
                                   HALF_OPEN
                                        ↓
                  (success) ← → (failure)
                      ↓              ↓
                   CLOSED          OPEN
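
The state transitions in the diagram can be expressed as a small Python class. This is a sketch under assumptions, not the platform's code: the breaker trips when the failure count reaches `failure_threshold`, any success closes it, and the `clock` parameter is a hypothetical hook that makes the timeout testable.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_ms=30000,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_ms / 1000
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self._state = "CLOSED"

    @property
    def state(self):
        # OPEN becomes HALF_OPEN once the reset timeout has elapsed.
        if (self._state == "OPEN"
                and self._clock() - self._opened_at >= self.reset_timeout_s):
            self._state = "HALF_OPEN"
        return self._state

    def allow_request(self):
        return self.state != "OPEN"

    def record_success(self):
        self._failures = 0
        self._state = "CLOSED"

    def record_failure(self):
        current = self.state
        if current == "OPEN":
            return  # already tripped; nothing to count
        if current == "HALF_OPEN":
            self._trip()  # the probe request failed, so reopen
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0


now = [0.0]  # fake clock, in seconds
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
state_after_failures = breaker.state
now[0] = 30.0  # reset_timeout_ms has elapsed
state_after_timeout = breaker.state
breaker.record_success()
state_after_probe = breaker.state
print(state_after_failures, state_after_timeout, state_after_probe)
```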

Health Monitoring

Pool Health Status

curl -X GET https://api.langmart.ai/api/connection-pools/<pool_id>/health \
  -H "Authorization: Bearer <your-api-key>"

Response:

{
  "pool_id": "pool-123",
  "status": "healthy",
  "connections": [
    {
      "id": "conn-a",
      "status": "healthy",
      "last_success": "2024-01-15T10:30:00Z",
      "requests_per_minute": 45,
      "error_rate": 0.01
    },
    {
      "id": "conn-b",
      "status": "degraded",
      "last_success": "2024-01-15T10:28:00Z",
      "requests_per_minute": 30,
      "error_rate": 0.05
    }
  ]
}

Health Statuses

Status	Description	Action
healthy	Operating normally	None
degraded	Elevated errors	Monitor closely
unhealthy	Frequent failures	Investigate
offline	Not responding	Check provider
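
As an example of acting on the health response, this Python sketch maps each connection's status to the action from the table. The sample payload is trimmed from the response shown earlier; the function name `actions_needed` is hypothetical.

```python
import json

# Status-to-action mapping taken from the Health Statuses table.
ACTIONS = {
    "healthy": "None",
    "degraded": "Monitor closely",
    "unhealthy": "Investigate",
    "offline": "Check provider",
}

sample = json.loads("""
{
  "pool_id": "pool-123",
  "status": "healthy",
  "connections": [
    {"id": "conn-a", "status": "healthy", "error_rate": 0.01},
    {"id": "conn-b", "status": "degraded", "error_rate": 0.05}
  ]
}
""")


def actions_needed(health):
    """Return {connection_id: action} for every connection that is not healthy."""
    return {
        conn["id"]: ACTIONS[conn["status"]]
        for conn in health["connections"]
        if conn["status"] != "healthy"
    }


todo = actions_needed(sample)
print(todo)
```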

Health Check Configuration

{
  "health_check": {
    "enabled": true,
    "interval_ms": 60000,
    "timeout_ms": 5000,
    "unhealthy_threshold": 3,
    "healthy_threshold": 2
  }
}
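
The two thresholds are not defined further here. A common interpretation, assumed in the Python sketch below, is that a connection is marked unhealthy after `unhealthy_threshold` consecutive failed checks and healthy again after `healthy_threshold` consecutive passed checks. The `HealthTracker` class is hypothetical.

```python
class HealthTracker:
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive checks that contradict the current state

    def record(self, check_passed):
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = (self.unhealthy_threshold if self.healthy
                  else self.healthy_threshold)
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
checks = [False, False, False, True, True]
history = [tracker.record(passed) for passed in checks]
print(history)
```

Requiring several consecutive results before flipping state keeps a single slow or failed check from removing a connection from rotation.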

Pool Management

Adding Connections to Pool

curl -X POST https://api.langmart.ai/api/connection-pools/<pool_id>/connections \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "connection_id": "<new-connection-id>"
  }'

Removing Connections

curl -X DELETE https://api.langmart.ai/api/connection-pools/<pool_id>/connections/<connection_id> \
  -H "Authorization: Bearer <your-api-key>"

Updating Pool Settings

curl -X PUT https://api.langmart.ai/api/connection-pools/<pool_id> \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "least_used",
    "billing_mode": "org_pays"
  }'

Deleting a Pool

curl -X DELETE https://api.langmart.ai/api/connection-pools/<pool_id> \
  -H "Authorization: Bearer <your-api-key>"

Best Practices

Connection Distribution

Provider	Recommended Pool Size	Rationale
OpenAI	2-5 keys	Tier-based rate limits
Anthropic	2-3 keys	Organization limits
Groq	3-5 keys	Request-based limits

Rate Limit Management

Know your limits: Check provider documentation
Monitor usage: Track requests per key
Spread load: Use round_robin or least_used
Plan capacity: Total pool capacity > peak demand
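
A quick way to check the capacity rule is to compute how many keys a given peak requires. The Python sketch below uses made-up numbers (1000 requests per minute at peak, 500 per key) and a hypothetical 20% headroom factor; substitute the limits from your provider's documentation.

```python
import math


def keys_needed(peak_rpm, per_key_limit_rpm, headroom=0.2):
    """Smallest number of keys whose combined limit covers peak plus headroom."""
    required = peak_rpm * (1 + headroom)
    return math.ceil(required / per_key_limit_rpm)


keys = keys_needed(peak_rpm=1000, per_key_limit_rpm=500)
total_capacity = keys * 500
print(keys, total_capacity)
```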

High Availability Setup

For mission-critical deployments:

┌─────────────────────────────────────────┐
│            Primary Pool                  │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐   │
│  │  Key 1  │ │  Key 2  │ │  Key 3  │   │
│  └─────────┘ └─────────┘ └─────────┘   │
└─────────────────┬───────────────────────┘
                  │
                  ▼ (failover)
┌─────────────────────────────────────────┐
│           Backup Pool                    │
│  ┌─────────┐ ┌─────────┐               │
│  │  Key 4  │ │  Key 5  │               │
│  └─────────┘ └─────────┘               │
└─────────────────────────────────────────┘

Cost Optimization

Use weighted distribution to prefer cheaper tiers
Reserve high-tier keys for overflow
Monitor per-key costs
Rotate keys to spread credits

Troubleshooting

All Connections Failing

Cause	Solution
Provider outage	Check provider status page
All keys expired	Refresh API keys
Network issue	Check connectivity
Billing issue	Verify provider payment

Uneven Load Distribution

Cause	Solution
Round robin with failures	Check connection health
Weight misconfiguration	Review weights
One key much slower	Use least_used strategy

High Error Rates

Cause	Solution
Rate limits	Add more connections
Invalid keys	Remove/replace bad keys
Model deprecated	Update model selection

Monitoring & Alerts

Pool Alerts

Set up alerts for pool health:

curl -X POST https://api.langmart.ai/api/alerts \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Pool Health Alert",
    "type": "connection_pool_health",
    "pool_id": "<pool_id>",
    "condition": "status == unhealthy",
    "action": "email",
    "recipients": ["<recipient-email>"]
  }'

Key Metrics to Monitor

Metric	Warning Threshold	Critical Threshold
Error rate	>5%	>10%
Latency p95	>5s	>10s
Active connections	<2	<1
Rate limit hits	>10/min	>50/min
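
The thresholds in the table can be encoded directly if you evaluate metrics in your own tooling. The Python sketch below is one possible encoding; the metric key names and the `classify` function are hypothetical. Note that active connections is the one metric where lower values are worse.

```python
# (warning, critical, direction) per metric, taken from the table above.
THRESHOLDS = {
    "error_rate": (0.05, 0.10, "above"),
    "latency_p95_s": (5, 10, "above"),
    "active_connections": (2, 1, "below"),
    "rate_limit_hits_per_min": (10, 50, "above"),
}


def classify(metric, value):
    """Return 'ok', 'warning', or 'critical' for a metric reading."""
    warning, critical, direction = THRESHOLDS[metric]
    if direction == "above":
        if value > critical:
            return "critical"
        if value > warning:
            return "warning"
    else:
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
    return "ok"


readings = {
    "error_rate": 0.07,
    "latency_p95_s": 3,
    "active_connections": 0,
    "rate_limit_hits_per_min": 60,
}
levels = {name: classify(name, value) for name, value in readings.items()}
print(levels)
```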

Next Steps

Usage Analytics - Monitor pool performance
Billing Models - Configure pool billing
Setup Organization - Advanced settings
