OpenAI-Compatible API

Base URL: https://api.langmart.ai/v1

LangMart provides a fully OpenAI-compatible API, allowing you to use existing OpenAI SDKs and tools with models from multiple providers.

Endpoints Overview

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/chat/completions | POST | Create chat completions |
| /v1/completions | POST | Create text completions (legacy) |
| /v1/embeddings | POST | Create embeddings |
| /v1/models | GET | List available models |
| /v1/models/{model} | GET | Get model details |

Chat Completions

Create a chat completion with conversation history.

Endpoint

POST /v1/chat/completions

Request Headers

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g., openai/gpt-4o) |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| top_p | number | No | Nucleus sampling (0-1). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming. Default: false |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2). Default: 0 |
| frequency_penalty | number | No | Frequency penalty (-2 to 2). Default: 0 |
| tools | array | No | Available tools/functions |
| tool_choice | string/object | No | Tool selection: none, auto, required, or a specific tool |
| response_format | object | No | Response format (e.g., {"type": "json_object"} for JSON mode) |
| seed | integer | No | Seed for best-effort reproducible sampling |
| user | string | No | End-user identifier for tracking |

Message Object

{
  "role": "user | assistant | system | tool",
  "content": "Message content",
  "name": "optional_name",
  "tool_calls": [],
  "tool_call_id": "for tool responses"
}

name is optional. tool_calls appears only on assistant messages that invoke tools, and tool_call_id is required on tool-role messages, where it must match the id of the call being answered.

Example Request

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

Response

{
  "id": "chatcmpl-9abc123def456",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": true
  }'

Streaming Response (Server-Sent Events):

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Why"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" did"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
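If you are not using an SDK, you can consume the stream directly. A minimal Python sketch using the requests library (illustrative only; production code should also handle HTTP errors and reconnects):

import json
import requests

resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Tell me a joke"}],
        "stream": True,
    },
    stream=True,  # keep the connection open and iterate as chunks arrive
)

for line in resp.iter_lines():
    if not line:
        continue  # skip the blank lines that separate SSE events
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    delta = json.loads(payload)["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)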

Tool/Function Calling

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Tool Call Response:

{
  "id": "chatcmpl-abc",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
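To complete the loop, execute the function locally and send its result back as a tool message whose tool_call_id matches the call. A minimal Python sketch; the get_weather implementation here is a hypothetical stand-in for your own code:

import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

def get_weather(location):
    # Hypothetical implementation; replace with a real weather lookup.
    return {"location": location, "temperature_c": 18, "conditions": "cloudy"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
first = client.chat.completions.create(
    model="openai/gpt-4o", messages=messages, tools=tools, tool_choice="auto"
)

# Assumes the model chose to call the tool (finish_reason == "tool_calls").
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Append the assistant's tool call, then the tool result keyed by tool_call_id.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(get_weather(**args)),
})

final = client.chat.completions.create(model="openai/gpt-4o", messages=messages)
print(final.choices[0].message.content)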

JSON Mode

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "List 3 fruits as JSON"}
    ],
    "response_format": {"type": "json_object"}
  }'
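The model then returns a single JSON document in message.content, which can be parsed directly. A minimal Python sketch (as with OpenAI's own API, it is safest to mention JSON explicitly in the prompt when using json_object mode):

import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "List 3 fruits as JSON"}],
    response_format={"type": "json_object"},
)

# In JSON mode the content should be valid JSON, so it parses directly.
fruits = json.loads(response.choices[0].message.content)
print(fruits)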

Text Completions (Legacy)

Create a text completion. Note: this is a legacy endpoint; most providers now expose newer models only through chat completions.

Endpoint

POST /v1/completions

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID |
| prompt | string/array | Yes | Text prompt(s) |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Sampling temperature |
| top_p | number | No | Nucleus sampling |
| stop | string/array | No | Stop sequences |
| echo | boolean | No | Echo the prompt in the response |

Example

curl https://api.langmart.ai/v1/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-3.5-turbo-instruct",
    "prompt": "Once upon a time",
    "max_tokens": 50
  }'
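The equivalent call through the OpenAI SDK uses the completions resource rather than chat.completions:

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

response = client.completions.create(
    model="openai/gpt-3.5-turbo-instruct",
    prompt="Once upon a time",
    max_tokens=50,
)
# Legacy completions return plain text in choices[].text, not a message object.
print(response.choices[0].text)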

Embeddings

Generate vector embeddings for text.

Endpoint

POST /v1/embeddings

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Embedding model ID |
| input | string/array | Yes | Text(s) to embed |
| encoding_format | string | No | float (default) or base64 |
| dimensions | integer | No | Number of output dimensions (supported by some models, e.g., text-embedding-3) |

Example

curl https://api.langmart.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0048, 0.0089, ...]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}

Batch Embeddings

curl https://api.langmart.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": [
      "First text to embed",
      "Second text to embed",
      "Third text to embed"
    ]
  }'
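The response contains one embedding per input, in the same order. A common follow-up is comparing embeddings; a minimal cosine-similarity sketch using only the standard library:

import math
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

texts = ["First text to embed", "Second text to embed", "Third text to embed"]
result = client.embeddings.create(model="openai/text-embedding-3-small", input=texts)

# result.data preserves input order; each item's index confirms the pairing.
vectors = [item.embedding for item in result.data]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors[0], vectors[1]))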

List Models

Get a list of available models.

Endpoint

GET /v1/models

Example

curl https://api.langmart.ai/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

Response

{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-4o",
      "object": "model",
      "created": 1704067200,
      "owned_by": "openai",
      "permission": [],
      "root": "gpt-4o",
      "parent": null
    },
    {
      "id": "anthropic/claude-3-5-sonnet-20241022",
      "object": "model",
      "created": 1704067200,
      "owned_by": "anthropic",
      "permission": [],
      "root": "claude-3-5-sonnet",
      "parent": null
    }
  ]
}
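Via the SDK, client.models.list() returns the same data. Because LangMart model IDs are prefixed with the provider name, you can filter by prefix:

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

# List every model, then keep only Anthropic-hosted ones by ID prefix.
models = client.models.list()
anthropic_models = [m.id for m in models.data if m.id.startswith("anthropic/")]
print(anthropic_models)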

Get Model

Get details for a specific model.

Endpoint

GET /v1/models/{model_id}

Example

curl https://api.langmart.ai/v1/models/openai/gpt-4o \
  -H "Authorization: Bearer YOUR_API_KEY"

Response

{
  "id": "openai/gpt-4o",
  "object": "model",
  "created": 1704067200,
  "owned_by": "openai",
  "permission": [],
  "root": "gpt-4o",
  "parent": null
}
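The SDK equivalent is client.models.retrieve():

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

model = client.models.retrieve("openai/gpt-4o")
print(model.id, model.owned_by)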

Supported Models by Provider

OpenAI

| Model ID | Description | Context |
|----------|-------------|---------|
| openai/gpt-4o | Most capable GPT-4 | 128K |
| openai/gpt-4o-mini | Fast, cost-effective | 128K |
| openai/gpt-4-turbo | GPT-4 Turbo | 128K |
| openai/gpt-3.5-turbo | Fast, affordable | 16K |
| openai/text-embedding-3-small | Small embeddings | - |
| openai/text-embedding-3-large | Large embeddings | - |

Anthropic

| Model ID | Description | Context |
|----------|-------------|---------|
| anthropic/claude-3-5-sonnet-20241022 | Most intelligent | 200K |
| anthropic/claude-3-opus-20240229 | Most powerful | 200K |
| anthropic/claude-3-haiku-20240307 | Fastest | 200K |

Google

| Model ID | Description | Context |
|----------|-------------|---------|
| google/gemini-1.5-pro | Most capable | 1M |
| google/gemini-1.5-flash | Fast | 1M |
| google/gemini-pro | Balanced | 32K |

Groq (Ultra-Fast)

| Model ID | Description | Context |
|----------|-------------|---------|
| groq/llama-3.3-70b-versatile | Llama 3.3 70B | 128K |
| groq/llama-3.1-70b-versatile | Llama 3.1 70B | 128K |
| groq/mixtral-8x7b-32768 | Mixtral MoE | 32K |

Mistral

| Model ID | Description | Context |
|----------|-------------|---------|
| mistral/mistral-large-latest | Most capable | 128K |
| mistral/mistral-medium-latest | Balanced | 32K |
| mistral/mistral-small-latest | Fast | 32K |

SDK Examples

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_LANGMART_API_KEY",
    base_url="https://api.langmart.ai/v1"
)

# Chat completion
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Embeddings
embeddings = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="Hello world"
)
print(embeddings.data[0].embedding[:5])

JavaScript/TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: 'YOUR_LANGMART_API_KEY',
    baseURL: 'https://api.langmart.ai/v1'
});

// Chat completion
const response = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Hello!' }
    ]
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
});
for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

cURL

# Basic request
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# With streaming
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Error Handling

Common Errors

| Status | Error Type | Description |
|--------|------------|-------------|
| 400 | invalid_request_error | Malformed request |
| 401 | authentication_error | Invalid API key |
| 402 | billing_error | Insufficient credits |
| 404 | model_not_found | Model doesn't exist |
| 429 | rate_limit_error | Too many requests |
| 500 | server_error | Internal error |
| 503 | gateway_unavailable | No available gateway |

Error Response Format

{
  "error": {
    "type": "authentication_error",
    "code": "invalid_api_key",
    "message": "Invalid API key provided",
    "param": null,
    "details": {}
  }
}
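When calling the API over raw HTTP rather than through an SDK, check the status code and read the error object from the body. A minimal Python sketch with requests:

import requests

resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
)

if resp.status_code != 200:
    err = resp.json()["error"]
    # type distinguishes broad classes (see table above); code is more specific.
    print(f"{resp.status_code} {err['type']}/{err['code']}: {err['message']}")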

Retry Logic

import time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.langmart.ai/v1"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="openai/gpt-4o",
                messages=messages
            )
        except RateLimitError:
            # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(2 ** attempt)
        except APIStatusError as e:
            if e.status_code >= 500:
                # Server-side errors are often transient; back off and retry.
                time.sleep(2 ** attempt)
            else:
                # Client errors (4xx) won't succeed on retry.
                raise
    raise Exception("Max retries exceeded")
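Alternatively, the official OpenAI SDKs already retry transient failures (connection errors, 429s, and 5xx responses) with exponential backoff; you can simply raise the retry count when constructing the client:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.langmart.ai/v1",
    max_retries=5,  # the SDK default is 2
)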

Best Practices

1. Use Streaming for Long Responses

Streaming provides a better user experience for longer outputs:

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

2. Set Appropriate Max Tokens

Prevent runaway costs by limiting output length:

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[...],
    max_tokens=500
)

3. Use System Messages Effectively

Guide model behavior with clear system prompts:

messages = [
    {"role": "system", "content": "You are a concise assistant. Keep responses under 100 words."},
    {"role": "user", "content": "Explain quantum computing"}
]

4. Handle Errors Gracefully

Always implement proper error handling:

import time

import openai
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")

try:
    response = client.chat.completions.create(...)
except openai.AuthenticationError:
    print("Check your API key")
except openai.RateLimitError:
    print("Rate limited, waiting...")
    time.sleep(60)
except openai.APIError as e:
    print(f"API error: {e}")

Useful Links

| Feature | Direct Link |
|---------|-------------|
| Browse Models | https://langmart.ai/models |
| API Keys | https://langmart.ai/settings |
| Request Logs | https://langmart.ai/requests |
| Usage & Costs | https://langmart.ai/usage |