API reference

Infer exposes an OpenAI-compatible /chat/completions endpoint. Any SDK or HTTP client that accepts a custom base URL can call it; the only changes needed are the base URL and the API key.

Tip:

New to Infer? Run through the Quickstart first to get an API key, the base URL, and a connection test. This page assumes you already have an API key and the base URL.

Endpoint

POST https://api-agenthub-pre.riema.xyz/v1/chat/completions

Authentication

Pass your API key as a Bearer token in the Authorization header:

Authorization: Bearer your_api_key

Keys are scoped to a team. Create and rotate them in the API Keys dashboard.
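As a rough illustration of the header format above, the sketch below builds an authenticated request using only the Python standard library. The helper name `build_request` is hypothetical, not part of any Infer SDK, and the key is a placeholder; the request is constructed but not sent.

```python
import json
import urllib.request

# Endpoint as documented on this page.
ENDPOINT = "https://api-agenthub-pre.riema.xyz/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an authenticated POST."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The API key travels as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "your_api_key",
    {"model": "your_model", "messages": [{"role": "user", "content": "Hello"}]},
)
```

Passing `request` to `urllib.request.urlopen` would send it; in practice, reading the key from an environment variable rather than hard-coding it is the safer choice.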

Request body

Field	Required	Description
`model`	yes	Model ID to route to, e.g. `gpt-5.4`. See the Models catalog.
`messages`	yes	Array of chat messages. Must contain at least one entry.
`temperature`	no	Sampling temperature. Defaults depend on the model.
`max_tokens`	no	Upper bound on output tokens. On reasoning models, hidden reasoning tokens can consume the entire budget, in which case the response has an empty `content` and `finish_reason: "length"`.
`stream`	no	When `true`, the response is a Server-Sent Events stream of deltas.
`tools` / `tool_choice`	no	Function calling, same schema as OpenAI.

Any other field in the OpenAI /chat/completions schema (top_p, stop, seed, response_format, …) is accepted unchanged.
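The field rules above can be sketched as a small payload builder. This is only an illustration under stated assumptions: `build_payload` is a hypothetical helper, and the client-side checks mirror the "required" column of the table rather than any validation the server is documented to perform.

```python
def build_payload(model: str, messages: list, **options) -> dict:
    """Hypothetical helper: assemble a /chat/completions request body."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one entry")

    payload = {"model": model, "messages": messages}
    # Optional fields (temperature, max_tokens, stream, top_p, ...) are
    # passed through unchanged; unset (None) values are omitted so the
    # model's own defaults apply.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_payload(
    "your_model",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=None,  # omitted from the body
)
```

Omitting unset options instead of sending `null` keeps the body minimal and leaves model-dependent defaults such as `temperature` to the server.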

Response

A non-streaming response matches the OpenAI shape:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1738960610,
  "model": "gpt-5.4",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 9,
    "total_tokens": 22
  }
}
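A minimal sketch of reading this shape from a parsed JSON response follows. It also flags the case noted for `max_tokens`, where a reasoning model returns an empty `content` with `finish_reason: "length"`. The function name `extract_text` and the choice to raise an exception are assumptions made for illustration.

```python
def extract_text(response: dict) -> str:
    """Hypothetical helper: return the assistant text of the first choice."""
    choice = response["choices"][0]
    content = choice["message"].get("content") or ""
    # Empty content plus finish_reason "length" suggests the token budget
    # was used up before any visible output was produced.
    if not content and choice.get("finish_reason") == "length":
        raise RuntimeError(
            "empty content with finish_reason 'length'; "
            "consider raising max_tokens"
        )
    return content


sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you today?",
            },
            "finish_reason": "stop",
        }
    ]
}
text = extract_text(sample)
```

Whether to raise, retry with a larger `max_tokens`, or return an empty string is an application-level decision; raising simply makes the condition hard to miss.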

Examples

curl

curl https://api-agenthub-pre.riema.xyz/v1/chat/completions \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your_model",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="your_api_key",
    base_url="https://api-agenthub-pre.riema.xyz/v1",
)

response = client.chat.completions.create(
    model="your_model",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)

javascript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your_api_key",
  baseURL: "https://api-agenthub-pre.riema.xyz/v1",
});

const response = await client.chat.completions.create({
  model: "your_model",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.choices[0].message.content);

Streaming

Set stream: true in the request body. The response becomes a Server-Sent Events stream. Each chunk follows the OpenAI chat.completion.chunk shape, and the stream terminates with a data: [DONE] line: