API reference
Infer exposes an OpenAI-compatible /chat/completions endpoint. Any SDK or HTTP
client that accepts a custom base URL can call it by swapping in the Infer base URL and API key, with no other code changes.
New to Infer? Run through the Quickstart first for an API key, the base URL, and a connection test. This page assumes you already have the key and the base URL.
Endpoint

    POST https://api-agenthub-pre.riema.xyz/v1/chat/completions

Authentication
Pass your API key as a Bearer token in the Authorization header:

    Authorization: Bearer your_api_key

Keys are scoped to a team. Create and rotate them in the API Keys dashboard.
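As a small illustration, the header could be assembled in Python as follows. This is only a sketch: the INFER_API_KEY environment variable name and the auth_headers helper are assumptions made for the example, not names defined by Infer.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers that carry the key as a Bearer token.

    INFER_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("INFER_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


print(auth_headers("your_api_key")["Authorization"])  # Bearer your_api_key
```

Reading the key from the environment keeps it out of source control, which matters because keys are shared at team scope.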
Request body
| Field | Required | Description |
|---|---|---|
| model | yes | Model ID to route to, e.g. gpt-5.4. See the Models catalog. |
| messages | yes | Array of chat messages. Must contain at least one entry. |
| temperature | no | Sampling temperature. Defaults depend on the model. |
| max_tokens | no | Upper bound on output tokens. On reasoning models the budget can be fully consumed by hidden reasoning tokens, in which case the response has empty content and finish_reason: "length". |
| stream | no | When true, the response is a Server-Sent Events stream of deltas. |
| tools / tool_choice | no | Function calling, same schema as OpenAI. |
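The max_tokens caveat in the table can be checked for on the client side. The sketch below inspects a parsed non-streaming response (a plain dict in the OpenAI shape) for empty content paired with finish_reason "length". The function name hit_token_budget and the two sample responses are hypothetical.

```python
def hit_token_budget(response: dict) -> bool:
    """Return True when the first choice stopped at max_tokens with no visible text."""
    choices = response.get("choices") or []
    if not choices:
        return False
    choice = choices[0]
    content = (choice.get("message") or {}).get("content")
    return choice.get("finish_reason") == "length" and not content


# Made-up responses for illustration only.
truncated = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": ""}, "finish_reason": "length"}
    ]
}
complete = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"}
    ]
}

print(hit_token_budget(truncated))  # True
print(hit_token_budget(complete))   # False
```

When the check returns True, raising max_tokens is a natural first thing to try, since the budget was spent before any visible output was produced.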
Any other field in the OpenAI /chat/completions schema (top_p, stop,
seed, response_format, …) is accepted unchanged.
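A request body can be assembled as a plain dictionary. The sketch below enforces the two required fields and forwards any additional OpenAI fields untouched. build_payload is a hypothetical helper, "your_model" is a placeholder, and dropping None values is a design choice of this example so that model-dependent defaults still apply.

```python
import json
from typing import Any


def build_payload(model: str, messages: list, **extra: Any) -> dict:
    """Assemble a /chat/completions body; extra fields are forwarded as given."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one entry")
    # Leave out options set to None so the server-side defaults are used.
    optional = {k: v for k, v in extra.items() if v is not None}
    return {"model": model, "messages": messages, **optional}


payload = build_payload(
    "your_model",
    [{"role": "user", "content": "Hello"}],
    temperature=None,  # omitted from the body
    top_p=0.9,
    stop=["\n\n"],
)
print(json.dumps(payload, indent=2))
```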
Response
A non-streaming response matches the OpenAI shape:

    {
      "id": "chatcmpl-...",
      "object": "chat.completion",
      "created": 1738960610,
      "model": "gpt-5.4",
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 9,
        "total_tokens": 22
      }
    }

Examples
curl

    curl https://api-agenthub-pre.riema.xyz/v1/chat/completions \
      -H "Authorization: Bearer your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "your_model",
        "messages": [{ "role": "user", "content": "Hello" }]
      }'

Python

    from openai import OpenAI

    client = OpenAI(
        api_key="your_api_key",
        base_url="https://api-agenthub-pre.riema.xyz/v1",
    )

    response = client.chat.completions.create(
        model="your_model",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)

JavaScript

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "your_api_key",
      baseURL: "https://api-agenthub-pre.riema.xyz/v1",
    });

    const response = await client.chat.completions.create({
      model: "your_model",
      messages: [{ role: "user", content: "Hello" }],
    });
    console.log(response.choices[0].message.content);

Streaming
Set stream: true in the request body. The response becomes a Server-Sent
Events stream. Each chunk follows the OpenAI chat.completion.chunk shape, and
the stream terminates with a data: [DONE] line:
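To make that wire format concrete, here is a minimal sketch of decoding such a stream by hand. It assumes each event arrives as a single data: line holding one JSON chunk, which is how OpenAI-style streams are commonly framed. The sample lines are made up for illustration, and iter_deltas is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Made-up stream: a role-only chunk, two content chunks, then the sentinel.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # Hello
```

In practice the SDKs handle this parsing. The requests that follow show how to ask for a stream with curl and the OpenAI SDKs: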
curl

    curl https://api-agenthub-pre.riema.xyz/v1/chat/completions \
      -H "Authorization: Bearer your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "your_model",
        "stream": true,
        "messages": [{ "role": "user", "content": "Hello" }]
      }'

Python

    from openai import OpenAI

    client = OpenAI(
        api_key="your_api_key",
        base_url="https://api-agenthub-pre.riema.xyz/v1",
    )

    stream = client.chat.completions.create(
        model="your_model",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

JavaScript

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "your_api_key",
      baseURL: "https://api-agenthub-pre.riema.xyz/v1",
    });

    const stream = await client.chat.completions.create({
      model: "your_model",
      messages: [{ role: "user", content: "Hello" }],
      stream: true,
    });
    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }

Error codes
| Code | Meaning | Fix |
|---|---|---|
| 400 | Invalid request, malformed JSON body, or unsupported parameter. | Check the request body against the field table above and confirm the model value matches an ID from the live model list. |
| 401 | Invalid, missing, or revoked API key. | Re-copy the key from the API Keys dashboard and confirm the Authorization header is present. |
| 402 | The selected team has insufficient balance for the request. | Open Billing, add funds to the team, then retry the same request. |
| 429 | Rate or quota limit reached. | Back off with exponential delay, reduce concurrency, or check team quota before retrying. |
| 500 | Gateway or upstream model provider error. | Retry after a short delay. If it persists, try another enabled model or contact support. |
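The retry advice for 429 and 500 can be sketched as exponential backoff with jitter. Everything below is illustrative: call_with_backoff, ApiError, and the delay values are assumptions of this example, and the send callable stands in for whatever HTTP client is in use. Codes 400, 401, and 402 are not retried automatically, because the table calls for fixing the request, the key, or the balance first.

```python
import random
import time
from typing import Callable

RETRYABLE = {429, 500}  # per the table: wait, then retry


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(send: Callable[[], dict], max_attempts: int = 5,
                      base_delay: float = 0.5, sleep=time.sleep) -> dict:
    """Call send(), retrying retryable failures with exponential delay plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE or is_last:
                raise
            # 0.5 s, 1 s, 2 s, ... plus up to 100 ms of random jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


# Demo with a fake sender that fails twice with 429, then succeeds.
attempts = {"n": 0}


def flaky() -> dict:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ApiError(429)
    return {"ok": True}


print(call_with_backoff(flaky, sleep=lambda s: None))  # {'ok': True}
```

The sleep parameter is injectable so the demo and tests run without real delays. A production version would keep the default time.sleep.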
See also
- OpenAI Chat Completions reference: Infer mirrors this schema.
- Models: full catalog with live availability.
- Quickstart: API key, base URL, and first call.