
Send a conversation history and receive a model-generated reply. The endpoint is compatible with the OpenAI Chat Completions API, so any existing OpenAI SDK or client works without modification: point the client's base URL at your Newapi deployment and authenticate with your token as a Bearer token.

Endpoint

POST https://YOUR_NEWAPI_BASE_URL/v1/chat/completions

Request parameters

model
string
required
The model ID to use, for example gpt-4o or claude-3-5-sonnet-20241022. The available values depend on the channels your admin has configured.
messages
object[]
required
The conversation history as an ordered list of messages. Each message must have a role (system, user, assistant, or tool) and a content field.
temperature
number
default:"1"
Sampling temperature between 0 and 2. Lower values make responses more deterministic; higher values make them more varied.
top_p
number
default:"1"
Nucleus sampling parameter between 0 and 1. An alternative to temperature: only the tokens comprising the top_p probability mass are considered.
n
integer
default:"1"
Number of completion choices to generate. Must be 1 or greater.
stream
boolean
default:"false"
When true, the response is streamed as Server-Sent Events (SSE). Each chunk is a data: line containing a partial JSON delta, ending with data: [DONE].
stream_options
object
Options for streaming responses. Set include_usage: true to receive a final chunk with token usage statistics; see the streaming example below.
max_tokens
integer
Maximum number of tokens to generate in the response.
max_completion_tokens
integer
Alternative to max_tokens. Sets the maximum number of tokens allowed in the completion, including reasoning tokens for models that support them.
stop
string | string[]
One or more sequences where generation stops. The model will stop before producing any of the specified sequences.
presence_penalty
number
default:"0"
Penalizes new tokens based on whether they have appeared in the text so far. Valid range: -2.0 to 2.0.
frequency_penalty
number
default:"0"
Penalizes new tokens based on their frequency in the text so far. Valid range: -2.0 to 2.0.
tools
object[]
A list of tools (functions) the model may call. Each tool must include a type of function and a function object with name, description, and parameters. A tool-calling sketch follows this parameter list.
tool_choice
string | object
Controls how the model selects tools. Pass "none" to disable tool calls, "auto" to let the model decide, or an object specifying a particular function to call.
response_format
object
Constrains the output format. Set { "type": "json_object" } to enforce JSON output, or { "type": "text" } for plain text. A JSON-mode sketch follows this parameter list.
seed
integer
If specified, the system will attempt to sample deterministically so that repeated requests with the same seed and parameters return the same result.
reasoning_effort
string
Controls the depth of reasoning for models that support extended thinking. Accepted values: "low", "medium", "high".
logit_bias
object
Modify the likelihood of specific tokens appearing in the output. Maps token IDs (as strings) to bias values from -100 to 100.
user
string
An optional identifier for the end user. Useful for abuse detection and monitoring.
modalities
string[]
Output modalities to request. Supported values include "text" and "audio".
audio
object
Audio output configuration when modalities includes "audio". Specify voice and format.
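When tools are supplied, the model may respond with a tool call instead of text. A minimal sketch of a tool-calling request; the get_weather function is a hypothetical example, not something Newapi provides:
cURL (tool calling)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "What is the weather in Paris?" }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": { "type": "string" }
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
If the model elects to call the function, the returned choice carries a tool_calls array with the function name and JSON-encoded arguments, and a finish_reason of "tool_calls". Run the function, append its output as a role: tool message, and call the endpoint again.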
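To enforce JSON output, set response_format to json_object; as with the upstream OpenAI API, the prompt itself should also ask for JSON. A sketch:
cURL (JSON mode)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "response_format": { "type": "json_object" },
    "messages": [
      { "role": "system", "content": "Reply with a JSON object." },
      { "role": "user", "content": "Name three primary colors." }
    ]
  }'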

Response fields

id
string
Unique identifier for the completion.
object
string
Always "chat.completion" for non-streaming responses, and "chat.completion.chunk" for streamed chunks.
created
integer
Unix timestamp (seconds) when the completion was created.
model
string
The model that generated the response.
choices
object[]
The generated completions. Each choice contains an index, a message with role and content (plus tool_calls when the model invokes a tool), and a finish_reason such as "stop", "length", or "tool_calls".
usage
object
Token counts for the request: prompt_tokens, completion_tokens, and total_tokens.
system_fingerprint
string
An opaque string representing the system configuration that served the request.
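A representative non-streaming response body (identifiers and token counts are illustrative):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  },
  "system_fingerprint": "fp_abc123"
}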

Examples

cURL
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'

Streaming example

Set stream: true to receive incremental tokens as SSE events. The stream ends with a data: [DONE] sentinel.
cURL (streaming)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Tell me a short story." }
    ]
  }'
Each streamed chunk looks like:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1715000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
The final content chunk contains "finish_reason": "stop" and an empty "delta": {}; the stream then ends with data: [DONE].
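To also receive token counts on a stream, set stream_options. When include_usage is true, one extra chunk arrives between the finish_reason chunk and data: [DONE]; its choices array is empty and it carries the usage object (counts below are illustrative):
cURL (streaming with usage)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "stream_options": { "include_usage": true },
    "messages": [
      { "role": "user", "content": "Tell me a short story." }
    ]
  }'
The usage chunk looks like:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1715000000,"model":"gpt-4o","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":58,"total_tokens":70}}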

Additional chat formats

Newapi also supports Gemini’s native API format at POST /v1beta/models/{model}:generateContent. Pass your Newapi token as a Bearer token in the Authorization header. This is useful when migrating from the Gemini SDK, since request shapes stay unchanged.
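A sketch of a generateContent request; the model name gemini-2.0-flash is a placeholder and depends on the channels your admin has configured:
cURL (Gemini format)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1beta/models/gemini-2.0-flash:generateContent" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      { "role": "user", "parts": [ { "text": "What is the capital of France?" } ] }
    ]
  }'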
The Anthropic Claude Messages format is available at POST /v1/messages. Include the anthropic-version: 2023-06-01 header alongside your Bearer token. See your admin for model name mappings. The OpenAI Responses API format is also supported at POST /v1/responses, which provides multi-turn conversation state management through previous_response_id. Sketches of both formats follow.
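A sketch of a Messages-format request; max_tokens is required by this format, and the model name shown is only an example of a mapped value:
cURL (Claude Messages format)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/messages" \
  -H "Authorization: Bearer sk-your-token" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'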
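A sketch of a Responses-format request; the response carries an id, which a follow-up call can pass as previous_response_id to continue the same conversation:
cURL (Responses format)
curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/responses" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is the capital of France?"
  }'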