Send a conversation history and receive a model-generated reply. The endpoint is compatible with the OpenAI Chat Completions API, so any existing OpenAI SDK or client works without modification.
Endpoint
POST /v1/chat/completions
Request parameters
- model (string, required): The model ID to use, for example gpt-4o or claude-3-5-sonnet-20241022. The available values depend on the channels your admin has configured.
- messages (array, required): The conversation history as an ordered list of messages. Each message must have a role (system, user, assistant, or tool) and a content field.
- temperature (number): Sampling temperature between 0 and 2. Lower values make responses more deterministic; higher values make them more varied.
- top_p (number): Nucleus sampling parameter between 0 and 1. An alternative to temperature: only the tokens comprising the top probability mass are considered.
- n (integer): Number of completion choices to generate. Must be 1 or greater.
- stream (boolean): When true, the response is streamed as Server-Sent Events (SSE). Each chunk is a data: line containing a partial JSON delta, ending with data: [DONE].
- stream_options (object): Options for streaming responses. Set include_usage: true to include a final chunk with token usage statistics.
- max_tokens (integer): Maximum number of tokens to generate in the response.
- max_completion_tokens (integer): Alternative to max_tokens. Sets the maximum number of tokens allowed in the completion, including reasoning tokens for models that support them.
- stop (string or array): One or more sequences where generation stops. The model stops before producing any of the specified sequences.
- presence_penalty (number): Penalizes new tokens based on whether they have appeared in the text so far. Valid range: -2.0 to 2.0.
- frequency_penalty (number): Penalizes new tokens based on their frequency in the text so far. Valid range: -2.0 to 2.0.
- tools (array): A list of tools (functions) the model may call. Each tool must include a type of function and a function object with name, description, and parameters.
- tool_choice (string or object): Controls how the model selects tools. Pass "none" to disable tool calls, "auto" to let the model decide, or an object specifying a particular function to call.
- response_format (object): Constrains the output format. Set { "type": "json_object" } to enforce JSON output, or { "type": "text" } for plain text.
- seed (integer): If specified, the system attempts to sample deterministically so that repeated requests with the same seed and parameters return the same result.
- reasoning_effort (string): Controls the depth of reasoning for models that support extended thinking. Accepted values: "low", "medium", "high".
- logit_bias (object): Modifies the likelihood of specific tokens appearing in the output. Maps token IDs (as strings) to bias values from -100 to 100.
- user (string): An optional identifier for the end user. Useful for abuse detection and monitoring.
- modalities (array): Output modalities to request. Supported values include "text" and "audio".
- audio (object): Audio output configuration when modalities includes "audio". Specify voice and format.
Response fields
- id (string): Unique identifier for the completion.
- object (string): Always "chat.completion" for non-streaming responses.
- created (integer): Unix timestamp (seconds) when the completion was created.
- model (string): The model that generated the response.
- system_fingerprint (string): An opaque string representing the system configuration that served the request.
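An abridged non-streaming response body, following the OpenAI Chat Completions format (all values are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "gpt-4o",
  "system_fingerprint": "fp_abc123",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16 }
}
```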
Examples
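A minimal non-streaming request, sketched with Python's standard library only; the base URL and token below are placeholders for your own deployment's values:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's base URL and token.
BASE_URL = "https://your-newapi-host"
API_KEY = "sk-..."

# model and messages are the only required fields.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp:    # network call omitted here
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the same request works through any OpenAI SDK by pointing its base URL at your deployment.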
Streaming example
Set stream: true to receive incremental tokens as SSE events. The stream ends with a data: [DONE] sentinel.
The final content chunk contains "finish_reason": "stop" and an empty delta: {}, followed by the data: [DONE] sentinel.
Additional chat formats
Newapi also supports Gemini's native API format at /v1beta/models/{model}:generateContent. Pass your Newapi token as a Bearer token in the Authorization header. This is useful if you are migrating from the Gemini SDK without changing request shapes.

The Anthropic Messages API format is supported at POST /v1/messages. Include the anthropic-version: 2023-06-01 header alongside your Bearer token. See your admin for model name mappings.
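A sketch of an Anthropic-format request with Python's standard library; the base URL and token are placeholders, and note that the Anthropic Messages format requires max_tokens:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's base URL and token.
BASE_URL = "https://your-newapi-host"
API_KEY = "sk-..."

# The Anthropic Messages format requires max_tokens, and the
# anthropic-version header goes alongside the Bearer token.
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json",
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers=headers,
)
# with urllib.request.urlopen(req) as resp:    # network call omitted here
#     reply = json.loads(resp.read())
```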
The OpenAI Responses API format is also supported at POST /v1/responses, which provides multi-turn conversation state management through previous_response_id.
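A sketch of chaining two turns through the Responses format (base URL, token, and the response id "resp_abc123" are placeholders; the request shape follows the OpenAI Responses API):

```python
import json
import urllib.request

# Placeholders: substitute your deployment's base URL and token.
BASE_URL = "https://your-newapi-host"
API_KEY = "sk-..."

# First turn: no previous_response_id.
first_turn = {"model": "gpt-4o", "input": "My name is Ada."}

# Follow-up turn: pass the id returned by the first response so the
# server restores the conversation state. "resp_abc123" is illustrative.
follow_up = {
    "model": "gpt-4o",
    "input": "What is my name?",
    "previous_response_id": "resp_abc123",
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/responses",
    data=json.dumps(follow_up).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp:    # network call omitted here
#     print(json.loads(resp.read()))
```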