Send a conversation history and receive a model-generated reply. The endpoint is compatible with the OpenAI Chat Completions API, so any existing OpenAI SDK or client works without modification.
Endpoint
POST /v1/chat/completions
Request parameters
- model (string, required): The model ID to use, for example gpt-4o or claude-3-5-sonnet-20241022. The available values depend on the channels your admin has configured.
- messages (array, required): The conversation history as an ordered list of messages. Each message must have a role (system, user, assistant, or tool) and a content field.
- temperature (number): Sampling temperature between 0 and 2. Lower values make responses more deterministic; higher values make them more varied.
- top_p (number): Nucleus sampling parameter between 0 and 1. An alternative to temperature: only the tokens comprising the top probability mass are considered.
- n (integer): Number of completion choices to generate. Must be 1 or greater.
- stream (boolean): When true, the response is streamed as Server-Sent Events (SSE). Each chunk is a data: line containing a partial JSON delta, ending with data: [DONE].
- stream_options (object): Options for streaming responses. Set include_usage: true to include a final chunk with token usage statistics.
- max_tokens (integer): Maximum number of tokens to generate in the response.
- max_completion_tokens (integer): Alternative to max_tokens. Sets the maximum number of tokens allowed in the completion, including reasoning tokens for models that support them.
- stop (string or array): One or more sequences where generation stops. The model stops before producing any of the specified sequences.
- presence_penalty (number): Penalizes new tokens based on whether they have appeared in the text so far. Valid range: -2.0 to 2.0.
- frequency_penalty (number): Penalizes new tokens based on their frequency in the text so far. Valid range: -2.0 to 2.0.
- tools (array): A list of tools (functions) the model may call. Each tool must include a type of function and a function object with name, description, and parameters.
- tool_choice (string or object): Controls how the model selects tools. Pass "none" to disable tool calls, "auto" to let the model decide, or an object specifying a particular function to call.
- response_format (object): Constrains the output format. Set { "type": "json_object" } to enforce JSON output, or { "type": "text" } for plain text.
- seed (integer): If specified, the system attempts to sample deterministically so that repeated requests with the same seed and parameters return the same result.
- reasoning_effort (string): Controls the depth of reasoning for models that support extended thinking. Accepted values: "low", "medium", "high".
- logit_bias (object): Modifies the likelihood of specific tokens appearing in the output. Maps token IDs (as strings) to bias values from -100 to 100.
- user (string): An optional identifier for the end user. Useful for abuse detection and monitoring.
- modalities (array): Output modalities to request. Supported values include "text" and "audio".
- audio (object): Audio output configuration when modalities includes "audio". Specify voice and format.
Response fields
- id (string): Unique identifier for the completion.
- object (string): Always "chat.completion" for non-streaming responses.
- created (integer): Unix timestamp (seconds) when the completion was created.
- model (string): The model that generated the response.
- system_fingerprint (string): An opaque string representing the system configuration that served the request.
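An abridged non-streaming response body, following the OpenAI Chat Completions format (all values are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "gpt-4o",
  "system_fingerprint": "fp_abc123",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16 }
}
```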
Examples
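A minimal non-streaming request, sketched with Python's standard library only; the base URL and token below are placeholders for your own deployment's values:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's base URL and token.
BASE_URL = "https://your-newapi-host"
API_KEY = "sk-..."

# model and messages are the only required fields.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp:    # network call omitted here
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the same request works through any OpenAI SDK by pointing its base URL at your deployment.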
Streaming example
Set stream: true to receive incremental tokens as SSE events. The stream ends with a data: [DONE] sentinel.
The final content chunk contains "finish_reason": "stop" and an empty delta: {}, followed by the data: [DONE] sentinel.
Additional chat formats
Newapi also supports Gemini's native API format at /v1beta/models/{model}:generateContent. Pass your Newapi token as a Bearer token in the Authorization header. This is useful if you are migrating from the Gemini SDK without changing request shapes.

The Anthropic Messages API format is supported at POST /v1/messages. Include the anthropic-version: 2023-06-01 header alongside your Bearer token. See your admin for model name mappings.
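A sketch of an Anthropic-format request with Python's standard library; the base URL and token are placeholders, and note that the Anthropic Messages format requires max_tokens:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's base URL and token.
BASE_URL = "https://your-newapi-host"
API_KEY = "sk-..."

# The Anthropic Messages format requires max_tokens, and the
# anthropic-version header goes alongside the Bearer token.
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json",
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers=headers,
)
# with urllib.request.urlopen(req) as resp:    # network call omitted here
#     reply = json.loads(resp.read())
```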
The OpenAI Responses API format is also supported at POST /v1/responses, which provides multi-turn conversation state management through previous_response_id.
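A sketch of chaining two turns through the Responses format (base URL, token, and the response id "resp_abc123" are placeholders; the request shape follows the OpenAI Responses API):

```python
import json
import urllib.request

# Placeholders: substitute your deployment's base URL and token.
BASE_URL = "https://your-newapi-host"
API_KEY = "sk-..."

# First turn: no previous_response_id.
first_turn = {"model": "gpt-4o", "input": "My name is Ada."}

# Follow-up turn: pass the id returned by the first response so the
# server restores the conversation state. "resp_abc123" is illustrative.
follow_up = {
    "model": "gpt-4o",
    "input": "What is my name?",
    "previous_response_id": "resp_abc123",
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/responses",
    data=json.dumps(follow_up).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp:    # network call omitted here
#     print(json.loads(resp.read()))
```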