

Newapi exposes three audio endpoints, all compatible with the OpenAI audio API: text-to-speech synthesis, speech-to-text transcription, and spoken audio translation into English.

Text to speech

POST https://YOUR_NEWAPI_BASE_URL/v1/audio/speech
Generate spoken audio from text. The response is a binary audio file in the format you specify.

Request parameters

model
string
required
The TTS model to use. Common values: tts-1 (faster, lower latency) and tts-1-hd (higher quality).
input
string
required
The text to convert to speech. Maximum 4,096 characters.
voice
string
required
The voice to use. Supported values: alloy, echo, fable, onyx, nova, shimmer.
response_format
string
default:"mp3"
Audio format of the response. Accepted values: mp3, opus, aac, flac, wav, pcm.
speed
number
default:"1"
Playback speed multiplier. Valid range: 0.25 to 4.0.

Response

The response body is the raw audio file as binary data in the requested format (default: audio/mpeg). Stream or save it directly to a file.

Examples

curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/audio/speech" \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello! Welcome to Newapi.",
    "voice": "nova",
    "response_format": "mp3"
  }' \
  --output speech.mp3
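
The same request can be sketched in Python using only the standard library. The base URL and token below are placeholders, and the client-side checks simply mirror the parameter constraints documented above (4,096-character limit, the six supported voices, and the 0.25–4.0 speed range); the server enforces its own validation regardless.

```python
import json
import urllib.request

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}

def build_speech_payload(text, voice="nova", model="tts-1",
                         response_format="mp3", speed=1.0):
    """Validate parameters against the documented limits and return
    the JSON body for POST /v1/audio/speech."""
    if len(text) > 4096:
        raise ValueError("input must be at most 4,096 characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "input": text, "voice": voice,
            "response_format": response_format, "speed": speed}

def synthesize(base_url, token, out_path, **kwargs):
    """POST the payload and write the binary audio response to a file."""
    body = json.dumps(build_speech_payload(**kwargs)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Because the response is raw binary audio, write it to disk directly; there is no JSON envelope to parse.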

Audio transcription

POST https://YOUR_NEWAPI_BASE_URL/v1/audio/transcriptions
Transcribe spoken audio to text. The request uses multipart/form-data.

Request parameters

file
file
required
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
model
string
required
The transcription model to use. whisper-1 is the standard model.
language
string
The language of the audio, as an ISO-639-1 code (for example en, fr, zh). Providing this improves accuracy and speed.
prompt
string
Optional text to guide the model’s style or supply context (for example, a list of proper nouns or a transcript excerpt). Must be in the same language as the audio.
response_format
string
default:"json"
Format of the transcription output. Options: json (default), text, srt, verbose_json, vtt.
temperature
number
default:"0"
Sampling temperature between 0 and 1. Lower values produce more consistent transcriptions.
timestamp_granularities
string[]
Granularity of word or segment timestamps. Pass ["word"] or ["segment"]. Requires response_format to be verbose_json.

Response fields

text
string
The transcribed text. When response_format is verbose_json, additional fields include language, duration, words, and segments.

Examples

curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer sk-your-token" \
  -F file="@recording.mp3" \
  -F model="whisper-1" \
  -F language="en"
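
When response_format is verbose_json and timestamp_granularities includes "word", the response carries structured timing data alongside the text. A minimal parsing sketch follows; the sample payload is illustrative (made-up values), with field names taken from the response description above.

```python
import json

# Illustrative verbose_json payload; the field names (language, duration,
# text, words) follow the docs above, but the values are made up.
sample = json.loads("""
{
  "language": "en",
  "duration": 1.2,
  "text": "Hello world",
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.6, "end": 1.1}
  ]
}
""")

def word_timeline(payload):
    """Return (word, start_seconds, end_seconds) tuples from a
    verbose_json transcription response."""
    return [(w["word"], w["start"], w["end"])
            for w in payload.get("words", [])]

timeline = word_timeline(sample)
```

Remember that word-level timestamps are only present when the request asked for them via timestamp_granularities=["word"] with response_format=verbose_json.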

Audio translation

POST https://YOUR_NEWAPI_BASE_URL/v1/audio/translations
Transcribe audio in any supported language and translate the result into English. The request uses multipart/form-data.

Request parameters

file
file
required
The audio file to translate. Supported formats are the same as for transcriptions.
model
string
required
The model to use. whisper-1 is the standard model.
prompt
string
Optional English-language text to guide the model’s output style.
response_format
string
default:"json"
Output format: json, text, srt, verbose_json, or vtt.
temperature
number
default:"0"
Sampling temperature between 0 and 1.

Response fields

text
string
The translated English transcription.

Example

curl -X POST "https://YOUR_NEWAPI_BASE_URL/v1/audio/translations" \
  -H "Authorization: Bearer sk-your-token" \
  -F file="@french_audio.mp3" \
  -F model="whisper-1"

Example response

{
  "text": "Hello, how are you doing today?"
}
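
Because the transcription and translation endpoints return either a JSON object or plain text depending on response_format, client code needs to branch when extracting the transcript. A small helper sketch (the raw bodies here are illustrative):

```python
import json

def extract_text(raw_body: str, response_format: str = "json") -> str:
    """Pull the transcript text out of a response body.

    json / verbose_json bodies are JSON objects with a "text" field;
    text, srt, and vtt bodies are already plain text, returned as-is.
    """
    if response_format in ("json", "verbose_json"):
        return json.loads(raw_body)["text"]
    return raw_body

# The JSON case matches the example response shown above.
english = extract_text('{"text": "Hello, how are you doing today?"}')
```

This keeps one code path for all five output formats instead of special-casing each caller.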