Newapi exposes three audio endpoints, all compatible with the OpenAI audio API: text-to-speech synthesis, speech-to-text transcription, and spoken audio translation into English.
Text to speech
Request parameters
The TTS model to use. Common values: tts-1 (faster, lower latency) and tts-1-hd (higher quality).
The text to convert to speech. Maximum 4,096 characters.
The voice to use. Supported values: alloy, echo, fable, onyx, nova, shimmer.
Audio format of the response. Accepted values: mp3, opus, aac, flac, wav, pcm.
Playback speed multiplier. Valid range: 0.25 to 4.0.
Response
The response body is the raw audio file as binary data in the requested format (default: audio/mpeg). Stream or save it directly to a file.
Examples
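The following is a minimal Python sketch of a text-to-speech call, not an official client. It checks the limits listed above before building the request. Several details are assumptions that do not appear on this page: the field names (`model`, `input`, `voice`, `response_format`, `speed`) and the `/v1/audio/speech` path are taken from the OpenAI audio API that Newapi states it is compatible with, Bearer-token authentication is assumed, the default argument values are this sketch's own choices, and `BASE_URL` and `API_KEY` are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your deployment's base URL and API key.
BASE_URL = "https://your-newapi-host.example"
API_KEY = "sk-your-key"

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, voice="alloy", model="tts-1",
                         response_format="mp3", speed=1.0):
    """Validate the documented limits and return an unsent urllib Request."""
    if len(text) > 4096:
        raise ValueError("input exceeds 4,096 characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    # Field names assumed from the OpenAI audio API.
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, path, **options):
    """Send the request and write the raw audio bytes to `path`."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file("Hello from Newapi.", "hello.mp3", voice="nova")` would then save the binary response body directly, as the Response section describes. Keeping request construction separate from sending makes the validation easy to check without network access.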
Audio transcription
Requests must be sent as multipart/form-data.
Request parameters
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
The transcription model to use. whisper-1 is the standard model.
The language of the audio, as an ISO-639-1 code (for example en, fr, zh). Providing this improves accuracy and speed.
Optional text to guide the model’s style or supply context (for example, a list of proper nouns or a transcript excerpt). Must be in the same language as the audio.
Format of the transcription output. Options: json (default), text, srt, verbose_json, vtt.
Sampling temperature between 0 and 1. Lower values produce more consistent transcriptions.
Granularity of word or segment timestamps. Pass ["word"] or ["segment"]. Requires response_format to be verbose_json.
Response fields
The transcribed text. When response_format is verbose_json, additional fields include language, duration, words, and segments.
Examples
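As a rough illustration of the multipart upload, the sketch below encodes a transcription request using only the Python standard library. The form field names (`file`, `model`, `language`, `timestamp_granularities[]`) and the `/v1/audio/transcriptions` path are assumptions carried over from the OpenAI audio API; only `response_format` is named on this page. `BASE_URL`, `API_KEY`, and both helper functions are hypothetical.

```python
import urllib.request
import uuid

# Hypothetical placeholders: substitute your deployment's base URL and API key.
BASE_URL = "https://your-newapi-host.example"
API_KEY = "sk-your-key"


def encode_multipart(fields, file_name, file_bytes):
    """Encode (name, value) text fields plus one file part as multipart/form-data.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields:
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(file_name, file_bytes, language=None,
                                word_timestamps=False):
    """Return an unsent Request for the (assumed) transcription endpoint."""
    fields = [("model", "whisper-1")]
    if language:
        fields.append(("language", language))
    if word_timestamps:
        # Timestamps require verbose_json, per the parameter notes above.
        fields.append(("response_format", "verbose_json"))
        fields.append(("timestamp_granularities[]", "word"))  # assumed field name
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",  # assumed OpenAI-style path
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": content_type,
        },
        method="POST",
    )
```

To send it, read the audio file in binary mode, pass the bytes to `build_transcription_request`, and hand the result to `urllib.request.urlopen`. Fields are passed as a list of pairs rather than a dict so that a name can be repeated if more than one timestamp granularity is needed.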
Audio translation
Requests must be sent as multipart/form-data.
Request parameters
The audio file to translate. Supported formats are the same as for transcriptions.
The model to use. whisper-1 is the standard model.
Optional English-language text to guide the model’s output style.
Output format: json, text, srt, verbose_json, or vtt.
Sampling temperature between 0 and 1.
Response fields
The translated English transcription.
Example
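How the translated text is read depends on the output format requested. The sketch below shows one way to handle a translation response body in Python. It assumes, following OpenAI API behavior rather than anything stated on this page, that `json` and `verbose_json` return a JSON object whose `text` field holds the translation, while `text`, `srt`, and `vtt` return a plain-text body. The function name is hypothetical.

```python
import json

# Formats assumed to return a JSON object; the others return plain text.
JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def read_translation_body(body: bytes, response_format: str = "json") -> str:
    """Return the translation payload as a string.

    For JSON formats this is the (assumed) "text" field; for text, srt,
    and vtt it is the raw document, including any subtitle timings.
    """
    if response_format in JSON_FORMATS:
        return json.loads(body)["text"]
    if response_format in TEXT_FORMATS:
        return body.decode("utf-8")
    raise ValueError(f"unknown response_format: {response_format}")
```

For example, `read_translation_body(b'{"text": "Hello, world."}')` returns `"Hello, world."`.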