API Reference

Batch Synthesis — POST /v1/audio/speech

Send text and receive a complete audio file in a single HTTP round-trip.


Endpoint

| Property | Value |
| --- | --- |
| Method | POST |
| URL | https://tts.shunyalabs.ai/v1/audio/speech |
| Content-Type | application/json |
| Auth | Authorization: Bearer <API_KEY> |
| Response | Audio bytes in the requested format |
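
As a rough illustration of the endpoint properties above, the sketch below builds and sends a request with Python's standard library only. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical, the voice name must be supplied by the caller because no voice names are listed here, and the 130-second client timeout is an assumption chosen to sit slightly above the documented 120-second server timeout.

```python
import json
import urllib.request

API_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def build_speech_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the three required fields."""
    payload = {"model": "zero-indic", "voice": voice, "input": text}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, voice: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(api_key, text, voice)
    # Assumed client timeout; urlopen raises urllib.error.HTTPError on non-2xx replies.
    with urllib.request.urlopen(request, timeout=130) as response:
        audio = response.read()
    with open(path, "wb") as audio_file:
        audio_file.write(audio)
```

Because response_format defaults to "mp3", a call such as `synthesize_to_file(api_key, "Hello", voice_name, "out.mp3")` would be expected to produce an MP3 file.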

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model name. Use "zero-indic". |
| voice | string | Yes | Speaker voice name. |
| input | string | Yes | Text to synthesize. Max 10,000 characters. |
| response_format | string | No | Output format. Default: "mp3". |
| speed | float | No | Speed multiplier. Range: 0.25-4.0. Default: 1.0. |
| language | string | No | ISO 639 language code for preprocessing. |
| trim_silence | bool | No | Remove leading/trailing silence. Default: false. |
| volume_normalization | string | No | "peak" or "loudness". |
| background_audio | string | No | Preset name or base64-encoded audio. |
| background_volume | float | No | Background volume 0.0-1.0. Default: 0.1. |
| reference_wav | string | No | Base64 reference audio for voice cloning. |
| reference_text | string | No | Transcript of reference audio. Max 500 chars. |
| word_timestamps | bool | No | Return word-level timestamps. Default: false. |
| max_tokens | int | No | Max tokens for LLM generation. Default: 2048. |
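
Several of these parameters carry documented limits, so checking them on the client can avoid a round-trip that would end in a 400. The following is a minimal sketch of such a check; `build_payload` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len` does (Unicode code points), which is not stated here.

```python
from typing import Any

MAX_INPUT_CHARS = 10_000
MAX_REFERENCE_TEXT_CHARS = 500


def build_payload(voice: str, text: str, **options: Any) -> dict:
    """Assemble a request body, rejecting values outside the documented limits."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")

    speed = options.get("speed", 1.0)
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    background_volume = options.get("background_volume", 0.1)
    if not 0.0 <= background_volume <= 1.0:
        raise ValueError("background_volume must be between 0.0 and 1.0")

    normalization = options.get("volume_normalization")
    if normalization is not None and normalization not in ("peak", "loudness"):
        raise ValueError('volume_normalization must be "peak" or "loudness"')

    reference_text = options.get("reference_text")
    if reference_text is not None and len(reference_text) > MAX_REFERENCE_TEXT_CHARS:
        raise ValueError(f"reference_text exceeds {MAX_REFERENCE_TEXT_CHARS} characters")

    # Required fields first, then whatever optional fields the caller passed.
    payload = {"model": "zero-indic", "voice": voice, "input": text}
    payload.update(options)
    return payload
```

Optional fields are only included when the caller passes them, so the server-side defaults listed in the table still apply to anything omitted.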

Response Headers

| Header | Description |
| --- | --- |
| Content-Type | MIME type of the audio (e.g., audio/mpeg). |
| Content-Length | Size of the audio response in bytes. |
| X-Request-Id | Unique request identifier for debugging. |
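
One way a client might use these headers is to choose a file extension, confirm the body arrived in full, and keep the request ID for support tickets. The sketch below is illustrative only: `describe_response` is a hypothetical helper, and apart from audio/mpeg (the documented example) the MIME-to-extension mapping is an assumption.

```python
from typing import Mapping, Optional

# audio/mpeg is the documented example; the audio/wav entry is an assumption.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav"}


def describe_response(headers: Mapping[str, str]) -> dict:
    """Extract the file extension, expected size, and request ID from response headers."""
    content_type = headers.get("Content-Type", "")
    # Drop any parameters such as "; charset=..." before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    length: Optional[str] = headers.get("Content-Length")
    return {
        "extension": EXTENSIONS.get(mime, ".bin"),
        "expected_bytes": int(length) if length is not None else None,
        "request_id": headers.get("X-Request-Id"),
    }
```

Comparing `expected_bytes` with the length of the body actually read is a cheap way to detect a truncated download, and logging `request_id` alongside any failure makes the request easier to trace.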

Status Codes

| Code | Meaning | Description |
| --- | --- | --- |
| 200 | OK | Audio returned successfully. |
| 400 | Bad Request | Invalid parameters, input too long, or missing required fields. |
| 401 | Unauthorized | Missing or invalid API key. |
| 500 | Internal Server Error | Server-side failure. Retry with backoff. |
| 503 | Service Unavailable | Server overloaded or under maintenance. Retry later. |
| 504 | Gateway Timeout | Request exceeded server timeout (120 s). |
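
The retry guidance for 500 and 503 could be implemented along the following lines. This is a sketch under stated assumptions rather than a prescribed client: `call_with_retry` and its `send` callback are hypothetical, the exponential schedule (1 s, 2 s, 4 s) and attempt count are arbitrary choices, and 504 is deliberately not retried because the table gives no retry guidance for it.

```python
import time
from typing import Callable, Tuple

# Only the codes the table explicitly says to retry.
RETRYABLE = {500, 503}


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` until it returns HTTP 200, backing off exponentially on 500/503."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        # Client errors (400, 401) and the final failed attempt are not retried.
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Here `send` is any zero-argument function that performs one request and returns a `(status, body)` pair; passing `sleep` as a parameter keeps the function testable without real delays.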