API Reference
Batch Synthesis — POST /v1/audio/speech
Send text and receive a complete audio file in a single HTTP round-trip.
Endpoint
| PROPERTY | VALUE |
|---|---|
| Method | POST |
| URL | https://tts.shunyalabs.ai/v1/audio/speech |
| Content-Type | application/json |
| Auth | Authorization: Bearer <API_KEY> |
| Response | Audio bytes in the requested format |
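As an illustration of the endpoint table above, the following is a minimal sketch of how a request could be assembled with Python's standard library. The helper name `build_speech_request`, the placeholder voice `YOUR_VOICE`, the placeholder key `YOUR_API_KEY`, and the 130-second client timeout are assumptions made for the example; only the URL, method, and headers come from this reference.

```python
import json
import urllib.request

API_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def build_speech_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for batch synthesis.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "YOUR_API_KEY" and "YOUR_VOICE" are placeholders, not real values.
request = build_speech_request(
    "YOUR_API_KEY",
    {"model": "zero-indic", "voice": "YOUR_VOICE", "input": "Hello, world."},
)

# Sending is left commented out so the sketch runs without network access.
# The timeout is an assumed value chosen to exceed the 120 s server timeout.
# with urllib.request.urlopen(request, timeout=130) as response:
#     audio_bytes = response.read()
```

Separating request construction from sending keeps the example testable offline; a real client would uncomment the `urlopen` call and write `audio_bytes` to a file.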
Request Parameters
| PARAMETER | TYPE | REQUIRED | DESCRIPTION |
|---|---|---|---|
| model | string | Yes | Model name. Use "zero-indic". |
| voice | string | Yes | Speaker voice name. |
| input | string | Yes | Text to synthesize. Max 10,000 characters. |
| response_format | string | No | Output format. Default: "mp3". |
| speed | float | No | Speed multiplier. Range: 0.25-4.0. Default: 1.0. |
| language | string | No | ISO 639 language code for preprocessing. |
| trim_silence | bool | No | Remove leading/trailing silence. Default: false. |
| volume_normalization | string | No | "peak" or "loudness". |
| background_audio | string | No | Preset name or base64-encoded audio. |
| background_volume | float | No | Background volume. Range: 0.0-1.0. Default: 0.1. |
| reference_wav | string | No | Base64-encoded reference audio for voice cloning. |
| reference_text | string | No | Transcript of the reference audio. Max 500 characters. |
| word_timestamps | bool | No | Return word-level timestamps. Default: false. |
| max_tokens | int | No | Max tokens for LLM generation. Default: 2048. |
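The limits in the parameter table can be checked on the client before a request is sent, which avoids a round-trip that would end in a 400. The sketch below is one possible pre-flight check, not part of the API: the function name `validate_payload` and the message strings are hypothetical, and it assumes the character limits correspond to Python's `len()` on the string, which the reference does not confirm.

```python
REQUIRED_FIELDS = ("model", "voice", "input")


def validate_payload(payload: dict) -> list:
    """Return a list of problems found in a request payload (empty if none).

    Hypothetical client-side check based only on the documented limits.
    """
    problems = []

    # model, voice, and input are the three required parameters.
    for name in REQUIRED_FIELDS:
        if not payload.get(name):
            problems.append(f"missing required field: {name}")

    # Assumption: the 10,000-character limit matches len() on the string.
    if len(payload.get("input") or "") > 10_000:
        problems.append("input exceeds 10,000 characters")

    # Defaults below mirror the documented defaults, so absent keys pass.
    if not 0.25 <= payload.get("speed", 1.0) <= 4.0:
        problems.append("speed must be between 0.25 and 4.0")

    if not 0.0 <= payload.get("background_volume", 0.1) <= 1.0:
        problems.append("background_volume must be between 0.0 and 1.0")

    if len(payload.get("reference_text") or "") > 500:
        problems.append("reference_text exceeds 500 characters")

    normalization = payload.get("volume_normalization")
    if normalization is not None and normalization not in ("peak", "loudness"):
        problems.append('volume_normalization must be "peak" or "loudness"')

    return problems


# "YOUR_VOICE" is a placeholder; this reference does not list voice names.
example = {"model": "zero-indic", "voice": "YOUR_VOICE", "input": "Hello."}
print(validate_payload(example))
```

The server remains the authority on validity; a check like this only catches the documented limits early.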
Response Headers
| HEADER | DESCRIPTION |
|---|---|
| Content-Type | MIME type of the audio (e.g., audio/mpeg). |
| Content-Length | Size of the audio response in bytes. |
| X-Request-Id | Unique request identifier for debugging. |
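The three headers above are enough to pick a file extension, sanity-check the download size, and log an identifier for support requests. The following sketch shows one way to read them; the function name `summarize_headers` is hypothetical, the `audio/wav` mapping and the `.bin` fallback are assumptions (only `audio/mpeg` appears in this reference), and `req-123` is a made-up identifier.

```python
from typing import Mapping

# Only audio/mpeg is documented; the audio/wav entry is an assumption.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav"}


def summarize_headers(headers: Mapping[str, str]) -> dict:
    """Extract file extension, size, and request ID from response headers."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Drop any parameters after ';' before looking up the MIME type.
    mime = lowered.get("content-type", "").split(";")[0].strip().lower()
    length = lowered.get("content-length")

    return {
        "extension": EXTENSIONS.get(mime, ".bin"),  # .bin is a fallback choice
        "size_bytes": int(length) if length is not None else None,
        "request_id": lowered.get("x-request-id"),
    }


# Illustrative header values, not captured from a real response.
info = summarize_headers(
    {"Content-Type": "audio/mpeg", "Content-Length": "2048", "X-Request-Id": "req-123"}
)
print(info)
```

Logging `request_id` alongside any failure makes it easier to correlate a client-side error with the server's records.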
Status Codes
| CODE | MEANING | DESCRIPTION |
|---|---|---|
| 200 | OK | Audio returned successfully. |
| 400 | Bad Request | Invalid parameters, input too long, or missing required fields. |
| 401 | Unauthorized | Missing or invalid API key. |
| 500 | Internal Server Error | Server-side failure. Retry with backoff. |
| 503 | Service Unavailable | Server overloaded or under maintenance. Retry later. |
| 504 | Gateway Timeout | Request exceeded server timeout (120 s). |
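The retry guidance for 500 and 503 can be sketched as a small exponential-backoff loop. Everything here is an assumption beyond the status codes themselves: the function name `call_with_retry`, the attempt count, the 1-second base delay, and the doubling schedule are illustrative choices, and `send` is a hypothetical callable that performs the request and returns a `(status, body)` tuple. 504 is deliberately not retried in this sketch, since the table gives no retry advice for it and the same input may time out again.

```python
import time
from typing import Callable, Tuple

# Status codes the table above marks as worth retrying.
RETRYABLE_STATUSES = {500, 503}


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out.

    Delays grow as base_delay * 2**attempt (1 s, 2 s, 4 s, ... by default).
    The sleep function is injectable so the loop can be tested without waiting.
    """
    status, body = 0, b""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


# Simulated server: two transient failures followed by success.
simulated = iter([(503, b""), (500, b""), (200, b"audio-bytes")])
recorded_delays = []
result = call_with_retry(lambda: next(simulated), sleep=recorded_delays.append)
print(result, recorded_delays)
```

Client errors such as 400 and 401 are returned immediately, because repeating an invalid or unauthenticated request would not change the outcome.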