TTS API reference
Every endpoint under tts.shunyalabs.ai, every field, every error code.
Base URLs
| Interface | URL |
|---|---|
| Batch | https://tts.shunyalabs.ai |
| Streaming | wss://tts.shunyalabs.ai/ws |
| Health | https://tts.shunyalabs.ai/health |
Authentication
The same API key works for both transports.
Batch (HTTP): send the key as a Bearer token in the Authorization header.
Authorization: Bearer sk-your-api-key
Streaming (WebSocket): send the key as an Authorization header on the WebSocket upgrade. If your client can't set headers, fall back to a token query parameter:
wss://tts.shunyalabs.ai/ws?token=sk-your-api-key
POST /v1/audio/speech
Batch synthesis. Returns audio bytes in the requested format.
| Property | Value |
|---|---|
| Method | POST |
| URL | https://tts.shunyalabs.ai/v1/audio/speech (also /tts, /) |
| Content-Type | application/json |
| Response body | Audio bytes in requested format |
Request fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Use zero-indic. |
| input | string | Yes | - | Text to synthesize. Max 10,000 chars. |
| voice | string | Yes | - | Speaker name. See Voices & languages. |
| response_format | string | No | mp3 | pcm, wav, mp3, ogg_opus, flac, mulaw, alaw. |
| speed | float | No | 1.0 | 0.25 (slowest) to 4.0 (fastest). |
| language | string | No | null | ISO 639 code for text preprocessing. |
| trim_silence | bool | No | false | Remove leading/trailing silence. |
| volume_normalization | string | No | null | peak (0 dBFS) or loudness (EBU R128). |
| background_audio | string | No | null | Preset name (office, cafe, rain, street) or base64-encoded WAV/MP3. |
| background_volume | float | No | 0.1 | Background mix ratio, 0.0-1.0. |
| reference_wav | string | No | null | Base64 audio for voice cloning. WAV/FLAC/OGG. 1-6 s. |
| reference_text | string | No | "" | Transcript of reference audio. Max 500 chars. |
| word_timestamps | bool | No | false | Include word-level timestamps (batch only). |
| max_tokens | int | No | 2048 | Max tokens for internal LLM generation. 1-8192. |
Response headers
| Header | Meaning |
|---|---|
| Content-Type | MIME type (audio/mpeg, audio/wav, etc.) |
| X-Sample-Rate | Sample rate in Hz |
| X-Request-Id | Unique request identifier. Log it for support requests. |
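To make the request and response shape concrete, here is a minimal sketch of a batch call using only the Python standard library. The helper names `build_speech_request` and `synthesize` are illustrative, not part of the SDK; the URL, field names, and header names are taken from the tables above, and the 120 s timeout follows the guidance given for long text.

```python
import json
import urllib.request

BASE_URL = "https://tts.shunyalabs.ai"  # batch base URL from the Base URLs table


def build_speech_request(api_key: str, text: str, voice: str,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    body = {
        "model": "zero-indic",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, voice: str) -> tuple[bytes, dict]:
    """Send the request; return the audio bytes and the documented response headers."""
    req = build_speech_request(api_key, text, voice)
    with urllib.request.urlopen(req, timeout=120) as resp:
        meta = {
            "content_type": resp.headers.get("Content-Type"),
            "sample_rate": resp.headers.get("X-Sample-Rate"),
            "request_id": resp.headers.get("X-Request-Id"),
        }
        return resp.read(), meta
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.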
WebSocket /ws
Full protocol in Streaming. Summary below.
Handshake
There are three equivalent ways to send the synthesis request on connect; pick whichever fits your client.
Encode the parameters in the URL. This works in clients that can't easily send post-connect messages.
wss://tts.shunyalabs.ai/ws?model=zero-indic&voice=Varun&encoding=pcm
Connect with no parameters, then send a config message. This is useful when you want to reuse the connection.
{"type": "config", "model": "zero-indic", "voice": "Varun", "response_format": "pcm"}
Send the full synthesis request in one message. The server treats it as both config and synthesis trigger.
{"model": "zero-indic", "input": "Hello!", "voice": "Sunita", "response_format": "pcm"}
Inbound fields
| Field | Required | Description |
|---|---|---|
| model | Yes | Use zero-indic. |
| input | Yes | Text. Max 10,000 chars. |
| voice | Yes | Speaker name. |
| response_format | No | Default pcm on WebSocket. |
| speed | No | 0.25-4.0, default 1.0. |
| language | No | ISO 639 code for preprocessing. |
| trim_silence | No | Remove silence, default false. |
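The handshake options and field limits above can be sketched as two small helpers. Both function names are hypothetical; the `token` query parameter, the example URL parameters, and the 10,000-character and 0.25-4.0 limits come from this reference. Validating locally is a convenience only, since the server remains the authority on what it accepts.

```python
import json
from typing import Optional
from urllib.parse import urlencode

WS_URL = "wss://tts.shunyalabs.ai/ws"
MAX_INPUT_CHARS = 10_000
SPEED_RANGE = (0.25, 4.0)


def build_ws_url(token: Optional[str] = None, **params: str) -> str:
    """Return the streaming URL, optionally with handshake params and a token."""
    query = dict(params)
    if token is not None:
        # Fallback for clients that cannot set an Authorization header.
        query["token"] = token
    return f"{WS_URL}?{urlencode(query)}" if query else WS_URL


def build_synthesis_message(text: str, voice: str, speed: float = 1.0,
                            response_format: str = "pcm") -> str:
    """Return the single-message synthesis request as a JSON string."""
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
        raise ValueError("speed must be between 0.25 and 4.0")
    return json.dumps({
        "model": "zero-indic",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    })
```

For example, `build_ws_url(model="zero-indic", voice="Varun", encoding="pcm")` reproduces the query-parameter handshake URL shown in the Handshake section.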
Outbound message shape
For each synthesis, the server emits messages in this order: a chunk metadata JSON, then a binary audio frame, repeated for every chunk; then a final completion JSON. If something goes wrong, an error JSON arrives instead of completion.
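A client can handle this ordering with a small loop keyed on the `type` field of each JSON message. The sketch below is transport-agnostic: it takes any iterable of received messages (text frames as `str`, binary frames as `bytes`), so it assumes nothing about a particular WebSocket library. `collect_audio` and `SynthesisFailed` are hypothetical names, not SDK symbols.

```python
import json
from typing import Iterable, Union


class SynthesisFailed(Exception):
    """Hypothetical local exception; not part of the SDK."""


def collect_audio(messages: Iterable[Union[str, bytes]]) -> tuple[bytes, dict]:
    """Concatenate binary frames until completion; raise on an error message."""
    audio = bytearray()
    pending_chunk = None  # metadata for the binary frame expected next
    for msg in messages:
        if isinstance(msg, bytes):
            if pending_chunk is None:
                raise SynthesisFailed("binary frame arrived without chunk metadata")
            audio.extend(msg)
            pending_chunk = None
            continue
        event = json.loads(msg)
        kind = event.get("type")
        if kind == "chunk":
            pending_chunk = event
        elif kind == "completion":
            return bytes(audio), event
        elif kind == "error":
            raise SynthesisFailed(event.get("error", "unknown error"))
    raise SynthesisFailed("stream ended before a completion message")
```

Buffering everything is the simplest approach; a latency-sensitive client would instead pass each binary frame to its audio output as it arrives.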
1. Chunk metadata (JSON)
Sent before each binary audio frame.
{
  "type": "chunk",
  "request_id": "uuid",
  "chunk_index": 0,
  "is_final": false,
  "format": "pcm",
  "sample_rate": 16000
}
2. Audio data (binary frame)
Raw audio bytes that immediately follow each chunk JSON. In the SDK, isinstance(msg, bytes) is True.
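With the pcm format, the frames carry bare samples with no container header, so most players cannot open the concatenated bytes directly. A minimal sketch of wrapping them in a WAV container with Python's standard `wave` module follows. It assumes 16-bit mono samples, which this reference does not state, and uses the 16000 Hz sample rate from the example metadata; in practice, read `sample_rate` from the chunk JSON.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               sample_width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    sample_width=2 (16-bit) and channels=1 (mono) are assumptions;
    confirm them against the actual stream before relying on this.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```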
3. Completion (JSON)
Sent once after all audio chunks have been delivered.
{
  "type": "completion",
  "request_id": "uuid",
  "status": "complete",
  "total_chunks": 3,
  "total_duration_seconds": 2.48,
  "format": "pcm",
  "sample_rate": 16000
}
4. Error (JSON)
Sent instead of completion on failure. The connection closes after this message.
{
  "type": "error",
  "request_id": "uuid",
  "error": "Error description"
}
GET /health
Request:
curl https://tts.shunyalabs.ai/health \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"
Response:
{"status": "healthy", "triton_ready": true, "auth_ready": true}
SDK exception hierarchy
All SDK exceptions inherit from ShunyalabsError.
| Exception | HTTP | Description |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key. |
| PermissionDeniedError | 403 | API key lacks permission. |
| RateLimitError | 429 | Rate limit exceeded. Back off. |
| SynthesisError | 422 | Invalid text or config. |
| ServerError | 5xx | Transient. Safe to retry. |
| TimeoutError | - | Request exceeded timeout. |
| ConnectionError | - | Network failure. |
Rate & concurrency limits
| Limit | Value |
|---|---|
| Max text length per request | 10,000 characters |
| Recommended per-request | Under 500 characters for best quality; split longer text |
| HTTP request timeout | Set to at least 120 s for long text |
| Concurrent requests (default tier) | 16 |
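One way to follow the 500-character recommendation is to pack whole sentences greedily into chunks and synthesize each chunk separately. The sketch below is an illustration, not SDK functionality: `split_text` is a hypothetical helper, and the sentence-boundary regex (which also treats the Devanagari danda as a terminator) is a simplification that will miss abbreviations and similar edge cases.

```python
import re


def split_text(text: str, limit: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after ., !, ? or the danda when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit (crude:
        # this can land mid-word).
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request, with the resulting audio concatenated in order.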
Rate-limit retry pattern
import asyncio
from shunyalabs.exceptions import RateLimitError

async def synthesize_with_backoff(client, text, config, retries=3):
    """Retry synthesis on RateLimitError with exponential backoff."""
    for attempt in range(retries):
        try:
            return await client.tts.synthesize(text, config=config)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            wait = 2 ** attempt  # 1 s, then 2 s with the default retries=3
            await asyncio.sleep(wait)

Concurrency cap pattern
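import asyncio

sem = asyncio.Semaphore(16)  # matches the default-tier concurrency limit

async def safe_synthesize(client, text, config):
    # At most 16 synthesize calls run at once; the rest wait here.
    async with sem:
        return await client.tts.synthesize(text, config=config)

async def synthesize_all(client, scripts, config):
    # gather() returns results in the same order as the input scripts.
    tasks = [safe_synthesize(client, s, config) for s in scripts]
    return await asyncio.gather(*tasks)

HTTP error status codes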
| Status | Description |
|---|---|
| 200 | Success. Audio bytes in response body. |
| 400 | Missing or malformed fields. Body: {"detail": "..."}. |
| 401 | API key invalid or missing. |
| 422 | Invalid text or config. |
| 429 | Rate limited. |
| 500 | Internal server error. |
| 503 | Backend (Triton/Redis) temporarily unavailable. |
| 504 | Gateway timeout. |
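When calling the HTTP API without the SDK, the status table can be reduced to a simple retry decision. The sketch below treats 429 and all 5xx responses as retryable, in line with the "Back off" and "Safe to retry" notes in the exception table, and everything else as final. `is_retryable` and `backoff_delays` are hypothetical helper names.

```python
def is_retryable(status: int) -> bool:
    """Return True for responses worth retrying: 429 and any 5xx."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(retries: int = 3, base: float = 1.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]
```

Client errors such as 400, 401, and 422 indicate a problem with the request itself, so retrying them unchanged will not help.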
SDK configuration reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | string | None | Falls back to SHUNYALABS_API_KEY env var. |
| timeout | float | 60.0 | Request timeout (seconds). |
| max_retries | int | 2 | Retries for 5xx and connection failures. |
| tts_url | string | https://tts.shunyalabs.ai | Batch API base URL. Override for self-hosted. |
| tts_ws_url | string | wss://tts.shunyalabs.ai/ws/v1/audio/speech | WebSocket URL. |
Self-hosted configuration
client = AsyncShunyaClient(
    api_key="your-api-key",
    tts_url="https://my-tts-server.example.com",
    tts_ws_url="wss://my-tts-server.example.com/ws",
)