Text-to-Speech (TTS)
Zero TTS is Shunya's speech synthesis family. 46 speaker voices across 23 Indic languages and English. Every voice can speak every language. 11 expression styles. Voice cloning from a 3-6 second reference clip.
How it fits together
Batch vs Streaming
Two synthesis modes are available. Same model, same voices, different transport and different "when does the first byte of audio leave the server."
Batch
Send text via HTTP POST and receive a complete audio file in a single response.
- Pre-rendered voice prompts for IVR and telephony systems.
- Notification audio, order updates, alerts, reminders.
- Podcast, audiobook, and long-form content generation.
- Any use case where audio does not need to start playing before synthesis is complete.
- Transport: HTTP POST
- Endpoint: https://tts.shunyalabs.ai/v1/audio/speech
- Auth: Bearer <API_KEY>
- Required: text, model, voice
- Default format: mp3
Streaming
Open a persistent WebSocket connection and receive audio chunks in real time as synthesis happens.
- Voice agents and conversational AI requiring sub-second audio start.
- IVR and telephony pipelines.
- Real-time audio playback in applications.
- Any use case where audio must begin playing before synthesis of the full text is complete.
- Transport: WebSocket
- Endpoint: wss://tts.shunyalabs.ai/ws/v1/audio/speech (also: /ws/tts, /ws)
- Config: TTSConfig
- Default format: mp3
Source: Shunyalabs TTS Developer Documentation v1.0 (March 2026), §2.1 Batch overview and §3.1 Streaming overview, text reproduced verbatim.
Endpoints
| Mode | Endpoint | Default format |
|---|---|---|
| Batch | POST https://tts.shunyalabs.ai/v1/audio/speech | mp3 |
| Streaming | wss://tts.shunyalabs.ai/ws/v1/audio/speech | mp3 (pcm recommended) |
| Health | GET https://tts.shunyalabs.ai/health | - |
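A quick way to exercise the health endpoint from Python is sketched below. The helper name `is_healthy` is hypothetical, and treating HTTP 200 as "healthy" is an assumption; the table does not describe the response body or whether authentication is needed.

```python
import urllib.request

HEALTH_URL = "https://tts.shunyalabs.ai/health"


def is_healthy(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <url> answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

The `opener` parameter exists only so the function can be exercised without network access; in normal use, call `is_healthy()` with no arguments.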
Your first synthesis
curl:
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"zero-indic","input":"Hello, how are you today?","voice":"Varun"}' \
  --output hello.mp3

Python SDK:
import asyncio

from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig


async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("hello.mp3")


asyncio.run(main())

OpenAI SDK (Python):
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tts.shunyalabs.ai/v1",
)
response = client.audio.speech.create(
    model="zero-indic",
    input="Hello, how are you today?",
    voice="Varun",
    response_format="mp3",
)
response.stream_to_file("output.mp3")

Key features
- Voices: male and female speakers per language. Any voice can speak any language; cross-lingual synthesis is built in.
- Output formats: PCM, WAV, MP3, OGG Opus, FLAC, mulaw, alaw. Pick by use case: telephony, web, or archival.
- Expression styles: Happy, Sad, News, Narrative, Conversational, Enthusiastic, and more, prepended as tags to your text.
- Voice cloning: clone a voice from 3-6 seconds of reference audio. Works across all 23 supported languages.
- Low latency: first audio in under 350 ms, which is critical for voice agents.
- LLM streaming: pipe OpenAI / Anthropic / Gemini tokens to TTS at sentence boundaries, the core pattern for low-latency voice agents.
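The sentence-boundary pattern can be sketched as follows. This is a minimal illustration, not Shunyalabs code: `sentence_chunks` is a hypothetical helper, and the boundary rule (a `.`, `!`, `?`, or Devanagari danda followed by whitespace) is an assumption that will mis-split abbreviations such as "Dr. Rao". Each yielded sentence would then be passed to whichever synthesis call you use.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ., !, ?, or the danda (।), followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?।])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield complete sentences as they form."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Everything before the last boundary is complete; the tail stays buffered.
        parts = _BOUNDARY.split(buffer)
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever remains once the token stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Splitting at sentences rather than at individual tokens lets synthesis begin on the first sentence while the model is still generating the rest of the reply.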
Required fields, at a glance
{
  "model": "zero-indic",        // required, only "zero-indic" today
  "input": "Your text here",    // required, up to 10,000 chars
  "voice": "Varun",             // required, see Voices page for full list
  "response_format": "mp3",     // optional, default mp3
  "speed": 1.0,                 // optional, 0.25 to 4.0
  "language": "en",             // optional, ISO code for preprocessing
  "trim_silence": false,        // optional, tight audio when true
  "reference_wav": "...",       // optional, base64 for voice cloning
  "reference_text": "..."       // optional, transcript for voice cloning
}
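The limits listed above can be checked on the client before a request is sent. The sketch below is one way to do that; `build_speech_request` is a hypothetical helper, not part of any SDK. It assumes the documented bounds are inclusive and that `reference_wav` is the base64 encoding of the raw audio file bytes.

```python
import base64
import json
from typing import Optional

MAX_INPUT_CHARS = 10_000
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_request(
    text: str,
    voice: str,
    model: str = "zero-indic",
    response_format: str = "mp3",
    speed: float = 1.0,
    reference_wav: Optional[bytes] = None,
    reference_text: Optional[str] = None,
) -> dict:
    """Validate the documented limits and return a JSON-serializable body."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    if reference_wav is not None:
        # Assumption: the API expects base64 of the raw reference file bytes.
        body["reference_wav"] = base64.b64encode(reference_wav).decode("ascii")
        if reference_text is not None:
            body["reference_text"] = reference_text
    return body


if __name__ == "__main__":
    print(json.dumps(build_speech_request("Hello, how are you today?", "Varun")))
```

The returned dictionary can be serialized with `json.dumps` and sent as the POST body to the batch endpoint.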