Text-to-Speech (TTS)

Zero TTS is Shunya's speech synthesis model family. It offers 46 speaker voices across 23 Indic languages and English, and every voice can speak every supported language. It also supports 11 expression styles and voice cloning from a 3-6 second reference clip.

How it fits together

Batch vs Streaming

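Two synthesis modes are available. Both use the same model and the same voices; they differ in transport and in when the first byte of audio leaves the server. Batch returns audio only once the full text has been synthesized, while streaming sends chunks as synthesis progresses.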

Batch
HTTP POST, returns a complete file

Send text via HTTP POST and receive a complete audio file in a single response.

  • Pre-rendered voice prompts for IVR and telephony systems.
  • Notification audio, order updates, alerts, reminders.
  • Podcast, audiobook, and long-form content generation.
  • Any use case where audio does not need to start playing before synthesis is complete.
Transport: HTTP POST
Endpoint: https://tts.shunyalabs.ai/v1/audio/speech
Auth: Bearer <API_KEY>
Required fields: model, input (the text to synthesize), voice
Default format: mp3
Flow: (1) POST text, (2) server synthesizes, (3) receive audio
Streaming
WebSocket, chunks arrive in real time

Open a persistent WebSocket connection and receive audio chunks in real time as synthesis happens.

  • Voice agents and conversational AI requiring sub-second audio start.
  • IVR and telephony pipelines.
  • Real-time audio playback in applications.
  • Any use case where audio must begin playing before synthesis of the full text is complete.
Transport: WebSocket
Endpoint: wss://tts.shunyalabs.ai/ws/v1/audio/speech (also available at /ws/tts and /ws)
Config: TTSConfig
Default format: mp3
Flow: (1) connect, (2) receive chunks, (3) done

Source: Shunyalabs TTS Developer Documentation v1.0 (March 2026), §2.1 Batch overview and §3.1 Streaming overview, text reproduced verbatim.
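Because streaming delivers audio in pieces, a client can start playback (or measure latency) on the first chunk instead of waiting for the whole utterance. The sketch below covers only the client-side half: it appends raw PCM chunks to a WAV file as they arrive and reports the time to first audio. The WebSocket message format is not described in this section, so `fake_stream` is a hypothetical stand-in for the real connection, and the 24 kHz, 16-bit mono PCM parameters are placeholder assumptions to replace with the values the service actually returns.

```python
import asyncio
import time
import wave
from typing import AsyncIterator, Optional

# ASSUMPTION: placeholder PCM parameters; use the values the service documents.
SAMPLE_RATE = 24_000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


async def save_pcm_stream(chunks: AsyncIterator[bytes], path: str) -> Optional[float]:
    """Append raw PCM chunks to a WAV file as they arrive.

    Returns seconds from the call to the first chunk (time to first audio),
    or None if the stream yielded nothing.
    """
    start = time.monotonic()
    first_audio: Optional[float] = None
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        async for chunk in chunks:
            if first_audio is None:
                first_audio = time.monotonic() - start
            # A real-time player could start on this chunk; here we just store it.
            wav.writeframes(chunk)
    return first_audio


async def fake_stream(n_chunks: int = 5, chunk_bytes: int = 4_800) -> AsyncIterator[bytes]:
    """HYPOTHETICAL stand-in for the WebSocket: yields silent PCM chunks."""
    for _ in range(n_chunks):
        await asyncio.sleep(0.01)  # simulated per-chunk synthesis delay
        yield b"\x00" * chunk_bytes
```

`asyncio.run(save_pcm_stream(fake_stream(), "stream.wav"))` returns the seconds until the first chunk arrived. To use it for real, replace `fake_stream` with an async iterator over the binary audio frames your WebSocket client receives.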

Endpoints

Mode | Endpoint | Default format
--- | --- | ---
Batch | POST https://tts.shunyalabs.ai/v1/audio/speech | mp3
Streaming | wss://tts.shunyalabs.ai/ws/v1/audio/speech | mp3 (pcm recommended)
Health | GET https://tts.shunyalabs.ai/health | n/a
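Both HTTP endpoints in the table can be called with the Python standard library alone. This is a minimal sketch, not an official client: it assumes the batch endpoint takes the same JSON body as the curl example in "Your first synthesis", that the `SHUNYALABS_API_KEY` environment variable holds your key, and that a healthy service answers `GET /health` with HTTP 200 (the response body is not documented here, so it is ignored). `build_batch_request`, `synthesize_to_file`, and `is_healthy` are hypothetical helper names.

```python
import json
import os
import urllib.request

BASE_URL = "https://tts.shunyalabs.ai"


def build_batch_request(text: str, voice: str, api_key: str,
                        model: str = "zero-indic",
                        response_format: str = "mp3",
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    body = {"model": model, "input": text, "voice": voice,
            "response_format": response_format}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, voice: str, path: str) -> None:
    """Send a batch request and save the complete audio file to `path`."""
    req = build_batch_request(text, voice, os.environ["SHUNYALABS_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        out.write(resp.read())  # batch mode: the whole file arrives in one response


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """True if GET <base_url>/health answers HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts all subclass OSError.
        return False
```

For example, call `is_healthy()` first, then `synthesize_to_file("Hello, how are you today?", "Varun", "hello.mp3")`.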

Your first synthesis

cURL:
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"zero-indic","input":"Hello, how are you today?","voice":"Varun"}' \
  --output hello.mp3
Python (Shunyalabs SDK):
import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("hello.mp3")

asyncio.run(main())
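Python (OpenAI-compatible client):
import os
from openai import OpenAI

# Point the official OpenAI SDK at the Shunyalabs endpoint.
client = OpenAI(
    api_key=os.environ["SHUNYALABS_API_KEY"],
    base_url="https://tts.shunyalabs.ai/v1",
)
# with_streaming_response writes bytes to disk as they arrive; calling
# stream_to_file() on a plain (non-streaming) response is deprecated.
with client.audio.speech.with_streaming_response.create(
    model="zero-indic",
    input="Hello, how are you today?",
    voice="Varun",
    response_format="mp3",
) as response:
    response.stream_to_file("hello.mp3")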
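The quick-start examples send a single sentence. For long-form use cases such as podcasts and audiobooks, `input` is limited to 10,000 characters per request, so longer text has to be split on the client. The helper below is a sketch of one approach, not part of any Shunyalabs SDK: it packs whole sentences (split after `.`, `!`, `?`, or the Devanagari danda `।`) into pieces under the limit and hard-wraps any single sentence that exceeds it. `split_for_tts` is a hypothetical name, and runs of whitespace between sentences are collapsed to a single space.

```python
import re
from typing import List

MAX_INPUT_CHARS = 10_000  # documented per-request limit on "input"


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` chars, preferring sentence ends."""
    # Break after ., !, ? or the Devanagari danda when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current piece
        else:
            chunks.append(current)  # close the current piece, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Synthesize each piece in order with any of the calls above, then play or store the results sequentially.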

Key features

Required fields, at a glance

Request body (the // comments are annotations only; remove them before sending, since JSON does not allow comments):
{
  "model": "zero-indic",      // required, only "zero-indic" today
  "input": "Your text here",  // required, up to 10,000 chars
  "voice": "Varun",           // required, see Voices page for full list
  "response_format": "mp3",   // optional, default mp3
  "speed": 1.0,               // optional, 0.25 to 4.0
  "language": "en",           // optional, ISO language code used for text preprocessing
  "trim_silence": false,      // optional, trims silence from the output when true
  "reference_wav": "...",     // optional, base64-encoded reference audio for voice cloning
  "reference_text": "..."     // optional, transcript of the reference audio
}
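The limits in the field list can be checked on the client before a request is sent, which catches obvious mistakes without a network round trip. A minimal sketch follows. It assumes the reference clip is a WAV file (the accepted audio format for `reference_wav` is not stated here) and applies the 3-6 second clip length from the overview. `build_payload` and `encode_reference_wav` are hypothetical helpers, and the server remains the authority on validation.

```python
import base64
import io
import wave
from typing import Optional

ALLOWED_MODELS = {"zero-indic"}
MAX_INPUT_CHARS = 10_000
MIN_SPEED, MAX_SPEED = 0.25, 4.0
MIN_REF_SECONDS, MAX_REF_SECONDS = 3.0, 6.0  # cloning clip length from the overview


def encode_reference_wav(wav_bytes: bytes) -> str:
    """Check a WAV clip (ASSUMED format) is 3-6 s long; return it base64-encoded."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if not MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS:
        raise ValueError(f"reference clip is {seconds:.2f}s; expected 3-6 s")
    return base64.b64encode(wav_bytes).decode("ascii")


def build_payload(text: str, voice: str, *, model: str = "zero-indic",
                  speed: float = 1.0, response_format: str = "mp3",
                  language: Optional[str] = None, trim_silence: bool = False,
                  reference_wav: Optional[bytes] = None,
                  reference_text: Optional[str] = None) -> dict:
    """Assemble a request body, enforcing the documented field limits."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model {model!r}")
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 10,000 characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format, "speed": speed,
               "trim_silence": trim_silence}
    # Optional fields are only sent when the caller provides them.
    if language is not None:
        payload["language"] = language
    if reference_wav is not None:
        payload["reference_wav"] = encode_reference_wav(reference_wav)
    if reference_text is not None:
        payload["reference_text"] = reference_text
    return payload
```

The returned dict can be serialized with `json.dumps` and sent as the batch request body.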