TTS quickstart

Install, authenticate, synthesize. By the end of this page you'll have an MP3 on disk and know how to switch voice, language, speed, and format.

1. Install the SDK (optional)

shell
pip install "shunyalabs[TTS]"         # TTS only
pip install "shunyalabs[all]"         # TTS + ASR + everything
pip install "shunyalabs[extras]"      # + audio playback helpers (sounddevice)

You can also call the REST API directly with requests or any other HTTP client; the SDK is just a thin wrapper.

2. Configure authentication

shell
export SHUNYALABS_API_KEY="sk-your-key"

Or pass it in code:

python
from shunyalabs import AsyncShunyaClient

client = AsyncShunyaClient(api_key="sk-your-key")
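
If you call the REST API without the SDK, you have to read the key yourself. The sketch below fails fast when the variable from the export step is missing and builds the Bearer header used by the REST examples on this page. The helper names `load_api_key` and `auth_headers` are illustrative and not part of the SDK.

```python
import os


def load_api_key(env_var: str = "SHUNYALABS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or pass api_key= explicitly"
        )
    return key


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header in the Bearer form shown on this page."""
    return {"Authorization": f"Bearer {api_key}"}
```

Failing at startup with a named variable is usually easier to debug than an authentication error on the first request.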

3. First synthesis

shell
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "zero-indic", "input": "Hello, how are you today?", "voice": "Varun"}' \
  --output output.mp3

python
import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("output.mp3")
        print(f"{len(result.audio_data)} bytes saved, {result.sample_rate} Hz")

asyncio.run(main())

python
import os

import requests

API_KEY = os.environ["SHUNYALABS_API_KEY"]  # set in step 2

response = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "zero-indic", "input": "Hello!", "voice": "Varun"},
    timeout=120,
)
response.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(response.content)

python
import os

from openai import OpenAI

API_KEY = os.environ["SHUNYALABS_API_KEY"]  # set in step 2

client = OpenAI(api_key=API_KEY, base_url="https://tts.shunyalabs.ai/v1")
response = client.audio.speech.create(
    model="zero-indic",
    input="Hello!",
    voice="Varun",
    response_format="mp3",
)
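# write_to_file saves the full response body; stream_to_file is deprecated on
# non-streaming responses in recent openai releases.
response.write_to_file("output.mp3")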
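
One thing worth checking after a first run: curl without `--fail` writes whatever body the server returns into `output.mp3`, including a JSON error. The sketch below is a rough heuristic, not a full parser, that looks at the first bytes of the data for either an ID3 tag or an MPEG frame sync. The function name `looks_like_mp3` is illustrative.

```python
def looks_like_mp3(data: bytes) -> bool:
    """Heuristic check on the leading bytes of a file; not a full parser."""
    if data[:3] == b"ID3":  # an ID3v2 metadata tag precedes the audio frames
        return True
    # An MPEG audio frame header starts with 11 set bits (the frame sync).
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


# In-memory demonstration: a frame header, an ID3 tag, and a JSON error body.
print(looks_like_mp3(b"\xff\xfb\x90\x00"))
print(looks_like_mp3(b"ID3\x04\x00"))
print(looks_like_mp3(b'{"error": "unauthorized"}'))
```

To check the real file, read its first few bytes with `open("output.mp3", "rb").read(4)` and pass them in.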

4. Switch voice, language, speed, format

Pick a different voice

python
# Hindi female
TTSConfig(model="zero-indic", voice="Sunita")

# Tamil male
TTSConfig(model="zero-indic", voice="Murugan")

# English female
TTSConfig(model="zero-indic", voice="Nisha")

46 voices total. See Voices & languages for the full catalogue.

Change speed

python
TTSConfig(model="zero-indic", voice="Nisha", speed=1.3)   # fast notifications
TTSConfig(model="zero-indic", voice="Nisha", speed=0.85)  # slower dictation
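
If you use the REST API instead of the SDK, the same options go into the JSON body. The sketch below builds that body. The `model`, `input`, `voice`, and `response_format` fields appear in the examples in step 3, but the `speed` field name is an assumption that mirrors `TTSConfig`, so confirm it against the API reference. `build_speech_payload` is an illustrative helper, not an SDK function.

```python
from typing import Any, Optional


def build_speech_payload(
    text: str,
    voice: str,
    model: str = "zero-indic",
    speed: Optional[float] = None,
    response_format: Optional[str] = None,
) -> dict[str, Any]:
    """Build a JSON body for POST /v1/audio/speech, omitting unset options."""
    payload: dict[str, Any] = {"model": model, "input": text, "voice": voice}
    if speed is not None:
        if speed <= 0:
            raise ValueError("speed must be positive")
        payload["speed"] = speed  # assumed field name, mirrors TTSConfig(speed=...)
    if response_format is not None:
        payload["response_format"] = response_format
    return payload
```

Pass the result as `json=` to `requests.post`. Leaving unset options out of the body lets the server apply its own defaults.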

Change output format

python
TTSConfig(model="zero-indic", voice="Varun", response_format="pcm")    # real-time playback
TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")  # telephony
TTSConfig(model="zero-indic", voice="Varun", response_format="wav")    # editing

Full format list at Audio formats.
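
Raw PCM has no header, so most players cannot open it without being told the sample layout. The sketch below uses the standard-library `wave` module to wrap PCM bytes in a WAV container. It assumes 16-bit signed little-endian mono samples, and you supply the sample rate (for example from `result.sample_rate`). Check both against Audio formats before relying on it. It does not apply to `mulaw` output, which uses 8-bit companded samples.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes, sample_rate: int, channels: int = 1, sample_width: int = 2
) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Demonstration with 100 frames of silence at an assumed 16 kHz.
wav_bytes = pcm_to_wav(b"\x00\x00" * 100, sample_rate=16000)
print(wav_bytes[:4], len(wav_bytes))
```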

Add an expression style

python
await client.tts.synthesize(
    "<Happy> Welcome aboard!",
    config=TTSConfig(model="zero-indic", voice="Sunita"),
)

11 styles: Happy, Sad, Angry, Fearful, Surprised, Disgust, News, Conversational, Narrative, Enthusiastic, Neutral. See Expression styles.
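
A small helper can catch a misspelled style before you send the request. The sketch below prefixes text with a tag in the `<Style> text` form shown above and rejects anything outside the 11 listed names. `with_style` is an illustrative helper, and treating style names as case-sensitive is an assumption.

```python
STYLES = (
    "Happy", "Sad", "Angry", "Fearful", "Surprised", "Disgust",
    "News", "Conversational", "Narrative", "Enthusiastic", "Neutral",
)


def with_style(text: str, style: str) -> str:
    """Prefix text with an expression-style tag, e.g. '<Happy> Welcome aboard!'."""
    if style not in STYLES:
        raise ValueError(f"unknown style {style!r}; expected one of {STYLES}")
    return f"<{style}> {text}"


print(with_style("Welcome aboard!", "Happy"))  # <Happy> Welcome aboard!
```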

5. Stream it

For real-time use (voice agents, IVR), stream audio as it synthesizes instead of waiting for the full file:

python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="pcm")

async for chunk in await client.tts.stream("Hello!", config=config):
    # play chunk bytes as they arrive; `speaker` is a placeholder for your audio sink
    speaker.write(chunk)

Full streaming details at Streaming.
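
The consuming side of a stream is the same whether the chunks go to a speaker, a file, or a socket. The sketch below drains any async iterator of byte chunks into a `write` callable and counts the bytes. `fake_stream` is a stand-in used only to make the example runnable without the SDK, and `drain_stream` is an illustrative helper.

```python
import asyncio
from typing import AsyncIterator, Callable


async def fake_stream(payload: bytes, chunk_size: int = 4) -> AsyncIterator[bytes]:
    """Stand-in for an audio stream: yields payload in fixed-size chunks."""
    for i in range(0, len(payload), chunk_size):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield payload[i:i + chunk_size]


async def drain_stream(
    chunks: AsyncIterator[bytes], write: Callable[[bytes], object]
) -> int:
    """Pass each chunk to write() as it arrives; return total bytes handled."""
    total = 0
    async for chunk in chunks:
        write(chunk)
        total += len(chunk)
    return total


received: list[bytes] = []
total = asyncio.run(drain_stream(fake_stream(b"0123456789"), received.append))
print(total, received)  # 10 [b'0123', b'4567', b'89']
```

With the SDK, you would pass the stream object from the example above in place of `fake_stream(...)`, and something like a file's `write` method as the sink.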

6. Handle errors

python
from shunyalabs.exceptions import (
    AuthenticationError, RateLimitError,
    SynthesisError, ServerError, ShunyalabsError,
)

try:
    result = await client.tts.synthesize("Hello!", config=config)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limited, back off and retry")
except SynthesisError as e:
    print(f"Bad input: {e}")
except ServerError:
    print("Server error, safe to retry")
except ShunyalabsError as e:
    print(f"SDK error: {e}")
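
The handlers above say that rate-limit and server errors are worth retrying. One common way to do that is exponential backoff with jitter, sketched below. `RetryableError` is a stand-in class so the example runs without the SDK, `with_retries` is an illustrative helper, and the attempt count and delays are arbitrary starting points.

```python
import asyncio
import random


class RetryableError(Exception):
    """Stand-in for errors worth retrying (e.g. RateLimitError, ServerError)."""


async def with_retries(call, retryable=(RetryableError,), attempts=4, base_delay=0.5):
    """Await call() up to `attempts` times, backing off exponentially with jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from clients that failed together.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

With the SDK, a call might look like `await with_retries(lambda: client.tts.synthesize("Hello!", config=config), retryable=(RateLimitError, ServerError))`. Authentication and bad-input errors are left out on purpose, since retrying them will not change the outcome.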

You're done

You now have text → audio working. Next, skim voices and audio formats to pick the right ones for your use case.