For builders who want results in five lines

Integrate with the Python SDK

The shunyalabs SDK wraps the REST and WebSocket APIs in an idiomatic async client. Typed configuration, streaming generators, and a clean exception hierarchy give you a production-grade integration in under a dozen lines of code.

Your journey

Step 1: Install

shell
pip install shunyalabs

Step 2: Configure your API key

The SDK reads SHUNYALABS_API_KEY from the environment by default. Recommended:

shell
export SHUNYALABS_API_KEY="your-api-key"

Or pass it directly to the client:

python
from shunyalabs import AsyncShunyaClient

client = AsyncShunyaClient(api_key="your-api-key")

Other env vars: SHUNYALABS_TTS_URL and SHUNYALABS_TTS_WS_URL override the batch and WebSocket endpoints respectively, which is useful for on-prem deployments.
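
If you would rather fail at startup than on the first request, a small check on the environment variable is enough. This is only a sketch: load_api_key is a hypothetical helper, not part of the SDK, and it relies on nothing beyond the SHUNYALABS_API_KEY name given above.

```python
import os


def load_api_key(env=None, var="SHUNYALABS_API_KEY"):
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper for illustration; the SDK does not ship this function.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```

The returned value can then be passed explicitly, as in AsyncShunyaClient(api_key=load_api_key()).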

Step 3: Your first synthesis

python
import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("output.mp3")
        print(f"{len(result.audio_data)} bytes saved")

asyncio.run(main())

Streaming:

python
import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient() as client:
        config = TTSConfig(model="zero-indic", voice="Sunita", response_format="pcm")
        async for audio in await client.tts.stream("Hello!", config=config):
            play(audio)   # your playback function, bytes arrive ~1s after the call

asyncio.run(main())

What batch returns

result is a TTSResult with three fields:

Field        Type    Description
audio_data   bytes   Decoded audio bytes, write to file or pass to playback.
sample_rate  int     Hz (e.g. 22050 for mp3, 8000 for mulaw).
format       string  Matches the response_format in TTSConfig.
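
When response_format is "pcm", audio_data has no container, so most players need it wrapped first. The sketch below uses Python's standard wave module and the two fields from the table. It assumes 16-bit mono samples, which this page does not state, so check the actual layout before relying on it. pcm_to_wav_bytes and duration_seconds are hypothetical helper names.

```python
import io
import wave


def pcm_to_wav_bytes(audio_data: bytes, sample_rate: int,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    Defaults assume 16-bit mono PCM (an assumption, not documented here).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(audio_data)
    return buf.getvalue()


def duration_seconds(audio_data: bytes, sample_rate: int,
                     channels: int = 1, sample_width: int = 2) -> float:
    """Estimate playback length of raw PCM from its byte count."""
    return len(audio_data) / (sample_rate * channels * sample_width)
```

With a batch result this would be called as pcm_to_wav_bytes(result.audio_data, result.sample_rate).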

Long-form streaming → disk

For long-form content where buffering the whole response is impractical, use stream_to_file(), which keeps memory use constant regardless of length:

python
await client.tts.stream_to_file(
    "Long audiobook chapter text...",
    "chapter_01.pcm",
    config=TTSConfig(model="zero-indic", voice="Varun", response_format="pcm"),
)
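
To make the constant-memory point concrete, here is a rough sketch of the general pattern: each chunk is written as soon as it arrives and is never accumulated. This is not the SDK's implementation. write_chunks and fake_stream are hypothetical names, and the fake generator stands in for a real audio stream.

```python
import asyncio
from typing import AsyncIterator


async def write_chunks(chunks: AsyncIterator[bytes], path: str) -> int:
    """Write each chunk to disk as it arrives; return total bytes written.

    Only one chunk is held in memory at a time.
    """
    total = 0
    with open(path, "wb") as f:
        async for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


async def fake_stream(n_chunks: int = 4, size: int = 1024):
    """Hypothetical stand-in for an audio stream: yields fixed-size chunks."""
    for _ in range(n_chunks):
        await asyncio.sleep(0)   # give control back to the event loop
        yield b"\x00" * size
```

The file writes here are blocking, which is usually acceptable for small audio chunks. A busy event loop may prefer to hand them to a thread.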

Step 4: TTSConfig knobs you'll actually use

python
TTSConfig(
    model="zero-indic",          # required
    voice="Rajesh",              # required, see /tts/voices
    response_format="pcm",       # pcm, wav, mp3, ogg_opus, flac, mulaw, alaw
    speed=1.0,                   # 0.25 → 4.0
    language="hi",               # optional ISO 639 hint
    trim_silence=True,           # strip leading/trailing silence
    volume_normalization="loudness",  # "peak" or "loudness"
)

Other knobs worth knowing:

  • background_audio="cafe" + background_volume=0.1: preset ambient mix.
  • reference_wav + reference_text: voice cloning from a 1-6 second sample.
  • word_timestamps=True: per-word timing for captions and alignment (batch only).

Step 5: Error handling

All exceptions inherit from ShunyalabsError. Import from shunyalabs.exceptions:

python
from shunyalabs.exceptions import (
    AuthenticationError, RateLimitError,
    SynthesisError, ServerError, ShunyalabsError,
)

try:
    result = await client.tts.synthesize("Hello!", config=config)
except AuthenticationError:
    print("Invalid API key, check SHUNYALABS_API_KEY")
except RateLimitError:
    print("Rate limit hit, back off and retry")
except SynthesisError as e:
    print(f"Synthesis failed: {e}")
except ServerError:
    print("Server error, safe to retry")
except ShunyalabsError as e:
    print(f"SDK error: {e}")
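
Since RateLimitError and ServerError are both described as retryable, the print calls can be swapped for an actual retry loop. The sketch below shows exponential backoff with jitter in plain asyncio. retry_async is a hypothetical helper, not part of the SDK, and the attempt count and delays are arbitrary starting points.

```python
import asyncio
import random


async def retry_async(make_call, retry_on, attempts=4, base_delay=0.5,
                      sleep=asyncio.sleep):
    """Await make_call(); on a retryable exception, back off and try again.

    make_call: zero-argument callable returning a fresh awaitable per attempt.
    retry_on:  exception class or tuple of classes that should trigger a retry.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out clients that were rate-limited together.
            await sleep(delay + random.uniform(0, delay / 2))
```

With the imports from the block above, a call might look like: result = await retry_async(lambda: client.tts.synthesize("Hello!", config=config), retry_on=(RateLimitError, ServerError)). AuthenticationError is left out on purpose, since retrying with the same bad key cannot succeed.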

Step 6: LLM → TTS pipeline (the conversational pattern)

Pipe LLM tokens into TTS at sentence boundaries. This cuts time-to-first-audio by 200-400 ms compared with waiting for the full response.

python
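from openai import AsyncOpenAI
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

SENTENCE_END = (".", "!", "?", ";")

async def gpt_to_tts(user_message: str):
    oai    = AsyncOpenAI()
    config = TTSConfig(model="zero-indic", voice="Sunita", response_format="pcm")

    # async with closes the client's connections when the function exits
    async with AsyncShunyaClient() as shunya:
        buffer = ""
        stream = await oai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_message}],
            stream=True,
        )
        async for chunk in stream:
            if not chunk.choices:        # skip chunks that carry no delta
                continue
            token = chunk.choices[0].delta.content or ""
            buffer += token
            # Tokens are often multi-character (" today."), so test the tail
            # of the buffer, not whether the token equals a punctuation mark.
            if buffer.rstrip().endswith(SENTENCE_END) and len(buffer) > 15:
                async for audio in await shunya.tts.stream(buffer, config=config):
                    play(audio)          # your playback function
                buffer = ""
        # Flush whatever is left once the LLM stream ends
        if buffer.strip():
            async for audio in await shunya.tts.stream(buffer, config=config):
                play(audio)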
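
The buffering rule is easier to test when it is pulled out into a pure function that touches neither API. The sketch below is one way to do that. sentences_from_tokens is a hypothetical helper, and it checks the tail of the buffer, not the individual token, so multi-character tokens such as " today." also trigger a flush.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?", ";")


def sentences_from_tokens(tokens: Iterable[str], min_chars: int = 15) -> Iterator[str]:
    """Group streamed tokens into sentence-sized chunks for TTS.

    A chunk is emitted when the buffer ends in sentence punctuation and is
    longer than min_chars; very short sentences are merged into the next one.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END) and len(buffer) > min_chars:
            yield buffer.strip()
            buffer = ""
    if buffer.strip():               # flush the remainder at end of stream
        yield buffer.strip()
```

For example, the tokens "Hi." and " This is a longer sentence." produce a single chunk, because "Hi." alone is under the 15-character threshold.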

Step 7: Ship checklist

  • SHUNYALABS_API_KEY from environment, never hardcoded
  • async with AsyncShunyaClient() for proper connection cleanup
  • Error handler covers AuthenticationError, RateLimitError, ConnectionError
  • stream_to_file() for long-form synthesis (no memory pressure)
  • Reconnect / retry on ConnectionError for long-running streaming sessions
  • Sentence-boundary buffering when piping from an LLM (don't flush per token)

Already have OpenAI SDK code?

You can reuse it as-is; just change the base_url. See OpenAI compatible.