Streaming TTS
Open a persistent WebSocket and receive audio chunks in real time as synthesis happens. Best for voice agents, IVR, and any case where audio has to start playing before the full text is synthesized.
Endpoint
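```
wss://tts.shunyalabs.ai/ws/v1/audio/speech
# Also accepts: /ws/tts, /ws
```
Connection lifecycle
Open the socket with your API key as a bearer token, send a JSON synthesis request, then read chunk metadata and binary audio frames until a `completion` message arrives. In interactive sessions, `flush` and `close` control messages manage the rest of the lifecycle.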
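At the protocol level, the sample session under "Your first stream" interleaves JSON text frames (chunk metadata, completion) with binary frames (audio). The sketch below shows one way to route them. It assumes every text frame is a JSON object with a `type` field, every binary frame is raw audio, and `completion` is the last message of a request. `route_frames` is a hypothetical helper, not part of the SDK, and a plain list stands in for a live socket.

```python
from __future__ import annotations

import json
from typing import Iterable, Optional, Union

Frame = Union[str, bytes]  # text frames are JSON strings, binary frames are audio


def route_frames(frames: Iterable[Frame]) -> tuple[bytes, list[dict], Optional[dict]]:
    """Split WebSocket frames into (audio, chunk metadata, completion message).

    Assumptions: every text frame is a JSON object with a "type" field, every
    binary frame is raw audio, and "completion" is the last message of a request.
    """
    audio = bytearray()
    chunks: list[dict] = []
    completion: Optional[dict] = None
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            audio.extend(frame)  # binary frame: append the audio payload
            continue
        message = json.loads(frame)  # text frame: metadata or control message
        kind = message.get("type")
        if kind == "chunk":
            chunks.append(message)
        elif kind == "completion":
            completion = message
            break  # nothing more expected for this request
    return bytes(audio), chunks, completion


if __name__ == "__main__":
    sample = [
        '{"type": "chunk", "chunk_index": 0, "format": "pcm", "sample_rate": 16000}',
        b"\x00\x00" * 4,
        '{"type": "completion", "total_chunks": 1}',
    ]
    audio, chunks, done = route_frames(sample)
    print(len(audio), chunks[0]["chunk_index"], done["total_chunks"])
```

Unknown message types are ignored rather than rejected, which keeps the sketch tolerant of fields or messages not shown on this page.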
Your first stream
```shell
# Install: npm install -g wscat
wscat -c "wss://tts.shunyalabs.ai/ws/v1/audio/speech" \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"

# Send synthesis request
> {"model": "zero-indic", "input": "Hello!", "voice": "Varun", "response_format": "pcm"}

# Receive metadata + binary + completion
< {"type": "chunk", "chunk_index": 0, "format": "pcm", "sample_rate": 16000, ...}
< [binary audio data]
< {"type": "completion", "total_chunks": 3, "total_duration_seconds": 0.8, ...}
```
```python
import asyncio

from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig


async def main():
    async with AsyncShunyaClient() as client:
        config = TTSConfig(
            model="zero-indic",
            voice="Varun",
            response_format="pcm",
        )
        # Write each raw PCM chunk to disk as soon as it arrives
        with open("output.pcm", "wb") as f:
            async for audio in await client.tts.stream("Hello!", config=config):
                f.write(audio)


asyncio.run(main())
```
Streaming methods (SDK)
| Method | What it does |
|---|---|
| `stream(text, config)` | Async generator yielding audio bytes per chunk. Primary streaming method. |
| `stream(text, config, detailed=True)` | Yields `(chunk_meta, audio_bytes)` tuples with `chunk_index` and timing. |
| `synthesize_stream(text, config)` | Collects all chunks internally and returns combined bytes. Convenience wrapper. |
| `stream_to_file(text, path, config)` | Streams directly to disk. No in-memory buffer required. |
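The two convenience methods in the table are consumers of the same chunk stream, and the difference between them is where the chunks end up. The sketch below illustrates that memory trade-off with a hypothetical `fake_stream` async generator in place of `client.tts.stream`; `collect` and `stream_to_path` are illustrative helpers, not the SDK's actual implementation of `synthesize_stream` or `stream_to_file`.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(n_chunks: int = 3, size: int = 320) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for client.tts.stream(): yields fixed-size audio chunks."""
    for i in range(n_chunks):
        await asyncio.sleep(0)  # hand control back to the loop, as a socket read would
        yield bytes([i]) * size


async def collect(stream: AsyncIterator[bytes]) -> bytes:
    """synthesize_stream()-style: buffer every chunk, then join into one bytes object."""
    parts = [chunk async for chunk in stream]
    return b"".join(parts)  # peak memory grows with total audio length


async def stream_to_path(stream: AsyncIterator[bytes], path: str) -> int:
    """stream_to_file()-style: write each chunk as it arrives; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        async for chunk in stream:
            f.write(chunk)  # only the current chunk is held in memory
            written += len(chunk)
    return written


if __name__ == "__main__":
    audio = asyncio.run(collect(fake_stream()))
    print(len(audio))
```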
Pick the right variant
All four variants do the same job (generating audio for a piece of text), but each is shaped for a different consumer. Pick the one that matches your scenario.
stream()
Use when: you want to play or process each audio chunk the moment it arrives (voice agents, IVR, real-time playback).
```python
async for audio in await client.tts.stream("Hello!", config=config):
    speaker.write(audio)  # speaker: placeholder for your audio output sink
```
stream(detailed=True)
Use when: you also need per-chunk metadata (index, timing, format) for logging, analytics, or buffering decisions.
```python
async for chunk_meta, audio in await client.tts.stream("Hello!", config=config, detailed=True):
    print(f"chunk {chunk_meta.chunk_index}: {len(audio)} bytes")
```
synthesize_stream()
Use when: you need the full audio as a single bytes object and don't need per-chunk handling. This convenience wrapper consumes the stream for you.
```python
audio_bytes = await client.tts.synthesize_stream("Hello!", config=config)
```
stream_to_file()
Use when: you are doing long-form synthesis (audiobooks, batch scripts) where buffering the full audio in RAM is impractical. Each chunk is written to disk as it arrives, so memory use stays constant regardless of length.
```python
await client.tts.stream_to_file("Hello!", "output.pcm", config=config)
```
Flush & close commands
For interactive sessions where you send multiple text messages over the same WebSocket, two control messages let you drive the lifecycle:
- `{"type": "flush"}`: synthesize whatever text the server has buffered so far, without closing the connection.
- `{"type": "close"}`: end the session cleanly.
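A minimal sketch of the client side of these control messages, assuming the reply shape shown in the flush example on this page: binary audio frames followed by a `{"type": "flushed", "sequence_id": N}` text frame. `control_message` and `audio_until_flushed` are hypothetical helpers, and a plain list of frames stands in for the socket.

```python
import json
from typing import Iterable, Tuple, Union


def control_message(kind: str) -> str:
    """Serialize a flush/close control message as a JSON text frame."""
    if kind not in ("flush", "close"):
        raise ValueError(f"unsupported control message: {kind!r}")
    return json.dumps({"type": kind})


def audio_until_flushed(frames: Iterable[Union[str, bytes]]) -> Tuple[bytes, int]:
    """Collect binary audio until the "flushed" marker; return (audio, sequence_id)."""
    audio = bytearray()
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            audio.extend(frame)  # audio produced for the flushed text
            continue
        message = json.loads(frame)
        if message.get("type") == "flushed":
            return bytes(audio), message["sequence_id"]
    raise RuntimeError("stream ended before a flushed marker arrived")


if __name__ == "__main__":
    print(control_message("flush"))
    print(audio_until_flushed([b"\x00\x01", '{"type": "flushed", "sequence_id": 1}']))
```

Returning the `sequence_id` lets a caller match each flushed marker to the flush it sent, if it issues several over one connection.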
Flush mid-stream
Client sends:
```json
{"type": "flush"}
```
Server replies (binary audio chunks, then a `flushed` marker):
```
[binary audio data]
{"type": "flushed", "sequence_id": 1}
```
Close the session
Client sends:
```json
{"type": "close"}
```
Error handling
Connection-level errors are raised when the stream starts; server errors can arrive mid-stream, so keep the whole `async for` loop inside the `try` block.
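Because a server error can arrive after some audio has already been delivered, the `except` branch may be holding partial output. The self-contained sketch below reproduces that situation. `flaky_stream` and `consume` are hypothetical helpers: `flaky_stream` stands in for `client.tts.stream`, and the built-in `RuntimeError` stands in for whatever SDK exception a real mid-stream failure raises.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


async def flaky_stream(good_chunks: int) -> AsyncIterator[bytes]:
    """Hypothetical stream: yields some audio, then fails like a mid-stream server error."""
    for i in range(good_chunks):
        await asyncio.sleep(0)
        yield bytes([i]) * 4
    raise RuntimeError("server error mid-stream")  # stand-in for an SDK exception


async def consume(stream: AsyncIterator[bytes]) -> Tuple[List[bytes], Optional[Exception]]:
    """Collect chunks; on failure, return the partial audio together with the error."""
    chunks: List[bytes] = []
    try:
        async for audio in stream:
            chunks.append(audio)
    except RuntimeError as exc:
        return chunks, exc  # caller decides: play, discard, or retry
    return chunks, None


if __name__ == "__main__":
    partial, error = asyncio.run(consume(flaky_stream(2)))
    print(len(partial), error)
```

Whether to play, discard, or retry the partial audio is an application decision; the reconnection pattern on this page restarts from scratch.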
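```python
from shunyalabs.exceptions import (
    AuthenticationError,
    ConnectionError,  # SDK exception; shadows Python's built-in ConnectionError here
    ShunyalabsError,
)

# Runs inside an async function, with client and config set up as in the first example
chunks = []
try:
    async for audio in await client.tts.stream("Hello!", config=config):
        chunks.append(audio)
except AuthenticationError:
    print("Invalid API key")
except ConnectionError:
    print("WebSocket connection failed; check network or endpoint")
except ShunyalabsError as e:
    # Other SDK errors, checked after the more specific handlers above
    print(f"SDK error: {e}")
```
Reconnection pattern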
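The SDK loop in this section waits `2 ** attempt` seconds between attempts. When many clients lose their connection at the same moment, a common refinement (a general technique, not something this page says the SDK does) is capped exponential backoff with full jitter: each delay is drawn uniformly between zero and the capped exponential value. A minimal sketch, with `backoff_delays` as a hypothetical helper:

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Capped exponential backoff with full jitter.

    Delay i is drawn uniformly from [0, min(cap, base * 2**i)], so clients that
    failed together spread their reconnects out instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(attempts)]


if __name__ == "__main__":
    print([round(d, 2) for d in backoff_delays(5, rng=random.Random(42))])
```

To use it, compute the schedule once and sleep for `delays[attempt]` instead of `2 ** attempt`.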
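```python
import asyncio

from shunyalabs.exceptions import ConnectionError

MAX_RECONNECT = 3


async def stream_with_reconnect(client, text, config):
    for attempt in range(MAX_RECONNECT):
        try:
            chunks = []  # restart from scratch; partial audio from a failed attempt is dropped
            async for audio in await client.tts.stream(text, config=config):
                chunks.append(audio)
            return b"".join(chunks)
        except ConnectionError:
            if attempt == MAX_RECONNECT - 1:
                raise  # out of attempts: surface the last connection error
            wait = 2 ** attempt  # exponential backoff: 1 s, 2 s, ...
            print(f"Reconnecting in {wait}s...")
            await asyncio.sleep(wait)
```
Tips for streaming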
Picking the format
- PCM: real-time playback. No decoding overhead, lowest latency.
- mulaw: telephony pipelines. 8 kHz, so it can be forwarded to SIP telephony without transcoding.
- Avoid FLAC: it requires full-file assembly before playback, which defeats the point of streaming.
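Raw PCM has no header, so a file like `output.pcm` from the first example won't open in most players. The minimal sketch below wraps PCM in a WAV container using Python's standard `wave` module. It assumes 16-bit signed mono samples at the 16 kHz `sample_rate` shown in the chunk metadata; the sample width and channel count are assumptions, so check them against the chunk metadata for your model. `pcm_to_wav` is a hypothetical helper.

```python
import os
import wave


def pcm_to_wav(pcm: bytes, path: str, sample_rate: int = 16000,
               sample_width: int = 2, channels: int = 1) -> float:
    """Write raw PCM bytes to a WAV file; return the audio duration in seconds.

    Assumed defaults: 16-bit (2-byte) samples, mono, 16 kHz.
    """
    frame_size = sample_width * channels
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return len(pcm) / (frame_size * sample_rate)


if __name__ == "__main__":
    if os.path.exists("output.pcm"):  # file written by the first example
        with open("output.pcm", "rb") as f:
            seconds = pcm_to_wav(f.read(), "output.wav")
        print(f"wrote output.wav ({seconds:.2f} s)")
```

Under these assumed settings, audio runs at 16000 × 2 = 32,000 bytes per second, so the 0.8 s in the sample completion message would correspond to 25,600 bytes.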
Minimising time-to-first-audio
- Pre-open the WebSocket before your LLM call; this saves ~200 ms of connection setup.
- Use PCM: no client-side decoding overhead.
- If piping from an LLM, use short sentences as flush units; see LLM → TTS pipeline.
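One way to get short flush units from an LLM is to buffer streamed tokens and cut at sentence-ending punctuation. In the minimal sketch below, `sentence_units` is a hypothetical helper, and the regex is a deliberately simple heuristic: it splits on `.`, `!`, `?` or the Devanagari danda `।` followed by whitespace, so it will also split after abbreviations such as "e.g.". Each yielded sentence would be sent to the session, followed by a `{"type": "flush"}` message.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., !, ? or the Devanagari danda, followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?।])\s+")


def sentence_units(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield complete sentences as they close."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # emit whatever remains when the token stream ends


if __name__ == "__main__":
    print(list(sentence_units(["Hello", " there.", " How are", " you?", " Fine"])))
```

A sentence is only emitted once the whitespace after its terminator arrives, so the last sentence of a reply waits for the end of the token stream.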
Production readiness checklist
- ✅ `SHUNYALABS_API_KEY` read from env, not code
- ✅ Error handler covers `ConnectionError` and `AuthenticationError`
- ✅ Reconnection logic for long-running sessions
- ✅ Audio played incrementally per chunk, not buffered
- ✅ `stream_to_file()` used for long-form synthesis (audiobooks) to avoid memory pressure
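For the first checklist item, here is a small sketch that reads the key from the environment and fails fast with a clear message when it is missing. `require_api_key` is a hypothetical helper. This page does not say how `AsyncShunyaClient()` locates the key, so check the SDK reference before wiring the value in.

```python
import os


def require_api_key(var: str = "SHUNYALABS_API_KEY") -> str:
    """Return the API key from the environment, or raise with an actionable message."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it instead of hard-coding the key")
    return key
```

Calling this at startup surfaces a missing key before any WebSocket is opened, rather than as an `AuthenticationError` on the first stream.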