Streaming TTS

Text to Speech — Streaming

Open a persistent WebSocket connection and receive audio chunks in real time.


How it works

Streaming synthesis opens a WebSocket to the TTS server and yields audio chunks as they are generated, so playback can begin before the full utterance has been synthesized.

Property        Value
Transport       WebSocket
Endpoint        wss://tts.shunyalabs.ai/ws/v1/audio/speech (also: /ws/tts, /ws)
Config          TTSConfig
Default format  mp3
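
The flow described above can be sketched in Python as follows. This is a minimal illustration, not the official client: it assumes the third-party `websockets` package, assumes the server accepts a JSON request whose field names (`text`, `format`) are hypothetical stand-ins for whatever TTSConfig actually defines, and assumes audio arrives as binary frames while any text frames carry status or metadata. Authentication is omitted because the source does not describe it.

```python
import asyncio
import json
from typing import AsyncIterable, AsyncIterator, Union

# Endpoint taken from the properties listed above.
TTS_URL = "wss://tts.shunyalabs.ai/ws/v1/audio/speech"


def build_request(text: str, audio_format: str = "mp3") -> str:
    """Serialize a synthesis request.

    The field names "text" and "format" are assumptions for illustration;
    the real schema is whatever TTSConfig defines.
    """
    return json.dumps({"text": text, "format": audio_format})


async def audio_chunks(
    messages: AsyncIterable[Union[str, bytes]],
) -> AsyncIterator[bytes]:
    """Yield only binary frames, assuming audio is sent as binary messages."""
    async for message in messages:
        if isinstance(message, (bytes, bytearray)):
            yield bytes(message)
        # Text frames (assumed to be status or metadata) are skipped here.


async def stream_tts(text: str, url: str = TTS_URL) -> AsyncIterator[bytes]:
    """Connect, send one request, and yield audio chunks as they arrive."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(url) as ws:
        await ws.send(build_request(text))
        # Iteration ends when the server closes the connection.
        async for chunk in audio_chunks(ws):
            yield chunk
```

A caller would iterate with `async for chunk in stream_tts("Hello")` inside a coroutine and hand each chunk to a player or append it to a file as soon as it arrives, rather than waiting for the full utterance.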

When to use streaming

  • Conversational AI and voice assistants that need low-latency audio
  • Real-time narration where playback must begin before synthesis completes
  • Telephony and call-center bots that stream audio to callers
  • Any use case where time-to-first-byte matters
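
To judge whether streaming helps in a given use case, it can be useful to measure how long the first audio chunk takes to arrive. The helper below is a hypothetical sketch that works with any async iterable of byte chunks, such as frames read from the WebSocket. The function name is illustrative, and the timer starts when iteration begins, so connection setup is not included unless the iterable performs it lazily.

```python
import asyncio
import time
from typing import AsyncIterable, Optional, Tuple


async def time_to_first_chunk(
    chunks: AsyncIterable[bytes],
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first chunk, total bytes received).

    The first element is None if the stream ends without any chunk.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    total = 0
    async for chunk in chunks:
        if first is None:
            # Record latency once, when the first chunk arrives.
            first = time.perf_counter() - start
        total += len(chunk)
    return first, total
```

Comparing this figure against the time needed to receive the whole utterance shows how much earlier playback could start with streaming.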