LLM to TTS
Sentence Buffering & Flush Strategy
How to accumulate LLM tokens and flush them to TTS at the right boundaries for natural-sounding speech.
Why flush at sentence boundaries
Flushing at sentence boundaries, not word boundaries, gives the TTS model enough context to produce natural prosody. Single words or short fragments lead to robotic, choppy audio.
Flush strategy
- Accumulate tokens into a buffer string as they arrive from the LLM.
- On each token, check whether it contains a sentence-ending delimiter: . ! ? ;
- Only flush when the buffer is at least 15 characters long. This keeps a short fragment that ends in an abbreviation, such as "Dr." or "U.S." at the start of a sentence, from being sent on its own.
- After the LLM stream ends, always flush the remaining buffer regardless of length.
Example implementation
Sentence buffer
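SENTENCE_DELIMITERS = {".", "!", "?", ";"}
MIN_FLUSH_LENGTH = 15

# The function name is illustrative; llm_stream, client, and config are
# assumed to be supplied by the caller.
async def stream_speech(llm_stream, client, config):
    buffer = ""
    async for token in llm_stream:
        buffer += token
        # Tokens are often longer than one character (for example " done."),
        # so test whether the token contains a delimiter rather than equals one.
        at_boundary = any(delim in token for delim in SENTENCE_DELIMITERS)
        # Flush on sentence boundary when buffer is long enough
        if at_boundary and len(buffer) >= MIN_FLUSH_LENGTH:
            async for audio in await client.tts.stream(buffer, config=config):
                yield audio
            buffer = ""
    # Always flush remaining text after LLM stream ends
    if buffer.strip():
        async for audio in await client.tts.stream(buffer, config=config):
            yield audio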
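To check the buffering rules without a TTS client, the same logic can be pulled into a plain generator that only decides where the chunks end. The following is a minimal sketch under that assumption: sentence_chunks is a hypothetical helper name, not part of any SDK, and it takes an ordinary iterable of token strings in place of a live LLM stream.

```python
from typing import Iterable, Iterator

SENTENCE_DELIMITERS = (".", "!", "?", ";")
MIN_FLUSH_LENGTH = 15


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group tokens into chunks that end on a sentence delimiter.

    Hypothetical helper for illustration; it mirrors the flush rules
    described above but performs no TTS call.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        # A token counts as a boundary if it contains any delimiter.
        at_boundary = any(delim in token for delim in SENTENCE_DELIMITERS)
        if at_boundary and len(buffer) >= MIN_FLUSH_LENGTH:
            yield buffer
            buffer = ""
    # Flush whatever is left once the token stream ends.
    if buffer.strip():
        yield buffer


tokens = ["Dr.", " Smith", " is", " here", ".", " Come", " in", "!"]
chunks = list(sentence_chunks(tokens))
# chunks == ["Dr. Smith is here.", " Come in!"]
```

In this run "Dr." does not trigger a flush because the buffer is only 3 characters long at that point, while " Come in!" is shorter than the threshold and is emitted only by the final flush. Each chunk could then be passed to the TTS call in place of the raw buffer.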