LLM to TTS

Sentence Buffering & Flush Strategy

How to accumulate LLM tokens and flush them to TTS at the right boundaries for natural-sounding speech.


Why flush at sentence boundaries

Flushing at sentence boundaries, not word boundaries, gives the TTS model enough context to produce natural prosody. Single words or short fragments lead to robotic, choppy audio.


Flush strategy

  1. Accumulate tokens into a buffer string as they arrive from the LLM.
  2. On each token, check whether it contains a sentence-ending delimiter: . ! ? ;
  3. Only flush when the buffer is at least 15 characters long. This avoids flushing on a short leading abbreviation such as "Dr." or "U.S."; an abbreviation that appears later in a longer buffer can still trigger a flush.
  4. After the LLM stream ends, always flush the remaining buffer regardless of length.
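
Steps 2 and 3 can be expressed as a single predicate. The sketch below is illustrative only: the helper name `should_flush` is hypothetical, and it assumes the buffer already includes the current token. It checks for a delimiter anywhere inside the token, since streamed tokens often carry leading whitespace or several characters.

```python
SENTENCE_DELIMITERS = (".", "!", "?", ";")
MIN_FLUSH_LENGTH = 15


def should_flush(buffer: str, token: str) -> bool:
    """Return True when `buffer` (which already includes `token`) is ready for TTS."""
    # Hypothetical helper: look for a delimiter anywhere in the token,
    # because tokens may look like " doctor." or ".\n" rather than ".".
    has_delimiter = any(d in token for d in SENTENCE_DELIMITERS)
    return has_delimiter and len(buffer) >= MIN_FLUSH_LENGTH


print(should_flush("Dr.", "."))                             # False: buffer too short
print(should_flush("Please see the doctor.", " doctor."))   # True
```

The length check only guards short buffers: `should_flush("Please call Dr.", " Dr.")` is also true, because that buffer is exactly 15 characters long. If mid-sentence abbreviations matter for your content, a higher threshold or an abbreviation list may be worth considering.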

Example implementation

Sentence buffer
SENTENCE_DELIMITERS = {".", "!", "?", ";"}
MIN_FLUSH_LENGTH = 15

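async def stream_speech(llm_stream, client, config):
    # Wrapper added so that `async for` and `yield` are valid; the name is illustrative.
    buffer = ""

    async for token in llm_stream:
        buffer += token

        # Flush when the token contains a delimiter and the buffer is long enough.
        # A substring check is used because tokens may include whitespace or
        # extra characters (e.g. ".\n"), which an exact set lookup would miss.
        if any(d in token for d in SENTENCE_DELIMITERS) and len(buffer) >= MIN_FLUSH_LENGTH:
            async for audio in await client.tts.stream(buffer, config=config):
                yield audio
            buffer = ""

    # Always flush remaining text after the LLM stream ends
    if buffer.strip():
        async for audio in await client.tts.stream(buffer, config=config):
            yield audio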
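
To check the flush behavior without a TTS client, the buffering logic can be isolated into a plain generator. This is a minimal sketch under stated assumptions: `chunk_sentences` is a hypothetical name, it takes an ordinary iterable of tokens rather than an async stream, and it yields the text that would be sent to TTS instead of audio.

```python
from typing import Iterable, Iterator

SENTENCE_DELIMITERS = (".", "!", "?", ";")
MIN_FLUSH_LENGTH = 15


def chunk_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield text chunks in the order they would be sent to TTS."""
    buffer = ""
    for token in tokens:
        buffer += token
        ends_sentence = any(d in token for d in SENTENCE_DELIMITERS)
        if ends_sentence and len(buffer) >= MIN_FLUSH_LENGTH:
            yield buffer
            buffer = ""
    # Final flush: emit whatever is left once the stream ends.
    if buffer.strip():
        yield buffer


tokens = ["Hi", ".", " How", " are", " you", " today", "?", " Bye"]
print(list(chunk_sentences(tokens)))
# ['Hi. How are you today?', ' Bye']
```

In this trace, "Hi." is held back because it is shorter than the threshold, so it is spoken together with the following question. The trailing " Bye" has no delimiter and is emitted only by the final flush.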