Batch TTS

TTSConfig — Parameters

All parameters are passed in a single TTSConfig object. Only model and voice are required; every other parameter falls back to the default listed in the table below.
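As a rough illustration of the "only model and voice are required" rule, the sketch below builds a minimal parameter dictionary. The helper name `build_tts_payload` and the voice name `"example-voice"` are hypothetical placeholders, not part of the documented API, and the idea that unset options can simply be omitted is an assumption.

```python
import json


def build_tts_payload(model, voice, **optional):
    """Hypothetical helper: collect TTS parameters into a plain dict.

    Only `model` and `voice` are mandatory. Optional parameters left as
    None are dropped, on the assumption that the service then applies
    its documented defaults.
    """
    if not model or not voice:
        raise ValueError("model and voice are required")
    payload = {"model": model, "voice": voice}
    # Keep only the optional parameters the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# "example-voice" is a placeholder, not a real voice name.
minimal = build_tts_payload("zero-indic", "example-voice")
print(json.dumps(minimal))
```

Passing extra keyword arguments, for example `speed=1.5`, adds them to the dictionary without touching the two required fields.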


Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model name. Use "zero-indic" for all Indic and English text. |
| voice | string | required | Speaker voice name. See Voices & Languages for the full list. |
| response_format | string | "mp3" | Output audio format. Values: pcm, wav, mp3, ogg_opus, flac, mulaw, alaw. |
| speed | float | 1.0 | Speaking speed multiplier. Range: 0.25 (slowest) to 4.0 (fastest). |
| language | string | null | ISO 639 language code for text preprocessing. Optional; the model handles mixed scripts natively. |
| trim_silence | bool | false | Remove leading and trailing silence from the audio output. |
| volume_normalization | string | null | Normalize audio loudness. Values: "peak" (0 dBFS) or "loudness" (EBU R128). |
| background_audio | string | null | Preset name ("office", "cafe", "rain", "street") or base64-encoded WAV/MP3. |
| background_volume | float | 0.1 | Volume of background audio relative to speech. Range: 0.0 to 1.0. |
| reference_wav | string | null | Base64-encoded reference audio for voice cloning (WAV, FLAC, or OGG). 1–6 seconds recommended. |
| reference_text | string | "" | Transcript of the reference audio. Improves cloning quality. Max 500 characters. |
| word_timestamps | bool | false | Return word-level timestamps alongside the audio (batch HTTP mode only). |
| max_tokens | int | 2048 | Maximum tokens for LLM generation. Range: 1–8192. |
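The defaults and ranges listed above can be mirrored in a small client-side check. The following is a minimal sketch, assuming you want to catch out-of-range values before sending a request; `TTSConfigSketch` is a hypothetical stand-in written for illustration and is not the SDK's actual TTSConfig class.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the parameter descriptions.
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}
ALLOWED_NORMALIZATION = {None, "peak", "loudness"}


@dataclass
class TTSConfigSketch:
    """Hypothetical mirror of the documented parameters and defaults."""
    model: str
    voice: str
    response_format: str = "mp3"
    speed: float = 1.0
    language: Optional[str] = None
    trim_silence: bool = False
    volume_normalization: Optional[str] = None
    background_audio: Optional[str] = None
    background_volume: float = 0.1
    reference_wav: Optional[str] = None
    reference_text: str = ""
    word_timestamps: bool = False
    max_tokens: int = 2048

    def __post_init__(self):
        # Reject values outside the documented ranges.
        if not self.model or not self.voice:
            raise ValueError("model and voice are required")
        if self.response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {self.response_format!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        if self.volume_normalization not in ALLOWED_NORMALIZATION:
            raise ValueError('volume_normalization must be "peak" or "loudness"')
        if not 0.0 <= self.background_volume <= 1.0:
            raise ValueError("background_volume must be between 0.0 and 1.0")
        if len(self.reference_text) > 500:
            raise ValueError("reference_text is limited to 500 characters")
        if not 1 <= self.max_tokens <= 8192:
            raise ValueError("max_tokens must be between 1 and 8192")


# "example-voice" is a placeholder, not a real voice name.
cfg = TTSConfigSketch(model="zero-indic", voice="example-voice", speed=1.25)
```

Validating locally is a design choice rather than a requirement: the service presumably enforces the same limits, but an early check gives a clearer error before any audio is requested.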