Batch TTS

TTSConfig — Parameters

All parameters are passed as a `TTSConfig` object. Only `model` and `voice` are required.

Parameter	Type	Default	Description
`model`	`string`	`required`	Model name. Use "zero-indic" for all Indic and English text.
`voice`	`string`	`required`	Speaker voice name. See Voices & Languages for the full list.
`response_format`	`string`	`"mp3"`	Output audio format. Values: pcm, wav, mp3, ogg_opus, flac, mulaw, alaw.
`speed`	`float`	`1.0`	Speaking speed multiplier. Range: 0.25 (slowest) to 4.0 (fastest).
`language`	`string`	`null`	ISO 639 language code used for text preprocessing. Optional; the model handles mixed scripts natively.
`trim_silence`	`bool`	`false`	Remove leading and trailing silence from the audio output.
`volume_normalization`	`string`	`null`	Normalize audio loudness. Values: "peak" (0 dBFS) or "loudness" (EBU R128).
`background_audio`	`string`	`null`	Preset name ("office", "cafe", "rain", "street") or base64-encoded WAV/MP3.
`background_volume`	`float`	`0.1`	Volume of background audio relative to speech. Range: 0.0 to 1.0.
`reference_wav`	`string`	`null`	Base64-encoded reference audio for voice cloning (WAV, FLAC, or OGG). 1–6 seconds recommended.
`reference_text`	`string`	`""`	Transcript of reference audio. Improves cloning quality. Max 500 characters.
`word_timestamps`	`bool`	`false`	Return word-level timestamps alongside audio (batch HTTP mode only).
`max_tokens`	`int`	`2048`	Maximum tokens for LLM generation. Range: 1–8192.
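
The defaults and ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not the SDK's actual `TTSConfig` class: `TTSConfigSketch`, `to_payload`, `encode_audio_bytes`, and the voice name `"example_voice"` are hypothetical names introduced for illustration. It assumes that unset (`null`) fields can simply be left out of the request payload.

```python
import base64
from dataclasses import asdict, dataclass
from typing import Optional

# Allowed values copied from the parameter table above.
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}
ALLOWED_NORMALIZATION = {"peak", "loudness"}


def encode_audio_bytes(data: bytes) -> str:
    """Base64-encode raw audio bytes, e.g. for reference_wav (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


@dataclass
class TTSConfigSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""

    model: str
    voice: str
    response_format: str = "mp3"
    speed: float = 1.0
    language: Optional[str] = None
    trim_silence: bool = False
    volume_normalization: Optional[str] = None
    background_audio: Optional[str] = None
    background_volume: float = 0.1
    reference_wav: Optional[str] = None
    reference_text: str = ""
    word_timestamps: bool = False
    max_tokens: int = 2048

    def __post_init__(self) -> None:
        # Enforce the required fields and the ranges listed in the table.
        if not self.model or not self.voice:
            raise ValueError("model and voice are required")
        if self.response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {self.response_format!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        if (self.volume_normalization is not None
                and self.volume_normalization not in ALLOWED_NORMALIZATION):
            raise ValueError('volume_normalization must be "peak" or "loudness"')
        if not 0.0 <= self.background_volume <= 1.0:
            raise ValueError("background_volume must be between 0.0 and 1.0")
        if len(self.reference_text) > 500:
            raise ValueError("reference_text is limited to 500 characters")
        if not 1 <= self.max_tokens <= 8192:
            raise ValueError("max_tokens must be between 1 and 8192")

    def to_payload(self) -> dict:
        # Assumption: fields left as None (null) are omitted from the payload.
        return {key: value for key, value in asdict(self).items() if value is not None}


# Only model and voice are supplied; every other field keeps its default.
config = TTSConfigSketch(model="zero-indic", voice="example_voice", speed=1.25)
print(config.to_payload())
```

Validating locally surfaces an out-of-range `speed` or `max_tokens` before any network call is made. The server remains the authority on what it accepts, so treat these checks as a convenience rather than a guarantee.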