Batch TTS
TTSConfig — Parameters
All parameters are passed as a TTSConfig object. Only model and voice are required.
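As a minimal sketch of the "only model and voice are required" rule, the snippet below builds a plain dictionary shaped like a TTSConfig. The helper `build_tts_config` and the voice name `"example-voice"` are hypothetical, introduced only for illustration; the real client may construct and send the object differently.

```python
import json


def build_tts_config(model, voice, **options):
    """Hypothetical helper: return a plain dict shaped like a TTSConfig."""
    if not model or not voice:
        raise ValueError("model and voice are required")
    config = {"model": model, "voice": voice}
    # Keep only the options the caller actually set; anything omitted is
    # assumed to fall back to the documented default on the server side.
    config.update({k: v for k, v in options.items() if v is not None})
    return config


# Smallest valid configuration: just the two required fields.
minimal = build_tts_config("zero-indic", "example-voice")

# Same voice, overriding two optional parameters.
tuned = build_tts_config(
    "zero-indic", "example-voice", response_format="wav", speed=1.25
)

print(json.dumps(minimal))
print(json.dumps(tuned))
```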
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model name. Use "zero-indic" for all Indic and English text. |
| voice | string | required | Speaker voice name. See Voices & Languages for the full list. |
| response_format | string | "mp3" | Output audio format. Values: pcm, wav, mp3, ogg_opus, flac, mulaw, alaw. |
| speed | float | 1.0 | Speaking speed multiplier. Range: 0.25 (slowest) to 4.0 (fastest). |
| language | string | null | ISO 639 language code used for text preprocessing. Optional: the model handles mixed scripts natively. |
| trim_silence | bool | false | Remove leading and trailing silence from the audio output. |
| volume_normalization | string | null | Normalize audio loudness. Values: "peak" (0 dBFS) or "loudness" (EBU R128). |
| background_audio | string | null | Preset name ("office", "cafe", "rain", "street") or base64-encoded WAV/MP3. |
| background_volume | float | 0.1 | Volume of background audio relative to speech. Range: 0.0 to 1.0. |
| reference_wav | string | null | Base64-encoded reference audio for voice cloning (WAV, FLAC, or OGG). 1–6 seconds recommended. |
| reference_text | string | "" | Transcript of reference audio. Improves cloning quality. Max 500 characters. |
| word_timestamps | bool | false | Return word-level timestamps alongside audio (batch HTTP mode only). |
| max_tokens | int | 2048 | Maximum tokens for LLM generation. Range: 1–8192. |
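
The defaults and ranges in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that, assuming nothing beyond the table itself: `TTSConfigSketch` is a local stand-in written for this example, not the SDK's own TTSConfig class, and `"example-voice"` is a placeholder voice name.

```python
from dataclasses import dataclass
from typing import List, Optional

RESPONSE_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}
NORMALIZATION_MODES = {"peak", "loudness"}


@dataclass
class TTSConfigSketch:
    """Local stand-in mirroring the parameter table (hypothetical class)."""

    model: str
    voice: str
    response_format: str = "mp3"
    speed: float = 1.0
    language: Optional[str] = None
    trim_silence: bool = False
    volume_normalization: Optional[str] = None
    background_audio: Optional[str] = None
    background_volume: float = 0.1
    reference_wav: Optional[str] = None
    reference_text: str = ""
    word_timestamps: bool = False
    max_tokens: int = 2048

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means every check passed."""
        problems = []
        if not self.model:
            problems.append("model is required")
        if not self.voice:
            problems.append("voice is required")
        if self.response_format not in RESPONSE_FORMATS:
            problems.append("unsupported response_format")
        if not 0.25 <= self.speed <= 4.0:
            problems.append("speed must be between 0.25 and 4.0")
        if (self.volume_normalization is not None
                and self.volume_normalization not in NORMALIZATION_MODES):
            problems.append("volume_normalization must be 'peak' or 'loudness'")
        if not 0.0 <= self.background_volume <= 1.0:
            problems.append("background_volume must be between 0.0 and 1.0")
        if len(self.reference_text) > 500:
            problems.append("reference_text exceeds 500 characters")
        if not 1 <= self.max_tokens <= 8192:
            problems.append("max_tokens must be between 1 and 8192")
        return problems


# A config that relies entirely on the documented defaults.
ok = TTSConfigSketch(model="zero-indic", voice="example-voice")

# A config with two out-of-range values.
bad = TTSConfigSketch(
    model="zero-indic", voice="example-voice", speed=5.0, max_tokens=0
)

print(ok.validate())
print(bad.validate())
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid field at once. The server remains the authority on what it accepts, so this kind of check is only a convenience.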