Voice Cloning
Clone any speaker's voice by providing a short reference audio sample. Works across all 23 supported Indic languages.
Voice Cloning is now live: Use the reference_wav and reference_text parameters in your requests to clone any voice from a short audio sample.
Overview
Voice cloning allows you to reproduce the vocal characteristics of any speaker by providing a short reference audio sample. Instead of selecting a pre-built voice, you supply a recording of the target speaker and the system generates speech that matches their timbre, accent, and speaking style.
The feature is controlled by two parameters:
reference_wav: A base64-encoded audio file (WAV, FLAC, or OGG) containing 1 to 6 seconds of clear speech from the target speaker.
reference_text: The transcript of the reference audio. Providing this significantly improves cloning quality by helping the model align the audio to the spoken content.
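As a rough illustration of preparing these two parameters, the sketch below base64-encodes a WAV recording and checks that it falls within the 1 to 6 second range before building the request fields. Only reference_wav and reference_text come from this page. The helper names (wav_duration_seconds, build_clone_fields, make_silent_wav) are hypothetical, the duration check handles WAV input only, and treating reference_text as omittable is an assumption based on the wording above.

```python
import base64
import io
import wave
from typing import Optional

# Duration limits taken from the parameter description (1 to 6 seconds).
MIN_SECONDS = 1.0
MAX_SECONDS = 6.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_clone_fields(wav_bytes: bytes, reference_text: Optional[str] = None) -> dict:
    """Build the two cloning fields for a request body (hypothetical helper).

    Assumption: reference_text may be left out; the docs only say that
    providing it improves quality.
    """
    duration = wav_duration_seconds(wav_bytes)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"reference audio is {duration:.2f}s; expected "
            f"{MIN_SECONDS:.0f} to {MAX_SECONDS:.0f} seconds"
        )
    fields = {"reference_wav": base64.b64encode(wav_bytes).decode("ascii")}
    if reference_text and reference_text.strip():
        fields["reference_text"] = reference_text.strip()
    return fields


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Create a mono 16-bit silent WAV, used here only as stand-in audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_silent_wav(2.0)  # replace with a real recording of the speaker
    fields = build_clone_fields(sample, "Transcript of the reference clip.")
    print(sorted(fields))
```

In practice you would read the recording from disk instead of generating silence, then merge the returned fields into whatever request body the synthesis endpoint expects.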
Once you provide a reference sample, the cloned voice can synthesize speech in any of the 23 supported languages — the same cross-lingual capability available with pre-built voices.
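To illustrate the cross-lingual reuse described above, here is a minimal sketch that attaches one reference sample to several request bodies, one per target language. The text and language field names, the language codes, and the build_requests helper are assumptions for illustration only; check the API reference for the actual request schema.

```python
from typing import Dict, List


def build_requests(clone_fields: Dict[str, str],
                   texts_by_language: Dict[str, str]) -> List[dict]:
    """Reuse one reference sample for several target languages.

    clone_fields holds reference_wav (and optionally reference_text).
    The "text" and "language" keys are hypothetical field names.
    """
    return [
        {"text": text, "language": language, **clone_fields}
        for language, text in texts_by_language.items()
    ]


if __name__ == "__main__":
    # Placeholder values; a real reference_wav is base64-encoded audio.
    clone_fields = {
        "reference_wav": "<base64 audio>",
        "reference_text": "Transcript of the reference clip.",
    }
    requests_to_send = build_requests(
        clone_fields,
        {"hi": "नमस्ते", "ta": "வணக்கம்"},  # hypothetical language codes
    )
    for body in requests_to_send:
        print(body["language"], body["text"])
```

The same reference fields appear unchanged in every body; only the target text and language vary.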