Voice Cloning

Clone any speaker's voice by providing a short reference audio sample. Cloning works across all 23 supported Indic languages.


Voice Cloning is now live:
Use the reference_wav and reference_text parameters in your requests to clone any voice from a short audio sample.

Overview

Voice cloning allows you to reproduce the vocal characteristics of any speaker by providing a short reference audio sample. Instead of selecting a pre-built voice, you supply a recording of the target speaker and the system generates speech that matches their timbre, accent, and speaking style.

The feature is controlled by two parameters:

  • reference_wav — A base64-encoded audio file (WAV, FLAC, or OGG) containing 1 to 6 seconds of clear speech from the target speaker.
  • reference_text — The transcript of the reference audio. Providing this significantly improves cloning quality by helping the model align the audio to the spoken content.
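The sketch below shows one way to prepare these two parameters on the client side. It is a minimal sketch under two assumptions. First, the clip is an uncompressed PCM WAV file. Python's standard `wave` module cannot read FLAC or OGG, so those formats would need a different library for the length check. Second, the helper names `wav_duration_seconds` and `build_clone_params` are illustrative and not part of any official SDK. Only `reference_wav` and `reference_text` come from this page. The code checks the 1 to 6 second limit before encoding, then base64-encodes the whole file.

```python
import base64
import io
import wave

# Reference-length limits stated on this page (1 to 6 seconds of speech).
MIN_SECONDS = 1.0
MAX_SECONDS = 6.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration in seconds of an uncompressed (PCM) WAV clip."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def build_clone_params(wav_bytes: bytes, transcript: str) -> dict:
    """Validate a WAV reference clip and return the two voice-cloning parameters.

    Illustrative helper only; the function name is hypothetical.
    """
    duration = wav_duration_seconds(wav_bytes)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.2f} s; "
            f"expected {MIN_SECONDS}-{MAX_SECONDS} s"
        )
    return {
        # Base64-encode the complete file (header included) as an ASCII string.
        "reference_wav": base64.b64encode(wav_bytes).decode("ascii"),
        # Transcript of exactly what is spoken in the clip.
        "reference_text": transcript,
    }
```

In practice you would read your own recording with `open("speaker.wav", "rb").read()` and pass the bytes together with its transcript. Here, "speaker.wav" is a placeholder name. Rejecting clips outside the stated range on the client gives a clearer error than a failed request.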

Once you provide a reference sample, the cloned voice can synthesize speech in any of the 23 supported languages — the same cross-lingual capability available with pre-built voices.
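The same encoded reference can be reused unchanged across languages. Only the text to synthesize and the target language differ from one request to the next. The sketch below illustrates that reuse. It assumes hypothetical request fields `text` and `target_language_code` and uses example language codes such as "hi-IN". Check the API reference for the actual field names and codes. Only `reference_wav` and `reference_text` are taken from this page.

```python
from typing import Dict, List


def build_multilingual_requests(
    clone_params: Dict[str, str], texts_by_language: Dict[str, str]
) -> List[dict]:
    """Create one request body per language, all sharing the same cloned voice.

    `clone_params` holds reference_wav and reference_text, prepared once.
    The "text" and "target_language_code" keys are hypothetical placeholders.
    """
    requests = []
    for language_code, text in texts_by_language.items():
        body = {
            "text": text,  # hypothetical field name
            "target_language_code": language_code,  # hypothetical field name
        }
        # Attach the identical reference sample to every request.
        body.update(clone_params)
        requests.append(body)
    return requests
```

Encoding the reference once and merging it into each body avoids re-reading and re-encoding the audio for every language.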