Pre-recorded Audio — Tips

Getting the Best Accuracy

Practical tips on audio quality, language configuration, and preprocessing to minimize word error rate (WER).


Audio quality

  • Use 16 kHz mono audio when possible; it is the recommended input format.
  • Remove background music and noise before sending audio to the API.
  • For telephony audio (8 kHz), the API resamples automatically, but accuracy is inherently lower: resampling cannot restore the frequencies above 4 kHz that the source never captured.
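
Before uploading, it can help to check whether a file already matches the recommended format. The following is a minimal sketch, assuming the input is an uncompressed PCM WAV file; it uses only Python's standard `wave` module, and the names `inspect_wav` and `needs_conversion` are illustrative helpers, not part of the API.

```python
import wave

RECOMMENDED_RATE_HZ = 16_000  # recommended sample rate from the tips above
RECOMMENDED_CHANNELS = 1      # mono


def inspect_wav(path):
    """Return (sample_rate_hz, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def needs_conversion(path):
    """Return True if the file is not already 16 kHz mono."""
    rate, channels = inspect_wav(path)
    return rate != RECOMMENDED_RATE_HZ or channels != RECOMMENDED_CHANNELS
```

A file for which `needs_conversion` returns True is a candidate for the ffmpeg preprocessing step described later in this section. Compressed formats such as MP3 cannot be read by `wave` and would need a different probe.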

Language configuration

  • Specify language_code explicitly (e.g. "hi") rather than "auto" when you know the language; auto-detection carries a small accuracy penalty.
  • For audio that mixes Hindi and English (or another Indic language with English), set language_code="auto"; the model handles code-switching natively.
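
The two rules above can be folded into one small decision helper. This is only a sketch of the selection logic: `choose_language_code` is a hypothetical function, and the only values taken from this page are the `language_code` parameter name and the "hi" and "auto" settings.

```python
def choose_language_code(known_language=None, code_mixed=False):
    """Pick a language_code value following the tips above.

    known_language: a language code such as "hi", or None if unknown.
    code_mixed: True when the audio mixes an Indic language with English.
    """
    # Code-mixed audio, or audio in an unknown language, goes through auto-detection.
    if code_mixed or known_language is None:
        return "auto"
    # A single known language avoids the small auto-detection penalty.
    return known_language
```

For example, `choose_language_code("hi")` returns "hi", while `choose_language_code("hi", code_mixed=True)` returns "auto".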

Preprocessing with ffmpeg

terminal
ffmpeg -i input.mp3 -ar 16000 -ac 1 -af loudnorm output.wav
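
To apply the same command to many files, one option is a thin Python wrapper around ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH; `build_ffmpeg_cmd` and `convert_directory` are illustrative names, and the `-y` flag (overwrite existing output without prompting) is an addition to the command above so that batch runs do not stall.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst):
    """Build the ffmpeg argument list: 16 kHz, mono, loudness-normalized."""
    return [
        "ffmpeg", "-y",      # -y: overwrite output (added for batch use)
        "-i", str(src),
        "-ar", "16000",      # resample to 16 kHz
        "-ac", "1",          # downmix to mono
        "-af", "loudnorm",   # loudness normalization filter
        str(dst),
    ]


def convert_directory(in_dir, out_dir, pattern="*.mp3"):
    """Convert every file matching pattern in in_dir; return the output paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(in_dir).glob(pattern)):
        dst = out / (src.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
        converted.append(dst)
    return converted
```

Calling `convert_directory("raw", "clean")` would write one `.wav` file into `clean` for each `.mp3` found in `raw`; the directory names here are placeholders.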