Pre-recorded Audio — Tips

Getting the Best Accuracy

Practical tips on audio quality, language configuration, and preprocessing to minimize word error rate (WER).

Audio quality

Use 16 kHz mono audio when possible; it is the recommended input format.
Remove background music and noise before sending audio to the API.
For telephony audio (8 kHz), the API resamples automatically, but accuracy is inherently lower than with 16 kHz source audio.
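
To check whether a file already matches the recommended format before uploading, something like the following may help. This is a minimal sketch using Python's standard wave module, which reads uncompressed PCM WAV files only. The function name check_wav_format and the warning messages are illustrative and not part of the API.

```python
import wave

RECOMMENDED_RATE_HZ = 16_000  # recommended sample rate from the tips above
RECOMMENDED_CHANNELS = 1      # mono


def check_wav_format(path):
    """Return a list of warnings for a PCM WAV file (empty if it matches)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    warnings = []
    if rate != RECOMMENDED_RATE_HZ:
        warnings.append(
            f"sample rate is {rate} Hz, recommended is {RECOMMENDED_RATE_HZ} Hz"
        )
    if channels != RECOMMENDED_CHANNELS:
        warnings.append(f"{channels} channels found, recommended is mono")
    return warnings
```

An empty list means the file is already 16 kHz mono. An 8 kHz stereo recording would produce two warnings, one for the sample rate and one for the channel count.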

Language configuration

Specify language_code explicitly (e.g. "hi") rather than "auto" when you know the language; auto-detection adds a small accuracy penalty.
For audio that mixes Hindi and English (or another Indic language with English), use language_code="auto"; the model handles code-switching natively.
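
The two language tips above can be folded into one small decision helper. This is only a sketch: choose_language_code and its parameters are hypothetical names introduced for illustration, and only the values "hi" and "auto" come from the text above.

```python
def choose_language_code(known_language=None, code_mixed=False):
    """Pick a language_code value following the two tips above.

    known_language: a code such as "hi" if the language is known, else None.
    code_mixed: True if the audio mixes an Indic language with English.
    """
    # Code-mixed audio, or audio in an unknown language, goes through "auto".
    if code_mixed or known_language is None:
        return "auto"
    # Otherwise prefer the explicit code to avoid the auto-detection penalty.
    return known_language
```

For example, choose_language_code("hi") returns "hi", while choose_language_code("hi", code_mixed=True) returns "auto".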

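Preprocessing

The following ffmpeg command converts an input file to 16 kHz mono WAV and applies loudness normalization with the loudnorm filter: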

ffmpeg -i input.mp3 -ar 16000 -ac 1 -af loudnorm output.wav
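
For more than a handful of files, the same command can be run over a directory. The sketch below assumes ffmpeg is installed and on the PATH. The helper names build_ffmpeg_command and convert_directory, and the default "*.mp3" pattern, are illustrative choices rather than anything prescribed by the API.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the same ffmpeg invocation as above for one input/output pair."""
    return [
        "ffmpeg",
        "-i", str(src),
        "-ar", "16000",     # resample to 16 kHz
        "-ac", "1",         # downmix to mono
        "-af", "loudnorm",  # loudness normalization filter
        str(dst),
    ]


def convert_directory(in_dir, out_dir, pattern="*.mp3"):
    """Convert every file matching `pattern` in in_dir to a WAV in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(in_dir).glob(pattern)):
        dst = out_dir / (src.stem + ".wav")
        if dst.exists():
            # Skip existing outputs so ffmpeg never stops to ask about overwriting.
            continue
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.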