Pre-recorded Audio — Tips
Getting the Best Accuracy
Practical tips on audio quality, language configuration, and preprocessing to minimize word error rate (WER).
Audio quality
- Use 16 kHz mono audio when possible; it is the recommended input format.
- Remove background music and noise before sending audio to the API.
- For telephony audio (8 kHz), the API handles resampling automatically, but transcription quality is inherently lower.
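A quick local check can flag files that do not match the recommended format before they are uploaded. The sketch below uses only Python's standard `wave` module and assumes the input is a PCM WAV file; the helper name `describe_wav` and the wording of its notes are illustrative, not part of the API.

```python
import wave

RECOMMENDED_RATE_HZ = 16000  # recommended input rate from the tips above
TELEPHONY_RATE_HZ = 8000     # typical telephony sample rate


def describe_wav(path):
    """Return (sample_rate_hz, channels, note) for a PCM WAV file.

    Hypothetical helper for illustration; it only reads the WAV header.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    if rate == RECOMMENDED_RATE_HZ and channels == 1:
        note = "ok: 16 kHz mono"
    elif rate <= TELEPHONY_RATE_HZ:
        note = "telephony-grade: expect lower accuracy"
    else:
        note = "consider converting to 16 kHz mono"
    return rate, channels, note
```

Calling `describe_wav("call.wav")` on an 8 kHz recording would return the telephony note, which is a cue to expect lower accuracy rather than a reason to upsample by hand.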
Language configuration
- Specify language_code explicitly (e.g. "hi") rather than "auto" when you know the language; auto-detection adds a small accuracy penalty.
- For audio that mixes Hindi and English (or other Indic + English), use language_code="auto"; the model handles code-switching natively.
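The two rules above can be folded into one small decision function. This is a minimal sketch: `pick_language_code` and its arguments are hypothetical names, and only the `language_code` values `"hi"` and `"auto"` come from this page.

```python
def pick_language_code(known_language=None, code_switched=False):
    """Choose a value for the language_code parameter.

    Hypothetical helper: returns "auto" for code-switched audio or when
    the language is unknown, otherwise the explicit language code.
    """
    if code_switched or known_language is None:
        return "auto"
    return known_language


# Monolingual Hindi recording: pass the explicit code.
hindi_only = pick_language_code("hi")

# Hindi mixed with English: let the model handle code-switching.
mixed = pick_language_code("hi", code_switched=True)
```

The returned string is meant to be passed as `language_code` in whatever request your client builds.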
Preprocessing with ffmpeg
terminal
# Resample to 16 kHz (-ar 16000), downmix to mono (-ac 1), and normalize loudness (-af loudnorm)
ffmpeg -i input.mp3 -ar 16000 -ac 1 -af loudnorm output.wav
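For more than a handful of files, the same command can be driven from a script. The sketch below assumes `ffmpeg` is installed and on `PATH`, and that the inputs are `.mp3` files in one directory; the function names and directory layout are illustrative. It uses exactly the flags from the command above and skips outputs that already exist, so ffmpeg never stops to ask about overwriting.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg argument list: 16 kHz, mono, loudness-normalized."""
    return [
        "ffmpeg", "-i", str(src),
        "-ar", "16000",    # resample to 16 kHz
        "-ac", "1",        # downmix to mono
        "-af", "loudnorm", # loudness normalization filter
        str(dst),
    ]


def convert_directory(in_dir, out_dir):
    """Convert every .mp3 in in_dir to a .wav in out_dir; return new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(in_dir).glob("*.mp3")):
        dst = out_dir / (src.stem + ".wav")
        if dst.exists():
            continue  # already converted; avoids ffmpeg's overwrite prompt
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

With `check=True`, a failed conversion raises `subprocess.CalledProcessError` instead of silently leaving a missing or partial output file.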