
Audio Encoding Guide

How to prepare audio correctly before sending it to the API.


Best practices

  • Convert to mono (single channel). Stereo files are accepted, but the extra channel adds unnecessary file size.
  • Resample to 16,000 Hz for optimal accuracy on batch transcription.
  • Normalize audio loudness to -16 LUFS for consistent model performance.
  • Remove background music where possible. Voice-only audio produces the lowest word error rate (WER).
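You can check the channel and sample-rate recommendations above before uploading. The following is a minimal sketch that uses Python's standard `wave` module. It assumes the audio has already been converted to uncompressed PCM WAV, and `check_wav` is a hypothetical helper name. The sketch does not check loudness (LUFS) or background music, because measuring LUFS requires an ITU-R BS.1770 loudness meter.

```python
import wave
from typing import BinaryIO, List, Union

# Targets taken from the best practices above.
TARGET_RATE_HZ = 16_000
TARGET_CHANNELS = 1


def check_wav(source: Union[str, BinaryIO]) -> List[str]:
    """Return a list of problems found in a WAV file's header (empty if none).

    `source` may be a file path or a binary file-like object.
    Only the header is inspected; loudness and content are not checked.
    """
    problems = []
    with wave.open(source, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
    if channels != TARGET_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if rate != TARGET_RATE_HZ:
        problems.append(f"expected {TARGET_RATE_HZ} Hz, found {rate} Hz")
    return problems
```

For a file produced by the ffmpeg one-liner below, `check_wav("output.wav")` should return an empty list. Any problems are returned as readable strings rather than raised, so you can report every issue in a batch at once.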

ffmpeg one-liner

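```shell
ffmpeg -i input.mp3 -ar 16000 -ac 1 -af loudnorm=I=-16 output.wav
```

Here `-ar 16000` resamples to 16 kHz, and `-ac 1` downmixes to mono. Without the `I=-16` option, `loudnorm` targets its default of -24 LUFS rather than the -16 LUFS recommended above.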
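To apply the one-liner to a whole folder, a minimal Python sketch can build the same argument list and run it with `subprocess`. This sketch makes the following assumptions:

  • `ffmpeg` is on `PATH`, and the inputs are `.mp3` files.
  • `build_ffmpeg_cmd` and `convert_dir` are hypothetical helper names.
  • The `-y` flag (overwrite existing outputs without prompting) is added for unattended runs. It is not part of the original command.

The sketch sets the loudness target explicitly to `I=-16`, because the `loudnorm` filter defaults to -24 LUFS.

```python
import subprocess
from pathlib import Path
from typing import List

TARGET_RATE_HZ = 16_000
TARGET_LUFS = -16


def build_ffmpeg_cmd(src: Path, dst: Path) -> List[str]:
    """Build the ffmpeg argument list for one file (no shell quoting needed)."""
    return [
        "ffmpeg",
        "-y",                                # overwrite dst without an interactive prompt
        "-i", str(src),
        "-ar", str(TARGET_RATE_HZ),          # resample to 16 kHz
        "-ac", "1",                          # downmix to mono
        "-af", f"loudnorm=I={TARGET_LUFS}",  # EBU R128 loudness normalization to -16 LUFS
        str(dst),
    ]


def convert_dir(src_dir: Path, out_dir: Path, pattern: str = "*.mp3") -> None:
    """Convert every file matching `pattern` in src_dir to a WAV file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob(pattern)):
        dst = out_dir / (src.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
        subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

Example: `convert_dir(Path("raw"), Path("prepared"))` writes one WAV file per MP3. The sketch passes an argument list instead of a shell string, so filenames containing spaces or quotes need no escaping.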