Audio Encoding Guide
How to prepare audio before sending it to the API.
Best practices
- Convert to mono (single channel). Stereo files are accepted but add unnecessary size.
- Resample to 16,000 Hz for optimal accuracy on batch transcription.
- Normalize audio loudness to -16 LUFS for consistent model performance.
- Remove background music where possible. Voice-only audio produces the lowest word error rate (WER).
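To see whether a file already matches the channel and sample-rate recommendations above, something like the following sketch may help. It relies on ffprobe (shipped with ffmpeg); the function names probe_audio and meets_spec are hypothetical, chosen for illustration. It does not measure loudness.

```shell
# Print sample_rate=... and channels=... for the first audio stream.
probe_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate,channels \
    -of default=noprint_wrappers=1 "$1"
}

# Read probe_audio output on stdin; succeed only for 16 kHz mono.
meets_spec() {
  probe=$(cat)
  printf '%s\n' "$probe" | grep -qx 'sample_rate=16000' &&
    printf '%s\n' "$probe" | grep -qx 'channels=1'
}
```

Possible usage: probe_audio input.mp3 | meets_spec || echo "needs conversion"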
ffmpeg one-liner
ffmpeg -i input.mp3 -ac 1 -ar 16000 -af loudnorm=I=-16 output.wav
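For more than one file, the same conversion can be wrapped in a few shell functions. This is a minimal sketch, not an official tool: the names wav_path, prep_audio, and prep_dir are hypothetical, and it assumes the inputs are .mp3 files in a single directory. It passes I=-16 explicitly because the loudnorm filter's default integrated target is -24 LUFS, not the -16 LUFS recommended above.

```shell
# Derive the output path: same base name, .wav extension, inside out_dir.
wav_path() {
  src=$1
  out_dir=$2
  base=$(basename "$src")
  printf '%s/%s.wav\n' "$out_dir" "${base%.*}"
}

# Convert one file to mono, 16 kHz, -16 LUFS (single-pass loudnorm).
# -nostdin keeps ffmpeg from reading the terminal; -y overwrites existing output.
prep_audio() {
  ffmpeg -nostdin -y -i "$1" -ac 1 -ar 16000 -af loudnorm=I=-16 \
    "$(wav_path "$1" "$2")"
}

# Convert every .mp3 in in_dir ($1), writing results to out_dir ($2).
prep_dir() {
  mkdir -p "$2"
  for f in "$1"/*.mp3; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    prep_audio "$f" "$2"
  done
}
```

Possible usage: prep_dir ./recordings ./prepared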