Media Input Settings

Audio Encoding Guide

How to prepare audio before sending it to the API.

Best practices

Convert to mono (single channel) — stereo files are accepted but add unnecessary size.
Resample to 16,000 Hz for optimal accuracy on batch transcription.
Normalize audio loudness to -16 LUFS for consistent model performance.
Remove background music where possible; voice-only audio yields the lowest word error rate (WER).

terminal

ffmpeg -i input.mp3 -ac 1 -ar 16000 -af loudnorm=I=-16 output.wav
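
Before uploading, it can help to confirm that a converted file really is mono at 16,000 Hz. The following is a minimal sketch using only Python's standard `wave` module. The helper names `check_wav` and `build_ffmpeg_command` are hypothetical and not part of the API. The sketch assumes the output is uncompressed PCM WAV, which is what ffmpeg writes for a `.wav` target by default, and it does not measure loudness.

```python
import sys
import wave

TARGET_RATE_HZ = 16_000  # sample rate recommended in this guide
TARGET_CHANNELS = 1      # mono


def build_ffmpeg_command(src, dst, lufs=-16.0):
    """Build an ffmpeg argument list following the best practices above.

    Hypothetical helper: it only assembles arguments and does not run ffmpeg.
    """
    return [
        "ffmpeg", "-i", src,
        "-ac", str(TARGET_CHANNELS),   # downmix to mono
        "-ar", str(TARGET_RATE_HZ),    # resample to 16 kHz
        "-af", f"loudnorm=I={lufs}",   # integrated loudness target in LUFS
        dst,
    ]


def check_wav(path):
    """Return a list of problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != TARGET_CHANNELS:
        problems.append(f"expected {TARGET_CHANNELS} channel, found {channels}")
    if rate != TARGET_RATE_HZ:
        problems.append(f"expected {TARGET_RATE_HZ} Hz, found {rate} Hz")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_wav(sys.argv[1])
    print("OK" if not issues else "\n".join(issues))
```

Saved under any file name, the script can be run with the path to a WAV file as its only argument. The argument list from `build_ffmpeg_command` can be passed to `subprocess.run` if ffmpeg is installed.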