Audio formats

Seven output formats cover real-time agents, web streaming, archival storage, and telephony. Pick one by use case; switching formats is a one-line change.

Format reference

| Format | Value | Content-Type | Sample rate | Best for |
| --- | --- | --- | --- | --- |
| MP3 | mp3 | audio/mpeg | 16 kHz | General storage, delivery, wide platform support. |
| PCM | pcm | audio/pcm | 16 kHz | Real-time pipelines. Zero decoding overhead. |
| WAV | wav | audio/wav | 16 kHz | Audio editing and high-quality local files. |
| OGG Opus | ogg_opus | audio/ogg | 16 kHz | Web streaming. Lower latency than MP3 at similar quality. |
| FLAC | flac | audio/flac | 16 kHz | Archival and post-production. Lossless. |
| mulaw | mulaw | audio/basic | 8 kHz | IVR, PSTN telephony. |
| alaw | alaw | audio/x-alaw | 8 kHz | European telephony standard. |
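The table above can be mirrored in code as a small lookup, which is handy for estimating clip length from a byte count. This is an illustrative sketch only: `AudioFormat`, `FORMATS`, and `duration_seconds` are hypothetical names, not part of the SDK. The byte-rate arithmetic assumes PCM is 16-bit mono (as in the WAV-wrapping example on this page) and that mulaw and alaw use one byte per sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioFormat:
    content_type: str
    sample_rate_hz: int
    # Bytes per sample for headerless, fixed-rate formats; None for
    # compressed or containerized formats where size does not map to time.
    bytes_per_sample: Optional[int]


# Values copied from the format reference table (hypothetical helper, not SDK API).
FORMATS = {
    "mp3": AudioFormat("audio/mpeg", 16000, None),
    "pcm": AudioFormat("audio/pcm", 16000, 2),      # assumption: 16-bit mono
    "wav": AudioFormat("audio/wav", 16000, None),
    "ogg_opus": AudioFormat("audio/ogg", 16000, None),
    "flac": AudioFormat("audio/flac", 16000, None),
    "mulaw": AudioFormat("audio/basic", 8000, 1),   # assumption: 8-bit mono
    "alaw": AudioFormat("audio/x-alaw", 8000, 1),   # assumption: 8-bit mono
}


def duration_seconds(response_format: str, num_bytes: int) -> float:
    """Estimate playback time for raw PCM or G.711 audio from its size."""
    fmt = FORMATS[response_format]
    if fmt.bytes_per_sample is None:
        raise ValueError(f"{response_format} size does not map directly to duration")
    return num_bytes / (fmt.sample_rate_hz * fmt.bytes_per_sample)


print(duration_seconds("pcm", 64000))    # 2.0
print(duration_seconds("mulaw", 8000))   # 1.0
```

For compressed formats, decode the audio or read the container metadata instead of dividing by a byte rate.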

Pick by use case

| Scenario | Use | Why |
| --- | --- | --- |
| Real-time voice agent | pcm | No decoding overhead, constant memory. |
| IVR telephony | mulaw | Native telephony format, 8 kHz. |
| European telephony / SIP | alaw | G.711 A-law standard. |
| Web app audio player | mp3 or ogg_opus | Smallest download, widest browser support. |
| Post-production / editing | wav or flac | Uncompressed or lossless; avoids re-encoding artifacts. |
| Long-term archival | flac | Lossless, future-proof. |
| Notification system | mp3 with trim_silence=True | Small, tight audio. |

Per-format examples

Pick the example that matches your use case. The only difference between formats is the response_format string.

PCM

python
import wave

config = TTSConfig(model="zero-indic", voice="Varun", response_format="pcm")
result = await client.tts.synthesize("Hello!", config=config)

# PCM has no file header. Wrap it in a WAV container for playback:
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit = 2 bytes
    wf.setframerate(16000)  # 16 kHz
    wf.writeframes(result.audio_data)

WAV

python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="wav")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.wav")

MP3

python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mp3")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.mp3")

OGG Opus

python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="ogg_opus")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.ogg")

FLAC

python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="flac")
result = await client.tts.synthesize("Hello!", config=config)
result.save("archive.flac")

mulaw

python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")
result = await client.tts.synthesize("<Conversational> Hello!", config=config)
# result.audio_data is ready to send to IVR system

alaw

python
config = TTSConfig(model="zero-indic", voice="Rajesh", response_format="alaw")
result = await client.tts.synthesize("Guten Morgen!", config=config)

Telephony: mulaw & alaw

mulaw (G.711 μ-law) and alaw (G.711 A-law) are the two standard codecs for telephone networks worldwide. Both encode audio at 8 kHz with 8 bits per sample (64 kbit/s), which is the standard for PSTN, SIP, and most IVR platforms.
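At 8 kHz and one byte per sample, 20 ms of G.711 audio is 160 bytes, a common packet size in RTP-based telephony. A minimal sketch of splitting a synthesized prompt into fixed-size frames follows. `frame_g711` is a hypothetical helper, not an SDK function, and the 20 ms frame length and silence padding of the last frame are assumptions to adjust for your carrier or IVR platform.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 8000  # G.711 sample rate, one byte per sample

# Encoded value of a zero-amplitude sample in each companding law.
SILENCE_BYTE = {"mulaw": 0xFF, "alaw": 0xD5}


def frame_g711(audio: bytes, codec: str = "mulaw", frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-size frames, padding the final frame with encoded silence."""
    frame_size = SAMPLE_RATE_HZ * frame_ms // 1000  # 160 bytes for 20 ms
    pad = bytes([SILENCE_BYTE[codec]])
    for start in range(0, len(audio), frame_size):
        frame = audio[start:start + frame_size]
        yield frame + pad * (frame_size - len(frame))


# 400 bytes = 50 ms of audio, so two full frames plus one padded frame.
frames = list(frame_g711(b"\x7f" * 400))
print(len(frames), len(frames[0]), len(frames[-1]))  # 3 160 160
```

In real use, pass `result.audio_data` from a mulaw or alaw request in place of the dummy bytes.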

  • Use mulaw for most Indian PSTN systems and North American telephony.
  • Use alaw for European telephony systems and some SIP providers.
  • Add trim_silence=True to remove dead air at the start and end of prompts.
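Most desktop players will not open raw G.711 bytes, so it can help to expand a prompt to 16-bit PCM for a quick listen before wiring it into a phone system. The sketch below uses the widely published G.711 μ-law expansion. `mulaw_byte_to_sample`, `mulaw_to_pcm16`, and `save_mulaw_as_wav` are hypothetical helper names, not SDK functions, and the code assumes the audio is headerless μ-law at 8 kHz mono.

```python
import struct
import wave


def mulaw_byte_to_sample(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Convert mu-law bytes to little-endian 16-bit PCM."""
    samples = [mulaw_byte_to_sample(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def save_mulaw_as_wav(data: bytes, path: str) -> None:
    """Write mu-law audio as a playable 8 kHz, 16-bit, mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        wf.writeframes(mulaw_to_pcm16(data))


print(mulaw_byte_to_sample(0xFF))  # 0 (silence)
print(mulaw_byte_to_sample(0x80))  # 32124 (positive full scale)
```

A call such as `save_mulaw_as_wav(result.audio_data, "preview.wav")` would then produce a file any audio player can open. A-law uses a different expansion, so this helper should not be applied to alaw output.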

Full telephony example

python
config = TTSConfig(
    model="zero-indic",
    voice="Sunita",
    response_format="mulaw",
    trim_silence=True,
)
result = await client.tts.synthesize(
    "<Conversational> नमस्ते! अपना खाता नंबर दबाएँ।",
    config=config,
)
# result.audio_data is ready to send to SIP

Don't stream FLAC
FLAC is lossless but requires the full file to decode. Streaming it means your player sees one giant chunk instead of incrementally playable pieces. Use PCM or OGG Opus for streaming instead.

HTTP response headers

Every batch response includes metadata headers so you can inspect the returned audio without parsing the body:

| Header | Meaning |
| --- | --- |
| Content-Type | MIME type matching the requested format |
| X-Sample-Rate | Sample rate of the returned audio, in Hz |
| X-Request-Id | Unique request identifier; log it for support escalations |
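As a sketch of how these headers might be consumed, the helper below takes any mapping of response headers (for example, the headers object of whichever HTTP client you use) and checks them against the requested format. `summarize_audio_headers` and `EXPECTED_CONTENT_TYPES` are hypothetical names, the expected values come from the format reference table on this page, and the code assumes X-Sample-Rate carries a plain integer string.

```python
from typing import Mapping

# Content types from the format reference table (hypothetical helper, not SDK API).
EXPECTED_CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "pcm": "audio/pcm",
    "wav": "audio/wav",
    "ogg_opus": "audio/ogg",
    "flac": "audio/flac",
    "mulaw": "audio/basic",
    "alaw": "audio/x-alaw",
}


def summarize_audio_headers(headers: Mapping[str, str], response_format: str) -> dict:
    """Validate the Content-Type and extract sample rate and request ID."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip()
    expected = EXPECTED_CONTENT_TYPES[response_format]
    if content_type != expected:
        raise ValueError(f"expected {expected!r}, got {content_type!r}")
    return {
        "content_type": content_type,
        "sample_rate_hz": int(lowered["x-sample-rate"]),  # assumption: integer string
        "request_id": lowered.get("x-request-id"),
    }


info = summarize_audio_headers(
    {"Content-Type": "audio/basic", "X-Sample-Rate": "8000", "X-Request-Id": "req-123"},
    "mulaw",
)
print(info["sample_rate_hz"], info["request_id"])  # 8000 req-123
```

Logging `request_id` alongside your own call identifiers makes it easier to correlate a problem prompt with a support ticket.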