Audio formats

Seven output formats cover real-time agents, web streaming, archival storage, and telephony. Pick one by use case; switching between them is a one-line change.

Format reference

Format	Value	Content-Type	Sample rate	Best for
MP3	`mp3`	`audio/mpeg`	16 kHz	General storage, delivery, wide platform support.
PCM	`pcm`	`audio/pcm`	16 kHz	Real-time pipelines. Zero decoding overhead.
WAV	`wav`	`audio/wav`	16 kHz	Audio editing and high-quality local files.
OGG Opus	`ogg_opus`	`audio/ogg`	16 kHz	Web streaming. Lower latency than MP3 at similar quality.
FLAC	`flac`	`audio/flac`	16 kHz	Archival and post-production. Lossless.
mulaw	`mulaw`	`audio/basic`	8 kHz	IVR, PSTN telephony.
alaw	`alaw`	`audio/x-alaw`	8 kHz	European telephony standard.
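
The table can be mirrored in code as a small lookup, which is handy when choosing a file extension or checking a response's `Content-Type`. This is a minimal sketch: `AudioFormat`, `FORMATS`, and `describe_format` are hypothetical names rather than part of the SDK, and the file extensions are conventional choices, not something the API specifies.

```python
from typing import NamedTuple


class AudioFormat(NamedTuple):
    content_type: str
    sample_rate_hz: int
    extension: str  # conventional extension; an assumption of this sketch


# Content types and sample rates copied from the format reference table.
FORMATS = {
    "mp3": AudioFormat("audio/mpeg", 16000, ".mp3"),
    "pcm": AudioFormat("audio/pcm", 16000, ".pcm"),
    "wav": AudioFormat("audio/wav", 16000, ".wav"),
    "ogg_opus": AudioFormat("audio/ogg", 16000, ".ogg"),
    "flac": AudioFormat("audio/flac", 16000, ".flac"),
    "mulaw": AudioFormat("audio/basic", 8000, ".ulaw"),
    "alaw": AudioFormat("audio/x-alaw", 8000, ".alaw"),
}


def describe_format(response_format: str) -> AudioFormat:
    """Return the table entry for a response_format value, or raise ValueError."""
    try:
        return FORMATS[response_format]
    except KeyError:
        valid = ", ".join(sorted(FORMATS))
        raise ValueError(
            f"unknown format {response_format!r}; expected one of: {valid}"
        ) from None
```

For example, `describe_format("mulaw").sample_rate_hz` gives the rate to expect in the response before any audio is decoded.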

Pick by use case

Scenario	Use	Why
Real-time voice agent	`pcm`	No decoding overhead, constant memory.
IVR telephony	`mulaw`	Native telephony format, 8 kHz.
European telephony / SIP	`alaw`	G.711 A-law standard.
Web app audio player	`mp3` or `ogg_opus`	Smallest download, widest browser support.
Post-production / editing	`wav` or `flac`	Uncompressed or lossless; avoids re-encoding artifacts.
Long-term archival	`flac`	Lossless, future-proof.
Notification system	`mp3` with `trim_silence=True`	Small, tight audio.
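
If the scenario is decided at runtime, the table can be expressed as a set of defaults that are passed on to `TTSConfig`. The sketch below is illustrative only: the scenario keys, `SCENARIO_DEFAULTS`, and `config_kwargs_for` are hypothetical names, and where the table offers two formats the first one is used as the default.

```python
# Scenario keys are hypothetical labels for the rows of the use-case table.
SCENARIO_DEFAULTS = {
    "realtime_agent": {"response_format": "pcm"},
    "ivr": {"response_format": "mulaw"},
    "european_telephony": {"response_format": "alaw"},
    "web_player": {"response_format": "mp3"},        # ogg_opus is the alternative
    "post_production": {"response_format": "wav"},   # flac is the alternative
    "archival": {"response_format": "flac"},
    "notification": {"response_format": "mp3", "trim_silence": True},
}


def config_kwargs_for(scenario: str, **overrides) -> dict:
    """Return keyword arguments for TTSConfig; caller overrides are applied last."""
    if scenario not in SCENARIO_DEFAULTS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return {**SCENARIO_DEFAULTS[scenario], **overrides}
```

The result can be splatted into the config, for example `TTSConfig(model="zero-indic", voice="Varun", **config_kwargs_for("notification"))`.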

Per-format examples

The examples below cover each format in turn. The only difference between them is the `response_format` string.

PCM (Python)

import wave

config = TTSConfig(model="zero-indic", voice="Varun", response_format="pcm")
result = await client.tts.synthesize("Hello!", config=config)

# PCM has no file header, so wrap it in a WAV container for playback.
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit = 2 bytes per sample
    wf.setframerate(16000)   # 16 kHz, per the format reference
    wf.writeframes(result.audio_data)
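
When the audio should stay in memory (for example, to return it from a web handler) the same wrapping can target a buffer instead of a file. This is a sketch under the same assumptions as the snippet above: 16-bit mono PCM at 16 kHz. `pcm_to_wav_bytes` and `pcm_duration_seconds` are hypothetical helper names, not SDK functions.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes, sample_rate: int = 16000, channels: int = 1, sample_width: int = 2
) -> bytes:
    """Wrap headerless PCM in a WAV container and return the result as bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)   # bytes per sample (2 = 16-bit)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(
    pcm: bytes, sample_rate: int = 16000, channels: int = 1, sample_width: int = 2
) -> float:
    """Duration implied by the byte count: bytes / (rate * channels * width)."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

Because raw PCM carries no metadata, the duration has to be derived from the byte count; at 16 kHz, 16-bit mono, one second of audio is 32,000 bytes.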

WAV (Python)

config = TTSConfig(model="zero-indic", voice="Varun", response_format="wav")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.wav")

MP3 (Python)

config = TTSConfig(model="zero-indic", voice="Varun", response_format="mp3")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.mp3")

OGG Opus (Python)

config = TTSConfig(model="zero-indic", voice="Varun", response_format="ogg_opus")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.ogg")

FLAC (Python)

config = TTSConfig(model="zero-indic", voice="Varun", response_format="flac")
result = await client.tts.synthesize("Hello!", config=config)
result.save("archive.flac")

mulaw (Python)

config = TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")
result = await client.tts.synthesize("<Conversational> Hello!", config=config)
# result.audio_data is ready to send to the IVR system

alaw (Python)

config = TTSConfig(model="zero-indic", voice="Rajesh", response_format="alaw")
result = await client.tts.synthesize("Guten Morgen!", config=config)

Telephony: mulaw & alaw

mulaw (G.711 μ-law) and alaw (G.711 A-law) are the two standard codecs used on telephone networks worldwide. Both encode audio at 8 kHz with 8 bits per sample (64 kbit/s), which is the standard for PSTN, SIP, and most IVR platforms.

Use `mulaw` for most Indian PSTN systems and for North American telephony.
Use `alaw` for European telephony systems and some SIP providers.
Add `trim_silence=True` to remove dead air at the start and end of prompts.
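
Since both codecs use one byte per sample at 8 kHz, one second of audio is 8,000 bytes, which makes duration and packet sizing simple arithmetic. The sketch below splits a buffer into fixed-duration frames. It assumes the audio is headerless G.711 and uses a 20 ms frame (160 bytes), a common packetization interval for G.711 over RTP; check what your telephony platform expects. `g711_duration_seconds` and `g711_frames` are hypothetical helper names.

```python
from typing import Iterator

G711_BYTES_PER_SECOND = 8000  # 8 kHz * 1 byte per sample


def g711_duration_seconds(audio: bytes) -> float:
    """Duration of headerless mulaw/alaw audio, derived from its length."""
    return len(audio) / G711_BYTES_PER_SECOND


def g711_frames(audio: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield frames of frame_ms milliseconds; the last frame may be shorter."""
    frame_size = G711_BYTES_PER_SECOND * frame_ms // 1000
    if frame_size <= 0:
        raise ValueError("frame_ms must be positive")
    for start in range(0, len(audio), frame_size):
        yield audio[start:start + frame_size]
```

A 400-byte buffer, for instance, is 50 ms of audio and yields frames of 160, 160, and 80 bytes.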

Full telephony example

config = TTSConfig(
    model="zero-indic",
    voice="Sunita",
    response_format="mulaw",
    trim_silence=True,
)
result = await client.tts.synthesize(
    "<Conversational> नमस्ते! अपना खाता नंबर दबाएँ।",
    config=config,
)
# result.audio_data is ready to hand to your SIP stack

Don't stream FLAC

FLAC is lossless, but it requires the full file to decode. Streaming it means your player receives one giant chunk instead of incrementally playable pieces. Use PCM or OGG Opus for streaming instead.

HTTP response headers

Every batch response includes metadata headers so you can inspect the returned audio without parsing the body:

Header	Meaning
`Content-Type`	MIME type matching the requested format
`X-Sample-Rate`	Sample rate of returned audio in Hz
`X-Request-Id`	Unique request identifier. Log it for support escalations.
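
Reading these headers takes only a few lines with any HTTP client that exposes response headers as a mapping. The sketch below is client-agnostic and assumes `X-Sample-Rate` carries a plain integer string such as `16000`; `AudioResponseMeta` and `parse_audio_headers` are hypothetical names, and the request ID in the usage note is a made-up value.

```python
from typing import Mapping, NamedTuple, Optional


class AudioResponseMeta(NamedTuple):
    content_type: Optional[str]
    sample_rate_hz: Optional[int]
    request_id: Optional[str]


def parse_audio_headers(headers: Mapping[str, str]) -> AudioResponseMeta:
    """Extract the documented metadata headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    rate = lowered.get("x-sample-rate")
    return AudioResponseMeta(
        content_type=lowered.get("content-type"),
        # Assumption: the header value is a bare integer in Hz.
        sample_rate_hz=int(rate) if rate is not None else None,
        request_id=lowered.get("x-request-id"),
    )
```

For example, `parse_audio_headers({"Content-Type": "audio/basic", "X-Sample-Rate": "8000", "X-Request-Id": "req-123"})` returns a tuple whose `sample_rate_hz` is the integer `8000`, ready to pass to a decoder or to log alongside the request ID.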