Audio formats
Seven output formats cover real-time agents, web streaming, archival storage, and telephony. Pick by use case; switching formats is a one-line change.
Format reference
| Format | Value | Content-Type | Sample rate | Best for |
|---|---|---|---|---|
| MP3 | mp3 | audio/mpeg | 16 kHz | General storage, delivery, wide platform support. |
| PCM | pcm | audio/pcm | 16 kHz | Real-time pipelines. Zero decoding overhead. |
| WAV | wav | audio/wav | 16 kHz | Audio editing and high-quality local files. |
| OGG Opus | ogg_opus | audio/ogg | 16 kHz | Web streaming. Lower latency than MP3 at similar quality. |
| FLAC | flac | audio/flac | 16 kHz | Archival and post-production. Lossless. |
| mulaw | mulaw | audio/basic | 8 kHz | IVR, PSTN telephony. |
| alaw | alaw | audio/x-alaw | 8 kHz | European telephony standard. |
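For the raw formats, the sample rate in the table translates directly into byte rate and duration. The sketch below is illustrative only: it assumes mono audio, 16-bit samples for pcm (the same assumption the WAV-wrapping example on this page makes), and 8-bit samples for mulaw and alaw. The helper names are hypothetical and not part of the SDK.

```python
# Bytes per second for raw (headerless) formats, assuming mono audio.
# Assumption: pcm is 16-bit (2 bytes per sample); mulaw/alaw are 8-bit (1 byte per sample).
RAW_FORMATS = {
    "pcm": (16_000, 2),   # (sample rate in Hz, bytes per sample)
    "mulaw": (8_000, 1),
    "alaw": (8_000, 1),
}


def bytes_per_second(fmt: str) -> int:
    """Return the byte rate of a raw format."""
    sample_rate, sample_width = RAW_FORMATS[fmt]
    return sample_rate * sample_width


def duration_seconds(fmt: str, num_bytes: int) -> float:
    """Return the playback duration of num_bytes of raw audio."""
    return num_bytes / bytes_per_second(fmt)


print(bytes_per_second("pcm"))          # 32000
print(bytes_per_second("mulaw"))        # 8000
print(duration_seconds("pcm", 64_000))  # 2.0
```

Under these assumptions, a telephony prompt costs a quarter of the bytes of the same prompt in pcm.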
Pick by use case
| Scenario | Use | Why |
|---|---|---|
| Real-time voice agent | pcm | No decoding overhead, constant memory. |
| IVR telephony | mulaw | Native telephony format, 8 kHz. |
| European telephony / SIP | alaw | G.711 A-law standard. |
| Web app audio player | mp3 or ogg_opus | Smallest download, widest browser support. |
| Post-production / editing | wav or flac | Uncompressed or lossless; avoids re-encoding artifacts. |
| Long-term archival | flac | Lossless, future-proof. |
| Notification system | mp3 with trim_silence=True | Small, tight audio. |
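To illustrate why pcm suits real-time pipelines, here is a minimal sketch that slices a PCM buffer into fixed-duration frames with no decoding step. It assumes 16-bit mono audio at 16 kHz; the 20 ms frame size and the `iter_frames` helper are illustrative choices, not SDK features.

```python
from typing import Iterator

SAMPLE_RATE = 16_000  # Hz, from the format table
SAMPLE_WIDTH = 2      # bytes per sample; assumes 16-bit mono PCM


def iter_frames(pcm: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration frames; the last frame may be shorter."""
    frame_bytes = SAMPLE_RATE * SAMPLE_WIDTH * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of silence stands in for result.audio_data.
audio = bytes(SAMPLE_RATE * SAMPLE_WIDTH)
frames = list(iter_frames(audio))
print(len(frames), len(frames[0]))  # 50 640
```

Each frame can be handed to the audio sink as soon as it is sliced, so only one small frame is in flight at a time.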
Per-format examples
Pick the example that matches your use case. The only difference between formats is the response_format string.
python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="pcm")
result = await client.tts.synthesize("Hello!", config=config)
# PCM has no file header. Wrap in WAV for playback:
import wave
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 16-bit = 2 bytes
    wf.setframerate(16000)
    wf.writeframes(result.audio_data)
python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="wav")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.wav")
python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mp3")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.mp3")
python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="ogg_opus")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.ogg")
python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="flac")
result = await client.tts.synthesize("Hello!", config=config)
result.save("archive.flac")
python
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")
result = await client.tts.synthesize("<Conversational> Hello!", config=config)
# result.audio_data is ready to send to the IVR system
python
config = TTSConfig(model="zero-indic", voice="Rajesh", response_format="alaw")
result = await client.tts.synthesize("Guten Morgen!", config=config)
Telephony: mulaw & alaw
mulaw (G.711 μ-law) and alaw (G.711 A-law) are the two standard codecs for telephone networks globally. Both encode audio at 8 kHz, 8-bit, which is the standard for PSTN, SIP, and most IVR platforms.
- Use mulaw for most Indian PSTN systems and North American telephony.
- Use alaw for European telephony systems and some SIP providers.
- Add trim_silence=True to remove dead air at the start and end of prompts.
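Raw mulaw bytes carry no header, so general-purpose players may not open them directly. For local preview, the sketch below expands G.711 μ-law bytes to signed 16-bit PCM using the standard segment-and-mantissa decoding. The `ulaw_to_pcm16` helper is hypothetical and not part of the SDK, and the input bytes here are hand-picked test values rather than real API output.

```python
import struct


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Decode G.711 mu-law bytes to little-endian signed 16-bit PCM."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
        exponent = (u >> 4) & 0x07  # 3-bit segment number
        mantissa = u & 0x0F         # 4-bit step within the segment
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if u & 0x80 else magnitude)
    return struct.pack(f"<{len(samples)}h", *samples)


# 0xFF is silence; 0x00 and 0x80 are the negative and positive extremes.
pcm = ulaw_to_pcm16(bytes([0xFF, 0x00, 0x80]))
print(struct.unpack("<3h", pcm))  # (0, -32124, 32124)
```

The decoded bytes can then be written with the `wave` module using one channel, a sample width of 2, and a frame rate of 8000.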
Full telephony example
python
config = TTSConfig(
    model="zero-indic",
    voice="Sunita",
    response_format="mulaw",
    trim_silence=True,
)
# Hindi prompt: "Hello! Press your account number."
result = await client.tts.synthesize(
    "<Conversational> नमस्ते! अपना खाता नंबर दबाएँ।",
    config=config,
)
# result.audio_data is ready to send to SIP
Don't stream FLAC
FLAC is lossless but requires the full file to decode. Streaming it means your player sees one giant chunk instead of incrementally playable pieces. Use PCM or OGG Opus for streaming instead.
HTTP response headers
Every batch response includes metadata headers so you can inspect the returned audio without parsing the body:
| Header | Meaning |
|---|---|
| Content-Type | MIME type matching the requested format |
| X-Sample-Rate | Sample rate of the returned audio, in Hz |
| X-Request-Id | Unique request identifier; log it for support escalations |
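As a rough sketch of how these headers might be consumed, the helper below takes any mapping of header names to values, so it is independent of the HTTP client in use. It assumes X-Sample-Rate arrives as a plain integer string; the `describe_audio` name and the sample header values are hypothetical.

```python
from typing import Mapping


def describe_audio(headers: Mapping[str, str]) -> dict:
    """Summarize the audio metadata headers of a batch response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "content_type": lowered.get("content-type"),
        # Assumption: the value is a plain integer string such as "8000".
        "sample_rate_hz": int(lowered["x-sample-rate"]),
        "request_id": lowered.get("x-request-id"),
    }


# Hypothetical header values for a mulaw response.
info = describe_audio({
    "Content-Type": "audio/basic",
    "X-Sample-Rate": "8000",
    "X-Request-Id": "req-example-123",
})
print(info["sample_rate_hz"])  # 8000
```

Logging `request_id` alongside each saved file keeps the identifier at hand if a support escalation is needed.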