SIP / PSTN
Shunya Labs supports the two G.711 codecs that telephone networks use globally, mulaw (μ-law) and alaw (A-law), for both ASR streaming and TTS output. Pick the codec that matches your geography or carrier.
Pick the codec
| Codec | When to use | Sample rate, bit depth |
|---|---|---|
| mulaw (G.711 μ-law) | Most Indian PSTN systems, North American telephony. | 8 kHz, 8-bit |
| alaw (G.711 A-law) | European telephony systems and some SIP providers. | 8 kHz, 8-bit |
ASR: accept telephony audio over WebSocket
The streaming endpoint accepts μ-law and A-law frames natively, so no resampling or codec conversion is needed before sending.
// First message after connecting to wss://asr.shunyalabs.ai/ws
{
"api_key": "your-api-key",
"language": "hi",
"sample_rate": 8000,
"dtype": "ulaw", // or "alaw"
"model": "zero-indic"
}

TTS: generate telephony-ready audio
Set response_format to the codec your trunk expects. Note that the TTS value for μ-law is "mulaw", whereas the ASR dtype above is "ulaw". The output is ready to forward to a SIP RTP stream or an IVR platform's audio sink.
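If you packetise the audio for RTP yourself rather than handing it to a platform, the bytes usually need to be cut into fixed-duration frames. The sketch below assumes the synthesised audio is raw, headerless G.711 at 8 kHz (the docs quoted here do not state the container, so check for a WAV header first) and uses the common 20 ms packet time, which works out to 160 bytes per frame. The names `frame_g711` and `CODECS` are illustrative and not part of the Shunya Labs SDK; the payload types 0 (PCMU) and 8 (PCMA) are the static assignments from RFC 3551.

```python
SAMPLE_RATE = 8000   # G.711 telephony rate, one byte per sample
FRAME_MS = 20        # common RTP packet time (an assumption; match your trunk)
BYTES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 160 bytes

# RTP static payload type and the byte that encodes digital silence, per codec.
CODECS = {
    "mulaw": {"payload_type": 0, "silence": 0xFF},
    "alaw": {"payload_type": 8, "silence": 0xD5},
}


def frame_g711(audio: bytes, response_format: str) -> list[bytes]:
    """Split raw G.711 bytes into 20 ms frames, padding the last with silence."""
    silence = CODECS[response_format]["silence"]
    frames = []
    for start in range(0, len(audio), BYTES_PER_FRAME):
        frame = audio[start:start + BYTES_PER_FRAME]
        if len(frame) < BYTES_PER_FRAME:
            # Pad the final short frame so every RTP payload has the same size.
            frame += bytes([silence]) * (BYTES_PER_FRAME - len(frame))
        frames.append(frame)
    return frames
```

Each returned frame would become the payload of one RTP packet. Building the RTP header and pacing packets every 20 ms is left to your SIP stack.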
# mulaw, North American / Indian PSTN
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")
result = await client.tts.synthesize("<Conversational> Hello!", config=config)
# Forward result.audio_data directly to your media stream or IVR system
# alaw, European telephony
config = TTSConfig(model="zero-indic", voice="Rajesh", response_format="alaw")
result = await client.tts.synthesize("Guten Morgen!", config=config)

Why these codecs are first-class here
From the ASR streaming docs, dtype values:
| dtype | Format | Use case |
|---|---|---|
| ulaw | G.711 mu-law (8-bit) | Telephony (8 kHz) |
| alaw | G.711 A-law (8-bit) | Telephony (8 kHz) |
| int16 | 16-bit signed PCM | General recording |
| float32 | 32-bit IEEE float | Pre-processed audio |
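The sample widths above determine how many bytes a given duration of audio occupies, which matters when slicing a buffer into streaming chunks. A minimal sketch, assuming mono audio and raw samples with no container header; `BYTES_PER_SAMPLE`, `chunk_bytes`, and `iter_chunks` are hypothetical helper names, not SDK functions:

```python
from typing import Iterator

# Bytes per sample for each dtype value listed above (mono audio assumed).
BYTES_PER_SAMPLE = {"ulaw": 1, "alaw": 1, "int16": 2, "float32": 4}


def chunk_bytes(dtype: str, sample_rate: int, seconds: float) -> int:
    """Number of bytes that hold `seconds` of audio in the given dtype."""
    samples = int(round(sample_rate * seconds))
    return samples * BYTES_PER_SAMPLE[dtype]


def iter_chunks(audio: bytes, dtype: str, sample_rate: int,
                seconds: float) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices; the last one may be shorter."""
    size = chunk_bytes(dtype, sample_rate, seconds)
    for start in range(0, len(audio), size):
        yield audio[start:start + size]
```

For 8 kHz G.711, one second of audio is 8000 bytes, while the same second as int16 PCM is 16000 bytes, so telephony streams cost half the bandwidth of 16-bit PCM at the same rate.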
From the TTS docs:
Latency tuning for telephony
For real-time agents over PSTN, lower the silence threshold so the server finalises segments quickly when the caller pauses:
{
"api_key": "your-api-key",
"model": "zero-indic",
"language": "hi",
"sample_rate": 8000,
"dtype": "ulaw",
"chunk_size_sec": 1.0,
"silence_threshold_sec": 0.6
}

The silence_threshold_sec default is 0.8 s. Dropping it to 0.5-0.6 s makes the server finalise segments sooner, which helps interruption handling but can truncate speakers who pause to think. Tune against your own audio.
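One rough way to tune against your own audio is to measure the pauses that occur inside caller utterances and check how many of them each candidate threshold would cut. The sketch below assumes that any pause at least as long as the threshold triggers finalisation, which is a simplification of whatever the server actually does; `split_rate` is a hypothetical helper and the pause durations are made-up sample values, not measurements.

```python
def split_rate(pauses_sec: list[float], threshold_sec: float) -> float:
    """Fraction of mid-utterance pauses that meet or exceed the threshold."""
    if not pauses_sec:
        return 0.0
    cut = sum(1 for pause in pauses_sec if pause >= threshold_sec)
    return cut / len(pauses_sec)


# Hypothetical pause lengths (seconds) taken from inside single utterances.
SAMPLE_PAUSES = [0.25, 0.4, 0.55, 0.65, 0.7, 0.9, 1.2]

# Compare the default (0.8 s) with the lower values discussed above.
rates = {t: split_rate(SAMPLE_PAUSES, t) for t in (0.5, 0.6, 0.8)}
```

A threshold whose split rate climbs sharply on your recordings is likely to cut callers off mid-thought, so the lowest value that keeps the rate acceptable is a reasonable starting point.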
Source: Shunyalabs ASR Gateway API Reference (31 March), WebSocket Streaming API → "Audio Formats"; Shunyalabs TTS Developer Documentation v1.0 (March 2026), §8.2 Telephony Formats. Codec descriptions reproduced verbatim.