SIP / PSTN

Shunya Labs supports mulaw and alaw, the two G.711 codecs that telephone networks use globally, for both ASR streaming and TTS output. Pick the codec that matches your geography or carrier.

Pick the codec

Codec                 When to use                                            Sample rate, bit depth
mulaw (G.711 μ-law)   Most Indian PSTN systems, North American telephony.    8 kHz, 8-bit
alaw (G.711 A-law)    European telephony systems and some SIP providers.     8 kHz, 8-bit
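
As a quick sanity check on the figures in the table: both codecs carry one 8-bit sample per byte at 8,000 samples per second, so buffer sizes follow directly from duration. The sketch below works that arithmetic out. The helper name g711_bytes is illustrative and not part of any Shunya Labs SDK, and the 20 ms packet duration is the usual RTP default for G.711, not something this page specifies.

```python
SAMPLE_RATE_HZ = 8000   # G.711 sample rate, as listed in the codec table
BYTES_PER_SAMPLE = 1    # 8-bit companded samples, mono


def g711_bytes(duration_ms: int) -> int:
    """Return the number of bytes of mono G.711 audio in duration_ms."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


# Bitrate of one direction of a call, in kbit/s.
bitrate_kbps = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8 / 1000

print(g711_bytes(20))     # one 20 ms packet
print(g711_bytes(1000))   # one second of audio
print(bitrate_kbps)
```

A 20 ms packet is 160 bytes, one second of audio is 8,000 bytes, and each call direction runs at 64 kbit/s, for mulaw and alaw alike.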

ASR: accept telephony audio over WebSocket

The streaming endpoint accepts μ-law and A-law frames natively, so you don't need to resample or convert the codec before sending.

Send this as the first message after connecting to wss://asr.shunyalabs.ai/ws. Set dtype to "ulaw" for μ-law or "alaw" for A-law.

json
{
  "api_key": "your-api-key",
  "language": "hi",
  "sample_rate": 8000,
  "dtype": "ulaw",
  "model": "zero-indic"
}
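
The handshake described here (one JSON text message, then binary audio frames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the official client: build_init_message and stream_call_audio are hypothetical helper names, `ws` stands for any connection object with an async send() method, and the 800-byte frame size (100 ms of G.711) is an arbitrary choice for the sketch.

```python
import asyncio
import json

TELEPHONY_DTYPES = {"ulaw", "alaw"}  # dtype values this page lists for G.711


def build_init_message(api_key: str, language: str, dtype: str,
                       model: str = "zero-indic") -> str:
    """Serialise the first WebSocket message for 8 kHz telephony audio."""
    if dtype not in TELEPHONY_DTYPES:
        raise ValueError(f"expected one of {sorted(TELEPHONY_DTYPES)}, got {dtype!r}")
    return json.dumps({
        "api_key": api_key,
        "language": language,
        "sample_rate": 8000,
        "dtype": dtype,
        "model": model,
    })


async def stream_call_audio(ws, init_message: str, audio: bytes,
                            frame_bytes: int = 800) -> int:
    """Send the init message as text, then the audio as binary frames.

    `ws` is any object with an async send() method (an assumption here).
    frame_bytes=800 is 100 ms of G.711, chosen arbitrarily for this sketch.
    Returns the number of binary frames sent.
    """
    await ws.send(init_message)
    sent = 0
    for start in range(0, len(audio), frame_bytes):
        await ws.send(audio[start:start + frame_bytes])
        sent += 1
    return sent
```

With the third-party websockets package, a connection opened via `async with websockets.connect("wss://asr.shunyalabs.ai/ws") as ws:` should fit the `ws` parameter, since its send() accepts both str and bytes. Validating dtype up front catches the easy mistake of passing the TTS spelling "mulaw" to the ASR endpoint.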

TTS: generate telephony-ready audio

Set response_format to the codec your trunk expects. The output is ready to forward to a SIP RTP stream or an IVR platform's audio sink. Note that TTS names the μ-law codec mulaw, whereas the ASR dtype for the same codec is ulaw.

python
# Assumes `client` is an initialised Shunya Labs SDK client and TTSConfig has
# been imported from the SDK. Run inside an async function, since the calls use await.

# mulaw, North American / Indian PSTN
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")
result = await client.tts.synthesize("<Conversational> Hello!", config=config)
# Forward result.audio_data directly to your media stream or IVR system

# alaw, European telephony
config = TTSConfig(model="zero-indic", voice="Rajesh", response_format="alaw")
result = await client.tts.synthesize("Guten Morgen!", config=config)
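
If your media layer expects fixed-size RTP payloads instead of one long buffer, the synthesized audio can be sliced before forwarding. The sketch below assumes result.audio_data is raw, headerless G.711 bytes, which this page implies but does not state outright. The function name rtp_payloads and the 20 ms packet time are illustrative choices. The silence bytes are the standard G.711 encodings of a zero sample.

```python
from typing import Iterator

# Byte value that decodes to digital silence in each codec (per G.711).
# Keys follow the TTS response_format spellings used on this page.
SILENCE_BYTE = {"mulaw": 0xFF, "alaw": 0xD5}


def rtp_payloads(audio: bytes, codec: str, ptime_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-size payloads, padding the final one with silence."""
    size = 8000 * ptime_ms // 1000  # 8 kHz, one byte per sample
    pad = bytes([SILENCE_BYTE[codec]])
    for start in range(0, len(audio), size):
        chunk = audio[start:start + size]
        yield chunk + pad * (size - len(chunk))
```

Padding the last payload with the codec's silence value, not with zero bytes, matters: a 0x00 byte in either codec decodes to a large-amplitude sample, which is heard as a click at the end of the prompt.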

Why these codecs are first-class here

From the ASR streaming docs, dtype values:

  • ulaw: G.711 mu-law (8-bit), Telephony (8 kHz)
  • alaw: G.711 A-law (8-bit), Telephony (8 kHz)
  • int16: 16-bit signed PCM, General recording
  • float32: 32-bit IEEE float, Pre-processed audio

From the TTS docs:

mulaw (G.711 mu-law) and alaw (G.711 A-law) are the two standard codecs for telephone networks globally. Both encode audio at 8 kHz, 8-bit, which is the standard for PSTN, SIP, and most IVR platforms.

Latency tuning for telephony

For real-time agents over PSTN, lower the silence threshold so the server finalises segments quickly when the caller pauses:

json
{
  "api_key": "your-api-key",
  "model": "zero-indic",
  "language": "hi",
  "sample_rate": 8000,
  "dtype": "ulaw",
  "chunk_size_sec": 1.0,
  "silence_threshold_sec": 0.6
}

The default silence_threshold_sec is 0.8 s. Dropping it to 0.5-0.6 s makes the server finalise segments sooner, which helps with interruption handling, but it can cut off speakers who pause to think. Tune it against your own audio.
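
One rough way to tune against your own audio is to measure how long callers pause mid-sentence and check how many of those pauses each candidate threshold would cut. The sketch below assumes the server finalises a segment once silence reaches the threshold; the exact boundary behaviour is not documented here. The function name is hypothetical and the pause lengths are made-up placeholders, not measurements.

```python
def premature_cut_rate(pauses_sec: list[float], threshold_sec: float) -> float:
    """Fraction of mid-utterance pauses at or above the silence threshold."""
    if not pauses_sec:
        return 0.0
    cut = sum(1 for pause in pauses_sec if pause >= threshold_sec)
    return cut / len(pauses_sec)


# Made-up pause lengths for illustration; replace with values from real calls.
thinking_pauses = [0.3, 0.45, 0.55, 0.7, 0.9]

for threshold in (0.5, 0.6, 0.8):
    print(threshold, premature_cut_rate(thinking_pauses, threshold))
```

A threshold whose cut rate rises sharply on your own recordings is probably too aggressive for that caller population, even if it feels more responsive in a demo.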

WAV headers are auto-detected
If the first binary frame starts with a WAV header, the server extracts the sample rate and dtype from it automatically, so you don't need to set them in the init message.
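
To check locally what a header would be detected as, you can read the same fields from the RIFF "fmt " chunk yourself. The sketch below illustrates the idea and is not the server's implementation. sniff_wav is a hypothetical name, and the mapping from WAVE format tags to dtype names is an assumption based on the dtype list on this page. It parses the header by hand because Python's built-in wave module only reads linear PCM files.

```python
import struct
from typing import Optional, Tuple

# (WAVE format tag, bits per sample) mapped to the dtype names on this page.
_FORMAT_TO_DTYPE = {
    (1, 16): "int16",    # linear PCM
    (3, 32): "float32",  # IEEE float
    (6, 8): "alaw",      # G.711 A-law
    (7, 8): "ulaw",      # G.711 mu-law
}


def sniff_wav(frame: bytes) -> Optional[Tuple[int, str]]:
    """Return (sample_rate, dtype) if frame starts with a WAV header, else None."""
    if len(frame) < 12 or frame[:4] != b"RIFF" or frame[8:12] != b"WAVE":
        return None
    pos = 12
    while pos + 8 <= len(frame):
        chunk_id, chunk_size = struct.unpack_from("<4sI", frame, pos)
        if chunk_id == b"fmt " and pos + 8 + 16 <= len(frame):
            fmt_tag, _channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", frame, pos + 8)
            dtype = _FORMAT_TO_DTYPE.get((fmt_tag, bits))
            return (rate, dtype) if dtype else None
        pos += 8 + chunk_size + (chunk_size & 1)  # RIFF chunks are word-aligned
    return None
```

A None result means the frame is either headerless audio or a format outside the four listed dtypes. In both cases, set sample_rate and dtype explicitly in the init message.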

Source: Shunyalabs ASR Gateway API Reference (31 March), WebSocket Streaming API → "Audio Formats"; Shunyalabs TTS Developer Documentation v1.0 (March 2026), §8.2 Telephony Formats. Codec descriptions reproduced verbatim.