## Section 3: Streaming Audio
Open a persistent WebSocket connection and receive real-time transcript events as audio is spoken.
### How it works
| Property | Value |
|---|---|
| Transport | WebSocket |
| Endpoint | wss://asr.shunyalabs.ai/ws |
| Config object | StreamingConfig |
| Audio format | Raw PCM — 16kHz mono, int16 (default) |
| NLP features | Core transcription only — use batch for NLP enrichment |
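The table above specifies the default wire format: raw PCM, 16 kHz mono, int16. As a minimal sketch of preparing audio for that format (the function name is illustrative, not part of any SDK; resampling to 16 kHz is assumed to happen upstream):

```python
import struct

def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to raw little-endian
    int16 PCM bytes -- the default wire format for the stream."""
    # Clamp, then scale to the int16 range.
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

frame = float_to_pcm16([0.0, 0.5, -0.5, 1.0])
# each sample becomes 2 bytes, so 4 samples -> 8 bytes
```

The resulting bytes can be sent directly as binary WebSocket messages.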
### Event types
| Event | Model | Key Attributes |
|---|---|---|
| PARTIAL | StreamingPartial | text, language, segment_id, latency_ms |
| FINAL_SEGMENT | StreamingFinalSegment | text, language, segment_id, silence_duration_ms |
| FINAL | StreamingFinal | text, language, audio_duration_sec, inference_time_ms |
| DONE | StreamingDone | total_segments, total_audio_duration_sec |
| ERROR | StreamingError | message, code |
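A client typically routes incoming events by type. The sketch below assumes each event arrives as a JSON object with a top-level `event` key naming the type; that envelope shape is an assumption for illustration, while the field names mirror the table above:

```python
import json

def handle_event(raw):
    """Route one raw JSON event frame by its type.
    The top-level "event" envelope key is an assumption,
    not documented API; field names follow the event table."""
    msg = json.loads(raw)
    kind = msg.get("event")
    if kind == "PARTIAL":
        return ("partial", msg["text"])           # low-latency hypothesis, may be revised
    if kind == "FINAL_SEGMENT":
        return ("segment", msg["text"])           # segment finalized after silence
    if kind == "FINAL":
        return ("final", msg["text"])             # full-session transcript
    if kind == "DONE":
        return ("done", msg["total_segments"])    # end-of-stream summary
    if kind == "ERROR":
        raise RuntimeError(f'{msg["code"]}: {msg["message"]}')
    return ("unknown", kind)

print(handle_event('{"event": "PARTIAL", "text": "hello wor"}'))
```

PARTIAL text for a segment is superseded by the matching FINAL_SEGMENT, so a live-caption UI would overwrite partials in place and append on segment finals.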
### Use cases
- Voice agents and conversational AI
- Live captions and real-time subtitling
- IVR and telephony pipelines
- Any use case requiring sub-second transcript latency
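Sub-second latency depends on sending audio in small frames rather than large buffers. A common choice is 20 ms frames, which at 16 kHz mono int16 works out to 640 bytes each; the 20 ms figure is a conventional low-latency frame size, not a documented requirement of the endpoint:

```python
# 16 kHz mono int16: 2 bytes per sample, so 20 ms = 320 samples = 640 bytes.
# The 20 ms frame size is a common low-latency choice, assumed here.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 20
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 640

def iter_chunks(pcm: bytes):
    """Split a raw PCM byte buffer into fixed-size frames
    suitable for sending as individual WebSocket messages."""
    for i in range(0, len(pcm), CHUNK_BYTES):
        yield pcm[i:i + CHUNK_BYTES]

# One second of silence splits into 50 frames of 20 ms each.
frames = list(iter_chunks(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE))
```

Sending frames at their real-time pace (one every 20 ms) also lets the server emit PARTIAL events as speech arrives instead of receiving the whole utterance at once.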