Section 3

Streaming Audio

Open a persistent WebSocket connection and receive real-time transcript events as audio is spoken.
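A minimal client session might look like the sketch below. The endpoint URL comes from the table in this section; everything else is an assumption, since this section does not specify the wire protocol — in particular, the JSON event fields (`type`, `text`) and how end-of-audio is signalled are guesses, and the third-party `websockets` package is one possible client library, not a requirement.

```python
import asyncio
import json

WS_URL = "wss://asr.shunyalabs.ai/ws"  # endpoint from the table below

def is_terminal(event: dict) -> bool:
    """A session ends on DONE (normal completion) or ERROR."""
    return event.get("type") in ("DONE", "ERROR")

async def stream_file(pcm_path: str) -> None:
    # Requires the third-party `websockets` package (pip install websockets).
    import websockets
    async with websockets.connect(WS_URL) as ws:
        # Send raw PCM (16 kHz mono, int16) in small chunks, paced like a
        # live microphone. A real client would also read events while sending.
        with open(pcm_path, "rb") as f:
            while chunk := f.read(3200):   # 3200 bytes = 100 ms at 16 kHz int16
                await ws.send(chunk)
                await asyncio.sleep(0.1)
        # How end-of-audio is signalled is protocol-specific; check the
        # StreamingConfig documentation. Here we simply read until DONE/ERROR.
        async for message in ws:
            event = json.loads(message)    # assumed: events arrive as JSON
            print(event.get("type"), event.get("text", ""))
            if is_terminal(event):
                break
```

The 100 ms chunk size is illustrative; smaller chunks lower latency at the cost of more messages.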


How it works

Property       Value
Transport      WebSocket
Endpoint       wss://asr.shunyalabs.ai/ws
Config object  StreamingConfig
Audio format   Raw PCM: 16 kHz mono, int16 (default)
NLP features   Core transcription only; use batch for NLP enrichment
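The default audio format (raw PCM, 16 kHz mono, int16) can be produced from normalized float samples with the standard library alone. This is a generic conversion sketch, not part of any SDK; the helper name is made up for illustration.

```python
import math
import struct

SAMPLE_RATE = 16_000  # 16 kHz mono int16, the endpoint's default format

def floats_to_pcm16(samples) -> bytes:
    """Pack float samples in [-1.0, 1.0] as little-endian int16 raw PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)

# Example: 100 ms of a 440 Hz tone, ready to send over the socket.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 10)]
frame = floats_to_pcm16(tone)  # 1600 samples * 2 bytes = 3200 bytes
```

If your source audio is stereo or at a different rate, downmix and resample to 16 kHz mono before packing.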

Event types

Event          Model                  Key attributes
PARTIAL        StreamingPartial       text, language, segment_id, latency_ms
FINAL_SEGMENT  StreamingFinalSegment  text, language, segment_id, silence_duration_ms
FINAL          StreamingFinal         text, language, audio_duration_sec, inference_time_ms
DONE           StreamingDone          total_segments, total_audio_duration_sec
ERROR          StreamingError         message, code
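A client typically dispatches on the event type. The sketch below assumes events arrive as JSON objects whose fields match the attribute names in the table above; the exact wire encoding and a `type` discriminator field are assumptions, not confirmed by this section.

```python
import json

def handle_event(raw: str) -> str:
    """Route one streaming event by type (field names follow the table)."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "PARTIAL":
        # Low-latency hypothesis; may be revised by later events.
        return f"partial [{event['segment_id']}]: {event['text']}"
    if kind == "FINAL_SEGMENT":
        # A segment finalized after a silence gap.
        return f"segment [{event['segment_id']}]: {event['text']}"
    if kind == "FINAL":
        return f"final ({event['audio_duration_sec']}s): {event['text']}"
    if kind == "DONE":
        return f"done: {event['total_segments']} segments"
    if kind == "ERROR":
        return f"error {event['code']}: {event['message']}"
    return f"unknown event: {kind}"
```

In a live-captioning UI, PARTIAL events would overwrite the current line while FINAL_SEGMENT events commit it.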

Use cases

  • Voice agents and conversational AI
  • Live captions and real-time subtitling
  • IVR and telephony pipelines
  • Any pipeline that needs sub-second transcript latency