ASR API reference

Every endpoint under asr.shunyalabs.ai, with its request fields, response shapes, and error codes.

1. Authentication

All HTTP endpoints except /health require a bearer token in the Authorization header. The WebSocket endpoint (/ws) takes the key as api_key in its JSON init message instead.

http
Authorization: Bearer $SHUNYALABS_API_KEY
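Scripts can build this header from the same environment variable the curl examples use. This is a minimal Python sketch. The function name auth_headers is hypothetical and is not part of any Shunya SDK.

```python
import os


def auth_headers(env_var: str = "SHUNYALABS_API_KEY") -> dict:
    """Return the Authorization header dict, reading the key from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending "Bearer " and getting a 401.
        raise RuntimeError(f"{env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the result as the headers argument to whichever HTTP client you use.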

2. POST /v1/audio/transcriptions

Batch transcription of an audio file or URL. Returns the full transcript, per-segment timestamps, and any intelligence results you enabled.

Required fields: file (or url) and model. Content-Type: multipart/form-data. See Configuration for the full parameter list.

Request:

shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" \
  -F "model=zero-indic" \
  -F "language_code=hi" \
  -F "response_format=verbose_json"
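You can also build the same request from Python without third-party packages. The sketch below assembles the multipart/form-data body by hand with urllib.request and does not send it. The helper names build_transcription_request and _text_part are hypothetical. Labelling the file part application/octet-stream is also an assumption, because curl chooses its own part type. With the requests package, passing files= and data= produces an equivalent body for you.

```python
import os
import urllib.request
import uuid

URL = "https://asr.shunyalabs.ai/v1/audio/transcriptions"


def _text_part(boundary: str, name: str, value: str) -> bytes:
    # One plain form field, equivalent to curl's -F "name=value".
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode("utf-8")


def build_transcription_request(path: str, fields: dict, api_key: str) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST like the curl example."""
    boundary = uuid.uuid4().hex
    body = b"".join(_text_part(boundary, k, v) for k, v in fields.items())
    with open(path, "rb") as f:
        audio = f.read()
    # The file part, equivalent to curl's -F "file=@call.wav".
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{os.path.basename(path)}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio + b"\r\n"
    # Closing boundary marks the end of the multipart body.
    body += f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, call `urllib.request.urlopen(req, timeout=120)` and decode the response body with `json.load`.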

Response (verbose_json):

json
{
  "success": true,
  "request_id": "b3f1a2c4-...",
  "text": "नमस्ते मोहम्मद जी...",
  "segments": [
    { "start": 0.51, "end": 5.70, "text": "...", "speaker": "SPEAKER_00" }
  ],
  "detected_language": "Hindi",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "audio_duration": 5.7,
  "inference_time_ms": 812.3,
  "nlp_analysis": {
    "intent": { "label": "...", "confidence": 0.92, "reasoning": "..." },
    "sentiment": { "label": "neutral", "score": 0.12, "explanation": "..." },
    "summary": "...",
    "translation": "...",
    "normalized_text": "..."
  }
}
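Here is a sketch of consuming the verbose_json shape above. One function renders per-segment lines and another computes a real-time factor. The sketch assumes audio_duration is in seconds, which matches the last segment's end time in the example. It also assumes a segment may lack a speaker key when diarization is off. Both points are assumptions, and the function names are illustrative. For the example response, the real-time factor is 0.8123 s / 5.7 s, about 0.14.

```python
def format_segments(response: dict) -> list:
    """Render verbose_json segments as '[start-end] SPEAKER: text' lines."""
    lines = []
    for seg in response.get("segments", []):
        # Fall back to a placeholder when a segment carries no speaker label.
        speaker = seg.get("speaker", "SPEAKER_?")
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {seg['text']}")
    return lines


def real_time_factor(response: dict) -> float:
    """Processing time divided by audio length; below 1.0 means faster than real time."""
    return (response["inference_time_ms"] / 1000) / response["audio_duration"]
```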

Response (response_format=json; minimal, OpenAI-compatible):

json
{ "text": "नमस्ते मोहम्मद जी..." }

3. WebSocket /ws

The full protocol is documented in Streaming. Lifecycle summary:

  1. Open wss://asr.shunyalabs.ai/ws
  2. Send a JSON init message with api_key, model, language, sample_rate, and dtype
  3. Stream binary audio frames
  4. Send "END", {"type":"end"}, or an empty binary frame to finalize
  5. Receive ready → speech_start → partial* → speech_end → final_segment → ... → end_of_transcript → done
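The client side of steps 2 to 4 can be sketched as follows. The field names come from step 2, but the default values are assumptions. The language value is assumed to use the same codes as language_code in the batch API. dtype="int16" with mono 16-bit PCM is a guess at the expected format. The 100 ms frame size is an arbitrary choice. Check Streaming for the accepted values. The names build_init, frame_pcm, and END_MESSAGE are hypothetical.

```python
import json

END_MESSAGE = "END"  # one of the three finalize signals listed in step 4


def build_init(api_key: str, model: str = "zero-indic", language: str = "hi",
               sample_rate: int = 16000, dtype: str = "int16") -> str:
    """Step 2: the JSON init message with the five documented fields."""
    return json.dumps({
        "api_key": api_key,
        "model": model,
        "language": language,        # assumed to take the same codes as language_code
        "sample_rate": sample_rate,
        "dtype": dtype,              # assumed value format; check Streaming for accepted values
    })


def frame_pcm(pcm: bytes, sample_rate: int = 16000, bytes_per_sample: int = 2,
              frame_ms: int = 100) -> list:
    """Step 3: split raw mono PCM into fixed-duration binary frames."""
    # 16000 samples/s * 2 bytes * 100 ms / 1000 = 3200 bytes per frame
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
```

With a WebSocket client such as the websockets package, send build_init(...) as a text message, then each frame as a binary message, then END_MESSAGE. Keep reading events until done arrives.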

4. GET /health

Unauthenticated. Use for deployment smoke tests.

Request:

shell
curl https://asr.shunyalabs.ai/health

Response:

json
{
  "status": "ok",
  "services": {
    "triton": "ready",
    "qwen3_asr_streaming": "ready",
    "qwen3_asr_bls": "ready",
    "auth": "connected"
  }
}

status is "ok" when Triton is healthy, "degraded" otherwise.
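Because status reflects only Triton, a smoke test may also want to inspect every entry in services. The sketch below does that. Treating "ready" and "connected" as the only healthy states is an assumption based on the example response, and is_healthy is a hypothetical name.

```python
def is_healthy(health: dict) -> bool:
    """Stricter smoke-test verdict for a /health response body."""
    if health.get("status") != "ok":
        return False
    # status only reflects Triton, so also require every listed service to report up.
    # Treating "ready"/"connected" as the only healthy states is an assumption.
    return all(state in ("ready", "connected") for state in health.get("services", {}).values())
```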

5. GET /languages

Returns the full supported language list with ISO codes and script mappings.

Request:

shell
curl https://asr.shunyalabs.ai/languages \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"

6. Speaker APIs

Diarization produces SPEAKER_00-style labels. To map those labels to real names, register voice profiles and manage them with the four endpoints below.

Reference clip requirements
5-15 seconds, speaker alone, no background music, no overlapping voices, 16 kHz or higher sample rate.
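You can check duration and sample rate before upload. This sketch uses Python's standard wave module, so it assumes a WAV reference clip. The other requirements (speaker alone, no music, no overlap) still need a human listen. check_reference_clip is a hypothetical helper.

```python
import wave


def check_reference_clip(path) -> list:
    """Return problems with a WAV reference clip; an empty list means it passes.

    Accepts a file path or a binary file-like object.
    """
    problems = []
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / rate  # seconds
    if not 5 <= duration <= 15:
        problems.append(f"duration {duration:.1f} s is outside 5-15 s")
    if rate < 16000:
        problems.append(f"sample rate {rate} Hz is below 16 kHz")
    return problems
```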

6.1 POST /v1/speakers/register

Request:

shell
curl -X POST https://asr.shunyalabs.ai/v1/speakers/register \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "name=Priya" \
  -F "file=@priya_sample.wav" \
  -F "project=support_team"

Response:

json
{ "success": true, "speaker": "Priya", "message": "Registered successfully" }

6.2 GET /v1/speakers/list

Request:

shell
curl "https://asr.shunyalabs.ai/v1/speakers/list?project=support_team" \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"

Response:

json
{
  "speakers": [
    { "name": "Priya", "project": "support_team", "registered_at": "2026-03-17T10:00:00" },
    { "name": "Rahul", "project": "support_team", "registered_at": "2026-03-17T10:05:00" }
  ]
}

6.3 POST /v1/speakers/identify

Standalone speaker identification: skips diarization and only identifies who is speaking in a clip.

Request:

shell
curl -X POST https://asr.shunyalabs.ai/v1/speakers/identify \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@unknown.wav" \
  -F "project=support_team"

Response (recognised):

json
{ "speaker": "Priya", "confidence": 0.91 }

Response (below threshold):

json
{ "speaker": "unknown", "confidence": 0.23 }
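One hypothetical client-side workflow works in three steps. First, cut a clip for each diarized speaker from a transcription's segments. Next, send each clip to /v1/speakers/identify. Finally, swap the labels. The sketch below covers only the last step and makes no API calls. relabel_segments is an illustrative name, and it keeps the diarization label whenever identify returns "unknown".

```python
def relabel_segments(segments: list, identified: dict) -> list:
    """Swap diarization labels for names from /v1/speakers/identify responses.

    `identified` maps a diarization label (e.g. "SPEAKER_00") to the identify
    response for a clip of that speaker. Labels identified as "unknown" are kept.
    """
    names = {
        label: resp["speaker"]
        for label, resp in identified.items()
        if resp.get("speaker", "unknown") != "unknown"
    }
    relabeled = []
    for seg in segments:
        label = seg.get("speaker")
        # Copy each segment so the caller's original list is not mutated.
        relabeled.append({**seg, "speaker": names[label]} if label in names else dict(seg))
    return relabeled
```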

6.4 DELETE /v1/speakers/delete

Request:

shell
curl -X DELETE https://asr.shunyalabs.ai/v1/speakers/delete \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "name=Priya" \
  -F "project=support_team"

Response:

json
{ "success": true }

7. HTTP error codes

| Status | Meaning |
| --- | --- |
| 200 | Success, JSON body returned. |
| 400 | Bad request: missing or malformed fields. Response body: {"detail": "..."}. |
| 401 | Unauthorized: API key missing or invalid. |
| 422 | Transcription error (invalid input or config). |
| 429 | Rate limit exceeded. Back off and retry. |
| 500 | Internal server error: unexpected server-side failure. |
| 503 | Service unavailable: Triton or Redis temporarily down. |
| 504 | Gateway timeout: request exceeded the processing window. |

8. Retry patterns

Safe to retry: 429, 500, 502, 503, 504. Not safe to retry: 400, 401, 422. Fix the request first.

Exponential backoff (Python)

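python
import os
import time

import requests

API_KEY = os.environ["SHUNYALABS_API_KEY"]
RETRYABLE = {429, 500, 502, 503, 504}

def transcribe_with_retry(file_path, retries=3):
    for attempt in range(retries):
        # Reopen the file on each attempt so every upload starts from the first byte.
        with open(file_path, "rb") as f:
            r = requests.post(
                "https://asr.shunyalabs.ai/v1/audio/transcriptions",
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"file": f},
                data={"model": "zero-indic"},
                timeout=120,
            )
        if r.status_code == 200:
            return r.json()
        if r.status_code in RETRYABLE and attempt < retries - 1:
            time.sleep(2 ** attempt)  # 1 s, 2 s, ... between attempts
            continue
        # Non-retryable error, or retries exhausted: surface the HTTP error.
        r.raise_for_status()
    raise RuntimeError("Max retries exceeded")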

WebSocket reconnection

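python
import asyncio
import json

import websockets

async def stream_with_reconnect(init_message, max_attempts=3):
    """Yield parsed server events, reconnecting with exponential backoff on failure.

    init_message is the JSON init payload (api_key, model, language, sample_rate, dtype).
    Each reconnect sends a fresh init; audio sent on a dropped connection is not replayed.
    """
    for attempt in range(max_attempts):
        try:
            async with websockets.connect("wss://asr.shunyalabs.ai/ws") as ws:
                await ws.send(json.dumps(init_message))
                async for msg in ws:
                    yield json.loads(msg)
                return  # server closed the connection cleanly
        except (websockets.ConnectionClosed, OSError):
            # OSError covers failures to connect at all (DNS, refused, reset).
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError("Max reconnects exceeded")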

9. Rate limits

| Limit | Value |
| --- | --- |
| Max file size | 500 MB |
| Max audio duration per file | 4 hours |
| Concurrent requests (default tier) | 16 |
| HTTP request timeout | Use at least 120 s for long audio |
| WebSocket inactivity timeout | 300 s (configurable up to 3600 s) |
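To stay under the default-tier concurrency limit, a client can cap in-flight requests with a semaphore. This asyncio sketch bounds only the requests sent from one process. The jobs are zero-argument async callables, e.g. `lambda: transcribe_async(path)`, where transcribe_async is a hypothetical async wrapper around the batch endpoint.

```python
import asyncio

MAX_CONCURRENT = 16  # default-tier concurrent request limit from the limits table


async def run_bounded(jobs, limit: int = MAX_CONCURRENT) -> list:
    """Await zero-argument async callables with at most `limit` running at once.

    Results come back in the same order as `jobs`.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:  # waits here while `limit` jobs are already in flight
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```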

10. Request IDs

Every response includes request_id (and an X-Request-Id header). Log it: Shunya support uses it to trace issues. On WebSocket, the session_id in the ready event is the equivalent.
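The sketch below extracts the ID for logging. The minimal json response format in section 2 shows no request_id in its body, so the sketch falls back to the X-Request-Id header. It matches the header name case-insensitively, which works for plain dicts as well as for clients that already normalize header names. request_id_of is a hypothetical name.

```python
def request_id_of(body, headers) -> "str | None":
    """Return the request ID from the JSON body, falling back to the X-Request-Id header."""
    if isinstance(body, dict) and body.get("request_id"):
        return body["request_id"]
    for name, value in headers.items():
        if name.lower() == "x-request-id":  # HTTP header names are case-insensitive
            return value
    return None
```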