ASR API reference
Every endpoint under asr.shunyalabs.ai, with its request fields, response shapes, and error codes.
1. Authentication
All endpoints except /health require a bearer token in the Authorization header.
Authorization: Bearer $SHUNYALABS_API_KEY
2. POST /v1/audio/transcriptions
Batch transcription of an audio file or URL. Returns the full transcript, per-segment timestamps, and any intelligence results you enabled.
Required fields: file (or url) and model. Content-Type: multipart/form-data. See Configuration for every parameter.
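For Python callers, the required-field rule above can be checked before anything is sent. This is a minimal sketch, not part of any SDK: `build_transcription_fields` is a hypothetical helper, and treating `file` and `url` as mutually exclusive is an assumption drawn from the "file (or url)" wording.

```python
def build_transcription_fields(model, file_path=None, url=None, **options):
    """Return the non-file form fields for POST /v1/audio/transcriptions.

    Hypothetical helper: it only enforces that `model` is set and that
    exactly one audio source (local file or URL) was supplied.
    """
    if not model:
        raise ValueError("model is required")
    # Assumption: the API expects one audio source, not both.
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url")
    fields = {"model": model, **options}
    if url is not None:
        fields["url"] = url
    return fields
```

With `requests`, the returned dict would go in `data=`, and a local file would be passed separately as `files={"file": open(file_path, "rb")}`.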
Request:
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "file=@call.wav" \
-F "model=zero-indic" \
-F "language_code=hi" \
-F "response_format=verbose_json"
Response (verbose_json):
{
"success": true,
"request_id": "b3f1a2c4-...",
"text": "नमस्ते मोहम्मद जी...",
"segments": [
{ "start": 0.51, "end": 5.70, "text": "...", "speaker": "SPEAKER_00" }
],
"detected_language": "Hindi",
"speakers": ["SPEAKER_00", "SPEAKER_01"],
"audio_duration": 5.7,
"inference_time_ms": 812.3,
"nlp_analysis": {
"intent": { "label": "...", "confidence": 0.92, "reasoning": "..." },
"sentiment": { "label": "neutral", "score": 0.12, "explanation": "..." },
"summary": "...",
"translation": "...",
"normalized_text": "..."
}
}
Response (json: minimal, OpenAI-compatible):
{ "text": "नमस्ते मोहम्मद जी..." }
3. WebSocket /ws
Full protocol documented in Streaming. Summary of the lifecycle:
- Open wss://asr.shunyalabs.ai/ws
- Send JSON init with api_key, model, language, sample_rate, dtype
- Stream binary audio frames
- Send "END" / {"type":"end"} / empty binary to finalize
- Receive ready → speech_start → partial* → speech_end → final_segment → ... → end_of_transcript → done
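The receive order in the last step can be read as a small state machine, which is handy for asserting event order in client tests. The sketch below works on event names only. The transition table is an interpretation of the arrows above, and the `ready` to `end_of_transcript` edge (a session with no speech) is an assumption; the full protocol in Streaming may define other events, such as errors, that are not modelled here.

```python
# Allowed next events for each event, read off the lifecycle arrows.
# "start" is a synthetic state meaning "nothing received yet".
TRANSITIONS = {
    "start": {"ready"},
    "ready": {"speech_start", "end_of_transcript"},  # assumption: audio may contain no speech
    "speech_start": {"partial", "speech_end"},
    "partial": {"partial", "speech_end"},
    "speech_end": {"final_segment"},
    "final_segment": {"speech_start", "end_of_transcript"},
    "end_of_transcript": {"done"},
    "done": set(),
}


def first_unexpected_event(event_names):
    """Return the index of the first out-of-order event, or None if the order is valid."""
    state = "start"
    for index, name in enumerate(event_names):
        if name not in TRANSITIONS.get(state, set()):
            return index
        state = name
    return None
```

A test harness could feed it the names of the events it received, in order, and fail the run when the result is not `None`.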
4. GET /health
Unauthenticated. Use for deployment smoke tests.
Request:
curl https://asr.shunyalabs.ai/health
Response:
{
"status": "ok",
"services": {
"triton": "ready",
"qwen3_asr_streaming": "ready",
"qwen3_asr_bls": "ready",
"auth": "connected"
}
}
status is "ok" when Triton is healthy, "degraded" otherwise.
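For a smoke test, it can help to report which services are unhealthy, not only the top-level status. This is a minimal sketch that works on the parsed JSON body. `summarize_health` is a hypothetical helper, and treating "ready" and "connected" as the only healthy values is an assumption based on the sample response.

```python
# Assumption: these are the only healthy per-service values (taken from the sample response).
HEALTHY_VALUES = {"ready", "connected"}


def summarize_health(payload):
    """Return (is_ok, unhealthy), where unhealthy maps service name to its reported state."""
    services = payload.get("services", {})
    unhealthy = {
        name: state for name, state in services.items()
        if state not in HEALTHY_VALUES
    }
    return payload.get("status") == "ok", unhealthy
```

A deployment script could call it on `requests.get("https://asr.shunyalabs.ai/health", timeout=10).json()` and fail the rollout when `is_ok` is false.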
5. GET /languages
Returns the full supported language list with ISO codes and script mappings.
Request:
curl https://asr.shunyalabs.ai/languages \
-H "Authorization: Bearer $SHUNYALABS_API_KEY"
6. Speaker APIs
Diarization produces SPEAKER_00-style labels. To map those to actual names, register voice profiles using these four endpoints.
6.1 POST /v1/speakers/register
Request:
curl -X POST https://asr.shunyalabs.ai/v1/speakers/register \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "name=Priya" \
-F "file=@priya_sample.wav" \
-F "project=support_team"
Response:
{ "success": true, "speaker": "Priya", "message": "Registered successfully" }
6.2 GET /v1/speakers/list
Request:
curl "https://asr.shunyalabs.ai/v1/speakers/list?project=support_team" \
-H "Authorization: Bearer $SHUNYALABS_API_KEY"
Response:
{
"speakers": [
{ "name": "Priya", "project": "support_team", "registered_at": "2026-03-17T10:00:00" },
{ "name": "Rahul", "project": "support_team", "registered_at": "2026-03-17T10:05:00" }
]
}
6.3 POST /v1/speakers/identify
Standalone speaker identification: skips diarization and only identifies who is speaking in a clip.
Request:
curl -X POST https://asr.shunyalabs.ai/v1/speakers/identify \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "file=@unknown.wav" \
-F "project=support_team"
Response (recognised):
{ "speaker": "Priya", "confidence": 0.91 }
Response (below threshold):
{ "speaker": "unknown", "confidence": 0.23 }
6.4 DELETE /v1/speakers/delete
Request:
curl -X DELETE https://asr.shunyalabs.ai/v1/speakers/delete \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "name=Priya" \
-F "project=support_team"
Response:
{ "success": true }
7. HTTP error codes
| Status | Meaning |
|---|---|
| 200 | Success, audio or JSON body returned. |
| 400 | Bad request, missing or malformed fields. Response body: {"detail": "..."}. |
| 401 | Unauthorized, API key invalid or missing. |
| 422 | Synthesis / transcription error (invalid text or config). |
| 429 | Rate limit exceeded. Back off and retry. |
| 500 | Internal server error, unexpected server-side failure. |
| 503 | Service unavailable, Triton or Redis temporarily down. |
| 504 | Gateway timeout, request exceeded the processing window. |
8. Retry patterns
Safe to retry: 429, 500, 502, 503, 504. Not safe to retry: 400, 401, 422. For those, fix the request first.
Exponential backoff (Python)
import os
import time

import requests

API_KEY = os.environ["SHUNYALABS_API_KEY"]
RETRYABLE = (429, 500, 502, 503, 504)


def transcribe_with_retry(file_path, retries=3):
    for attempt in range(retries):
        # Reopen the file on every attempt so each upload starts from the first byte.
        with open(file_path, "rb") as f:
            r = requests.post(
                "https://asr.shunyalabs.ai/v1/audio/transcriptions",
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"file": f},
                data={"model": "zero-indic"},
                timeout=120,
            )
        if r.status_code == 200:
            return r.json()
        if r.status_code in RETRYABLE:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s
            continue
        r.raise_for_status()  # 400, 401, 422: not retryable, surface the error
    raise RuntimeError("Max retries exceeded")
WebSocket reconnection
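import asyncio
import json

import websockets


async def stream_with_reconnect(max_attempts=3):
    for attempt in range(max_attempts):
        try:
            async with websockets.connect("wss://asr.shunyalabs.ai/ws") as ws:
                # {...} is a placeholder for the init message
                # (api_key, model, language, sample_rate, dtype; see section 3).
                await ws.send(json.dumps({...}))
                async for msg in ws:
                    yield json.loads(msg)
                return  # the server closed the connection cleanly
        except websockets.ConnectionClosed:
            await asyncio.sleep(2 ** attempt)  # back off before reconnecting
    raise RuntimeError("Max reconnects exceeded")
9. Rate limits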
| Limit | Value |
|---|---|
| Max file size | 500 MB |
| Max audio duration per file | 4 hours |
| Concurrent requests (default tier) | 16 |
| HTTP request timeout | Use at least 120 s for long audio |
| WebSocket inactivity timeout | 300 s (configurable up to 3600 s) |
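The file-size and duration limits can be checked on the client before uploading, which avoids spending bandwidth on a request that will be rejected. This is a minimal sketch. `limit_violations` is a hypothetical helper, and reading "500 MB" as decimal megabytes (the stricter interpretation) is an assumption.

```python
MAX_FILE_BYTES = 500 * 1000 * 1000  # assumption: "500 MB" read as decimal megabytes
MAX_DURATION_S = 4 * 60 * 60        # 4 hours, from the table above


def limit_violations(size_bytes, duration_s=None):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    # Duration is optional because it is not always known before decoding.
    if duration_s is not None and duration_s > MAX_DURATION_S:
        problems.append(f"audio is {duration_s} s, limit is {MAX_DURATION_S} s")
    return problems
```

`os.path.getsize(path)` gives the size in bytes. The duration would have to come from an audio library of your choice.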
10. Request IDs
Every response includes request_id (and an X-Request-Id header). Log it; Shunya support uses it to trace issues. On WebSocket, the session_id in the ready event is the equivalent.
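One way to make sure the ID always reaches your logs is to read it from the body first and fall back to the header. This is a minimal sketch, and `extract_request_id` is a hypothetical helper. The header lookup is case-insensitive so it also works with a plain dict.

```python
def extract_request_id(body, headers):
    """Return request_id from the JSON body, else the X-Request-Id header, else None."""
    if isinstance(body, dict) and body.get("request_id"):
        return body["request_id"]
    # Header names are case-insensitive, so compare in lower case.
    for key, value in headers.items():
        if key.lower() == "x-request-id":
            return value
    return None
```

With `requests`, this could be called as `extract_request_id(r.json(), r.headers)` and the result attached to every log line for that call.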