ASR models
Four speech recognition models tuned for different domains. Pass the name in the model field. You can't mix models within a single request, but you can switch between requests freely.
Side-by-side
| Model | Languages | Special behaviour | Price ($/min) |
|---|---|---|---|
| zero-indic | 55+ Indian languages | Standard transcription | $0.0045 |
| zero-universal | 204 languages | Standard transcription, strongest on English | $0.0039 |
| zero-med | English + Indic | Auto medical-terminology correction (MedGemma) | $0.0050 |
| zero-codeswitch | Hinglish, Tanglish, etc. | Auto English-token restoration to Latin script | $0.0050 |
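To compare models on cost, the per-minute prices in the table can be turned into a rough estimate. The sketch below assumes simple linear per-minute billing with no rounding, minimum charge, or volume discount; none of those billing details are stated here, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Per-minute prices in USD, copied from the comparison table above.
PRICE_PER_MINUTE = {
    "zero-indic": Decimal("0.0045"),
    "zero-universal": Decimal("0.0039"),
    "zero-med": Decimal("0.0050"),
    "zero-codeswitch": Decimal("0.0050"),
}


def estimate_cost(model: str, minutes: float) -> Decimal:
    """Estimate USD cost, assuming linear per-minute billing (an assumption)."""
    if model not in PRICE_PER_MINUTE:
        raise ValueError(f"unknown model: {model}")
    # Decimal(str(...)) avoids binary floating-point noise in the product.
    return PRICE_PER_MINUTE[model] * Decimal(str(minutes))


if __name__ == "__main__":
    for name in PRICE_PER_MINUTE:
        print(f"{name}: ${estimate_cost(name, 1000):.2f} per 1,000 minutes")
```

Under these assumptions, 1,000 minutes on zero-indic comes to $4.50 and on zero-universal to $3.90.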
zero-indic
The default for Indian languages. Native handling of Hindi, Tamil, Telugu, Kannada, Marathi, Bengali, Gujarati, Punjabi, Malayalam, Odia, Urdu, Assamese, Maithili, and more.
When to use
- Single-language Indic audio where you know (or can detect) the language
- Call-centre audio for Indian customers in their native tongue
- Meetings, interviews, dictation, media transcription
Example
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "file=@call.wav" \
-F "model=zero-indic" \
-F "language_code=hi"
zero-universal
99-language Whisper-class foundation model (docs say 204; the published leaderboard number varies by evaluation set). Strongest on English and European languages. Use when the input could be anything.
When to use
- English or non-Indian-language content
- Audio where you don't know the language in advance (set language_code=auto)
- International meetings with mixed English and other languages
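The language_code=auto fallback can be handled in client code. The sketch below mirrors the curl examples on this page in Python. `build_form_fields` and `transcribe` are hypothetical helper names, the third-party `requests` library is an assumed dependency, and whether auto is accepted by models other than zero-universal is not stated here.

```python
from typing import Optional

# Endpoint taken from the curl examples on this page.
API_URL = "https://asr.shunyalabs.ai/v1/audio/transcriptions"


def build_form_fields(model: str, language_code: Optional[str] = None) -> dict:
    """Build the non-file form fields for one transcription request.

    Falls back to language_code=auto when the language is unknown.
    """
    return {"model": model, "language_code": language_code or "auto"}


def transcribe(path: str, api_key: str, model: str,
               language_code: Optional[str] = None) -> dict:
    """Upload one audio file and return the parsed JSON response (hypothetical helper)."""
    # Imported lazily so build_form_fields has no third-party dependency.
    import requests

    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": audio},  # sent as multipart form data, like curl -F
            data=build_form_fields(model, language_code),
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Keeping the field construction separate from the network call makes the fallback easy to check without sending audio.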
Benchmarks
Composite WER on the Hugging Face Open ASR leaderboard: 3.10%, a claimed 48% fewer errors than the next-best model.
Example
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "file=@podcast.wav" \
-F "model=zero-universal"
zero-med
Clinical-grade. Trained to recognise drug names, procedures, anatomy, dosages, and diagnostic terms. Auto-applies medical terminology correction via MedGemma (no extra flag needed).
When to use
- Doctor-patient consultations
- OT / surgical notes dictation
- Case note transcription for EHR ingest
Compliance
Example
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "file=@consult.wav" \
-F "model=zero-med" \
-F "language_code=en"
Output shape
The response schema is the same as for the other models. The value-add is in the text itself: drug names are spelled correctly and procedure names use standard terminology. Pair with enable_keyterm_normalization for further normalization against your own glossary.
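Because the schema is shared across models, one small reader can serve all of them. The sketch below assumes the response carries the `text`, `detected_language`, and `segments` fields shown in the zero-codeswitch sample on this page; `summarize` is a hypothetical helper, and any other fields the API may return are ignored.

```python
import json

# Sample payload copied from the zero-codeswitch output on this page.
SAMPLE = """
{
  "text": "मुझे अपना EMI details check करना है",
  "detected_language": "Hinglish",
  "segments": [
    { "start": 0.3, "end": 3.1, "text": "मुझे अपना EMI details check करना है" }
  ]
}
"""


def summarize(payload: dict) -> dict:
    """Pull out the transcript plus a few derived figures from one response."""
    segments = payload.get("segments", [])
    # Sum of (end - start) over segments, rounded to hide float noise.
    speech_seconds = round(sum(s["end"] - s["start"] for s in segments), 3)
    return {
        "text": payload.get("text", ""),
        "detected_language": payload.get("detected_language"),
        "segment_count": len(segments),
        "speech_seconds": speech_seconds,
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```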
zero-codeswitch
Tuned for speech that mixes two languages within a single utterance, such as Hinglish (mujhe yeh passbook update karna hai), Tanglish, and Banglish. It automatically restores English words to Latin script so the final transcript reads cleanly.
When to use
- Urban Indian call-centre conversations (customers routinely switch mid-sentence)
- Social-media, podcast, or creator content with code-mixed speech
- Any conversation where separating languages would force unnatural phrasing
Example
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $SHUNYALABS_API_KEY" \
-F "file=@hinglish.wav" \
-F "model=zero-codeswitch"
What the output looks like
{
"text": "मुझे अपना EMI details check करना है",
"detected_language": "Hinglish",
"segments": [
{ "start": 0.3, "end": 3.1, "text": "मुझे अपना EMI details check करना है" }
]
}
Picking the right model, decision flow
zero-indic if the speakers are Indian; zero-universal otherwise. Swap to zero-codeswitch if you see English words coming through phonetically in the Indic transcript, or zero-med if drug/procedure terms are being butchered.
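The decision flow above can be sketched as a small selector. `pick_model` and its flag names are hypothetical. The order in which the medical and code-mixed checks are applied when both hold is an assumption, since that precedence is not stated here.

```python
def pick_model(speakers_indian: bool, code_mixed: bool = False,
               medical: bool = False) -> str:
    """Return a model name following the decision flow described above."""
    # Assumption: medical content takes priority over code-mixing.
    if medical:
        return "zero-med"
    if code_mixed:
        return "zero-codeswitch"
    # Base choice: Indian speakers go to zero-indic, everything else to zero-universal.
    return "zero-indic" if speakers_indian else "zero-universal"


if __name__ == "__main__":
    print(pick_model(speakers_indian=True))
    print(pick_model(speakers_indian=True, code_mixed=True))
```

The chosen name goes in the model field, and it can change from one request to the next.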