ASR models

Four speech recognition models are available, each tuned for a different domain. Pass the model name in the model field. You can't mix models within a single request, but you can switch models freely between requests.
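As a minimal sketch of the one-model-per-request rule, the Python below builds the form fields for each upload separately. The field names model and language_code are taken from the curl examples on this page. The helper name build_form_fields, the KNOWN_MODELS set, and the pairing of file names with models are illustrative assumptions, not part of the API.

```python
# Model names as listed on this page.
KNOWN_MODELS = {"zero-indic", "zero-universal", "zero-med", "zero-codeswitch"}


def build_form_fields(model, language_code=None):
    """Return the form fields for one transcription request (hypothetical helper)."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    fields = {"model": model}  # exactly one model per request
    if language_code is not None:
        fields["language_code"] = language_code
    return fields


# Each request carries its own model, so a batch can mix models across files.
batch = [
    ("call.wav", build_form_fields("zero-indic", "hi")),
    ("podcast.wav", build_form_fields("zero-universal")),
]
for filename, fields in batch:
    print(filename, fields)
```

The sketch stops at building the fields. Sending them as multipart form data is left to whichever HTTP client you already use.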

Side-by-side

Model	Languages	Special behaviour	Price ($/min)
`zero-indic`	55+ Indian languages	Standard transcription	$0.0045
`zero-universal`	204 languages	Standard transcription, strongest on English	$0.0039
`zero-med`	English + Indic	Auto medical-terminology correction (MedGemma)	$0.0050
`zero-codeswitch`	Hinglish, Tanglish, etc.	Auto English-token restoration to Latin script	$0.0050
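To compare models on cost, the per-minute prices in the table can be turned into a rough estimate. This sketch assumes billing is strictly linear in audio minutes, with no rounding, minimum charge, or volume discount. None of that is stated on this page, so treat the result as an approximation. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Per-minute prices in USD, copied from the table above.
PRICE_PER_MINUTE = {
    "zero-indic": Decimal("0.0045"),
    "zero-universal": Decimal("0.0039"),
    "zero-med": Decimal("0.0050"),
    "zero-codeswitch": Decimal("0.0050"),
}


def estimate_cost(model, minutes):
    """Estimated USD cost for `minutes` of audio (assumes linear billing)."""
    return PRICE_PER_MINUTE[model] * Decimal(str(minutes))


for model in PRICE_PER_MINUTE:
    print(f"{model}: ${estimate_cost(model, 1000):.2f} per 1,000 minutes")
```

Decimal is used instead of float so that small per-minute prices multiply without binary rounding noise.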

`zero-indic`

The default for Indian languages. Native handling of Hindi, Tamil, Telugu, Kannada, Marathi, Bengali, Gujarati, Punjabi, Malayalam, Odia, Urdu, Assamese, Maithili, and more.

When to use

Single-language Indic audio where you know (or can detect) the language
Call-centre audio for Indian customers in their native tongue
Meetings, interviews, dictation, media transcription

Example

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" \
  -F "model=zero-indic" \
  -F "language_code=hi"

`zero-universal`

Whisper-class foundation model, listed in the docs as covering 204 languages (a figure of 99 is also cited, and the published leaderboard number varies by evaluation set). Strongest on English and European languages. Use it when the input could be in any language.

When to use

English or non-Indian-language content
Audio where you don't know the language in advance (set language_code=auto)
International meetings with mixed English and other languages

Benchmarks

Composite WER on the Hugging Face Open ASR leaderboard: 3.10%, a claimed 48% fewer errors than the next-best model.

See per-dataset numbers →

Example

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@podcast.wav" \
  -F "model=zero-universal"

`zero-med`

Clinical-grade. Trained to recognise drug names, procedures, anatomy, dosages, and diagnostic terms. Auto-applies medical terminology correction via MedGemma (no extra flag needed).

When to use

Doctor-patient consultations
OT / surgical notes dictation
Case note transcription for EHR ingest

Compliance

HIPAA path

Zero STT Med (zero-med) is the model cleared for PHI handling under the Shunya BAA. Route PHI only through this model, and prefer on-prem deployment for the strongest data-residency story.

Example

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@consult.wav" \
  -F "model=zero-med" \
  -F "language_code=en"

Output shape

Same response schema as the other models. The value-add is in the text itself: drug names are spelled correctly and procedure names use standard terminology. Pair with enable_keyterm_normalization for further normalization against your own glossary.

`zero-codeswitch`

Tuned for speech that mixes two languages within a single utterance, such as Hinglish (mujhe yeh passbook update karna hai), Tanglish, or Banglish. It automatically restores English words to Latin script so the final transcript reads cleanly.

When to use

Urban Indian call-centre conversations (customers routinely switch mid-sentence)
Social-media, podcast, or creator content with code-mixed speech
Any conversation where separating languages would force unnatural phrasing

Example

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@hinglish.wav" \
  -F "model=zero-codeswitch"

What the output looks like

{
  "text": "मुझे अपना EMI details check करना है",
  "detected_language": "Hinglish",
  "segments": [
    { "start": 0.3, "end": 3.1, "text": "मुझे अपना EMI details check करना है" }
  ]
}
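To make the response shape concrete, here is a small sketch that parses the sample above and lists the tokens that came back in Latin script. It relies only on the text and segments fields shown in the sample. The helper latin_tokens is hypothetical, and its ASCII-letters test is a rough heuristic that skips tokens with attached punctuation or digits.

```python
import json

# Sample response copied from above.
raw = """
{
  "text": "मुझे अपना EMI details check करना है",
  "detected_language": "Hinglish",
  "segments": [
    { "start": 0.3, "end": 3.1, "text": "मुझे अपना EMI details check करना है" }
  ]
}
"""

response = json.loads(raw)


def latin_tokens(text):
    """Return whitespace-separated tokens made up only of ASCII letters."""
    return [tok for tok in text.split() if tok.isascii() and tok.isalpha()]


for seg in response["segments"]:
    duration = seg["end"] - seg["start"]
    print(f"{seg['start']:.1f}s to {seg['end']:.1f}s ({duration:.1f}s):",
          latin_tokens(seg["text"]))
```

A check like this can be a quick sanity test that English words are arriving in Latin script rather than being spelled out phonetically in the Indic script.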

Picking the right model: decision flow

Not sure which?

Start with zero-indic if the speakers are Indian; zero-universal otherwise. Swap to zero-codeswitch if you see English words coming through phonetically in the Indic transcript, or zero-med if drug/procedure terms are being butchered.
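The rule of thumb above can be sketched as a small function. The name pick_model and its boolean flags are hypothetical. Checking the medical case first is an assumption of this sketch, since the page does not say which swap wins when audio is both code-mixed and medical. Code-mixing is only acted on for Indian speakers, because mixed-language international meetings are listed under zero-universal.

```python
def pick_model(indian_speakers, code_mixed=False, medical=False):
    """Map the rule of thumb above to a model name (hypothetical helper)."""
    if medical:
        # Assumption: medical terminology takes priority over code-mixing.
        return "zero-med"
    if indian_speakers and code_mixed:
        return "zero-codeswitch"
    return "zero-indic" if indian_speakers else "zero-universal"


print(pick_model(indian_speakers=True))                   # plain Indic audio
print(pick_model(indian_speakers=True, code_mixed=True))  # Hinglish-style audio
print(pick_model(indian_speakers=False))                  # everything else
```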