ASR models

Four speech recognition models tuned for different domains. Pass the name in the model field. You can't mix models within a single request, but you can switch between requests freely.
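The model is chosen per request, so comparing two models on the same recording takes two requests. The following is a minimal bash sketch that assumes curl is installed. The function names `transcribe_with` and `compare_models` and the `<name>.<model>.json` output naming are illustrative and are not part of the API.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not part of the Shunya API).

# Send one file to one model; the model is a per-request form field.
transcribe_with() {
  # $1 = model name, $2 = path to the audio file
  curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
    -F "file=@$2" \
    -F "model=$1"
}

# Run the same file through several models, one request per model,
# saving each response as <file-without-extension>.<model>.json.
compare_models() {
  local file model
  file="$1"
  shift
  for model in "$@"; do
    transcribe_with "$model" "$file" > "${file%.*}.${model}.json"
  done
}
```

For example, `compare_models call.wav zero-indic zero-codeswitch` writes `call.zero-indic.json` and `call.zero-codeswitch.json`. You can then diff the two files to see which model handles the recording better.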

Side-by-side

| Model | Languages | Special behaviour | Price ($/min) |
|---|---|---|---|
| zero-indic | 55+ Indian languages | Standard transcription | $0.0045 |
| zero-universal | 204 languages | Standard transcription, strongest on English | $0.0039 |
| zero-med | English + Indic | Automatic medical-terminology correction (MedGemma) | $0.0050 |
| zero-codeswitch | Hinglish, Tanglish, etc. | Automatic restoration of English tokens to Latin script | $0.0050 |
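With per-minute prices, estimating cost is a simple multiplication. The following is a minimal bash sketch that assumes linear per-minute billing with no rounding or minimum charge, since the table does not state billing granularity. `price_per_min` and `estimate_cost` are hypothetical helpers. The prices are copied from the table above, so update them if pricing changes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; prices copied from the table above ($/min).
price_per_min() {
  case "$1" in
    zero-indic)               echo 0.0045 ;;
    zero-universal)           echo 0.0039 ;;
    zero-med|zero-codeswitch) echo 0.0050 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# estimate_cost MODEL MINUTES -> estimated dollars, 4 decimal places.
# Assumes simple linear per-minute billing.
estimate_cost() {
  local price
  # Assigned separately so a failed lookup isn't masked by `local`.
  price=$(price_per_min "$1") || return 1
  awk -v p="$price" -v m="$2" 'BEGIN { printf "%.4f\n", p * m }'
}
```

For example, `estimate_cost zero-indic 60` prints `0.2700`, about 27 cents for an hour of audio. An unknown model name fails with a non-zero exit status.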

zero-indic

The default for Indian languages. Native handling of Hindi, Tamil, Telugu, Kannada, Marathi, Bengali, Gujarati, Punjabi, Malayalam, Odia, Urdu, Assamese, Maithili, and more.

When to use

  • Single-language Indic audio where you know (or can detect) the language
  • Call-centre audio for Indian customers in their native tongue
  • Meetings, interviews, dictation, media transcription

Example

```shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" \
  -F "model=zero-indic" \
  -F "language_code=hi"
```

zero-universal

A Whisper-class foundation model. These docs list 204 languages, though some published material cites 99; the figure depends on the evaluation set. It is strongest on English and European languages. Use it when the input could be in any language.

When to use

  • English or non-Indian-language content
  • Audio whose language you don't know in advance (set language_code=auto)
  • International meetings with mixed English and other languages
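When the language isn't known up front, the request only changes in its language_code field. The following is a minimal bash sketch that falls back to `language_code=auto` when no code is given. `transcribe_universal` is a hypothetical wrapper. These docs don't say whether omitting `language_code` behaves the same as `auto`, so the sketch always sends the field explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the zero-universal example.
# $1 = audio file, $2 = optional language code (defaults to "auto").
transcribe_universal() {
  local lang="${2:-auto}"
  curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
    -F "file=@$1" \
    -F "model=zero-universal" \
    -F "language_code=$lang"
}
```

`transcribe_universal meeting.wav` leaves language detection to the model, while `transcribe_universal podcast.wav en` pins it to English.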

Benchmarks

Composite WER on the Hugging Face Open ASR Leaderboard: 3.10%, with a claimed 48% fewer errors than the next-best model.
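For reference, word error rate (WER) counts word-level edits against a reference transcript. The derivation below assumes that "48% fewer errors" means a relative reduction in composite WER. The page doesn't spell out the comparison.

```latex
\mathrm{WER} = \frac{S + D + I}{N}
% S = substituted words, D = deleted words, I = inserted words,
% N = number of words in the reference transcript

\frac{\mathrm{WER}_{\text{next}} - \mathrm{WER}_{\text{zero}}}{\mathrm{WER}_{\text{next}}} = 0.48
\;\Rightarrow\;
\mathrm{WER}_{\text{next}} = \frac{3.10\%}{1 - 0.48} \approx 5.96\%
```

Under that reading, the next-best model's composite WER would be about 5.96%. Check the leaderboard itself for the actual figure.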


Example

```shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@podcast.wav" \
  -F "model=zero-universal"
```

zero-med

Clinical-grade. Trained to recognise drug names, procedures, anatomy, dosages, and diagnostic terms. Auto-applies medical terminology correction via MedGemma (no extra flag needed).

When to use

  • Doctor-patient consultations
  • OT / surgical notes dictation
  • Case note transcription for EHR ingest

Compliance

HIPAA path
zero-med (Zero STT Med) is the model cleared for PHI handling under the Shunya BAA. Route PHI only through this model, and prefer on-prem deployment for the strongest data-residency guarantees.
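You can also enforce this routing rule in your own client before any audio leaves your system. The following is a minimal bash sketch. `check_phi_route` and its yes/no PHI flag are hypothetical. Your application sets the flag; neither is an API parameter.

```shell
#!/usr/bin/env bash
# Hypothetical client-side guard: only zero-med may receive PHI.
# $1 = model you are about to call
# $2 = "yes" if the audio may contain PHI (your application decides this)
check_phi_route() {
  local model="$1" contains_phi="$2"
  if [ "$contains_phi" = "yes" ] && [ "$model" != "zero-med" ]; then
    echo "refusing to send PHI to $model; use zero-med" >&2
    return 1
  fi
  return 0
}
```

`check_phi_route zero-med yes && curl ...` lets the request through. `check_phi_route zero-indic yes` returns non-zero, so the chained request is never made.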

Example

```shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@consult.wav" \
  -F "model=zero-med" \
  -F "language_code=en"
```

Output shape

The response schema is the same as for the other models. The difference is in the text itself: drug names come back correctly spelled and procedure names in standard terminology. Pair it with enable_keyterm_normalization for further normalization against your own glossary.

zero-codeswitch

Tuned for speech that mixes two languages within a single utterance: Hinglish (mujhe yeh passbook update karna hai, "I need to update this passbook"), Tanglish, Banglish, and so on. English words are automatically restored to Latin script so the final transcript reads cleanly.

When to use

  • Urban Indian call-centre conversations (customers routinely switch mid-sentence)
  • Social-media, podcast, or creator content with code-mixed speech
  • Any conversation where separating languages would force unnatural phrasing

Example

```shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@hinglish.wav" \
  -F "model=zero-codeswitch"
```

What the output looks like

```json
{
  "text": "मुझे अपना EMI details check करना है",
  "detected_language": "Hinglish",
  "segments": [
    { "start": 0.3, "end": 3.1, "text": "मुझे अपना EMI details check करना है" }
  ]
}
```
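A quick way to compare models on code-mixed audio is to count the Latin-script words in the returned text. zero-codeswitch should keep tokens such as EMI in Latin script, while a transcript that spells them phonetically in Indic script will contain few or none. The following is a heuristic bash sketch, not an API feature. `count_latin_tokens` is hypothetical and relies on `grep -o`, which GNU and BSD grep support but POSIX does not require.

```shell
#!/usr/bin/env bash
# Heuristic: count runs of ASCII letters in a transcript string.
# LC_ALL=C makes [A-Za-z] match ASCII letters only, so multi-byte
# Devanagari (or other Indic) characters are never counted.
count_latin_tokens() {
  printf '%s\n' "$1" \
    | LC_ALL=C grep -o '[A-Za-z][A-Za-z]*' \
    | wc -l \
    | tr -d ' '
}
```

On the example above, `count_latin_tokens "मुझे अपना EMI details check करना है"` prints `3` (EMI, details, check). To run it on a real response, extract the text field first, for example with `count_latin_tokens "$(jq -r '.text' response.json)"` if jq is installed.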

Picking the right model: decision flow

Not sure which?
Start with zero-indic if the speakers are Indian; zero-universal otherwise. Swap to zero-codeswitch if you see English words coming through phonetically in the Indic transcript, or zero-med if drug/procedure terms are being butchered.
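Here is the same rule of thumb written as a function, for pipelines that choose a model automatically. The following is a minimal bash sketch. The function name and the input labels (`indian`/`other`, `none`/`phonetic-english`/`medical-terms`) are hypothetical. The symptoms are things you observe in a first-pass transcript, not API outputs.

```shell
#!/usr/bin/env bash
# Hypothetical encoding of the decision flow above.
# $1 = speakers: "indian" or "other"
# $2 = symptom seen in a first pass: "none", "phonetic-english"
#      (English words transcribed phonetically in Indic script) or
#      "medical-terms" (drug/procedure names mangled)
pick_model() {
  case "$2" in
    none) ;;
    phonetic-english) echo zero-codeswitch; return 0 ;;
    medical-terms)    echo zero-med; return 0 ;;
    *) echo "unknown symptom: $2" >&2; return 1 ;;
  esac
  case "$1" in
    indian) echo zero-indic ;;
    *)      echo zero-universal ;;
  esac
}
```

`pick_model indian none` returns zero-indic. If the first pass then shows phonetic English, `pick_model indian phonetic-english` returns zero-codeswitch.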