Capability matrix

One view across every model. Use this to pick the right endpoint for your use case.

Models at a glance

| Model | Use for | Languages | Key metric | $/min (std) |
| --- | --- | --- | --- | --- |
| zero-universal | General purpose, English + international | 204 | 3.10% composite WER | $0.0039 |
| zero-indic | Hindi, Tamil, Telugu, Bengali, Kannada, and 50+ Indian languages | 55+ | Streaming-first | $0.0045 |
| zero-codeswitch | Hinglish, Tanglish, mixed Indic + English | Indic + English | Native code-switch | $0.0050 |
| zero-med | Clinical / medical speech | English (+ Indic) | FHIR / HL7 compatible | $0.0050 |
| zero-indic | Zero TTS, speech synthesis, voice agents, IVR | 22 Indic + English | 46 voices, 11 styles | See pricing |
| vak-translate-1.3b-ct2 | Vāķ Translate, Indic ↔ Indic text | 55 (2,970 pairs) | BLEU 38.5 weighted | See pricing |
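To make the per-minute rates concrete, here is a minimal sketch that turns the standard STT prices above into a cost estimate. The function names are illustrative, and the sketch assumes flat per-minute billing with no discounts or rounding rules; check the pricing page for how usage is actually metered.

```python
from decimal import Decimal

# Standard per-minute STT rates copied from the table above (USD).
STD_RATE_PER_MIN = {
    "zero-universal": Decimal("0.0039"),
    "zero-indic": Decimal("0.0045"),
    "zero-codeswitch": Decimal("0.0050"),
    "zero-med": Decimal("0.0050"),
}


def estimate_cost(model: str, audio_minutes: int) -> Decimal:
    """Estimated USD cost of transcribing `audio_minutes` of audio."""
    return STD_RATE_PER_MIN[model] * Decimal(audio_minutes)


def minutes_covered(model: str, budget_usd: Decimal) -> int:
    """Whole minutes of audio a budget buys at the standard rate."""
    return int(budget_usd / STD_RATE_PER_MIN[model])


print(estimate_cost("zero-universal", 10_000))            # 39.0000
print(minutes_covered("zero-universal", Decimal("200")))  # 51282
```

Decimal arithmetic avoids floating-point drift when small per-minute rates are multiplied by large minute counts.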

Speech-to-Text (STT)

ASR accuracy: OpenASR composite

Reading this table
The same Shunya Labs model is evaluated on eight industry-standard test datasets. The rows below are benchmarks, not competing models.

Zero STT Universal (model: pingala-v1-universal, API: zero-universal) benchmarked on the HuggingFace Open ASR leaderboard. Lower WER = fewer errors. Composite WER: 3.10%, Average RTFx: 146.23.

| Dataset | WER % | RTFx | Domain |
| --- | --- | --- | --- |
| LibriSpeech Test Clean | 0.71 | 158.74 | Audiobooks, clean read |
| LibriSpeech Test Other | 2.17 | 142.40 | Noisy audiobooks |
| TedLium Test | 1.43 | 153.34 | TED-style presentations |
| SPGISpeech Test | 1.10 | 170.85 | Finance / earnings calls |
| AMI Test | 4.19 | 70.22 | Multi-party meetings |
| Earnings22 Test | 5.83 | 101.52 | Earnings calls (noisy) |
| GigaSpeech Test | 4.99 | 131.09 | Podcasts, YouTube |
| VoxPopuli Test | 4.34 | 179.28 | Parliamentary proceedings |

RTFx = inverse real-time factor: audio duration divided by processing time, so higher is faster. An RTFx of 100 means 1 second of audio is transcribed in 10 ms.

Source: Shunya Labs' published model card on Hugging Face, huggingface.co/shunyalabs/pingala-v1-universal. The numbers above are reproduced verbatim from the "OpenASR Leaderboard Results" section of that model card (verified 2026-04-24). Reproduce them independently on your own audio before relying on them for production decisions.
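The two headline figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes the composite WER is the unweighted mean of the eight per-dataset WERs; that assumption reproduces the published 3.10%, but the model card remains the authoritative definition. It does not try to reproduce the average RTFx, whose aggregation method is not stated here. Function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-dataset WER (%) copied from the benchmark table above.
WER_PERCENT = {
    "LibriSpeech Test Clean": Decimal("0.71"),
    "LibriSpeech Test Other": Decimal("2.17"),
    "TedLium Test": Decimal("1.43"),
    "SPGISpeech Test": Decimal("1.10"),
    "AMI Test": Decimal("4.19"),
    "Earnings22 Test": Decimal("5.83"),
    "GigaSpeech Test": Decimal("4.99"),
    "VoxPopuli Test": Decimal("4.34"),
}


def composite_wer(per_dataset: dict) -> Decimal:
    """Unweighted mean WER, rounded half-up to two decimals (assumed method)."""
    mean = sum(per_dataset.values(), Decimal(0)) / len(per_dataset)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Time needed to transcribe a clip: audio duration divided by RTFx."""
    return audio_seconds / rtfx


print(composite_wer(WER_PERCENT))      # 3.10
print(processing_seconds(1.0, 100.0))  # 0.01
```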

Language coverage

Indic languages (Zero STT Indic)

Zero STT Indic supports 55+ Indian languages. The table below lists the 22 Indic languages plus English (23 in total) that overlap with Zero TTS. For the full STT list, call GET https://asr.shunyalabs.ai/languages or see ASR overview.

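| Language | ISO | Script |
| --- | --- | --- |
| Hindi | hi | Devanagari |
| Bengali | bn | Bengali |
| Telugu | te | Telugu |
| Marathi | mr | Devanagari |
| Tamil | ta | Tamil |
| Urdu | ur | Perso-Arabic |
| Gujarati | gu | Gujarati |
| Kannada | kn | Kannada |
| Odia | or | Odia |
| Malayalam | ml | Malayalam |
| Punjabi | pa | Gurmukhi |
| Assamese | as | Assamese |
| Maithili | mai | Devanagari |
| Sanskrit | sa | Devanagari |
| Nepali | ne | Devanagari |
| Konkani | gom | Devanagari |
| Dogri | doi | Devanagari |
| Kashmiri | ks | Perso-Arabic |
| Sindhi | sd | Perso-Arabic |
| Bodo | brx | Devanagari |
| Manipuri | mni | Meitei |
| Santali | sat | Ol Chiki |
| English | en | Latin |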
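For the full STT language list, the languages endpoint mentioned above can be queried with the standard library alone. This is a sketch under assumptions: only the URL and the GET method come from this page. The JSON response format and the bearer-token header are guesses, so confirm both against the API reference.

```python
import json
import urllib.request
from typing import Any, Optional

LANGUAGES_URL = "https://asr.shunyalabs.ai/languages"


def build_languages_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request without sending it."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: bearer-token auth. The real scheme may differ.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(LANGUAGES_URL, headers=headers, method="GET")


def fetch_languages(api_key: Optional[str] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    request = build_languages_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_languages()` performs the network request. The shape of the returned data is not documented on this page, so inspect it before depending on specific fields.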

Universal model (Zero STT Universal)

204 languages and dialects spanning Indo-European, Dravidian, Austroasiatic, Sino-Tibetan, and Semitic families. The universal model autodetects the input language; pass language_code to lock it.
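The difference between autodetection and a locked language comes down to whether language_code is present in the request. A minimal sketch, assuming a dictionary-style options payload: the "model" key and the overall payload shape are hypothetical, and only the language_code parameter name and the zero-universal identifier come from this page.

```python
from typing import Dict, Optional


def transcription_options(language_code: Optional[str] = None) -> Dict[str, str]:
    """Build request options for zero-universal (hypothetical payload shape)."""
    options = {"model": "zero-universal"}
    if language_code is not None:
        # Present: lock recognition to this language.
        options["language_code"] = language_code
    # Absent: leave language detection to the model.
    return options


auto = transcription_options()        # autodetect
hindi = transcription_options("hi")   # locked to Hindi
```

Locking the language is generally worth doing when the input language is known in advance, since it removes the chance of a misdetection on short or noisy clips.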

Latency expectations

| Path | First audio / partial | End-to-end |
| --- | --- | --- |
| Streaming ASR (PCM in → partial out) | < 100 ms | Real-time |
| Batch ASR (upload → transcript) | - | ~812 ms for 5.7 s clip |

Text-to-Speech (TTS)

Hugging Face

Zero TTS runs on the cloud API (zero-indic). Shunya Labs does not publish open TTS weights as a downloadable model repo on Hugging Face (unlike pingala-v1-universal for STT). You can try synthesis in the browser via the Vāķ Text to Speech Space, or browse the org at huggingface.co/shunyalabs.

Language coverage

Indic languages (Zero TTS)

All 22 Indic languages plus English (23 in total) are supported by model zero-indic. Pass the matching language code in TTSConfig to hint the text preprocessor. Voice names are listed on the Voices & languages page.

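| Language | ISO | Script |
| --- | --- | --- |
| Assamese | as | Assamese |
| Bengali | bn | Bengali |
| Bodo | brx | Devanagari |
| Dogri | doi | Devanagari |
| English | en | Latin |
| Gujarati | gu | Gujarati |
| Hindi | hi | Devanagari |
| Kannada | kn | Kannada |
| Kashmiri | ks | Perso-Arabic |
| Konkani | gom | Devanagari |
| Maithili | mai | Devanagari |
| Malayalam | ml | Malayalam |
| Manipuri | mni | Meitei |
| Marathi | mr | Devanagari |
| Nepali | ne | Devanagari |
| Odia | or | Odia |
| Punjabi | pa | Gurmukhi |
| Sanskrit | sa | Devanagari |
| Santali | sat | Ol Chiki |
| Sindhi | sd | Perso-Arabic |
| Tamil | ta | Tamil |
| Telugu | te | Telugu |
| Urdu | ur | Perso-Arabic |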

Voices

Voice catalogue (Zero TTS)

46 speaker voices, one male and one female per language. The voice parameter controls vocal character; the input text controls the language spoken. Any voice can speak any supported language.

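| Language | ISO | Male | Female |
| --- | --- | --- | --- |
| Assamese | as | Bimal | Anjana |
| Bengali | bn | Arjun | Priyanka |
| Bodo | brx | Daimalu | Hasina |
| Dogri | doi | Vishal | Neelam |
| English | en | Varun | Nisha |
| Gujarati | gu | Rakesh | Pooja |
| Hindi | hi | Rajesh | Sunita |
| Kannada | kn | Kiran | Shreya |
| Kashmiri | ks | Farooq | Habba |
| Konkani | gom | Mohan | Sarita |
| Maithili | mai | Suresh | Meera |
| Malayalam | ml | Krishnan | Deepa |
| Manipuri | mni | Tomba | Ibemhal |
| Marathi | mr | Siddharth | Ananya |
| Nepali | ne | Bikash | Sapana |
| Odia | or | Bijay | Sujata |
| Punjabi | pa | Gurpreet | Simran |
| Sanskrit | sa | Vedant | Gayatri |
| Santali | sat | Chandu | Roshni |
| Sindhi | sd | Amjad | Kavita |
| Tamil | ta | Murugan | Thangam |
| Telugu | te | Vishnu | Lakshmi |
| Urdu | ur | Salman | Fatima |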
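The catalogue above maps naturally onto a lookup table. The sketch below picks the catalogue voice listed for a language and assembles a synthesis request. The voice names and ISO codes are taken from the table; the helper names and the request dictionary keys ("model", "voice", "language", "text") are hypothetical and do not mirror the actual TTSConfig fields, which this page does not document.

```python
from typing import Dict, Tuple

# ISO code -> (male voice, female voice), copied from the catalogue above.
VOICES: Dict[str, Tuple[str, str]] = {
    "as": ("Bimal", "Anjana"), "bn": ("Arjun", "Priyanka"),
    "brx": ("Daimalu", "Hasina"), "doi": ("Vishal", "Neelam"),
    "en": ("Varun", "Nisha"), "gu": ("Rakesh", "Pooja"),
    "hi": ("Rajesh", "Sunita"), "kn": ("Kiran", "Shreya"),
    "ks": ("Farooq", "Habba"), "gom": ("Mohan", "Sarita"),
    "mai": ("Suresh", "Meera"), "ml": ("Krishnan", "Deepa"),
    "mni": ("Tomba", "Ibemhal"), "mr": ("Siddharth", "Ananya"),
    "ne": ("Bikash", "Sapana"), "or": ("Bijay", "Sujata"),
    "pa": ("Gurpreet", "Simran"), "sa": ("Vedant", "Gayatri"),
    "sat": ("Chandu", "Roshni"), "sd": ("Amjad", "Kavita"),
    "ta": ("Murugan", "Thangam"), "te": ("Vishnu", "Lakshmi"),
    "ur": ("Salman", "Fatima"),
}


def pick_voice(language: str, gender: str) -> str:
    """Return the catalogue voice listed for a language and gender."""
    if language not in VOICES:
        raise ValueError(f"unsupported language code: {language!r}")
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    male, female = VOICES[language]
    return male if gender == "male" else female


def synthesis_request(text: str, language: str, gender: str = "female") -> Dict[str, str]:
    """Assemble a request payload (hypothetical keys, not the real TTSConfig)."""
    return {
        "model": "zero-indic",
        "voice": pick_voice(language, gender),
        "language": language,  # hint for the text preprocessor
        "text": text,
    }


print(pick_voice("hi", "female"))  # Sunita
```

Because any voice can speak any supported language, the lookup is a convenience default rather than a constraint: passing a Tamil voice with Hindi text is equally valid.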

Latency expectations

| Path | First audio / partial | End-to-end |
| --- | --- | --- |
| Streaming TTS (text in → first audio) | < 350 ms | Real-time |
| LLM → TTS pipeline | ~350 ms (sentence-first) | Saves 200-400 ms vs full-text |
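The sentence-first saving comes from sending each sentence to TTS as soon as the LLM finishes it, instead of waiting for the full response. A minimal sketch of that buffering step follows. It is a naive splitter written for illustration, not the pipeline Shunya Labs uses: it breaks on any terminator character, so decimals such as "3.14" and abbreviations will be split incorrectly. The function name is illustrative.

```python
from typing import Iterable, Iterator

# Sentence terminators, including the Devanagari danda and double danda.
SENTENCE_END = (".", "!", "?", "।", "॥")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            # Position of the first terminator currently in the buffer, or -1.
            cut = next((i for i, ch in enumerate(buffer) if ch in SENTENCE_END), -1)
            if cut == -1:
                break
            sentence, buffer = buffer[: cut + 1].strip(), buffer[cut + 1 :]
            # Skip fragments that contain only punctuation or whitespace.
            if any(ch.isalnum() for ch in sentence):
                yield sentence
    tail = buffer.strip()
    if tail:
        # Flush whatever remains when the stream ends without a terminator.
        yield tail


stream = ["Hello", " world.", " How", " are", " you?", " Fine"]
print(list(sentences_from_tokens(stream)))
# ['Hello world.', 'How are you?', 'Fine']
```

Each yielded sentence can be handed to the TTS request immediately, so audio for the first sentence starts while the LLM is still generating the rest.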

Pricing tiers

Pay-as-you-go (Free)

$200 credit, no credit card required, no expiration. Great for prototypes.

Volume (Popular)

$500 prepaid, up to 10% discount. For production workloads.

Enterprise (Custom)

Best-rate pricing, custom STT training, highest concurrency, self-hosted deployment, and SLAs.