Capability matrix
One view across every model. Use this to pick the right endpoint for your use case.
Models at a glance
| Model | Use for | Languages | Key metric | $/min (std) |
|---|---|---|---|---|
| zero-universal | General purpose, English + international | 204 | 3.10% composite WER | $0.0039 |
| zero-indic | Hindi, Tamil, Telugu, Bengali, Kannada, and 50+ other Indian languages | 55+ | Streaming-first | $0.0045 |
| zero-codeswitch | Hinglish, Tanglish, mixed Indic + English | Indic + English | Native code-switch | $0.0050 |
| zero-med | Clinical / medical speech | English (+ Indic) | FHIR / HL7 compatible | $0.0050 |
| zero-indic | Zero TTS: speech synthesis, voice agents, IVR | 22 Indic + English (23 total) | 46 voices, 11 styles | See pricing |
| vak-translate-1.3b-ct2 | Vāķ Translate: Indic ↔ Indic text | 55 languages, 2,970 pairs | Weighted BLEU 38.5 | See pricing |
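To compare endpoints on cost, the standard per-minute rates in the table can be turned into a rough estimate. The sketch below assumes billing scales linearly with audio minutes, with no rounding, minimums, or tier discounts; the page does not state the billing rules, and `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Standard per-minute STT rates in USD, copied from the table above.
# The TTS and translation rows say "See pricing", so they are left out.
STD_RATE_PER_MIN = {
    "zero-universal": Decimal("0.0039"),
    "zero-indic": Decimal("0.0045"),
    "zero-codeswitch": Decimal("0.0050"),
    "zero-med": Decimal("0.0050"),
}


def estimate_cost(model: str, audio_minutes) -> Decimal:
    """Estimated USD cost, assuming cost is rate times minutes (an assumption)."""
    # Decimal(str(...)) avoids binary floating-point error on fractional minutes.
    return STD_RATE_PER_MIN[model] * Decimal(str(audio_minutes))


if __name__ == "__main__":
    # 1,000 minutes of general-purpose transcription at the standard rate.
    print(estimate_cost("zero-universal", 1000))
```

Decimal arithmetic is used so that small per-minute rates do not pick up floating-point noise when multiplied by large volumes.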
Speech-to-Text (STT)
ASR accuracy: OpenASR composite
Zero STT Universal (model: pingala-v1-universal, API: zero-universal) was benchmarked on the Hugging Face Open ASR Leaderboard. Lower WER means fewer errors; higher RTFx means faster transcription. Composite WER: 3.10%, Average RTFx: 146.23.
| Dataset | WER % | RTFx | Domain |
|---|---|---|---|
| LibriSpeech Test Clean | 0.71 | 158.74 | Audiobooks, clean read |
| LibriSpeech Test Other | 2.17 | 142.40 | Noisy audiobooks |
| TedLium Test | 1.43 | 153.34 | TED-style presentations |
| SPGISpeech Test | 1.10 | 170.85 | Finance / earnings calls |
| AMI Test | 4.19 | 70.22 | Multi-party meetings |
| Earnings22 Test | 5.83 | 101.52 | Earnings calls (noisy) |
| GigaSpeech Test | 4.99 | 131.09 | Podcasts, YouTube |
| VoxPopuli Test | 4.34 | 179.28 | Parliamentary proceedings |
RTFx is the inverse real-time factor: seconds of audio processed per second of compute. An RTFx of 100 means 1 second of audio is transcribed in 10 ms. Source: Shunya Labs' published model card on Hugging Face, huggingface.co/shunyalabs/pingala-v1-universal. The numbers above are reproduced verbatim from the "OpenASR Leaderboard Results" section of that model card (verified 2026-04-24). Reproduce them on your own audio before relying on them for production decisions.
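The two metrics can be checked with a few lines of arithmetic. This sketch assumes the composite WER is the unweighted mean of the eight per-dataset rows; the page does not say how the composite is computed, but that mean comes to about 3.095, which is consistent with the reported 3.10%. The helper names are illustrative.

```python
# Per-dataset WER (%) copied from the table above.
WER_BY_DATASET = {
    "LibriSpeech Test Clean": 0.71,
    "LibriSpeech Test Other": 2.17,
    "TedLium Test": 1.43,
    "SPGISpeech Test": 1.10,
    "AMI Test": 4.19,
    "Earnings22 Test": 5.83,
    "GigaSpeech Test": 4.99,
    "VoxPopuli Test": 4.34,
}


def mean_wer(wers: dict) -> float:
    """Unweighted mean of the per-dataset WER values (assumed composite)."""
    return sum(wers.values()) / len(wers)


def processing_time_ms(audio_seconds: float, rtfx: float) -> float:
    """Compute time implied by an RTFx value: audio duration divided by RTFx."""
    return audio_seconds / rtfx * 1000.0


if __name__ == "__main__":
    print(f"unweighted mean WER: {mean_wer(WER_BY_DATASET):.3f}%")
    print(f"1 s of audio at RTFx 100: {processing_time_ms(1.0, 100.0):.1f} ms")
```

RTFx describes model compute only; network upload and queueing add to the end-to-end latency seen by a client.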
Language coverage
Indic languages (Zero STT Indic)
Zero STT Indic supports 55+ Indian languages. The table below lists the 23 languages (22 Indic plus English) that overlap with Zero TTS. For the full STT list, call GET https://asr.shunyalabs.ai/languages or see ASR overview.
| Language | ISO | Script |
|---|---|---|
| Hindi | hi | Devanagari |
| Bengali | bn | Bengali |
| Telugu | te | Telugu |
| Marathi | mr | Devanagari |
| Tamil | ta | Tamil |
| Urdu | ur | Perso-Arabic |
| Gujarati | gu | Gujarati |
| Kannada | kn | Kannada |
| Odia | or | Odia |
| Malayalam | ml | Malayalam |
| Punjabi | pa | Gurmukhi |
| Assamese | as | Assamese |
| Maithili | mai | Devanagari |
| Sanskrit | sa | Devanagari |
| Nepali | ne | Devanagari |
| Konkani | gom | Devanagari |
| Dogri | doi | Devanagari |
| Kashmiri | ks | Perso-Arabic |
| Sindhi | sd | Perso-Arabic |
| Bodo | brx | Devanagari |
| Manipuri | mni | Meitei Mayek |
| Santali | sat | Ol Chiki |
| English | en | Latin |
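The full STT language list mentioned above can be fetched with a plain GET request. The sketch below uses only the Python standard library. The URL comes from this page; the `Accept` header, the UTF-8 decoding, and the absence of an authentication header are assumptions, and the response format is not documented here, so the body is returned as raw text.

```python
import urllib.request

# Endpoint named in this section.
LANGUAGES_URL = "https://asr.shunyalabs.ai/languages"


def build_languages_request() -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        LANGUAGES_URL,
        method="GET",
        headers={"Accept": "application/json"},  # assumed response type
    )


def fetch_languages_raw(timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (UTF-8 assumed)."""
    with urllib.request.urlopen(build_languages_request(), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_languages_raw())
```

If the endpoint requires an API key, add the header your account documentation specifies to `build_languages_request`.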
Universal model (Zero STT Universal)
Covers 204 languages and dialects spanning the Indo-European, Dravidian, Austroasiatic, Sino-Tibetan, and Semitic families. The universal model autodetects the input language; pass language_code to lock transcription to one language.
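One way to keep autodetection as the default and lock the language only when it is known is to add `language_code` conditionally. This is a minimal sketch: `language_code` and the model ID `zero-universal` come from this page, while the `model` key and the dictionary shape are hypothetical and should be adapted to the actual request format.

```python
from typing import Dict, Optional


def stt_options(language_code: Optional[str] = None) -> Dict[str, str]:
    """Build request options; omit language_code to leave autodetection on."""
    options = {"model": "zero-universal"}  # "model" key name is an assumption
    if language_code:
        # Locks transcription to the given language instead of autodetecting.
        options["language_code"] = language_code
    return options


if __name__ == "__main__":
    print(stt_options())      # autodetect
    print(stt_options("hi"))  # locked to Hindi
```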
Latency expectations
| Path | First audio / partial | End-to-end |
|---|---|---|
| Streaming ASR (PCM in → partial out) | < 100 ms | Real-time |
| Batch ASR (upload → transcript) | - | ~812 ms for a 5.7 s clip |
Text-to-Speech (TTS)
Zero TTS runs on the cloud API (zero-indic). Shunya Labs does not publish open TTS weights as a downloadable model repo on Hugging Face (unlike pingala-v1-universal for STT). You can try synthesis in the browser via the Vāķ Text to Speech Space, or browse the org at huggingface.co/shunyalabs.
Language coverage
Indic languages (Zero TTS)
Model zero-indic supports all 23 languages listed below (22 Indic plus English). Pass the matching language code in TTSConfig to hint the text preprocessor. Voice names are listed on Voices & languages.
| Language | ISO | Script |
|---|---|---|
| Assamese | as | Assamese |
| Bengali | bn | Bengali |
| Bodo | brx | Devanagari |
| Dogri | doi | Devanagari |
| English | en | Latin |
| Gujarati | gu | Gujarati |
| Hindi | hi | Devanagari |
| Kannada | kn | Kannada |
| Kashmiri | ks | Perso-Arabic |
| Konkani | gom | Devanagari |
| Maithili | mai | Devanagari |
| Malayalam | ml | Malayalam |
| Manipuri | mni | Meitei Mayek |
| Marathi | mr | Devanagari |
| Nepali | ne | Devanagari |
| Odia | or | Odia |
| Punjabi | pa | Gurmukhi |
| Sanskrit | sa | Devanagari |
| Santali | sat | Ol Chiki |
| Sindhi | sd | Perso-Arabic |
| Tamil | ta | Tamil |
| Telugu | te | Telugu |
| Urdu | ur | Perso-Arabic |
Voices
Voice catalogue (Zero TTS)
46 speaker voices, one male and one female per language. The voice parameter controls vocal character; the input text controls the language spoken. Any voice can speak any supported language.
| Language | ISO | Male | Female |
|---|---|---|---|
| Assamese | as | Bimal | Anjana |
| Bengali | bn | Arjun | Priyanka |
| Bodo | brx | Daimalu | Hasina |
| Dogri | doi | Vishal | Neelam |
| English | en | Varun | Nisha |
| Gujarati | gu | Rakesh | Pooja |
| Hindi | hi | Rajesh | Sunita |
| Kannada | kn | Kiran | Shreya |
| Kashmiri | ks | Farooq | Habba |
| Konkani | gom | Mohan | Sarita |
| Maithili | mai | Suresh | Meera |
| Malayalam | ml | Krishnan | Deepa |
| Manipuri | mni | Tomba | Ibemhal |
| Marathi | mr | Siddharth | Ananya |
| Nepali | ne | Bikash | Sapana |
| Odia | or | Bijay | Sujata |
| Punjabi | pa | Gurpreet | Simran |
| Sanskrit | sa | Vedant | Gayatri |
| Santali | sat | Chandu | Roshni |
| Sindhi | sd | Amjad | Kavita |
| Tamil | ta | Murugan | Thangam |
| Telugu | te | Vishnu | Lakshmi |
| Urdu | ur | Salman | Fatima |
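The catalogue above can be kept as a small lookup table for choosing a default voice per language. The sketch below copies the names from the table; `pick_voice` is a hypothetical helper, and the exact string the API expects for the voice parameter (casing, prefixes) is an assumption to confirm on the Voices & languages page.

```python
# ISO code -> (male voice, female voice), copied from the table above.
VOICES = {
    "as": ("Bimal", "Anjana"),
    "bn": ("Arjun", "Priyanka"),
    "brx": ("Daimalu", "Hasina"),
    "doi": ("Vishal", "Neelam"),
    "en": ("Varun", "Nisha"),
    "gu": ("Rakesh", "Pooja"),
    "hi": ("Rajesh", "Sunita"),
    "kn": ("Kiran", "Shreya"),
    "ks": ("Farooq", "Habba"),
    "gom": ("Mohan", "Sarita"),
    "mai": ("Suresh", "Meera"),
    "ml": ("Krishnan", "Deepa"),
    "mni": ("Tomba", "Ibemhal"),
    "mr": ("Siddharth", "Ananya"),
    "ne": ("Bikash", "Sapana"),
    "or": ("Bijay", "Sujata"),
    "pa": ("Gurpreet", "Simran"),
    "sa": ("Vedant", "Gayatri"),
    "sat": ("Chandu", "Roshni"),
    "sd": ("Amjad", "Kavita"),
    "ta": ("Murugan", "Thangam"),
    "te": ("Vishnu", "Lakshmi"),
    "ur": ("Salman", "Fatima"),
}


def pick_voice(language_code: str, gender: str = "female") -> str:
    """Return the catalogue voice listed for a language and gender."""
    male, female = VOICES[language_code]
    return male if gender == "male" else female


if __name__ == "__main__":
    total = sum(len(pair) for pair in VOICES.values())
    print(f"{len(VOICES)} languages, {total} voices")
    print(pick_voice("hi"))
```

Because any voice can speak any supported language, the lookup only supplies a sensible default; nothing prevents pairing, say, a Tamil-listed voice with Hindi text.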
Latency expectations
| Path | First audio / partial | End-to-end |
|---|---|---|
| Streaming TTS (text in → first audio) | < 350 ms | Real-time |
| LLM → TTS pipeline | ~350 ms (sentence-first) | Saves 200-400 ms vs. full-text synthesis |
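The sentence-first approach in the LLM → TTS row amounts to sending each sentence to synthesis as soon as it is complete, rather than waiting for the whole LLM response. The sketch below shows only the buffering step, using a naive punctuation-based splitter; it makes no TTS call, and `sentences_from_stream` is an illustrative helper rather than part of the Zero TTS SDK.

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation: Latin marks plus the danda used in several Indic scripts.
_SENTENCE_END = re.compile(r"[.!?।]+\s*")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break  # no complete sentence yet; wait for more text
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    tail = buffer.strip()
    if tail:
        yield tail  # flush whatever remains when the stream ends


if __name__ == "__main__":
    llm_chunks = ["Hello wor", "ld. How are", " you? Fine"]
    for sentence in sentences_from_stream(llm_chunks):
        print(sentence)  # each sentence would be handed to TTS here
```

The splitter is deliberately simple: decimals and abbreviations such as "3.10" or "Dr." will split early, so a production pipeline would likely need a more careful boundary rule.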
Pricing tiers
$200 credit, no card, no expiration. Great for prototypes.
$500 prepaid, up to 10% discount. For production workloads.
Best-rate pricing, custom STT training, highest concurrency, self-hosted, SLAs.