Capability matrix

A single view of every model. Use it to pick the right endpoint for your use case.

Models at a glance

Model	Use for	Languages	Key metric	Price (USD/min, standard)
zero-universal	General purpose, English + international	204	3.10% composite WER	$0.0039
zero-indic	Hindi, Tamil, Telugu, Bengali, Kannada, and 50+ other Indian languages	55+	Streaming-first	$0.0045
zero-codeswitch	Hinglish, Tanglish, mixed Indic + English	Indic + English	Native code-switch	$0.0050
zero-med	Clinical / medical speech	English (+ Indic)	FHIR / HL7 compatible	$0.0050
zero-indic	Zero TTS: speech synthesis for voice agents and IVR	22 Indic + English	46 voices, 11 styles	See pricing
vak-translate-1.3b-ct2	Vāķ Translate: Indic ↔ Indic text	55 (2,970 directed pairs)	BLEU 38.5 (weighted)	See pricing
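The STT rows of the table above can be read as a routing rule. Here is a minimal Python sketch that assumes a workload is described by three flags. The function name, the flags, and the precedence order (medical first, then code-switched, then Indic-only, then general) are illustrative choices, not an official selection algorithm. Only the model IDs come from the table.

```python
# Model IDs copied from the "Models at a glance" table.
# pick_stt_model and its flags are hypothetical; the precedence order is an assumption.
STT_MODELS = {
    "general": "zero-universal",
    "indic": "zero-indic",
    "codeswitch": "zero-codeswitch",
    "medical": "zero-med",
}


def pick_stt_model(medical: bool = False, code_switched: bool = False,
                   indic_only: bool = False) -> str:
    """Return an STT model ID, checking the most specialised use case first."""
    if medical:
        return STT_MODELS["medical"]
    if code_switched:
        return STT_MODELS["codeswitch"]
    if indic_only:
        return STT_MODELS["indic"]
    return STT_MODELS["general"]


print(pick_stt_model(code_switched=True))  # zero-codeswitch
```

Because the narrowest use case is checked first, a clinical recording is routed to zero-med even if it also contains code-switched speech. Reorder the checks if that is not the behaviour you want.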

Speech-to-Text (STT)

ASR accuracy: OpenASR composite

Reading this table

One Shunya model, evaluated on eight industry-standard test datasets. Each row below is a benchmark dataset, not a competing model.

Zero STT Universal (model: pingala-v1-universal, API: zero-universal) benchmarked on the Hugging Face Open ASR Leaderboard. Lower WER means fewer errors. Composite WER: 3.10%; average RTFx: 146.23.

Dataset	WER %	RTFx	Domain
LibriSpeech Test Clean	0.71	158.74	Audiobooks, clean read
LibriSpeech Test Other	2.17	142.40	Noisy audiobooks
TedLium Test	1.43	153.34	TED-style presentations
SPGISpeech Test	1.10	170.85	Finance / earnings calls
AMI Test	4.19	70.22	Multi-party meetings
Earnings22 Test	5.83	101.52	Earnings calls (noisy)
GigaSpeech Test	4.99	131.09	Podcasts, YouTube
VoxPopuli Test	4.34	179.28	Parliamentary proceedings

RTFx = inverse real-time factor: seconds of audio transcribed per second of processing, so higher is faster. An RTFx of 100 means 1 second of audio is transcribed in 10 ms. Source: Shunya Labs' published model card on Hugging Face, huggingface.co/shunyalabs/pingala-v1-universal. The numbers above are reproduced verbatim from the "OpenASR Leaderboard Results" section of that model card (verified 2026-04-24). Reproduce these results on your own audio before relying on them for production decisions.
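The composite WER matches an unweighted mean of the eight per-dataset WERs, and an RTFx value converts directly into processing time. The following minimal sketch checks both. The dataset values are copied from the table above, and the helper names are hypothetical.

```python
# Per-dataset results copied from the table above: (WER %, RTFx).
RESULTS = {
    "LibriSpeech Test Clean": (0.71, 158.74),
    "LibriSpeech Test Other": (2.17, 142.40),
    "TedLium Test": (1.43, 153.34),
    "SPGISpeech Test": (1.10, 170.85),
    "AMI Test": (4.19, 70.22),
    "Earnings22 Test": (5.83, 101.52),
    "GigaSpeech Test": (4.99, 131.09),
    "VoxPopuli Test": (4.34, 179.28),
}


def composite_wer(results: dict[str, tuple[float, float]]) -> float:
    """Unweighted mean of the per-dataset WERs, in percent."""
    return sum(wer for wer, _ in results.values()) / len(results)


def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Processing time implied by an RTFx value: audio duration divided by RTFx."""
    return audio_seconds / rtfx


print(round(composite_wer(RESULTS), 3))  # 3.095, reported as 3.10
print(f"{processing_seconds(1.0, 100.0) * 1000:.0f} ms")  # 10 ms
```

The unweighted mean of the eight RTFx values is about 138.4, not the published 146.23. The leaderboard average is therefore presumably computed another way (for example, weighted by audio duration). This sketch does not try to reproduce it.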

Language coverage

Indic languages (Zero STT Indic)

Zero STT Indic supports 55+ Indian languages. The table below lists the 22 Indic languages plus English that overlap with Zero TTS. For the full STT list, call GET https://asr.shunyalabs.ai/languages or see the ASR overview.

Language	ISO	Script
Hindi	hi	Devanagari
Bengali	bn	Bengali
Telugu	te	Telugu
Marathi	mr	Devanagari
Tamil	ta	Tamil
Urdu	ur	Perso-Arabic
Gujarati	gu	Gujarati
Kannada	kn	Kannada
Odia	or	Odia
Malayalam	ml	Malayalam
Punjabi	pa	Gurmukhi
Assamese	as	Assamese
Maithili	mai	Devanagari
Sanskrit	sa	Devanagari
Nepali	ne	Devanagari
Konkani	gom	Devanagari
Dogri	doi	Devanagari
Kashmiri	ks	Perso-Arabic
Sindhi	sd	Perso-Arabic
Bodo	brx	Devanagari
Manipuri	mni	Meitei
Santali	sat	Ol Chiki
English	en	Latin
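A voice agent that both listens and speaks in the caller's language needs a language code that appears in this overlap. Check the code before routing the call. Here is a minimal sketch. The set is transcribed from the table above, and round_trip_supported is a hypothetical helper name.

```python
# ISO codes from the table above: Zero STT Indic languages that Zero TTS also supports.
TTS_OVERLAP = frozenset({
    "as", "bn", "brx", "doi", "en", "gom", "gu", "hi", "kn", "ks", "mai", "ml",
    "mni", "mr", "ne", "or", "pa", "sa", "sat", "sd", "ta", "te", "ur",
})


def round_trip_supported(language_code: str) -> bool:
    """True if the code is in both the STT and TTS lists (hypothetical helper).

    A code outside this set may still be valid for STT alone; the full STT
    list is served by GET https://asr.shunyalabs.ai/languages.
    """
    return language_code.strip().lower() in TTS_OVERLAP


print(round_trip_supported("sat"), round_trip_supported("fr"))  # True False
```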

Universal model (Zero STT Universal)

204 languages and dialects spanning Indo-European, Dravidian, Austroasiatic, Sino-Tibetan, and Semitic families. The universal model autodetects the input language; pass language_code to lock it.
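The autodetect-or-lock behaviour amounts to an optional request field. The following minimal sketch builds request parameters. The names language_code and zero-universal come from this page. The "model" field name and the helper are assumptions, so check the ASR API reference for the real request shape.

```python
from typing import Optional


def transcription_params(language_code: Optional[str] = None) -> dict:
    """Build form fields for a Zero STT Universal request (hypothetical shape).

    Leaving language_code unset lets the model autodetect the language;
    setting it locks transcription to that language.
    """
    params = {"model": "zero-universal"}  # field name "model" is an assumption
    if language_code is not None:
        params["language_code"] = language_code
    return params


print(transcription_params())      # {'model': 'zero-universal'}
print(transcription_params("ta"))  # {'model': 'zero-universal', 'language_code': 'ta'}
```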

Latency expectations

Path	First audio / partial	End-to-end
Streaming ASR (PCM in → partial out)	< 100 ms	Real-time
Batch ASR (upload → transcript)	-	~812 ms for a 5.7 s clip
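The streaming path takes raw PCM in small chunks. This page does not state the expected audio format or chunk size, so the sketch below assumes 16 kHz, 16-bit, mono PCM and 100 ms frames. Adjust both to whatever the streaming API actually requires. The frame size is computed in whole samples so that a frame never splits a sample.

```python
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16_000, sample_width: int = 2,
               channels: int = 1, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration frames of raw PCM; the last frame may be shorter."""
    samples_per_frame = sample_rate * frame_ms // 1000
    bytes_per_frame = samples_per_frame * sample_width * channels
    for start in range(0, len(pcm), bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]


one_second = bytes(16_000 * 2)  # 1 s of 16-bit mono silence at the assumed rate
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))  # 10 3200
```

Smaller frames let the server see audio sooner, at the cost of sending more messages per second.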

Text-to-Speech (TTS)

Hugging Face

Zero TTS runs on the cloud API (zero-indic). Shunya Labs does not publish open TTS weights as a downloadable model repo on Hugging Face (unlike pingala-v1-universal for STT). You can try synthesis in the browser via the Vāķ Text to Speech Space, or browse the org at huggingface.co/shunyalabs.

Language coverage

Indic languages (Zero TTS)

The 22 Indic languages and English (23 in total) supported by model zero-indic. Pass the matching language code in TTSConfig to hint the text preprocessor. Voice names are listed on Voices & languages.

Language	ISO	Script
Assamese	as	Assamese
Bengali	bn	Bengali
Bodo	brx	Devanagari
Dogri	doi	Devanagari
English	en	Latin
Gujarati	gu	Gujarati
Hindi	hi	Devanagari
Kannada	kn	Kannada
Kashmiri	ks	Perso-Arabic
Konkani	gom	Devanagari
Maithili	mai	Devanagari
Malayalam	ml	Malayalam
Manipuri	mni	Meitei
Marathi	mr	Devanagari
Nepali	ne	Devanagari
Odia	or	Odia
Punjabi	pa	Gurmukhi
Sanskrit	sa	Devanagari
Santali	sat	Ol Chiki
Sindhi	sd	Perso-Arabic
Tamil	ta	Tamil
Telugu	te	Telugu
Urdu	ur	Perso-Arabic

Voices

Voice catalogue (Zero TTS)

46 speaker voices, one male and one female per language. The voice parameter controls vocal character; the input text controls the language spoken. Any voice can speak any supported language.

Language	ISO	Male	Female
Assamese	as	`Bimal`	`Anjana`
Bengali	bn	`Arjun`	`Priyanka`
Bodo	brx	`Daimalu`	`Hasina`
Dogri	doi	`Vishal`	`Neelam`
English	en	`Varun`	`Nisha`
Gujarati	gu	`Rakesh`	`Pooja`
Hindi	hi	`Rajesh`	`Sunita`
Kannada	kn	`Kiran`	`Shreya`
Kashmiri	ks	`Farooq`	`Habba`
Konkani	gom	`Mohan`	`Sarita`
Maithili	mai	`Suresh`	`Meera`
Malayalam	ml	`Krishnan`	`Deepa`
Manipuri	mni	`Tomba`	`Ibemhal`
Marathi	mr	`Siddharth`	`Ananya`
Nepali	ne	`Bikash`	`Sapana`
Odia	or	`Bijay`	`Sujata`
Punjabi	pa	`Gurpreet`	`Simran`
Sanskrit	sa	`Vedant`	`Gayatri`
Santali	sat	`Chandu`	`Roshni`
Sindhi	sd	`Amjad`	`Kavita`
Tamil	ta	`Murugan`	`Thangam`
Telugu	te	`Vishnu`	`Lakshmi`
Urdu	ur	`Salman`	`Fatima`
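The voice and the language hint are independent, so a client can pick them separately. Here is a minimal sketch. The voice names are copied from the table above. The request body (field names "model", "text", "voice", "language") is a hypothetical stand-in for TTSConfig, not the SDK's actual structure.

```python
from typing import Optional

# (male, female) voice names per ISO code, copied from the voice catalogue table.
VOICES = {
    "as": ("Bimal", "Anjana"), "bn": ("Arjun", "Priyanka"),
    "brx": ("Daimalu", "Hasina"), "doi": ("Vishal", "Neelam"),
    "en": ("Varun", "Nisha"), "gu": ("Rakesh", "Pooja"),
    "hi": ("Rajesh", "Sunita"), "kn": ("Kiran", "Shreya"),
    "ks": ("Farooq", "Habba"), "gom": ("Mohan", "Sarita"),
    "mai": ("Suresh", "Meera"), "ml": ("Krishnan", "Deepa"),
    "mni": ("Tomba", "Ibemhal"), "mr": ("Siddharth", "Ananya"),
    "ne": ("Bikash", "Sapana"), "or": ("Bijay", "Sujata"),
    "pa": ("Gurpreet", "Simran"), "sa": ("Vedant", "Gayatri"),
    "sat": ("Chandu", "Roshni"), "sd": ("Amjad", "Kavita"),
    "ta": ("Murugan", "Thangam"), "te": ("Vishnu", "Lakshmi"),
    "ur": ("Salman", "Fatima"),
}


def tts_request(text: str, language_code: str, gender: str = "female",
                voice_from: Optional[str] = None) -> dict:
    """Build a hypothetical zero-indic request body.

    language_code hints the text preprocessor. voice_from optionally takes the
    voice from another language's row, since any voice can speak any language.
    """
    if language_code not in VOICES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    source = voice_from or language_code
    if source not in VOICES:
        raise ValueError(f"unknown voice language: {source!r}")
    male, female = VOICES[source]
    return {
        "model": "zero-indic",
        "text": text,
        "voice": male if gender == "male" else female,
        "language": language_code,
    }


print(tts_request("नमस्ते", "hi", gender="male")["voice"])   # Rajesh
print(tts_request("Hello", "en", voice_from="ta")["voice"])  # Thangam
```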

Latency expectations

Path	First audio / partial	End-to-end
Streaming TTS (text in → first audio)	< 350 ms	Real-time
LLM → TTS pipeline	~350 ms (sentence-first)	Saves 200-400 ms vs. waiting for the full text
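The sentence-first saving comes from sending each sentence to TTS as soon as the LLM finishes it, rather than waiting for the whole reply. The following minimal sketch shows the splitting step. It treats ., !, ? and the Devanagari danda (।) followed by whitespace as a sentence end. This is a simplification, so abbreviations such as "Dr." will split early. The TTS call itself is omitted because its API is not described here.

```python
import re
from typing import Iterable, Iterator

# A run of whitespace that follows ., !, ? or the Devanagari danda marks a sentence end.
_BOUNDARY = re.compile(r"(?<=[.!?।])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of LLM text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Every part except the last ended at a boundary, so it is a full sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]  # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


tokens = ["नमस्ते", "। आप कैसे", " हैं? I am", " fine."]
for sentence in sentences_from_stream(tokens):
    print(sentence)  # in a real pipeline, start a TTS request here
```

A full stop followed by a digit (as in "3.14") is not treated as a boundary, because the pattern requires whitespace after the punctuation.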

Pricing tiers

Pay-as-you-go (Free)

$200 in credit, no credit card required, no expiration. Suited to prototypes.

Volume (Popular)

$500 prepaid, with a discount of up to 10%. Suited to production workloads.

Enterprise (Custom)

Best-rate pricing, custom STT training, highest concurrency, self-hosted, SLAs.
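To estimate what a tier buys, divide the budget by the per-minute rate. The minimal sketch below makes two assumptions that this page does not state: the free credit is drawn down at the standard rates in the models table, and the Volume discount applies directly to those rates. Zero TTS and Vāķ Translate are omitted because their prices are listed as "See pricing". At $0.0039/min, the $200 credit would cover about 51,282 minutes (roughly 855 hours) of zero-universal audio.

```python
# Standard per-minute STT prices (USD) from the "Models at a glance" table.
PRICE_PER_MIN = {
    "zero-universal": 0.0039,
    "zero-indic (STT)": 0.0045,
    "zero-codeswitch": 0.0050,
    "zero-med": 0.0050,
}


def minutes_covered(budget_usd: float, price_per_min: float,
                    discount: float = 0.0) -> float:
    """Audio minutes a budget buys at a per-minute price after a fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return budget_usd / (price_per_min * (1.0 - discount))


for model, price in PRICE_PER_MIN.items():
    # Free tier: $200 credit, assumed to be billed at standard rates.
    print(f"{model}: {minutes_covered(200, price):,.0f} min")

# Volume tier: $500 prepaid at the maximum 10% discount, assumed to apply per minute.
print(f"zero-universal (Volume): {minutes_covered(500, PRICE_PER_MIN['zero-universal'], 0.10):,.0f} min")
```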