Voice AI that doesn't stop at English
Transcribe, translate, and generate speech in the languages people actually speak, on our cloud API or entirely on your own hardware.
Capabilities
Everything the Shunya stack does, grouped by what you want to build.
Zero STT, 3.10% composite WER, 204 languages, streaming & batch.
Diarization, intent, sentiment, summary, redaction, flags on the same ASR request.
46 voices, 23 Indic languages, 11 expression styles, voice cloning.
55 Indian languages, 2,970 translation pairs, BLEU 38.5 weighted.
Pipe any LLM into TTS for sub-second conversational AI.
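One common way to keep an LLM-to-TTS pipeline responsive is to send text to the synthesizer sentence by sentence instead of waiting for the full reply. The sketch below shows only that buffering step in plain Python. The function name `sentence_chunks` and the token list are hypothetical illustrations, not part of the Shunya SDK, and the splitting rule (break after `.`, `!`, `?`, or the danda `।` followed by whitespace) is an assumption you may want to tune per language.

```python
import re
from typing import Iterable, Iterator

# Assumed sentence boundary: ., !, ? or the danda, followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM text fragments into sentence-sized chunks.

    Hypothetical helper: each yielded string is a complete sentence
    that could be handed to a TTS request while the LLM keeps writing.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part is still incomplete; keep buffering it.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Stand-in for a streaming LLM response (hypothetical data).
llm_tokens = ["Hel", "lo. ", "How are", " you? ", "Fine"]
print(list(sentence_chunks(llm_tokens)))
# ['Hello.', 'How are you?', 'Fine']
```

Because the helper is a generator, the first sentence is available as soon as its closing punctuation arrives, so speech can start before the LLM has finished the rest of the reply.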
Pick your path
Every journey is three clicks or fewer. Start where you are.
pip install shunyalabs. Async client, typed config, streaming generators.
Models at a glance
| Capability | Scale | Speed / mode | Highlight |
|---|---|---|---|
| Zero STT (Universal) | 204 languages | RTFx 70-179 | 3.10% composite WER |
| Zero STT Indic | 55+ Indian languages | Streaming, real-time | Code-switch ready |
| Zero STT Med | Clinical vocab | Streaming | FHIR / HL7 compatible |
| Zero TTS | 23 langs, 46 voices | First-audio < 350 ms | Voice cloning, 11 styles |
| Vāķ Translate | 2,970 language pairs | Batch | BLEU 38.5 weighted avg |
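
To help read the numbers in the table, here is a small worked sketch. It rests on three assumptions that this page does not state: that the 2,970 figure counts ordered source-to-target pairs among 55 languages (55 × 54), that RTFx means audio duration divided by processing time, and that WER is the standard word-level edit distance divided by the reference length. How the "composite" WER is aggregated across test sets is not specified here, so the example covers a single utterance only. All function names are illustrative.

```python
def directed_pairs(n_languages: int) -> int:
    """Ordered (source, target) pairs with source != target."""
    return n_languages * (n_languages - 1)


def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Time to process audio, assuming RTFx = audio time / processing time."""
    return audio_seconds / rtfx


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(directed_pairs(55))                       # 2970
print(round(processing_seconds(3600, 70), 1))   # 51.4
print(round(processing_seconds(3600, 179), 1))  # 20.1
print(round(word_error_rate("the cat sat on the mat",
                            "the cat sat on mat"), 3))  # 0.167
```

Under these assumptions, one hour of audio would take roughly 20 to 51 seconds to transcribe in batch mode, and a 3.10% WER corresponds to about three word errors per hundred reference words.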
Cloud API for speed. On-prem for data sovereignty. CPU-compatible for air-gapped. Same API surface everywhere.
Read the deployment guide →
SOC 2 Type II, ISO 27001:2022, HIPAA, GDPR, CCPA. AES-256 at rest, TLS 1.3 in transit, audio deleted on completion.
Read the security policy →