What is Shunya?
Shunya Labs is a voice AI platform built for languages and deployments that most speech stacks ignore. One API covers speech-to-text, text-to-speech, translation, and voice agents, and you can run it in the cloud, on your own servers, or on a commodity CPU.
Three things that make Shunya different
Trained on Indian speech patterns, accents, and code-switching. Not English models with Indic fine-tuning bolted on.
Runs on x86 commodity hardware. In many configurations no GPU is required for inference, which lowers infrastructure cost and enables air-gapped deployments.
Cloud, on-prem, or fully air-gapped. SOC 2, ISO 27001, HIPAA. Your audio is deleted on completion.
The stack, end to end
You compose a voice product by picking the pieces you need.
What each piece does
| Component | Does | Best for |
|---|---|---|
| Zero STT | Transcribes audio to text. 204 languages. Streaming or batch. | Call centres, meeting notes, media captions, medical scribe. |
| Zero TTS | Generates natural speech in 23 Indic languages and English. 46 voices, 11 styles. | IVR, notifications, voice agents, audiobooks. |
| Vāķ Translate | Translates between any two Indian languages. 2,970 pairs. | Live conversation translation, multilingual content. |
| Intelligence layer | Extracts intent, entities, and sentiment, and adds diarization and summarization on top of transcripts. | Agent assist, analytics, compliance monitoring. |
| Voice agents | Composes the above into conversational workflows. | Automated support, debt collection, appointment booking. |
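A voice agent chains the pieces in the table above into one conversational turn: transcribe what the caller said, decide on a reply, and synthesize that reply as audio. The sketch below shows this composition with plain Python callables. The `VoiceAgentTurn` class and the stage signatures are hypothetical illustrations, not part of any Shunya SDK, and each lambda stands in for a real API call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures; these are illustrative, not Shunya SDK types.
Transcribe = Callable[[bytes], str]   # audio in, transcript out (the STT role)
Decide = Callable[[str], str]         # transcript in, reply text out (intelligence + agent logic)
Synthesize = Callable[[str], bytes]   # reply text in, audio out (the TTS role)


@dataclass
class VoiceAgentTurn:
    transcribe: Transcribe
    decide: Decide
    synthesize: Synthesize

    def run(self, caller_audio: bytes) -> Tuple[str, str, bytes]:
        """Run one conversational turn and return (transcript, reply, reply_audio)."""
        transcript = self.transcribe(caller_audio)
        reply = self.decide(transcript)
        return transcript, reply, self.synthesize(reply)


# Stand-in stages; in a real agent each lambda would call the corresponding API.
turn = VoiceAgentTurn(
    transcribe=lambda audio: "book an appointment for tomorrow",
    decide=lambda text: (
        "Sure, what time works for you?" if "appointment" in text
        else "Sorry, could you repeat that?"
    ),
    synthesize=lambda text: text.encode("utf-8"),  # placeholder: real TTS returns audio bytes
)
transcript, reply, audio = turn.run(b"\x00\x01")
print(reply)  # Sure, what time works for you?
```

Keeping each stage behind a plain callable means any one of them can be swapped (for example, batch STT for streaming STT) without touching the turn logic.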
How it runs
Same API, two deployment shapes. Pick what fits your compliance story:
- Cloud API: fastest to try. Global endpoint, bearer-token auth, streaming over WebSocket. You pay per minute.
- On-prem: the same Triton containers inside your VPC. Data never leaves your network.
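Because both shapes serve the same API, a client can keep its request code unchanged and swap only the base URL. This is a minimal sketch assuming the on-prem containers expose the same paths as the cloud hosts; the `endpoint_url` helper and the internal hostname `http://asr.internal:8000` are hypothetical.

```python
from typing import Optional

# Cloud hosts for the two APIs (the same hosts used in the batch and streaming endpoints).
CLOUD_BASES = {
    "asr": "https://asr.shunyalabs.ai",
    "tts": "https://tts.shunyalabs.ai",
}


def endpoint_url(service: str, path: str, on_prem_base: Optional[str] = None) -> str:
    """Build a full endpoint URL, swapping only the host for on-prem deployments.

    Assumption: on-prem containers serve the same paths as the cloud API.
    """
    base = on_prem_base if on_prem_base else CLOUD_BASES[service]
    return base.rstrip("/") + "/" + path.lstrip("/")


print(endpoint_url("asr", "/v1/audio/transcriptions"))
# https://asr.shunyalabs.ai/v1/audio/transcriptions
print(endpoint_url("asr", "/v1/audio/transcriptions", on_prem_base="http://asr.internal:8000/"))
# http://asr.internal:8000/v1/audio/transcriptions
```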
Batch or Streaming?
Every Shunya API, Speech-to-Text and Text-to-Speech, works in two transport modes. Same models, same voices, different transport. Pick the one that fits how your audio arrives.
Batch
Send via HTTP POST and receive a complete result in a single response.
- ASR: POST https://asr.shunyalabs.ai/v1/audio/transcriptions
- TTS: POST https://tts.shunyalabs.ai/v1/audio/speech
Typical uses:
- Uploaded files, post-processing, async jobs.
- Pre-rendered voice prompts for IVR and telephony systems.
- Notification audio, order updates, alerts, reminders.
- Podcast, audiobook, and long-form content generation.
- Any use case where audio does not need to start playing before synthesis is complete.
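A batch call is a single HTTP POST with bearer-token auth. The sketch below builds, but does not send, one request per endpoint using only the Python standard library. The URLs are the batch endpoints listed above; the JSON fields `input` and `voice`, the multipart field name `file`, and any voice name you pass are assumptions, so check the ASR and TTS references for the real parameter names before use.

```python
import json
import urllib.request
import uuid

ASR_URL = "https://asr.shunyalabs.ai/v1/audio/transcriptions"
TTS_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def build_tts_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    # Assumed JSON body: {"input": ..., "voice": ...}; field names are not confirmed here.
    body = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


def build_asr_request(api_key: str, audio: bytes, filename: str) -> urllib.request.Request:
    # Hand-built multipart/form-data body with one file part; "file" is an assumed field name.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        ASR_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send a request, pass it to `urllib.request.urlopen(req)` and read the response body. What that body contains (a JSON transcript or audio bytes) is defined in the API references, not assumed here.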
Streaming
Open a persistent WebSocket connection and exchange audio frames in real time as they're produced.
- ASR: wss://asr.shunyalabs.ai/ws
- TTS: wss://tts.shunyalabs.ai/ws/v1/audio/speech
Typical uses:
- Live transcription, voice agents, IVR.
- Voice agents and conversational AI requiring sub-second audio start.
- IVR and telephony pipelines.
- Real-time audio playback in applications.
- Any use case where audio must begin playing before synthesis of the full text is complete.
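A streaming client sends audio as a sequence of small binary frames rather than as one upload. The sketch below slices raw PCM into fixed-duration frames. The 16 kHz, 16-bit mono format and the 20 ms frame length are assumptions chosen for illustration, giving 16,000 × 2 × 0.020 = 640 bytes per frame; `pcm_frames` is a hypothetical helper.

```python
from typing import Iterator

# Assumed audio format: 16 kHz, 16-bit (2-byte) mono PCM.
# Check the ASR reference for the format the WebSocket endpoint actually expects.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20  # hypothetical frame duration


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-duration frames of raw PCM; the last frame may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000  # 640 bytes at 20 ms
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence = 32,000 bytes
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))  # 50 640
```

Each frame would then be sent as a binary message on the WebSocket, for example with a third-party client such as the `websockets` package. The connection handshake, authentication, and message schema for wss://asr.shunyalabs.ai/ws are specified in the ASR reference and are not assumed here.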
Source: Shunyalabs ASR Gateway API Reference (31 March), Base Call + WebSocket Streaming API; Shunyalabs TTS Developer Documentation v1.0 (March 2026), §2.1 + §3.1. Text reproduced verbatim.
For the full set of models, parameters, and connection details for each transport, see the ASR overview or the TTS overview.