What is Shunya?

Shunya Labs is a voice AI platform built for languages and deployments that most speech stacks ignore. One API covers speech-to-text, text-to-speech, translation, and voice agents, and you can run it in the cloud, on your own servers, or on a commodity CPU.

Three things that make Shunya different

Indic-first

Trained on Indian speech patterns, accents, and code-switching. Not English models with Indic fine-tuning bolted on.

CPU-compatible

Runs on commodity x86 hardware. Many configurations need no GPU for inference, which lowers infrastructure cost and enables air-gapped deployments.

Sovereign by default

Cloud, on-prem, or fully air-gapped. SOC 2, ISO 27001, HIPAA. Your audio is deleted once processing completes.

The stack, end to end

You compose a voice product by picking the pieces you need.

What each piece does

Component	Does	Best for
Zero STT	Transcribes audio to text. 204 languages. Streaming or batch.	Call centres, meeting notes, media captions, medical scribe.
Zero TTS	Generates natural speech in 23 Indic languages and English. 46 voices, 11 styles.	IVR, notifications, voice agents, audiobooks.
Vāķ Translate	Translates between any two Indian languages. 2,970 pairs.	Live conversation translation, multilingual content.
Intelligence layer	Intent, entity, sentiment, diarization, summarization, on top of transcripts.	Agent assist, analytics, compliance monitoring.
Voice agents	Composes the above into conversational workflows.	Automated support, debt collection, appointment booking.

How it runs

Same API, two deployment shapes. Pick what fits your compliance story:

Managed

Cloud API, fastest to try. Global endpoint, bearer-token auth, streaming over WebSocket. You pay per minute.

Self-hosted

On-prem, the same Triton containers inside your VPC. Data never leaves your network.
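
Since both shapes expose the same API, moving between them can be a matter of configuration rather than code changes. The sketch below illustrates that idea in Python. The `Deployment` class, the self-hosted hostname, and the choice to omit the bearer token on the self-hosted side are assumptions for illustration; only the managed host and the transcription path come from this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urljoin


@dataclass(frozen=True)
class Deployment:
    """Hypothetical holder for per-deployment settings."""
    asr_base: str
    token: Optional[str] = None

    def url(self, path: str) -> str:
        # Join the base URL and the API path with exactly one slash between them.
        return urljoin(self.asr_base.rstrip("/") + "/", path.lstrip("/"))

    def headers(self) -> Dict[str, str]:
        # Bearer-token auth is documented for the managed API; whether a
        # self-hosted install uses it is an assumption left to configuration.
        return {"Authorization": f"Bearer {self.token}"} if self.token else {}


MANAGED = Deployment("https://asr.shunyalabs.ai", token="YOUR_API_KEY")
SELF_HOSTED = Deployment("https://asr.internal.example")  # hypothetical in-VPC host

for deployment in (MANAGED, SELF_HOSTED):
    print(deployment.url("/v1/audio/transcriptions"))
```

Application code that only ever calls `url()` and `headers()` would not need to know which shape it is talking to.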

Why "Shunya"?

Shunya (शून्य) is the Sanskrit word behind the modern concept of zero, a South Asian mathematical contribution that rewired the world. The voice stack carries the same spirit: one primitive that lets a lot of things exist.

Batch or Streaming?

Both Shunya speech APIs, Speech-to-Text and Text-to-Speech, work in two transport modes. Same models, same voices, different transport. Pick the one that fits how your audio arrives or needs to be played back.

Batch

HTTP POST, one request, complete response

Send via HTTP POST and receive a complete result in a single response.

Endpoints

ASR: POST https://asr.shunyalabs.ai/v1/audio/transcriptions
TTS: POST https://tts.shunyalabs.ai/v1/audio/speech
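
As a rough illustration of the batch shape, the Python sketch below builds (but does not send) a single POST to the TTS endpoint using only the standard library. The JSON body and its `text` and `voice` field names are assumptions, not taken from the API reference; the URL and bearer-token auth are the ones described on this page.

```python
import json
import urllib.request
from typing import Optional

TTS_BATCH_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def build_tts_request(text: str, token: str,
                      voice: Optional[str] = None) -> urllib.request.Request:
    """Build one batch TTS request: one POST in, one complete response out."""
    payload = {"text": text}          # assumed field name
    if voice is not None:
        payload["voice"] = voice      # assumed field name
    return urllib.request.Request(
        TTS_BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed body encoding
        },
        method="POST",
    )


req = build_tts_request("Your order has shipped.", token="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. A batch ASR call would follow the same one-request pattern against the transcription endpoint, with the audio upload encoded as the API reference specifies.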

Best for (ASR)

Uploaded files, post-processing, async jobs.

Best for (TTS)

Pre-rendered voice prompts for IVR and telephony systems.
Notification audio, order updates, alerts, reminders.
Podcast, audiobook, and long-form content generation.
Any use case where audio does not need to start playing before synthesis is complete.

Streaming

WebSocket, audio chunks flow in real time

Open a persistent WebSocket connection and exchange audio frames in real time as they're produced.

Endpoints

ASR: wss://asr.shunyalabs.ai/ws
TTS: wss://tts.shunyalabs.ai/ws/v1/audio/speech
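
The streaming side can be sketched as a loop that cuts audio into small frames and pushes each one over an open socket. The example below assumes 16 kHz, 16-bit mono PCM sent as 20 ms binary frames; those audio parameters, the helper names, and the idea that frames are sent as raw binary messages are all assumptions, and the real protocol may require a handshake or configuration message first.

```python
import asyncio
from typing import Iterator

ASR_WS_URL = "wss://asr.shunyalabs.ai/ws"


def pcm_frames(audio: bytes, sample_rate: int = 16000,
               frame_ms: int = 20, sample_width: int = 2) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration frames; the last one may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


async def stream_audio(ws, audio: bytes) -> int:
    """Send each frame over any object with an async send(); return the count."""
    sent = 0
    for frame in pcm_frames(audio):
        await ws.send(frame)
        sent += 1
    return sent


# One second of silence at the assumed format splits into 50 frames of 640 bytes.
print(len(list(pcm_frames(b"\x00" * 32000))))
```

`stream_audio` accepts any connection object with an awaitable `send`, so a WebSocket client connected to `ASR_WS_URL` can be passed in directly, and a stub can stand in for it during tests.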

Best for (ASR)

Live transcription, voice agents, IVR.

Best for (TTS)

Voice agents and conversational AI requiring sub-second audio start.
IVR and telephony pipelines.
Real-time audio playback in applications.
Any use case where audio must begin playing before synthesis of the full text is complete.
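
The choice between the two transports reduces to one question: does audio need to flow before the whole input or output exists? A minimal Python sketch of that decision follows. The `pick_endpoint` helper and its `realtime` flag are hypothetical; the four URLs are the ones listed above.

```python
ENDPOINTS = {
    ("asr", "batch"): "https://asr.shunyalabs.ai/v1/audio/transcriptions",
    ("asr", "streaming"): "wss://asr.shunyalabs.ai/ws",
    ("tts", "batch"): "https://tts.shunyalabs.ai/v1/audio/speech",
    ("tts", "streaming"): "wss://tts.shunyalabs.ai/ws/v1/audio/speech",
}


def pick_endpoint(service: str, realtime: bool) -> str:
    """Return the endpoint for a service ("asr" or "tts") and latency need."""
    mode = "streaming" if realtime else "batch"
    return ENDPOINTS[(service, mode)]


# A voice agent needs audio to start early; an uploaded recording does not.
print(pick_endpoint("tts", realtime=True))
print(pick_endpoint("asr", realtime=False))
```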

Source: Shunyalabs ASR Gateway API Reference (31 March), Base Call + WebSocket Streaming API; Shunyalabs TTS Developer Documentation v1.0 (March 2026), §2.1 + §3.1. Text reproduced verbatim.

For the full set of models, parameters, and connection details for each transport, jump to ASR overview or TTS overview.