What is Shunya?

Shunya Labs is a voice AI platform built for languages and deployments that most speech stacks ignore. One API covers speech-to-text, text-to-speech, translation, and voice agents, and you can run it in the cloud, on your own servers, or on a commodity CPU.

Three things that make Shunya different

Indic-first

Trained on Indian speech patterns, accents, and code-switching. Not English models with Indic fine-tuning bolted on.

CPU-compatible

Runs on commodity x86 hardware. Many configurations need no GPU for inference, which lowers infrastructure cost and makes air-gapped deployments practical.

Sovereign by default

Cloud, on-prem, or fully air-gapped. SOC 2, ISO 27001, HIPAA. Your audio is deleted on completion.

The stack, end to end

You compose a voice product by picking the pieces you need:

What each piece does

Component | Does | Best for
Zero STT | Transcribes audio to text. 204 languages. Streaming or batch. | Call centres, meeting notes, media captions, medical scribe.
Zero TTS | Generates natural speech in 23 Indic languages and English. 46 voices, 11 styles. | IVR, notifications, voice agents, audiobooks.
Vāķ Translate | Translates between any two Indian languages. 2,970 pairs. | Live conversation translation, multilingual content.
Intelligence layer | Intent, entity, sentiment, diarization, summarization, on top of transcripts. | Agent assist, analytics, compliance monitoring.
Voice agents | Composes the above into conversational workflows. | Automated support, debt collection, appointment booking.

How it runs

Same API, two deployment shapes. Pick what fits your compliance story:

Managed

Cloud API, fastest to try. Global endpoint, bearer-token auth, streaming over WebSocket. You pay per minute.

Self-hosted

On-prem: the same Triton containers that serve the managed API, running inside your VPC. Data never leaves your network.

Why "Shunya"?
Shunya (शून्य) is the Sanskrit word behind the modern concept of zero, a South Asian mathematical contribution that rewired the world. The voice stack carries the same spirit: one primitive that lets a lot of things exist.

Batch or Streaming?

Every Shunya API, Speech-to-Text and Text-to-Speech, works in two transport modes. Same models, same voices, different transport. Pick the one that fits how your audio arrives.

Batch
HTTP POST, one request, complete response

Send via HTTP POST and receive a complete result in a single response.

ASR
POST https://asr.shunyalabs.ai/v1/audio/transcriptions
TTS
POST https://tts.shunyalabs.ai/v1/audio/speech
  • Uploaded files, post-processing, async jobs.
  • Pre-rendered voice prompts for IVR and telephony systems.
  • Notification audio, order updates, alerts, reminders.
  • Podcast, audiobook, and long-form content generation.
  • Any use case where audio does not need to start playing before synthesis is complete.
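As a rough illustration of the batch shape, the sketch below builds a single HTTP POST against the TTS endpoint listed above, using bearer-token auth. The JSON field names (`input`, `voice`), the helper names, and the assumption that the response body is raw audio bytes are all hypothetical and are not taken from the Shunya reference; check the TTS overview for the real request schema.

```python
import json
import urllib.request

# Batch endpoint as listed in this page.
TTS_BATCH_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def build_tts_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) one batch TTS request.

    ASSUMPTION: the payload fields "input" and "voice" are placeholders,
    not the documented schema.
    """
    payload = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        TTS_BATCH_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer-token auth
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key: str, text: str, voice: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the complete response body.

    ASSUMPTION: the body is the finished audio; the real API may wrap it.
    """
    request = build_tts_request(api_key, text, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Because the whole result arrives in one response, the caller can write the returned bytes straight to a file or cache, which is what makes this mode a natural fit for pre-rendered prompts and notifications.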
Streaming
WebSocket, audio chunks flow in real time

Open a persistent WebSocket connection and exchange audio frames in real time as they're produced.

ASR
wss://asr.shunyalabs.ai/ws
TTS
wss://tts.shunyalabs.ai/ws/v1/audio/speech
  • Live transcription, voice agents, IVR.
  • Voice agents and conversational AI requiring sub-second audio start.
  • IVR and telephony pipelines.
  • Real-time audio playback in applications.
  • Any use case where audio must begin playing before synthesis of the full text is complete.
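The streaming shape can be sketched in the same spirit: split audio into small frames and push them over an open WebSocket as they become available. The audio format here (16 kHz, 16-bit mono PCM), the 20 ms frame size, and the helper names are assumptions for illustration; this page does not specify them, and any handshake or authentication messages the real API expects are not shown. The `ws` argument is any connected client object that exposes an async `send` method.

```python
import asyncio
from typing import Iterator

# Streaming ASR endpoint as listed in this page.
ASR_STREAM_URL = "wss://asr.shunyalabs.ai/ws"


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16000,  # ASSUMPTION: 16 kHz input
    frame_ms: int = 20,        # ASSUMPTION: 20 ms per frame
    sample_width: int = 2,     # ASSUMPTION: 16-bit mono samples
) -> Iterator[bytes]:
    """Yield fixed-duration frames; the last frame may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


async def stream_audio(ws, pcm: bytes, frame_ms: int = 20) -> int:
    """Send audio frame by frame, paced at roughly real time.

    Returns the number of frames sent.
    """
    sent = 0
    for frame in pcm_frames(pcm, frame_ms=frame_ms):
        await ws.send(frame)
        sent += 1
        # Sleep for one frame duration to imitate a live microphone feed.
        await asyncio.sleep(frame_ms / 1000)
    return sent
```

With the third-party `websockets` package, for example, the connection returned by `websockets.connect(ASR_STREAM_URL)` could be passed in as `ws`. A real client would also read partial transcripts from the same socket concurrently rather than only sending.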

Source: Shunyalabs ASR Gateway API Reference (31 March), Base Call + WebSocket Streaming API; Shunyalabs TTS Developer Documentation v1.0 (March 2026), §2.1 + §3.1. Text reproduced verbatim.

For the full set of models, parameters, and connection details for each transport, jump to ASR overview or TTS overview.