Deployment options

Same API surface, two deployment shapes. Pick the one that fits your compliance, latency, and operating-cost constraints. You can switch later without rewriting your calling code; only the client's endpoint configuration changes.

The two shapes

Fastest to start

Cloud API

Managed endpoints at asr.shunyalabs.ai and tts.shunyalabs.ai. Bearer token auth. Auto-scaled, with a 99.9% uptime SLA (99.95% on Enterprise).

Pick this when: prototypes, SaaS products, or workloads where latency over the WAN is acceptable.

Read the Cloud API guide →

Data residency

On-prem / VPC

Same Triton containers inside your network. Data never leaves the VPC. Custom pricing.

Pick this when: regulated industries (BFSI, healthcare, government), strict data residency, or predictable load.

Read the On-prem guide →

Cloud API

The fastest path. Sign up → get API key → start calling. All features available, including the intelligence layer.

Endpoints

ASR batch:     https://asr.shunyalabs.ai/v1/audio/transcriptions
ASR streaming: wss://asr.shunyalabs.ai/ws
TTS batch:     https://tts.shunyalabs.ai/v1/audio/speech
TTS streaming: wss://tts.shunyalabs.ai/ws/v1/audio/speech
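A minimal sketch of attaching the Bearer token to a batch request using only the Python standard library. The request body and content type are passed in by the caller because the upload schema isn't described on this page; the helper `build_batch_request` and the `SHUNYA_API_KEY` environment variable are illustrative assumptions, not part of the SDK.

```python
import os
import urllib.request

ASR_BATCH_URL = "https://asr.shunyalabs.ai/v1/audio/transcriptions"


def build_batch_request(
    url: str, api_key: str, body: bytes, content_type: str
) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying a Bearer token.

    `body` and `content_type` come from the caller because the upload
    schema is an assumption here; take it from the Cloud API guide.
    """
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


# Hypothetical variable name: read the key from the environment, not from source.
api_key = os.environ.get("SHUNYA_API_KEY", "your-api-key")
req = build_batch_request(
    ASR_BATCH_URL, api_key, b"<encoded request body>", "application/octet-stream"
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```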

SLA & regions

Default region: Asia (Mumbai).
Enterprise customers can request US or EU endpoints.
99.9% uptime on the standard tier; 99.95% on Enterprise.
Planned maintenance windows are announced 14 days in advance.
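To see what the two uptime figures mean in practice, the downtime budget can be worked out directly. This sketch assumes a 30-day measurement window; the actual SLA window, and whether announced maintenance counts against it, are not specified here.

```python
def allowed_downtime_minutes(sla_percent: float, days: float = 30.0) -> float:
    """Maximum downtime, in minutes, permitted by an uptime SLA over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for tier, sla in [("standard", 99.9), ("Enterprise", 99.95)]:
    print(f"{tier}: {allowed_downtime_minutes(sla):.1f} min per 30-day month")
```

This prints 43.2 minutes for the standard tier and 21.6 minutes for Enterprise.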

On-prem / VPC

Container parity: the same Triton containers that power the cloud, running on infrastructure you operate.

What you get

Triton Inference Server containers for ASR, TTS, and Vāk (translation)
REST and WebSocket edge services with the same API contract
Management plane for API keys, quotas, and monitoring
Helm charts (Kubernetes) or docker-compose for smaller setups
Update channel: you control the cadence
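When operating the containers yourself, a readiness check is useful before routing traffic or after applying an update. Triton exposes an HTTP readiness endpoint, `/v2/health/ready`, as part of the KServe v2 protocol. This sketch assumes your deployment makes Triton's HTTP port reachable inside the cluster; the function names are illustrative.

```python
import http.client
import urllib.request


def readiness_url(base_url: str) -> str:
    """Triton's HTTP readiness endpoint (KServe v2 protocol) for a server at base_url."""
    return base_url.rstrip("/") + "/v2/health/ready"


def triton_is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if Triton answers its readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (non-2xx), refused connections and timeouts all
        # subclass OSError; malformed responses raise HTTPException.
        return False
```

For example, `triton_is_ready("http://triton-asr.internal:8000")` uses a hypothetical in-cluster hostname; 8000 is Triton's default HTTP port. On Kubernetes, a readiness probe can target the same path directly instead.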

Sizing

Hardware requirements depend on concurrency, codec, audio length, NLP add-ons, and Triton configuration. Shunya doesn't publish a sizing matrix; contact Shunya Labs for a sizing exercise before specifying hardware.
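A sizing exercise starts from expected concurrency. You can estimate it from your own traffic figures with Little's law (L = λW). This is a generic queueing rule, not a Shunya sizing formula, and it assumes each stream occupies capacity for roughly the length of its audio (real-time streaming). The result is an hourly average, so plan headroom for bursts within the hour.

```python
import math


def peak_concurrent_streams(peak_requests_per_hour: float, avg_audio_seconds: float) -> int:
    """Estimate in-flight streams at peak with Little's law, L = lambda * W.

    lambda is the arrival rate and W is how long each request occupies the
    service, assumed here to equal its audio length. Rounded up to a whole stream.
    """
    # Multiply before dividing so exact cases don't pick up float error.
    return math.ceil(peak_requests_per_hour * avg_audio_seconds / 3600)


# 1,200 calls in the busiest hour, averaging 3 minutes each.
print(peak_concurrent_streams(1200, 180))  # 60
```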

Network topology

SDK pointed at self-hosted

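# AsyncShunyaClient comes from the Shunya Python SDK (its import is not shown on this page).
client = AsyncShunyaClient(
    api_key="your-api-key",  # key issued by your self-hosted management plane
    tts_url="https://voice.yourcompany.internal",  # REST edge service inside your network
    tts_ws_url="wss://voice.yourcompany.internal/ws",  # WebSocket edge service for streaming
)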
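To switch between cloud and self-hosted through configuration alone, the constructor arguments can be read from the environment. In this sketch the `SHUNYA_*` variable names are hypothetical. It also assumes that leaving `tts_url` and `tts_ws_url` unset makes the SDK fall back to the Cloud API endpoints.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical environment variables mapped to AsyncShunyaClient keyword arguments.
_URL_VARS = {
    "SHUNYA_TTS_URL": "tts_url",
    "SHUNYA_TTS_WS_URL": "tts_ws_url",
}


def shunya_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect AsyncShunyaClient keyword arguments from environment variables.

    URL overrides are included only when set, so an unset variable leaves
    the SDK's default endpoint in place (assumed to be the Cloud API).
    """
    env = os.environ if env is None else env
    kwargs = {"api_key": env["SHUNYA_API_KEY"]}
    for var, kwarg in _URL_VARS.items():
        if var in env:
            kwargs[kwarg] = env[var]
    return kwargs
```

Usage: `client = AsyncShunyaClient(**shunya_client_kwargs())`. The calling code stays the same in both environments, and only the deployment's environment differs.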

The only models of their kind on Hugging Face

For environments with no internet connectivity: defence, offline kiosks, mobile edge, air-gapped labs. Open weights from Shunya's Hugging Face organization run on commodity x86 hardware.

Pull models from Hugging Face (on an internet-connected machine)

huggingface-cli login
huggingface-cli download shunyalabs/pingala-v1-universal --local-dir ./models/pingala
huggingface-cli download shunyalabs/vak-translate-1.3b-ct2 --local-dir ./models/vak
huggingface-cli download shunyalabs/zero-stt-hinglish --local-dir ./models/hinglish

Transfer & run

Package the ./models directory as a signed tarball and record its checksums.
Transfer it to the secure zone on signed media and verify the checksums on arrival.
Load the models from the local path:

from pingala_shunya import PingalaTranscriber

# Load weights from the directory copied into the secure zone.
tx = PingalaTranscriber(model_path="./models/pingala")
segments = tx.transcribe("meeting.wav")
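The packaging and checksum steps can be scripted with the Python standard library. This is a minimal sketch that assumes a plain, uncompressed tar, since model weights compress poorly. `package_models` and `verify_tarball` are illustrative names. Signing the tarball, for example with GPG or your organization's signing tooling, is left out.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB weights needn't fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_models(models_dir: Path, tarball: Path) -> str:
    """Tar models_dir, write `<tarball>.sha256` next to it, and return the digest."""
    with tarfile.open(tarball, "w") as tar:
        tar.add(models_dir, arcname=models_dir.name)
    digest = sha256_of(tarball)
    # "<hex>  <filename>" is the layout `sha256sum -c` understands.
    Path(str(tarball) + ".sha256").write_text(f"{digest}  {tarball.name}\n")
    return digest


def verify_tarball(tarball: Path) -> bool:
    """Recompute the digest inside the secure zone and compare with the shipped file."""
    expected = Path(str(tarball) + ".sha256").read_text().split()[0]
    return sha256_of(tarball) == expected
```

Run `package_models(Path("models"), Path("models.tar"))` on the connected machine, ship both `models.tar` and `models.tar.sha256`, then call `verify_tarball(Path("models.tar"))` before extracting.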

Production serving in an air-gapped zone

If you need production-quality serving (multi-tenant, streaming, horizontally scaled) in the secure zone, the on-prem package is easier than rolling your own Triton config. Contact Shunya Labs.

License reminder

Model	License	Constraints
`pingala-v1-universal`	RAIL-M	Free up to 10,000 hours/month. No redistribution, no derivatives. Attribution required if outputs are published.
`zero-stt-hinglish`	OpenRAIL	Permissive, with responsible-use restrictions.
`vak-translate-1.3b-ct2`	CC-BY-SA-4.0	Attribution and share-alike: derivative works must carry the same license.
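For `pingala-v1-universal`, monthly usage above 10,000 hours falls outside the free allowance. The sketch below tracks that allowance from your own logs. It assumes the limit counts hours of audio processed per calendar month, so confirm how usage is measured against the license text.

```python
FREE_HOURS_PER_MONTH = 10_000  # pingala-v1-universal free allowance (RAIL-M row above)


def audio_hours(durations_seconds) -> float:
    """Total audio duration in hours, from per-file durations in seconds."""
    return sum(durations_seconds) / 3600


def remaining_free_hours(durations_seconds, limit_hours: float = FREE_HOURS_PER_MONTH) -> float:
    """Hours left this month before the free limit; a negative value means over it."""
    return limit_hours - audio_hours(durations_seconds)
```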