Deployment options
Same API surface, two deployment shapes. Pick what fits your compliance, latency, and operating-cost constraints. You can switch later without changing a line of calling code.
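One way to keep that switch out of calling code is to read endpoint overrides from configuration. The sketch below is a minimal illustration, not part of the SDK: the environment variable names are hypothetical, the `tts_url` and `tts_ws_url` parameter names are taken from the self-hosted SDK snippet later in this guide, and it assumes the client falls back to the managed cloud endpoints when no override is passed.

```python
import os
from typing import Mapping

# Hypothetical environment variable names mapped to the client parameters
# shown in the self-hosted SDK snippet in this guide.
OVERRIDES = {
    "SHUNYA_TTS_URL": "tts_url",
    "SHUNYA_TTS_WS_URL": "tts_ws_url",
}


def client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build client keyword arguments from the environment.

    Assumption: when no override is set, the SDK targets the cloud endpoints.
    """
    kwargs = {"api_key": env.get("SHUNYA_API_KEY", "your-api-key")}
    for env_name, param in OVERRIDES.items():
        value = env.get(env_name)
        if value:  # skip unset or empty variables
            kwargs[param] = value
    return kwargs


# Calling code stays identical in both deployments, for example:
#   client = AsyncShunyaClient(**client_kwargs())
```

With this shape, moving from cloud to self-hosted becomes a change to the environment rather than to application code.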
The two shapes
Cloud API: Managed endpoints at asr.shunyalabs.ai and tts.shunyalabs.ai. Bearer token auth. Auto-scaled, 99.9% SLA.
Pick when: prototypes, SaaS products, workloads where WAN latency is acceptable.
On-prem / VPC: The same Triton containers inside your network. Data never leaves the VPC. Custom pricing.
Pick when: regulated industries (BFSI, healthcare, government), strict data residency, predictable load.
Cloud API
The fastest path. Sign up → get API key → start calling. All features available, including the intelligence layer.
Endpoints
ASR batch: https://asr.shunyalabs.ai/v1/audio/transcriptions
ASR streaming: wss://asr.shunyalabs.ai/ws
TTS batch: https://tts.shunyalabs.ai/v1/audio/speech
TTS streaming: wss://tts.shunyalabs.ai/ws/v1/audio/speech
SLA & regions
- Default region: Asia (Mumbai).
- Enterprise customers can request US or EU endpoints.
- 99.9% uptime on standard tier; 99.95% on Enterprise.
- Planned maintenance windows announced 14 days in advance.
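To make the Cloud API path concrete, here is a minimal sketch of preparing an authenticated batch request with Python's standard library. The endpoint URLs are the ones listed under Endpoints. The `Authorization: Bearer` header follows the bearer token auth mentioned above. The `SHUNYA_API_KEY` variable name and the `build_request` helper are hypothetical, and the request body is left as a placeholder because its schema is not described in this guide.

```python
import os
import urllib.request

# Endpoint URLs copied from the Endpoints list in this guide.
ASR_BATCH_URL = "https://asr.shunyalabs.ai/v1/audio/transcriptions"
TTS_BATCH_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def build_request(url: str, api_key: str, body: bytes,
                  content_type: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# SHUNYA_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("SHUNYA_API_KEY", "your-api-key")

# Placeholder body: consult the Cloud API guide for the actual request schema.
req = build_request(TTS_BATCH_URL, api_key, b"{}", "application/json")

# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect headers in tests before any traffic leaves your machine.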
On-prem / VPC
Container parity: the exact same Triton containers that power the cloud, running on infrastructure you operate.
What you get
- Triton Inference Server containers for ASR, TTS, and Vāķ
- REST and WebSocket edge services with the same API contract
- Management plane for API keys, quotas, and monitoring
- Helm charts (Kubernetes) or docker-compose for smaller setups
- Update channel: you control the cadence
Sizing
Hardware requirements depend on concurrency, codec, audio length, NLP add-ons, and Triton configuration. Shunya doesn't publish a sizing matrix; contact Shunya Labs for a sizing exercise before specifying hardware.
Network topology
SDK pointed at self-hosted
# Point the SDK at your self-hosted edge services instead of the cloud endpoints.
client = AsyncShunyaClient(
    api_key="your-api-key",
    tts_url="https://voice.yourcompany.internal",
    tts_ws_url="wss://voice.yourcompany.internal/ws",
)
The only models of their kind on Hugging Face
For environments with no internet connectivity: defence, offline kiosks, mobile edge, air-gapped labs. Open weights from Shunya's Hugging Face organization run on commodity x86.
Pull models from Hugging Face (on an internet-connected machine)
huggingface-cli login
huggingface-cli download shunyalabs/pingala-v1-universal --local-dir ./models/pingala
huggingface-cli download shunyalabs/vak-translate-1.3b-ct2 --local-dir ./models/vak
huggingface-cli download shunyalabs/zero-stt-hinglish --local-dir ./models/hinglish
Transfer & run
- Package the ./models directory as a signed tarball.
- Verify checksums, then transfer via signed media to the secure zone.
- Load from local path:
from pingala_shunya import PingalaTranscriber
# Load the weights from the transferred directory; no network access is needed.
tx = PingalaTranscriber(model_path="./models/pingala")
segments = tx.transcribe("meeting.wav")
License reminder
| Model | License | Constraints |
|---|---|---|
| pingala-v1-universal | RAIL-M | Free up to 10,000 hours/month. No redistribution, no derivatives. Attribution if outputs public. |
| zero-stt-hinglish | openrail | Permissive, with responsible-use restrictions. |
| vak-translate-1.3b-ct2 | CC-BY-SA-4.0 | Share-alike, derivative works must carry the same license. |
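The free-tier limit in the pingala-v1-universal row can be tracked with simple arithmetic. The sketch below is illustrative only. It assumes "hours" means hours of audio transcribed in a calendar month and that the 10,000-hour limit is inclusive; neither detail is stated in the table, so check the license text before relying on it. The function names are hypothetical.

```python
from typing import Iterable

# Limit taken from the pingala-v1-universal row in the license table.
FREE_TIER_HOURS_PER_MONTH = 10_000


def monthly_hours(durations_seconds: Iterable[float]) -> float:
    """Total audio processed this month, converted from seconds to hours."""
    return sum(durations_seconds) / 3600.0


def within_free_tier(durations_seconds: Iterable[float],
                     limit_hours: float = FREE_TIER_HOURS_PER_MONTH) -> bool:
    """True if the month's audio volume is at or below the free-tier limit.

    Assumption: the limit counts audio hours and is inclusive.
    """
    return monthly_hours(durations_seconds) <= limit_hours


# Example: 4,000 half-hour recordings amount to 2,000 audio hours.
durations = [1800.0] * 4000
usage = monthly_hours(durations)
ok = within_free_tier(durations)
```

Feeding this from per-file durations logged at transcription time gives an early warning before usage crosses the licensed volume.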