Deployment options

Same API surface, two deployment shapes. Pick the one that fits your compliance, latency, and operating-cost constraints. You can switch later without changing your calling code; only the endpoint configuration changes.

The two shapes

Cloud API

The fastest path: sign up, get an API key, and start calling. All features are available, including the intelligence layer.

Endpoints

text
ASR batch:     https://asr.shunyalabs.ai/v1/audio/transcriptions
ASR streaming: wss://asr.shunyalabs.ai/ws
TTS batch:     https://tts.shunyalabs.ai/v1/audio/speech
TTS streaming: wss://tts.shunyalabs.ai/ws/v1/audio/speech
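The batch endpoint can also be called without the SDK. The sketch below uses only the Python standard library to build a multipart POST for the ASR batch URL, but does not send it. The Bearer `Authorization` header and the multipart field name `file` are assumptions modeled on common transcription APIs and are not confirmed by this page. Check the API reference for the actual auth scheme and any required fields, such as a model identifier. `build_transcription_request` is a hypothetical helper.

```python
import urllib.request
import uuid
from pathlib import Path

ASR_BATCH_URL = "https://asr.shunyalabs.ai/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str, audio_path: Path, url: str = ASR_BATCH_URL
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for batch transcription.

    ASSUMPTIONS: Bearer auth and a form field named "file"; verify both
    against the API reference before relying on this sketch.
    """
    boundary = uuid.uuid4().hex
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            (
                'Content-Disposition: form-data; name="file"; '
                f'filename="{audio_path.name}"\r\n'
            ).encode(),
            b"Content-Type: application/octet-stream\r\n\r\n",
            audio_path.read_bytes(),  # raw audio bytes as the file part
            f"\r\n--{boundary}--\r\n".encode(),  # closing boundary
        ]
    )
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen`. For a self-hosted deployment, pass your internal URL as `url`. Because the edge services keep the same API contract, the rest of the call stays the same.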

SLA & regions

  • Default region: Asia (Mumbai).
  • Enterprise customers can request US or EU endpoints.
  • Uptime: 99.9% on the standard tier; 99.95% on Enterprise.
  • Planned maintenance windows are announced 14 days in advance.
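The uptime percentages can be converted into a downtime budget. This calculation assumes the SLA is measured over a 30-day month. The actual measurement window and any exclusions, such as the announced maintenance windows, are not specified here, so treat the results as approximate. `allowed_downtime_minutes` is a hypothetical helper.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Downtime (in minutes) that an uptime percentage permits over `days` days."""
    return days * MINUTES_PER_DAY * (1 - uptime_pct / 100)


for tier, pct in [("Standard", 99.9), ("Enterprise", 99.95)]:
    print(f"{tier}: {allowed_downtime_minutes(pct):.1f} min per 30 days")
# Standard: 43.2 min per 30 days
# Enterprise: 21.6 min per 30 days
```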

On-prem / VPC

Container parity: the exact same Triton containers that power the cloud, running on infrastructure you operate.

What you get

  • Triton Inference Server containers for ASR, TTS, and Vāķ
  • REST and WebSocket edge services with the same API contract
  • Management plane for API keys, quotas, and monitoring
  • Helm charts (Kubernetes) or docker-compose for smaller setups
  • Update channel: you control the cadence

Sizing

Hardware requirements depend on concurrency, codec, audio length, NLP add-ons, and Triton configuration. Shunya doesn't publish a sizing matrix; contact Shunya Labs for a sizing exercise before specifying hardware.

Network topology

SDK pointed at self-hosted

python
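# AsyncShunyaClient comes from the Shunya Python SDK; see the SDK docs for the import path.
client = AsyncShunyaClient(
    api_key="your-api-key",  # key issued by your on-prem management plane
    tts_url="https://voice.yourcompany.internal",  # your internal TTS edge service
    tts_ws_url="wss://voice.yourcompany.internal/ws",  # WebSocket endpoint for streaming TTS
)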
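You can keep calling code identical across cloud and self-hosted deployments by reading the client settings from the environment. In this sketch, `client_kwargs_from_env` and the `SHUNYA_*` variable names are hypothetical. It also assumes that the SDK falls back to its cloud defaults when `tts_url` and `tts_ws_url` are omitted; this page does not confirm that behavior.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names mapped to the client keyword arguments.
_URL_VARS = (
    ("SHUNYA_TTS_URL", "tts_url"),
    ("SHUNYA_TTS_WS_URL", "tts_ws_url"),
)


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect AsyncShunyaClient keyword arguments from environment variables.

    Unset or empty URL variables are left out, so the SDK's own defaults
    (assumed to be the cloud endpoints) apply.
    """
    env = os.environ if env is None else env
    kwargs = {"api_key": env["SHUNYA_API_KEY"]}  # KeyError if the key is missing
    for env_name, kwarg in _URL_VARS:
        value = env.get(env_name)
        if value:
            kwargs[kwarg] = value
    return kwargs
```

With this helper, `client = AsyncShunyaClient(**client_kwargs_from_env())` stays the same whether the variables point at the cloud or at `voice.yourcompany.internal`.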

Air-gapped

Open weights from Shunya's Hugging Face organization (the only models of their kind on Hugging Face), running on commodity x86. Built for environments with no internet connectivity: defence, offline kiosks, mobile edge, and air-gapped labs.

Pull models from Hugging Face (on an internet-connected machine)

shell
huggingface-cli login
huggingface-cli download shunyalabs/pingala-v1-universal --local-dir ./models/pingala
huggingface-cli download shunyalabs/vak-translate-1.3b-ct2 --local-dir ./models/vak
huggingface-cli download shunyalabs/zero-stt-hinglish --local-dir ./models/hinglish

Transfer & run

  1. Package the ./models directory as a signed tarball.
  2. Verify checksums, transfer via signed media to the secure zone.
  3. Load from local path:
python
from pingala_shunya import PingalaTranscriber
tx = PingalaTranscriber(model_path="./models/pingala")
segments = tx.transcribe("meeting.wav")
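Steps 1 and 2 can be scripted with the Python standard library. This sketch writes a SHA-256 manifest for every file under `./models`, bundles the directory into a gzip tarball, and re-checks the manifest after transfer. The helper names and the JSON manifest format are illustrative assumptions. Signing the tarball and the media is left to your organization's signing tooling.

```python
import hashlib
import json
import tarfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weights never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(models_dir: Path, manifest_path: Path) -> dict[str, str]:
    """Record a SHA-256 for every file under models_dir, keyed by relative POSIX path."""
    manifest = {
        p.relative_to(models_dir).as_posix(): sha256_file(p)
        for p in sorted(models_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def package(models_dir: Path, tarball_path: Path) -> None:
    """Bundle models_dir into a gzip-compressed tarball (signing happens separately)."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        tar.add(models_dir, arcname=models_dir.name)


def verify_manifest(models_dir: Path, manifest_path: Path) -> list[str]:
    """Return manifest entries that are missing or whose hash differs.

    Extra files that are not in the manifest are not reported.
    """
    expected = json.loads(manifest_path.read_text())
    problems = []
    for rel, digest in expected.items():
        target = models_dir / rel
        if not target.is_file() or sha256_file(target) != digest:
            problems.append(rel)
    return problems
```

On the internet-connected machine, run `write_manifest` and `package`, and keep the manifest outside `./models`. After extracting in the secure zone, call `verify_manifest`. An empty list means every file matched. You can also record `sha256_file` of the tarball itself to check it before extraction.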

Production serving in an air-gapped zone

If you need production-quality serving (multi-tenant, streaming, horizontally scaled) in the secure zone, the on-prem package is easier than rolling your own Triton configuration. Contact Shunya Labs.

License reminder

| Model | License | Constraints |
|---|---|---|
| pingala-v1-universal | RAIL-M | Free up to 10,000 hours/month. No redistribution, no derivatives. Attribution required if outputs are made public. |
| zero-stt-hinglish | openrail | Permissive, with responsible-use restrictions. |
| vak-translate-1.3b-ct2 | CC-BY-SA-4.0 | Share-alike: derivative works must carry the same license. |
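For pingala-v1-universal, the free allowance is stated in hours per month. A rough way to track usage is to total the duration of the WAV files you transcribe. This assumes usage is counted as input audio duration, which this page does not confirm. The helper names are hypothetical, and the sketch handles only PCM WAV files that Python's `wave` module can read.

```python
import wave
from pathlib import Path
from typing import Iterable

# Free-use threshold for pingala-v1-universal, per the license table above.
FREE_HOURS_PER_MONTH = 10_000


def wav_hours(paths: Iterable[Path]) -> float:
    """Total duration of PCM WAV files in hours (frames / sample rate)."""
    seconds = 0.0
    for path in paths:
        with wave.open(str(path), "rb") as wf:
            seconds += wf.getnframes() / wf.getframerate()
    return seconds / 3600


def remaining_free_hours(used_hours: float, limit: float = FREE_HOURS_PER_MONTH) -> float:
    """Hours left in the monthly free allowance, floored at zero."""
    return max(limit - used_hours, 0.0)
```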