For teams that integrate at the protocol level
Integrate via REST & WebSocket
No SDK, no abstractions, just HTTP and WebSocket calls with a Bearer token. Pick this path when you need full control, are working in a language without an SDK, or are wiring Shunya into a low-level pipeline (telephony, embedded, custom runtime).
Your journey
Step 1: Authentication
All requests use Bearer-token authentication. Every HTTP request and WebSocket handshake must include an Authorization header.
```shell
Authorization: Bearer <your-api-key>
```
Generate the key from the dashboard (API Keys → Create New Key). Copy and store it securely; it will not be shown again.
- Never hardcode keys in source; use environment variables or a secrets manager (AWS Secrets Manager, GCP Secret Manager).
- Add `.env` to `.gitignore`.
- Rotate immediately if a key is compromised.
- Use separate keys per environment (dev / staging / prod) so you can revoke one without breaking others.
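As a minimal sketch of the guidance above, the header can be built from an environment variable at startup. The variable name `SHUNYALABS_API_KEY` matches the one used in the request examples on this page; the helper name `auth_headers` is hypothetical and not part of any Shunya library.

```python
import os


def auth_headers(env_var: str = "SHUNYALABS_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    `auth_headers` is an illustrative helper, not an official API.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"{env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing at startup when the variable is missing tends to be easier to debug than a 401 surfacing later in the request path.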
Step 2: Pick your endpoint
| Endpoint | Protocol | Use for |
|---|---|---|
| `POST https://tts.shunyalabs.ai/v1/audio/speech` | HTTP | Batch TTS, pre-rendered prompts, notifications, podcast/audiobook generation. |
| `wss://tts.shunyalabs.ai/ws/v1/audio/speech` | WebSocket | Streaming TTS, voice agents, IVR, real-time playback. |
| `GET https://tts.shunyalabs.ai/health` | HTTP | Health check; wire it into your deploy smoke tests. |
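A deploy smoke test against the health endpoint might look like the sketch below. It assumes that an HTTP 200 means the service is healthy and does not inspect the response body, since its schema is not described here. The function name and the injectable `opener` parameter are illustrative choices that make the check easy to test offline.

```python
import urllib.request

HEALTH_URL = "https://tts.shunyalabs.ai/health"  # from the endpoint table


def service_is_healthy(headers=None, url=HEALTH_URL, timeout=5.0,
                       opener=urllib.request.urlopen) -> bool:
    """Return True if GET <url> answers with HTTP 200.

    Assumption: status 200 means healthy; the body is ignored.
    """
    request = urllib.request.Request(url, headers=headers or {}, method="GET")
    try:
        with opener(request, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False
```

In a pipeline this could be called as `service_is_healthy({"Authorization": f"Bearer {key}"})`, failing the deploy when it returns `False`.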
Step 3: Make your first request
```shell
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zero-indic",
    "input": "नमस्ते, यह एक परीक्षण है।",
    "voice": "Kavita",
    "language": "hi",
    "response_format": "wav",
    "speed": 1.0,
    "trim_silence": true,
    "volume_normalization": "loudness"
  }' \
  --output output.wav
```

```python
import os

import requests

# Read the key from the environment rather than hardcoding it.
API_KEY = os.environ["SHUNYALABS_API_KEY"]

resp = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "zero-indic",
        "input": "Hello, how are you today?",
        "voice": "Varun",
        "response_format": "mp3",
    },
    timeout=120,  # long batch synthesis can take a while
)
resp.raise_for_status()

# The response body is the encoded audio.
with open("output.mp3", "wb") as f:
    f.write(resp.content)
```

```shell
# npm install -g wscat
wscat -c "wss://tts.shunyalabs.ai/ws/v1/audio/speech" \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"
# Send a synthesis request
> {"model": "zero-indic", "input": "Hello!", "voice": "Varun", "response_format": "pcm"}
# Server responds with chunk metadata, binary audio frames, then completion
< {"type": "chunk", "chunk_index": 0, "format": "pcm", "sample_rate": 16000}
< [binary audio bytes]
< {"type": "completion", "total_chunks": 3, "total_duration_seconds": 0.8}
```
Step 4: Request parameters that matter
| Parameter | Type | Default | Notes |
|---|---|---|---|
| `model` | string | required | Use `"zero-indic"` for all Indic + English text. |
| `input` / `target_text` | string | required | Text to synthesize (1-10,000 characters). |
| `voice` / `speaker_id` | string | required | Speaker name. 46 voices across 23 Indic languages; see the voice list. |
| `response_format` | string | `mp3` | `pcm`, `wav`, `mp3`, `ogg_opus`, `flac`, `mulaw`, `alaw`. |
| `speed` | float | `1.0` | 0.25 (slowest) → 4.0 (fastest). Pitch-preserving. |
| `language` | string | `null` | ISO 639 hint for text preprocessing (e.g. `"hi"`). Optional; the model handles mixed scripts natively. |
| `trim_silence` | bool | `false` | Trim leading/trailing silence (-40 dB threshold). |
| `volume_normalization` | string | `null` | `"peak"` (0 dBFS) or `"loudness"` (EBU R128). |
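The ranges in the table lend themselves to a client-side check before any request is sent. The sketch below is one possible shape, not an official validator. It assumes `input` / `target_text` and `voice` / `speaker_id` are interchangeable aliases, and that the 10,000-character limit counts Unicode code points. The function name is hypothetical.

```python
ALLOWED_FORMATS = ("pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw")
ALLOWED_NORMALIZATION = (None, "peak", "loudness")


def validate_speech_request(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    if not payload.get("model"):
        problems.append("model is required")

    # Assumption: either field name is accepted for the text.
    text = payload.get("input") or payload.get("target_text")
    if not isinstance(text, str) or not 1 <= len(text) <= 10_000:
        problems.append("input must be a string of 1-10,000 characters")

    if not (payload.get("voice") or payload.get("speaker_id")):
        problems.append("voice is required")

    if payload.get("response_format", "mp3") not in ALLOWED_FORMATS:
        problems.append("response_format is not a supported format")

    speed = payload.get("speed", 1.0)
    is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
    if not is_number or not 0.25 <= speed <= 4.0:
        problems.append("speed must be a number between 0.25 and 4.0")

    if payload.get("volume_normalization") not in ALLOWED_NORMALIZATION:
        problems.append('volume_normalization must be "peak" or "loudness"')
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller surface all issues in a single message.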
Picking a format
- `mp3`: general storage / delivery, widely supported.
- `pcm` or `wav`: real-time pipelines, no decoding overhead.
- `mulaw` / `alaw`: telephony (IVR, PSTN), 8 kHz.
- `ogg_opus`: web streaming, lower latency than MP3.
- `flac`: lossless archival. Avoid for streaming; full file assembly is required.
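When streaming `pcm`, the raw frames carry no header, so saving them for later playback means adding one. The sketch below consumes frames in the order shown in the Step 3 WebSocket session (JSON chunk metadata, binary audio, JSON completion) and wraps the audio in a WAV container using the standard library. It assumes 16-bit mono samples, which this page does not state; adjust `sample_width` and `channels` if the service differs. The function is transport-agnostic and takes any iterable of `str` and `bytes` frames.

```python
import io
import json
import wave


def frames_to_wav(frames, sample_width: int = 2, channels: int = 1) -> bytes:
    """Assemble WAV bytes from text (JSON) and binary (PCM) frames.

    Assumption: PCM is 16-bit mono unless the defaults are overridden.
    """
    sample_rate = None
    pcm = bytearray()
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            pcm.extend(frame)  # binary frame: raw audio
            continue
        message = json.loads(frame)  # text frame: control message
        if message.get("type") == "chunk":
            sample_rate = message.get("sample_rate", sample_rate)
        elif message.get("type") == "completion":
            break
    if sample_rate is None:
        raise ValueError("no chunk metadata with sample_rate received")

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()
```

For real-time playback you would hand each binary frame to the audio device as it arrives instead of buffering; this buffered version is only meant for saving a finished utterance.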
Step 5: Error handling
| Code | Meaning | What to do |
|---|---|---|
| 400 | Invalid request (missing fields, out-of-range values) | Validate inputs client-side. |
| 401 | Invalid or missing API key | Check SHUNYALABS_API_KEY and Authorization header. |
| 403 | API key lacks required permissions | Generate a new key with the right scope. |
| 422 | Invalid text or configuration | Check parameter types and ranges. |
| 429 | Rate limit hit | Exponential backoff; concurrent cap is 16 on default tier. |
| 5xx / 503 | Transient server / Triton not ready | Safe to retry with backoff. |
| 504 | Timeout (>300s batch, >30s per streaming chunk) | Split long inputs; check network. |
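One way to act on the table is to retry only the codes it marks as transient. The sketch below treats 429 and 5xx as retryable but deliberately excludes 504, since the table suggests splitting the input rather than resending it. The helper names, the base delay, the cap, and the attempt count are all illustrative values, not documented limits. `send` is any zero-argument callable returning an object with a `status_code` attribute, such as a `requests` response.

```python
import random
import time


def should_retry(status_code: int) -> bool:
    """429 and 5xx are retried; 504 is left to the caller (split the input)."""
    if status_code == 504:
        return False
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter=random.random) -> float:
    """Exponential delay for a 0-based attempt, capped, with full jitter."""
    return min(cap, base * 2 ** attempt) * jitter()


def call_with_retry(send, max_attempts: int = 4, sleep=time.sleep):
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        response = send()
        last_attempt = attempt == max_attempts - 1
        if not should_retry(response.status_code) or last_attempt:
            return response
        sleep(backoff_delay(attempt))
```

Jitter spreads retries from many clients over time, which helps when a rate limit or a cold backend affects all of them at once.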
Step 6: Ship checklist
- ✅ API key loaded from environment, never source-controlled
- ✅ HTTP client timeout set to ≥120s for long batch synthesis
- ✅ Errors mapped to user-facing messages (401, 429, 5xx)
- ✅ `response_format` matched to use case (PCM/μ-law for real-time, MP3 for storage)
- ✅ Reconnect + exponential backoff on WebSocket drops
- ✅ `GET /health` wired into deploy smoke tests
- ✅ `X-Request-Id` logged for support debugging
Want less boilerplate?
The Python SDK wraps these endpoints with typed config, async streaming generators, and built-in error classes. Same protocol, fewer lines.