For teams that integrate at the protocol level

Integrate via REST & WebSocket

No SDK, no abstractions, just HTTP and WebSocket calls with a Bearer token. Pick this path when you need full control, are working in a language without an SDK, or are wiring Shunya into a low-level pipeline (telephony, embedded, custom runtime).

Your journey

Step 1: Authentication

All requests use Bearer-token authentication. Every HTTP request and WebSocket handshake must include an Authorization header.

Authorization: Bearer <your-api-key>

Generate the key from the dashboard (API Keys → Create New Key). Copy it and store it securely; it will not be shown again.

Never hardcode keys in source. Use environment variables or a secrets manager (AWS Secrets Manager, GCP Secret Manager).
Add .env to .gitignore.
Rotate immediately if a key is compromised.
Use separate keys per environment (dev / staging / prod) so you can revoke one without breaking others.
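The key-handling rules above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the helper names `load_api_key` and `auth_headers` are hypothetical, and the environment variable name `SHUNYALABS_API_KEY` is the one used in the request examples on this page.

```python
import os


def load_api_key(env_var: str = "SHUNYALABS_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or load it from your secrets manager"
        )
    return key


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the header sent with every HTTP request and WebSocket handshake."""
    return {"Authorization": f"Bearer {api_key}"}
```

Calling `auth_headers(load_api_key())` once at startup surfaces a missing key before the first request, and using a different value per environment keeps dev, staging, and prod keys separate.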

Step 2: Pick your endpoint

Endpoint	Protocol	Use for
`POST https://tts.shunyalabs.ai/v1/audio/speech`	HTTP	Batch TTS, pre-rendered prompts, notifications, podcast/audiobook generation.
`wss://tts.shunyalabs.ai/ws/v1/audio/speech`	WebSocket	Streaming TTS, voice agents, IVR, real-time playback.
`GET https://tts.shunyalabs.ai/health`	HTTP	Health check. Wire it into your deploy smoke tests.
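A deploy smoke test against the health endpoint might look like the sketch below. It uses only the standard library. Two assumptions are not confirmed by this page: that HTTP 200 means healthy, and that nothing in the response body needs checking. The function name `service_is_healthy` and the injectable `opener` parameter are hypothetical conveniences for testing.

```python
import urllib.request

HEALTH_URL = "https://tts.shunyalabs.ai/health"


def service_is_healthy(api_key=None, url=HEALTH_URL, timeout=5.0,
                       opener=urllib.request.urlopen) -> bool:
    """Return True when the health endpoint answers with HTTP 200 (assumed healthy)."""
    # Send the Bearer token when one is supplied, matching the other endpoints.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with opener(request, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False
```

A smoke test can then fail the deploy when `service_is_healthy(api_key)` returns `False`.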

Step 3: Make your first request

shell

curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zero-indic",
    "input": "नमस्ते, यह एक परीक्षण है।",
    "voice": "Kavita",
    "language": "hi",
    "response_format": "wav",
    "speed": 1.0,
    "trim_silence": true,
    "volume_normalization": "loudness"
  }' \
  --output output.wav

python

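import os

import requests

# Read the key from the environment rather than hardcoding it.
API_KEY = os.environ["SHUNYALABS_API_KEY"]

resp = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "zero-indic",
        "input": "Hello, how are you today?",
        "voice": "Varun",
        "response_format": "mp3",
    },
    timeout=120,  # seconds; long batch synthesis can be slow
)
resp.raise_for_status()  # raise on 4xx/5xx instead of saving an error body

# Save the returned audio bytes.
with open("output.mp3", "wb") as f:
    f.write(resp.content)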

shell

# npm install -g wscat
wscat -c "wss://tts.shunyalabs.ai/ws/v1/audio/speech" \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"

# Send a synthesis request
> {"model": "zero-indic", "input": "Hello!", "voice": "Varun", "response_format": "pcm"}

# Server responds with chunk metadata, binary audio frames, then completion
< {"type": "chunk", "chunk_index": 0, "format": "pcm", "sample_rate": 16000}
< [binary audio bytes]
< {"type": "completion", "total_chunks": 3, "total_duration_seconds": 0.8}
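On the client side, the message sequence shown above (JSON chunk metadata, binary audio, then a JSON completion) can be consumed roughly as follows. This sketch deliberately leaves out the WebSocket client itself: `frames` is any iterable of text and binary messages from whichever library you use. The function names are hypothetical, and the WAV wrapper assumes 16-bit mono PCM, which this page does not state.

```python
import io
import json
import wave
from typing import Iterable, Union

Frame = Union[str, bytes]


def collect_pcm(frames: Iterable[Frame]) -> tuple[bytes, int, dict]:
    """Assemble binary audio frames until the completion message arrives.

    Returns (pcm_bytes, sample_rate, completion_message).
    """
    audio = bytearray()
    sample_rate = None
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            audio.extend(frame)  # binary frame: raw audio bytes
            continue
        message = json.loads(frame)  # text frame: JSON metadata
        if message.get("type") == "chunk":
            sample_rate = message.get("sample_rate", sample_rate)
        elif message.get("type") == "completion":
            if sample_rate is None:
                raise ValueError("completion arrived before any chunk metadata")
            return bytes(audio), sample_rate, message
    raise ConnectionError("stream ended before a completion message")


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw PCM in a WAV container. Assumes 16-bit mono samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

For real-time playback you would hand each binary frame to the audio device as it arrives instead of buffering. Buffering is shown here only because it is the simplest way to verify the stream end to end.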

Step 4: Request parameters that matter

Parameter	Type	Default	Notes
`model`	string	required	Use `"zero-indic"` for all Indic + English text.
`input` / `target_text`	string	required	Text to synthesize (1-10,000 characters).
`voice` / `speaker_id`	string	required	Speaker name. 46 voices across 23 Indic languages; see the voice list.
`response_format`	string	`mp3`	`pcm`, `wav`, `mp3`, `ogg_opus`, `flac`, `mulaw`, `alaw`.
`speed`	float	1.0	0.25 (slowest) → 4.0 (fastest). Pitch-preserving.
`language`	string	null	ISO 639 hint for text preprocessing (e.g. `"hi"`). Optional; the model handles mixed scripts natively.
`trim_silence`	bool	false	Trim leading/trailing silence (-40 dB threshold).
`volume_normalization`	string	null	`"peak"` (0 dBFS) or `"loudness"` (EBU R128).
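The ranges in the table lend themselves to a client-side check before a request is sent. The sketch below is one possible validator, not part of any SDK. The function name `validate_request` is hypothetical, and it counts characters with Python's `len`, which may differ from how the server counts them.

```python
ALLOWED_FORMATS = ("pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw")
ALLOWED_NORMALIZATION = (None, "peak", "loudness")


def validate_request(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    if not payload.get("model"):
        problems.append("model is required")

    # The table lists input/target_text and voice/speaker_id as aliases.
    text = payload.get("input", payload.get("target_text"))
    if not isinstance(text, str) or not 1 <= len(text) <= 10_000:
        problems.append("input must be a string of 1-10,000 characters")

    voice = payload.get("voice", payload.get("speaker_id"))
    if not isinstance(voice, str) or not voice:
        problems.append("voice is required")

    if payload.get("response_format", "mp3") not in ALLOWED_FORMATS:
        problems.append("response_format must be one of: " + ", ".join(ALLOWED_FORMATS))

    speed = payload.get("speed", 1.0)
    if isinstance(speed, bool) or not isinstance(speed, (int, float)) \
            or not 0.25 <= speed <= 4.0:
        problems.append("speed must be a number between 0.25 and 4.0")

    if not isinstance(payload.get("trim_silence", False), bool):
        problems.append("trim_silence must be true or false")

    if payload.get("volume_normalization") not in ALLOWED_NORMALIZATION:
        problems.append('volume_normalization must be "peak", "loudness", or omitted')
    return problems
```

Rejecting a bad payload locally avoids a round trip that would otherwise end in a 400 or 422.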

Picking a format

mp3: general storage / delivery, widely supported.
pcm or wav: real-time pipelines, no decoding overhead.
mulaw / alaw: telephony (IVR, PSTN), 8 kHz.
ogg_opus: web streaming, lower latency than MP3.
flac: lossless archival. Avoid for streaming, because the full file must be assembled first.

Step 5: Error handling

Code	Meaning	What to do
400	Invalid request (missing fields, out-of-range values)	Validate inputs client-side.
401	Invalid or missing API key	Check `SHUNYALABS_API_KEY` and Authorization header.
403	API key lacks required permissions	Generate a new key with the right scope.
422	Invalid text or configuration	Check parameter types and ranges.
429	Rate limit hit	Retry with exponential backoff; the concurrency cap is 16 on the default tier.
5xx (including 503)	Transient server error, or Triton not ready	Safe to retry with backoff.
504	Timeout (>300s batch, >30s per streaming chunk)	Split long inputs; check network.
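One way to turn the table into retry logic is sketched below. It reflects one reading of the table rather than documented behaviour: 429 and 5xx responses are retried, 504 is excluded because the suggested remedy is to split the input, and all other 4xx codes are returned immediately. The `send` callable is hypothetical; it stands for whatever function performs one request and returns the HTTP status code. The jittered delay is a common backoff pattern, not something this page prescribes.

```python
import random
import time
from typing import Callable


def should_retry(status: int) -> bool:
    """Retry 429 and transient 5xx; treat 504 as a signal to split the input instead."""
    return status == 429 or (500 <= status < 600 and status != 504)


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def send_with_retry(send: Callable[[], int], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = send()
    for attempt in range(max_attempts - 1):
        if not should_retry(status):
            break
        sleep(backoff_delay(attempt))  # wait longer after each failure
        status = send()
    return status
```

The final status is returned rather than raised, so the caller decides how to report a request that still fails after the last attempt.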

Step 6: Ship checklist

✅ API key loaded from environment, never source-controlled
✅ HTTP client timeout set to ≥120s for long batch synthesis
✅ Errors mapped to user-facing messages (401, 429, 5xx)
✅ response_format matched to use case (PCM/μ-law for real-time, MP3 for storage)
✅ Reconnect + exponential backoff on WebSocket drops
✅ GET /health wired into deploy smoke tests
✅ X-Request-Id logged for support debugging
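Two of the checklist items, user-facing error messages and request ID logging, can be sketched together. The message wording, the logger name, and the function names are illustrative only. The sketch also assumes `X-Request-Id` arrives as a response header, which this page implies but does not state.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("shunya.tts")  # hypothetical logger name


def user_message(status: int) -> str:
    """Map an HTTP status to a message that is safe to show to end users."""
    if status == 401:
        return "Speech is unavailable because of a configuration problem."
    if status == 429:
        return "The speech service is busy. Please try again in a moment."
    if 500 <= status < 600:
        return "The speech service is temporarily unavailable."
    return "The speech request could not be completed."


def log_failure(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Log the failure with its request ID (if present) and return that ID."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    request_id = next(
        (value for name, value in headers.items() if name.lower() == "x-request-id"),
        None,
    )
    logger.warning("TTS request failed: status=%s request_id=%s", status, request_id)
    return request_id
```

Keeping the request ID in your logs but out of the user-facing message means you can quote it to support without exposing internals.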

Want less boilerplate?

The Python SDK wraps these endpoints with typed config, async streaming generators, and built-in error classes. Same protocol, fewer lines.