For teams that integrate at the protocol level

Integrate via REST & WebSocket

No SDK, no abstractions, just HTTP and WebSocket calls with a Bearer token. Pick this path when you need full control, are working in a language without an SDK, or are wiring Shunya into a low-level pipeline (telephony, embedded, custom runtime).

Your journey

Step 1: Authentication

All requests use Bearer-token authentication. Every HTTP request and WebSocket handshake must include an Authorization header.

http
Authorization: Bearer <your-api-key>

Generate a key from the dashboard (API Keys → Create New Key). Copy it and store it securely; it will not be shown again.

  • Never hardcode keys in source code; use environment variables or a secrets manager (AWS Secrets Manager, GCP Secret Manager).
  • If you keep keys in a local .env file, add .env to .gitignore.
  • Rotate immediately if a key is compromised.
  • Use separate keys per environment (dev / staging / prod) so you can revoke one without breaking others.
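A minimal sketch of the key-handling rules above, assuming the key is exposed as the SHUNYALABS_API_KEY environment variable (the name the curl examples on this page use). `auth_headers` is a hypothetical helper, not part of any Shunya SDK. It fails fast when the key is missing instead of letting the request come back as a 401.

```python
import os
from typing import Dict, Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the Authorization header from SHUNYALABS_API_KEY.

    Reads from os.environ by default; a mapping can be passed in for tests.
    Raises RuntimeError if the key is unset or blank.
    """
    env = os.environ if env is None else env
    key = env.get("SHUNYALABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SHUNYALABS_API_KEY is not set; export it or load it from your secrets manager"
        )
    return {"Authorization": f"Bearer {key}"}
```

Pass the result as the request headers for HTTP calls, or as the handshake headers of your WebSocket client. Never log the returned value, since it contains the key.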

Step 2: Pick your endpoint

| Endpoint | Protocol | Use for |
|---|---|---|
| POST https://tts.shunyalabs.ai/v1/audio/speech | HTTP | Batch TTS: pre-rendered prompts, notifications, podcast/audiobook generation. |
| wss://tts.shunyalabs.ai/ws/v1/audio/speech | WebSocket | Streaming TTS: voice agents, IVR, real-time playback. |
| GET https://tts.shunyalabs.ai/health | HTTP | Health check; wire it into your deploy smoke tests. |
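A smoke-test sketch for the health endpoint using only the standard library. It assumes /health answers HTTP 200 when the service is up; this page does not say whether the endpoint requires the Authorization header, so the optional `headers` argument lets you pass it if yours does. `health_ok` is a hypothetical name.

```python
import urllib.request
from typing import Mapping, Optional

HEALTH_URL = "https://tts.shunyalabs.ai/health"


def health_ok(
    url: str = HEALTH_URL,
    timeout: float = 5.0,
    headers: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if GET <url> answers HTTP 200 within `timeout` seconds."""
    req = urllib.request.Request(url, headers=dict(headers or {}), method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # HTTPError (non-2xx), URLError (DNS failure, refused connection) and
        # socket timeouts are all OSError subclasses: treat them as unhealthy.
        return False
```

In a deploy pipeline, call it from a short script and exit non-zero on failure, for example `sys.exit(0 if health_ok() else 1)`, so the step fails before traffic is shifted.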

Step 3: Make your first request

shell
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zero-indic",
    "input": "नमस्ते, यह एक परीक्षण है।",
    "voice": "Kavita",
    "language": "hi",
    "response_format": "wav",
    "speed": 1.0,
    "trim_silence": true,
    "volume_normalization": "loudness"
  }' \
  --output output.wav
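python
import os

import requests

API_KEY = os.environ["SHUNYALABS_API_KEY"]  # never hardcode the key

resp = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={  # json= also sets Content-Type: application/json
        "model": "zero-indic",
        "input": "Hello, how are you today?",
        "voice": "Varun",
        "response_format": "mp3",
    },
    timeout=120,  # long inputs can take a while to synthesize
)
resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
with open("output.mp3", "wb") as f:
    f.write(resp.content)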
shell
# npm install -g wscat
wscat -c "wss://tts.shunyalabs.ai/ws/v1/audio/speech" \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY"

# Send a synthesis request
> {"model": "zero-indic", "input": "Hello!", "voice": "Varun", "response_format": "pcm"}

# Server responds with chunk metadata, binary audio frames, then completion
< {"type": "chunk", "chunk_index": 0, "format": "pcm", "sample_rate": 16000}
< [binary audio bytes]
< {"type": "completion", "total_chunks": 3, "total_duration_seconds": 0.8}
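The transcript above implies a simple framing: JSON text frames carry metadata (`chunk`, `completion`) and binary frames carry audio. A minimal sketch of the receive-side bookkeeping, assuming exactly that framing; it works on frames you have already received, so it stays independent of any particular WebSocket library. `collect_stream` and `StreamResult` are hypothetical names, and message types other than the two shown are not documented here, so the sketch skips them.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass
class StreamResult:
    audio: bytes = b""
    sample_rate: Optional[int] = None
    chunks_seen: int = 0
    completion: Optional[dict] = None


def collect_stream(frames: Iterable[Union[str, bytes]]) -> StreamResult:
    """Assemble audio from frames received on the streaming WebSocket."""
    result = StreamResult()
    buf = bytearray()
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            # Binary frame: raw audio in the requested response_format.
            buf.extend(frame)
            continue
        msg = json.loads(frame)
        if msg.get("type") == "chunk":
            result.chunks_seen += 1
            result.sample_rate = msg.get("sample_rate", result.sample_rate)
        elif msg.get("type") == "completion":
            result.completion = msg
            break
        # Any other message type is undocumented on this page and skipped.
    if result.completion is None:
        raise ConnectionError("stream ended before a completion message")
    expected = result.completion.get("total_chunks")
    if expected is not None and expected != result.chunks_seen:
        raise ValueError(f"expected {expected} chunks, received {result.chunks_seen}")
    result.audio = bytes(buf)
    return result
```

With the `websockets` library, text frames arrive as `str` and binary frames as `bytes`, so the messages from the connection can be fed in as they arrive. Raising on a missing completion lets your reconnect logic treat a truncated stream as a drop.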

Step 4: Request parameters that matter

| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | required | Use "zero-indic" for all Indic and English text. |
| input / target_text | string | required | Text to synthesize (1-10,000 characters). |
| voice / speaker_id | string | required | Speaker name. 46 voices across 23 Indic languages; see the voice list. |
| response_format | string | mp3 | pcm, wav, mp3, ogg_opus, flac, mulaw, alaw. |
| speed | float | 1.0 | 0.25 (slowest) to 4.0 (fastest). Pitch-preserving. |
| language | string | null | ISO 639 language hint for text preprocessing (e.g. "hi"). Optional; the model handles mixed scripts natively. |
| trim_silence | bool | false | Trim leading and trailing silence (-40 dB threshold). |
| volume_normalization | string | null | "peak" (0 dBFS) or "loudness" (EBU R128). |
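Checking a payload against the limits in the table before sending it avoids a round trip that ends in a 400 or 422. A minimal sketch; `validate_request` is a hypothetical helper, and it assumes the 10,000-character limit counts Python string length (Unicode code points), which may differ from how the server counts Devanagari or other combining scripts.

```python
from typing import List

ALLOWED_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}
ALLOWED_NORMALIZATION = {None, "peak", "loudness"}


def validate_request(payload: dict) -> List[str]:
    """Return a list of problems with a speech payload (empty if none)."""
    problems = []
    if not payload.get("model"):
        problems.append("model is required")
    # input and voice accept the aliases target_text and speaker_id.
    text = payload.get("input") or payload.get("target_text")
    if not text:
        problems.append("input (or target_text) is required")
    elif len(text) > 10_000:
        problems.append(f"input is {len(text)} characters; the limit is 10,000")
    if not (payload.get("voice") or payload.get("speaker_id")):
        problems.append("voice (or speaker_id) is required")
    speed = payload.get("speed", 1.0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)) or not 0.25 <= speed <= 4.0:
        problems.append(f"speed must be a number in [0.25, 4.0], got {speed!r}")
    fmt = payload.get("response_format", "mp3")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported response_format: {fmt!r}")
    if payload.get("volume_normalization") not in ALLOWED_NORMALIZATION:
        problems.append("volume_normalization must be 'peak', 'loudness' or null")
    if not isinstance(payload.get("trim_silence", False), bool):
        problems.append("trim_silence must be a boolean")
    return problems
```

Collecting every problem, rather than stopping at the first, lets you surface all of them to the caller in a single message.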

Picking a format

  • mp3: general storage / delivery, widely supported.
  • pcm or wav: real-time pipelines, no decoding overhead.
  • mulaw / alaw: telephony (IVR, PSTN), 8 kHz.
  • ogg_opus: web streaming, lower latency than MP3.
  • flac: lossless archival. Avoid for streaming, since the full file has to be assembled first.
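If you request pcm for low latency but later need a playable file, you can add a WAV header yourself instead of re-requesting as wav. A minimal sketch with Python's standard `wave` module, assuming the raw PCM is signed 16-bit little-endian mono at the `sample_rate` reported in the stream's chunk metadata. This page does not state the sample width or channel count, so confirm both before relying on it; `pcm_to_wav` is a hypothetical name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) container and return the bytes.

    Defaults assume 16-bit (2-byte) mono audio; adjust to match the stream.
    """
    if len(pcm) % (channels * sample_width):
        raise ValueError("PCM length is not a whole number of frames")
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close file objects it did not open, so out is still readable.
    return out.getvalue()
```

The same approach does not apply to mulaw or alaw output, which needs a different WAV format code than the `wave` module writes.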

Step 5: Error handling

| Code | Meaning | What to do |
|---|---|---|
| 400 | Invalid request (missing fields, out-of-range values) | Validate inputs client-side. |
| 401 | Invalid or missing API key | Check SHUNYALABS_API_KEY and the Authorization header. |
| 403 | API key lacks required permissions | Generate a new key with the right scope. |
| 422 | Invalid text or configuration | Check parameter types and ranges. |
| 429 | Rate limit hit | Retry with exponential backoff; the default-tier concurrency cap is 16. |
| 5xx except 504 (e.g. 503) | Transient server error, or Triton not ready | Safe to retry with backoff. |
| 504 | Timeout (>300 s for batch, >30 s per streaming chunk) | Split long inputs; check your network. |
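A sketch of the retry policy the table describes: retry 429 and transient 5xx with exponential backoff and "full jitter", and return everything else to the caller. Treating 504 as non-retryable is a design choice here, since re-sending the same long input is likely to time out again. `send` is any callable you supply that returns `(status_code, body)`; the function names are hypothetical.

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """429 and 5xx are retryable, except 504 (split the input instead)."""
    return status == 429 or (500 <= status < 600 and status != 504)


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out.

    Before retry n (0-based) it sleeps a random time in
    [0, min(max_delay, base_delay * 2**n)] so clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")
```

To use it with the requests example in Step 3, wrap the `requests.post(...)` call in a small function that returns `(resp.status_code, resp.content)` and pass that function as `send`.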

Step 6: Ship checklist

  • ✅ API key loaded from environment, never source-controlled
  • ✅ HTTP client timeout set to ≥120s for long batch synthesis
  • ✅ Errors mapped to user-facing messages (401, 429, 5xx)
  • ✅ response_format matched to use case (PCM/μ-law for real-time, MP3 for storage)
  • ✅ Reconnect + exponential backoff on WebSocket drops
  • ✅ GET /health wired into deploy smoke tests
  • ✅ X-Request-Id logged for support debugging
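For the reconnect item, a minimal asyncio sketch. `session` is a coroutine function you supply that connects, sends the synthesis request and consumes the stream, raising `ConnectionError` when the socket drops. This page does not describe resuming a partially delivered stream, so the sketch assumes each reconnect re-sends the request from scratch. With the `websockets` library you would catch its `ConnectionClosed` inside `session` and re-raise it as `ConnectionError`. `run_with_reconnect` is a hypothetical name.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def run_with_reconnect(
    session: Callable[[], Awaitable[None]],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
) -> int:
    """Run session() and reconnect with jittered exponential backoff on drops.

    Returns how many reconnects were needed; re-raises the last error once
    max_retries reconnects have failed.
    """
    retries = 0
    while True:
        try:
            await session()
            return retries
        except (ConnectionError, OSError, asyncio.TimeoutError):
            if retries >= max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** retries)
            await asyncio.sleep(random.uniform(0, delay))
            retries += 1
```

Capping `max_retries` matters for voice agents: after a few failed reconnects it is usually better to surface an error to the caller than to keep a user waiting in silence.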
Want less boilerplate?
The Python SDK wraps these endpoints with typed config, async streaming generators, and built-in error classes. Same protocol, fewer lines.