Your quickstart

Get from nothing to a working transcription and a working synthesis in five minutes. You'll need an API key and a few lines of code per direction.

1. Get an API key

  1. Sign in at accounts.shunyalabs.ai.
  2. Navigate to API Keys and click Create New Key.
  3. Copy the key immediately; it's shown only once.
Keep the key out of source control
Store it in a .env file or a secrets manager. Add .env to .gitignore. Rotate the key if it leaks.

Set the environment variable

shell
export SHUNYALABS_API_KEY="sk-your-key-here"
powershell
$env:SHUNYALABS_API_KEY = "sk-your-key-here"
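
Before sending any request, it can help to fail fast when the variable is missing rather than get an authentication error later. A minimal sketch, assuming the variable name used above; `load_api_key` is a hypothetical helper, not part of any SDK:

```python
import os

def load_api_key(env=None, name="SHUNYALABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    load_api_key is an illustrative helper, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running this script")
    return key
```

Calling `load_api_key()` once at startup keeps the key lookup in one place, so the examples that follow could use its return value instead of reading `os.environ` directly.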

2. Transcribe an audio file

Send audio to POST /v1/audio/transcriptions. The response is a JSON object with the transcript and per-segment timestamps.

shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@meeting.wav" \
  -F "model=zero-indic"
python
import os, requests

with open("meeting.wav", "rb") as f:
    r = requests.post(
        "https://asr.shunyalabs.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
        files={"file": f},
        data={"model": "zero-indic"},
    )
r.raise_for_status()
print(r.json()["text"])
node
import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("meeting.wav")]), "meeting.wav");
form.append("model", "zero-indic");

const r = await fetch("https://asr.shunyalabs.ai/v1/audio/transcriptions", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SHUNYALABS_API_KEY}` },
  body: form,
});
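// fetch does not reject on HTTP errors, so check the status explicitly.
if (!r.ok) throw new Error(`Transcription failed: ${r.status} ${await r.text()}`);
const data = await r.json();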
console.log(data.text);
python
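import os
from openai import OpenAI

# This example uses the OpenAI SDK pointed at the Shunyalabs base URL.
client = OpenAI(
    api_key=os.environ["SHUNYALABS_API_KEY"],
    base_url="https://asr.shunyalabs.ai/v1",
)
# Open the file in a context manager so the handle is closed after the upload.
with open("meeting.wav", "rb") as f:
    r = client.audio.transcriptions.create(model="zero-indic", file=f)
print(r.text)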

Response

json
{
  "success": true,
  "request_id": "b3f1a2c4-...",
  "text": "नमस्ते मोहम्मद जी, ये एक ज़रूरी कॉल है।",
  "segments": [
    { "start": 0.51, "end": 5.70, "text": "नमस्ते मोहम्मद जी..." }
  ],
  "detected_language": "Hindi",
  "audio_duration": 5.7,
  "inference_time_ms": 812.3
}
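
The per-segment timestamps can be turned into readable lines with a few lines of Python. This is a sketch that assumes only the `segments`, `start`, `end`, and `text` fields shown in the response above; `format_segments` is a hypothetical helper name:

```python
def format_segments(response):
    """Return one '[start - end] text' line per segment of a transcription response."""
    lines = []
    for seg in response.get("segments", []):
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text']}")
    return lines

# Subset of the sample response above, written as a Python dict.
sample = {
    "text": "नमस्ते मोहम्मद जी, ये एक ज़रूरी कॉल है।",
    "segments": [
        {"start": 0.51, "end": 5.70, "text": "नमस्ते मोहम्मद जी..."},
    ],
}

for line in format_segments(sample):
    print(line)  # [0.51s - 5.70s] नमस्ते मोहम्मद जी...
```

With the `requests` example, the same helper would be called as `format_segments(r.json())`.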

3. Generate speech

Now send text to POST /v1/audio/speech. The response body is audio bytes in your requested format.

shell
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"zero-indic","input":"Hello, how are you today?","voice":"Varun"}' \
  --output hello.mp3
python
import os, requests

r = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
    json={"model": "zero-indic", "input": "Hello, how are you today?", "voice": "Varun"},
    timeout=120,
)
r.raise_for_status()
with open("hello.mp3", "wb") as f:
    f.write(r.content)
python
import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("hello.mp3")

asyncio.run(main())
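
Because the response body is raw audio bytes, longer inputs may be easier to handle by writing the body to disk chunk by chunk instead of holding it all in memory. A minimal sketch, assuming the `requests` library; `save_audio` is a hypothetical helper, and whether the service sends the body incrementally is not stated here:

```python
def save_audio(chunks, path):
    """Write an iterable of byte chunks to path and return the number of bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written

# With requests, the chunks would come from a streamed response, for example:
#   r = requests.post(url, headers=headers, json=payload, stream=True, timeout=120)
#   r.raise_for_status()
#   save_audio(r.iter_content(chunk_size=8192), "hello.mp3")
```

Here `url`, `headers`, and `payload` stand for the same values used in the `requests` example for this step.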

4. Stream in real time

For voice agents and IVR, both ASR and TTS support streaming over WebSocket; see ASR streaming and TTS streaming. You receive partial transcripts while the speaker is still talking, and synthesized audio while the text is still being generated.