Your quickstart

Go from nothing to a working transcription and a working synthesis in about five minutes. You'll need an API key and a few lines of code for each direction.

1. Get an API key

Sign in at accounts.shunyalabs.ai.
Navigate to API Keys and click Create New Key.
Copy the key immediately; it is shown only once.

Keep the key out of source control

Store it in a .env file or a secrets manager, and add .env to .gitignore. If the key leaks, rotate it.

Set the environment variable

macOS / Linux (bash, zsh)

export SHUNYALABS_API_KEY="sk-your-key-here"

Windows (PowerShell)

$env:SHUNYALABS_API_KEY = "sk-your-key-here"
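
Before making a request, it can help to fail early with a readable message when the variable is missing. The following is a minimal sketch: the helper name `load_api_key` is hypothetical, and the `sk-` prefix check is only inferred from the placeholder above, so it warns instead of failing.

```python
import os
import warnings


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("SHUNYALABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SHUNYALABS_API_KEY is not set. Export it in your shell or load it "
            "from a .env file before running this script."
        )
    # Assumption: real keys look like the "sk-..." placeholder shown above.
    if not key.startswith("sk-"):
        warnings.warn("API key does not start with 'sk-'; check that you copied it fully.")
    return key
```

Calling `load_api_key()` with no arguments reads from `os.environ`; passing a dictionary makes the helper easy to exercise in tests.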

2. Transcribe an audio file

Send audio to POST /v1/audio/transcriptions. The response is a JSON object with the transcript and per-segment timestamps.

shell

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@meeting.wav" \
  -F "model=zero-indic"

python

import os
import requests

# Upload the audio as multipart form data, with the model name as a form field.
with open("meeting.wav", "rb") as f:
    r = requests.post(
        "https://asr.shunyalabs.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
        files={"file": f},
        data={"model": "zero-indic"},
        timeout=120,
    )
r.raise_for_status()  # raise on any 4xx/5xx response
print(r.json()["text"])

node

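import fs from "node:fs";

// Build the multipart form: the audio file plus the model name.
const form = new FormData();
form.append("file", new Blob([fs.readFileSync("meeting.wav")]), "meeting.wav");
form.append("model", "zero-indic");

const r = await fetch("https://asr.shunyalabs.ai/v1/audio/transcriptions", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SHUNYALABS_API_KEY}` },
  body: form,
});
// fetch does not reject on HTTP errors, so check the status explicitly.
if (!r.ok) throw new Error(`Transcription failed: ${r.status} ${await r.text()}`);
const data = await r.json();
console.log(data.text);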

python

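import os
from openai import OpenAI

# The OpenAI Python SDK, pointed at the Shunyalabs ASR base URL.
client = OpenAI(
    api_key=os.environ["SHUNYALABS_API_KEY"],
    base_url="https://asr.shunyalabs.ai/v1",
)
# Open the file in a context manager so the handle is closed after the upload.
with open("meeting.wav", "rb") as audio:
    r = client.audio.transcriptions.create(
        model="zero-indic",
        file=audio,
    )
print(r.text)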

Response

{
  "success": true,
  "request_id": "b3f1a2c4-...",
  "text": "नमस्ते मोहम्मद जी, ये एक ज़रूरी कॉल है।",
  "segments": [
    { "start": 0.51, "end": 5.70, "text": "नमस्ते मोहम्मद जी..." }
  ],
  "detected_language": "Hindi",
  "audio_duration": 5.7,
  "inference_time_ms": 812.3
}
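
To show how these fields fit together, here is a small sketch that walks `segments` and prints each one with its timestamps, then derives a real-time factor from `inference_time_ms` and `audio_duration`. It assumes the response has exactly the shape shown above. The helper names are illustrative, and the sample dictionary uses placeholder text, not real API output.

```python
def format_timestamp(seconds):
    """Render a number of seconds as MM:SS.ss."""
    minutes, secs = divmod(float(seconds), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_segments(response):
    """Return one printable line per segment: [start -> end] text."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text']}"
        for s in response.get("segments", [])
    ]


def real_time_factor(response):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    return (response["inference_time_ms"] / 1000.0) / response["audio_duration"]


# Placeholder data mirroring the response shape above.
sample = {
    "success": True,
    "text": "hello there",
    "segments": [{"start": 0.51, "end": 5.70, "text": "hello there"}],
    "audio_duration": 5.7,
    "inference_time_ms": 812.3,
}

for line in format_segments(sample):
    print(line)
print(f"real-time factor: {real_time_factor(sample):.3f}")
```

In a real script, pass `r.json()` from the transcription request in place of `sample`.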

3. Generate speech

Now send text to POST /v1/audio/speech. The response body is audio bytes in your requested format.

shell

curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"zero-indic","input":"Hello, how are you today?","voice":"Varun"}' \
  --output hello.mp3

python

import os, requests

r = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
    json={"model": "zero-indic", "input": "Hello, how are you today?", "voice": "Varun"},
    timeout=120,
)
r.raise_for_status()
with open("hello.mp3", "wb") as f:
    f.write(r.content)

python

import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("hello.mp3")

asyncio.run(main())
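
For longer inputs you may prefer not to hold the whole audio body in memory. The sketch below is one way to write the response to disk in chunks with `requests`. It assumes the endpoint returns the audio in the HTTP response body as described above; the helper names `write_chunks` and `synthesize_to_file` are hypothetical, and this page does not say whether the server sends the body incrementally.

```python
import os

TTS_URL = "https://tts.shunyalabs.ai/v1/audio/speech"


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def synthesize_to_file(text, path, voice="Varun", model="zero-indic"):
    """POST text to the speech endpoint and save the audio chunk by chunk."""
    import requests  # imported here so write_chunks stays dependency-free

    with requests.post(
        TTS_URL,
        headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
        json={"model": model, "input": text, "voice": voice},
        stream=True,  # do not read the whole body into memory up front
        timeout=120,
    ) as r:
        r.raise_for_status()
        return write_chunks(r.iter_content(chunk_size=8192), path)
```

For example, `synthesize_to_file("Hello, how are you today?", "hello.mp3")` would return the number of bytes saved.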

4. Stream in real time

For voice agents and IVR, both ASR and TTS support streaming over WebSocket; see ASR streaming and TTS streaming. You receive partial transcripts while the speaker is still talking, and synthesized audio while the text is still being generated.
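
Streaming clients typically send audio in small fixed-duration frames instead of one large upload. The sketch below covers only that client-side framing step for raw PCM. The 16 kHz sample rate, 16-bit mono format, and 20 ms frame size are assumptions for illustration, and `pcm_frames` is a hypothetical helper; the actual WebSocket URL and message format are defined in the ASR streaming reference, not here.

```python
def pcm_frames(pcm, sample_rate=16000, sample_width=2, channels=1, frame_ms=20):
    """Yield consecutive frames of raw PCM, each covering frame_ms milliseconds.

    The final frame may be shorter if the audio does not divide evenly.
    """
    samples_per_frame = (sample_rate * frame_ms) // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    if frame_bytes <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


# One second of silence in the assumed format: 16000 samples * 2 bytes each.
silence = bytes(16000 * 2)
frames = list(pcm_frames(silence))
print(len(frames), len(frames[0]))  # 50 frames of 640 bytes
```

Each yielded frame would then be sent as one message on the streaming connection, in whatever envelope the streaming reference specifies.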