Your quickstart
Go from zero to a working transcription (speech to text) and a working synthesis (text to speech) in about five minutes. All you need is an API key and a few lines of code for each direction.
1. Get an API key
- Sign in at accounts.shunyalabs.ai.
- Navigate to API Keys and click Create New Key.
- Copy the key immediately; it is shown only once.
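Client code should read the key from the environment instead of embedding it in source. Below is a minimal sketch. It assumes the SHUNYALABS_API_KEY variable that you set in the next step. get_api_key and auth_headers are hypothetical helpers, not part of any SDK. They build the same Bearer header that the examples in this quickstart send.

```python
import os


def get_api_key(var: str = "SHUNYALABS_API_KEY") -> str:
    """Hypothetical helper: return the API key from the environment or fail clearly."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it in your shell or load it from a .env file."
        )
    return key


def auth_headers() -> dict[str, str]:
    """Hypothetical helper: the Bearer header used by the requests in this quickstart."""
    return {"Authorization": f"Bearer {get_api_key()}"}
```

Failing fast with a named variable makes a missing key obvious. Without this check, it would surface later as a 401 from the API.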
Keep the key out of source control
Store it in a .env file or a secrets manager, and add .env to .gitignore. If the key leaks, rotate it.
Set the environment variable
macOS / Linux:
```shell
export SHUNYALABS_API_KEY="sk-your-key-here"
```
Windows (PowerShell):
```powershell
$env:SHUNYALABS_API_KEY = "sk-your-key-here"
```
2. Transcribe an audio file
Send audio to POST /v1/audio/transcriptions. The response is a JSON object with the transcript and per-segment timestamps.
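The transcription response is plain JSON, so post-processing needs no SDK. The sketch below reuses the sample body from the Response example in this section. It assumes that start, end and audio_duration are in seconds and that inference_time_ms is in milliseconds, as the field names suggest. fmt_ts, summarize and real_time_factor are illustrative helpers and are not part of the API or any SDK.

```python
import json

# Sample body copied from the Response example in this section.
SAMPLE = """{
  "success": true,
  "request_id": "b3f1a2c4-...",
  "text": "नमस्ते मोहम्मद जी, ये एक ज़रूरी कॉल है।",
  "segments": [
    {"start": 0.51, "end": 5.70, "text": "नमस्ते मोहम्मद जी..."}
  ],
  "detected_language": "Hindi",
  "audio_duration": 5.7,
  "inference_time_ms": 812.3
}"""


def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def summarize(body: dict) -> list[str]:
    """Return one '[start -> end] text' line per segment."""
    return [
        f"[{fmt_ts(seg['start'])} -> {fmt_ts(seg['end'])}] {seg['text']}"
        for seg in body.get("segments", [])
    ]


def real_time_factor(body: dict) -> float:
    """Processing time divided by audio length; below 1.0 means faster than real time."""
    return (body["inference_time_ms"] / 1000) / body["audio_duration"]


if __name__ == "__main__":
    body = json.loads(SAMPLE)  # in real code: body = r.json()
    for line in summarize(body):
        print(line)
    print(f"Language: {body['detected_language']}, RTF: {real_time_factor(body):.3f}")
```

For the sample, the real-time factor is 0.8123 s / 5.7 s, or roughly 0.14. Processing therefore ran about seven times faster than the audio played.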
cURL
```shell
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@meeting.wav" \
  -F "model=zero-indic"
```
Python (requests)
```python
import os

import requests

with open("meeting.wav", "rb") as f:
    r = requests.post(
        "https://asr.shunyalabs.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
        files={"file": f},
        data={"model": "zero-indic"},
        timeout=120,  # avoid hanging forever on a stalled connection
    )
r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(r.json()["text"])
```
Node.js 18+ (run as an ES module for top-level await)
```javascript
import fs from "node:fs";

// Build a multipart form with the audio file and the model name.
const form = new FormData();
form.append("file", new Blob([fs.readFileSync("meeting.wav")]), "meeting.wav");
form.append("model", "zero-indic");

const r = await fetch("https://asr.shunyalabs.ai/v1/audio/transcriptions", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SHUNYALABS_API_KEY}` },
  body: form,
});
if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);
const data = await r.json();
console.log(data.text);
```
Python (OpenAI SDK)
```python
import os

from openai import OpenAI

# Point the OpenAI client at the asr.shunyalabs.ai base URL.
client = OpenAI(
    api_key=os.environ["SHUNYALABS_API_KEY"],
    base_url="https://asr.shunyalabs.ai/v1",
)
with open("meeting.wav", "rb") as f:  # closes the file after the upload
    r = client.audio.transcriptions.create(model="zero-indic", file=f)
print(r.text)
```
Response
```json
{
  "success": true,
  "request_id": "b3f1a2c4-...",
  "text": "नमस्ते मोहम्मद जी, ये एक ज़रूरी कॉल है।",
  "segments": [
    { "start": 0.51, "end": 5.70, "text": "नमस्ते मोहम्मद जी..." }
  ],
  "detected_language": "Hindi",
  "audio_duration": 5.7,
  "inference_time_ms": 812.3
}
```
3. Generate speech
Now send text to POST /v1/audio/speech. The response body is audio bytes in your requested format.
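The speech examples save to hello.mp3 without setting an output format in the request. It can therefore be worth checking what actually came back before you name the file. The sketch below is a best-effort check that assumes the audio arrives in a common container (WAV, MP3, Ogg or FLAC). That list is illustrative and is not what the API is documented to return. sniff_audio_format and save_audio are hypothetical helpers.

```python
from pathlib import Path


def sniff_audio_format(data: bytes) -> str:
    """Hypothetical helper: best-effort guess of the container from its leading bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:3] == b"ID3":  # MP3 that starts with an ID3v2 tag
        return "mp3"
    # Bare MPEG frame: 11-bit frame sync plus layer bits 01 (Layer III).
    # AAC/ADTS shares the sync bits but has layer 00, so it is not matched.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and (data[1] & 0x06) == 0x02:
        return "mp3"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    return "unknown"


def save_audio(data: bytes, stem: str) -> Path:
    """Hypothetical helper: write bytes to <stem>.<detected format>, refusing non-audio."""
    fmt = sniff_audio_format(data)
    if fmt == "unknown":
        raise ValueError(f"response does not look like audio: {data[:32]!r}")
    path = Path(f"{stem}.{fmt}")
    path.write_bytes(data)
    return path
```

With the requests example, you would call save_audio(r.content, "hello") after r.raise_for_status(), in place of the hard-coded hello.mp3.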
cURL
```shell
curl -X POST https://tts.shunyalabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"zero-indic","input":"Hello, how are you today?","voice":"Varun"}' \
  --output hello.mp3
```
Python (requests)
```python
import os

import requests

r = requests.post(
    "https://tts.shunyalabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {os.environ['SHUNYALABS_API_KEY']}"},
    json={"model": "zero-indic", "input": "Hello, how are you today?", "voice": "Varun"},
    timeout=120,
)
r.raise_for_status()
# The body is raw audio bytes, so write it in binary mode.
with open("hello.mp3", "wb") as f:
    f.write(r.content)
```
Python (shunyalabs SDK, async)
```python
import asyncio

from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig


async def main():
    async with AsyncShunyaClient() as client:
        result = await client.tts.synthesize(
            "Hello, how are you today?",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("hello.mp3")


asyncio.run(main())
```
4. Stream in real time
For voice agents and IVR, both ASR and TTS support streaming over WebSocket; see ASR streaming and TTS streaming. You receive partial transcripts while the speaker is still talking, and synthesized audio as the text is generated.
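Streaming clients typically send audio in small, fixed-duration pieces rather than as one file. The sketch below covers only that chunking step. It assumes a WAV source and 100 ms chunks, both arbitrary choices made here. The WebSocket URL, message format and required sample rate are defined on the ASR streaming page and are not assumed. pcm_chunks is a hypothetical helper.

```python
import io
import wave
from typing import Iterator


def pcm_chunks(wav_bytes: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Hypothetical helper: yield a WAV file's raw PCM payload in ~chunk_ms pieces."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        # Frames (one sample per channel) per chunk at this file's sample rate.
        frames_per_chunk = max(1, w.getframerate() * chunk_ms // 1000)
        while True:
            chunk = w.readframes(frames_per_chunk)
            if not chunk:  # end of the audio data
                break
            yield chunk
```

Each chunk would then be sent as one WebSocket message. The exact framing is documented on the ASR streaming page. To mimic a live microphone when testing with a file, wait roughly chunk_ms between sends.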