Input

Supported Audio and Video Formats

ShunyaLabs accepts a wide range of audio and video formats:

Audio: WAV, MP3, M4A, FLAC, OGG, AAC, WMA
Video: MP4, MKV, MOV, AVI, WebM (audio track is extracted)

File Size Limits

Maximum file size: 10 MB
Files larger than 10 MB will be rejected with a 413 error
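
A client-side check can catch oversized files before they are uploaded and rejected with a 413. The sketch below is illustrative only: the helper name `check_upload_size` is hypothetical, and it assumes "10 MB" means 10 × 1024 × 1024 bytes, which this page does not specify.

```python
import os

# Assumption: "10 MB" is interpreted as 10 MiB; adjust if the service counts 10,000,000 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")
    return size
```

Calling `check_upload_size("your_audio.wav")` before sending a request avoids a round trip for files that would be rejected anyway.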

Need to transcribe larger files?

Split audio into smaller segments
Reduce bitrate or sample rate
Convert to a compressed format like MP3
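
As one way to carry out the first suggestion, the sketch below splits an uncompressed WAV file into fixed-length segments using only Python's standard `wave` module. The function name `split_wav` and the 60-second default are assumptions chosen for illustration; other formats would need a tool such as ffmpeg instead.

```python
import os
import wave


def split_wav(path: str, segment_seconds: int = 60) -> list[str]:
    """Split a WAV file into consecutive segments and return the new file paths."""
    out_paths = []
    base = os.path.splitext(path)[0]
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # reached the end of the source audio
            out_path = f"{base}_part{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and frame rate; the frame count
                # in the header is corrected automatically when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each segment can then be transcribed separately and the resulting text joined in order. Cutting at fixed intervals may split a word across two segments, so shorter overlaps or silence-based cuts may give cleaner results.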

Request Parameters

Parameter	Type	Default	Description
file	file	Required	Your audio or video file
language_code	string	auto	Target language for transcription
output_script	string	auto	Writing script for output text

Language Codes and Output Scripts

See supported languages for audio language detection.

See supported scripts for output formatting.

WebSocket API

The WebSocket API is an alternative connection method for batch transcription that provides real-time connection feedback.

Step 1: Install websockets

pip install websockets

Step 2: Connect and transcribe

import asyncio
import websockets
import base64
import json

async def transcribe_audio():
    uri = "wss://tb.shunyalabs.ai/ws"

    async with websockets.connect(uri) as websocket:
        # Authentication and configuration fields (sent together with the audio below)
        config = {
            "api_key": "your_api_key_here",
            "language_code": "auto",
            "chunk_size": 120,
            "enable_diarization": True,
            "output_script": "auto"
        }

        # Read and encode audio file
        with open("your_audio.wav", "rb") as audio_file:
            audio_data = base64.b64encode(audio_file.read()).decode()

        # Send request
        message = {**config, "audio": audio_data}
        await websocket.send(json.dumps(message))

        # Receive transcription
        response = await websocket.recv()
        result = json.loads(response)
        print(result["text"])

asyncio.run(transcribe_audio())

Don’t forget to replace your_api_key_here with your own secret key.
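
Rather than hardcoding the key in the script, one option is to read it from an environment variable. The sketch below assumes a variable named `SHUNYALABS_API_KEY`; that name and the helper `load_api_key` are hypothetical conventions and are not defined by the API.

```python
import os


def load_api_key(env_var: str = "SHUNYALABS_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

In the example above, the `"api_key"` entry in `config` could then be set to `load_api_key()`, which keeps the secret out of source control.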

Quicklinks

Output

Understand response formats, retrieve transcriptions, and handle results

Troubleshooting

Resolve common issues with the Batch API

Speaker diarization

Identify and separate different speakers in your audio files

More features

Add timestamps, sentiment analysis, custom vocabularies, and advanced options