Input

Supported Audio Formats

ShunyaLabs accepts a wide range of audio and video formats:

  • Audio: WAV, MP3, M4A, FLAC, OGG, AAC, WMA
  • Video: MP4, MKV, MOV, AVI, WebM (audio track is extracted)
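A client can screen files against this list before uploading. The sketch below is illustrative only: matching on the file extension is an assumption, and the helper name media_kind is hypothetical rather than part of any ShunyaLabs SDK.

```python
from pathlib import Path

# Extensions taken from the format list above. Checking the extension
# (rather than the file contents) is an assumption of this sketch.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def media_kind(path):
    """Return "audio", "video", or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A None result only means the extension is not in the list above; it says nothing about whether the file contents are valid.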

File Size Limits

  • Maximum file size: 10 MB
  • Files larger than 10 MB will be rejected with a 413 error
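Checking the size locally avoids a round trip that would end in a 413. This is a minimal sketch; it assumes "10 MB" means 10 * 1024 * 1024 bytes, which the text above does not specify, and fits_size_limit is a hypothetical helper name.

```python
import os

# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes. The service
# may count in decimal megabytes (10,000,000 bytes) instead.
MAX_FILE_BYTES = 10 * 1024 * 1024


def fits_size_limit(path, limit=MAX_FILE_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```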

Need to transcribe larger files?

  • Split audio into smaller segments
  • Reduce bitrate or sample rate
  • Convert to a compressed format like MP3
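The first option, splitting, can be done for uncompressed WAV input with Python's standard wave module alone. The sketch below is one possible approach, not an official tool; the function name split_wav and the output naming scheme are assumptions, and other formats would need a separate converter.

```python
import wave


def split_wav(path, segment_seconds, prefix="segment"):
    """Split a WAV file into consecutive pieces of at most `segment_seconds`.

    Returns the list of files written, named <prefix>_000.wav, <prefix>_001.wav, ...
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")

    outputs = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * segment_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # reached the end of the source file
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width, and frame rate; the frame
                # count in the header is corrected when the data is written.
                dst.setparams(params)
                dst.writeframes(frames)
            outputs.append(out_path)
            index += 1
    return outputs
```

Each segment can then be submitted as its own request, provided it falls under the size limit.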

Request Parameters

Parameter      Type    Default   Description
file           file    Required  Your audio or video file
language_code  string  auto      Target language for transcription
output_script  string  auto      Writing script for output text
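One way to apply these defaults on the client side is to merge caller overrides into a defaults dictionary. This is a sketch only; build_params is a hypothetical helper, and the file itself is attached separately by whichever HTTP or WebSocket client is in use.

```python
# Defaults copied from the parameter table above.
DEFAULT_PARAMS = {"language_code": "auto", "output_script": "auto"}


def build_params(**overrides):
    """Return the optional request parameters with defaults filled in.

    Raises ValueError for names that do not appear in the parameter table.
    """
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}
```

Rejecting unknown names early catches misspellings such as language-code before the request is sent.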

Language Codes and Output Scripts

See supported languages for audio language detection.

See supported scripts for output formatting.

WebSocket API

The WebSocket API is an alternative connection method for batch transcription that provides real-time connection feedback.

Step 1: Install websockets

pip install websockets

Step 2: Connect and transcribe

import asyncio
import websockets
import base64
import json

async def transcribe_audio():
    uri = "wss://tb.shunyalabs.ai/ws"

    async with websockets.connect(uri) as websocket:
        # Send authentication and configuration
        config = {
            "api_key": "your_api_key_here",
            "language_code": "auto",
            "chunk_size": 120,
            "enable_diarization": True,
            "output_script": "auto"
        }

        # Read and encode audio file
        with open("your_audio.wav", "rb") as audio_file:
            audio_data = base64.b64encode(audio_file.read()).decode()

        # Send request
        message = {**config, "audio": audio_data}
        await websocket.send(json.dumps(message))

        # Receive transcription
        response = await websocket.recv()
        result = json.loads(response)
        print(result["text"])

asyncio.run(transcribe_audio())

Don’t forget to replace your_api_key_here with your own secret key.
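The message construction and response handling in Step 2 can be factored into small functions that are testable without a network connection. The sketch below reuses only the field names shown in the example above; the helper names build_message and extract_text are hypothetical, and the response is assumed to be a JSON object carrying a "text" field, as the example implies.

```python
import base64
import json


def build_message(api_key, audio_bytes, language_code="auto",
                  output_script="auto", chunk_size=120,
                  enable_diarization=True):
    """Serialize one request using the field names from the example above."""
    return json.dumps({
        "api_key": api_key,
        "language_code": language_code,
        "chunk_size": chunk_size,
        "enable_diarization": enable_diarization,
        "output_script": output_script,
        # Audio travels as base64 text because JSON cannot carry raw bytes.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    })


def extract_text(raw_response):
    """Return the "text" field of a response, or raise ValueError if absent."""
    result = json.loads(raw_response)
    if not isinstance(result, dict) or "text" not in result:
        # Assumption: a response without "text" signals a failure of some kind.
        raise ValueError(f"No 'text' field in response: {result!r}")
    return result["text"]
```

Inside transcribe_audio, these would replace the inline dictionary and the direct result["text"] lookup, so a missing field surfaces as a descriptive ValueError rather than a bare KeyError.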
