Input
Supported Audio Formats
ShunyaLabs accepts a wide range of audio and video formats:
- Audio: WAV, MP3, M4A, FLAC, OGG, AAC, WMA
- Video: MP4, MKV, MOV, AVI, WebM (audio track is extracted)
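You can catch unsupported files before uploading by checking the file extension on the client side. The sketch below assumes that each listed format maps to one common extension and that the extension reflects the real container format. `is_supported_media` is a hypothetical helper, not part of the ShunyaLabs API, and the server still decides what it accepts.

```python
from pathlib import Path

# Extensions derived from the format list above (assumption: one extension per format).
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac", ".wma"}
# For video files, the service extracts the audio track.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def is_supported_media(filename: str) -> bool:
    """Return True if the extension matches one of the listed formats.

    Hypothetical helper: it inspects only the file name, not the contents.
    """
    suffix = Path(filename).suffix.lower()
    return suffix in AUDIO_EXTENSIONS or suffix in VIDEO_EXTENSIONS


print(is_supported_media("meeting.MP3"))  # True
print(is_supported_media("notes.txt"))    # False
```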
File Size Limits
- Maximum file size: 10 MB
- Files larger than 10 MB are rejected with an HTTP 413 (Payload Too Large) error
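If you check the size locally, you avoid an upload that would only end in a 413 response. The section does not say whether "10 MB" means 10,000,000 bytes or 10 MiB (10,485,760 bytes). The sketch below assumes the stricter 10,000,000-byte reading, so any file it accepts fits under either interpretation. `check_upload_size` and `MAX_UPLOAD_BYTES` are hypothetical names.

```python
import os

# Conservative reading of "10 MB": 10,000,000 bytes. If the service means
# 10 MiB (10 * 1024 * 1024 bytes), this limit is slightly stricter than needed.
MAX_UPLOAD_BYTES = 10_000_000


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds `limit`."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes; the limit is {limit} bytes. "
            "Split, downsample, or compress the file before uploading."
        )
    return size
```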
Need to transcribe larger files?
- Split audio into smaller segments
- Reduce bitrate or sample rate
- Convert to a compressed format like MP3
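For uncompressed PCM WAV files, Python's standard `wave` module can split a recording into pieces that each stay under a byte budget. This is a minimal sketch: `split_wav`, the `chunk_000.wav` naming, and the 1 KB header margin are illustrative assumptions. Because it cuts at fixed frame counts, a split can fall mid-word. For compressed formats such as MP3 or M4A, use a dedicated audio tool (for example ffmpeg) instead.

```python
import wave

# Allowance for the WAV header (Python's wave module writes 44 bytes for
# plain PCM); a generous margin keeps each chunk safely under the limit.
HEADER_MARGIN = 1024


def split_wav(path: str, max_bytes: int, prefix: str = "chunk") -> list[str]:
    """Split a PCM WAV file into numbered WAV files, each at most max_bytes."""
    out_paths = []
    with wave.open(path, "rb") as src:
        frame_size = src.getsampwidth() * src.getnchannels()
        frames_per_chunk = (max_bytes - HEADER_MARGIN) // frame_size
        if frames_per_chunk <= 0:
            raise ValueError("max_bytes is too small to hold any audio frames")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # readframes returns b"" at end of file
                break
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width, and sample rate from the source.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

For example, `split_wav("long_recording.wav", max_bytes=10_000_000)` returns the list of chunk files, which you can then upload one at a time.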
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | file | Required | Your audio or video file |
| language_code | string | auto | Target language for transcription |
| output_script | string | auto | Writing script for output text |
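This section does not say how the request is sent. A `file` parameter usually implies a multipart/form-data upload, but that is an assumption here. Under that assumption, the parameters in the table map to form fields as shown below. `build_multipart` is a hypothetical helper. The endpoint URL and authentication scheme are not covered in this section, so the sketch stops at building the request body.

```python
import uuid
from pathlib import Path


def build_multipart(file_path: str, language_code: str = "auto",
                    output_script: str = "auto") -> tuple[bytes, str]:
    """Encode the request parameters as a multipart/form-data body.

    Returns (body, content_type). Field names follow the parameter table.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Text fields: the optional parameters, defaulting to "auto" as documented.
    for name, value in (("language_code", language_code),
                        ("output_script", output_script)):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode() + value.encode() + b"\r\n")
    # File field: the raw bytes of the audio or video file.
    path = Path(file_path)
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + path.read_bytes() + b"\r\n")
    # Closing boundary marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

In practice, an HTTP client library such as `requests` builds this body for you through the `files=` and `data=` arguments of `requests.post`.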
Language Codes and Output Scripts
See supported languages for audio language detection.
See supported scripts for output formatting.
WebSocket API
The WebSocket API is an alternative connection method for batch transcription that gives you real-time feedback on the connection.
Step 1: Install websockets

```bash
pip install websockets
```

Step 2: Connect and transcribe

```python
import asyncio
import base64
import json

import websockets


async def transcribe_audio():
    uri = "wss://tb.shunyalabs.ai/ws"
    async with websockets.connect(uri) as websocket:
        # Authentication and transcription settings
        config = {
            "api_key": "your_api_key_here",
            "language_code": "auto",
            "chunk_size": 120,
            "enable_diarization": True,
            "output_script": "auto",
        }

        # Read the audio file and base64-encode it as text
        with open("your_audio.wav", "rb") as audio_file:
            audio_data = base64.b64encode(audio_file.read()).decode()

        # Send the settings and the audio together as one JSON message
        message = {**config, "audio": audio_data}
        await websocket.send(json.dumps(message))

        # Wait for the transcription result and print its text
        response = await websocket.recv()
        result = json.loads(response)
        print(result["text"])


asyncio.run(transcribe_audio())
```

Don't forget to replace `your_api_key_here` with your own secret key.
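In the WebSocket example, the audio travels base64-encoded inside a JSON message. Base64 turns every 3 bytes into 4 characters, so the message is about a third larger than the file on disk. This section does not say whether the 10 MB limit also applies to WebSocket messages. The sketch below separates building the message from the network code, which lets you inspect the payload size before you connect. `encoded_size` and `build_message` are hypothetical helper names.

```python
import base64
import json


def encoded_size(num_bytes: int) -> int:
    """Length of the padded base64 text produced from num_bytes of input."""
    return 4 * ((num_bytes + 2) // 3)


def build_message(audio_path: str, config: dict) -> str:
    """Build the JSON text to send: config fields plus base64-encoded audio."""
    with open(audio_path, "rb") as f:
        audio_data = base64.b64encode(f.read()).decode()
    return json.dumps({**config, "audio": audio_data})


# A 10,000,000-byte file becomes roughly 13.3 MB of base64 text.
print(encoded_size(10_000_000))  # 13333336
```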
Quicklinks
- Output: Understand response formats, retrieve transcriptions, and handle results
- Troubleshooting: Resolve common issues with the Batch API
- Speaker diarization: Identify and separate different speakers in your audio files
- More features: Add timestamps, sentiment analysis, custom vocabularies, and advanced options