Zero STT Med – Batch Transcription Documentation

Medical-grade Speech-to-Text powered by ShunyaLabs

Zero STT Med is a domain-specific speech recognition model optimized for medical terminology, procedures, and clinical documentation.

Prerequisites

Python 3.8 or higher
Valid API key (contact ShunyaLabs to obtain one)
Supported formats: WAV, MP3, M4A, FLAC, OGG, AAC, WMA, MP4, MKV, MOV, AVI, WebM

Installation

pip install requests

Input – REST API

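import requests

# Replace with the API key issued to you.
API_KEY = "<YOUR_API_KEY>"

url = "https://tb.shunyalabs.ai/transcribe"
headers = {"X-API-Key": API_KEY}

# Open the audio in binary mode so it is uploaded as multipart form data.
with open("your_audio_file.wav", "rb") as audio_file:
    files = {"file": audio_file}
    data = {
        "language_code": "med-en",      # selects the Zero STT Med model
        "enable_diarization": "true",   # requests speaker-labelled segments
    }

    response = requests.post(url, headers=headers, files=files, data=data)

# Fail loudly on HTTP errors instead of parsing an error body as a transcript.
response.raise_for_status()
result = response.json()

print(result["text"])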

Input – cURL

curl -X POST "https://tb.shunyalabs.ai/transcribe" \
-H "X-API-Key: <YOUR_API_KEY>" \
-F "file=@your_audio_file.wav" \
-F "language_code=med-en" \
-F "enable_diarization=true"

File Size Limits

Maximum file size: 30 MB

Files larger than 30 MB must be split into smaller parts before uploading.
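
The size limit can be checked locally before an upload is attempted. The following is a minimal sketch, not part of the official client: the helper name needs_splitting is hypothetical, and because the documentation does not say whether "30 MB" is decimal or binary, the sketch assumes the stricter decimal reading (30,000,000 bytes).

```python
import os

# Assumption: "30 MB" is read as 30 * 1000 * 1000 bytes, the more
# conservative of the two common interpretations.
MAX_UPLOAD_BYTES = 30 * 1000 * 1000


def needs_splitting(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is larger than the upload limit."""
    return os.path.getsize(path) > limit
```

Calling needs_splitting("your_audio_file.wav") before the request lets a batch job route oversized recordings to a splitting step instead of sending them.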

Parameters

Parameter	Value	Description
file	<audio_file>	Audio file to transcribe, uploaded as multipart form data
language_code	med-en	Selects the Zero STT Med model
enable_diarization	true	Enables speaker identification
api_key	<YOUR_API_KEY>	Authentication key, sent in the X-API-Key header
api_url	https://tb.shunyalabs.ai	API base URL; requests are posted to /transcribe

Output

The segments array appears in the response only when diarization is enabled.

{
  "success": true,
  "text": "Patient presents with acute onset chest pain radiating to left arm. History of hypertension and diabetes mellitus type 2.",
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Patient presents with acute onset chest pain radiating to left arm.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 5.5,
      "end": 9.8,
      "text": "History of hypertension and diabetes mellitus type 2.",
      "speaker": "SPEAKER_00"
    }
  ],
  "total_segments": 2,
  "filename": "your_audio_file.wav",
  "unique_speakers": ["SPEAKER_00"]
}

Relevant Response Fields

Field	Type	Description
success	boolean	Whether transcription succeeded
text	string	Complete transcription text
segments	array	Speaker-segmented transcription
total_segments	integer	Number of segments
filename	string	Original filename
unique_speakers	array	List of speaker IDs
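
Since segments may be absent when diarization is off, code that reads the response benefits from a small guard. The sketch below is illustrative only: extract_transcript is a hypothetical helper, and it assumes that a failed transcription is signalled by success being false or missing. The source does not document the shape of an error response, so no error fields are read.

```python
from typing import Dict, List, Tuple


def extract_transcript(result: Dict) -> Tuple[str, List[Dict]]:
    """Return (text, segments) from a parsed response dict.

    Raises RuntimeError when `success` is false or missing (assumed to
    indicate failure). `segments` defaults to an empty list because it
    is only present when diarization is enabled.
    """
    if not result.get("success"):
        raise RuntimeError("transcription did not succeed")
    return result["text"], result.get("segments", [])
```

For example, text, segments = extract_transcript(response.json()) yields an empty segments list for a non-diarized request rather than raising a KeyError.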

Segment Object

Field	Type	Description
start	float	Start time in seconds
end	float	End time in seconds
text	string	Transcribed text
speaker	string	Speaker identifier
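
The segment fields above are enough to build a readable, speaker-labelled transcript. As a sketch under the assumption that segments arrive in chronological order (the function names merge_turns and format_turns are hypothetical), consecutive segments from the same speaker can be merged into a single turn:

```python
from typing import Dict, List


def merge_turns(segments: List[Dict]) -> List[Dict]:
    """Merge consecutive segments that share a speaker into one turn."""
    turns: List[Dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker continues: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # New speaker: start a turn from a copy so the input is untouched.
            turns.append(dict(seg))
    return turns


def format_turns(turns: List[Dict]) -> str:
    """Render turns as lines like '[0.0-9.8s] SPEAKER_00: text'."""
    return "\n".join(
        f"[{t['start']:.1f}-{t['end']:.1f}s] {t['speaker']}: {t['text']}"
        for t in turns
    )
```

Applied to the sample output above, the two SPEAKER_00 segments collapse into one turn spanning 0.0 to 9.8 seconds.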

Best Practices

Always set language_code=med-en to select the Zero STT Med model
Split files larger than 30 MB before uploading
Enable diarization for multi-speaker conversations
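
When these practices are applied to a whole folder of recordings, a batch job first needs to know which files it can send. The following is a rough sketch, assuming that a file's format can be inferred from its extension (the service may inspect content instead); find_audio_files is a hypothetical helper, and the extension set mirrors the supported formats listed under Prerequisites.

```python
from pathlib import Path
from typing import List

# Extensions corresponding to the supported formats listed under Prerequisites.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac",
    ".wma", ".mp4", ".mkv", ".mov", ".avi", ".webm",
}


def find_audio_files(directory: str) -> List[Path]:
    """Return supported media files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Each returned path can then be passed to the upload code shown under Input – REST API, one request per file.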