Zero STT Med – Batch Transcription Documentation

Medical-grade Speech-to-Text powered by ShunyaLabs

Zero STT Med is a domain-specific speech recognition model optimized for medical terminology, procedures, and clinical documentation.

Prerequisites

  • Python 3.8 or higher
  • Valid API key (contact [email protected])
  • Supported formats: WAV, MP3, M4A, FLAC, OGG, AAC, WMA, MP4, MKV, MOV, AVI, WebM
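A client-side check can catch unsupported files before they are uploaded. The sketch below assumes the file extension reflects the actual container format. The helper name is_supported_format is hypothetical and not part of the API.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac",
    ".wma", ".mp4", ".mkv", ".mov", ".avi", ".webm",
}


def is_supported_format(path):
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check looks only at the file name, so a mislabeled file would still pass it.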

Installation

pip install requests

Input – REST API

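import requests

url = "https://tb.shunyalabs.ai/transcribe"
headers = {"X-API-Key": "<YOUR_API_KEY>"}  # replace with your API key

# Form fields sent alongside the audio file
data = {
    "language_code": "med-en",      # selects the Zero STT Med model
    "enable_diarization": "true",   # include speaker-labelled segments
}

# Upload the audio as multipart/form-data
with open("your_audio_file.wav", "rb") as audio_file:
    files = {"file": audio_file}
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # raise on HTTP 4xx/5xx responses
result = response.json()

print(result["text"])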

Input – cURL

curl -X POST "https://tb.shunyalabs.ai/transcribe" \
-H "X-API-Key: <YOUR_API_KEY>" \
-F "file=@your_audio_file.wav" \
-F "language_code=med-en" \
-F "enable_diarization=true"

File Size Limits

Maximum file size: 30 MB

Files larger than 30 MB must be split before processing.
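The limit can be checked locally before uploading. This is a minimal sketch: the helper name exceeds_size_limit is hypothetical, and it assumes "30 MB" means 30 * 1024 * 1024 bytes, since the documentation does not say whether the limit is counted in decimal or binary megabytes.

```python
import os

# Assumption: "30 MB" is read as 30 * 1024 * 1024 bytes; the docs do not
# say whether the limit uses decimal (10^6) or binary (2^20) megabytes.
MAX_FILE_SIZE_BYTES = 30 * 1024 * 1024


def exceeds_size_limit(path, limit=MAX_FILE_SIZE_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

If the server counts decimal megabytes, pass limit=30_000_000 for a stricter check.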

Parameters

Parameter            Value                      Description
file                 <audio_file>               Path to audio file
language_code        med-en                     Use Zero STT Med model
enable_diarization   true                       Enable speaker identification
api_key              <YOUR_API_KEY>             Authentication key
api_url              https://tb.shunyalabs.ai   API endpoint

Output

The segments array is returned only when diarization is enabled (enable_diarization=true).

{
  "success": true,
  "text": "Patient presents with acute onset chest pain radiating to left arm. History of hypertension and diabetes mellitus type 2.",
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Patient presents with acute onset chest pain radiating to left arm.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 5.5,
      "end": 9.8,
      "text": "History of hypertension and diabetes mellitus type 2.",
      "speaker": "SPEAKER_00"
    }
  ],
  "total_segments": 2,
  "filename": "your_audio_file.wav",
  "unique_speakers": ["SPEAKER_00"]
}
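Because segments are present only when diarization is enabled, client code should not assume the key exists. The sketch below assumes the response has already been decoded into a dict. The helper extract_transcript is hypothetical, and raising ValueError on an unsuccessful result is an illustrative choice, since the shape of error responses is not documented here.

```python
def extract_transcript(result):
    """Return (text, segments) from a decoded response dict.

    `segments` falls back to an empty list when the key is absent,
    for example when diarization was not enabled.
    """
    if not result.get("success"):
        # Assumption: a false or missing "success" means the request failed.
        raise ValueError("transcription did not succeed")
    return result["text"], result.get("segments", [])
```

With the Python example from this page, this could be called as text, segments = extract_transcript(response.json()).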

Relevant Response Fields

Field             Type      Description
success           boolean   Whether transcription succeeded
text              string    Complete transcription text
segments          array     Speaker-segmented transcription
total_segments    integer   Number of segments
filename          string    Original filename
unique_speakers   array     List of speaker IDs

Segment Object

Field     Type     Description
start     float    Start time in seconds
end       float    End time in seconds
text      string   Transcribed text
speaker   string   Speaker identifier
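The segment fields above are enough to render a readable, speaker-labelled transcript. This is a minimal sketch using the segment values from the sample output. The function name format_segments and the line layout are illustrative choices, not part of the API.

```python
def format_segments(segments):
    """Render segments as '[start-end] SPEAKER: text' lines."""
    lines = []
    for seg in segments:
        lines.append(
            f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['speaker']}: {seg['text']}"
        )
    return "\n".join(lines)


# Segments copied from the sample response above.
example_segments = [
    {"start": 0.0, "end": 5.2, "speaker": "SPEAKER_00",
     "text": "Patient presents with acute onset chest pain radiating to left arm."},
    {"start": 5.5, "end": 9.8, "speaker": "SPEAKER_00",
     "text": "History of hypertension and diabetes mellitus type 2."},
]
print(format_segments(example_segments))
```

Each output line carries the start and end times in seconds, followed by the speaker identifier and the transcribed text.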

Best Practices

  • Always use language_code=med-en
  • Split files larger than 30 MB
  • Enable diarization for multi-speaker conversations
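The first and last practices can be captured in a small helper that builds the form fields for each request. This is a sketch under one stated assumption: leaving enable_diarization out of the request leaves diarization off, which the documentation does not confirm. The name build_form_data is hypothetical.

```python
def build_form_data(multi_speaker):
    """Build the form fields for a transcription request."""
    data = {"language_code": "med-en"}  # always select the medical model
    if multi_speaker:
        data["enable_diarization"] = "true"
    # Assumption: omitting enable_diarization leaves diarization off.
    return data
```

The returned dict can be passed as the data argument of requests.post alongside the file upload.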