Zero STT Med – Batch Transcription Documentation
Medical-grade Speech-to-Text powered by ShunyaLabs
Zero STT Med is a domain-specific speech recognition model optimized for medical terminology, procedures, and clinical documentation.
Prerequisites
- Python 3.8 or higher
- Valid API key (contact [email protected])
- Supported formats: WAV, MP3, M4A, FLAC, OGG, AAC, WMA, MP4, MKV, MOV, AVI, WebM
Installation:
pip install requests
Input – REST API
import requests

url = "https://tb.shunyalabs.ai/transcribe"
headers = {"X-API-Key": "<YOUR_API_KEY>"}  # replace with your API key

# Keep the file open for the duration of the upload
with open("your_audio_file.wav", "rb") as audio_file:
    files = {"file": audio_file}
    data = {
        "language_code": "med-en",
        "enable_diarization": "true",
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # stop on HTTP errors before parsing the body
result = response.json()
print(result["text"])
Input – cURL
curl -X POST "https://tb.shunyalabs.ai/transcribe" \
  -H "X-API-Key: <YOUR_API_KEY>" \
  -F "file=@your_audio_file.wav" \
  -F "language_code=med-en" \
  -F "enable_diarization=true"
File Size Limits
Maximum file size: 30 MB
Files larger than 30 MB must be split before processing.
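A pre-upload size check can catch oversized files before a request is made. The following is a minimal sketch, not part of the official client: the helper names are hypothetical, and it assumes the 30 MB limit is counted in binary megabytes (1024 * 1024 bytes), which this documentation does not specify.

```python
import os

# Assumption: the 30 MB limit is measured in binary megabytes.
MAX_UPLOAD_BYTES = 30 * 1024 * 1024


def exceeds_upload_limit(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when a file of size_bytes is over the limit and must be split."""
    return size_bytes > limit_bytes


def file_needs_splitting(path: str) -> bool:
    """Check the on-disk size of an audio file against the upload limit."""
    return exceeds_upload_limit(os.path.getsize(path))
```

Calling file_needs_splitting("your_audio_file.wav") before the upload lets a script decide whether to split the recording first. A file of exactly 30 MB passes, since only larger files must be split.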
Parameters
| Parameter | Value | Description |
|---|---|---|
| file | <audio_file> | Audio file to upload (multipart form field) |
| language_code | med-en | Selects the Zero STT Med model |
| enable_diarization | true | Enable speaker identification |
| api_key | <YOUR_API_KEY> | Authentication key, sent in the X-API-Key header |
| api_url | https://tb.shunyalabs.ai | API base URL; requests are posted to /transcribe |
Output
Segments appear only when diarization is enabled.
{
  "success": true,
  "text": "Patient presents with acute onset chest pain radiating to left arm. History of hypertension and diabetes mellitus type 2.",
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Patient presents with acute onset chest pain radiating to left arm.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 5.5,
      "end": 9.8,
      "text": "History of hypertension and diabetes mellitus type 2.",
      "speaker": "SPEAKER_00"
    }
  ],
  "total_segments": 2,
  "filename": "your_audio_file.wav",
  "unique_speakers": ["SPEAKER_00"]
}
Relevant Response Fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether transcription succeeded |
| text | string | Complete transcription text |
| segments | array | Speaker-segmented transcription |
| total_segments | integer | Number of segments |
| filename | string | Original filename |
| unique_speakers | array | List of speaker IDs |
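When reading a parsed response, it can help to check success first and to treat segments as optional, since they are returned only when diarization is enabled. The sketch below is one possible way to do that. The function name is hypothetical, and because error responses are not documented here, it assumes only that a failed request carries a false or missing success field.

```python
def read_transcription(result: dict) -> tuple:
    """Return (text, segments) from a parsed response, or raise if it failed."""
    if not result.get("success"):
        # Assumption: failures are signalled by a false or missing "success".
        raise RuntimeError("Transcription did not succeed")
    # "segments" is present only when diarization is enabled; default to [].
    return result["text"], result.get("segments", [])
```

With the requests example, this would be called as text, segments = read_transcription(response.json()).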
Segment Object
| Field | Type | Description |
|---|---|---|
| start | float | Start time in seconds |
| end | float | End time in seconds |
| text | string | Transcribed text |
| speaker | string | Speaker identifier |
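The segment fields are enough to build a speaker-labeled transcript or to total each speaker's talk time. The following is an illustrative sketch using only the four fields listed above. Both function names are hypothetical, and the output layout is just one reasonable choice.

```python
def format_segments(segments: list) -> list:
    """Render each segment as '[start-end] speaker: text'."""
    return [
        f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['speaker']}: {seg['text']}"
        for seg in segments
    ]


def speaking_time(segments: list) -> dict:
    """Sum the duration in seconds of each speaker's segments."""
    totals = {}
    for seg in segments:
        duration = seg["end"] - seg["start"]
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals
```

For the first segment of the sample output, format_segments yields "[0.0s-5.2s] SPEAKER_00: Patient presents with acute onset chest pain radiating to left arm."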
Best Practices
- Always use language_code=med-en
- Split files larger than 30 MB
- Enable diarization for multi-speaker conversations