Output
The API returns a JSON object containing the transcription result and detailed metadata.
Success Response
{
  "success": true,
  "text": "Hello everyone, welcome to today's meeting. Let's begin with the quarterly review.",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Hello everyone, welcome to today's meeting.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 3.5,
      "end": 5.8,
      "text": "Let's begin with the quarterly review.",
      "speaker": "SPEAKER_01"
    }
  ],
  "detected_language": "English",
  "language_probability": 0.99,
  "total_segments": 2,
  "chunks_processed": 1,
  "chunk_size_seconds": 120,
  "filename": "meeting.wav",
  "total_time": 2.34,
  "model_used": "english_en",
  "has_speaker_diarization": true,
  "unique_speakers": ["SPEAKER_00", "SPEAKER_01"],
  "diarization_time": 1.12
}
Response Fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether transcription succeeded |
| text | string | Complete transcription text |
| segments | array | Timestamped segments with speaker labels |
| detected_language | string | Automatically detected language |
| language_probability | float | Language confidence score (0–1) |
| total_segments | integer | Number of segments |
| chunks_processed | integer | Number of audio chunks processed |
| chunk_size_seconds | integer | Length of each audio chunk, in seconds |
| filename | string | Original filename |
| total_time | float | Processing time in seconds |
| model_used | string | ASR model used |
| has_speaker_diarization | boolean | Whether speaker diarization was applied |
| unique_speakers | array | Detected speaker IDs |
| diarization_time | float | Speaker diarization time (sec) |
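As a rough illustration of how the fields above fit together, the sketch below builds a one-line summary from a response dictionary. The helper name `summarize_response` is hypothetical and not part of the API. The sketch also assumes that the diarization fields may be absent when diarization is off, and it reads nothing but `success` from a failed response, since the error payload is not described in this section.

```python
from typing import Any


def summarize_response(result: dict[str, Any]) -> str:
    """Build a one-line summary from the documented response fields."""
    # Assumption: a failed request still carries "success"; the rest of an
    # error payload is not documented here, so nothing else is read from it.
    if not result.get("success", False):
        raise ValueError("transcription did not succeed")

    summary = (
        f"{result['filename']}: {result['total_segments']} segments, "
        f"{result['detected_language']} ({result['language_probability']:.0%}), "
        f"processed in {result['total_time']:.2f}s with {result['model_used']}"
    )
    # Assumption: diarization fields may be missing when diarization is off.
    if result.get("has_speaker_diarization"):
        speakers = result.get("unique_speakers", [])
        summary += f", {len(speakers)} speakers"
    return summary


# Values copied from the sample success response.
sample = {
    "success": True,
    "detected_language": "English",
    "language_probability": 0.99,
    "total_segments": 2,
    "filename": "meeting.wav",
    "total_time": 2.34,
    "model_used": "english_en",
    "has_speaker_diarization": True,
    "unique_speakers": ["SPEAKER_00", "SPEAKER_01"],
}
print(summarize_response(sample))
```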
Segment Object
Each item in the segments array represents a timestamped speech segment.
| Field | Type | Description |
|---|---|---|
| start | float | Start time in seconds |
| end | float | End time in seconds |
| text | string | Segment transcription |
| speaker | string | Speaker ID (e.g. SPEAKER_00) |
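Because every segment carries `start`, `end`, and `speaker`, per-speaker talk time can be derived directly from the array. The following is a minimal sketch, not an API feature: `speaking_time` is a hypothetical helper, and the `UNKNOWN` fallback label is an assumption for responses where `speaker` might be missing.

```python
from collections import defaultdict


def speaking_time(segments):
    """Return the total seconds of speech per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        # Assumption: "speaker" may be absent when diarization is disabled,
        # so fall back to a placeholder label.
        speaker = seg.get("speaker", "UNKNOWN")
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)


# Segments copied from the sample success response.
segments = [
    {"start": 0.0, "end": 3.5,
     "text": "Hello everyone, welcome to today's meeting.",
     "speaker": "SPEAKER_00"},
    {"start": 3.5, "end": 5.8,
     "text": "Let's begin with the quarterly review.",
     "speaker": "SPEAKER_01"},
]

for speaker, seconds in speaking_time(segments).items():
    print(f"{speaker}: {seconds:.1f}s")
```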
Working with Results
Extract full transcript
result = transcribe_file("audio.wav", api_key)
print(result["text"])
Process timestamped segments
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['speaker']}: {segment['text']}")
Identify unique speakers
speakers = result["unique_speakers"]
print(f"Found {len(speakers)} speakers: {', '.join(speakers)}")
Check language confidence
if result["language_probability"] > 0.9:
    print(f"Confident detection: {result['detected_language']}")
else:
    print(f"Uncertain detection: {result['detected_language']} ({result['language_probability']:.2%})")
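Combine the steps
The snippets in this section can be folded into a single helper that returns a printable, speaker-labelled transcript and prepends a warning when language detection is uncertain. This is only a sketch: `format_transcript` and its `min_confidence` parameter are hypothetical names, the 0.9 default simply mirrors the threshold used in the language check, and the `UNKNOWN` label is an assumed fallback for segments without a `speaker` key.

```python
def format_transcript(result, min_confidence=0.9):
    """Return a speaker-labelled transcript as a single string."""
    lines = []
    # Same rule as the language check: above the threshold counts as confident.
    if result["language_probability"] <= min_confidence:
        lines.append(
            "# warning: uncertain language detection "
            f"({result['detected_language']}, {result['language_probability']:.2%})"
        )
    for seg in result["segments"]:
        # Assumption: "speaker" may be missing when diarization is disabled.
        speaker = seg.get("speaker", "UNKNOWN")
        lines.append(
            f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {speaker}: {seg['text']}"
        )
    return "\n".join(lines)


# Trimmed copy of the sample success response.
sample_result = {
    "detected_language": "English",
    "language_probability": 0.99,
    "segments": [
        {"start": 0.0, "end": 3.5,
         "text": "Hello everyone, welcome to today's meeting.",
         "speaker": "SPEAKER_00"},
        {"start": 3.5, "end": 5.8,
         "text": "Let's begin with the quarterly review.",
         "speaker": "SPEAKER_01"},
    ],
}
print(format_transcript(sample_result))
```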