Output

The API returns a JSON object containing the transcription result along with metadata about language detection, chunking, processing time, and speaker diarization.

Success Response

{
  "success": true,
  "text": "Hello everyone, welcome to today's meeting. Let's begin with the quarterly review.",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Hello everyone, welcome to today's meeting.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 3.5,
      "end": 5.8,
      "text": "Let's begin with the quarterly review.",
      "speaker": "SPEAKER_01"
    }
  ],
  "detected_language": "English",
  "language_probability": 0.99,
  "total_segments": 2,
  "chunks_processed": 1,
  "chunk_size_seconds": 120,
  "filename": "meeting.wav",
  "total_time": 2.34,
  "model_used": "english_en",
  "has_speaker_diarization": true,
  "unique_speakers": ["SPEAKER_00", "SPEAKER_01"],
  "diarization_time": 1.12
}

Response Fields

Field                     Type      Description
success                   boolean   Whether transcription succeeded
text                      string    Complete transcription text
segments                  array     Timestamped segments with speaker labels
detected_language         string    Automatically detected language
language_probability      float     Language detection confidence score (0–1)
total_segments            integer   Number of segments
chunks_processed          integer   Number of audio chunks processed
chunk_size_seconds        integer   Length of each audio chunk in seconds
filename                  string    Original filename
total_time                float     Processing time in seconds
model_used                string    ASR model used
has_speaker_diarization   boolean   Whether speaker labels are included
unique_speakers           array     Detected speaker IDs
diarization_time          float     Speaker diarization time in seconds
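
For callers who prefer typed access over raw dictionary lookups, the fields above can be mapped onto small data classes. The following is a minimal sketch, not part of the API: the names `Segment`, `TranscriptionResult`, and `parse_result` are hypothetical, only a subset of the fields is modeled, and it assumes that `speaker` may be missing when diarization is disabled.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Segment:
    start: float
    end: float
    text: str
    speaker: Optional[str] = None  # assumption: may be absent without diarization


@dataclass
class TranscriptionResult:
    success: bool
    text: str
    segments: list[Segment] = field(default_factory=list)
    detected_language: Optional[str] = None
    language_probability: Optional[float] = None
    unique_speakers: list[str] = field(default_factory=list)


def parse_result(payload: dict[str, Any]) -> TranscriptionResult:
    """Build a TranscriptionResult from the decoded JSON response."""
    segments = [
        Segment(
            start=float(s["start"]),
            end=float(s["end"]),
            text=s["text"],
            speaker=s.get("speaker"),
        )
        for s in payload.get("segments", [])
    ]
    return TranscriptionResult(
        success=bool(payload.get("success", False)),
        text=payload.get("text", ""),
        segments=segments,
        detected_language=payload.get("detected_language"),
        language_probability=payload.get("language_probability"),
        unique_speakers=list(payload.get("unique_speakers", [])),
    )
```

Missing optional fields fall back to `None` or an empty list, so the same function can handle a response that omits diarization data. The built-in generic annotations require Python 3.9 or later.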

Segment Object

Each item in the segments array represents a timestamped speech segment.

Field     Type     Description
start     float    Start time in seconds
end       float    End time in seconds
text      string   Segment transcription
speaker   string   Speaker ID (e.g. SPEAKER_00)
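
Because each segment carries `start` and `end` times in seconds, the array converts naturally into subtitle formats. The sketch below produces SubRip (SRT) text; the helper names `format_timestamp` and `segments_to_srt` are illustrative and not part of the API.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segment dicts (start, end, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        cues.append(f"{index}\n{time_range}\n{seg['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Calling `segments_to_srt(result["segments"])` and writing the returned string to a `.srt` file is one way to use it. Speaker labels are left out of the cue text here, but could be prefixed to `seg['text']` if desired.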

Working with Results

Extract full transcript

result = transcribe_file("audio.wav", api_key)
print(result["text"])
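
Since the response includes a `success` flag, it may be worth checking it before reading `text`. The following sketch assumes only that `success` is false on failure; the shape of a failure response is not described in this section, so the whole payload is included in the error message. `TranscriptionError` and `get_transcript` are hypothetical names.

```python
class TranscriptionError(RuntimeError):
    """Raised when the response does not report success."""


def get_transcript(result: dict) -> str:
    """Return the full transcript, or raise if the request did not succeed."""
    if not result.get("success", False):
        # The failure payload is not documented here, so report it verbatim.
        raise TranscriptionError(f"Transcription failed: {result!r}")
    return result.get("text", "")
```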

Process timestamped segments

for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['speaker']}: {segment['text']}")
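
When one speaker talks across several consecutive segments, a readable transcript usually groups them into a single turn. The sketch below shows one way to do that; `merge_speaker_turns` is a hypothetical helper, and the `"UNKNOWN"` fallback label is an assumption for segments that carry no `speaker` key.

```python
def merge_speaker_turns(segments):
    """Merge adjacent segments that share a speaker into single turns."""
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # assumed fallback label
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker continues: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns
```

The function builds new dictionaries and leaves the input segments unmodified, so the original `result["segments"]` remains available for other processing.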

Identify unique speakers

speakers = result["unique_speakers"]
print(f"Found {len(speakers)} speakers: {', '.join(speakers)}")
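
Beyond listing speaker IDs, the segment timestamps make it possible to estimate how long each speaker talked. This is a rough sketch: `speaking_time` is a hypothetical helper, it simply sums `end - start` per speaker, and it groups segments without a `speaker` key under an assumed `"UNKNOWN"` label.

```python
from collections import defaultdict


def speaking_time(segments):
    """Return total seconds of speech per speaker ID."""
    totals = defaultdict(float)
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # assumed fallback label
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)
```

A typical use would be `speaking_time(result["segments"])`, which returns a dictionary keyed by speaker ID. Overlapping speech, if the API reports any, would be counted once for each speaker involved.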

Check language confidence

if result["language_probability"] > 0.9:
    print(f"Confident detection: {result['detected_language']}")
else:
    print(f"Uncertain detection: {result['detected_language']} ({result['language_probability']:.2%})")