Pre-recorded Audio

Response Object Reference

Every successful transcription returns a TranscriptionResult object.


TranscriptionResult schema

Field              Type         Description
success            bool         Whether transcription succeeded.
request_id         string       Unique identifier. Use when contacting support.
text               string       Full transcript. Prefixed with [SPEAKER_XX] tags when diarization is enabled.
detected_language  string       Detected language in lowercase, e.g. "hindi", "telugu". Always populated.
audio_duration     float        Total audio duration in seconds.
inference_time_ms  float        Server-side processing time in milliseconds.
segments           array        Time-aligned segments with start, end, text. May include speaker, emotion, words[].
speakers           array        Unique speaker IDs. Empty array when diarization is off.
nlp_analysis       object|null  NLP results. Populated when enable_* flags are set.
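
As a rough illustration of how the top-level fields fit together, the sketch below reads an already-decoded response (a plain Python dict) and derives a few headline values. Note the unit mismatch between audio_duration (seconds) and inference_time_ms (milliseconds). The function name summarize_result, the sample payload, and the exact layout of the tagged text are assumptions made for this example, not part of the documented API.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> dict[str, Any]:
    """Pull headline values out of a TranscriptionResult-shaped dict."""
    audio_s = float(result["audio_duration"])                    # seconds
    inference_s = float(result["inference_time_ms"]) / 1000.0    # ms -> seconds
    return {
        "request_id": result["request_id"],
        "language": result["detected_language"],
        # speakers is an empty array when diarization is off.
        "diarized": len(result["speakers"]) > 0,
        # Processing seconds per second of audio; below 1 means faster than real time.
        "real_time_factor": inference_s / audio_s if audio_s > 0 else None,
        # nlp_analysis is null when no NLP results were produced.
        "has_nlp": result.get("nlp_analysis") is not None,
    }


# Hypothetical payload: every value here is made up for illustration.
sample = {
    "success": True,
    "request_id": "req_example_123",
    "text": "[SPEAKER_00] namaste [SPEAKER_01] hello",  # tag layout is assumed
    "detected_language": "hindi",
    "audio_duration": 12.5,
    "inference_time_ms": 2500.0,
    "segments": [],
    "speakers": ["SPEAKER_00", "SPEAKER_01"],
    "nlp_analysis": None,
}

summary = summarize_result(sample)
print(summary)
```

Keeping request_id in the summary makes it easy to log alongside any error report, since that is the value support will ask for.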

SegmentResult schema

Field    Type         Description
start    float        Segment start time in seconds.
end      float        Segment end time in seconds.
text     string       Transcript text for this segment.
speaker  string|null  Speaker label e.g. SPEAKER_00. Present when enable_diarization=true.
emotion  string|null  Detected emotion. Present when enable_emotion_diarization=true. NOT in nlp_analysis.
words    array|null   Per-word timestamps (word, start, end, score). Present when word_timestamps=true.

Important: emotion lives in segments[]; it is NOT in nlp_analysis.
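
Because speaker, emotion, and words can each be null, client code generally needs to guard every optional field. The following is a minimal sketch, assuming the response has already been decoded into a dict; the helper names, the "UNKNOWN" fallback label, and the emotion values in the sample are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Any


def emotions_by_speaker(result: dict[str, Any]) -> dict[str, list[str]]:
    """Group per-segment emotion labels by speaker label."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for seg in result.get("segments", []):
        emotion = seg.get("emotion")  # null unless enable_emotion_diarization=true
        if emotion is None:
            continue
        # speaker is null unless enable_diarization=true; "UNKNOWN" is our own fallback.
        speaker = seg.get("speaker") or "UNKNOWN"
        grouped[speaker].append(emotion)
    return dict(grouped)


def total_words(result: dict[str, Any]) -> int:
    """Count word-level entries; words is null unless word_timestamps=true."""
    return sum(len(seg.get("words") or []) for seg in result.get("segments", []))


# Hypothetical segments: labels, timings, and scores are made up.
sample = {
    "segments": [
        {"start": 0.0, "end": 1.2, "text": "hello there", "speaker": "SPEAKER_00",
         "emotion": "neutral",
         "words": [
             {"word": "hello", "start": 0.0, "end": 0.5, "score": 0.98},
             {"word": "there", "start": 0.6, "end": 1.2, "score": 0.95},
         ]},
        {"start": 1.3, "end": 2.0, "text": "hi", "speaker": "SPEAKER_01",
         "emotion": "happy", "words": None},
        {"start": 2.1, "end": 3.0, "text": "okay", "speaker": "SPEAKER_00",
         "emotion": None, "words": None},
    ],
    "nlp_analysis": None,  # emotion is read from segments, never from here
}

by_speaker = emotions_by_speaker(sample)
word_total = total_words(sample)
print(by_speaker, word_total)
```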

NLPAnalysis schema

Field            Type         Description
sentiment        object|null  Set when enable_sentiment_analysis=true. Contains label (string), score (float −1 to 1), explanation (string).
intent           object|null  Set when enable_intent_detection=true. Contains label, confidence (float), reasoning (string).
summary          string|null  Set when enable_summarization=true.
translation      string|null  Set when output_language is provided.
normalized_text  string|null  Set when enable_keyterm_normalization=true. Original text field is unchanged.
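
Since nlp_analysis itself may be null, and each field inside it may also be null, it can help to route every lookup through one small accessor. This is a sketch under those assumptions only; nlp_field, describe_sentiment, and preferred_text are hypothetical helper names, and the sample values are invented.

```python
from typing import Any, Optional


def nlp_field(result: dict[str, Any], name: str) -> Optional[Any]:
    """Return nlp_analysis[name], or None if the object or the field is null or absent."""
    nlp = result.get("nlp_analysis") or {}
    return nlp.get(name)


def describe_sentiment(result: dict[str, Any]) -> str:
    """Format the sentiment label and score, if sentiment analysis was requested."""
    sentiment = nlp_field(result, "sentiment")
    if sentiment is None:
        return "sentiment not requested"
    return f'{sentiment["label"]} ({sentiment["score"]:+.2f})'


def preferred_text(result: dict[str, Any]) -> str:
    """Use normalized_text when present; the original text field is always there."""
    return nlp_field(result, "normalized_text") or result["text"]


# Hypothetical payload: all values are made up for illustration.
sample = {
    "text": "please reset my wifi router",
    "nlp_analysis": {
        "sentiment": {"label": "positive", "score": 0.62, "explanation": "polite request"},
        "intent": None,
        "summary": None,
        "translation": None,
        "normalized_text": "please reset my Wi-Fi router",
    },
}

print(describe_sentiment(sample))
print(preferred_text(sample))
```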