Pre-recorded Audio
Response Object Reference
Every successful transcription returns a TranscriptionResult object.
TranscriptionResult schema
| Field | Type | Description |
|---|---|---|
| success | bool | Whether transcription succeeded. |
| request_id | string | Unique identifier for the request. Quote it when contacting support. |
| text | string | Full transcript. Prefixed with [SPEAKER_XX] tags when diarization is enabled. |
| detected_language | string | Detected language name in lowercase, e.g. "hindi", "telugu". Always populated. |
| audio_duration | float | Total audio duration in seconds. |
| inference_time_ms | float | Server-side processing time in milliseconds. |
| segments | array | Time-aligned segments, each with start, end, and text. May also include speaker, emotion, and words[]. |
| speakers | array | Unique speaker IDs. Empty array when diarization is off. |
| nlp_analysis | object \| null | NLP results (see NLPAnalysis schema). Populated when the corresponding enable_* flags or output_language are set. |
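As a rough illustration of the table above, the following Python sketch maps a response onto a typed structure. It assumes the result arrives as a JSON object whose keys match the field names listed here; the `parse_result` and `real_time_factor` helpers and the sample payload are hypothetical and not part of the API.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptionResult:
    success: bool
    request_id: str
    text: str
    detected_language: str
    audio_duration: float        # seconds
    inference_time_ms: float     # milliseconds
    segments: list = field(default_factory=list)
    speakers: list = field(default_factory=list)
    nlp_analysis: Optional[dict] = None


def parse_result(payload: dict) -> TranscriptionResult:
    """Build a TranscriptionResult from a decoded JSON object (assumed shape)."""
    return TranscriptionResult(
        success=payload["success"],
        request_id=payload["request_id"],
        text=payload["text"],
        detected_language=payload["detected_language"],
        audio_duration=payload["audio_duration"],
        inference_time_ms=payload["inference_time_ms"],
        segments=payload.get("segments") or [],
        speakers=payload.get("speakers") or [],
        nlp_analysis=payload.get("nlp_analysis"),
    )


def real_time_factor(result: TranscriptionResult) -> Optional[float]:
    """Processing time divided by audio length, after converting ms to seconds."""
    if result.audio_duration <= 0:
        return None
    return (result.inference_time_ms / 1000.0) / result.audio_duration


# Hypothetical payload, for illustration only.
raw = """{
  "success": true,
  "request_id": "req-example-001",
  "text": "namaste",
  "detected_language": "hindi",
  "audio_duration": 12.5,
  "inference_time_ms": 2500.0,
  "segments": [{"start": 0.0, "end": 12.5, "text": "namaste"}],
  "speakers": [],
  "nlp_analysis": null
}"""

result = parse_result(json.loads(raw))
print(result.detected_language, real_time_factor(result))
```

Note that audio_duration and inference_time_ms use different units, so any ratio between them needs the conversion shown in `real_time_factor`.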
SegmentResult schema
| Field | Type | Description |
|---|---|---|
| start | float | Segment start time in seconds. |
| end | float | Segment end time in seconds. |
| text | string | Transcript text for this segment. |
| speaker | string \| null | Speaker label, e.g. SPEAKER_00. Present when enable_diarization=true. |
| emotion | string \| null | Detected emotion. Present when enable_emotion_diarization=true. NOT in nlp_analysis. |
| words | array \| null | Per-word timestamps (word, start, end, score). Present when word_timestamps=true. |
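Because speaker, emotion, and words are optional, code that walks segments[] should tolerate missing or null values. The sketch below shows one way to do that in Python, assuming each segment is a plain dictionary with the keys listed in the table; the helper names, the "UNKNOWN" fallback label, and the sample data (including the "neutral" emotion value) are hypothetical.

```python
from typing import Iterator, Tuple


def format_segment(seg: dict) -> str:
    """Render one segment as a line, reading emotion from the segment itself."""
    speaker = seg.get("speaker") or "UNKNOWN"   # null when diarization is off
    line = f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {seg['text']}"
    emotion = seg.get("emotion")                # null unless emotion diarization is on
    if emotion:
        line += f" ({emotion})"
    return line


def iter_words(segments: list) -> Iterator[Tuple[str, float, float, float]]:
    """Yield (word, start, end, score) for every segment that has word timestamps."""
    for seg in segments:
        for w in seg.get("words") or []:        # words may be absent or null
            yield w["word"], w["start"], w["end"], w["score"]


# Hypothetical segments, for illustration only.
segments = [
    {
        "start": 0.0, "end": 1.5, "text": "Hello there.",
        "speaker": "SPEAKER_00", "emotion": "neutral",
        "words": [
            {"word": "Hello", "start": 0.0, "end": 0.6, "score": 0.98},
            {"word": "there.", "start": 0.7, "end": 1.5, "score": 0.95},
        ],
    },
    {"start": 1.5, "end": 3.0, "text": "Hi.",
     "speaker": None, "emotion": None, "words": None},
]

for seg in segments:
    print(format_segment(seg))
print(list(iter_words(segments)))
```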
Important:
emotion lives in segments[]; it is NOT in nlp_analysis.
NLPAnalysis schema
| Field | Type | Description |
|---|---|---|
| sentiment | object \| null | Set when enable_sentiment_analysis=true. Contains label (string), score (float, −1 to 1), explanation (string). |
| intent | object \| null | Set when enable_intent_detection=true. Contains label (string), confidence (float), reasoning (string). |
| summary | string \| null | Set when enable_summarization=true. |
| translation | string \| null | Set when output_language is provided. |
| normalized_text | string \| null | Set when enable_keyterm_normalization=true. The original text field is unchanged. |
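Since nlp_analysis itself and each field inside it can be null, a consumer typically has to check both levels. The following Python sketch, which assumes the result is a decoded JSON dictionary, collects only the populated fields and falls back to the original text when no normalized text is available. The helper names and sample values are hypothetical.

```python
def populated_nlp_fields(result: dict) -> dict:
    """Return only the nlp_analysis entries that are not null."""
    nlp = result.get("nlp_analysis") or {}      # the whole object may be null
    return {key: value for key, value in nlp.items() if value is not None}


def display_text(result: dict) -> str:
    """Prefer normalized_text when present; the text field is always kept as-is."""
    nlp = result.get("nlp_analysis") or {}
    return nlp.get("normalized_text") or result["text"]


# Hypothetical result, for illustration only.
sample = {
    "text": "call me at five pm",
    "nlp_analysis": {
        "sentiment": {"label": "positive", "score": 0.6,
                      "explanation": "Friendly request."},
        "intent": None,
        "summary": None,
        "translation": None,
        "normalized_text": "call me at 5 PM",
    },
}

print(sorted(populated_nlp_fields(sample)))
print(display_text(sample))
print(display_text({"text": "call me at five pm", "nlp_analysis": None}))
```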