Word Timestamps
Word-level timestamps provide timing information for each individual word in the transcription. When they are enabled, each segment in the response is broken down into words. Every word has its own start and end time and a probability score indicating recognition confidence.
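This is a minimal sketch of how the nested structure can be flattened into one ordered list of words across all segments. It assumes the response shape shown in the example output further down this page, where each item in `segments` carries its own `words` list. It also assumes `start` and `end` are in seconds. The `Word` class and `flatten_words` helper are hypothetical names for illustration; they are not part of the API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One entry from a segment's "words" list (hypothetical helper type)."""
    word: str
    start: float
    end: float
    probability: float

    @property
    def duration(self) -> float:
        # Assumes start/end are in seconds, as the example output suggests.
        return self.end - self.start


def flatten_words(response: dict) -> list[Word]:
    """Collect every word from every segment, preserving order."""
    words = []
    for segment in response.get("segments", []):
        for w in segment.get("words", []):
            words.append(Word(w["word"], w["start"], w["end"], w["probability"]))
    return words


# Trimmed copy of the example output (first two words only).
response = {
    "segments": [
        {
            "start": 0.0,
            "end": 3.5,
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.5, "probability": 0.96},
                {"word": "how", "start": 0.8, "end": 1.0, "probability": 0.94},
            ],
        }
    ]
}

for w in flatten_words(response):
    print(f"{w.word}: {w.start:.2f}-{w.end:.2f}s ({w.duration:.2f}s)")
```

A flat list like this is convenient when you only care about the word sequence and not the segment boundaries.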
How to Enable
Set the include_word_timestamps form field to "true" in your request:
```
"include_word_timestamps": "true"
```
Request
Don’t forget to replace YOUR_API_KEY with your own secret key.
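```python
import requests

url = "https://tb2.shunyalabs.ai/v1/transcriptions"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Upload the audio file as multipart form data.
with open("sample.wav", "rb") as f:
    files = {"file": f}
    data = {
        # Ask for per-word timing and confidence scores.
        "include_word_timestamps": "true"
    }
    response = requests.post(
        url,
        headers=headers,
        files=files,
        data=data
    )

# Raise an exception if the server returned an HTTP error status.
response.raise_for_status()
print(response.json())
```
Example Output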
```json
{
  "success": true,
  "text": "Hello, how are you doing today?",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Hello, how are you doing today?",
      "speaker": "SPEAKER_00",
      "words": [
        { "word": "Hello", "start": 0.0, "end": 0.5, "probability": 0.96 },
        { "word": "how", "start": 0.8, "end": 1.0, "probability": 0.94 },
        { "word": "are", "start": 1.1, "end": 1.3, "probability": 0.92 },
        { "word": "you", "start": 1.4, "end": 1.7, "probability": 0.95 },
        { "word": "doing", "start": 1.8, "end": 2.2, "probability": 0.89 },
        { "word": "today", "start": 2.3, "end": 2.8, "probability": 0.91 }
      ]
    }
  ]
}
```
Understanding Confidence Scores
Each word includes a probability value between 0.0 and 1.0 indicating recognition confidence:
- 0.9 or higher: very high confidence
- 0.8 to below 0.9: high confidence
- 0.7 to below 0.8: moderate confidence
- Below 0.7: low confidence (may require review)
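The confidence bands above can be applied in code. This minimal sketch labels each word and collects words that fall below a review threshold. It treats each band as including its lower bound, so 0.9 counts as very high. The helper names `confidence_label` and `words_needing_review` are hypothetical, and the default threshold of 0.7 simply mirrors the "low confidence" band.

```python
def confidence_label(probability: float) -> str:
    """Map a probability to the confidence bands (lower bounds inclusive)."""
    if probability >= 0.9:
        return "very high"
    if probability >= 0.8:
        return "high"
    if probability >= 0.7:
        return "moderate"
    return "low"


def words_needing_review(response: dict, threshold: float = 0.7) -> list[dict]:
    """Return every word whose probability is below the review threshold."""
    return [
        w
        for segment in response.get("segments", [])
        for w in segment.get("words", [])
        if w["probability"] < threshold
    ]


# Words copied from the example output.
response = {
    "segments": [
        {
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.5, "probability": 0.96},
                {"word": "how", "start": 0.8, "end": 1.0, "probability": 0.94},
                {"word": "are", "start": 1.1, "end": 1.3, "probability": 0.92},
                {"word": "you", "start": 1.4, "end": 1.7, "probability": 0.95},
                {"word": "doing", "start": 1.8, "end": 2.2, "probability": 0.89},
                {"word": "today", "start": 2.3, "end": 2.8, "probability": 0.91},
            ]
        }
    ]
}

for segment in response["segments"]:
    for w in segment["words"]:
        print(w["word"], confidence_label(w["probability"]))

# No example word is below 0.7. A stricter threshold of 0.9 flags "doing" (0.89).
print([w["word"] for w in words_needing_review(response, threshold=0.9)])
```

The threshold is a parameter because the appropriate cut-off for review depends on your tolerance for errors. The bands are a guide, not a fixed rule.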
Best Practices
- Enable only when needed: word timestamps add roughly 5–10% to processing time
- Use the confidence scores for quality control by reviewing low-confidence words
- Combine with speaker diarization to track who said each word
- Weigh the extra detail against the added processing time for your use case
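In the example output, the speaker label sits on the segment rather than on each word. This minimal sketch assumes that every word belongs to the `speaker` of its enclosing segment and groups words by speaker. It falls back to a placeholder label when a segment has no `speaker` field. The `words_by_speaker` helper is hypothetical. The second segment in the sample is made up for illustration, and its `SPEAKER_01` label is an assumed format. Word entries are trimmed to the one field the sketch reads.

```python
from collections import defaultdict


def words_by_speaker(response: dict) -> dict[str, list[str]]:
    """Group words under the speaker label of the segment that contains them."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for segment in response.get("segments", []):
        # Assumption: every word in a segment was spoken by that segment's speaker.
        speaker = segment.get("speaker", "UNKNOWN")
        for w in segment.get("words", []):
            grouped[speaker].append(w["word"])
    return dict(grouped)


response = {
    "segments": [
        {   # From the example output.
            "speaker": "SPEAKER_00",
            "words": [{"word": t} for t in ["Hello", "how", "are", "you", "doing", "today"]],
        },
        {   # Hypothetical second segment, for illustration only.
            "speaker": "SPEAKER_01",
            "words": [{"word": "I'm"}, {"word": "fine"}],
        },
    ]
}

print(words_by_speaker(response))
```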
Use Cases
- Video subtitles with precise word-level timing
- Karaoke and lyrics synchronization
- Quality assurance and transcript review
- Audio editing and content navigation
- Accessibility for deaf and hard-of-hearing users
- Language learning and pronunciation analysis
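For the subtitle use case, word timings can be converted into SubRip (SRT) cues. This minimal sketch emits one numbered cue per word. It assumes `start` and `end` are in seconds. Production subtitles usually group several words per cue. The `srt_timestamp` and `to_srt` helpers are hypothetical, not part of the API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(response: dict) -> str:
    """Build an SRT document with one numbered cue per word."""
    cues = []
    index = 1
    for segment in response.get("segments", []):
        for w in segment.get("words", []):
            cues.append(
                f"{index}\n"
                f"{srt_timestamp(w['start'])} --> {srt_timestamp(w['end'])}\n"
                f"{w['word']}\n"
            )
            index += 1
    # SRT cues are separated by a blank line.
    return "\n".join(cues)


# First two words of the example output.
response = {
    "segments": [
        {
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.5, "probability": 0.96},
                {"word": "how", "start": 0.8, "end": 1.0, "probability": 0.94},
            ]
        }
    ]
}

print(to_srt(response))
```

Rounding to whole milliseconds avoids floating-point artifacts such as 0.8 s turning into 799 ms.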