Language Identification
Automatically detect the spoken language in your audio recordings with confidence scores and standardized language codes.
Language Identification enables automatic detection of the spoken language in an audio file. The API returns the detected language code, full language name, and a confidence score indicating detection accuracy.
How to Enable
Set language_code to "auto" in the request data:
"language_code": "auto"
Request Example
Replace your-api-key in the example with your own secret API key.
import requests

url = "https://tb2.shunyalabs.ai/v1/transcriptions"
headers = {"X-API-Key": "your-api-key"}

# Send the request while the audio file is still open.
with open("audio.wav", "rb") as f:
    files = {"file": f}
    data = {
        "language_code": "auto"  # enable automatic language identification
    }
    response = requests.post(
        url,
        headers=headers,
        files=files,
        data=data
    )

print(response.json())
Example Output
{
  "success": true,
  "text": "Hello, how can I help you today?",
  "segments": [...],
  "language_identification": {
    "status": "success",
    "language": "en",
    "confidence": 0.9876
  }
}
Understanding Confidence Scores
The confidence value ranges from 0.0 to 1.0 and indicates how certain the system is about the detected language. Higher values mean a more reliable detection.
- Scores > 0.9 — Very high confidence
- Scores 0.8 – 0.9 — High confidence
- Scores 0.7 – 0.8 — Moderate confidence
- Scores < 0.7 — Low confidence (language may be ambiguous)
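As a minimal sketch of how these bands might be applied in client code, the helper below maps a confidence value to the labels above. The function name confidence_band is hypothetical, and treating a score of exactly 0.8 as high confidence is an assumption, since the listed ranges share that boundary. The sample payload is copied from the example output with segments omitted.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence score (0.0 to 1.0) to the bands listed above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.9:
        return "very high"
    if confidence >= 0.8:  # assumption: 0.8 itself counts as high
        return "high"
    if confidence >= 0.7:
        return "moderate"
    return "low"


# Sample payload taken from the example output (segments omitted).
result = {
    "success": True,
    "text": "Hello, how can I help you today?",
    "language_identification": {
        "status": "success",
        "language": "en",
        "confidence": 0.9876,
    },
}

lid = result["language_identification"]
print(lid["language"], confidence_band(lid["confidence"]))  # en very high
```

In a real client, result would come from response.json() rather than a hard-coded dictionary.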
Best Practices
- Provide at least 3–5 seconds of clear audio
- Use continuous speech instead of isolated words
- Avoid heavy background noise
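The length recommendation can be checked locally before uploading. The sketch below uses Python's standard wave module and assumes the input is an uncompressed PCM WAV file. The helper names and the 3.0 second threshold (the lower end of the range above) are illustrative choices, not part of the API.

```python
import wave

MIN_SECONDS = 3.0  # lower end of the 3-5 second recommendation


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Check whether the file meets the minimum recommended length."""
    return wav_duration_seconds(path) >= min_seconds
```

For instance, a client might call is_long_enough("audio.wav") and warn the user, or skip language identification, when it returns False. This checks duration only and says nothing about noise or speech content.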
Use Cases
- Automatically route calls to language-specific support teams
- Organize audio libraries by detected language
- Verify language usage in educational content
- Detect caller language in international call centers
- Media analytics and content classification
- Compliance and regulatory validation
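To illustrate the call-routing use case, here is a rough sketch that picks a support queue from the language_identification object in the response. The queue names, the QUEUES mapping, and the pick_queue function are all hypothetical. Falling back to a general queue when the status is not "success" or the confidence is below 0.7 is an assumed policy based on the low-confidence band described earlier.

```python
# Hypothetical mapping from detected language code to a support queue.
QUEUES = {"en": "support-english", "hi": "support-hindi"}
FALLBACK_QUEUE = "support-general"
MIN_CONFIDENCE = 0.7  # scores below this are described as low confidence


def pick_queue(response: dict) -> str:
    """Choose a queue from a transcription response, with a safe fallback."""
    lid = response.get("language_identification") or {}
    if lid.get("status") != "success":
        return FALLBACK_QUEUE
    if lid.get("confidence", 0.0) < MIN_CONFIDENCE:
        return FALLBACK_QUEUE
    # Unknown or missing language codes also go to the fallback queue.
    return QUEUES.get(lid.get("language"), FALLBACK_QUEUE)
```

Sending uncertain detections to a general queue keeps a misidentified caller from landing with a team that does not speak their language.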