Language Identification

Automatically detect the spoken language in your audio recordings with confidence scores and standardized language codes.

Language Identification enables automatic detection of the spoken language in an audio file. The API returns the detected language code, full language name, and a confidence score indicating how certain the detection is.

How to Enable

Set the language_code field to "auto" in the form data of your transcription request:

"language_code": "auto"

Request Example

Don’t forget to replace YOUR_API_KEY with your own secret key.

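import requests

url = "https://tb2.shunyalabs.ai/v1/transcriptions"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Open the audio in binary mode so it is uploaded as multipart form data.
with open("audio.wav", "rb") as f:
    files = {"file": f}
    # "auto" turns on language identification for this request.
    data = {
        "language_code": "auto"
    }

    response = requests.post(
        url,
        headers=headers,
        files=files,
        data=data
    )

# Raise an exception on HTTP errors before parsing the body.
response.raise_for_status()
print(response.json())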

Example Output

{
  "success": true,
  "text": "Hello, how can I help you today?",
  "segments": [...],
  "language_identification": {
    "status": "success",
    "language": "en",
    "confidence": 0.9876
  }
}
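
The language result can be read from the language_identification object shown above. The following is a minimal sketch, not an official client: the helper name extract_language is hypothetical, and treating any status other than "success" as a failed detection is an assumption, since only the success case appears in the example output.

```python
def extract_language(result):
    """Return (language, confidence), or None if identification is missing or did not succeed."""
    info = result.get("language_identification") or {}
    # Assumption: any status other than "success" means no usable result.
    if info.get("status") != "success":
        return None
    return info.get("language"), info.get("confidence")


# Sample payload trimmed from the example output above.
sample = {
    "success": True,
    "text": "Hello, how can I help you today?",
    "language_identification": {
        "status": "success",
        "language": "en",
        "confidence": 0.9876,
    },
}

print(extract_language(sample))  # ('en', 0.9876)
```

In a real request, pass response.json() in place of sample.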

Understanding Confidence Scores

The confidence value ranges from 0.0 to 1.0 and indicates how certain the service is about the detected language. Higher values mean a more reliable detection.

Scores > 0.9 — Very high confidence
Scores 0.8 – 0.9 — High confidence
Scores 0.7 – 0.8 — Moderate confidence
Scores < 0.7 — Low confidence (language may be ambiguous)
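
The bands above can be turned into a small lookup, sketched below. The function name confidence_band is hypothetical, and the handling of exact boundary values is an assumption, because the list does not say which band 0.9, 0.8, or 0.7 falls into. Here 0.9 counts as high, 0.8 as high, and 0.7 as moderate.

```python
def confidence_band(confidence):
    """Map a confidence score in [0.0, 1.0] to the band names listed above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.9:
        return "very high"
    if confidence >= 0.8:  # assumption: 0.8 and 0.9 themselves count as high
        return "high"
    if confidence >= 0.7:  # assumption: 0.7 itself counts as moderate
        return "moderate"
    return "low"


print(confidence_band(0.9876))  # very high
```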

Best Practices

Provide at least 3–5 seconds of clear audio
Use continuous speech instead of isolated words
Avoid heavy background noise
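
One way to apply the first guideline is to measure the clip length locally before uploading. This is a minimal sketch using Python's standard wave module, assuming an uncompressed WAV file. The helper names and the 3-second default (the lower end of the range above) are illustrative choices, not part of the API.

```python
import wave

MIN_SECONDS = 3.0  # assumption: lower end of the 3-5 second guideline


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Check whether the clip meets the minimum length before sending it."""
    return wav_duration_seconds(path) >= min_seconds
```

This check covers duration only. It cannot tell whether the audio contains continuous speech or background noise.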

Use Cases

Automatically route calls to language-specific support teams
Organize audio libraries by detected language
Verify language usage in educational content
Detect caller language in international call centers
Classify content for media analytics
Validate spoken language for compliance and regulatory checks
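
As an illustration of the call-routing use case, the sketch below picks a support queue from the detected language and falls back to a triage queue when confidence is low. The queue names, the language-to-queue mapping, and the pick_queue helper are all hypothetical. The 0.7 cutoff reuses the low-confidence threshold from the confidence bands.

```python
# Hypothetical mapping from language code to support queue.
QUEUES = {"en": "support-en", "hi": "support-hi", "es": "support-es"}
FALLBACK_QUEUE = "support-triage"  # hypothetical queue for unclear cases
MIN_CONFIDENCE = 0.7  # below this, the language may be ambiguous


def pick_queue(language, confidence):
    """Choose a queue for a call, falling back when detection is unreliable."""
    if confidence < MIN_CONFIDENCE:
        return FALLBACK_QUEUE
    # Languages without a dedicated team also go to the fallback queue.
    return QUEUES.get(language, FALLBACK_QUEUE)


print(pick_queue("en", 0.9876))  # support-en
```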