Intelligence Layer

Optional capabilities on top of Zero STT. Turn each one on with a field on the same transcription request (most are boolean flags); results come back in the same JSON as your transcript. Mix and match freely.

Works with batch ASR today

Add flags to POST https://asr.shunyalabs.ai/v1/audio/transcriptions. Every example below uses that endpoint unless noted.

How to enable

Pick a feature below, copy the flag into your request, and open Show request & response on the card for a ready-to-run example.

Send your audio file (or URL) with model=zero-indic (or another Zero STT model).
Add one or more feature fields (enable_*, output_language, hash_keywords, word_timestamps); all are optional.
Read enriched fields on the response: segments, speakers, nlp_analysis, and more.

All features

Diarization

enable_diarization=true

Who spoke when. Adds speaker: SPEAKER_XX on each segment and a top-level speakers array.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@meeting.wav" \
  -F "model=zero-indic" \
  -F "enable_diarization=true"

Response (excerpt)

{
  "text": "[SPEAKER_00] नमस्ते... [SPEAKER_01] मैं ठीक हूँ...",
  "segments": [
    { "start": 0.5, "end": 3.2, "text": "...", "speaker": "SPEAKER_00" },
    { "start": 4.1, "end": 6.8, "text": "...", "speaker": "SPEAKER_01" }
  ],
  "speakers": ["SPEAKER_00", "SPEAKER_01"]
}
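As an illustration of consuming the diarized response, the sketch below sums talk time per speaker from the segments array. It assumes only the fields shown in the excerpt above (start, end, speaker); the helper name talk_time_by_speaker and the "UNKNOWN" fallback label are hypothetical and not part of the API.

```python
from collections import defaultdict


def talk_time_by_speaker(response: dict) -> dict:
    """Sum segment durations (in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in response.get("segments", []):
        # "UNKNOWN" is our own fallback for segments without a speaker label.
        speaker = seg.get("speaker", "UNKNOWN")
        totals[speaker] += seg["end"] - seg["start"]
    # Round to milliseconds to hide floating-point noise.
    return {spk: round(sec, 3) for spk, sec in totals.items()}


# Sample shaped like the response excerpt above.
sample = {
    "segments": [
        {"start": 0.5, "end": 3.2, "text": "...", "speaker": "SPEAKER_00"},
        {"start": 4.1, "end": 6.8, "text": "...", "speaker": "SPEAKER_01"},
    ],
    "speakers": ["SPEAKER_00", "SPEAKER_01"],
}

print(talk_time_by_speaker(sample))
```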

Speaker identification

enable_speaker_identification=true

Replace anonymous labels with registered names. Requires diarization and a voice profile per speaker.

Show request & response

First register a speaker with a 5-15 s clean clip, then transcribe with project set to your voice library.

Request

# Step 1: register a voice profile in the "support_team" project
curl -X POST https://asr.shunyalabs.ai/v1/speakers/register \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "name=Priya" -F "file=@priya_sample.wav" -F "project=support_team"

# Step 2: transcribe with diarization and identification against that project
curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_diarization=true" \
  -F "enable_speaker_identification=true" \
  -F "project=support_team"

Emotion diarization

enable_emotion_diarization=true

Dominant emotion per segment, e.g. angry, neutral. Works with or without speaker diarization.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_diarization=true" \
  -F "enable_emotion_diarization=true"

Response (excerpt)

{
  "segments": [
    { "speaker": "SPEAKER_00", "emotion": "angry", "text": "..." },
    { "speaker": "SPEAKER_01", "emotion": "neutral", "text": "..." }
  ]
}
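One way to use per-segment emotions is to tally them per speaker, for example to see how often a caller sounded angry. This is a minimal sketch that assumes only the speaker and emotion fields shown in the excerpt above; the function name and the "UNKNOWN" fallback (for requests sent without diarization) are hypothetical.

```python
from collections import Counter, defaultdict


def emotions_by_speaker(response: dict) -> dict:
    """Count how many segments carry each emotion, grouped by speaker."""
    counts = defaultdict(Counter)
    for seg in response.get("segments", []):
        emotion = seg.get("emotion")
        if emotion is None:
            continue  # segment has no emotion label, skip it
        # Without diarization there is no speaker field; group under "UNKNOWN".
        counts[seg.get("speaker", "UNKNOWN")][emotion] += 1
    return {spk: dict(c) for spk, c in counts.items()}


# Sample shaped like the response excerpt above.
sample = {
    "segments": [
        {"speaker": "SPEAKER_00", "emotion": "angry", "text": "..."},
        {"speaker": "SPEAKER_01", "emotion": "neutral", "text": "..."},
    ]
}

print(emotions_by_speaker(sample))
```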

Intent detection

enable_intent_detection=true

Classify the call into your taxonomy. The optional intent_choices JSON array constrains the labels.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_intent_detection=true" \
  -F 'intent_choices=["complaint","inquiry","service_request"]'

Response (excerpt)

{
  "nlp_analysis": {
    "intent": {
      "label": "service_request",
      "confidence": 0.92,
      "reasoning": "Caller is requesting roadside assistance..."
    }
  }
}
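A common follow-up is to route the call on the detected intent, falling back to manual handling when confidence is low. The sketch below assumes the nlp_analysis.intent shape from the excerpt above; the 0.8 threshold, the "manual_review" fallback, and the function name are illustrative choices, not API behaviour.

```python
def route_by_intent(response: dict, min_confidence: float = 0.8,
                    fallback: str = "manual_review") -> str:
    """Return the intent label, or a fallback queue when it is missing or uncertain."""
    intent = response.get("nlp_analysis", {}).get("intent")
    if not intent:
        return fallback  # intent detection was not enabled or returned nothing
    if intent.get("confidence", 0.0) < min_confidence:
        return fallback  # label present but below our (hypothetical) threshold
    return intent["label"]


# Sample shaped like the response excerpt above.
sample = {
    "nlp_analysis": {
        "intent": {
            "label": "service_request",
            "confidence": 0.92,
            "reasoning": "Caller is requesting roadside assistance...",
        }
    }
}

print(route_by_intent(sample))
print(route_by_intent(sample, min_confidence=0.95))
```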

Sentiment analysis

enable_sentiment_analysis=true

Overall sentiment of the transcript: label, score (−1 to 1), and a short explanation.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_sentiment_analysis=true"

Response (excerpt)

{
  "nlp_analysis": {
    "sentiment": {
      "label": "negative",
      "score": -0.72,
      "explanation": "Customer expresses frustration..."
    }
  }
}
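The score can drive simple alerting, such as flagging strongly negative calls for a supervisor. This sketch assumes the nlp_analysis.sentiment shape from the excerpt above; the -0.5 cut-off and the function name are hypothetical and should be tuned on your own data.

```python
def should_escalate(response: dict, threshold: float = -0.5) -> bool:
    """True when the overall sentiment score is at or below the threshold."""
    sentiment = response.get("nlp_analysis", {}).get("sentiment")
    if not sentiment or "score" not in sentiment:
        return False  # no sentiment data, nothing to escalate on
    return sentiment["score"] <= threshold


# Sample shaped like the response excerpt above.
sample = {
    "nlp_analysis": {
        "sentiment": {
            "label": "negative",
            "score": -0.72,
            "explanation": "Customer expresses frustration...",
        }
    }
}

print(should_escalate(sample))
```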

Summarization

enable_summarization=true

Short summary of the full transcript. Use summary_max_length for an approximate word cap (default 150).

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_summarization=true" \
  -F "summary_max_length=50"

Response (excerpt)

{
  "nlp_analysis": {
    "summary": "Customer called about a vehicle breakdown. Agent confirmed the complaint was registered..."
  }
}

Keyterm normalization

enable_keyterm_normalization=true

Fix domain terms the ASR may spell informally, e.g. emi → EMI. Pass an optional glossary as a JSON array in keyterm_keywords.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_keyterm_normalization=true" \
  -F 'keyterm_keywords=["EMI","NACH mandate","bounce charge"]'

Normalized text is in nlp_analysis.normalized_text; the original transcript is unchanged.
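Because the original transcript is left untouched, client code has to choose which text to display. A minimal sketch, assuming the top-level text field and nlp_analysis.normalized_text described above (the helper name is hypothetical):

```python
def display_text(response: dict) -> str:
    """Prefer the normalized transcript, falling back to the original text."""
    normalized = response.get("nlp_analysis", {}).get("normalized_text")
    return normalized if normalized else response.get("text", "")


# Hypothetical response values, for illustration only.
with_normalization = {
    "text": "my emi bounced",
    "nlp_analysis": {"normalized_text": "my EMI bounced"},
}
without_normalization = {"text": "my emi bounced"}

print(display_text(with_normalization))
print(display_text(without_normalization))
```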

Translation (output_language)

output_language=en

Translate the full transcript after ASR. Accepts an ISO code (en) or a language name (English).

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "output_language=en"

Response (excerpt)

{
  "nlp_analysis": {
    "translation": "Hello, this is an urgent call."
  }
}

Profanity hashing

enable_profanity_hashing=true

Mask profane words with **** in the transcript and every segment.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "enable_profanity_hashing=true"

Custom keyword redaction

hash_keywords=[...]

Regex-based masking for PII and sensitive phrases. It is fast and does not involve an LLM. Pass a JSON array to hash_keywords.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F 'hash_keywords=["account number","card number","OTP","aadhaar"]'

Response (excerpt)

{
  "text": "आपका **** 4321 है और आपका **** कल भेजा गया था"
}
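To preview roughly what keyword masking does to a transcript before sending a request, a local approximation can help. The sketch below is not the service's implementation: case-insensitive matching, longest-phrase-first ordering, and the absence of word-boundary checks are all assumptions made here for illustration.

```python
import re


def mask_keywords(text: str, keywords: list, mask: str = "****") -> str:
    """Replace every occurrence of any keyword with a fixed mask (local approximation)."""
    if not keywords:
        return text
    # Longer phrases first, so "account number" wins over a shorter overlapping keyword.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    return pattern.sub(mask, text)


print(mask_keywords(
    "Your account number is 4321 and your otp was sent",
    ["account number", "card number", "OTP"],
))
```

Because this version has no word-boundary checks, a short keyword is also masked when it appears inside a longer word.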

Word timestamps

word_timestamps=true

Per-word start, end, and confidence (returned as score) on each segment. Uses ONNX alignment in the same request, so no extra API call is needed.

Show request & response

Request

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" \
  -F "word_timestamps=true"

Response (excerpt)

{
  "segments": [{
    "text": "नमस्ते मोहम्मद जी",
    "words": [
      { "word": "नमस्ते", "start": 0.532, "end": 0.932, "score": 0.85 }
    ]
  }]
}
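Per-word scores make it possible to flag uncertain words for human review. The sketch below assumes the words array and score field shown in the excerpt above; the 0.6 default threshold and the function name are hypothetical.

```python
def low_confidence_words(response: dict, threshold: float = 0.6) -> list:
    """Collect (word, start, end) for every word whose score is below the threshold."""
    flagged = []
    for seg in response.get("segments", []):
        for w in seg.get("words", []):
            # Words without a score are treated as confident (our choice).
            if w.get("score", 1.0) < threshold:
                flagged.append((w["word"], w["start"], w["end"]))
    return flagged


# Sample shaped like the response excerpt above.
sample = {
    "segments": [{
        "text": "नमस्ते मोहम्मद जी",
        "words": [
            {"word": "नमस्ते", "start": 0.532, "end": 0.932, "score": 0.85}
        ],
    }]
}

print(low_confidence_words(sample))
print(low_confidence_words(sample, threshold=0.9))
```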

Combine features

Enable everything you need on one request. Typical contact-centre stack:

curl -X POST https://asr.shunyalabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $SHUNYALABS_API_KEY" \
  -F "file=@call.wav" -F "model=zero-indic" -F "language_code=hi" \
  -F "enable_diarization=true" \
  -F "enable_speaker_identification=true" -F "project=support_team" \
  -F "enable_emotion_diarization=true" -F "word_timestamps=true" \
  -F "enable_intent_detection=true" \
  -F 'intent_choices=["complaint","inquiry","service_request"]' \
  -F "enable_summarization=true" -F "enable_sentiment_analysis=true" \
  -F 'hash_keywords=["account number","card number","OTP"]'
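When building the same request from application code, the array-valued fields (intent_choices, hash_keywords) need to be serialized as JSON strings, as in the curl example above. The sketch below only assembles the form fields and sends nothing; build_form_fields is a hypothetical helper, and sending "true" as a string mirrors the curl form values rather than a documented requirement.

```python
import json


def build_form_fields(model: str, flags=(), list_fields=None, extra=None) -> dict:
    """Assemble multipart form fields for a transcription request."""
    fields = {"model": model}
    for flag in flags:
        fields[flag] = "true"  # boolean flags are sent as the string "true"
    for name, values in (list_fields or {}).items():
        # Compact JSON, keeping non-ASCII keywords readable.
        fields[name] = json.dumps(list(values), ensure_ascii=False,
                                  separators=(",", ":"))
    fields.update(extra or {})  # plain string fields such as language_code
    return fields


fields = build_form_fields(
    "zero-indic",
    flags=["enable_diarization", "enable_intent_detection", "enable_summarization"],
    list_fields={
        "intent_choices": ["complaint", "inquiry", "service_request"],
        "hash_keywords": ["account number", "card number", "OTP"],
    },
    extra={"language_code": "hi"},
)
for name, value in fields.items():
    print(f"{name}={value}")
```

With an HTTP client such as the third-party requests library, a dict like this would typically be passed as the form data alongside the audio file and the Authorization header.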

Latency trade-off

NLP features (intent, sentiment, summary, keyterms, translation, profanity) add a Gemini pass on top of ASR, which adds latency. Enable only what you will use in production.