Speaker Diarization

Speaker diarization automatically separates different speakers in an audio recording, labeling each segment with a speaker tag (e.g., SPEAKER_00, SPEAKER_01). This tells you who spoke when in conversations, meetings, or interviews.

You can also pre-identify speakers and customize speaker tags to match your context so transcripts automatically recognize known speakers. Learn more in speaker identification.

How to Enable

Set the enable_diarization parameter to "true" in the request's form data:

"enable_diarization": "true"

Full Speaker Diarization Request

Replace your_api_key_here with your own secret API key, and your_audio.wav with the path to your audio file.

import requests

url = "https://tb2.shunyalabs.ai/v1/transcriptions"
headers = {"X-API-Key": "your_api_key_here"}

with open("your_audio.wav", "rb") as audio_file:
    files = {"file": audio_file}
    data = {
        "enable_diarization": "true"
    }

    response = requests.post(
        url,
        headers=headers,
        files=files,
        data=data
    )

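# Raise an exception for HTTP error responses (4xx/5xx) before parsing.
response.raise_for_status()
result = response.json()

# Print the full transcript, then each segment with its speaker label.
print(result["text"])
for segment in result["segments"]:
    print(f'{segment["speaker"]}: {segment["text"]}')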

Example Output

{
  "success": true,
  "text": "Hello, thank you for calling customer support. How can I help you today? Hi, yes, I'm having trouble with my account login. I keep getting an error message. I'm sorry to hear that. Let me pull up your account and see what's going on.",
  "segments": [
    {
      "start": 0.0,
      "end": 5.5,
      "text": "Hello, thank you for calling customer support. How can I help you today?",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 6.0,
      "end": 12.3,
      "text": "Hi, yes, I'm having trouble with my account login. I keep getting an error message.",
      "speaker": "SPEAKER_01"
    },
    {
      "start": 12.8,
      "end": 14.5,
      "text": "I'm sorry to hear that.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 14.6,
      "end": 18.9,
      "text": "Let me pull up your account and see what's going on.",
      "speaker": "SPEAKER_00"
    }
  ]
}
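
The segments array can be turned into a readable, speaker-labeled transcript. The sketch below assumes the response shape shown in the example output (start, end, text, and speaker on each segment). The helper name merge_turns is hypothetical and not part of the API. It merges consecutive segments from the same speaker into a single turn, then prints each turn with its time range.

```python
def merge_turns(segments):
    """Merge consecutive segments spoken by the same speaker into single turns."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker continues: extend the previous turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Speaker changed: start a new turn (copy so the input is not mutated).
            turns.append(dict(seg))
    return turns


# Sample data copied from the example output above.
segments = [
    {"start": 0.0, "end": 5.5, "speaker": "SPEAKER_00",
     "text": "Hello, thank you for calling customer support. How can I help you today?"},
    {"start": 6.0, "end": 12.3, "speaker": "SPEAKER_01",
     "text": "Hi, yes, I'm having trouble with my account login. I keep getting an error message."},
    {"start": 12.8, "end": 14.5, "speaker": "SPEAKER_00",
     "text": "I'm sorry to hear that."},
    {"start": 14.6, "end": 18.9, "speaker": "SPEAKER_00",
     "text": "Let me pull up your account and see what's going on."},
]

for turn in merge_turns(segments):
    print(f'[{turn["start"]:.1f}s - {turn["end"]:.1f}s] {turn["speaker"]}: {turn["text"]}')
```

With the sample data, the last two SPEAKER_00 segments are combined into one turn, so the four segments become three turns.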

For custom speaker labels, refer to speaker identification.

Use Cases

  • Meeting transcriptions with participant-level attribution
  • Interview analysis (interviewer vs interviewee)
  • Customer support calls (agent vs customer)
  • Podcast and panel discussions with multiple speakers
  • Legal recordings and courtroom proceedings
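
For use cases such as customer support calls or interviews, a common next step is measuring how long each speaker talked. The following is a minimal sketch, assuming segments shaped like those in the example output. The function name talk_time is hypothetical and not part of the API.

```python
from collections import defaultdict


def talk_time(segments):
    """Return total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        # Segment duration is end minus start, both in seconds.
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


# Timings and speaker labels taken from the example output (text omitted).
segments = [
    {"start": 0.0, "end": 5.5, "speaker": "SPEAKER_00"},
    {"start": 6.0, "end": 12.3, "speaker": "SPEAKER_01"},
    {"start": 12.8, "end": 14.5, "speaker": "SPEAKER_00"},
    {"start": 14.6, "end": 18.9, "speaker": "SPEAKER_00"},
]

for speaker, seconds in sorted(talk_time(segments).items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Silence between segments is not counted, so the totals can add up to less than the recording length.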