Pipecat integration

pipecat-shunyalabs v1.0.3

Native Shunyalabs STT and TTS services for Pipecat pipelines. Drop `ShunyalabsSTTService` and `ShunyalabsTTSService` into any Pipecat pipeline to get real-time streaming ASR in 23 languages and streaming TTS with 46 speakers across those languages, with no glue code required.

Requirements

Python 3.9+, Pipecat framework, and a valid Shunyalabs API key. Install a Pipecat transport (e.g. pipecat-ai[daily]) for WebRTC support.

Installation

Install the package from PyPI:

Terminal

pip install pipecat-shunyalabs

To include a transport (e.g. Daily WebRTC):

Terminal

pip install pipecat-shunyalabs "pipecat-ai[daily]"

Authentication

Set your API key as an environment variable (recommended) or pass it directly to the service classes:

Environment variable

export SHUNYALABS_API_KEY="your-api-key"

Python — inline

from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")

Security

Never commit API keys to source control. Use a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) in production.
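
One way to keep the key out of source code is to read it from the environment at startup and fail early with a readable message. The sketch below is illustrative only: `load_api_key` is a hypothetical helper and is not part of `pipecat-shunyalabs`.

```python
import os


def load_api_key(env=os.environ, name="SHUNYALABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of pipecat-shunyalabs.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the pipeline")
    return key
```

The returned value can then be passed as `api_key=` to either service class.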

Quick start

A minimal pipeline wiring Shunyalabs STT → OpenAI LLM → Shunyalabs TTS on a local audio transport:

Python

import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def main():
    transport = LocalAudioTransport()

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        style="<Conversational>",
    )

    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(main())

STT - `ShunyalabsSTTService`

Real-time streaming speech-to-text over a persistent WebSocket connection. Supports 23 Indian and international languages with optional automatic language detection.

Parameters

Parameter	Type	Default	Description
`api_key`	`str`	`None`	API key. Falls back to `SHUNYALABS_API_KEY` env var.
`language`	`str`	`"auto"`	Language code (e.g. `"en"`, `"hi"`) or `"auto"` for detection.
`url`	`str`	`wss://asr.shunyalabs.ai/ws`	WebSocket endpoint URL.
`sample_rate`	`int`	`16000`	Expected audio sample rate in Hz. Must match transport input.
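
To see why `sample_rate` has to match the transport input, it can help to work out how many bytes a fixed slice of audio occupies at different rates. The sketch below assumes 16-bit mono PCM, which the parameter table does not state; `pcm_chunk_bytes` is a hypothetical helper.

```python
def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int,
                    channels: int = 1, sample_width_bytes: int = 2) -> int:
    """Bytes needed to hold `chunk_ms` of PCM audio (assumes 16-bit mono by default)."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * channels * sample_width_bytes


print(pcm_chunk_bytes(16000, 20))  # 640 bytes for 20 ms at 16 kHz
print(pcm_chunk_bytes(48000, 20))  # 1920 bytes for 20 ms at 48 kHz
```

Under these assumptions, a transport delivering 48 kHz audio produces three times as many bytes per 20 ms as the service expects at 16 kHz, so the audio would be interpreted at the wrong speed unless the rates agree.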

Frame mapping

Shunyalabs event	Pipecat frame
`PARTIAL`	`InterimTranscriptionFrame` - emitted continuously as speech is recognised
`FINAL_SEGMENT`	`TranscriptionFrame` - emitted at speech segment boundary
`FINAL`	`TranscriptionFrame` - emitted when full utterance is finalised
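
A downstream consumer typically treats interim results as replaceable and final results as permanent. The sketch below illustrates that pattern with stand-in classes (`InterimText`, `FinalText`) in place of Pipecat's `InterimTranscriptionFrame` and `TranscriptionFrame`, so it runs without Pipecat installed. `TranscriptView` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InterimText:
    """Stand-in for Pipecat's InterimTranscriptionFrame."""
    text: str


@dataclass
class FinalText:
    """Stand-in for Pipecat's TranscriptionFrame."""
    text: str


@dataclass
class TranscriptView:
    committed: list = field(default_factory=list)
    pending: str = ""

    def handle(self, frame) -> None:
        if isinstance(frame, InterimText):
            # Each interim result supersedes the previous one.
            self.pending = frame.text
        elif isinstance(frame, FinalText):
            # Final results are kept and clear any pending interim text.
            self.committed.append(frame.text)
            self.pending = ""

    def render(self) -> str:
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

Because both `FINAL_SEGMENT` and `FINAL` arrive as `TranscriptionFrame`, and this page does not say whether `FINAL` repeats text already delivered in segments, this sketch makes no attempt to deduplicate them.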

Python — STT example

from pipecat_shunyalabs import ShunyalabsSTTService

stt = ShunyalabsSTTService(
    language="hi",       # Hindi; use "auto" for detection
    sample_rate=16000,
)

Auto-reconnect

If the WebSocket connection drops during audio streaming, the service automatically reconnects and resumes sending audio.

TTS - `ShunyalabsTTSService`

Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection and streams audio chunks back as `TTSAudioRawFrame` frames. Supports 46 speakers across 23 languages; any speaker can synthesise in any language.

Parameters

Parameter	Type	Default	Description
`api_key`	`str`	`None`	API key. Falls back to `SHUNYALABS_API_KEY` env var.
`url`	`str`	`wss://tts.shunyalabs.ai/ws`	WebSocket endpoint URL.
`model`	`str`	`"zero-indic"`	TTS model identifier.
`voice`	`str`	`"Rajesh"`	Speaker voice name.
`style`	`str`	`"<Neutral>"`	Emotion / delivery style tag.
`language`	`str`	`"en"`	Output language code.
`output_format`	`str`	`"pcm"`	Audio encoding - `pcm`, `wav`, `mp3`, `ogg_opus`, `flac`, `mulaw`, `alaw`.
`speed`	`float`	`1.0`	Speaking speed multiplier (0.25 – 4.0).
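
The ranges in the table can be checked locally before a service is constructed, which may give a clearer error than a failed synthesis request. This is a minimal sketch: `check_tts_options` is a hypothetical helper, and whether the service performs its own validation is not stated here.

```python
ALLOWED_FORMATS = frozenset({"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"})
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def check_tts_options(output_format: str = "pcm", speed: float = 1.0) -> dict:
    """Validate options against the documented values and return them as kwargs."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"unsupported output_format {output_format!r}; "
            f"expected one of {sorted(ALLOWED_FORMATS)}"
        )
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed {speed} is outside {MIN_SPEED} to {MAX_SPEED}")
    return {"output_format": output_format, "speed": speed}
```

The returned dict can be unpacked into the constructor, for example `ShunyalabsTTSService(**check_tts_options("pcm", 1.2))`.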

Style tags

Tag	Description
`<Neutral>`	Clean read-speech by default
`<Happy>`	Joyful, upbeat tone
`<Sad>`	Somber, melancholic tone
`<Angry>`	Forceful, intense tone
`<Fearful>`	Anxious, trembling tone
`<Surprised>`	Exclamatory, astonished tone
`<Disgust>`	Repulsed, disapproving tone
`<News>`	Formal news-anchor style
`<Conversational>`	Casual, everyday speech - recommended for voice agents
`<Narrative>`	Storytelling / audiobook delivery
`<Enthusiastic>`	Energetic, passionate tone

Text formatting

The service automatically prepends the style tag before sending to the API:

Python

tts = ShunyalabsTTSService(voice="Rajesh", style="<Happy>")
# Input:  "Welcome!"
# Sent:   "<Happy> Welcome!"
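
The transformation shown above amounts to a simple string prefix. The sketch below reproduces it locally so the outgoing text can be previewed or logged. `with_style` is a hypothetical helper that mirrors the documented behaviour; it is not the service's internal code.

```python
STYLE_TAGS = frozenset({
    "<Neutral>", "<Happy>", "<Sad>", "<Angry>", "<Fearful>", "<Surprised>",
    "<Disgust>", "<News>", "<Conversational>", "<Narrative>", "<Enthusiastic>",
})


def with_style(text: str, style: str = "<Neutral>") -> str:
    """Prefix `text` with a style tag from the style table, separated by one space."""
    if style not in STYLE_TAGS:
        raise ValueError(f"unknown style tag {style!r}")
    return f"{style} {text}"


print(with_style("Welcome!", "<Happy>"))  # <Happy> Welcome!
```

Since the service adds the tag itself, text passed into the pipeline should not already include one.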

Full pipeline example

A complete voice agent with Shunyalabs STT and TTS, OpenAI LLM, and the Daily WebRTC transport:

Python

import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def run_voice_agent(room_url: str, token: str):
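    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        # audio_in_enabled lets the transport forward participant audio to the STT service
        DailyParams(audio_in_enabled=True, audio_out_enabled=True, transcription_enabled=False),
    )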

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    messages = [{"role": "system", "content": "You are a helpful voice assistant."}]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",
        style="<Conversational>",
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True, enable_metrics=True))

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))

Multilingual example

Python

# Hindi conversational bot
tts = ShunyalabsTTSService(voice="Rajesh", language="hi", style="<Conversational>")

# English news-style bot
tts = ShunyalabsTTSService(voice="Varun", language="en", style="<News>")

Error reference

Exception	HTTP code	Description
`AuthenticationError`	401	Invalid or missing API key.
`PermissionDeniedError`	403	API key lacks permission for the resource.
`RateLimitError`	429	Rate limit exceeded. Implement exponential backoff.
`ServerError`	5xx	Server-side error. Retried automatically.
`TimeoutError`	—	Request exceeded timeout (default 60 s).
`TranscriptionError`	—	ASR-specific failure (e.g. unsupported audio format).
`SynthesisError`	—	TTS-specific failure (e.g. invalid voice parameter).
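
The table recommends exponential backoff for `RateLimitError`. A minimal sketch of that pattern follows. The `RateLimitError` class here is a local stand-in, because the real exception's import path is not shown on this page, and `call_with_backoff` is a hypothetical helper.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Local stand-in for the package's RateLimitError (import path not shown here)."""


async def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                            sleep=asyncio.sleep):
    """Await `fn()`; on RateLimitError, wait with exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return await fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so concurrent clients do not retry in lockstep.
            await sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. In normal use the default `asyncio.sleep` applies.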

Troubleshooting

Symptom	Resolution
`AuthenticationError` on startup	Verify `SHUNYALABS_API_KEY` is set and valid.
WebSocket connection refused	Ensure outbound WSS (port 443) is open to `asr.shunyalabs.ai` and `tts.shunyalabs.ai`.
No transcription output	Check `sample_rate` matches your transport input. Verify audio source is active.
TTS audio silent or missing	Ensure `output_format="pcm"` (the default) matches what the transport output expects. Verify `TTSStartedFrame` is received.
High latency on first TTS chunk	Deploy closer to the Shunyalabs gateway region (`asia-south1`).
`ImportError: pipecat_shunyalabs`	Run `pip install pipecat-shunyalabs` and confirm your virtual environment is activated.
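
For the "WebSocket connection refused" row, a quick way to rule out firewall problems is to attempt a plain TCP connection to port 443 on both hosts. The sketch below uses only the standard library. `check_outbound` is a hypothetical helper, and it verifies TCP reachability only, not the TLS or WebSocket handshake.

```python
import socket

HOSTS = ("asr.shunyalabs.ai", "tts.shunyalabs.ai")


def check_outbound(hosts=HOSTS, port=443, timeout=3.0,
                   connect=socket.create_connection):
    """Try a TCP connection to each host and report 'ok' or the failure reason."""
    results = {}
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
            conn.close()
            results[host] = "ok"
        except OSError as exc:
            # Covers refused connections, DNS failures, and timeouts.
            results[host] = f"failed: {exc}"
    return results
```

Calling `check_outbound()` from the deployment host returns a dict keyed by hostname. A `failed:` entry suggests a network or DNS issue rather than a problem with the API key.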
