Pipecat integration
pipecat-shunyalabs v1.0.3

Native Shunyalabs STT and TTS services for Pipecat pipelines. Drop ShunyalabsSTTService and ShunyalabsTTSService into any Pipecat pipeline to get real-time streaming ASR, plus TTS with 46 speakers across 23 languages, with no glue code required.
Install a Pipecat transport extra (e.g. pipecat-ai[daily]) for WebRTC support.
Installation
Install the package from PyPI:
pip install pipecat-shunyalabs
To include a transport (e.g. Daily WebRTC):
pip install pipecat-shunyalabs pipecat-ai[daily]
Authentication
Set your API key as an environment variable (recommended) or pass it directly to the service classes:
export SHUNYALABS_API_KEY="your-api-key"
stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")
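The fallback order described above (explicit argument first, then the environment variable) can be sketched as a small startup check. The helper name `resolve_api_key` is hypothetical and not part of pipecat-shunyalabs; the services perform their own fallback, so this only illustrates failing fast with a clear message before the pipeline is built.

```python
import os

ENV_VAR = "SHUNYALABS_API_KEY"


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise the environment variable.

    Hypothetical helper for illustration; not part of the package.
    """
    key = explicit_key or os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass api_key= explicitly")
    return key
```

Calling `resolve_api_key()` once at startup surfaces a missing key immediately rather than at the first WebSocket connection.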
Quick start
A minimal pipeline wiring Shunyalabs STT → OpenAI LLM → Shunyalabs TTS on a local audio transport:
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService


async def main():
    # Local microphone and speaker transport; enable audio in both directions
    transport = LocalAudioTransport(
        LocalAudioTransportParams(audio_in_enabled=True, audio_out_enabled=True)
    )

    # Speech-to-text
    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )

    # Language model
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    # Text-to-speech
    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        style="<Conversational>",
    )

    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    # Pass params by keyword; recent Pipecat releases make it keyword-only
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())
STT - ShunyalabsSTTService
Real-time streaming speech-to-text over a persistent WebSocket connection. Supports 23 Indian and international languages with optional automatic language detection.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `None` | API key. Falls back to the `SHUNYALABS_API_KEY` environment variable. |
| `language` | `str` | `"auto"` | Language code (e.g. `"en"`, `"hi"`) or `"auto"` for detection. |
| `url` | `str` | `wss://asr.shunyalabs.ai/ws` | WebSocket endpoint URL. |
| `sample_rate` | `int` | `16000` | Expected audio sample rate in Hz. Must match the transport input. |
Frame mapping
| Shunyalabs event | Pipecat frame |
|---|---|
| `PARTIAL` | `InterimTranscriptionFrame`, emitted continuously as speech is recognised |
| `FINAL_SEGMENT` | `TranscriptionFrame`, emitted at a speech segment boundary |
| `FINAL` | `TranscriptionFrame`, emitted when the full utterance is finalised |
from pipecat_shunyalabs import ShunyalabsSTTService

stt = ShunyalabsSTTService(
    language="hi",  # Hindi; use "auto" for detection
    sample_rate=16000,
)
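The frame mapping table can be read as a simple lookup. The sketch below models it with plain strings so it runs without Pipecat installed; the names `EVENT_TO_FRAME`, `frame_name_for`, and `is_final` are illustrative assumptions, not the package's internals.

```python
# Event types and frame names are taken from the frame mapping table.
EVENT_TO_FRAME = {
    "PARTIAL": "InterimTranscriptionFrame",
    "FINAL_SEGMENT": "TranscriptionFrame",
    "FINAL": "TranscriptionFrame",
}


def frame_name_for(event_type: str) -> str:
    """Return the Pipecat frame class name for a Shunyalabs event type."""
    try:
        return EVENT_TO_FRAME[event_type]
    except KeyError:
        raise ValueError(f"Unknown Shunyalabs event type: {event_type!r}") from None


def is_final(event_type: str) -> bool:
    """True for events that produce a TranscriptionFrame (stable text)."""
    return frame_name_for(event_type) == "TranscriptionFrame"
```

Downstream processors that should only act on stable text can filter on the final frames and ignore interim ones.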
TTS - ShunyalabsTTSService
Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection and streams audio chunks back as TTSAudioRawFrame frames. Supports 46 speakers across 23 languages - any speaker can synthesise in any language.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `None` | API key. Falls back to the `SHUNYALABS_API_KEY` environment variable. |
| `url` | `str` | `wss://tts.shunyalabs.ai/ws` | WebSocket endpoint URL. |
| `model` | `str` | `"zero-indic"` | TTS model identifier. |
| `voice` | `str` | `"Rajesh"` | Speaker voice name. |
| `style` | `str` | `"<Neutral>"` | Emotion / delivery style tag. |
| `language` | `str` | `"en"` | Output language code. |
| `output_format` | `str` | `"pcm"` | Audio encoding: `pcm`, `wav`, `mp3`, `ogg_opus`, `flac`, `mulaw`, `alaw`. |
| `speed` | `float` | `1.0` | Speaking speed multiplier (0.25 to 4.0). |
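The documented ranges for `output_format` and `speed` can be checked before constructing the service. This is a minimal sketch; `check_tts_options` is a hypothetical helper, and whether the service itself validates these values client-side is not stated here.

```python
# Allowed values copied from the parameter table above.
SUPPORTED_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def check_tts_options(output_format: str = "pcm", speed: float = 1.0) -> dict:
    """Validate TTS options and return them as keyword arguments."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return {"output_format": output_format, "speed": speed}
```

The returned dictionary can be unpacked into the constructor, for example `ShunyalabsTTSService(**check_tts_options("pcm", 1.2))`.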
Style tags
| Tag | Description |
|---|---|
| `<Neutral>` | Clean read-speech (default) |
| `<Happy>` | Joyful, upbeat tone |
| `<Sad>` | Somber, melancholic tone |
| `<Angry>` | Forceful, intense tone |
| `<Fearful>` | Anxious, trembling tone |
| `<Surprised>` | Exclamatory, astonished tone |
| `<Disgust>` | Repulsed, disapproving tone |
| `<News>` | Formal news-anchor style |
| `<Conversational>` | Casual, everyday speech; recommended for voice agents |
| `<Narrative>` | Storytelling / audiobook delivery |
| `<Enthusiastic>` | Energetic, passionate tone |
Text formatting
The service automatically prepends the style tag to the text before sending it to the API:
tts = ShunyalabsTTSService(voice="Rajesh", style="<Happy>")
# Input: "Welcome!"
# Sent: "<Happy> Welcome!"
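The transformation shown in the comments above can be sketched as follows. The function name `prepend_style` is hypothetical, the single-space separator is inferred from the example, and the rejection of unknown tags is an assumption for illustration; the service's actual implementation may differ.

```python
# Style tags copied from the style tag table.
KNOWN_STYLES = {
    "<Neutral>", "<Happy>", "<Sad>", "<Angry>", "<Fearful>", "<Surprised>",
    "<Disgust>", "<News>", "<Conversational>", "<Narrative>", "<Enthusiastic>",
}


def prepend_style(text: str, style: str = "<Neutral>") -> str:
    """Return the text with the style tag placed in front of it."""
    if style not in KNOWN_STYLES:
        raise ValueError(f"Unknown style tag: {style!r}")
    return f"{style} {text}"
```

Because the service adds the tag itself, application code should pass plain text and not include the tag manually.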
Full pipeline example
A complete voice agent with Shunyalabs STT and TTS, OpenAI LLM, and the Daily WebRTC transport:
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService


async def run_voice_agent(room_url: str, token: str):
    # Daily WebRTC transport; Daily's own transcription is disabled
    # because Shunyalabs STT handles speech recognition
    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        DailyParams(
            audio_in_enabled=True,  # STT needs inbound audio frames
            audio_out_enabled=True,
            transcription_enabled=False,
        ),
    )

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    # Conversation context shared by the user and assistant aggregators
    messages = [{"role": "system", "content": "You are a helpful voice assistant."}]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",
        style="<Conversational>",
    )

    pipeline = Pipeline([
        transport.input(),               # audio from the Daily room
        stt,                             # speech to text
        context_aggregator.user(),       # add user turns to the context
        llm,                             # generate a reply
        tts,                             # text to speech
        transport.output(),              # audio back to the room
        context_aggregator.assistant(),  # add assistant turns to the context
    ])

    # Pass params by keyword; recent Pipecat releases make it keyword-only
    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True, enable_metrics=True),
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        # Start the conversation once someone joins the room
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))
Multilingual example
# Hindi conversational bot
tts = ShunyalabsTTSService(voice="Rajesh", language="hi", style="<Conversational>")
# English news-style bot
tts = ShunyalabsTTSService(voice="Varun", language="en", style="<News>")
Error reference
| Exception | HTTP code | Description |
|---|---|---|
| `AuthenticationError` | 401 | Invalid or missing API key. |
| `PermissionDeniedError` | 403 | API key lacks permission for the resource. |
| `RateLimitError` | 429 | Rate limit exceeded. Implement exponential backoff. |
| `ServerError` | 5xx | Server-side error. Retried automatically. |
| `TimeoutError` | n/a | Request exceeded timeout (default 60 s). |
| `TranscriptionError` | n/a | ASR-specific failure (e.g. unsupported audio format). |
| `SynthesisError` | n/a | TTS-specific failure (e.g. invalid voice parameter). |
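The exponential backoff recommended for `RateLimitError` could look like the sketch below. The `RateLimitError` class here is a local stand-in so the example is self-contained; the import path of the real exception is not given here and should be checked against the package. The helper name `call_with_backoff` and its default delays are assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for the package's RateLimitError (assumption)."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Delay doubles each attempt (0.5 s, 1 s, 2 s, ...), capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter avoids synchronized retries across clients
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without waiting.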
Troubleshooting
| Symptom | Resolution |
|---|---|
| `AuthenticationError` on startup | Verify `SHUNYALABS_API_KEY` is set and valid. |
| WebSocket connection refused | Ensure outbound WSS (port 443) is open to asr.shunyalabs.ai and tts.shunyalabs.ai. |
| No transcription output | Check that `sample_rate` matches your transport input. Verify the audio source is active. |
| TTS audio silent or missing | Ensure `output_format=pcm` matches the transport output. Verify `TTSStartedFrame` is received. |
| High latency on first TTS chunk | Deploy closer to the Shunyalabs gateway region (asia-south1). |
| `ImportError: pipecat_shunyalabs` | Run `pip install pipecat-shunyalabs` and confirm your virtual environment is activated. |