LiveKit integration
livekit-plugins-shunyalabs v1.0.1
Official Shunyalabs plugin for LiveKit Agents. Plug shunyalabs.STT and shunyalabs.TTS directly into a LiveKit AgentSession. The plugin supports real-time streaming transcription, batch recognition, and high-fidelity multilingual voice synthesis.
Installation
pip install livekit-plugins-shunyalabs
Authentication
Set your API key as an environment variable or pass it directly to the plugin classes:
export SHUNYALABS_API_KEY="your-api-key"
from livekit.plugins import shunyalabs
stt = shunyalabs.STT(api_key="your-api-key")
tts = shunyalabs.TTS(api_key="your-api-key")
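The lookup order described above (explicit argument first, then the environment variable) can be sketched in plain Python. The helper name `resolve_api_key` is hypothetical, and raising an error when no key is found is an assumption; this illustrates the documented fallback rather than the plugin's internal code.

```python
import os


def resolve_api_key(api_key=None):
    """Return the explicit key if given, else SHUNYALABS_API_KEY from the environment.

    Hypothetical helper: the error on a missing key is an assumption.
    """
    key = api_key or os.environ.get("SHUNYALABS_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key=... or set SHUNYALABS_API_KEY")
    return key
```

Passing the key explicitly is convenient in tests; the environment variable keeps secrets out of source code in deployments.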
Quick start
Minimal wiring: pass shunyalabs.STT and shunyalabs.TTS into an AgentSession.
from livekit.agents import AgentSession
from livekit.plugins import shunyalabs, silero
session = AgentSession(
    stt=shunyalabs.STT(language="en"),
    tts=shunyalabs.TTS(speaker="Rajesh", style="<Neutral>"),
    vad=silero.VAD.load(),
)
STT - shunyalabs.STT
Streaming and batch speech-to-text backed by the Shunyalabs ASR gateway. Audio frames from LiveKit are forwarded over WebSocket; transcription events are pushed back as SpeechEvents.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| language | str | "auto" | BCP-47 language code or "auto" for detection. |
| api_url | str | https://asr.shunyalabs.ai | REST batch endpoint base URL. |
| ws_url | str | wss://asr.shunyalabs.ai/ws | WebSocket streaming endpoint URL. |
Capabilities
| Capability | Supported |
|---|---|
| Streaming (real-time) | Yes |
| Interim results | Yes |
| Offline / batch recognition | Yes |
Streaming STT
Real-time transcription over WebSocket with event mapping to LiveKit's SpeechEventType:
| Shunyalabs event | LiveKit SpeechEventType |
|---|---|
| PARTIAL | INTERIM_TRANSCRIPT |
| FINAL_SEGMENT | FINAL_TRANSCRIPT + END_OF_SPEECH |
| FINAL | FINAL_TRANSCRIPT + RECOGNITION_USAGE |
from livekit.agents import AgentSession
from livekit.plugins import shunyalabs, silero
session = AgentSession(
    stt=shunyalabs.STT(language="en"),
    vad=silero.VAD.load(),
)
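# AgentSession emits "user_input_transcribed" for both interim and final transcripts.
@session.on("user_input_transcribed")
def on_speech(ev):
    if ev.is_final:
        print(f"User said: {ev.transcript}")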
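The event mapping in the table above can be read as a simple lookup. The sketch below stores the LiveKit SpeechEventType member names as strings so that it runs without livekit installed; `EVENT_MAP` and `map_event` are hypothetical names, not part of the plugin.

```python
# Shunyalabs event name -> names of the LiveKit SpeechEventType members emitted.
# Strings stand in for the real enum members so the sketch has no dependencies.
EVENT_MAP = {
    "PARTIAL": ["INTERIM_TRANSCRIPT"],
    "FINAL_SEGMENT": ["FINAL_TRANSCRIPT", "END_OF_SPEECH"],
    "FINAL": ["FINAL_TRANSCRIPT", "RECOGNITION_USAGE"],
}


def map_event(shunyalabs_event):
    """Return the LiveKit event names for one Shunyalabs event (empty if unknown)."""
    return EVENT_MAP.get(shunyalabs_event, [])
```

Note that a single FINAL_SEGMENT or FINAL event fans out into two LiveKit events, so a consumer should expect a transcript followed by an end-of-speech or usage event.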
Batch STT
Single-shot transcription of an audio buffer via POST /v1/audio/transcriptions:
from livekit.plugins import shunyalabs
stt = shunyalabs.STT(language="en")
# recognize() must be awaited, so call it from async code (for example inside an agent):
async def transcribe(audio_buffer):
    event = await stt.recognize(audio_buffer)
    print(event.alternatives[0].text)
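For orientation, the batch request target is the api_url base joined with the path named above. The sketch below only builds that URL; the request body and authentication headers are not described in this document, so they are deliberately left out. `batch_endpoint` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def batch_endpoint(api_url="https://asr.shunyalabs.ai"):
    """Join the base URL with the batch transcription path (hypothetical helper)."""
    # Normalise the trailing slash so urljoin appends rather than replaces a path.
    return urljoin(api_url.rstrip("/") + "/", "v1/audio/transcriptions")
```

This can help when pointing api_url at a proxy or self-hosted gateway and checking which URL the plugin would be expected to call.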
TTS - shunyalabs.TTS
Streaming and chunked text-to-speech. In streaming mode, the plugin collects incoming text tokens and synthesises the collected text over WebSocket when the stream is flushed; the batch (chunked) API handles single-shot synthesis over HTTP.
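The collect-then-flush behaviour can be pictured with a small buffer. This is a minimal sketch under the assumption that tokens are simply concatenated; `TextBuffer` is a hypothetical class, not the plugin's implementation, and the actual WebSocket call is omitted.

```python
class TextBuffer:
    """Collects text tokens and releases them as one string on flush."""

    def __init__(self):
        self._tokens = []

    def push(self, token):
        # Tokens arrive one at a time, e.g. from an LLM response stream.
        self._tokens.append(token)

    def flush(self):
        # Join everything collected so far and reset the buffer.
        text = "".join(self._tokens)
        self._tokens.clear()
        return text
```

The design consequence is that no audio is produced until a flush, so latency depends on how often the caller flushes (for example, at sentence boundaries).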
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| api_url | str | https://tts.shunyalabs.ai | HTTP batch endpoint base URL. |
| ws_url | str | wss://tts.shunyalabs.ai/ws | WebSocket streaming endpoint URL. |
| model | str | "zero-indic" | TTS model name. |
| voice | str | "Rajesh" | Voice name for the API. |
| speaker | str | "Rajesh" | Speaker name prefix for text formatting. |
| style | str | "<Neutral>" | Emotion style tag. |
| language | str | "en" | Language code for transliteration. |
| sample_rate | int | 16000 | Output audio sample rate in Hz. |
| output_format | str | "pcm" | Audio format: pcm, wav, mp3, ogg_opus, flac. |
| speed | float | 1.0 | Speaking speed multiplier (0.25 – 4.0). |
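The defaults and value ranges in the table can be checked before constructing the plugin. The dataclass below is a hypothetical stand-in that mirrors the documented defaults; whether the plugin itself validates these values client-side is not stated here, so treat the checks as an assumption.

```python
from dataclasses import dataclass

# Formats listed in the parameter table.
OUTPUT_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac"}


@dataclass
class TTSOptions:
    """Hypothetical mirror of the documented TTS defaults, with range checks."""

    model: str = "zero-indic"
    voice: str = "Rajesh"
    speaker: str = "Rajesh"
    style: str = "<Neutral>"
    language: str = "en"
    sample_rate: int = 16000
    output_format: str = "pcm"
    speed: float = 1.0

    def __post_init__(self):
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"Unsupported output_format: {self.output_format!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {self.speed}")
```

Validating early turns a misconfigured speed or format into an immediate error instead of a failed synthesis request later.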
Style tags
| Tag | Description |
|---|---|
| <Neutral> | Neutral tone (default) |
| <Happy> | Happy / cheerful |
| <Sad> | Sad / melancholic |
| <Angry> | Angry / intense |
| <Fearful> | Fearful / anxious |
| <Surprised> | Surprised / excited |
| <Disgust> | Disgusted |
| <News> | News anchor style |
| <Conversational> | Casual conversational, recommended for voice agents |
| <Narrative> | Storytelling / narration |
| <Enthusiastic> | Enthusiastic / energetic |
Text formatting
The plugin automatically prepends the style tag before sending text to the API:
tts = shunyalabs.TTS(speaker="Rajesh", style="<Happy>")
# Input: "Welcome to our platform"
# Sent: "<Happy> Welcome to our platform"
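The transformation shown in the comments above amounts to a one-line string operation. `format_text` is a hypothetical function name used for illustration; the single-space separator is taken from the "Sent" example.

```python
def format_text(text, style="<Neutral>"):
    """Prepend the style tag to the text, separated by a single space."""
    return f"{style} {text}"
```

Because the tag is added automatically, input text should not already begin with a style tag, or the tag would presumably be sent twice.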
Streaming TTS example
from livekit.agents import AgentSession
from livekit.plugins import shunyalabs
session = AgentSession(
    tts=shunyalabs.TTS(
        speaker="Nisha",
        style="<Conversational>",
        model="zero-indic",
        voice="Nisha",
    ),
)
Chunked (batch) TTS example
from livekit.plugins import shunyalabs
tts = shunyalabs.TTS(speaker="Varun", voice="Varun")
stream = tts.synthesize("Hello, how can I help you today?")
Full agent example
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import shunyalabs, silero
class MyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant.")
async def entrypoint(ctx):
    # Detect the spoken language automatically and reply in a conversational style.
    session = AgentSession(
        stt=shunyalabs.STT(language="auto"),
        tts=shunyalabs.TTS(
            model="zero-indic",
            voice="Rajesh",
            speaker="Rajesh",
            style="<Conversational>",
        ),
        vad=silero.VAD.load(),
    )
    # Join the room and start handling audio with the agent defined above.
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )
Multilingual example
from livekit.plugins import shunyalabs
# Hindi speaker
tts_hindi = shunyalabs.TTS(
    speaker="Rajesh", voice="Rajesh",
    language="hi", style="<Neutral>",
)
# English speaker
tts_english = shunyalabs.TTS(
    speaker="Varun", voice="Varun",
    language="en", style="<Conversational>",
)