Frequently asked questions

Quick answers to the most common questions about Shunya Labs - from getting started and API usage to deployment and billing.

Getting Started & Account

Setting up your account, getting API keys, and troubleshooting first-use issues.

?How do I get an API key?

Sign up at accounts.shunyalabs.ai to get your free API key. Once logged in, navigate to the API Keys section of your dashboard. You can create a new key with a single click. Each key comes with a free tier quota so you can start building immediately.

Your API key is a bearer token. Include it in every request via the Authorization: Bearer <your-key> header. Keep it secure — do not expose it in client-side code or public repositories.
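
As a minimal sketch, the header can be built once and reused across requests. The helper name auth_headers is illustrative and not part of the SDK; it only trims stray whitespace from the key before building the header described above.

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for an API request.

    Hypothetical helper, not an SDK function.
    """
    key = api_key.strip()  # stray spaces or newlines would corrupt the header value
    if not key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {key}"}
```

In practice the key would come from an environment variable such as SHUNYA_API_KEY rather than from source code.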

?I'm getting "401 Unauthorized" errors. What's wrong?

A 401 error means your API key is missing, invalid, or expired. Check the following:

You are sending the Authorization: Bearer <key> header
The key has not been revoked from your dashboard
Your key has not expired (trial keys have a validity period)
There are no extra spaces or newlines in the header value

If the problem persists, generate a new key from your dashboard and try again.

?Can I use the API for free?

Yes! Every new account gets a free tier with monthly credits for STT, TTS, and Translation. The free tier is designed for prototyping, testing, and small-scale production use. Check the Quickstart guide for current free-tier limits.

When you outgrow the free tier, you can upgrade to a paid plan from your dashboard. There are no surprise charges — you set the budget.

?What languages are supported?

Shunya Labs covers 204 languages for STT, 23 languages with 46 voices for TTS, and 2,970 translation pairs across 55+ Indian languages through Vāķ Translate.

See the Supported languages page for the full list. All languages are available on both cloud and on-prem deployments.

?How do I sign up for a Shunya Labs account?

Signing up for a Shunya Labs account is straightforward. Visit accounts.shunyalabs.ai and click the Create Account button. You can register using your email address or sign in with Google or GitHub for a faster setup.

After submitting the registration form, you will receive a verification email. Click the verification link to activate your account. Once verified, you can log in and immediately access the dashboard, where you will find your free API key already generated for you. The free tier includes monthly credits for STT, TTS, and Translation, so you can start experimenting right away without adding a payment method.

?I didn't receive the verification email. What should I do?

If you did not receive the verification email, first check your spam or promotions folder, as the email may have been filtered. Add noreply@shunyalabs.ai to your address book or safe senders list to prevent future emails from being blocked.

If the email is not in your spam folder, you can request a new verification link by logging into your account at accounts.shunyalabs.ai and clicking Resend verification email. If you still do not receive it, contact support through the dashboard chat or by visiting the contact page. Make sure you entered the correct email address during registration.

?Can I have multiple API keys?

Yes, you can create multiple API keys from your dashboard. Navigate to the API Keys section and click Create New Key. Each key can be given a descriptive name so you can track which application or environment it is used for — for example, production-app, staging-server, or local-dev.

Having multiple keys makes it easier to rotate credentials without downtime. If a key is compromised, you can revoke it individually without affecting your other keys. You can also set per-key rate limits and usage quotas in the dashboard for finer-grained control over your API consumption.

?How do I reset my password?

To reset your password, go to the login page at accounts.shunyalabs.ai and click Forgot Password. Enter the email address associated with your account and click Send Reset Link. You will receive an email with a password reset link that expires within one hour.

Click the link and enter your new password. Choose a strong password with at least eight characters, including uppercase and lowercase letters, numbers, and special characters. If you did not receive the reset email, check your spam folder or contact support for assistance.

?I'm seeing "403 Forbidden". What does that mean?

A 403 Forbidden error indicates that your API key is valid but does not have permission to access the requested resource. This can happen if your plan does not include access to a specific model or endpoint. For example, a free-tier key cannot access enterprise-only models or on-prem management endpoints.

Check the Error codes reference for details on what each status code means. If you believe you should have access, verify that you are using the correct endpoint URL and that your API key has not been restricted to specific IP addresses or referrer domains. Contact support if the issue persists.

?How do I view my usage and quota?

Your usage and quota information is available in the Dashboard under the Usage tab. Here you can see a real-time breakdown of your consumption across STT, TTS, and Translation services, including the number of API calls made, audio seconds processed, and characters synthesized for the current billing period.

The dashboard also displays your remaining quota for the free tier or your paid plan limits. You can set monthly budget caps and receive email notifications when you reach 50%, 80%, and 100% of your budget. For programmatic access, the GET /v1/usage endpoint returns your current usage data in JSON format, which you can integrate into your own monitoring systems.
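
If you mirror the budget notifications in your own monitoring, the threshold logic might look like the sketch below. The JSON schema returned by GET /v1/usage is not shown here, so the function takes plain numbers; the name crossed_thresholds is an assumption for illustration.

```python
def crossed_thresholds(used: float, budget: float, levels=(50, 80, 100)) -> list:
    """Return the notification levels (in percent) that usage has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare used/budget against level/100 without dividing, to avoid rounding surprises.
    return [level for level in levels if used * 100 >= level * budget]
```

For example, 85 units used against a budget of 100 reaches the 50% and 80% levels but not 100%.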

?How do I navigate the dashboard?

The Shunya Labs dashboard is organized into several key sections accessible from the left sidebar. The Overview page shows a summary of your recent API activity, quota usage, and quick links to documentation. The API Keys section lets you create, name, and revoke keys, while the Usage tab provides detailed billing and consumption metrics.

The Playground section allows you to test STT, TTS, and Translation endpoints interactively without writing any code. You can upload audio files, adjust parameters, and see live results. The Settings page lets you manage your profile, notification preferences, and billing information. If you need help at any point, click the Help icon in the top-right corner to access the documentation or start a support chat.

?Why can't I access the playground?

The playground is available to all authenticated users with an active account. If you cannot access it, first ensure that you are logged in and that your account has been verified. Some free-tier accounts may have restricted playground access if the account was flagged for unusual activity or if the free-trial period has ended without upgrading.

Try clearing your browser cache and cookies, then log in again. If the issue persists, check that your browser supports modern JavaScript and WebSocket connections. The playground requires a stable internet connection. For enterprise or on-prem deployments, the playground may be disabled by your administrator for security reasons — contact your IT team in that case.

?How do I select the right model for my use case?

Shunya Labs offers several model categories optimized for different use cases. For STT, use Universal for general-purpose transcription across 204 languages, Indic for best accuracy on Indian languages, and Med for healthcare-domain transcription with medical terminology support. Each model is available in both batch and streaming variants.

For TTS, shunya-multilingual-1 supports 23 languages with natural prosody, while shunya-indic-1 provides the highest quality for Indian languages. The Express variants offer lower latency for real-time applications. Review the Capability matrix page for a complete comparison of accuracy, latency, and pricing across all available models.

?How do I set up my development environment?

Setting up your development environment is quick. Install the Shunya Labs Python SDK with pip install shunyalabs. For Node.js or TypeScript, use npm install shunyalabs. If you prefer to use the REST API directly, any HTTP client like curl, Python requests, or fetch in JavaScript will work.

Store your API key as an environment variable for security: export SHUNYA_API_KEY="your-key". Then create a client instance:

from shunyalabs import Shunya
client = Shunya()  # reads SHUNYA_API_KEY from env

See the Quickstart guide for step-by-step setup instructions in Python, Node.js, and cURL.

?How do I enable two-factor authentication?

Two-factor authentication (2FA) adds an extra layer of security to your Shunya Labs account. To enable it, go to Settings > Security > Two-Factor Authentication in the dashboard. You can choose between an authenticator app (like Google Authenticator or Authy) or SMS-based codes.

Once enabled, you will be prompted for a one-time code from your authenticator app each time you log in, in addition to your password. We recommend using an authenticator app for the highest security. If you lose access to your 2FA device, you can use the backup codes provided during setup to regain access. Store these codes in a secure location. Contact support if you need to disable 2FA without access to your authenticator.

?What browsers are supported for the dashboard?

The Shunya Labs dashboard supports the latest two major versions of the following browsers: Google Chrome, Mozilla Firefox, Apple Safari, and Microsoft Edge. Chrome and Firefox provide the best experience, especially for the Playground and real-time features that use WebSocket connections.

For the best experience, ensure your browser is up to date and that JavaScript is enabled. The dashboard uses modern CSS features like CSS Grid, custom properties, and color-mix(), which require a recent browser version. Internet Explorer is not supported. If you encounter display issues, try clearing your browser cache or disabling extensions that might interfere with page rendering.

?How do I manage team members and permissions?

Team management is available on paid and Enterprise plans. To add team members, go to Settings > Team in the dashboard. You can invite members by email address. Each member can be assigned a role: Admin (full access to all settings and billing), Developer (can create and manage API keys, view usage), or Viewer (read-only access to usage and settings).

Invited members receive an email with a link to join. You can also manage API key access per team member, set per-member usage quotas, and view each member's API activity in the audit log. For Enterprise plans, you can integrate with your organization's SAML/SSO provider for centralized identity management. Team members consume from the same billing account — usage is aggregated under your plan.

?How do I update my profile information?

To update your profile information, log in to the dashboard and navigate to Settings > Profile. Here you can change your display name, email address, timezone, and notification preferences. If you change your email address, you will need to verify the new email by clicking a confirmation link sent to the new address.

Your profile settings also include options for preferred language for the dashboard interface and default API parameters. Changes to your profile take effect immediately. If you are using SSO (Enterprise plans), profile information is synchronized from your identity provider and cannot be edited directly in the dashboard. Contact your IT administrator to update SSO-managed profile fields.

?What is the difference between a model and an endpoint?

A model is the underlying AI engine that performs the task — for example, shunya-universal-1 for STT or shunya-multilingual-1 for TTS. Each model has specific capabilities, accuracy characteristics, and supported languages. An endpoint is the API URL through which you access the model, such as /v1/audio/transcriptions for batch STT or /v1/audio/synthesize for TTS.

Multiple models can be accessed through the same endpoint by changing the model parameter. Similarly, the same model may be accessible through different endpoints (batch vs. streaming). When selecting a model for your use case, consider factors like accuracy, latency, language support, and pricing. The Capability matrix page provides a comparison of all available models and their characteristics.

Speech-to-Text (STT)

Issues with transcription accuracy, audio formats, streaming, and batch processing.

?Why is my transcript inaccurate?

Transcription accuracy depends on several factors. Here are the most common fixes:

Audio quality: Ensure the recording is clear, with minimal background noise. Use a sample rate of 16 kHz for best results.
Language selection: Always set the language parameter to the correct language code. Auto-detection works well but explicit language tags improve accuracy.
Domain-specific terms: Use the keyterm_normalization parameter to pass domain vocabulary.
Audio length: Very short clips (< 1 second) may not produce reliable results. Aim for 3+ seconds of speech per utterance.

Our composite WER is 3.10% on standard benchmarks, but real-world results vary with audio conditions.
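
To measure accuracy on your own recordings, you can compute word error rate (WER) against a reference transcript. The sketch below uses the standard definition (word-level edit distance divided by the number of reference words); how the composite benchmark figure above is aggregated is not specified here, and text normalization such as casing or punctuation stripping is left to the caller.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```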

?What audio formats are supported for STT?

We support the following audio formats for STT:

WAV — PCM, 16-bit, mono/stereo (preferred)
MP3 — compressed, good for file uploads
FLAC — lossless compression
OGG/Opus — streaming-friendly
WebM — web recording output
Raw PCM — via raw format with sample rate
μ-law / A-law (G.711) — for telephony audio at 8 kHz

For streaming, we recommend 16-bit PCM at 16 kHz in mono for the lowest latency and best accuracy.

?How do I transcribe a long audio file?

Use the batch/async endpoint (POST /v1/audio/transcriptions) for files longer than a few minutes. The batch endpoint accepts files up to 500 MB and returns a transcript once processing is complete.

For very large volumes, consider splitting the audio into segments and processing them in parallel. See the ASR overview for detailed guidance.

?Why is streaming STT not working?

Streaming STT uses a WebSocket connection. Common issues include:

Wrong endpoint: Use wss://api.shunyalabs.ai/v1/audio/transcribe, not the HTTP URL
Auth in query: Send the API key as a query parameter: ?authorization=bearer <key>
Audio format mismatch: Send 16-bit PCM at 16 kHz, mono. Check your microphone and encoder settings
WebSocket closed prematurely: Keep the connection alive by sending audio data continuously. Silence longer than 30 seconds may cause the connection to time out
Firewall/proxy: Ensure your network allows WebSocket (wss://) connections on port 443

Refer to the Streaming guide for code examples in Python and JavaScript.
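
One easy mistake with query-parameter auth is leaving the space in "bearer <key>" unencoded. A minimal sketch of building the URL with the standard library follows; the helper name streaming_url is illustrative, and it assumes the server accepts the usual %20 encoding for the space.

```python
from urllib.parse import quote, urlencode

STREAMING_ENDPOINT = "wss://api.shunyalabs.ai/v1/audio/transcribe"


def streaming_url(api_key: str) -> str:
    """Return the WebSocket URL with the key passed as a query parameter."""
    # quote_via=quote encodes the space as %20 rather than "+".
    query = urlencode({"authorization": f"bearer {api_key.strip()}"}, quote_via=quote)
    return f"{STREAMING_ENDPOINT}?{query}"
```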

?How do I improve accuracy for domain-specific terms?

Use the keyterm normalization feature to boost accuracy for domain-specific vocabulary. Pass a list of terms your model should recognize:

{
  "keyterm_normalization": ["myocardial infarction", "HIPAA", "RTFx", "Vāķ"]
}

This works with both batch and streaming STT. For best results, include common variations, abbreviations, and proper nouns relevant to your domain.

?What is the difference between Universal, Indic, and Med models?

Shunya Labs offers three model families optimized for different use cases. Universal models are general-purpose and support all 204 languages with broad coverage, making them suitable for media transcription, meeting notes, and content creation across diverse languages. They provide good accuracy for most common scenarios.

Indic models are fine-tuned specifically for Indian languages, delivering higher accuracy on languages like Hindi, Tamil, Telugu, Bengali, Marathi, and Gujarati. They handle code-switching, regional accents, and Indian English more effectively than Universal models. Med models are specialized for healthcare and medical transcription, with enhanced recognition of medical terminology, drug names, and clinical abbreviations. Select the model that best matches your domain via the model parameter in your API request.

?How do I enable diarization (speaker labels)?

Diarization labels each segment of the transcript with the speaker's identity. To enable it, set the diarize parameter to true in your request. You can optionally specify the expected number of speakers using the num_speakers parameter, which can improve accuracy.

{
  "audio": "meeting.wav",
  "diarize": true,
  "num_speakers": 3,
  "language": "en"
}

The response includes a speaker field for each segment, such as SPEAKER_00, SPEAKER_01, etc. Diarization works best with audio that has minimal crosstalk and consistent speaker separation.
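
Diarized output is often easier to read when consecutive segments from the same speaker are merged into turns. The sketch below assumes each segment is a dictionary with a speaker field (as described above) and a text field; the text field name is an assumption, so adjust it to the actual response.

```python
def merge_speaker_turns(segments: list) -> list:
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Copy the fields so the input segments are not modified.
            turns.append({"speaker": seg["speaker"], "text": seg["text"]})
    return turns
```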

?Why is punctuation and capitalization missing in my transcript?

By default, Shunya Labs STT returns raw transcription without punctuation or capitalization. To enable these features, set the punctuation parameter to true in your request. This activates a post-processing step that adds periods, commas, question marks, and proper capitalization to the transcript.

{
  "audio": "speech.wav",
  "punctuation": true,
  "language": "en"
}

Punctuation is available for most major languages and works with both batch and streaming endpoints. Note that enabling punctuation adds minimal latency (typically under 100 ms). For languages that do not use Latin script, capitalization may not apply, but sentence boundary detection and appropriate punctuation marks will still be added.

?How do I get word-level timestamps?

Word-level timestamps provide start and end times for each word in the transcript. Enable them by setting word_timestamps to true in your STT request:

{
  "audio": "speech.wav",
  "word_timestamps": true,
  "language": "en"
}

The response includes an array of word objects, each with word, start, and end fields (in seconds):

{
  "words": [
    {"word": "Hello", "start": 0.12, "end": 0.35},
    {"word": "world", "start": 0.36, "end": 0.58}
  ]
}

Word timestamps are useful for subtitle generation, audio alignment, and interactive applications where you need to highlight words as they are spoken.
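
For subtitle generation, the words array can be grouped into SRT cues. This is a minimal sketch that relies only on the word, start, and end fields shown above; the fixed group size of seven words is an arbitrary illustrative choice, not a recommendation from the API.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list, max_words: int = 7) -> str:
    """Group word objects into numbered SRT cues of up to max_words words."""
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        group = words[i:i + max_words]
        text = " ".join(w["word"] for w in group)
        span = f"{srt_time(group[0]['start'])} --> {srt_time(group[-1]['end'])}"
        cues.append(f"{n}\n{span}\n{text}\n")
    return "\n".join(cues)
```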

?Why is language auto-detection not working correctly?

Language auto-detection works by analyzing the acoustic and phonetic features of the audio to identify the spoken language. It performs best with at least 10–15 seconds of continuous speech. Short utterances, music, background noise, or very quiet recordings can cause incorrect detection.

To improve accuracy, explicitly set the language parameter whenever you know the language of the audio. If you expect code-switching or multiple languages within the same audio, you can enable multilingual mode by setting language: "auto". For the most reliable results, consider segmenting your audio by language before sending it to the API.

?How do I transcribe audio with multiple speakers?

To transcribe audio with multiple speakers, enable diarization by setting diarize: true and optionally specify num_speakers. The diarization engine analyzes voice characteristics to distinguish between different speakers and assigns labels like SPEAKER_00, SPEAKER_01, etc. to each segment.

{
  "audio": "meeting.wav",
  "diarize": true,
  "num_speakers": 4,
  "language": "en"
}

For the best multi-speaker transcription results, use a good-quality microphone and minimize overlapping speech. The system can handle up to 10 distinct speakers per audio file.

?What is VAD and how do I configure it?

Voice Activity Detection (VAD) is a preprocessing step that identifies segments of audio containing speech and filters out silence or non-speech noise. Shunya Labs uses VAD to improve transcription accuracy by only processing the portions of audio that actually contain speech.

You can configure VAD behavior through parameters in your STT request. Set vad_filter: true to enable it, and adjust sensitivity with vad_threshold (default: 0.5, range: 0.0–1.0). You can also set vad_min_speech_duration_ms and vad_min_silence_duration_ms to fine-tune behavior. Refer to the ASR configuration page for details.
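
As a rough sketch, the VAD fields could be assembled and validated before they are merged into a request body. The parameter names come from the answer above; placing them at the top level of the request, and the helper name vad_options, are assumptions.

```python
from typing import Optional


def vad_options(threshold: float = 0.5,
                min_speech_ms: Optional[int] = None,
                min_silence_ms: Optional[int] = None) -> dict:
    """Return VAD fields to merge into an STT request body."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    options = {"vad_filter": True, "vad_threshold": threshold}
    # Only include the duration fields when the caller sets them.
    if min_speech_ms is not None:
        options["vad_min_speech_duration_ms"] = min_speech_ms
    if min_silence_ms is not None:
        options["vad_min_silence_duration_ms"] = min_silence_ms
    return options
```

A request body could then be built as {"audio": "speech.wav", "language": "en", **vad_options(0.6, min_silence_ms=300)}.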

?How do I get a translated transcript along with the transcription?

You can receive both a transcription and translation in a single API call by using the translate parameter. Set translate: true and specify the target language with target_language. The response will include both the original transcript and the translated text.

{
  "audio": "hindi_speech.wav",
  "language": "hi",
  "translate": true,
  "target_language": "en"
}

The response returns a transcript field with the original text and a translation field with the translated text. This feature is available for all language pairs supported by Vāķ Translate.

?What confidence score should I expect?

Confidence scores range from 0.0 to 1.0 and indicate how likely the model considers its transcription to be correct. Scores above 0.80 are generally reliable, while scores below 0.60 indicate uncertain transcription that may contain errors.

For clean audio with a clear speaker and minimal noise, scores of 0.90–0.99 are typical. For challenging audio — heavy accents, noisy environments, or overlapping speech — scores may drop to 0.50–0.80. Use the confidence score as a signal to flag low-quality segments for human review.
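
A simple way to act on these ranges is to bucket each segment by its score. The thresholds below mirror the 0.80 and 0.60 figures above; the bucket names and the function name review_bucket are illustrative.

```python
def review_bucket(confidence: float) -> str:
    """Classify a confidence score into a review bucket."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.80:
        return "reliable"
    if confidence < 0.60:
        return "needs_review"
    return "borderline"
```

Segments in the needs_review bucket are the natural candidates for human review.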

?My audio file is too large. What are the limits?

Shunya Labs batch STT accepts audio files up to 500 MB in size, with a maximum duration of 8 hours per file. If your file exceeds either limit, you need to split it into smaller segments before uploading.

To split large audio files, use tools like ffmpeg:

ffmpeg -i large_audio.wav -f segment -segment_time 600 -c copy chunk_%03d.wav

This command splits the audio into 10-minute chunks. You can then process the chunks in parallel using multiple API requests for faster throughput.
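
Processing the chunks in parallel and stitching the results back together in order could look like the sketch below. The transcribe argument is a placeholder for whatever function you use to send one file to the batch endpoint and return its text; it is not an SDK call.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def transcribe_chunks(paths: Iterable[Path],
                      transcribe: Callable[[Path], str],
                      max_workers: int = 4) -> str:
    """Transcribe chunks in parallel and join the results in filename order."""
    # chunk_000.wav, chunk_001.wav, ... sort into chronological order.
    ordered = sorted(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order even if chunks finish out of order.
        texts = list(pool.map(transcribe, ordered))
    return " ".join(t.strip() for t in texts if t.strip())
```

Keep max_workers within your plan's rate limits. Words cut at a chunk boundary may be transcribed imperfectly, since -c copy splits at fixed times rather than at silences.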

?How do I transcribe streaming audio from a microphone?

To transcribe streaming audio from a microphone, use the WebSocket endpoint with audio captured from the user's microphone. In a browser, use the MediaRecorder API to capture audio chunks and send them over the WebSocket connection.

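import asyncio
import os
from urllib.parse import quote

import pyaudio
import websockets

async def stream_microphone():
    # The key is sent as a query parameter; the space in "bearer <key>" must be URL-encoded.
    token = quote(f"bearer {os.environ['SHUNYA_API_KEY']}")
    uri = f"wss://api.shunyalabs.ai/v1/audio/transcribe?authorization={token}"
    p = pyaudio.PyAudio()
    # 16-bit PCM, mono, 16 kHz
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
    try:
        async with websockets.connect(uri) as ws:
            while True:
                # Read in a worker thread so the blocking call does not stall the event loop.
                data = await asyncio.to_thread(stream.read, 4096)
                await ws.send(data)
                result = await ws.recv()  # assumes one result message per audio chunk
                print(result)
    finally:
        stream.close()
        p.terminate()

asyncio.run(stream_microphone())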

See the Microphone streaming guide for complete browser and Python examples.

?What is the difference between real-time and batch transcription?

Real-time (streaming) transcription processes audio as it is being captured, providing partial results with very low latency (typically 200–500 ms). It is ideal for live captioning, voice assistants, and real-time meeting transcription.

Batch transcription processes a complete audio file and returns the full transcript. It has higher accuracy because the model can analyze the entire audio context, and it supports advanced features like diarization, word-level timestamps, and keyterm normalization. Use the POST /v1/audio/transcriptions endpoint for batch and the WebSocket endpoint for streaming.

?How do I handle multiple languages in a single audio file?

To transcribe audio that contains multiple languages, set the language parameter to "auto" to enable multilingual mode. The model will automatically detect language switches and transcribe each segment in the appropriate language.

{
  "audio": "multilingual_speech.wav",
  "language": "auto",
  "model": "shunya-universal-1"
}

The Universal model is recommended for multilingual audio as it has the broadest language coverage. The response includes a language field for each transcribed segment.

?How do I transcribe audio in real time for live captions?

For live captioning, use the WebSocket streaming STT endpoint. The streaming endpoint processes audio as it is received and returns partial transcription results that update as more audio is processed. Connect to wss://api.shunyalabs.ai/v1/audio/transcribe and send audio chunks continuously.

Configure parameters like language, punctuation, and interim_results to get partial results for real-time display. Set interim_results: true to receive both interim and final results. For the best live captioning experience, use the Express model variant.

?Can I use Shunya STT for call center analytics?

Yes, Shunya Labs STT is well-suited for call center analytics. The batch transcription endpoint can process recorded calls, and the streaming endpoint can transcribe live calls. Key features include diarization (distinguishing agent from customer), keyterm normalization, and sentiment analysis.

For telephony audio (8 kHz, mu-law or A-law), set the sample_rate to 8000 and use response_format: "mulaw". The Indic and Med models are particularly effective for Indian language call centers and healthcare support lines respectively.

Text-to-Speech (TTS)

Voice quality, expression styles, cloning, and supported output formats.

?Why does the generated speech sound robotic?

Robotic-sounding speech is usually caused by one of the following:

Missing expression style: Add an <expression> tag like <Conversational> or <Happy>
Wrong voice selection: Try shunya-indic-1 or shunya-multilingual-1
Very short text: Single words or very short phrases may sound clipped
Unusual characters: URLs, email addresses, and mixed-script text can cause pronunciation issues

?How do I use expression styles?

Wrap the text with an expression tag: <Happy>This is great!</Happy>. Available styles include:

<Conversational> — natural, everyday speech
<Happy> — upbeat and cheerful
<News> — broadcast-style delivery
<Sad> — softer, slower tone
<Whisper> — hushed voice

Tags are applied per sentence and are not spoken aloud. See the Expression styles guide for the complete list.
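
Since tags are applied per sentence, it can help to build tagged input programmatically. The sketch below knows only the five styles listed above (the full list is in the Expression styles guide); the helper names are illustrative.

```python
LISTED_STYLES = {"Conversational", "Happy", "News", "Sad", "Whisper"}


def with_expression(style: str, sentence: str) -> str:
    """Wrap one sentence in an expression tag."""
    if style not in LISTED_STYLES:
        raise ValueError(f"unknown expression style: {style}")
    return f"<{style}>{sentence}</{style}>"


def tag_sentences(pairs: list) -> str:
    """Join (style, sentence) pairs into a single tagged input string."""
    return " ".join(with_expression(style, sentence) for style, sentence in pairs)
```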

?How does voice cloning work?

Voice cloning creates a custom voice from a short audio sample (3–6 seconds). Upload a clean recording of the target speaker's voice, and the API can generate new speech in that same voice.

For best results, use a sample with clear speech, minimal background noise, and consistent volume. The cloning endpoint is available via POST /v1/audio/clone. Cloned voices are ephemeral by default but can be persisted for your account.

Refer to the Voice cloning guide for step-by-step instructions.

?What audio formats are supported for TTS output?

Set the response_format parameter to choose your output format:

mp3 — compressed, widely compatible (default)
wav — uncompressed PCM, highest quality
opus — streaming-optimized
mulaw — μ-law, for telephony at 8 kHz
alaw — A-law, for telephony at 8 kHz
flac — lossless compression
pcm — raw PCM data

All formats are available for both batch and streaming TTS. See the Audio formats reference for details.

?How do I list all available voices?

You can list all available voices by calling the GET /v1/audio/voices endpoint. The response returns an array of voice objects, each containing the voice ID, name, supported languages, gender, and available expression styles.

curl -H "Authorization: Bearer $SHUNYA_API_KEY" \
  "https://api.shunyalabs.ai/v1/audio/voices"

The response includes details for all pre-built voices and any custom cloned voices associated with your account. Refer to the Voices reference page for the latest list.

?Can I use SSML tags in my TTS input?

Yes, Shunya Labs TTS supports a subset of SSML (Speech Synthesis Markup Language) tags for fine-grained control over speech output. You can use tags like <break> for pauses, <prosody> for rate and pitch, <say-as> for number and date interpretation, and <phoneme> for custom pronunciation.

<speak>
  <prosody rate="slow" pitch="+2st">
    Welcome to Shunya Labs.
    <break time="500ms"/>
  </prosody>
</speak>

To use SSML, set the input_type parameter to ssml in your request. Refer to the Expression styles guide for available speech controls.

?How do I generate speech in multiple languages with the same voice?

Multilingual voices can synthesize speech in multiple languages using the same voice identity. shunya-multilingual-1 supports all 23 TTS languages.

{
  "text": "Hello, how are you?",
  "voice": "shunya-multilingual-1",
  "language": "en"
}

The voice maintains consistent timbre and quality across languages. For Indian languages, shunya-indic-1 offers the best cross-lingual performance.

?Why is my cloned voice not sounding like the original speaker?

The quality of the source audio sample is critical. Use a sample that is 3–6 seconds long, recorded in a quiet environment with minimal background noise, at a sample rate of at least 16 kHz. The speaker should be speaking clearly and at a consistent volume.

Try these improvements: provide multiple samples of the same speaker, ensure the sample covers a range of phonemes, and avoid samples with background music, reverb, or echo. Refer to the Voice cloning best practices for detailed guidance.

?What is the latency for streaming TTS?

Streaming TTS delivers audio with very low latency, typically 200–500 ms from the time the text is sent to the first audio chunk being received. Pre-built voices generally have the lowest latency, while cloned voices may add 100–200 ms of additional processing time.

For real-time applications, use the Express model variants and the streaming WebSocket endpoint (wss://api.shunyalabs.ai/v1/audio/synthesize). This enables time-to-first-audio as low as 150 ms under optimal conditions.

?How long can the generated audio be?

For batch TTS requests, the maximum input text length is 5,000 characters, which at a typical speaking rate corresponds to roughly 5 to 7 minutes of audio. For streaming TTS, there is no hard text limit per session, but each individual synthesis request has the same 5,000-character limit.

If you need to synthesize very long content, split your text into segments of 5,000 characters or fewer and process them sequentially or in parallel. Use the batch async endpoint for large-scale TTS workloads.
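
A minimal sketch of the splitting step is shown below. It prefers sentence boundaries so that audio segments do not break mid-sentence, and hard-splits only when a single sentence exceeds the limit. The function name and the sentence-boundary pattern are illustrative choices.

```python
import re


def split_text(text: str, limit: int = 5000) -> list:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after ., !, ? or the Devanagari danda when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own synthesis request and the resulting audio concatenated in order.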

?How do I control speech rate and pitch?

Use the SSML <prosody> tag. The rate attribute accepts values like x-slow, slow, medium, fast, x-fast, or a percentage. The pitch attribute accepts values like low, medium, high, or relative semitones.

<speak>
  <prosody rate="85%" pitch="+1st">
    This text will be spoken slightly slower and with a higher pitch.
  </prosody>
</speak>

Alternatively, use the speed parameter in the API request for simpler rate control without SSML (0.25 to 4.0, where 1.0 is normal).

?How do I insert pauses or silence in the speech?

Use the SSML <break> tag to insert pauses. The time attribute accepts values in milliseconds (e.g., 500ms) or seconds (e.g., 2s). Use the strength attribute for relative pause lengths.

<speak>
  First sentence.<break time="1s"/>
  Second sentence after a pause.
  <break strength="strong"/>
  Third sentence after a strong break.
</speak>

Remember to set input_type: "ssml" in your API request.
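
When SSML is generated from user-supplied text, characters such as & and < need escaping or the markup becomes invalid. The sketch below builds a <speak> document with pauses between sentences and a matching request body; carrying the SSML in the text field is an assumption based on the other TTS examples, and the helper names are illustrative.

```python
from xml.sax.saxutils import escape


def ssml_with_pauses(sentences: list, pause: str = "500ms") -> str:
    """Join sentences with <break> tags inside a <speak> root element."""
    separator = f'<break time="{pause}"/>'
    body = separator.join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"


def ssml_request(ssml: str, voice: str) -> dict:
    """Build a TTS request body that marks the input as SSML."""
    return {"text": ssml, "voice": voice, "input_type": "ssml"}
```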

?What is cross-lingual voice support?

Cross-lingual voice support allows a voice trained primarily on one language to synthesize speech in another language while maintaining the same voice identity. For example, a voice trained on English data can generate speech in Hindi while sounding like the same speaker.

To use cross-lingual synthesis, select a multilingual voice like shunya-multilingual-1 and set the language parameter to your desired target language. Cloned voices also support cross-lingual synthesis.

?Can I batch process multiple texts for TTS?

Yes, use the async batch endpoint to process multiple texts. Send an array of text items, each with its own voice, language, and format settings. The endpoint returns a job ID for tracking.

{
  "inputs": [
    {"text": "Hello world", "voice": "shunya-multilingual-1", "language": "en"},
    {"text": "नमस्ते दुनिया", "voice": "shunya-indic-1", "language": "hi"}
  ]
}

Batch processing supports up to 100 items per request. See the TTS overview for batch processing details.
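If you have more than 100 items, one approach is to group them into several request bodies. The helper below is a minimal sketch; `batch_payloads` is a hypothetical name, and only the `inputs` shape is taken from the example above.

```python
from typing import Iterator

MAX_ITEMS_PER_REQUEST = 100  # batch limit stated above


def batch_payloads(items: list[dict], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[dict]:
    """Yield request bodies of the form {"inputs": [...]} with at most `size` items each."""
    for start in range(0, len(items), size):
        yield {"inputs": items[start:start + size]}
```

Each yielded body is submitted as a separate batch job, so you receive one job ID per group.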

?How do I create custom pronunciation for specific words?

Use the SSML <phoneme> tag to specify custom pronunciation. The tag accepts an alphabet attribute (set to ipa or xsampa) and a ph attribute with the desired pronunciation.

<speak>
  The drug is pronounced
  <phoneme alphabet="ipa" ph="ˈpaɪrəfɛnɪb">Pyrafenib</phoneme>.
</speak>

You can also use the <sub> tag to replace text with an alternative pronunciation. Refer to the Expression styles guide for more speech controls.

?How do I stream TTS audio to a telephone system?

Use the mulaw or alaw output format with an 8 kHz sample rate; both formats are standard for telephony systems. Set response_format: "mulaw" (or "alaw") in your TTS request.

{
  "text": "Welcome to our automated service.",
  "voice": "shunya-multilingual-1",
  "response_format": "mulaw",
  "sample_rate": 8000
}

At 8 kHz, the mulaw and alaw formats are compatible with Asterisk, FreeSWITCH, Twilio, and most SIP-based telephony platforms.

?What are the supported sample rates for TTS output?

Shunya Labs TTS supports the following output sample rates:

24 kHz — Default for wav, mp3, flac, and pcm formats
16 kHz — Available for all formats. Recommended for most applications
8 kHz — Required for mulaw and alaw telephony formats
48 kHz — Available for wav format on request

Specify the desired sample rate using the sample_rate parameter. For streaming applications, 16 kHz is recommended.

?How do I use the TTS endpoint for notifications and alerts?

Use the batch TTS endpoint with short, templated text. The low latency of batch TTS (typically under 1 second for short texts) makes it suitable for real-time alerts.

{
  "text": "Alert: Server CPU usage has exceeded 90%.",
  "voice": "shunya-multilingual-1",
  "language": "en",
  "response_format": "mp3"
}

For high-volume notification systems, use the async batch endpoint to process multiple alerts in a single request. Cache generated audio files for commonly used notifications to reduce API calls.

?How does Shunya handle numbers, dates, and abbreviations?

Shunya Labs TTS automatically handles common formatting. By default, numbers are spoken as cardinal values, dates are expanded, and common abbreviations are expanded.

For finer control, use the SSML <say-as> tag to control how numbers and dates are spoken. For example, <say-as interpret-as="cardinal">1234</say-as> is spoken as "one thousand two hundred thirty-four". See the Expression styles guide for more details.

API & Integration

Authentication, error codes, rate limits, SDKs, and WebSocket connections.

?What are the API rate limits?

Rate limits vary by plan and endpoint. For the free tier:

Batch STT: 100 requests per minute
Streaming STT: 10 concurrent WebSocket connections
Batch TTS: 200 requests per minute
Streaming TTS: 10 concurrent WebSocket connections
Translation: 50 requests per minute

Paid plans have higher limits. Check the Rate limits page for current information.

?Why am I getting "429 Too Many Requests"?

The 429 status code means you have exceeded the rate limit for your plan. Here's what to do:

Wait and retry: The Retry-After response header tells you how long to wait
Reduce concurrency: Throttle parallel requests
Upgrade your plan: Paid plans offer higher limits
Use batch endpoints: Batch requests into fewer larger ones

Implement exponential backoff in your client code to handle 429s gracefully.
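A minimal sketch of the wait calculation is shown below. It prefers the Retry-After header when the value is a number of seconds and otherwise falls back to exponential backoff. The function name and the plain-dict representation of headers are assumptions for illustration.

```python
import random


def wait_seconds(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return how long to wait before retrying a 429 response."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    # Exponential backoff capped at `cap`, plus up to 10% random jitter.
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay * 0.1)
```

Pass the zero-based retry attempt number so that the fallback delay doubles on each consecutive 429.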

?How do I integrate using the Python SDK?

Install the Python SDK with pip install shunyalabs. Here's a quick example:

from shunyalabs import Shunya

client = Shunya(api_key="your-key")

transcript = client.stt.transcribe(audio="audio.wav", language="hi")
audio = client.tts.synthesize(text="नमस्ते", voice="shunya-indic-1")

The SDK supports async/await, streaming generators, and typed configuration. See the Python SDK guide for full documentation.

?Can I use the OpenAI SDK with Shunya?

Yes! Shunya is OpenAI-compatible. Point the OpenAI SDK's base_url to our endpoint:

from openai import OpenAI

client = OpenAI(
    api_key="your-shunya-key",
    base_url="https://api.shunyalabs.ai/v1"
)

All standard parameters work — model, language, response_format, etc. See the OpenAI compatibility guide.

?How do I handle WebSocket reconnections?

Implement reconnection logic with exponential backoff for production WebSocket clients:

Listen for the close event and attempt to reconnect
Start with a 1-second delay and double up to a maximum (e.g., 30 seconds)
Add jitter to avoid thundering herd on reconnection
On reconnect, re-send the StartTranscription or StartSynthesis message

Our WebSocket endpoints support graceful reconnection with a session_id in the initial response.
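The delay schedule from the steps above can be sketched as a generator. This is an illustrative helper, not SDK code; the name `reconnect_delays` and the jitter fraction are assumptions.

```python
import random
from typing import Iterator


def reconnect_delays(initial: float = 1.0, maximum: float = 30.0, jitter: float = 0.25) -> Iterator[float]:
    """Yield reconnect delays: 1s, 2s, 4s, ... capped at `maximum`, each with random jitter."""
    delay = initial
    while True:
        # Spread clients out by up to +/- `jitter` (as a fraction of the delay).
        yield delay * (1 + random.uniform(-jitter, jitter))
        delay = min(maximum, delay * 2)
```

Call `next()` on the generator after each close event, sleep for the returned value, then reconnect and re-send the StartTranscription or StartSynthesis message. Create a fresh generator once a connection succeeds so the schedule starts again from 1 second.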

?What are the common HTTP status codes and what do they mean?

Shunya Labs APIs use standard HTTP status codes:

200 OK — Request succeeded
201 Created — Resource created successfully
400 Bad Request — Malformed request. Check JSON and required parameters
401 Unauthorized — Missing or invalid API key
403 Forbidden — Valid key but insufficient permissions
404 Not Found — Endpoint or resource does not exist
413 Payload Too Large — Request body exceeds maximum size
429 Too Many Requests — Rate limit exceeded
500 Internal Server Error — Unexpected server error

See the Error codes reference for complete details.

?What are the API timeout settings?

Shunya Labs API enforces the following timeouts:

HTTP requests: 60 seconds for synchronous endpoints
WebSocket connections: 30 minutes maximum. Connections that are idle for 30 seconds (no data sent) are closed
Batch/async jobs: 30 minutes for STT, 15 minutes for TTS
File uploads: Must complete within 5 minutes

For long-running operations, always use the async batch endpoint. Configure your HTTP client with appropriate timeout values.

?How do I implement retry logic?

Implement retry logic with exponential backoff. Only retry on 429 (rate limited) and 5xx (server error) status codes.

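import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying only on 429 and 5xx responses."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            # Assumption: the raised exception exposes the HTTP status as `status_code`.
            status = getattr(e, "status_code", None)
            retryable = status is not None and (status == 429 or 500 <= status < 600)
            if not retryable or attempt == max_retries - 1:
                raise
            # Exponential backoff with a small random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)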

The Shunya Labs Python SDK includes built-in retry logic. Configure via Shunya(max_retries=3).

?Why is my WebSocket handshake failing?

WebSocket handshake failures are usually caused by:

Incorrect URL scheme: Use wss:// for secure connections
Missing authentication: Pass API key as query parameter: ?authorization=bearer YOUR_KEY
Firewall blocking: Ensure WebSocket connections on port 443 are allowed
TLS/SSL issues: Configure your client to accept your CA certificate for on-prem

Refer to the Streaming guide for WebSocket connection examples.

?How do I upgrade or downgrade my SDK version?

To upgrade the Python SDK, run pip install --upgrade shunyalabs. To install a specific version, use pip install shunyalabs==1.2.3. For Node.js, use npm update shunyalabs or npm install shunyalabs@1.2.3.

Check PyPI or npm for the latest versions. We follow semantic versioning.

?What is the difference between synchronous and asynchronous APIs?

Synchronous APIs process your request in real time and return the result in the same HTTP response. Best for short operations where you need immediate results.

Asynchronous (batch) APIs accept your request, return a job ID immediately, and process the work in the background. Poll for status or receive a webhook notification when complete. Ideal for long audio files and large batch TTS jobs.

?How do I debug API errors?

Start by checking the HTTP status code and response body. Most errors include a JSON body with error.code, error.message, and error.details fields.

Enable debug logging in the Python SDK by setting Shunya(debug=True). This logs all request URLs, headers, response status codes, and error bodies. Include the X-Request-Id from response headers when contacting support.
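When calling the API without the SDK, a small helper can turn a failed response into a single line suitable for logs or support tickets. This is a sketch: the field names follow the error.code and error.message structure described above, while the function name and the sample error code in the usage note are hypothetical.

```python
import json


def describe_error(status: int, body: str, headers: dict) -> str:
    """Build a one-line summary of a failed API response."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}  # body was not a JSON object
    if not isinstance(error, dict):
        error = {"message": str(error)}
    code = error.get("code", "unknown")
    message = error.get("message", body[:200])
    request_id = headers.get("X-Request-Id", "n/a")
    return f"HTTP {status} [{code}] {message} (request id: {request_id})"
```

For example, a 400 response whose body carries a code such as `invalid_language` (a made-up value) would be summarized with that code, its message, and the request ID from the response headers.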

?How do I enable logging for API calls?

from shunyalabs import Shunya
import logging

logging.basicConfig(level=logging.DEBUG)
client = Shunya(debug=True)

For the Node.js SDK, set the DEBUG=shunyalabs:* environment variable. In production, log to a file and ensure API keys are redacted from logs.
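One way to keep keys out of log output is a standard-library logging filter that masks bearer tokens before a record is written. The sketch below uses only Python's `logging` and `re` modules; the class name and the token pattern are assumptions, so adjust the pattern to match how keys appear in your logs.

```python
import logging
import re

# Matches "Bearer <token>" in headers or query strings, case-insensitively.
_BEARER = re.compile(r"(Bearer\s+)[A-Za-z0-9._\-]+", re.IGNORECASE)


class RedactKeys(logging.Filter):
    """Replace bearer tokens in log messages with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = _BEARER.sub(r"\1[REDACTED]", record.getMessage())
        record.args = None  # the message is already fully formatted
        return True
```

Attach the filter to each handler, for example `handler.addFilter(RedactKeys())`, rather than to a logger, so that records propagated from child loggers are also covered.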

?Can I use Shunya in a browser/JavaScript environment?

Yes, but never expose your API key in client-side code. Route API calls through a backend proxy server that attaches the API key server-side.

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream);
// Send chunks to wss://api.shunyalabs.ai/v1/audio/transcribe

Use the Streaming STT guide for browser-compatible WebSocket examples.

?How do I set up CORS for web applications?

Shunya Labs cloud API supports CORS. Configure allowed origins in your dashboard under Settings > CORS. Specify a comma-separated list of allowed origins for production use.

For on-prem deployments, CORS configuration is managed in the deployment's config.yaml file under the cors section. Wildcard origins are supported but not recommended for production.

?How do I set up webhooks for async job completion?

Include a webhook_url parameter in your async job request. When the job finishes, Shunya Labs sends an HTTP POST to your webhook URL with a JSON payload containing the job status and result URL.

{
  "audio": "long_meeting.wav",
  "webhook_url": "https://your-server.com/webhooks/shunya",
  "language": "en"
}

Your endpoint must respond with 200 OK within 5 seconds. The webhook payload includes the full response object with a webhook_id for tracking.
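To stay within the 5-second window, a common pattern is to acknowledge the webhook immediately and do the slow work in the background. The sketch below shows only the handler logic. The function name, the in-memory queue, and every payload field other than webhook_id are assumptions.

```python
import json
import queue

# Payloads waiting for a background worker (in-memory for illustration only).
jobs: "queue.Queue[dict]" = queue.Queue()


def handle_webhook(raw_body: bytes) -> int:
    """Validate the payload, queue it for later processing, and return an HTTP status code."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return 400
    if not isinstance(payload, dict) or "webhook_id" not in payload:
        return 400
    jobs.put(payload)  # a worker fetches the result later
    return 200
```

Call this function from your web framework's POST route and return its status code straight away; a separate worker can drain the queue and download results.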

?How do I use the API without an SDK?

Use any HTTP client. Example with curl for STT:

curl -X POST "https://api.shunyalabs.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $SHUNYA_API_KEY" \
  -F "audio=@speech.wav" \
  -F "language=en" \
  -F "punctuation=true"

For TTS:

curl -X POST "https://api.shunyalabs.ai/v1/audio/synthesize" \
  -H "Authorization: Bearer $SHUNYA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "shunya-multilingual-1", "language": "en"}' \
  --output speech.mp3

All API endpoints are documented in the API Reference.
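The same TTS call can be made from Python with only the standard library. The sketch below mirrors the curl example; the helper name is illustrative, and it builds the request without sending it so that it can be inspected first.

```python
import json
import os
import urllib.request


def build_tts_request(text: str, voice: str = "shunya-multilingual-1",
                      language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl TTS example."""
    body = json.dumps({"text": text, "voice": voice, "language": language}).encode("utf-8")
    return urllib.request.Request(
        "https://api.shunyalabs.ai/v1/audio/synthesize",
        data=body,
        method="POST",
        headers={
            # The key is read from the environment, as in the curl example.
            "Authorization": f"Bearer {os.environ.get('SHUNYA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# To send it and save the audio (requires network access and a valid key):
# with urllib.request.urlopen(build_tts_request("Hello world"), timeout=60) as resp:
#     with open("speech.mp3", "wb") as f:
#         f.write(resp.read())
```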

?How do I rotate API keys without downtime?

Create a new API key in the dashboard while keeping your existing key active. Update your application to use the new key, deploy to staging first for testing, then update production.

After confirming production is stable with the new key, revoke the old key. Both keys remain valid during the rotation, and each account supports up to 100 active API keys.

?How do I set custom headers in API requests?

Shunya Labs APIs support custom headers like X-Request-Id for tracking, X-Api-Version for API versioning, and X-Request-Timeout for timeout overrides.

Custom headers must be prefixed with X-. Include the X-Request-Id from response headers when contacting support.

?How do I migrate from another speech API provider?

Since Shunya is OpenAI-compatible, you only need to change the base_url and api_key in existing OpenAI integration code. For other providers, update your code to use Shunya's API format.

The OpenAI compatibility guide provides a side-by-side comparison if migrating from OpenAI Speech Services.

Deployment & On-Prem

Self-hosting, hardware requirements, air-gapped environments, and data security.

?Can I deploy Shunya on my own hardware?

Yes. Shunya Labs supports on-premises deployment for organizations that need data sovereignty, low latency, or air-gapped operation. You can deploy on your own data center, private cloud, or edge devices.

The same API surface works everywhere — cloud and on-prem. Contact sales for a deployment evaluation.

?What are the hardware requirements?

Minimum recommendations:

CPU-only: 8+ cores, 32 GB RAM, 100 GB SSD
GPU-accelerated: NVIDIA T4 / A10G / A100 with 16+ GB VRAM
Air-gapped: Same as above, no internet required after installation

All models are CPU-compatible. See the Deployment guide for detailed sizing.

?How does air-gapped deployment work?

The system runs with no connection to the public internet. Models are packaged as Docker images and transferred via physical media or private network. Triton Inference Server handles model serving locally.

No telemetry, no external API calls — everything runs inside your network. Updates are delivered as encrypted bundles. Common in defense, government, and regulated healthcare environments.

?Is my data encrypted?

Yes. Shunya Labs follows industry-standard security practices:

In transit: TLS 1.3 for all API and WebSocket connections
At rest: AES-256 encryption for all stored data
Audio retention: Audio files are deleted immediately after processing

We are SOC 2 Type II and ISO 27001:2022 certified, and comply with HIPAA, GDPR, and CCPA. See the Compliance page for details.

?How do I deploy using Docker?

Pull the image from our private registry:

docker pull shunyalabs/inference-server:latest
docker run --gpus all -p 8000:8000 -p 8001:8001 \
  -v /path/to/models:/models \
  -e SHUNYA_LICENSE_KEY="your-license" \
  shunyalabs/inference-server:latest

The container exposes the API on port 8000 and health checks on port 8001. See the Deployment guide for environment variables and volume mounts.

?Does Shunya support Kubernetes?

Yes, with official Helm charts. The charts configure deployments, services, ingress, horizontal pod autoscaling, and persistent volume claims. GPU nodes are supported via the NVIDIA device plugin.

helm repo add shunyalabs https://helm.shunyalabs.ai
helm install shunya-inference shunyalabs/inference-server \
  --set licenseKey="your-license" \
  --set gpu.enabled=true

See the Deployment guide for Kubernetes production best practices.

?How do I configure GPU passthrough for Docker?

Install the NVIDIA Container Toolkit on your host, then run containers with the --gpus all flag. For Kubernetes, install the NVIDIA device plugin daemonset.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Verify GPU access with nvidia-smi inside the container.

?How do I update models in an on-prem deployment?

Model updates are delivered as encrypted bundles. Download and install using the CLI:

wget https://models.shunyalabs.ai/updates/indic-stt-v2.1.bundle
shunya-cli models install indic-stt-v2.1.bundle
shunya-cli models reload indic-stt

For air-gapped environments, transfer bundles via physical media. The CLI supports rollback to previous versions for easy model management.

?How do I scale the deployment for high throughput?

Horizontal scaling — deploy multiple instances behind a load balancer. For Kubernetes, configure HPA based on GPU utilization.

Vertical scaling — use more powerful GPUs. Model parallelism — distribute different models across separate instances. Use the /v1/health endpoint to monitor instance load. See the Deployment guide for scaling benchmarks.

?How do I monitor the health of my deployment?

Health check endpoints on port 8001: GET /v1/health returns service status including model load states and GPU utilization. Prometheus metrics at /v1/metrics expose latency, error rates, and GPU memory usage.

Configure alerts for critical thresholds. The Helm chart includes a pre-configured ServiceMonitor for Prometheus Operator.

?Where can I find deployment logs?

Logs are written to stdout/stderr. For Docker, use docker logs <container-id>. For Kubernetes, use kubectl logs <pod-name>. Set log verbosity via SHUNYA_LOG_LEVEL environment variable.

For production, forward logs to a centralized system such as Elasticsearch, CloudWatch, or Splunk, using standard Docker logging drivers or a log shipper like Filebeat.

?How do I back up and restore my deployment?

Back up configuration files (config.yaml, .env), model files (/var/lib/shunya/models), license keys, and TLS certificates.

For Docker: docker run --rm -v shunya-models:/models -v /backup:/backup ubuntu tar czf /backup/models-backup.tar.gz /models. For Kubernetes, use Velero. Always test backup/restore in a non-production environment first.

?What network ports and protocols are required?

Required ports:

8000 (TCP) — REST API endpoint
8001 (TCP) — Health check and metrics endpoint
8002 (TCP) — WebSocket endpoint for streaming
22 (TCP) — SSH access for maintenance (optional)

All communication uses TLS 1.3 encryption. In air-gapped environments, no outbound connections are required.

?How do I configure TLS certificates for on-prem deployment?

Provide TLS certificates via environment variables:

docker run --gpus all -p 443:8000 \
  -v /etc/ssl/certs:/certs \
  -e SHUNYA_TLS_CERT=/certs/server.crt \
  -e SHUNYA_TLS_KEY=/certs/server.key \
  shunyalabs/inference-server:latest

For Kubernetes, configure TLS via Ingress with cert-manager.

?Can I use Shunya with a reverse proxy?

Yes. Works with Nginx, HAProxy, or Traefik. The reverse proxy handles TLS termination and load balancing. Configure it to forward requests to port 8000 (HTTP) or 8002 (WebSocket).

# Nginx example
server {
    listen 443 ssl;
    location / {
        proxy_pass http://inference-server:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Ensure timeouts are configured for streaming connections. Use health checks on port 8001 for high-availability setups.

?How do I perform a rolling update?

For Kubernetes: helm upgrade shunya-inference shunyalabs/inference-server --set image.tag=v2.1.0 --reuse-values. Kubernetes performs a rolling update with zero downtime.

For Docker Compose: scale up a new instance, verify health, then scale down the old one. Use a blue-green strategy for manual Docker deployments.

?How do I set up logging and monitoring for on-prem deployment?

Deployment outputs structured JSON logs to stdout/stderr. Configure a log shipper (Filebeat, Fluentd) to forward logs to your centralized platform. Prometheus metrics are exposed at /v1/metrics on port 8001.

Key metrics include request count, latency histogram, GPU memory usage, and model health. Create a Grafana dashboard with alerts. See the Deployment guide for monitoring recommendations.

Billing & Support

Pricing plans, support channels, and account management.

?How does pricing work?

Shunya Labs uses a pay-as-you-go pricing model:

Free tier: Monthly credits for STT, TTS, and Translation — no credit card required
Usage-based: Pay per audio second (STT) or per character (TTS)
Enterprise: Custom pricing for on-prem deployments and dedicated throughput

Set a monthly budget cap in your dashboard. You'll never be charged more than your budget. Alerts are sent at 50%, 80%, and 100% usage.

?What support plans are available?

Community: Free — documentation, GitHub issues, community Discord
Standard: Email support with 8-hour response (included with paid plans)
Enterprise: Dedicated Slack, 1-hour SLA, quarterly reviews, priority features

All paid plans include Standard support. Enterprise requires an annual contract.

?How do I contact support?

You can reach us through any of these channels:

Documentation: Start here for most questions
GitHub: Issues on the Hugging Face organization
Website: Contact form at shunyalabs.ai/contact
Dashboard: In-app chat (available on paid plans)

?How do I track my usage in real time?

Real-time usage is available in the dashboard under the Usage tab. It shows current billing period consumption for STT, TTS, and Translation, updated within 60 seconds.

For programmatic access, use GET /v1/usage to retrieve usage data as JSON. Set up email notifications for custom usage thresholds.

?How do I set up billing alerts?

Go to Settings > Billing > Alerts. Create alerts at specific usage percentages (50%, 75%, 90%) or absolute thresholds. Alerts can be delivered via email, webhook, or Slack.

Set a monthly budget cap that automatically stops API access when reached. Budget caps can be set globally or per API key.

?How do I download invoices?

Go to Settings > Billing > Invoices. View all past invoices with billing period, amount, payment status, and PDF download link.

Invoices include line-item breakdown by service. Set up automatic email delivery to your finance team. Enterprise invoices are also available via API.

?How do I upgrade or downgrade my plan?

Go to Settings > Billing > Plan. Upgrades take effect immediately with prorated charges. Downgrades take effect at the next billing cycle.

Before downgrading, check that your usage fits within the new plan's limits. Free tier credits are not forfeited when upgrading. Contact your account manager for Enterprise plan changes.

?What payment methods are accepted?

Credit/Debit cards: Visa, Mastercard, American Express, Discover
Wire transfer: Available for annual Enterprise contracts
ACH direct debit: For US customers on annual plans
UPI: Unified Payments Interface for Indian customers
Net banking: Available for select Indian banks

All payments processed securely through our PCI-compliant payment gateway. We do not store full card numbers.

?How do I change my billing information?

Go to Settings > Billing > Payment Methods. Update credit card details, billing address, tax/VAT ID, and company name. Changes take effect immediately for future charges.

For legal entity or tax registration changes, update before the next invoice is generated. Enterprise customers may require a contract amendment.

?Can I get a refund?

Contact the billing team at billing@shunyalabs.ai within 30 days of the charge. Refunds for usage-based charges are generally not issued for consumed services, but overcharges will be corrected.

Annual Enterprise contract refunds follow your agreement terms. Prepaid annual plan customers may receive prorated refunds for unused portions. All refunds processed within 10 business days.

?How do I cancel my account?

Go to Settings > Account > Cancel Account. API access is deactivated immediately. Data is retained for 90 days in case of reactivation, then permanently deleted.

Download invoices and usage data before cancellation. Enterprise cancellation follows your contract terms.

?What is the SLA for Enterprise plans?

Enterprise plans include a 99.95% uptime SLA for API endpoints, measured monthly. Scheduled maintenance (48-hour notice) is excluded.

Service credits: 5% of monthly fee for each 30 minutes below 99.95%, up to 100% of monthly fee. Claims must be submitted within 30 days. Includes 1-hour response time for critical (P1) tickets.
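To put the uptime figure in perspective, the short calculation below converts 99.95% into minutes of allowed downtime per month. It assumes "measured monthly" means a calendar month and ignores excluded maintenance windows; the function name is illustrative.

```python
def allowed_downtime_minutes(days_in_month: int, uptime_target: float = 0.9995) -> float:
    """Minutes of downtime per month that still meet the uptime target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - uptime_target)


print(round(allowed_downtime_minutes(30), 1))  # 21.6
print(round(allowed_downtime_minutes(31), 1))  # 22.3
```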

?How do I add team members to my billing account?

Go to Settings > Team. Click Invite Member and enter their email. Assign roles: Admin, Developer, or Viewer. Usage from all members aggregates under the same billing account.

Enterprise plans support SAML/SSO integration. View per-member usage breakdown in the Usage tab. Remove members from the Team settings.

?How do I get a VAT or GST invoice?

Update your billing information in Settings > Billing > Tax Information. Enter your VAT/GST registration number and company address. Future invoices will include these details automatically.

For corrected invoices, contact billing@shunyalabs.ai with your invoice number. Enterprise invoices can include purchase order numbers.

?What happens if I exceed my monthly budget cap?

API access is automatically paused for the remainder of the billing cycle. You receive a notification email and dashboard banner. Existing WebSocket sessions are allowed to complete.

To resume service, increase the budget cap in the dashboard or wait for the next billing cycle. Set per-key budget caps for finer-grained control.

?How do I set up a payment method for the first time?

Go to Settings > Billing > Payment Methods. Click Add Payment Method and enter your card details. A temporary $1.00 authorization hold is placed on the card and released immediately.

Indian customers can use UPI or net banking. Multiple cards can be added. All payment info is processed by our PCI-compliant payment processor.

?How do I integrate Shunya with my existing application?

Integrating Shunya Labs with your existing application depends on your tech stack. For Python applications, install the SDK with pip install shunyalabs and create a client instance using your API key. For Node.js, use npm install shunyalabs. For other languages, use direct HTTP requests to the REST API endpoints.

The integration typically involves replacing existing speech processing calls with Shunya API calls. If you are migrating from another provider, the OpenAI compatibility guide provides code comparisons. For web applications, route API calls through a backend proxy to keep your API key secure.

?What are the best practices for API key security?

API key security is critical to prevent unauthorized usage. Follow these best practices:

Never hardcode API keys in source code, configuration files, or client-side applications
Use environment variables or a secure secrets manager (like AWS Secrets Manager, HashiCorp Vault, or Kubernetes Secrets)
Rotate keys regularly — at least every 90 days for production keys
Use separate keys for development, staging, and production environments
Set IP address restrictions on your API keys if your infrastructure has static IPs

If you suspect a key has been compromised, revoke it immediately from the dashboard and generate a new one. Check the Security & compliance page for more detailed recommendations.

?How do I test the API without using my quota?

Shunya Labs provides a free tier with monthly credits that allow you to test the API without incurring charges. The free tier includes a limited number of audio seconds for STT and characters for TTS each month. Additionally, the Playground in the dashboard lets you test endpoints interactively without writing code.

For development and testing, create a separate API key with lower rate limits and use it exclusively in your development environment. You can monitor your test usage in the dashboard under the Usage tab. If you need additional testing capacity, contact support to discuss trial credit extensions.

?How do I transcribe audio from a video file?

To transcribe audio from a video file, first extract the audio track using a tool like ffmpeg, then send the extracted audio to the STT API. Here's how to extract audio:

ffmpeg -i video.mp4 -vn -ar 16000 -ac 1 -f wav audio.wav

This extracts a 16 kHz mono WAV file, which is the optimal format for STT. You can then upload the WAV file to the batch transcription endpoint. For long videos, split the audio into segments and process them in parallel for faster results.

?What is the maximum file size for streaming STT?

Streaming STT does not have a fixed file size limit because it processes audio incrementally. Instead, there is a connection duration limit of 30 minutes per WebSocket session. The amount of data you can send within that window depends on the audio encoding: for 16-bit PCM at 16 kHz mono (32,000 bytes per second), the theoretical maximum is about 57.6 MB over 30 minutes.

For pre-recorded files longer than 30 minutes, use the batch transcription endpoint instead. If your streaming session exceeds the 30-minute limit, the connection is closed gracefully, and you can establish a new session to continue.
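The upper bound for 16-bit PCM at 16 kHz mono follows from simple arithmetic, worked through below. The constant names are illustrative.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CHANNELS = 1                # mono
SESSION_SECONDS = 30 * 60   # 30-minute session limit

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
max_bytes = bytes_per_second * SESSION_SECONDS

print(bytes_per_second)         # 32000
print(max_bytes / 1_000_000)    # 57.6
```

Other encodings scale the same way: doubling the sample rate or channel count doubles the total.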

?How do I get a confidence score for each word?

To get per-word confidence scores, set both word_timestamps and word_confidence to true in your STT request:

{
  "audio": "speech.wav",
  "word_timestamps": true,
  "word_confidence": true,
  "language": "en"
}

The response includes a confidence field for each word object, ranging from 0.0 to 1.0. This is useful for identifying uncertain words that may need human review. Word confidence is available for all models in batch mode.
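As a sketch of how the scores might be used, the function below picks out words that fall under a review threshold. The response shape (a `words` list whose entries have `word` and `confidence` keys) and the 0.6 threshold are assumptions; check the actual response schema before relying on them.

```python
def low_confidence_words(response: dict, threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return (word, confidence) pairs whose confidence falls below `threshold`."""
    return [
        (w["word"], w["confidence"])
        for w in response.get("words", [])
        # Words without a confidence value are treated as certain.
        if w.get("confidence", 1.0) < threshold
    ]
```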

?How do I adjust the speaking rate for different languages?

Speaking rate can be adjusted using the speed parameter or the SSML <prosody rate> tag. The optimal speaking rate varies by language — for example, a rate of 1.0 for English may sound faster than the same rate for Hindi.

When using multilingual voices, you may need to adjust the rate per language for the most natural output. As a starting point, use 0.9 for Hindi and other Indian languages, 1.0 for English, and 1.1 for shorter-form content. The speed parameter accepts values from 0.25 to 4.0.

?What is the difference between a neural and a standard voice?

Shunya Labs uses neural TTS technology for all its voices, which means they are based on deep neural networks that produce natural, human-like speech with appropriate intonation, stress, and rhythm. There are no "standard" or "legacy" voices — all voices use the latest neural architecture.

Within neural voices, shunya-multilingual-1 is optimized for broad language coverage, shunya-indic-1 for Indian language authenticity, and Express variants for minimal latency. All variants produce neural-quality speech with different trade-offs.

?How do I synthesize speech in a female or male voice?

Each voice in Shunya Labs TTS has an associated gender. List available voices using GET /v1/audio/voices, which includes a gender field for each voice. Then select the desired voice ID in your synthesis request.

Female voices include shunya-voice-f1 and shunya-voice-f2, while male voices include shunya-voice-m1 and shunya-voice-m2. Some multilingual voices support both gender variants. For cloned voices, the gender matches the source speaker.

?How do I use pagination with list endpoints?

List endpoints like GET /v1/audio/voices and GET /v1/usage support pagination. Use the limit parameter (default 20, max 100) and offset parameter to navigate pages. The response includes a total field and a next_offset field for the next page.

Implement a loop that increments the offset by the limit value until all pages are retrieved. Example: GET /v1/usage?limit=50&offset=100 retrieves items 101–150.
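The loop described above might look like the following sketch. `fetch_page` stands in for whatever function performs the GET request and returns parsed JSON. The `items` key is an assumption about the response shape; `total` is the field named above.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item from a paginated list endpoint.

    `fetch_page(limit, offset)` should perform the request and return the parsed JSON.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get("items", [])
        yield from items
        offset += limit
        # Stop on an empty page or once the offset passes the reported total.
        if not items or offset >= page.get("total", 0):
            break
```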

?How do I handle rate limits in a multi-threaded application?

In a multi-threaded application, rate limits apply across all threads. Implement a centralized rate limiter that tracks request counts before dispatching requests. For Python, use shunyalabs.RateLimiter. For distributed systems, use a shared Redis counter.

Set your local rate limit to 80% of the API limit for headroom. On 429 responses, implement exponential backoff. Consider separate API keys for different services to isolate rate limit impacts.
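For a single process, a centralized limiter can be built from the standard library alone. The sliding-window class below is a sketch of that idea and is not the SDK's shunyalabs.RateLimiter; its name and interface are assumptions.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `period` seconds across all threads."""

    def __init__(self, max_calls: int, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self._calls: deque = deque()  # timestamps of recent calls
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit, else return False."""
        now = time.monotonic()
        with self._lock:
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return True
            return False

    def acquire(self) -> None:
        """Block until a slot is available."""
        while not self.try_acquire():
            time.sleep(0.05)


# 80% of the free-tier batch STT limit of 100 requests per minute
stt_limiter = SlidingWindowLimiter(max_calls=80)
```

Share one limiter instance per endpoint across all worker threads and call `acquire()` before each request. Across multiple processes or hosts, the same logic needs a shared store such as Redis.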

?How do I get the request ID for troubleshooting?

Every API response includes an X-Request-Id header. In the Python SDK, access it from the response object: response.request_id. This ID allows Shunya Labs support to look up the exact request in server logs.

Include the request ID when contacting support for faster issue resolution. For direct HTTP requests, capture the X-Request-Id header from the response.
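For direct HTTP requests, the header lookup can be sketched as below. The helper name is hypothetical; it accepts any mapping of header names to values, which is what most HTTP clients expose on their response objects.

```python
from typing import Mapping, Optional


def get_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Request-Id value, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value.strip()
    return None
```

Logging this value alongside every failed request means the ID is already at hand when you open a support ticket.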

?How do I set up high availability for on-prem deployment?

Deploy multiple inference server instances behind a load balancer. Each instance runs independently. If one fails, the load balancer routes traffic to healthy instances. For Kubernetes, configure multiple replicas with pod anti-affinity across nodes.

Use GET /v1/health on port 8001 for health probes, and configure readiness probes that check model load status. Use highly available storage. Test the HA setup by simulating failures in a staging environment before going to production.

?How do I configure resource limits for Docker containers?

Use Docker's --cpus and --memory flags:

docker run --gpus all \
  --cpus="16" \
  --memory="64g" \
  -p 8000:8000 \
  shunyalabs/inference-server:latest

Assign specific GPUs with --gpus '"device=0,1"'. Reserve at least 32 GB memory for STT models, 16 GB for TTS. Monitor with docker stats and adjust based on observed consumption.

?How do I migrate from cloud to on-prem deployment?

Contact Shunya Labs sales to set up an on-prem license and obtain the deployment package. Install using Docker or Kubernetes as described in the Deployment guide.

Ensure hardware meets minimum requirements, configure TLS, set up monitoring, and update your application's base_url to your on-prem server. Test with a subset of traffic before full cutover. Model versions and API behavior are consistent between cloud and on-prem.

?How do I compare pricing across different plans?

A pricing comparison is available on the Shunya Labs website. Plans differ by rate limits, included features, and support levels. The free tier suits prototyping, while paid plans offer higher limits and lower per-unit pricing.

In the dashboard, go to Settings > Billing > Plan for a side-by-side comparison with your current usage. Contact sales for help choosing the right plan based on your expected usage.

?How do I export my usage data?

Export usage data as CSV from the Usage tab in the dashboard. Select a date range and a granularity (daily, weekly, or monthly). The CSV includes date, service, quantity, and estimated cost columns.

For programmatic export, use GET /v1/usage with start_date and end_date parameters to retrieve JSON data. Enterprise customers can access team member and API key breakdowns.
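If you want the programmatic export in the same shape as the dashboard CSV, a conversion step might look like this sketch. It assumes the JSON from GET /v1/usage has been parsed into a list of records whose keys mirror the CSV columns; the exact key names (in particular estimated_cost) are assumptions.

```python
import csv
import io
from typing import List

# Assumed record keys, mirroring the dashboard CSV columns.
COLUMNS = ["date", "service", "quantity", "estimated_cost"]


def usage_to_csv(records: List[dict]) -> str:
    """Serialize usage records to CSV text, ignoring keys outside COLUMNS."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Records that lack one of the columns are written with an empty cell rather than raising an error.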

?How do I handle disputed charges?

Contact billing@shunyalabs.ai within 30 days of the charge with your invoice number and dispute reason. The billing team investigates and responds within 5 business days.

Common disputes include duplicate charges, incorrect pricing, or unauthorized usage. During investigation, your account remains active. If resolved in your favor, the amount is credited or refunded. Enterprise dispute terms follow your contract.

Additional questions

Latency, vocabulary, and account management.

?What is the latency for real-time transcription?

Real-time transcription latency depends on audio length, model size, and your region. With the lightweight Nano model, streaming latency is typically under 300 ms for short utterances. Larger models such as Universal may have 500 to 800 ms latency.

For the lowest possible latency, use the Nano model in streaming mode and keep audio chunks under 2 seconds. Network latency to the nearest server region also affects overall response time. For on-prem deployments, latency is typically under 100 ms because the network hop is eliminated.
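Keeping chunks under 2 seconds comes down to simple byte arithmetic on raw audio. The sketch below assumes uncompressed PCM input (16 kHz, 16-bit, mono by default); those defaults and the function name are illustrative and not taken from the API reference.

```python
from typing import List


def chunk_pcm(
    pcm: bytes,
    sample_rate: int = 16000,
    sample_width: int = 2,
    channels: int = 1,
    max_seconds: float = 1.0,
) -> List[bytes]:
    """Split raw PCM audio into chunks of at most max_seconds each."""
    frame_size = sample_width * channels          # bytes per audio frame
    frames_per_chunk = max(1, int(sample_rate * max_seconds))
    step = frames_per_chunk * frame_size          # bytes per chunk, frame-aligned
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]
```

With the defaults, each chunk is 32,000 bytes (one second of audio), which stays well under the 2-second guideline; send the chunks to the streaming endpoint in order.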

?Can I use SSML to add pauses and emphasis?

Yes, Shunya Labs TTS fully supports the Speech Synthesis Markup Language (SSML). You can use <break> tags to add pauses, <prosody> tags to control rate, pitch, and volume, and <emphasis> tags to add vocal emphasis to specific words or phrases.


<speak>
  Welcome to Shunya Labs.
  <break/>
  Your audio solution provider.
</speak>

All SSML elements are supported as per the W3C SSML 1.1 specification. When using SSML, set the input_type parameter to ssml in your API request.
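When SSML is assembled from user-supplied text, special characters need escaping so the markup stays well-formed. The helper below is a hypothetical sketch that joins two sentences with a pause and emphasizes the second; the 500 ms default and the "strong" emphasis level are illustrative choices.

```python
from xml.sax.saxutils import escape


def ssml_with_pause(first: str, second: str, pause_ms: int = 500) -> str:
    """Build an SSML document: first sentence, a pause, then an emphasized sentence."""
    return (
        "<speak>"
        f"{escape(first)}"
        f'<break time="{int(pause_ms)}ms"/>'
        f'<emphasis level="strong">{escape(second)}</emphasis>'
        "</speak>"
    )


ssml = ssml_with_pause("Welcome to Shunya Labs.", "Your audio solution provider.")
```

Send the resulting string as the text of your TTS request with input_type set to ssml.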

?How do I find my data residency and region information?

Your data residency information is displayed in the dashboard under Settings > Account > Data Residency. Shunya Labs cloud services are hosted in multiple regions including US East (Virginia), EU West (Frankfurt), and Asia South (Mumbai). The default region is determined by your account's registered location.

For Enterprise and on-prem deployments, data residency is entirely under your control since the inference server runs within your infrastructure. If you need data to be processed in a specific cloud region for compliance reasons, contact sales to have your account configured for a specific region.

?How do I enable audit logging for my account?

Audit logging records all API activity, configuration changes, and user actions for security and compliance purposes. To enable audit logging, go to Settings > Security > Audit Log in the dashboard. Audit logs are available on paid plans with 90-day retention and on Enterprise plans with 365-day retention.

The audit log captures who performed what action, when, and from which IP address. Events include API key creation and revocation, plan changes, team member additions, and settings modifications. Export audit logs in CSV or JSON format for integration with your SIEM system.

?How do I use custom vocabulary with STT?

Custom vocabulary improves recognition accuracy for domain-specific terms. Pass a list of terms using the keyterm_normalization parameter in your STT request. These terms are given higher priority during transcription.

{
  "audio": "lecture.wav",
  "keyterm_normalization": ["photosynthesis", "mitochondria", "CRISPR-Cas9"],
  "language": "en"
}

Enterprise plans can also use custom language models fine-tuned on your domain data, providing the highest accuracy for specialized domains like legal, medical, or technical content.

?How do I transcribe audio with background noise?

Enable VAD filtering by setting vad_filter: true with an appropriate threshold to isolate speech from background noise. The Universal model is robust to moderate background noise levels.

For extremely noisy environments, preprocess audio with noise reduction tools like ffmpeg or sox before API submission.
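One possible preprocessing step with ffmpeg is sketched below. The filter chain (a 100 Hz high-pass followed by ffmpeg's afftdn denoiser) and the 16 kHz mono output are assumptions chosen for illustration, not settings recommended by Shunya Labs; tune them against your own recordings.

```python
import subprocess
from typing import List


def denoise_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that filters low-frequency rumble and broadband noise."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-af", "highpass=f=100,afftdn",  # assumed filter chain, adjust as needed
        "-ar", "16000",                  # resample to 16 kHz (assumption)
        "-ac", "1",                      # downmix to mono (assumption)
        dst,
    ]


def denoise(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(denoise_command(src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.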

?How do I customize voice tone for my brand?

Use expression styles like <Conversational> and <News> for preset tonal characteristics. For fine control, use SSML <prosody> tags to adjust rate, pitch, and volume to match your brand voice.
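To keep prosody settings consistent across every prompt, they can be centralized in one helper. This sketch uses the named values defined by SSML 1.1 (for example slow, medium, and fast for rate); the function name and defaults are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def brand_prosody(text: str, rate: str = "medium", pitch: str = "medium",
                  volume: str = "medium") -> str:
    """Wrap text in a single <prosody> element carrying the brand's voice settings."""
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>"
        f"{escape(text)}"
        "</prosody>"
        "</speak>"
    )
```

Defining the brand's rate, pitch, and volume once and reusing the helper avoids drift between prompts written by different teams.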

Enterprise customers can work with Shunya Labs to create a custom voice that embodies their brand identity.