# STT Endpoint

Configure the Speech-to-Text (STT) endpoint for transcription.
## Overview
Intervu uses an OpenAI-compatible STT endpoint to transcribe audio in real time. We recommend Speaches for local, private transcription.
## Configuration
Open Settings (gear icon) and locate the STT section:
| Setting | Description | Example |
|---|---|---|
| STT Endpoint | URL for the STT API | `http://localhost:8000/v1/audio/transcriptions` |
| STT Model | Model identifier | `Systran/faster-distil-whisper-small.en` |
| STT API Key | API key (if required) | Leave empty for Speaches |
### Default Values

- STT Endpoint: `http://localhost:8000/v1/audio/transcriptions`
- STT Model: `Systran/faster-distil-whisper-small.en`
- STT API Key: (empty)

## Testing the Connection
After configuring the endpoint:
- Click the **Test STT** button in Settings
- Intervu sends a silent audio sample to verify the connection
- A success message confirms the endpoint is working
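The silent test sample can be reproduced by hand if you want to probe the endpoint outside the app. Below is a minimal sketch that generates an in-memory WAV of silence; the 16 kHz mono 16-bit format is an assumption (a common Whisper input format), not a documented Intervu internal.

```python
import io
import wave

def make_silent_wav(duration_s: float = 0.5, rate: int = 16000) -> bytes:
    """Generate an in-memory WAV file of pure silence (16-bit mono PCM)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(duration_s * rate))
    return buf.getvalue()
```

You can save this payload to a file and POST it yourself, e.g. `curl -F "file=@silence.wav" -F "model=Systran/faster-distil-whisper-small.en" http://localhost:8000/v1/audio/transcriptions`.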
> **Test First:** Always test the STT endpoint before starting an interview. This catches configuration errors early.
## Model Selection
Choose a model based on your needs:
| Model | Speed | Accuracy | VRAM | Languages |
|---|---|---|---|---|
| `Systran/faster-distil-whisper-small.en` | ⚡ Fast | Good | ~1GB | English only |
| `Systran/faster-distil-whisper-medium.en` | Medium | Better | ~2GB | English only |
| `Systran/faster-whisper-small` | Fast | Good | ~1GB | Multilingual |
| `openai/whisper-medium` | Slow | Best | ~5GB | Multilingual |
### Recommendations

- Real-time interviews: Use `faster-distil-whisper-small.en` for best speed
- Non-English: Use `faster-whisper-small` or another multilingual model
- Maximum accuracy: Use `whisper-medium` or `whisper-large`
## Advanced STT Settings
In Settings → Advanced:
### Silence RMS Threshold
Audio below this level is treated as silence and not sent to STT.
- Default: `0.005`
- Higher: More aggressive filtering (may miss quiet speech)
- Lower: Less filtering (may send noise to STT)
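To illustrate how a gate like this typically works (a sketch of the general technique, not Intervu's exact code), the root-mean-square level of each audio chunk is compared against the threshold, and chunks below it are never sent to STT:

```python
import math

SILENCE_RMS_THRESHOLD = 0.005  # default from Settings -> Advanced

def rms(samples: list[float]) -> float:
    """Root-mean-square level of float samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silence(samples: list[float],
               threshold: float = SILENCE_RMS_THRESHOLD) -> bool:
    """True if the chunk should be dropped instead of transcribed."""
    return rms(samples) < threshold
```

Raising the threshold means a chunk needs a louder average level before it is transcribed, which is why quiet speech can be lost.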
### Audio Chunk Duration
Length of audio chunks sent to STT.
- Default: `3` seconds
- Lower: Faster response, but more API calls
- Higher: Fewer API calls, but slower response
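The trade-off is easy to see in a chunking sketch: each chunk becomes one API call, so chunk length directly sets the call rate. The 16 kHz sample rate here is an assumption for illustration.

```python
SAMPLE_RATE = 16000    # assumed capture rate
CHUNK_DURATION_S = 3   # default from Settings -> Advanced

def chunk_audio(samples, rate=SAMPLE_RATE, duration_s=CHUNK_DURATION_S):
    """Split a sample stream into fixed-length chunks, one per STT request."""
    size = int(rate * duration_s)
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

With 3-second chunks, one minute of audio produces 20 requests; 1-second chunks would produce 60 requests but return each transcript sooner.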
### Hallucination Phrases
Comma-separated phrases that should be ignored.
- Default: `you,thank you,thanks,thanks for watching,thank you for watching,thanks for listening,thank you for listening,bye,goodbye,the end,so,okay,hmm,uh,um,oh,ah,i`
- Purpose: Whisper sometimes hallucinates these phrases from silence
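A filter of this kind is usually a simple exact-match check after normalization. The sketch below is an assumption about how such a list is applied (matching a whole transcript, ignoring case and trailing punctuation), not Intervu's confirmed implementation:

```python
DEFAULT_PHRASES = (
    "you,thank you,thanks,thanks for watching,thank you for watching,"
    "thanks for listening,thank you for listening,bye,goodbye,the end,"
    "so,okay,hmm,uh,um,oh,ah,i"
)

def build_filter(csv_phrases: str):
    """Return a predicate that rejects known hallucinated transcripts."""
    phrases = {p.strip().lower() for p in csv_phrases.split(",")}
    def keep(transcript: str) -> bool:
        # Drop transcripts that are exactly a listed phrase,
        # ignoring case and trailing punctuation.
        return transcript.strip().strip(".!?").lower() not in phrases
    return keep
```

Matching whole transcripts (rather than substrings) is what keeps legitimate sentences containing "thank you" from being discarded.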
## Using Other STT Services
Intervu supports any OpenAI-compatible STT endpoint.
### OpenAI Whisper API

```bash
Endpoint: https://api.openai.com/v1/audio/transcriptions
Model: whisper-1
API Key: sk-...
```

### Azure Speech Services
```bash
Endpoint: https://your-region.api.cognitive.microsoft.com/openai/deployments/whisper/audio/transcriptions?api-version=2024-02-15-preview
Model: whisper
API Key: your-azure-key
```

### Custom Endpoint Requirements
Your endpoint must:

- Accept a POST request to `/v1/audio/transcriptions`
- Accept a multipart form with the audio file and a `model` parameter
- Return JSON: `{ "text": "transcribed text" }`
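The requirements above can be exercised with a minimal client sketch. This uses only the Python standard library and builds the multipart body by hand; the helper name and the hand-rolled encoding are illustrative, not an official client.

```python
import json
import urllib.request
import uuid

def build_transcription_request(endpoint: str, audio: bytes, model: str,
                                api_key: str = "") -> urllib.request.Request:
    """Build the multipart POST an OpenAI-compatible STT endpoint expects."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(endpoint, data=body, headers=headers)

# To send it and read the transcript:
# resp = urllib.request.urlopen(build_transcription_request(url, wav, model))
# text = json.loads(resp.read())["text"]
```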
## Troubleshooting
### Connection Failed
- Verify Speaches is running: `docker ps`
- Check the endpoint URL is correct
- Try `curl http://localhost:8000/v1/models`
### Slow Transcription
- Use a smaller model
- Ensure GPU acceleration is working
- Check Speaches logs: `docker logs speaches`
### Poor Accuracy

- Speak clearly and close to the microphone
- Try a larger model
- Adjust silence threshold in Advanced settings
### Hallucinated Text
- Check hallucination phrases in Advanced settings
- Add common false positives to the list
## Next Steps
- LLM Endpoint — Configure answer generation
- Advanced Settings — Fine-tune behavior