Transcription Issues
Problems with speech-to-text transcription quality or performance.
Symptoms
- Empty or missing transcripts
- Low accuracy transcriptions
- Hallucinated words in transcript
- Slow transcription
- Words appearing out of order
- Transcript not matching audio
No Transcription
Check STT Connection
- Open Settings (gear icon)
- Click Test STT
- You should see "Success" or an error message
Common Causes
| Cause | Solution |
|---|---|
| Speaches not running | docker ps to verify |
| Wrong endpoint URL | Check STT endpoint setting |
| Model not downloaded | uvx speaches-cli model ls |
| API key missing | Add key if required |
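The "Model not downloaded" cause can also be checked from a script. The sketch below assumes the /v1/models endpoint returns an OpenAI-style JSON list whose entries carry an "id" field; has_model is a hypothetical helper name, not part of Speaches or Intervu.

```shell
#!/bin/sh
# has_model MODEL_ID
# Reads a /v1/models JSON response on stdin and succeeds (exit 0) if
# MODEL_ID appears as an "id" value. Assumption: OpenAI-style model list.
has_model() {
  # Strip whitespace so '"id": "x"' and '"id":"x"' look the same,
  # then search for the literal key/value pair.
  tr -d '[:space:]' | grep -Fq "\"id\":\"$1\""
}

# Usage against the local endpoint used on this page:
#   curl -s http://localhost:8000/v1/models \
#     | has_model "Systran/faster-distil-whisper-small.en" \
#     && echo "model available" || echo "model missing"
```

If the check reports the model as missing, download it before testing transcription again.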
Verify Speaches
# Check if running
docker ps | grep speaches
# Should show the container
# Check models
curl http://localhost:8000/v1/models
# Test transcription
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F "file=@test.wav" \
-F "model=Systran/faster-distil-whisper-small.en"
Low Accuracy
Improve Microphone Quality
- Use a quality microphone/headset
- Position mic close to mouth
- Reduce background noise
- Enable noise suppression if available
Use a Better Model
Smaller models are faster but less accurate:
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| faster-distil-whisper-small.en | Fast | Good | Real-time |
| faster-whisper-small.en | Medium | Better | Better accuracy |
| whisper-medium | Slow | Best | Maximum accuracy |
Audio Quality Tips
- Speak clearly — Avoid mumbling
- Consistent distance — Keep mic at same distance
- Quiet environment — Reduce background noise
- Proper gain — Avoid clipping (red in level meter)
Hallucinated Words
Whisper sometimes "hears" words in silence.
Common Hallucinations
- "you"
- "thank you"
- "thanks for watching"
- "bye"
- "okay"
- "hmm"
Solution: Hallucination Filter
Intervu filters common hallucinations by default.
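As a rough illustration of what such a filter does, the sketch below discards a transcript only when the whole normalized text equals one phrase from a comma-separated list. This is an assumption about the matching rule, not Intervu's actual implementation, and is_hallucination is a hypothetical helper name.

```shell
#!/bin/sh
# is_hallucination PHRASES TRANSCRIPT
# PHRASES is a comma-separated list with no spaces around the commas.
# Succeeds (exit 0) when the normalized transcript equals one phrase.
is_hallucination() {
  phrases=$1
  # Normalize: lowercase, drop punctuation, collapse and trim spaces.
  text=$(printf '%s' "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -d '[:punct:]' \
    | tr -s ' ' \
    | sed 's/^ *//; s/ *$//')
  # An empty transcript is not treated as a hallucination in this sketch.
  [ -n "$text" ] || return 1
  # Wrap both sides in commas so only whole phrases can match.
  case ",$phrases," in
    *",$text,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_hallucination "you,thank you,bye" "Thank you." && echo "filtered"
#   is_hallucination "you,thank you,bye" "you are welcome" || echo "kept"
```

Matching the whole transcript rather than individual words keeps real sentences that merely contain a listed phrase.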
- Open Settings (gear icon)
- Go to Advanced Settings
- Find Hallucination Phrases
- Add any false positives you see
Format: Comma-separated list
you,thank you,thanks,bye,ok,hmm,custom phrase here
How It Works
When a transcription matches only hallucination phrases, it's discarded:
Transcript: "you"
Filtered: Yes (matches "you" in filter)
Transcript: "you are welcome"
Filtered: No (contains real words)
Slow Transcription
Check Processing Time
Each audio chunk should transcribe in 1-3 seconds.
Causes:
- Large model
- No GPU acceleration
- Network latency (for remote STT)
- High system load
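To see where a request falls relative to the 1-3 second range, the round trip can be timed with curl's --write-out time_total variable. This is a rough sketch that assumes the local endpoint and model used elsewhere on this page; classify_latency and time_transcription are hypothetical helper names. The measured time includes upload and network overhead, not only model inference.

```shell
#!/bin/sh
# classify_latency SECONDS
# Prints "ok" when the time is within the 3-second upper bound, else "slow".
classify_latency() {
  awk -v t="$1" 'BEGIN { if (t + 0 <= 3) print "ok"; else print "slow" }'
}

# time_transcription FILE [MODEL] [BASE_URL]
# Posts FILE to the transcription endpoint, discards the response body,
# and prints the total request time in seconds.
time_transcription() {
  curl -s -o /dev/null -w '%{time_total}\n' -X POST \
    "${3:-http://localhost:8000}/v1/audio/transcriptions" \
    -F "file=@$1" \
    -F "model=${2:-Systran/faster-distil-whisper-small.en}"
}

# Usage:
#   classify_latency "$(time_transcription test.wav)"
```

Timing the same file against a smaller model, or with and without GPU, helps narrow down which cause applies.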
Solutions
Use a smaller model
uvx speaches-cli model download Systran/faster-distil-whisper-small.en
Enable GPU
- Use compose.cuda.yaml for Speaches
- Verify GPU is detected:
nvidia-smi
Use local STT
- Avoid remote APIs for real-time transcription
- Keep latency minimal
Reduce chunk duration
- Smaller chunks = faster processing
- Trade-off: more API calls
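The trade-off can be put in numbers. Assuming one STT request per chunk per audio stream (an assumption about how audio is batched), the request rate is 60 divided by the chunk length in seconds. requests_per_minute is a hypothetical helper name.

```shell
#!/bin/sh
# requests_per_minute CHUNK_SECONDS
# Prints the approximate number of STT requests per minute for one
# audio stream, assuming one request per chunk.
requests_per_minute() {
  awk -v c="$1" 'BEGIN {
    if (c + 0 <= 0) {
      print "chunk length must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.0f\n", 60 / c
  }'
}

# Usage:
#   requests_per_minute 5   # prints 12
#   requests_per_minute 2   # prints 30
```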
Out of Order Words
Cause: Network Latency
When using remote STT, responses for consecutive audio chunks may arrive out of order.
Solutions:
- Use local Speaches
- Check network stability
- Reduce chunk duration
Cause: Debounce Timer
If you speak during the debounce period:
[0.0s] Interviewer starts speaking
[1.5s] "Tell me about..." (partial sentence)
[2.0s] Debounce triggers
[2.0s] "Tell me about your experience" (complete)
Solution: Increase LLM debounce seconds in Advanced Settings.
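The effect of a longer debounce can be shown with a small simulation. This is only a sketch of a generic trailing debounce, where the timer fires once no new transcript has arrived for the debounce window. That behavior is an assumption about Intervu's timer, and debounce_fire_times is a hypothetical helper name.

```shell
#!/bin/sh
# debounce_fire_times WINDOW ARRIVAL...
# WINDOW is the debounce length in seconds; ARRIVAL values are the times
# (ascending, in seconds) at which transcript fragments arrive.
# Prints each time at which the debounce timer would fire.
debounce_fire_times() {
  window=$1
  shift
  [ "$#" -gt 0 ] || return 0
  printf '%s\n' "$@" | awk -v w="$window" '
    # A gap longer than the window means the timer fired after the
    # previous fragment, before this one arrived.
    NR > 1 && $1 - prev > w { printf "%.1f\n", prev + w }
    { prev = $1 }
    # The timer always fires once after the last fragment.
    END { printf "%.1f\n", prev + w }'
}

# Usage:
#   debounce_fire_times 0.3 1.5 2.0   # prints 1.8 then 2.3
#   debounce_fire_times 1.0 1.5 2.0   # prints 3.0
```

With a 0.3 s window and fragments arriving at 1.5 s and 2.0 s, the timer fires at 1.8 s on the partial sentence and again at 2.3 s. With a 1.0 s window it fires once at 3.0 s, after the complete sentence has arrived.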
Speaker Attribution
Blurred Speakers
Intervu should show:
- [Interviewer] — System audio (blue)
- [You] — Microphone (white)
If Speakers Are Swapped
- Check device selection in Settings
- Ensure system audio device is "CABLE Output"
- Ensure microphone is your actual mic
If One Speaker Missing
- Check audio level meters
- Verify both devices are selected
- Test each device independently
Windows: Ensure "CABLE Output" is selected as system audio device.
macOS: Ensure "BlackHole 2ch" is selected as system audio device and a multi-output device is set as the system sound output. See macOS Setup.
Debugging Transcription
Enable Logging
- Open Settings (gear icon)
- Click Open Logs Folder
- Check app.log for transcription messages
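Rather than reading the log by eye, the STT lines can be summarized with a short script. The sketch below assumes the log lines contain the [stt] messages in the form shown under Log Examples on this page; stt_summary is a hypothetical helper name, and the path to app.log is whatever the logs folder contains on your system.

```shell
#!/bin/sh
# stt_summary LOGFILE
# Counts successful, failed, and hallucination-filtered transcriptions
# and prints the average transcription time when at least one is found.
stt_summary() {
  grep -F '[stt]' "$1" | awk '
    # "$NF + 0" converts a trailing value such as "1.2s" to the number 1.2.
    /Transcription took:/        { total += $NF + 0; n++ }
    /Transcription failed:/      { failed++ }
    /Filtered \(hallucination\)/ { filtered++ }
    END {
      printf "transcribed=%d failed=%d filtered=%d", n, failed, filtered
      if (n > 0) printf " avg=%.1fs", total / n
      printf "\n"
    }'
}

# Usage:
#   stt_summary /path/to/app.log
```

A high failed count points at the STT connection, while a high average time points at the causes listed under Slow Transcription.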
Log Examples
Successful transcription:
[stt] Received audio chunk: 48000 bytes
[stt] Transcription: "Tell me about your experience"
[stt] Transcription took: 1.2s
Failed transcription:
[stt] Transcription failed: Connection refused
[stt] Retrying...
Hallucination filtered:
[stt] Transcription: "thank you"
[stt] Filtered (hallucination): "thank you"
Still Not Working?
Test with Different Audio
- Use a recorded audio file
- Test directly with Speaches:
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F "file=@your-audio.wav" \
-F "model=Systran/faster-distil-whisper-small.en"
- If that works, the issue is with audio capture
- If that fails, the issue is with STT
Reset Audio Pipeline
- Close Intervu completely
- Clear any cached data
- Restart Speaches:
docker restart speaches
- Start Intervu
- Re-select audio devices
Next Steps
- STT Endpoint — STT configuration
- No Audio Devices — Audio capture issues