Transcription Issues

Problems with speech-to-text transcription quality or performance.

Symptoms

Empty or missing transcripts
Low accuracy transcriptions
Hallucinated words in transcript
Slow transcription
Words appearing out of order
Transcript not matching audio

No Transcription

Check STT Connection

Open Settings (gear icon)
Click Test STT
You should see "Success" or an error message

Common Causes

Cause	Solution
Speaches not running	`docker ps` to verify
Wrong endpoint URL	Check STT endpoint setting
Model not downloaded	`uvx speaches-cli model ls`
API key missing	Add key if required

Verify Speaches

```bash
# Check if running (should show the container)
docker ps | grep speaches

# Check models
curl http://localhost:8000/v1/models

# Test transcription
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@test.wav" \
  -F "model=Systran/faster-distil-whisper-small.en"
```
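
The model check can also be scripted. This is a minimal sketch, assuming Speaches returns the OpenAI-style response shape for `/v1/models` (a JSON object whose `data` list holds entries with an `id` field). The helper names `model_ids`, `fetch_model_ids`, and `is_model_available` are hypothetical and not part of Speaches or Intervu.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same endpoint as the curl commands


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the server and return the ids of the models it reports."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


def is_model_available(ids: list[str], wanted: str) -> bool:
    """True if the wanted model id appears in the reported list."""
    return wanted in ids
```

Calling `is_model_available(fetch_model_ids(), "Systran/faster-distil-whisper-small.en")` should return `True` once the model is downloaded. A connection error raised by `fetch_model_ids()` points to Speaches not running or a wrong endpoint URL.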

Low Accuracy

Improve Microphone Quality

Use a quality microphone/headset
Position mic close to mouth
Reduce background noise
Enable noise suppression if available

Use a Better Model

Smaller models are faster but less accurate:

Model	Speed	Accuracy	Use Case
`faster-distil-whisper-small.en`	Fast	Good	Real-time
`faster-whisper-small.en`	Medium	Better	Better accuracy
`whisper-medium`	Slow	Best	Maximum accuracy

Audio Quality Tips

Speak clearly — Avoid mumbling
Consistent distance — Keep mic at same distance
Quiet environment — Reduce background noise
Proper gain — Avoid clipping (red in level meter)

Hallucinated Words

Whisper sometimes "hears" words in silence.

Common Hallucinations

"you"
"thank you"
"thanks for watching"
"bye"
"okay"
"hmm"

Solution: Hallucination Filter

Intervu filters common hallucinations by default.

Open Settings (gear icon)
Go to Advanced Settings
Find Hallucination Phrases
Add any false positives you see

Format: Comma-separated list

you,thank you,thanks,bye,ok,hmm,custom phrase here

How It Works

When a transcription matches only hallucination phrases, it's discarded:

Transcript: "you"
Filtered: Yes (matches "you" in filter)

Transcript: "you are welcome"
Filtered: No (contains real words)
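
The filtering rule can be illustrated with a short sketch. It is not Intervu's actual implementation. It assumes the transcript is lowercased, split on punctuation, and discarded only when every resulting segment is in the phrase list. The function names are hypothetical.

```python
import re


def parse_phrases(setting: str) -> set[str]:
    """Parse the comma-separated Hallucination Phrases setting into a set."""
    return {p.strip().lower() for p in setting.split(",") if p.strip()}


def is_hallucination(transcript: str, phrases: set[str]) -> bool:
    """True if every punctuation-separated segment is a known hallucination phrase."""
    segments = [s.strip().lower() for s in re.split(r"[.,!?;]+", transcript)]
    segments = [s for s in segments if s]
    # An empty transcript has nothing to match, so it is not treated as a hallucination.
    return bool(segments) and all(s in phrases for s in segments)


phrases = parse_phrases("you,thank you,thanks,bye,ok,hmm,custom phrase here")
print(is_hallucination("you", phrases))              # True
print(is_hallucination("You are welcome", phrases))  # False
```

Matching whole segments rather than substrings is what keeps "you are welcome" from being dropped just because it contains "you".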

Slow Transcription

Check Processing Time

Each audio chunk should transcribe in 1-3 seconds.

Causes:

Large model
No GPU acceleration
Network latency (for remote STT)
High system load

Solutions

Use a smaller model

```bash
uvx speaches-cli model download Systran/faster-distil-whisper-small.en
```

Enable GPU
- Use `compose.cuda.yaml` for Speaches
- Verify GPU is detected: `nvidia-smi`
Use local STT
- Avoid remote APIs for real-time transcription
- Keep latency minimal
Reduce chunk duration
- Smaller chunks = faster processing
- Trade-off: more API calls
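
The chunk-duration trade-off is simple arithmetic: halving the chunk length doubles the number of requests. The sketch below uses illustrative chunk lengths, not Intervu defaults, and the function name is hypothetical.

```python
def requests_per_minute(chunk_seconds: float) -> float:
    """Number of transcription requests needed for one minute of audio."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return 60.0 / chunk_seconds


# Illustrative chunk lengths only.
for chunk in (1.0, 2.0, 5.0):
    print(f"{chunk:.0f}s chunks -> {requests_per_minute(chunk):.0f} requests per minute")
```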

Out of Order Words

Cause: Network Latency

When using remote STT, responses for consecutive audio chunks may arrive out of order.

Solutions:

Use local Speaches
Check network stability
Reduce chunk duration
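
A general way to keep text in order despite late responses is to tag each chunk with a sequence number and release a result only once every earlier chunk has arrived. This is a generic sketch, not a description of how Intervu handles ordering. The class and method names are hypothetical.

```python
class OrderedResults:
    """Buffers out-of-order results and releases them in sequence order."""

    def __init__(self) -> None:
        self._next_seq = 0                  # sequence number expected next
        self._pending: dict[int, str] = {}  # results that arrived early

    def add(self, seq: int, text: str) -> list[str]:
        """Store one result and return all results that are now ready, in order."""
        self._pending[seq] = text
        ready = []
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready


buffer = OrderedResults()
print(buffer.add(1, "about your experience"))  # [] because chunk 0 has not arrived yet
print(buffer.add(0, "Tell me"))                # ['Tell me', 'about your experience']
```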

Cause: Debounce Timer

If speech continues during the debounce period, a partial sentence can be processed before the complete one arrives:

[0.0s] Interviewer starts speaking
[1.5s] "Tell me about..." (partial sentence)
[2.0s] Debounce triggers
[2.0s] "Tell me about your experience" (complete)

Solution: Increase LLM debounce seconds in Advanced Settings.
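
The effect of a longer debounce can be shown with a generic debouncer. This is only a sketch with hypothetical names, not Intervu's code. Timestamps are passed in explicitly so the behaviour is easy to trace.

```python
from typing import Optional


class Debouncer:
    """Collects transcript fragments and flushes them after a quiet period."""

    def __init__(self, debounce_seconds: float) -> None:
        self.debounce_seconds = debounce_seconds
        self._parts: list[str] = []
        self._last_update = 0.0

    def add(self, text: str, now: float) -> None:
        """Record a fragment; each new fragment restarts the quiet period."""
        self._parts.append(text)
        self._last_update = now

    def poll(self, now: float) -> Optional[str]:
        """Return the joined text once the quiet period has elapsed, else None."""
        if self._parts and now - self._last_update >= self.debounce_seconds:
            text, self._parts = " ".join(self._parts), []
            return text
        return None


short = Debouncer(debounce_seconds=0.5)
short.add("Tell me about...", now=1.5)
print(short.poll(now=2.0))   # 'Tell me about...' is flushed mid-sentence

longer = Debouncer(debounce_seconds=2.0)
longer.add("Tell me about", now=1.5)
print(longer.poll(now=2.0))  # None, still waiting
longer.add("your experience", now=2.5)
print(longer.poll(now=4.5))  # 'Tell me about your experience'
```

With the longer quiet period the second fragment arrives before the flush, so the sentence is handed on as one piece.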

Speaker Attribution

Blurred Speakers

Intervu should show:

[Interviewer] — System audio (blue)
[You] — Microphone (white)

If Speakers Are Swapped

Check device selection in Settings
Ensure system audio device is "CABLE Output"
Ensure microphone is your actual mic

If One Speaker Missing

Check audio level meters
Verify both devices are selected
Test each device independently

Windows: Ensure "CABLE Output" is selected as system audio device.

macOS: Ensure "BlackHole 2ch" is selected as system audio device and a multi-output device is set as the system sound output. See macOS Setup.

Debugging Transcription

Enable Logging

Open Settings (gear icon)
Click Open Logs Folder
Check app.log for transcription messages

Log Examples

Successful transcription:

[stt] Received audio chunk: 48000 bytes
[stt] Transcription: "Tell me about your experience"
[stt] Transcription took: 1.2s

Failed transcription:

[stt] Transcription failed: Connection refused
[stt] Retrying...

Hallucination filtered:

[stt] Transcription: "thank you"
[stt] Filtered (hallucination): "thank you"
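
Transcription times can be pulled out of the log to spot slow chunks. The sketch assumes log lines look exactly like the examples above. The 4.8s line is an invented sample, and the function names are hypothetical.

```python
import re

# Matches lines such as: [stt] Transcription took: 1.2s
TOOK_RE = re.compile(r"\[stt\] Transcription took: ([0-9.]+)s")


def transcription_times(lines) -> list[float]:
    """Extract per-chunk transcription times (seconds) from log lines."""
    return [float(m.group(1)) for line in lines if (m := TOOK_RE.search(line))]


def slow_chunks(times: list[float], limit: float = 3.0) -> list[float]:
    """Return times above the limit (3 s is the top of the expected 1-3 s range)."""
    return [t for t in times if t > limit]


sample = [
    '[stt] Received audio chunk: 48000 bytes',
    '[stt] Transcription: "Tell me about your experience"',
    '[stt] Transcription took: 1.2s',
    '[stt] Transcription took: 4.8s',  # invented line for illustration
]
times = transcription_times(sample)
print(times)               # [1.2, 4.8]
print(slow_chunks(times))  # [4.8]
```

To run this on a real log, pass an open file object for `app.log` from the logs folder to `transcription_times`.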

Still Not Working?

Test with Different Audio

Use a recorded audio file

Test directly with Speaches:

```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@your-audio.wav" \
  -F "model=Systran/faster-distil-whisper-small.en"
```

If that works, the issue is with audio capture
If that fails, the issue is with STT

Reset Audio Pipeline

Close Intervu completely
Clear any cached data
Restart Speaches: `docker restart speaches`
Start Intervu
Re-select audio devices

Next Steps

STT Endpoint — STT configuration
No Audio Devices — Audio capture issues
