Transcription Issues

Problems with speech-to-text transcription quality or performance.

Symptoms

  • Empty or missing transcripts
  • Low accuracy transcriptions
  • Hallucinated words in transcript
  • Slow transcription
  • Words appearing out of order
  • Transcript not matching audio

No Transcription

Check STT Connection

  1. Open Settings (gear icon)
  2. Click Test STT
  3. You should see "Success" or an error message

Common Causes

Cause | Solution
Speaches not running | docker ps to verify
Wrong endpoint URL | Check STT endpoint setting
Model not downloaded | uvx speaches-cli model ls
API key missing | Add key if required

Verify Speaches

bash
# Check if running
docker ps | grep speaches

# Should show the container

# Check models
curl http://localhost:8000/v1/models

# Test transcription
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@test.wav" \
  -F "model=Systran/faster-distil-whisper-small.en"
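
The model check above can also be scripted. The following is a minimal sketch, assuming the /v1/models response is an OpenAI-style list in which each entry carries an "id" field; the has_model helper is a hypothetical name, not part of Speaches.

```shell
# has_model MODEL_ID: reads a /v1/models JSON response on stdin and
# succeeds if MODEL_ID is listed. Assumes an OpenAI-style "id" field.
has_model() {
  # Strip whitespace so both `"id": "x"` and `"id":"x"` match.
  tr -d ' \n\t' | grep -qF "\"id\":\"$1\""
}

# Usage (requires a running Speaches instance):
#   curl -s http://localhost:8000/v1/models \
#     | has_model "Systran/faster-distil-whisper-small.en" \
#     && echo "model available" || echo "model missing"
```

If the helper reports the model as missing while Speaches is running, download the model before testing transcription again.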

Low Accuracy

Improve Microphone Quality

  • Use a quality microphone/headset
  • Position mic close to mouth
  • Reduce background noise
  • Enable noise suppression if available

Use a Better Model

Smaller models are faster but less accurate:

Model | Speed | Accuracy | Use Case
faster-distil-whisper-small.en | Fast | Good | Real-time
faster-whisper-small.en | Medium | Better | Better accuracy
whisper-medium | Slow | Best | Maximum accuracy

Audio Quality Tips

  1. Speak clearly — Avoid mumbling
  2. Consistent distance — Keep mic at same distance
  3. Quiet environment — Reduce background noise
  4. Proper gain — Avoid clipping (red in level meter)

Hallucinated Words

Whisper sometimes "hears" words in silence.

Common Hallucinations

  • "you"
  • "thank you"
  • "thanks for watching"
  • "bye"
  • "okay"
  • "hmm"

Solution: Hallucination Filter

Intervu filters common hallucinations by default.

  1. Open Settings (gear icon)
  2. Go to Advanced Settings
  3. Find Hallucination Phrases
  4. Add any false positives you see

Format: Comma-separated list

you,thank you,thanks,bye,ok,hmm,custom phrase here

How It Works

When a transcription matches only hallucination phrases, it's discarded:

Transcript: "you"
Filtered: Yes (matches "you" in filter)

Transcript: "you are welcome"
Filtered: No (contains real words)
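
The rule can be sketched in a few lines of shell. This is an illustration of the behaviour described above, not Intervu's actual implementation: it assumes the transcript is lowercased and stripped of punctuation, then compared as a whole against each phrase in the comma-separated list. The names normalize and is_hallucination are hypothetical.

```shell
# normalize: lowercases stdin, drops punctuation, and collapses and
# trims spaces on every line.
normalize() {
  tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' \
    | sed 's/  */ /g; s/^ //; s/ $//'
}

# is_hallucination TRANSCRIPT PHRASES: succeeds when the whole
# transcript equals one of the comma-separated PHRASES.
is_hallucination() {
  transcript=$(printf '%s\n' "$1" | normalize)
  # An empty transcript is not treated as a phrase match.
  [ -n "$transcript" ] || return 1
  # One phrase per line, then an exact whole-line comparison.
  printf '%s\n' "$2" | tr ',' '\n' | normalize \
    | grep -qxF -- "$transcript"
}
```

With the filter list "you,thank you,thanks,bye,ok,hmm", is_hallucination "Thank you." succeeds, while is_hallucination "you are welcome" fails because the transcript contains words outside the list.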

Slow Transcription

Check Processing Time

Each audio chunk should transcribe in 1-3 seconds.

Causes:

  • Large model
  • No GPU acceleration
  • Network latency (for remote STT)
  • High system load
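
To check the 1-3 second target outside of Intervu, you can time a single request with curl's built-in timer. This is a rough sketch: the helper names time_transcription and within_budget are hypothetical, the base URL assumes the local Speaches endpoint used elsewhere on this page, and the clip you send should be about as long as one audio chunk for the number to be comparable.

```shell
# time_transcription FILE MODEL [BASE_URL]: prints the seconds taken by
# one transcription request, measured by curl's %{time_total}.
time_transcription() {
  curl -s -o /dev/null -w '%{time_total}\n' \
    -X POST "${3:-http://localhost:8000}/v1/audio/transcriptions" \
    -F "file=@$1" -F "model=$2"
}

# within_budget SECONDS [MAX]: succeeds if SECONDS <= MAX (default 3).
within_budget() {
  awk -v t="$1" -v max="${2:-3}" 'BEGIN { exit !(t + 0 <= max + 0) }'
}

# Usage (requires a running Speaches instance and a test.wav file):
#   t=$(time_transcription test.wav Systran/faster-distil-whisper-small.en)
#   within_budget "$t" && echo "OK: ${t}s" || echo "Slow: ${t}s"
```

The measured time includes uploading the file, so a remote endpoint will show network latency on top of model processing time.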

Solutions

  1. Use a smaller model

    bash
    uvx speaches-cli model download Systran/faster-distil-whisper-small.en
  2. Enable GPU

    • Use compose.cuda.yaml for Speaches
    • Verify GPU is detected: nvidia-smi
  3. Use local STT

    • Avoid remote APIs for real-time transcription
    • Keep latency minimal
  4. Reduce chunk duration

    • Smaller chunks = faster processing
    • Trade-off: more API calls

Out of Order Words

Cause: Network Latency

When using remote STT, responses for consecutive audio chunks may come back out of order.

Solutions:

  • Use local Speaches
  • Check network stability
  • Reduce chunk duration

Cause: Debounce Timer

If you speak during the debounce period:

[0.0s] Interviewer starts speaking
[1.5s] "Tell me about..." (partial sentence)
[2.0s] Debounce triggers
[2.0s] "Tell me about your experience" (complete)

Solution: Increase LLM debounce seconds in Advanced Settings.


Speaker Attribution

Blurred Speakers

Intervu should show:

  • [Interviewer] — System audio (blue)
  • [You] — Microphone (white)

If Speakers Are Swapped

  1. Check device selection in Settings
  2. Ensure system audio device is "CABLE Output"
  3. Ensure microphone is your actual mic

If One Speaker Missing

  1. Check audio level meters
  2. Verify both devices are selected
  3. Test each device independently

Windows: Ensure "CABLE Output" is selected as system audio device.

macOS: Ensure "BlackHole 2ch" is selected as system audio device and a multi-output device is set as the system sound output. See macOS Setup.


Debugging Transcription

Enable Logging

  1. Open Settings (gear icon)
  2. Click Open Logs Folder
  3. Check app.log for transcription messages

Log Examples

Successful transcription:

[stt] Received audio chunk: 48000 bytes
[stt] Transcription: "Tell me about your experience"
[stt] Transcription took: 1.2s

Failed transcription:

[stt] Transcription failed: Connection refused
[stt] Retrying...

Hallucination filtered:

[stt] Transcription: "thank you"
[stt] Filtered (hallucination): "thank you"
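
For longer sessions, a quick summary of app.log can be easier to read than individual lines. The sketch below assumes the log lines look exactly like the examples above; stt_log_summary is a hypothetical helper name, and the path to app.log depends on where your logs folder is.

```shell
# stt_log_summary LOGFILE: counts failed and filtered transcriptions
# and averages the "Transcription took" durations.
stt_log_summary() {
  awk '
    /\[stt\] Transcription failed/       { failed++ }
    /\[stt\] Filtered \(hallucination\)/ { filtered++ }
    /\[stt\] Transcription took:/ {
      sub(/s$/, "", $NF)   # "1.2s" -> "1.2"
      total += $NF; n++
    }
    END {
      printf "failed=%d filtered=%d ", failed, filtered
      if (n) printf "avg_seconds=%.2f\n", total / n
      else   printf "avg_seconds=n/a\n"
    }
  ' "$1"
}

# Usage:
#   stt_log_summary app.log
```

A high failed count points to the connection checks under No Transcription, while an average well above 3 seconds points to the Slow Transcription section.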

Still Not Working?

Test with Different Audio

  1. Use a recorded audio file
  2. Test directly with Speaches:
    bash
    curl -X POST http://localhost:8000/v1/audio/transcriptions \
      -F "file=@your-audio.wav" \
      -F "model=Systran/faster-distil-whisper-small.en"
  3. If that works, the issue is with audio capture
  4. If that fails, the issue is with STT

Reset Audio Pipeline

  1. Close Intervu completely
  2. Clear any cached data
  3. Restart Speaches: docker restart speaches
  4. Start Intervu
  5. Re-select audio devices

Made with ❤️ by Aldrick Bonaobra