
LLM Connection Errors

Problems connecting to the LLM endpoint or generating answers.

Symptoms

  • "Failed to start generation" error
  • No answers being generated
  • Slow answer generation
  • Connection timeout errors
  • "LLM endpoint not configured" message

Connection Errors

LLM Not Running

Error:

Failed to start generation: Connection refused

For Ollama:

bash
# Check if running
curl http://localhost:11434/api/tags

# Start if not running
ollama serve

# Pull model if missing
ollama pull llama3.2:3b

For Speaches:

bash
# Check if running
docker ps | grep speaches

# Start if not running
docker compose up -d

Wrong Endpoint

Common mistakes:

  • Wrong: http://localhost:11434 → Correct: http://localhost:11434/v1/chat/completions
  • Wrong: http://localhost:8000 → Correct: http://localhost:8000/v1/audio/transcriptions
  • Wrong: https://api.openai.com → Correct: https://api.openai.com/v1/chat/completions

Include /v1/chat/completions

The endpoint must end with /v1/chat/completions for OpenAI compatibility.
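Because a missing path suffix is such a common mistake, it can help to normalize the URL before saving it. A minimal sketch (the `normalize_endpoint` helper is hypothetical, not part of Intervu):

```shell
# Append /v1/chat/completions to a base URL if it is missing.
# normalize_endpoint is an illustrative helper, not an Intervu command.
normalize_endpoint() {
  url="${1%/}"                       # drop a trailing slash
  case "$url" in
    */v1/chat/completions) echo "$url" ;;
    *) echo "$url/v1/chat/completions" ;;
  esac
}

normalize_endpoint "http://localhost:11434"
# → http://localhost:11434/v1/chat/completions
```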

API Key Issues

For OpenAI API:

  1. Ensure API key is valid
  2. Check if key has sufficient credits
  3. Verify key has chat permissions

Error messages:

401 Unauthorized → Invalid API key
429 Too Many Requests → Rate limited
402 Payment Required → Insufficient credits
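You can read the status code directly with curl's `-w '%{http_code}'` write-out variable; the mapping above can then be scripted. A sketch (the `explain_status` helper is illustrative, not an Intervu command):

```shell
# Map an HTTP status code from the LLM endpoint to a likely cause.
# explain_status is an illustrative helper, not an Intervu command.
explain_status() {
  case "$1" in
    401) echo "Invalid API key" ;;
    402) echo "Insufficient credits" ;;
    429) echo "Rate limited" ;;
    *)   echo "Unexpected status: $1" ;;
  esac
}

# Fetch only the status code (no body) from the endpoint:
# code=$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   https://api.openai.com/v1/models)
# explain_status "$code"
explain_status 401   # → Invalid API key
```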

For OpenWebUI:

  1. Generate API key in Settings → Account
  2. Copy the key exactly (no extra spaces)
  3. Verify the endpoint URL

Testing LLM Connection

In Intervu

  1. Open Settings (gear icon)
  2. Configure LLM endpoint and model
  3. Click Test LLM
  4. Should see "Success: [response]"

Manual Test

bash
# Test Ollama
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'

# Test OpenAI API
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
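To see just the model's reply rather than the full JSON, pipe the response through a short Python one-liner (assumes `python3` is available; the field path follows the standard OpenAI chat-completions response shape, with a canned response standing in for real curl output):

```shell
# Pull choices[0].message.content out of a chat-completions response.
# A canned response stands in here for the curl output above.
response='{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}'
echo "$response" | python3 -c \
  'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
# → Hello!
```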

Slow Generation

Causes

  • Large model → Use a smaller model
  • No GPU → Enable GPU acceleration
  • Slow network → Use a local LLM
  • Long context → Shorten your resume
  • Max tokens too high → Reduce Max Tokens in Advanced Settings

Solutions

Use smaller model:

bash
# Instead of llama3.2:latest (larger)
# Use llama3.2:3b or phi3:mini (smaller, faster)
ollama pull phi3:mini

Enable GPU for Ollama:

bash
# Check GPU is detected
nvidia-smi

# Ollama auto-detects the GPU. If it is not being used,
# offload layers explicitly with the num_gpu option
# (per request, or via PARAMETER num_gpu in a Modelfile):
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "hi", "options": {"num_gpu": 35}}'

Reduce context:

  • Shorten your resume to key points
  • Remove outdated experience
  • Keep system prompt concise

Out of Memory

Error Messages

CUDA out of memory
OOM (Out Of Memory)

Solutions

Close other applications:

  • Free up RAM
  • Free up VRAM

Use smaller model:

bash
# 70B model → 3B model
ollama pull llama3.2:3b

# Or even smaller
ollama pull phi3:mini

Reduce GPU layers:

bash
# Offload fewer layers to the GPU via the num_gpu option
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "hi", "options": {"num_gpu": 20}}'

Use CPU-only mode:

bash
# Slower, but uses no VRAM: hide the GPU from Ollama
CUDA_VISIBLE_DEVICES="" ollama serve

Answer Quality Issues

Generic Answers

Causes:

  • Resume not detailed enough
  • System prompt too vague
  • Model too small

Solutions:

  1. Expand resume with specific achievements
  2. Customize system prompt
  3. Use larger model
  4. Rate answers to improve future generations

Irrelevant Answers

Causes:

  • Wrong context in transcript
  • Model hallucinating
  • Poor question detection

Solutions:

  1. Enable Advanced Mode for question extraction
  2. Use larger model
  3. Check transcript is accurate
  4. Rate down irrelevant answers

Incomplete Answers

Causes:

  • Max tokens too low
  • Connection interrupted
  • Model stopping early

Solutions:

  1. Increase Max Tokens in Advanced Settings
  2. Check network stability
  3. Use "regenerate" button
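A test request with an explicit token limit can confirm whether the cap is the problem (the value 512 is an example, not a recommended setting):

```shell
# Request with an explicit max_tokens cap (example value of 512).
payload='{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "Say hello"}],
  "max_tokens": 512
}'
# Sanity-check that the payload parses as JSON before sending:
echo "$payload" | python3 -m json.tool >/dev/null && echo "payload OK"
# Then send it:
# curl http://localhost:11434/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$payload"
```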

Streaming Issues

Answer Not Streaming

Expected: Answer appears word by word

If answer appears all at once:

  • Streaming may be disabled by model
  • Network buffering
  • Endpoint doesn't support streaming

For Ollama:

bash
# Verify streaming works
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Count to ten"}],
    "stream": true
  }'

Should show data streaming line by line.
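Each streamed line is a server-sent event of the form `data: {json}`, ending with `data: [DONE]`. A sketch of how the content deltas can be reassembled, using a canned two-chunk stream in place of real curl output:

```shell
# Reassemble the answer from OpenAI-style streaming lines.
# A canned two-chunk stream stands in for real curl output.
stream='data: {"choices":[{"delta":{"content":"Hel"}}]}
data: {"choices":[{"delta":{"content":"lo"}}]}
data: [DONE]'

echo "$stream" | python3 -c '
import json, sys
out = []
for line in sys.stdin:
    line = line.strip()
    if not line.startswith("data: "):
        continue                      # skip blank/keep-alive lines
    body = line[len("data: "):]
    if body == "[DONE]":
        break                         # end-of-stream sentinel
    delta = json.loads(body)["choices"][0]["delta"]
    out.append(delta.get("content", ""))
print("".join(out))
'
# → Hello
```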

Connection Drops Mid-Stream

Causes:

  • Network instability
  • Server timeout
  • Context too long

Solutions:

  • Use local LLM
  • Increase server timeout
  • Shorten resume

Advanced Mode Issues

Extractor LLM Fails

Error:

Extraction failed, falling back to simple mode

Causes:

  • Extractor endpoint incorrect
  • Extractor model not available
  • Insufficient context

Solutions:

  1. Verify extractor endpoint
  2. Check extractor model is pulled/downloaded
  3. Use same endpoint for extraction and answering

No Questions Extracted

Causes:

  • Model not following format
  • Interviewer hasn't asked a question
  • Extractor model too small

Solutions:

  1. Use larger model for extraction
  2. Wait for complete question
  3. Customize extractor system prompt

Testing Complete Flow

End-to-End Test

bash
# 1. Verify STT
curl http://localhost:8000/v1/models

# 2. Verify LLM
curl http://localhost:11434/api/tags

# 3. Test in Intervu Settings
# - Click "Test STT"
# - Click "Test LLM"
# - Both should succeed

# 4. Start capture
# - Select audio devices
# - Click microphone button
# - Speak into mic
# - Verify transcript appears

# 5. Verify answer generation
# - Speak as interviewer
# - Wait for debounce
# - Answer should stream in

Still Not Working?

Check Logs

  1. Open Settings → Open Logs Folder
  2. Check app.log for errors

Reset Everything

bash
# Stop services
docker stop speaches
pkill ollama          # or on Linux: systemctl stop ollama

# Restart services
docker start speaches
ollama serve

# Restart Intervu and re-configure endpoints

Windows:

powershell
# Delete all Intervu data
Remove-Item -Recurse -Force "$env:APPDATA\intervu"
# Restart Intervu and reconfigure

macOS:

bash
# Delete all Intervu data
rm -rf ~/Library/Application\ Support/intervu
# Restart Intervu and reconfigure

Contact Support

If issues persist:

  1. Export logs from Settings
  2. Note your configuration:
    • LLM endpoint
    • Model used
    • System specs
  3. Report the issue


Made with ❤️ by Aldrick Bonaobra