# LLM Connection Errors

Problems connecting to the LLM endpoint or generating answers.
## Symptoms

- "Failed to start generation" error
- No answers being generated
- Slow answer generation
- Connection timeout errors
- "LLM endpoint not configured" message
## Connection Errors

### LLM Not Running

**Error:**
```
Failed to start generation: Connection refused
```

For Ollama:
```bash
# Check if running
curl http://localhost:11434/api/tags

# Start if not running
ollama serve

# Pull model if missing
ollama pull llama3.2:3b
```

For Speaches:
```bash
# Check if running
docker ps | grep speaches

# Start if not running
docker compose up -d
```

### Wrong Endpoint
Common mistakes:

| Wrong | Correct |
|---|---|
| `http://localhost:11434` | `http://localhost:11434/v1/chat/completions` |
| `http://localhost:8000` | `http://localhost:8000/v1/audio/transcriptions` |
| `https://api.openai.com` | `https://api.openai.com/v1/chat/completions` |
**Include `/v1/chat/completions`:** the endpoint must end with `/v1/chat/completions` for OpenAI compatibility.
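As a quick sanity check, a small shell helper (hypothetical, not part of Intervu) can flag an endpoint that is missing the completions path:

```bash
# Hypothetical helper: flag an endpoint missing the completions path.
check_endpoint() {
  case "$1" in
    */v1/chat/completions) echo "ok: $1" ;;
    *) echo "fix: ${1%/}/v1/chat/completions" ;;
  esac
}

check_endpoint "http://localhost:11434"
# → fix: http://localhost:11434/v1/chat/completions
check_endpoint "http://localhost:11434/v1/chat/completions"
# → ok: http://localhost:11434/v1/chat/completions
```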
### API Key Issues

For OpenAI API:

- Ensure the API key is valid
- Check that the key has sufficient credits
- Verify the key has `chat` permissions
Error messages:

- `401 Unauthorized` → Invalid API key
- `429 Too Many Requests` → Rate limited
- `402 Payment Required` → Insufficient credits

For OpenWebUI:
- Generate API key in Settings → Account
- Copy the key exactly (no extra spaces)
- Verify the endpoint URL
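To see which of these errors applies, you can capture the HTTP status of a test request and map it to the causes above (a sketch; `explain_status` is a hypothetical helper, and the commented `curl` assumes your configured endpoint and key):

```bash
# Sketch: map an HTTP status code to the likely cause (codes from above).
explain_status() {
  case "$1" in
    200) echo "OK: endpoint and key are valid" ;;
    401) echo "Invalid API key" ;;
    402) echo "Insufficient credits" ;;
    429) echo "Rate limited" ;;
    *)   echo "Unexpected status: $1" ;;
  esac
}

# status=$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   https://api.openai.com/v1/chat/completions)
explain_status 401
# → Invalid API key
```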
## Testing LLM Connection

### In Intervu

- Open Settings (gear icon)
- Configure the LLM endpoint and model
- Click **Test LLM**
- You should see "Success: [response]"
### Manual Test

```bash
# Test Ollama
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'

# Test OpenAI API
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```

## Slow Generation
### Causes

| Cause | Solution |
|---|---|
| Large model | Use smaller model |
| No GPU | Enable GPU acceleration |
| Slow network | Use local LLM |
| Long context | Shorten resume |
| Max tokens too high | Reduce in Advanced Settings |
### Solutions

Use a smaller model:

```bash
# Instead of llama3.2:latest (larger),
# use llama3.2:3b or phi3:mini (smaller, faster)
ollama pull phi3:mini
```

Enable GPU for Ollama:
```bash
# Check that the GPU is detected
nvidia-smi

# Ollama auto-detects the GPU
# If not, set the environment variable
OLLAMA_GPU_LAYERS=35 ollama serve
```

Reduce context:
- Shorten your resume to key points
- Remove outdated experience
- Keep system prompt concise
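To gauge whether your resume is inflating the context, a rough word-based token estimate helps (a sketch: the ~4/3 tokens-per-word ratio is a common rule of thumb for English text, and the sample file path is illustrative):

```bash
# Rough token estimate: English text averages about 4/3 tokens per word.
estimate_tokens() {
  local words
  words=$(wc -w < "$1")
  echo $(( words * 4 / 3 ))
}

printf 'ten words of sample resume text for the token estimate\n' > /tmp/resume-sample.txt
estimate_tokens /tmp/resume-sample.txt
# → 13
```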
## Out of Memory

### Error Messages

```
CUDA out of memory
OOM (Out Of Memory)
```

### Solutions
Close other applications:
- Free up RAM
- Free up VRAM
Use a smaller model:

```bash
# 70B model → 3B model
ollama pull llama3.2:3b

# Or even smaller
ollama pull phi3:mini
```

Reduce GPU layers:
```bash
# Ollama config
OLLAMA_GPU_LAYERS=20 ollama serve
```

Use CPU-only mode:
```bash
# Slower, but uses less VRAM
OLLAMA_GPU_LAYERS=0 ollama serve
```

## Answer Quality Issues
### Generic Answers

Causes:
- Resume not detailed enough
- System prompt too vague
- Model too small
Solutions:
- Expand resume with specific achievements
- Customize system prompt
- Use larger model
- Rate answers to improve future generations
### Irrelevant Answers

Causes:
- Wrong context in transcript
- Model hallucinating
- Poor question detection
Solutions:
- Enable Advanced Mode for question extraction
- Use larger model
- Check transcript is accurate
- Rate down irrelevant answers
### Incomplete Answers

Causes:
- Max tokens too low
- Connection interrupted
- Model stopping early
Solutions:
- Increase Max Tokens in Advanced Settings
- Check network stability
- Use "regenerate" button
## Streaming Issues

### Answer Not Streaming

**Expected:** the answer appears word by word.
If answer appears all at once:
- Streaming may be disabled by model
- Network buffering
- Endpoint doesn't support streaming
For Ollama:

```bash
# Verify streaming works
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Count to ten"}],
    "stream": true
  }'
```

The output should show `data:` lines streaming one by one.
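If you want to reassemble the streamed text from those `data:` lines, a small pipeline works (a sketch; the sample input mimics OpenAI-style chunk JSON, and real responses carry more fields):

```bash
# Sketch: pull "content" fragments out of SSE "data:" lines and join them.
parse_stream() {
  sed -n 's/^data: //p' \
    | grep -v '^\[DONE\]' \
    | sed -n 's/.*"content":"\([^"]*\)".*/\1/p' \
    | tr -d '\n'
}

printf '%s\n' \
  'data: {"choices":[{"delta":{"content":"Hel"}}]}' \
  'data: {"choices":[{"delta":{"content":"lo"}}]}' \
  'data: [DONE]' | parse_stream
# → Hello
```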
### Connection Drops Mid-Stream

Causes:
- Network instability
- Server timeout
- Context too long
Solutions:
- Use local LLM
- Increase server timeout
- Shorten resume
## Advanced Mode Issues

### Extractor LLM Fails

**Error:**
```
Extraction failed, falling back to simple mode
```

Causes:
- Extractor endpoint incorrect
- Extractor model not available
- Insufficient context
Solutions:
- Verify extractor endpoint
- Check extractor model is pulled/downloaded
- Use same endpoint for extraction and answering
### No Questions Extracted

Causes:
- Model not following format
- Interviewer hasn't asked a question
- Extractor model too small
Solutions:
- Use larger model for extraction
- Wait for complete question
- Customize extractor system prompt
## Testing Complete Flow

### End-to-End Test

```bash
# 1. Verify STT
curl http://localhost:8000/v1/models

# 2. Verify LLM
curl http://localhost:11434/api/tags

# 3. Test in Intervu Settings
#    - Click "Test STT"
#    - Click "Test LLM"
#    - Both should succeed

# 4. Start capture
#    - Select audio devices
#    - Click the microphone button
#    - Speak into the mic
#    - Verify the transcript appears

# 5. Verify answer generation
#    - Speak as the interviewer
#    - Wait for the debounce
#    - The answer should stream in
```

## Still Not Working?
### Check Logs

- Open Settings → Open Logs Folder
- Check `app.log` for errors
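A quick way to surface problems in the log (a sketch; `recent_errors` is a hypothetical helper, and the sample file stands in for your real `app.log`):

```bash
# Sketch: show the last few error lines from a log file.
recent_errors() {
  grep -iE 'error|failed' "$1" | tail -n 5
}

printf 'INFO startup\nERROR connection refused\nINFO retry ok\n' > /tmp/app-sample.log
recent_errors /tmp/app-sample.log
# → ERROR connection refused
```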
### Reset Everything

```bash
# Stop services
docker stop speaches
ollama stop

# Restart services
docker start speaches
ollama serve

# Restart Intervu and re-configure endpoints
```

**Windows:**
```powershell
# Delete all Intervu data
Remove-Item -Recurse -Force "$env:APPDATA\intervu"

# Restart Intervu and reconfigure
```

**macOS:**
```bash
# Delete all Intervu data
rm -rf ~/Library/Application\ Support/intervu

# Restart Intervu and reconfigure
```

### Contact Support
If issues persist:

- Export logs from Settings
- Note your configuration:
    - LLM endpoint
    - Model used
    - System specs
- Report the issue
## Next Steps

- LLM Endpoint — LLM configuration
- Basic Mode — Usage guide