# LLM Setup

Intervu requires an OpenAI-compatible LLM endpoint to generate suggested answers. Choose from several options:

## Options Overview
| Option | Type | Cost | Privacy | Speed |
|---|---|---|---|---|
| Ollama | Local | Free | ✅ Best | Depends on hardware |
| OpenWebUI | Self-hosted | Free | ✅ Best | Depends on hardware |
| OpenAI API | Cloud | Paid | ⚠️ Data sent to OpenAI | Fast |
## Option 1: Ollama (Recommended)

Ollama runs LLMs locally on your machine — free and private.

### Installation

- Download from ollama.ai
- Install following the prompts for your OS
- Ollama runs automatically on port 11434

### Pull a Model
```bash
# Recommended: Fast and capable
ollama pull llama3.2:3b

# Alternative: Larger, more capable
ollama pull llama3.2:latest

# Alternative: Mistral
ollama pull mistral:7b
```

### Verify

```bash
# Test Ollama is running
curl http://localhost:11434/api/tags

# Test generation
ollama run llama3.2:3b "Say hello in one word."
```

### Configuration in Intervu
- Open Settings (gear icon)
- LLM Endpoint: `http://localhost:11434/v1/chat/completions`
- LLM Model: `llama3.2:3b` (or whichever model you pulled)
- LLM API Key: Leave empty (not required for Ollama)
- Click Test LLM to verify

### Model Recommendations

- `llama3.2:3b` — Best balance of speed and quality for real-time answers
- `phi3:mini` — Faster, lower quality
- `llama3.2:latest` — Better answers, slower on CPU
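To sanity-check the endpoint outside of Intervu, you can build the same style of request body by hand. This is a minimal sketch assuming the `llama3.2:3b` model pulled above; the prompt text is just an example:

```shell
MODEL="llama3.2:3b"  # substitute whichever model you pulled
# Minimal OpenAI-style chat request body, as sent to /v1/chat/completions
BODY="{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one word.\"}], \"stream\": true}"
echo "$BODY"
# With Ollama running, send it to the endpoint configured above:
# curl http://localhost:11434/v1/chat/completions -d "$BODY"
```

If the curl call streams back chunks instead of returning an error, Intervu's Test LLM button should pass with the same settings.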
## Option 2: OpenWebUI

If you have an existing OpenWebUI instance running, you can point Intervu at it.

### Configuration in Intervu

- Open Settings (gear icon)
- LLM Endpoint: `https://your-openwebui-instance/api/chat/completions`
- LLM Model: Your configured model name
- LLM API Key: Your OpenWebUI API key (if required)
- Click Test LLM to verify
### Finding Your API Key

In OpenWebUI:

- Go to Settings → Account
- Generate or copy your API key
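OpenAI-compatible endpoints, OpenWebUI included, expect the API key as a standard Bearer token. The key below is a placeholder for illustration, not a real key:

```shell
API_KEY="sk-example-0000"  # placeholder; substitute your real OpenWebUI key
# This is the header an OpenAI-compatible client sends when a key is set:
# curl https://your-openwebui-instance/api/chat/completions \
#   -H "Authorization: Bearer $API_KEY" ...
echo "Authorization: Bearer $API_KEY"
```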
## Option 3: OpenAI API

Use OpenAI's GPT models (requires API key and costs money).

### Get API Key

- Go to platform.openai.com
- Create an API key

### Configuration in Intervu

- Open Settings (gear icon)
- LLM Endpoint: `https://api.openai.com/v1/chat/completions`
- LLM Model: `gpt-4o-mini` (recommended) or `gpt-4o`
- LLM API Key: Your OpenAI API key
- Click Test LLM to verify
### Cost Consideration

- `gpt-4o-mini`: ~$0.15/1M input tokens, ~$0.60/1M output tokens
- `gpt-4o`: ~$2.50/1M input tokens, ~$10/1M output tokens
Real-time transcription generates ~1000-3000 tokens per session.
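As a rough worked example at the `gpt-4o-mini` rates above, assuming a 3000-token session split 2000 input / 1000 output (the split is illustrative, not measured):

```shell
# Per-session cost at gpt-4o-mini rates for an assumed 2000-in/1000-out session
COST=$(awk 'BEGIN {
  input  = 2000 / 1000000 * 0.15   # $0.15 per 1M input tokens
  output = 1000 / 1000000 * 0.60   # $0.60 per 1M output tokens
  printf "%.4f", input + output
}')
echo "\$$COST per session"
```

That works out to well under a cent per session at current `gpt-4o-mini` pricing.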
## Custom Endpoints

Intervu supports any OpenAI-compatible endpoint. If you have a custom setup:

- Open Settings (gear icon)
- Enter your custom endpoint URL (must end with `/v1/chat/completions`)
- Enter your model name
- Enter API key if required
- Test the connection
### Requirements

Your endpoint must support:

- POST requests to `/v1/chat/completions`
- Request body: `{ model, messages[], stream: true }`
- Response: Server-Sent Events (SSE) stream
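The streaming contract can be illustrated with a canned response in the shape OpenAI-compatible endpoints return: each `data:` line carries a JSON chunk with a `delta.content` fragment, terminated by `data: [DONE]`. The regex extraction below is for illustration only; a real client should use a JSON parser:

```shell
# A canned SSE stream in the shape an OpenAI-compatible endpoint returns
SSE='data: {"choices":[{"delta":{"content":"Hel"}}]}
data: {"choices":[{"delta":{"content":"lo"}}]}
data: [DONE]'

# Concatenate the streamed delta fragments (naive regex parse)
TEXT=$(echo "$SSE" | grep '^data: {' | sed 's/.*"content":"\([^"]*\)".*/\1/' | tr -d '\n')
echo "$TEXT"
```

An endpoint that emits chunks in this shape will work with any client that consumes OpenAI-style streams.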
## Advanced: LLM Settings
In the Advanced section of Settings:
| Setting | Description | Default |
|---|---|---|
| Max Tokens | Maximum response length | `0` (unlimited) |
| Thinking Mode | Enable extended reasoning (Claude models) | Off |
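For endpoints that honor it, a non-zero Max Tokens presumably maps onto the standard OpenAI-style `max_tokens` request field. A hedged sketch of what the resulting body would look like (field mapping assumed, not confirmed from Intervu's source; messages omitted for brevity):

```shell
MAX_TOKENS=256  # example cap; Intervu's default 0 means no cap
# Hypothetical request body with the cap applied
BODY="{\"model\": \"llama3.2:3b\", \"messages\": [], \"stream\": true, \"max_tokens\": $MAX_TOKENS}"
echo "$BODY"
```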
## Troubleshooting

### Connection Refused

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not, start it:
ollama serve
```

### Slow Responses
- Use a smaller model (`llama3.2:3b`, `phi3:mini`)
- Ensure GPU is being used (check `nvidia-smi`)
- Reduce max tokens in Advanced settings
### Out of Memory

- Close other applications
- Use a smaller model
- For Ollama, limit GPU usage when starting the server:

```bash
OLLAMA_NUM_GPU=1 OLLAMA_GPU_LAYERS=20 ollama serve
```