
LLM Setup

Intervu requires an OpenAI-compatible LLM endpoint to generate suggested answers. Choose from several options:

Options Overview

| Option     | Type        | Cost | Privacy                  | Speed               |
|------------|-------------|------|--------------------------|---------------------|
| Ollama     | Local       | Free | ✅ Best                  | Depends on hardware |
| OpenWebUI  | Self-hosted | Free | ✅ Best                  | Depends on hardware |
| OpenAI API | Cloud       | Paid | ⚠️ Data sent to OpenAI   | Fast                |

Option 1: Ollama

Ollama runs LLMs locally on your machine — free and private.

Installation

  1. Download from ollama.ai
  2. Install following the prompts for your OS
  3. Ollama runs automatically on port 11434

Pull a Model

bash
# Recommended: Fast and capable
ollama pull llama3.2:3b

# Alternative: Larger, more capable
ollama pull llama3.2:latest

# Alternative: Mistral
ollama pull mistral:7b

Verify

bash
# Test Ollama is running
curl http://localhost:11434/api/tags

# Test generation
ollama run llama3.2:3b "Say hello in one word."

Configuration in Intervu

  1. Open Settings (gear icon)
  2. LLM Endpoint: http://localhost:11434/v1/chat/completions
  3. LLM Model: llama3.2:3b (or whichever model you pulled)
  4. LLM API Key: Leave empty (not required for Ollama)
  5. Click Test LLM to verify
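Under the hood, the Test LLM check amounts to sending an OpenAI-style chat-completions request to the endpoint configured above. A minimal sketch of that request body, assuming the Ollama settings from this section (the exact field names Intervu uses internally are an assumption; `build_request` is a hypothetical helper, not part of Intervu):

```python
import json

# Endpoint and model match the Ollama configuration above.
ENDPOINT = "http://localhost:11434/v1/chat/completions"

def build_request(model, prompt):
    """Build the JSON body for a streaming chat-completions call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # the response comes back as an SSE stream
    }
    return json.dumps(payload).encode("utf-8")

body = build_request("llama3.2:3b", "Say hello in one word.")
# Send with: curl -X POST $ENDPOINT -H 'Content-Type: application/json' -d @-
```

No API key header is needed here — Ollama's local endpoint accepts unauthenticated requests.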

Model Recommendations

  • llama3.2:3b — Best balance of speed and quality for real-time answers
  • phi3:mini — Faster, lower quality
  • llama3.2:latest — Better answers, slower on CPU

Option 2: OpenWebUI

Use this option if you already have an OpenWebUI instance running.

Configuration in Intervu

  1. Open Settings (gear icon)
  2. LLM Endpoint: https://your-openwebui-instance/api/chat/completions
  3. LLM Model: Your configured model name
  4. LLM API Key: Your OpenWebUI API key (if required)
  5. Click Test LLM to verify
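When an API key is set, it is sent as a standard Bearer token. A small sketch of the headers a client would attach to the OpenWebUI endpoint above (`auth_headers` is a hypothetical helper for illustration):

```python
def auth_headers(api_key=None):
    """Headers for an OpenWebUI chat-completions call.

    OpenWebUI uses standard Bearer authentication; pass api_key=None
    for instances that allow unauthenticated access.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

auth_headers("sk-my-openwebui-key")  # adds the Authorization header
auth_headers()                       # Content-Type only
```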

Finding Your API Key

In OpenWebUI:

  1. Go to Settings → Account
  2. Generate or copy your API key

Option 3: OpenAI API

Use OpenAI's GPT models (requires an API key and incurs usage costs).

Get API Key

  1. Go to platform.openai.com
  2. Create an API key

Configuration in Intervu

  1. Open Settings (gear icon)
  2. LLM Endpoint: https://api.openai.com/v1/chat/completions
  3. LLM Model: gpt-4o-mini (recommended) or gpt-4o
  4. LLM API Key: Your OpenAI API key
  5. Click Test LLM to verify

Cost Consideration

  • gpt-4o-mini: ~$0.15/1M input tokens, ~$0.60/1M output tokens
  • gpt-4o: ~$2.50/1M input tokens, ~$10/1M output tokens

Real-time transcription generates ~1000-3000 tokens per session.
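Putting those two numbers together: a back-of-the-envelope cost estimate from the per-million-token prices above (the 2000/1000 input/output split is an illustrative assumption within the stated ~1000-3000 token range):

```python
# USD per 1M tokens: (input, output), from the prices listed above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def session_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one session in USD."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A 3000-token session (2000 in, 1000 out) on gpt-4o-mini:
cost = session_cost("gpt-4o-mini", 2000, 1000)
print(f"${cost:.4f}")  # $0.0009 — well under a cent per session
```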


Custom Endpoints

Intervu supports any OpenAI-compatible endpoint. If you have a custom setup:

  1. Open Settings (gear icon)
  2. Enter your custom endpoint URL (must end with /v1/chat/completions)
  3. Enter your model name
  4. Enter API key if required
  5. Test the connection
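The URL rule from step 2 can be sanity-checked before pasting the endpoint into Settings. A hypothetical helper sketching that check (not part of Intervu):

```python
def is_valid_endpoint(url):
    """Check the custom-endpoint rule: an HTTP(S) URL ending with
    the OpenAI-compatible chat-completions path."""
    return (url.startswith(("http://", "https://"))
            and url.rstrip("/").endswith("/v1/chat/completions"))

is_valid_endpoint("https://llm.example.com/v1/chat/completions")  # True
is_valid_endpoint("https://llm.example.com/api/generate")         # False
```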

Requirements

Your endpoint must support:

  • POST request to /v1/chat/completions
  • Request body: { model, messages[], stream: true }
  • Response: Server-Sent Events (SSE) stream
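In the OpenAI streaming format, each SSE event is a `data: {...}` line carrying a JSON chunk with the next text fragment in `choices[0].delta.content`, terminated by `data: [DONE]`. A minimal parser sketch, assuming that standard event shape (some servers add keep-alive or extra fields, which this skips):

```python
import json

def extract_deltas(sse_lines):
    """Reassemble assistant text from OpenAI-style SSE stream lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(extract_deltas(sample))  # Hello
```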

Advanced: LLM Settings

In the Advanced section of Settings:

| Setting       | Description                               | Default       |
|---------------|-------------------------------------------|---------------|
| Max Tokens    | Maximum response length                   | 0 (unlimited) |
| Thinking Mode | Enable extended reasoning (Claude models) | Off           |
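In the OpenAI request format, "unlimited" is expressed by omitting `max_tokens` from the request body entirely. A sketch of how the Max Tokens setting could map onto the payload (how Intervu does this internally is an assumption; `apply_max_tokens` is a hypothetical helper):

```python
def apply_max_tokens(payload, max_tokens):
    """Apply the Max Tokens setting to a chat-completions payload.

    0 means unlimited: the max_tokens field is left out of the request.
    """
    if max_tokens > 0:
        payload = dict(payload, max_tokens=max_tokens)
    return payload

base = {"model": "llama3.2:3b", "messages": [], "stream": True}
apply_max_tokens(base, 0)    # unchanged: no max_tokens key
apply_max_tokens(base, 256)  # adds "max_tokens": 256
```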

Troubleshooting

Connection Refused

bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not, start it:
ollama serve

Slow Responses

  • Use a smaller model (llama3.2:3b, phi3:mini)
  • Ensure GPU is being used (check nvidia-smi)
  • Reduce max tokens in Advanced settings

Out of Memory

  • Close other applications
  • Use a smaller model
  • For Ollama, reduce memory use by limiting GPU offloading:
    bash
    OLLAMA_NUM_GPU=1 OLLAMA_GPU_LAYERS=20 ollama serve


Made with ❤️ by Aldrick Bonaobra