
LLM Setup

Intervu requires an OpenAI-compatible LLM endpoint to generate suggested answers. Choose from several options:

Options Overview

| Option     | Type        | Cost | Privacy                  | Speed               |
|------------|-------------|------|--------------------------|---------------------|
| Ollama     | Local       | Free | ✅ Best                  | Depends on hardware |
| OpenWebUI  | Self-hosted | Free | ✅ Best                  | Depends on hardware |
| OpenAI API | Cloud       | Paid | ⚠️ Data sent to OpenAI   | Fast                |

Option 1: Ollama

Ollama runs LLMs locally on your machine — free and private.

Installation

  1. Download from ollama.ai
  2. Install following the prompts for your OS
  3. Ollama runs automatically on port 11434

Pull a Model

bash
# Recommended: Fast and capable
ollama pull llama3.2:3b

# Alternative: Larger, more capable
ollama pull llama3.2:latest

# Alternative: Mistral
ollama pull mistral:7b

Verify

bash
# Test Ollama is running
curl http://localhost:11434/api/tags

# Test generation
ollama run llama3.2:3b "Say hello in one word."

Configuration in Intervu

  1. Open Settings (gear icon)
  2. LLM Endpoint: http://localhost:11434/v1/chat/completions
  3. LLM Model: llama3.2:3b (or whichever model you pulled)
  4. LLM API Key: Leave empty (not required for Ollama)
  5. Click Test LLM to verify
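Under the hood, the Test LLM check amounts to sending an OpenAI-style chat-completions request to the endpoint configured above. A minimal sketch of that request body, assuming the Ollama settings from this section (the exact field names Intervu uses internally are an assumption; `build_request` is a hypothetical helper, not part of Intervu):

```python
import json

# Endpoint and model match the Ollama configuration above.
ENDPOINT = "http://localhost:11434/v1/chat/completions"

def build_request(model, prompt):
    """Build the JSON body for a streaming chat-completions call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # the response comes back as an SSE stream
    }
    return json.dumps(payload).encode("utf-8")

body = build_request("llama3.2:3b", "Say hello in one word.")
# Send with: curl -X POST $ENDPOINT -H 'Content-Type: application/json' -d @-
```

No API key header is needed here — Ollama's local endpoint accepts unauthenticated requests.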

Model Recommendations

  • llama3.2:3b — Best balance of speed and quality for real-time answers
  • phi3:mini — Faster, lower quality
  • llama3.2:latest — Better answers, slower on CPU

Option 2: OpenWebUI

Use this option if you already have an OpenWebUI instance running.

Configuration in Intervu

  1. Open Settings (gear icon)
  2. LLM Endpoint: https://your-openwebui-instance/api/chat/completions
  3. LLM Model: Your configured model name
  4. LLM API Key: Your OpenWebUI API key (if required)
  5. Click Test LLM to verify
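When an API key is set, it is sent as a standard Bearer token. A small sketch of the headers a client would attach to the OpenWebUI endpoint above (`auth_headers` is a hypothetical helper for illustration):

```python
def auth_headers(api_key=None):
    """Headers for an OpenWebUI chat-completions call.

    OpenWebUI uses standard Bearer authentication; pass api_key=None
    for instances that allow unauthenticated access.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

auth_headers("sk-my-openwebui-key")  # adds the Authorization header
auth_headers()                       # Content-Type only
```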

Finding Your API Key

In OpenWebUI:

  1. Go to Settings → Account
  2. Generate or copy your API key

Option 3: OpenAI API

Use OpenAI's GPT models (requires an API key and incurs usage costs).

Get API Key

  1. Go to platform.openai.com
  2. Create an API key

Configuration in Intervu

  1. Open Settings (gear icon)
  2. LLM Endpoint: https://api.openai.com/v1/chat/completions
  3. LLM Model: gpt-4o-mini (recommended) or gpt-4o
  4. LLM API Key: Your OpenAI API key
  5. Click Test LLM to verify

Cost Consideration

  • gpt-4o-mini: ~$0.15/1M input tokens, ~$0.60/1M output tokens
  • gpt-4o: ~$2.50/1M input tokens, ~$10/1M output tokens

Real-time transcription generates ~1000-3000 tokens per session.
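Putting those two numbers together: a back-of-the-envelope cost estimate from the per-million-token prices above (the 2000/1000 input/output split is an illustrative assumption within the stated ~1000-3000 token range):

```python
# USD per 1M tokens: (input, output), from the prices listed above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def session_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one session in USD."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A 3000-token session (2000 in, 1000 out) on gpt-4o-mini:
cost = session_cost("gpt-4o-mini", 2000, 1000)
print(f"${cost:.4f}")  # $0.0009 — well under a cent per session
```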


Custom Endpoints

Intervu supports any OpenAI-compatible endpoint. If you have a custom setup:

  1. Open Settings (gear icon)
  2. Enter your custom endpoint URL (must end with /v1/chat/completions)
  3. Enter your model name
  4. Enter API key if required
  5. Test the connection
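The URL rule from step 2 can be sanity-checked before pasting the endpoint into Settings. A hypothetical helper sketching that check (not part of Intervu):

```python
def is_valid_endpoint(url):
    """Check the custom-endpoint rule: an HTTP(S) URL ending with
    the OpenAI-compatible chat-completions path."""
    return (url.startswith(("http://", "https://"))
            and url.rstrip("/").endswith("/v1/chat/completions"))

is_valid_endpoint("https://llm.example.com/v1/chat/completions")  # True
is_valid_endpoint("https://llm.example.com/api/generate")         # False
```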

Requirements

Your endpoint must support:

  • POST request to /v1/chat/completions
  • Request body: { model, messages[], stream: true }
  • Response: Server-Sent Events (SSE) stream
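In the OpenAI streaming format, each SSE event is a `data: {...}` line carrying a JSON chunk with the next text fragment in `choices[0].delta.content`, terminated by `data: [DONE]`. A minimal parser sketch, assuming that standard event shape (some servers add keep-alive or extra fields, which this skips):

```python
import json

def extract_deltas(sse_lines):
    """Reassemble assistant text from OpenAI-style SSE stream lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(extract_deltas(sample))  # Hello
```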

Advanced: LLM Settings

In the Advanced section of Settings:

| Setting       | Description                               | Default       |
|---------------|-------------------------------------------|---------------|
| Max Tokens    | Maximum response length                   | 0 (unlimited) |
| Thinking Mode | Enable extended reasoning (Claude models) | Off           |
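In the OpenAI request format, "unlimited" is expressed by omitting `max_tokens` from the request body entirely. A sketch of how the Max Tokens setting could map onto the payload (how Intervu does this internally is an assumption; `apply_max_tokens` is a hypothetical helper):

```python
def apply_max_tokens(payload, max_tokens):
    """Apply the Max Tokens setting to a chat-completions payload.

    0 means unlimited: the max_tokens field is left out of the request.
    """
    if max_tokens > 0:
        payload = dict(payload, max_tokens=max_tokens)
    return payload

base = {"model": "llama3.2:3b", "messages": [], "stream": True}
apply_max_tokens(base, 0)    # unchanged: no max_tokens key
apply_max_tokens(base, 256)  # adds "max_tokens": 256
```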

Troubleshooting

Connection Refused

bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not, start it:
ollama serve

Slow Responses

  • Use a smaller model (llama3.2:3b, phi3:mini)
  • Ensure GPU is being used (check nvidia-smi)
  • Reduce max tokens in Advanced settings

Out of Memory

  • Close other applications
  • Use a smaller model
  • For Ollama, reduce memory use by limiting GPU offloading:
    bash
    OLLAMA_NUM_GPU=1 OLLAMA_GPU_LAYERS=20 ollama serve


Made with ❤️ by Aldrick Bonaobra