Chat Window

The dedicated chat window provides a freeform question interface separate from the auto-answer flow.

Overview

While Intervu automatically generates answers based on detected questions in the transcript, sometimes you want to ask follow-up questions or explore topics in more depth. The chat window provides a dedicated interface for this.

Key Features

Freeform questions — Ask anything without waiting for transcript detection
Step-by-step processing — See thinking, context fetching, and generation stages
Vector context support — Automatically retrieves relevant context from PGVector sources
Streaming responses — Watch answers generate in real-time
Persistent history — Chat history saved and restored across sessions

Opening the Chat Window

Click the chat icon (speech bubble) in the title bar to open the chat window. A separate window appears where you can type questions.

How It Works

Processing Stages

When you send a message, you'll see the following stages:

Stage	Description
Thinking	LLM is processing your message
Fetching context	Retrieving relevant chunks from PGVector (if enabled)
Generating	Streaming the answer
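
The stage sequence above can be pictured as a small event stream. The following is a minimal sketch, not Intervu's actual implementation: the names `Stage`, `run_chat_turn`, `generate`, and `fetch_context` are hypothetical, and the ordering simply follows the table.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


class Stage(Enum):
    THINKING = "Thinking"
    FETCHING_CONTEXT = "Fetching context"
    GENERATING = "Generating"


def run_chat_turn(
    message: str,
    generate: Callable[[str, List[str]], Iterable[str]],
    fetch_context: Optional[Callable[[str], List[str]]] = None,
) -> Iterator[Tuple[Stage, str]]:
    """Yield (stage, payload) events in the order a UI could display them."""
    yield Stage.THINKING, ""
    context: List[str] = []
    if fetch_context is not None:
        # This stage is skipped entirely when no vector source is enabled.
        yield Stage.FETCHING_CONTEXT, ""
        context = fetch_context(message)
    # Each streamed token becomes its own "Generating" event.
    for token in generate(message, context):
        yield Stage.GENERATING, token
```

A caller would iterate over the events, update a status label whenever the stage changes, and append each `Generating` payload to the visible answer.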

Context Attachment

If PGVector is enabled for any source (Resume, Q&A Bank, Company Context):

Your message is embedded
Semantic search retrieves top-K chunks from each enabled source
Retrieved context appears as "Loaded vector context" in the UI
The LLM receives this context along with your message
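
To make the retrieval steps concrete, here is a rough sketch that uses an in-memory cosine-similarity search as a stand-in for PGVector. It is illustrative only: `top_k_chunks`, `build_prompt`, and the prompt layout are assumptions, and the query embedding is taken as given rather than computed by a real embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (chunk text, chunk embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Chunk],
    k: int = 5,
    min_similarity: float = 0.0,
) -> List[str]:
    """Return up to k chunk texts, most similar first, at or above the threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [item for item in scored if item[0] >= min_similarity]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in kept[:k]]


def build_prompt(
    message: str,
    query_vec: Sequence[float],
    sources: Dict[str, List[Chunk]],
    k: int = 5,
) -> str:
    """Search every enabled source and prepend the hits to the user message."""
    sections = []
    for name, chunks in sources.items():
        hits = top_k_chunks(query_vec, chunks, k)
        if hits:
            sections.append(f"[{name}]\n" + "\n".join(hits))
    context = "\n\n".join(sections)
    return f"{context}\n\nUser: {message}" if context else f"User: {message}"
```

Searching each source separately, as sketched here, keeps one large source from crowding out hits from the others.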

Live Transcript

The chat window has access to the current session's transcript, allowing it to reference what was said during the interview.

Chat History

Messages are saved to userData/chat-history.json
Up to 50 messages are retained per session
Older messages show a "not in context" indicator
Clear chat to start fresh
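
The retention behavior described above could look roughly like the following. This is a sketch under assumptions: the on-disk schema (a JSON array of `role`/`content` objects) and the helper names `append_message` and `clear_history` are hypothetical, and only the 50-message limit comes from this page.

```python
import json
from pathlib import Path
from typing import Dict, List

MAX_MESSAGES = 50  # retention limit stated above


def append_message(history_path: Path, role: str, content: str) -> List[Dict[str, str]]:
    """Append one message, keep only the newest MAX_MESSAGES, and save to disk."""
    messages: List[Dict[str, str]] = []
    if history_path.exists():
        messages = json.loads(history_path.read_text(encoding="utf-8"))
    messages.append({"role": role, "content": content})
    messages = messages[-MAX_MESSAGES:]  # drop the oldest entries first
    history_path.write_text(json.dumps(messages, indent=2), encoding="utf-8")
    return messages


def clear_history(history_path: Path) -> None:
    """Reset the history file to an empty list."""
    history_path.write_text("[]", encoding="utf-8")
```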

Settings

Chat Max Tokens

Found in Settings → Advanced:

Value	Behavior
0	Unlimited — uses LLM server's default
> 0	Caps response length at the specified number of tokens

TIP

For longer, more detailed answers, leave chatMaxTokens at 0 (the default) and make sure your LLM server has an adequate context window (num_ctx in Ollama).
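
As an illustration of how a setting like this might translate into a request, the sketch below builds a payload for Ollama's chat API, where `num_predict` caps generated tokens and `num_ctx` sets the context window. How Intervu actually maps chatMaxTokens is not documented here, so the function `build_chat_request` and its mapping are assumptions.

```python
from typing import Any, Dict, List, Optional


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    chat_max_tokens: int = 0,
    num_ctx: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an Ollama-style chat payload; 0 means 'use the server default'."""
    options: Dict[str, int] = {}
    if chat_max_tokens > 0:
        # Only send a cap when one is configured (assumed mapping).
        options["num_predict"] = chat_max_tokens
    if num_ctx is not None:
        options["num_ctx"] = num_ctx
    payload: Dict[str, Any] = {"model": model, "messages": messages, "stream": True}
    if options:
        payload["options"] = options
    return payload
```

With `chat_max_tokens=0` no cap is sent at all, which matches the "Unlimited" row in the table: the server decides when to stop.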

Chat vs Auto-Answer

Feature	Auto-Answer	Chat Window
Trigger	Automatic on transcript	Manual message
Context	Live transcript + rated answers	Live transcript
Vector retrieval	Yes (if enabled)	Yes (if enabled)
Max tokens	`llmMaxTokens`	`chatMaxTokens`
System prompt	Concise answers	Full detailed answers

Best Practices

When to Use Chat

Clarification — Ask follow-up questions about generated answers
Deep dive — Explore technical topics in more detail
Preparation — Practice answering before the interview
Reference check — Query your indexed documents during the interview

Token Limits

If responses seem cut off:

Increase num_ctx in your Ollama model or OpenWebUI settings
Ensure chatMaxTokens is 0 (unlimited)
Consider using a model with a larger context window

Context Relevance

For best vector retrieval:

Index comprehensive but focused content
Use appropriate Top-K values (typically 5 to 10)
Set similarity thresholds based on your embedding model

Troubleshooting

Chat Not Responding

Verify that the LLM endpoint is running
Check Settings → LLM Endpoint configuration
Try "Test LLM" in settings
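
If you want to check the endpoint outside the app, a quick reachability probe can help. The sketch below assumes an Ollama server, whose `/api/tags` route lists installed models; the helper names `tags_url` and `is_reachable` are hypothetical, and the base URL shown in the usage note is Ollama's default, which may differ from your configuration.

```python
import http.client
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the model-listing URL for an Ollama server."""
    return base_url.rstrip("/") + "/api/tags"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeouts, HTTP errors, and malformed URLs
        # are all treated as "not reachable".
        return False
```

For example, `is_reachable(tags_url("http://localhost:11434"))` would probe a local Ollama instance on its default port.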

Context Not Loading

Verify the PGVector connection (Test Connection button)
Check that content is indexed (row count > 0)
Try lowering the Min Similarity threshold

Responses Cut Off

Increase your LLM server's context window
Set chatMaxTokens to 0
Use a model with a larger context capacity

Related Topics

PGVector Retrieval — Set up semantic search
Advanced Settings — Configure chat max tokens
Basic Mode — Understanding auto-answer flow
