Chat Window
The dedicated chat window provides a freeform question interface separate from the auto-answer flow.
Overview
While Intervu automatically generates answers based on detected questions in the transcript, sometimes you want to ask follow-up questions or explore topics in more depth. The chat window provides a dedicated interface for this.
Key Features
- Freeform questions — Ask anything without waiting for transcript detection
- Step-by-step processing — See thinking, context fetching, and generation stages
- Vector context support — Automatically retrieves relevant context from PGVector sources
- Streaming responses — Watch answers generate in real-time
- Persistent history — Chat history saved and restored across sessions
Opening the Chat Window
Click the chat icon (speech bubble) in the title bar to open the chat window. A separate window appears where you can type questions.
How It Works
Processing Stages
When you send a message, you'll see the following stages:
| Stage | Description |
|---|---|
| Thinking | LLM is processing your message |
| Fetching context | Retrieving relevant chunks from PGVector (if enabled) |
| Generating | Streaming the answer |
Context Attachment
If PGVector is enabled for any source (Resume, Q&A Bank, Company Context):
- Your message is embedded
- Semantic search retrieves top-K chunks from each enabled source
- Retrieved context appears as "Loaded vector context" in the UI
- The LLM receives this context along with your message
Live Transcript
The chat window has access to the current session's transcript, allowing it to reference what was said during the interview.
Chat History
- Messages are saved to
userData/chat-history.json - Up to 50 messages are retained per session
- Older messages show a "not in context" indicator
- Clear chat to start fresh
Settings
Chat Max Tokens
Found in Settings → Advanced:
| Value | Behavior |
|---|---|
| 0 | Unlimited — uses LLM server's default |
| > 0 | Caps response length at specified tokens |
TIP
For longer, detailed answers, leave chatMaxTokens at 0 (default) and ensure your LLM server has adequate context window (num_ctx in Ollama).
Chat vs Auto-Answer
| Feature | Auto-Answer | Chat Window |
|---|---|---|
| Trigger | Automatic on transcript | Manual message |
| Context | Live transcript + rated answers | Live transcript |
| Vector retrieval | Yes (if enabled) | Yes (if enabled) |
| Max tokens | llmMaxTokens | chatMaxTokens |
| System prompt | Concise answers | Full detailed answers |
Best Practices
When to Use Chat
- Clarification — Ask follow-up questions about generated answers
- Deep dive — Explore technical topics in more detail
- Preparation — Practice answering before the interview
- Reference check — Query your indexed documents during the interview
Token Limits
If responses seem cut off:
- Increase
num_ctxin your Ollama model or OpenWebUI settings - Ensure
chatMaxTokensis 0 (unlimited) - Consider using a model with larger context window
Context Relevance
For best vector retrieval:
- Index comprehensive but focused content
- Use appropriate Top-K values (5-10 typically)
- Set similarity thresholds based on your embedding model
Troubleshooting
Chat Not Responding
- Verify LLM endpoint is running
- Check Settings → LLM Endpoint configuration
- Try "Test LLM" in settings
Context Not Loading
- Verify PGVector connection (Test Connection button)
- Check that content is indexed (row count > 0)
- Try lowering Min Similarity threshold
Responses Cut Off
- Increase your LLM server's context window
- Set
chatMaxTokensto 0 - Use a model with larger context capacity
Related Topics
- PGVector Retrieval — Set up semantic search
- Advanced Settings — Configure chat max tokens
- Basic Mode — Understanding auto-answer flow