Skip to content

Chat Window

The dedicated chat window provides a freeform question interface separate from the auto-answer flow.

Overview

While Intervu automatically generates answers based on detected questions in the transcript, sometimes you want to ask follow-up questions or explore topics in more depth. The chat window provides a dedicated interface for this.

Key Features

  • Freeform questions — Ask anything without waiting for transcript detection
  • Step-by-step processing — See thinking, context fetching, and generation stages
  • Vector context support — Automatically retrieves relevant context from PGVector sources
  • Streaming responses — Watch answers generate in real-time
  • Persistent history — Chat history saved and restored across sessions

Opening the Chat Window

Click the chat icon (speech bubble) in the title bar to open the chat window. A separate window appears where you can type questions.


How It Works

Processing Stages

When you send a message, you'll see the following stages:

StageDescription
ThinkingLLM is processing your message
Fetching contextRetrieving relevant chunks from PGVector (if enabled)
GeneratingStreaming the answer

Context Attachment

If PGVector is enabled for any source (Resume, Q&A Bank, Company Context):

  1. Your message is embedded
  2. Semantic search retrieves top-K chunks from each enabled source
  3. Retrieved context appears as "Loaded vector context" in the UI
  4. The LLM receives this context along with your message

Live Transcript

The chat window has access to the current session's transcript, allowing it to reference what was said during the interview.


Chat History

  • Messages are saved to userData/chat-history.json
  • Up to 50 messages are retained per session
  • Older messages show a "not in context" indicator
  • Clear chat to start fresh

Settings

Chat Max Tokens

Found in Settings → Advanced:

ValueBehavior
0Unlimited — uses LLM server's default
> 0Caps response length at specified tokens

TIP

For longer, detailed answers, leave chatMaxTokens at 0 (default) and ensure your LLM server has adequate context window (num_ctx in Ollama).

Chat vs Auto-Answer

FeatureAuto-AnswerChat Window
TriggerAutomatic on transcriptManual message
ContextLive transcript + rated answersLive transcript
Vector retrievalYes (if enabled)Yes (if enabled)
Max tokensllmMaxTokenschatMaxTokens
System promptConcise answersFull detailed answers

Best Practices

When to Use Chat

  • Clarification — Ask follow-up questions about generated answers
  • Deep dive — Explore technical topics in more detail
  • Preparation — Practice answering before the interview
  • Reference check — Query your indexed documents during the interview

Token Limits

If responses seem cut off:

  1. Increase num_ctx in your Ollama model or OpenWebUI settings
  2. Ensure chatMaxTokens is 0 (unlimited)
  3. Consider using a model with larger context window

Context Relevance

For best vector retrieval:

  • Index comprehensive but focused content
  • Use appropriate Top-K values (5-10 typically)
  • Set similarity thresholds based on your embedding model

Troubleshooting

Chat Not Responding

  • Verify LLM endpoint is running
  • Check Settings → LLM Endpoint configuration
  • Try "Test LLM" in settings

Context Not Loading

  • Verify PGVector connection (Test Connection button)
  • Check that content is indexed (row count > 0)
  • Try lowering Min Similarity threshold

Responses Cut Off

  • Increase your LLM server's context window
  • Set chatMaxTokens to 0
  • Use a model with larger context capacity

Made with ❤️by Aldrick Bonaobra