
Advanced Mode

Use a second LLM to extract complete questions before generating answers.

Overview

Advanced Mode adds an intelligent question extraction step:

Mode     | Process
---------|--------
Basic    | Interviewer speaks → Generate answer immediately
Advanced | Interviewer speaks → Extract questions → Generate answers for complete questions only

Why Use Advanced Mode?

  • Reduces noise: Doesn't generate answers for casual conversation
  • Better context: Waits for complete questions
  • More relevant: Filters out statements that aren't questions

Enabling Advanced Mode

  1. Open Settings (gear icon)
  2. Scroll to Advanced Settings
  3. Toggle Advanced Mode on
  4. Configure the extractor LLM (can use same endpoint as main LLM)

Configuration

Setting            | Description                  | Default
-------------------|------------------------------|------------------
Advanced Mode      | Enable/disable               | Off
Extractor Endpoint | LLM for question extraction  | Same as main LLM
Extractor Model    | Model to use                 | Same as main LLM
Extractor API Key  | API key if required          | Empty
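The fallback behavior in the table (extractor settings defaulting to the main LLM) can be sketched as follows. This is illustrative only: the key names are made up for the example and are not the app's actual setting keys.

```python
def resolve_extractor_config(settings: dict) -> dict:
    """Return the effective extractor config, falling back to the main LLM.

    Hypothetical setting keys, used only to illustrate the defaults table.
    """
    return {
        # Blank/missing extractor fields fall back to the main LLM values.
        "endpoint": settings.get("extractor_endpoint") or settings["main_endpoint"],
        "model": settings.get("extractor_model") or settings["main_model"],
        # API key simply defaults to empty when the endpoint doesn't need one.
        "api_key": settings.get("extractor_api_key", ""),
    }

# With no extractor overrides, everything falls back to the main LLM:
cfg = resolve_extractor_config({
    "main_endpoint": "http://localhost:11434/v1/chat/completions",
    "main_model": "llama3.2:3b",
})
print(cfg["model"])  # llama3.2:3b
```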

How It Works

Basic Mode Flow

Interviewer speaks → Transcript captured → Wait for debounce (2s) → Generate answer

Advanced Mode Flow

Interviewer speaks → Transcript captured → Wait for debounce (2s) → Extractor LLM analyzes transcript → Identify complete questions → Generate answer for each question
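The Advanced Mode flow above can be sketched as a single pipeline step. This is a minimal illustration, not the app's actual code; the two callables stand in for the extractor LLM call and the main LLM call.

```python
import time

DEBOUNCE_SECONDS = 2  # matches the default debounce in the flow above

def advanced_mode_step(transcript, last_speech_time, extract_fn, answer_fn):
    """One pass of the Advanced Mode pipeline (illustrative sketch).

    extract_fn: returns a list of complete questions found in the transcript
    answer_fn:  generates an answer for a single question
    """
    # Debounce: give the interviewer time to finish the thought.
    if time.time() - last_speech_time < DEBOUNCE_SECONDS:
        return []  # still within the debounce window; do nothing yet

    questions = extract_fn(transcript)        # extractor LLM call
    return [answer_fn(q) for q in questions]  # main LLM call per question
```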

Question Extraction

The extractor LLM:

  1. Receives recent transcript entries
  2. Identifies interviewer's speech
  3. Detects complete questions
  4. Returns question text
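Assuming the extractor endpoint speaks the OpenAI-compatible chat completions format used elsewhere on this page, a request covering steps 1–4 might be assembled like this. The exact message layout the app uses is an assumption for illustration:

```python
def build_extraction_request(transcript_entries, system_prompt, model="llama3.2:3b"):
    """Assemble an OpenAI-compatible /v1/chat/completions payload that asks
    the extractor model to pull complete questions out of the transcript.

    The system/user message split here is an assumption, not the app's
    exact format.
    """
    transcript_text = "\n".join(transcript_entries)  # recent entries, in order
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": transcript_text},
        ],
    }
```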

Example extraction:

Transcript:
[Interviewer]: So I see you worked at TechCorp... tell me about your role there. What were your main responsibilities?

Extracted Question:
"What were your main responsibilities at TechCorp?"

Use Cases

When to Use Advanced Mode

Good for:

  • Verbose interviewers who ramble
  • Conversational interviews with back-and-forth
  • When you only want answers for actual questions
  • Reducing noise in the answer panel

Not needed for:

  • Direct, concise interviewers
  • Technical interviews with short questions
  • Fast-paced interviews

Queue Mode Integration

Advanced Mode works together with Queue Mode to handle multiple questions:

Queue Mode Off (Default)

  • Only the most recent question is answered
  • Previous questions are cancelled

Queue Mode On

  • All extracted questions are queued
  • Each question gets an answer in order
  • Cards show queue position
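The difference between the two toggles can be sketched as a simple selection function (illustrative only, not the app's code):

```python
def select_questions(extracted, queue_mode):
    """Pick which extracted questions get answered.

    Queue Mode off: only the newest question survives; earlier ones are
    cancelled. Queue Mode on: every question is answered, in order.
    """
    if not extracted:
        return []
    if queue_mode:
        return list(extracted)  # answer every question, in order
    return [extracted[-1]]      # keep only the most recent; drop the rest

print(select_questions(["Q1?", "Q2?", "Q3?"], queue_mode=False))  # ['Q3?']
```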

Extractor LLM Configuration

Using Same LLM

You can use the same endpoint for both extraction and answering:

Extractor Endpoint: http://localhost:11434/v1/chat/completions
Extractor Model: llama3.2:3b

This is the default and simplest setup.

Using Different LLM

For better extraction, you may use a different model:

Main LLM: llama3.2:3b (for answers)
Extractor LLM: phi3:mini (for question detection)

Why different models?

  • Question extraction needs speed, not creativity
  • Smaller models work well for detection
  • Frees up resources for main answer generation

Troubleshooting

No Questions Extracted

Causes:

  • Interviewer hasn't asked a complete question yet
  • Extractor model isn't detecting questions
  • Transcript is empty

Solutions:

  • Wait for the interviewer to finish speaking
  • Verify that the extractor endpoint is reachable
  • Test with a clear question: "What is your experience with React?"

Too Many Questions Extracted

Causes:

  • Every statement is being treated as a question
  • Model is too aggressive in detection

Solutions:

  • Add stricter rules to the extractor system prompt
  • Use a different model
  • Check whether the hallucination phrases need updating

Slow Performance

Causes:

  • Two LLM calls instead of one
  • Large extractor model

Solutions:

  • Use a smaller model for extraction (e.g., phi3:mini)
  • Ensure GPU acceleration is enabled
  • Check network latency

System Prompt for Extractor

The default extractor prompt:

You are a question extraction assistant for a live interview transcript.
The transcript is captured in real-time using short audio chunks, so
interviewer questions may be split across multiple consecutive entries.
Your job is to identify complete, unanswered interview questions.

You can customize this in Settings → Advanced → Extractor System Prompt.

Customization Examples

Strict Question Detection

Only extract questions that:
1. End with a question mark
2. Start with a question word
3. Are complete sentences
Ignore statements, commands, and incomplete thoughts.

Include Follow-ups

Identify both direct questions and implied follow-up questions.
For example, "That's interesting..." implies "Tell me more."

Performance Tips

Optimize Extractor Model

```bash
# Use a fast, small model
ollama pull phi3:mini

# Or use the same model if resources allow
# llama3.2:3b works fine for both
```

Reduce Redundant Calls

  • Increase debounce seconds to wait for complete thoughts
  • Use Queue mode to batch question extraction


Made with ❤️ by Aldrick Bonaobra