Proactive Question Generation

Updated 9 December 2025
  • Proactive Question Generation is a method that actively expands dialogue by adding targeted follow-up questions and additional information to enrich user interactions.
  • It employs multi-step reasoning techniques such as chain-of-thought prompting, planning-based strategies, and reinforcement learning to drive multi-turn engagement.
  • Evaluation relies on semantic similarity, user simulation, and classification metrics to measure effectiveness in uncovering latent user intent and clarifying ambiguous queries.

Proactive Question Generation (PQG) denotes the class of methodologies, models, and evaluation frameworks for constructing dialogue agents and information-seeking systems that do not merely react to user queries but actively engage users by introducing new related information, asking targeted follow-up questions, and strategically steering conversations toward more comprehensive or goal-oriented outcomes. Unlike reactive paradigms, which terminate dialogue upon answering the stated query, PQG systems extend interaction through deliberate information expansion and user engagement, supporting richer, multi-turn exploration, clarification, and personalized discovery.

1. Formal Definitions and Paradigms of Proactivity

The definitive recent formulation of proactivity in Information-Seeking Dialogue (ISD) is given in (Lee et al., 20 Oct 2024). Here, a proactive response $R$ to a user query $Q$ comprises:

  • An Answer: a direct reply addressing $Q$.
  • A Proactive Element: new information related to $Q$, either as a Follow-up Question (FQ) or Additional Information (AI). An FQ asks if the user wants a specific related fact; an AI offers that fact directly.

Formally:

  • $R$ is labeled proactive if both elements are present.
  • FQ: “Would you like to learn about her other roles in the MCU?”
  • AI: “Did you know she also appears in Guardians of the Galaxy Vol. 2?”

This architecture evaluates each response for conversational engagement through the explicit introduction of related knowledge, setting PQG apart from reactive systems that end the session after minimal answer delivery.
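
To make the definition concrete, a proactive response can be modeled as a direct answer plus an optional proactive element. The minimal Python sketch below (class names and the encoding of the labeling rule are illustrative, not taken from the cited papers) checks the "both elements present" condition:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProactiveType(Enum):
    FOLLOW_UP_QUESTION = "FQ"   # asks whether the user wants a related fact
    ADDITIONAL_INFO = "AI"      # offers the related fact directly


@dataclass
class ProactiveElement:
    kind: ProactiveType
    text: str   # e.g. "Would you like to learn about her other roles in the MCU?"


@dataclass
class Response:
    answer: str                                # direct reply addressing the query Q
    proactive: Optional[ProactiveElement] = None

    def is_proactive(self) -> bool:
        # R is labeled proactive only if both the answer and a
        # proactive element (FQ or AI) are present.
        return bool(self.answer) and self.proactive is not None
```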

In conclusion-driven conversational question generation (CCQG), as instantiated by PCQPR (Guo et al., 2 Oct 2024), proactiveness is defined by the agent's planning over multiple conversational turns to drive the dialogue toward a predefined outcome $T = (q_n, an_n)$, optimizing not just local coherence but global trajectory.

2. Model Architectures and Algorithmic Strategies

Chain-of-Thought and Task Decomposition

PQG is frequently realized through in-context learning with Chain-of-Thought (CoT) prompting (e.g., 3-step CoT, 3-in-1 CoT (Lee et al., 20 Oct 2024)):

  • $P_1$: conversational answer generation.
  • $P_2$: extraction of a specific related fact not present in $P_1$.
  • $P_3$: proactive element generation as either FQ or AI.

Instruction-tuning methods (e.g., QLoRA over Falcon-40B-Instruct) encode PQG objectives via composite prompt templates that enforce multi-step reasoning, yielding substantial improvements (up to +90% gains) over direct zero-shot approaches.
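
As an illustration of the three-step decomposition, the sketch below composes one prompt per step. The template wording and the `call_llm` helper are hypothetical stand-ins for whatever instruction-tuned model (e.g., Falcon-40B-Instruct) is used; they are not the prompts from the cited work.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an instruction-tuned LLM (e.g., Falcon-40B-Instruct)."""
    raise NotImplementedError


def three_step_cot(query: str, element: str = "FQ") -> dict:
    # P1: generate a conversational answer to the user query.
    answer = call_llm(f"Answer the user's question conversationally.\nQuestion: {query}")

    # P2: extract a specific related fact that is NOT already contained in the answer.
    fact = call_llm(
        "State one specific fact related to the question below that does not "
        f"appear in the given answer.\nQuestion: {query}\nAnswer: {answer}"
    )

    # P3: turn the fact into the proactive element, either a follow-up question (FQ)
    # or additional information (AI).
    if element == "FQ":
        proactive = call_llm(f"Rewrite this fact as a yes/no follow-up question: {fact}")
    else:
        proactive = call_llm(f"Rewrite this fact as an offer of additional information: {fact}")

    return {"answer": answer, "proactive": proactive}
```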

Planning-Based and RL Frameworks

PCQPR employs an MCTS-like planner combined with LLM rollouts and “comparable reflection” (Guo et al., 2 Oct 2024):

  • State: $\langle C, H, S_{\text{partial}} \rangle$; Action: next QA pair.
  • Trees simulate future turns, backpropagate rewards, and iterate question plans conditioned on feedback, with long-term optimization toward terminal success.
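
A highly simplified sketch of this planning loop follows. It assumes placeholder `propose` (LLM-proposed candidate QA pairs) and `rollout_reward` (scoring of a simulated continuation toward the target conclusion) functions, and illustrates only the select/expand/simulate/backpropagate cycle, not the actual PCQPR implementation with comparable reflection.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

QAPair = Tuple[str, str]  # one conversational turn: (question, answer)


@dataclass
class Node:
    state: Tuple[str, List[QAPair], str]  # <context C, history H, partial state S_partial>
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0


def ucb(node: Node, c: float = 1.4) -> float:
    # Upper-confidence bound used during selection; unvisited nodes are explored first.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def plan_next_qa(root: Node, propose, rollout_reward, n_sims: int = 32) -> Node:
    """Run n_sims MCTS-like iterations and return the most-visited child of the root,
    i.e. the next QA pair to ask."""
    for _ in range(n_sims):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion: attach LLM-proposed candidate actions (next QA pairs).
        context, history, partial = node.state
        for qa in propose(node.state):
            node.children.append(Node(state=(context, history + [qa], partial), parent=node))
        leaf = random.choice(node.children) if node.children else node
        # Simulation: roll out future turns and score the terminal state.
        reward = rollout_reward(leaf.state)
        # Backpropagation: update visit counts and values along the path to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits)
```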

Reinforcement learning, especially as formulated in proactive information gathering (Huang et al., 28 Jul 2025), rewards clarifying questions that reveal latent user constraints or requirements, optimizing LLMs (e.g., Qwen-2.5-7B) with explicit evidence-sentence rewards:

  • Reward $r_t(q_t) = 1$ if the question elicits hidden user intent; else $0$.
  • PPO is used for policy optimization, aligning question generation with user-specific latent knowledge.
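
The binary reward above can be sketched as a simple check against the user's hidden evidence sentences; the `evidence_reward` function and its string-matching default below are illustrative assumptions, not the reward model used in the cited work.

```python
from typing import Callable, List


def evidence_reward(
    user_reply: str,
    hidden_evidence: List[str],
    matches: Callable[[str, str], bool] = lambda reply, ev: ev.lower() in reply.lower(),
) -> float:
    """Binary reward r_t(q_t): 1.0 if the clarifying question elicited a reply that
    reveals one of the user's hidden evidence sentences (latent constraints or
    requirements), else 0.0. The `matches` predicate is a stand-in for however
    evidence coverage is actually checked."""
    return 1.0 if any(matches(user_reply, ev) for ev in hidden_evidence) else 0.0
```

In training, this scalar would be attached to the sampled question's trajectory and used by PPO to update the policy LLM.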

Graph-Guided and Knowledge-Based Methods

Graph-structured conditioning (AMR graphs, action-flow graphs (Pham et al., 24 Jan 2024)) guarantees exhaustive semantic coverage for PQG in procedural and task-specific contexts:

  • Each graph node (concept/action/ingredient) is mapped to one or more QA pairs.
  • Resulting datasets demonstrate fine-grained coverage and support the training of compact QA models exceeding large LMs on BLEURT and coverage metrics.
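
To make the node-to-QA mapping concrete, the sketch below instantiates question templates per node type. The `GraphNode` schema, the templates, and the `answer_lookup` helper are invented for illustration and do not reproduce the cited pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class GraphNode:
    node_id: str
    kind: str             # "action", "ingredient", or "concept"
    label: str            # e.g. "chop", "onion"
    args: Dict[str, str]  # e.g. {"object": "onion", "manner": "finely"}


# Illustrative templates: each node kind maps to one or more question forms.
TEMPLATES = {
    "action": ["How do you {label} the {object}?", "Which {object} do you {label} in this step?"],
    "ingredient": ["How much {label} is needed?"],
    "concept": ["What is {label} used for here?"],
}


def node_to_qa(node: GraphNode,
               answer_lookup: Callable[[GraphNode, str], str]) -> List[Tuple[str, str]]:
    """Map one graph node to one or more QA pairs so that every node of the
    AMR/flow graph is covered by at least one question."""
    slots = {"label": node.label, **node.args}
    qa_pairs = []
    for template in TEMPLATES.get(node.kind, []):
        try:
            question = template.format_map(slots)
        except KeyError:
            continue  # skip templates whose slots this node does not provide
        qa_pairs.append((question, answer_lookup(node, question)))
    return qa_pairs
```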

Knowledge-based frameworks (e.g., KBQG for conversational recommendation (Ren et al., 2021)) exploit knowledge graph (KG) user–item–relation embeddings, mining the most informative relations through attention mechanisms, and prompting users with personalized, slot-filling clarification questions.
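
A minimal sketch of this relation-mining step, assuming precomputed user and relation embeddings; the dot-product attention scoring and the question template are illustrative stand-ins for the KBQG model's actual components.

```python
import numpy as np


def most_informative_relation(user_emb: np.ndarray, relation_embs: dict) -> str:
    """Attend over candidate KG relations with the user embedding and return the
    relation with the highest attention weight, i.e. the one to clarify next."""
    names = list(relation_embs)
    scores = np.array([float(user_emb @ relation_embs[name]) for name in names])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights
    return names[int(np.argmax(weights))]


# Usage (hypothetical embeddings): ask a slot-filling clarification question.
# relation = most_informative_relation(user_vec, {"genre": genre_vec, "director": director_vec})
# question = f"Which {relation} do you prefer?"
```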

3. Automatic Evaluation Metrics for Proactiveness

PQG is evaluated via diverse metrics reflecting both the direct success of proactive engagement and its correlation with human judgment:

  • Semantic Similarity-based Metrics (Lee et al., 20 Oct 2024): combine BERTScore between $Q$ and the proactive element of $R$, a token-internal BERTScore $\bar{BS}(R)$, and a weighting $\alpha$ (a small combination sketch follows this list):
    • $S_{FQ} = \alpha\, BS(Q,R) + (1-\alpha)\,\bar{BS}(R)$
    • $S_{AI} = \alpha\, BS(Q,R) + (1-\alpha)\,(1-\bar{BS}(R))$
    • High point-biserial correlations with human annotation ($0.46$–$0.58$).
  • User Simulation-based Metrics (Lee et al., 20 Oct 2024): simulate user replies to $R$ via an LLM, compute sentiment via RoBERTa, and aggregate positive scores.
  • Classification-based Methods (Lee et al., 20 Oct 2024): fine-tuned DeBERTa-V3-Large classifiers label valid/invalid FQ/AI elements, yielding logit scores.
  • Task-Specific Proactivity Metrics (Deng et al., 2022): e.g., Clarification Need Prediction F1, ROUGE-2 for clarification questions, and “Proactivity Score” for joint success on ambiguous input detection and clarification.
  • Planning Outcomes (Guo et al., 2 Oct 2024): Success Rate (terminal QA matches $T$), semantic similarity scores (SimCSE), and coherence metrics (Conv-last1/2).
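
As referenced in the semantic-similarity item above, combining the scores is a weighted sum once $BS(Q,R)$ and $\bar{BS}(R)$ have been computed; the sketch below takes those BERTScore values as given inputs rather than reproducing the scoring pipeline, and the default $\alpha$ is an assumption.

```python
def proactivity_scores(bs_q_r: float, bs_internal_r: float, alpha: float = 0.5) -> dict:
    """Combine BERTScore between the query Q and the proactive element of R (bs_q_r)
    with the token-internal BERTScore of R (bs_internal_r), weighted by alpha:
      S_FQ = alpha * BS(Q,R) + (1 - alpha) * BS_bar(R)
      S_AI = alpha * BS(Q,R) + (1 - alpha) * (1 - BS_bar(R))"""
    s_fq = alpha * bs_q_r + (1 - alpha) * bs_internal_r
    s_ai = alpha * bs_q_r + (1 - alpha) * (1 - bs_internal_r)
    return {"S_FQ": s_fq, "S_AI": s_ai}
```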

Tables below summarize metric categories and key findings:

Metric                     Domain           Maximum Correlation
Semantic similarity (BS)   ISD, KG dialog   FQ: 0.46, AI: 0.58
User simulation            ISD              FQ: 0.26, AI: 0.33
Classification-based       ISD              FQ: 0.19, AI: 0.49

Metric             PCQPR (CoQA)   SG-CQG   CoT     ToT     GPT-4-Turbo
Success Rate (%)   35.00          15.40    19.20   23.20   12.80

4. Datasets and Domain Coverage

PQG research comprises multiple specialized corpus constructions:

  • Proactive Dialogue Dataset (Lee et al., 20 Oct 2024): 2,000 single-turn ISD conversations via NQQA, balanced between FQ and AI, each annotated through crowdsourcing.
  • PACIFIC (Deng et al., 2022): hybrid tabular/text domain in finance, with explicit ambiguity induction and annotation for need-clarify detection and clarification question generation.
  • AmbigNQ/PAQA (Erbacher et al., 26 Feb 2024): large-scale open-retrieval QA with ambiguous questions mapped to gold clarifying questions, supporting proactive handling of ambiguous search.
  • KGConv (Faille et al., 11 Apr 2024): knowledge-driven dialogs with fact selection steps for explainable PQG, enabling fact–question mappings and reference-less evaluation.
  • Procedural QA Graphs (Pham et al., 24 Jan 2024): exhaustive QA dataset for procedural text, via AMR and flow graph templating.

5. Empirical Results and Comparative Analysis

PQG architectures yield marked improvements over baselines on multiple fronts:

  • ISD (Falcon-40B-Instruct, (Lee et al., 20 Oct 2024)): 3-step CoT and 3-in-1 CoT zero-shot prompt designs increase FQ classification accuracy from 0.73 to 0.88.
    • Few-shot CoT prompts produce gains of up to +90% over direct zero-shot prompting.
    • Supervised fine-tuning matches or exceeds 3-shot prompting (FQ classification: 0.94).
  • CCQG (GPT-4-Turbo, (Guo et al., 2 Oct 2024)): PCQPR increases Success Rate to 35%, +11.8 pp over strong planning baselines.
  • Procedural QA (Pham et al., 24 Jan 2024): BLEURT scores on graph-generated data match or exceed GPT3/ChatGPT, demonstrating the importance of semantic coverage.
  • Finance QA (Deng et al., 2022): UniPCQA achieves >87% ROUGE-2 on clarifier generation, >91% F1 in ambiguity detection.
  • Open-Retrieval QA (Erbacher et al., 26 Feb 2024): Adding gold evidence passages boosts ambiguity detection accuracy from 0.527 (Q-only) to 0.873.
  • Mental Health Diagnostics (Roy et al., 2023): ProKnow-algo reduces unsafe matches by 89%, improves explainability scores by +0.5 vs. baseline, and achieves an 82% composite gain.

6. Practical Extensions and Future Directions

PQG is extensible across domains and architectures:

  • Finer-grained proactive elements: e.g., comparative or conditional follow-ups (Lee et al., 20 Oct 2024).
  • Reward modeling and RLHF: direct optimization of conversational proactivity.
  • Multi-turn annotation corpora: moving beyond single-turn protocols.
  • Factuality checks: mitigating hallucination risk while maintaining engagement.
  • Conversation-level metrics: capturing cumulative proactivity across sessions.

Algorithmic and data-centric future work is focused on scaling to longer conversations, integrating human reflection feedback, domain adaptation (medical/finance/legal), dynamic knowledge-graph augmentation, and learning-to-rank candidate follow-ups.

7. Significance and Open Challenges

PQG fundamentally transforms the capabilities of dialogue agents: from passive answerers to strategic collaborators capable of uncovering user intent, reducing ambiguity, improving knowledge coverage, and supporting discovery-focused human–AI interaction. Critical challenges persist in cost-effective depth planning, automatic ambiguity detection in open domains, robust metric development for multi-turn proactivity, and the scalable curation of annotated datasets for domain-specific needs.

Proactive methodologies, from CoT decomposition and reward-driven RL to graph-driven semantic coverage and knowledge-based slot-filling, collectively define the technical landscape of PQG, guiding ongoing research and practical deployment across information-seeking, recommendation, search, and creative domains.
