STaR-GATE: Teaching Language Models to Ask Clarifying Questions (2403.19154v3)

Published 28 Mar 2024 in cs.CL and cs.AI

Abstract: When prompting LLMs to complete a task, users often leave important aspects unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023), models often struggle to ask good questions. We explore an LLM's ability to self-improve (STaR; Zelikman et al., 2022) by rewarding the model for generating useful questions, a simple method we dub STaR-GATE. We generate a synthetic dataset of 25,500 unique persona-task prompts to simulate conversations between a pretrained LLM (the Questioner) and a Roleplayer whose preferences are unknown to the Questioner. By asking questions, the Questioner elicits preferences from the Roleplayer. The Questioner is iteratively finetuned on questions that increase the probability of high-quality responses to the task, which are generated by an Oracle with access to the Roleplayer's latent preferences. After two iterations of self-improvement, the Questioner asks better questions, allowing it to generate responses that are preferred over responses from the initial model on 72% of tasks. Our results indicate that teaching an LLM to ask better questions leads to better personalized responses.

References (40)
  1. Thinking fast and slow with deep learning and tree search. Advances in Neural Information Processing Systems, 30, 2017.
  2. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073, 2022.
  3. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877--1901, 2020.
  4. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 2017.
  5. KTO: Model alignment as prospect theoretic optimization. arXiv preprint arXiv:2402.01306, 2024.
  6. Probabilistic model-agnostic meta-learning. Advances in Neural Information Processing Systems, 31, 2018.
  7. Social Contract AI: Aligning AI assistants with implicit group norms. arXiv preprint arXiv:2310.17769, 2023.
  8. Strategic reasoning with language models. arXiv preprint arXiv:2305.19165, 2023.
  9. Bayesian preference elicitation with language models. arXiv preprint arXiv:2403.05534, 2024.
  10. Training chain-of-thought via latent-variable inference. Advances in Neural Information Processing Systems, 36, 2024.
  11. Zero-shot goal-directed dialogue via RL on imagined conversations. arXiv preprint arXiv:2311.05584, 2023.
  12. V-STaR: Training verifiers for self-taught reasoners. arXiv preprint arXiv:2402.06457, 2024.
  13. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023.
  14. Mixtral of Experts. arXiv preprint arXiv:2401.04088, 2024.
  15. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103:102274, 2023.
  16. Eliciting human preferences with language models. arXiv preprint arXiv:2310.11589, 2023.
  17. Decision-oriented dialogue for human-AI collaboration. arXiv preprint arXiv:2305.20076, 2023.
  18. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295, 2024.
  19. PRODIGy: A profile-based dialogue generation dataset, 2023.
  20. OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774, 2023.
  21. Active preference inference using language models and probabilistic reasoning. arXiv preprint arXiv:2312.12009, 2023.
  22. Certified deductive reasoning with language models. 2023.
  23. AutoAct: Automatic agent learning from scratch via self-planning. arXiv preprint arXiv:2401.05268, 2024.
  24. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
  25. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36, 2024.
  26. Explain yourself! Leveraging language models for commonsense reasoning. arXiv preprint arXiv:1906.02361, 2019.
  27. Grounding or guesswork? Large language models are presumptive grounders. arXiv preprint arXiv:2311.09144, 2023.
  28. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  29. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017.
  30. CommonsenseQA: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937, 2018.
  31. Task ambiguity in humans and language models. arXiv preprint arXiv:2212.10711, 2022.
  32. Large language models in medicine. Nature Medicine, 29(8):1930--1940, 2023.
  33. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
  34. Large language models are not fair evaluators. arXiv preprint arXiv:2305.17926, 2023.
  35. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229--256, 1992.
  36. WebShop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems, 35:20744--20757, 2022.
  37. STaR: Bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems, 35:15476--15488, 2022.
  38. Quiet-STaR: Language models can teach themselves to think before speaking. arXiv preprint arXiv:2403.09629, 2024.
  39. In-context principle learning from mistakes. arXiv preprint arXiv:2402.05403, 2024.
  40. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 2024.
Authors (4)
  1. Chinmaya Andukuri (1 paper)
  2. Jan-Philipp Fränken (12 papers)
  3. Tobias Gerstenberg (18 papers)
  4. Noah D. Goodman (83 papers)
Citations (16)

Summary

Teaching LLMs to Ask Clarifying Questions

Introduction

In conversational AI and natural language processing, the ability of large language models (LLMs) to interpret user intent accurately is paramount. Models often stumble when prompts are ambiguous or lack sufficient detail, leading to suboptimal responses. The paper "STaR-GATE: Teaching Language Models to Ask Clarifying Questions" introduces an iterative algorithm designed to enhance an LLM's questioning capability and thereby improve its responses to user prompts. By embedding an active elicitation loop into the model's training process, STaR-GATE, which combines STaR (Self-Taught Reasoner) with GATE (Generative Active Task Elicitation), achieves significant improvements in generating contextually relevant questions, which in turn yields more personalized and accurate responses.

Methodology

STaR-GATE Overview:

STaR-GATE combines the principles of active preference elicitation (GATE) with a self-improvement learning strategy (STaR). The paper constructs a synthetic dataset encompassing 25,500 unique persona-task prompts, simulating interactions between the model (termed the Questioner) and a Roleplayer. The Roleplayer represents users with undisclosed preferences, and the Questioner is tasked with eliciting these preferences through targeted questions. Success is measured by the Questioner's ability to generate responses that align with gold-standard answers produced by an Oracle, which has complete access to the Roleplayer's preferences.
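
To make the setup concrete, here is a minimal Python sketch of a single elicitation episode. The role models are passed in as simple `generate(prompt) -> str` callables, and the `Task:`/`Persona:`/`Questioner:` prompt formats are illustrative assumptions; the paper's actual prompt templates and models differ.

```python
from typing import Callable, List, Tuple

def run_episode(
    questioner: Callable[[str], str],  # sees only the dialogue, never the persona
    roleplayer: Callable[[str], str],  # answers in character from a hidden persona
    oracle: Callable[[str], str],      # sees the persona; writes the gold response
    task: str,
    persona: str,
    num_turns: int = 3,
) -> Tuple[List[str], str]:
    """Simulate one Questioner-Roleplayer conversation for a persona-task prompt."""
    dialogue = [f"Task: {task}"]
    for _ in range(num_turns):
        # The Questioner asks a clarifying question given the dialogue so far.
        question = questioner("\n".join(dialogue))
        dialogue.append(f"Questioner: {question}")
        # The Roleplayer replies, conditioned on its latent preferences.
        answer = roleplayer(f"Persona: {persona}\n" + "\n".join(dialogue))
        dialogue.append(f"Roleplayer: {answer}")
    # The Oracle has full access to the persona and produces the gold-standard answer.
    gold_response = oracle(f"Task: {task}\nPersona: {persona}")
    return dialogue, gold_response
```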

Key Techniques:

  1. Iterative Finetuning: The model undergoes repeated cycles of self-improvement, refining its questioning strategy in each cycle. The Questioner generates questions, elicits the Roleplayer's preferences, and is then finetuned on the questions that most increase the probability of the gold-standard responses (see the sketch after this list).
  2. Response Regularization: To keep the model from overfitting to question-asking at the expense of answering, the gold-standard responses are also included in the finetuning data, preserving balanced performance across both questioning and responding.
  3. Roleplayer and Oracle Models: Utilizing pre-trained LLMs as Roleplayers and Oracles introduces variability and depth to the training process, simulating real-world interactions and preferences more effectively.
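
As referenced in the list above, the following Python sketch shows how these pieces could fit into the outer self-improvement loop. It assumes questions are rewarded by how much they raise the likelihood of the Oracle's gold response; `log_prob` and `finetune` are hypothetical stand-ins for a real scoring and training pipeline, not the paper's implementation.

```python
def star_gate(questioner, roleplayer, oracle, log_prob, finetune,
              prompts, iterations=2, keep_frac=0.25):
    """Iteratively finetune the Questioner on its most useful questions."""
    for _ in range(iterations):
        scored = []
        for task, persona in prompts:
            dialogue, gold = run_episode(questioner, roleplayer, oracle, task, persona)
            # Reward: log-probability of the gold response given the elicited dialogue.
            score = log_prob(gold, context="\n".join(dialogue))
            scored.append((score, dialogue, gold))
        # Keep the dialogues whose questions best predicted the gold response.
        scored.sort(key=lambda item: item[0], reverse=True)
        best = scored[: max(1, int(keep_frac * len(scored)))]
        # Finetune on the winning questions; pairing each dialogue with its gold
        # response implements the response regularization described above.
        train_data = [(dialogue, gold) for _, dialogue, gold in best]
        questioner = finetune(questioner, train_data)  # returns an updated callable
    return questioner
```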

Results

After two iterations of STaR-GATE, the finetuned Questioner asks markedly better questions, producing responses that are preferred over those of the initial model on 72% of tasks. This outcome underscores the algorithm's efficacy in enhancing the model's elicitation skills. Moreover, both the win rates and the log probabilities of the gold-standard responses improve consistently across iterations, indicating a positive learning trajectory.
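
For concreteness, the headline win rate could be computed with a pairwise comparison like the Python sketch below. Here `elicit_and_respond` (run a conversation, then answer the task) and `judge_prefers` (a preference judge) are hypothetical helpers; the paper's concrete judge model and evaluation prompts may differ.

```python
def win_rate(finetuned, initial, roleplayer, eval_prompts,
             elicit_and_respond, judge_prefers):
    """Fraction of tasks where the finetuned model's response is preferred."""
    wins = 0
    for task, persona in eval_prompts:
        # Each model first elicits preferences, then responds to the task.
        resp_new = elicit_and_respond(finetuned, roleplayer, task, persona)
        resp_old = elicit_and_respond(initial, roleplayer, task, persona)
        wins += int(judge_prefers(resp_new, resp_old, task, persona))
    return wins / len(eval_prompts)  # the paper reports ~0.72 after two iterations
```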

Implications and Future Directions

The success of STaR-GATE in teaching LLMs to ask clarifying questions has both theoretical and practical implications:

  1. Enhanced Model Interactivity: The ability to ask pertinent questions can significantly improve user experience in conversational AI, making interactions more dynamic and context-aware.
  2. Personalized Responses: By effectively eliciting user preferences, models can offer more tailored responses, enhancing satisfaction and engagement across various applications, from virtual assistants to customer service bots.
  3. Future Work: While promising, the STaR-GATE approach points toward further research avenues, including exploring other modalities of elicitation, integrating with larger, more capable models, and adapting the methodology across diverse languages and cultural contexts.

Conclusion

The STaR-GATE algorithm represents a significant step forward in the development of conversational AI, empowering LLMs to interact more effectively with users by asking clarifying questions. Through iterative self-improvement and targeted response generation, the approach not only boosts the model's understanding of user intents but also fosters a more personalized and engaging conversational experience.