Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues (2404.11095v2)
Abstract: Aligning LLMs with human expectations requires high-quality instructional dialogues, which in turn require instructions that are diverse and in-depth. Existing methods collect such data automatically by having two LLMs interact: one simulating a user that poses instructions, and the other acting as a system agent that responds. However, without explicit guidance, these user simulators struggle to model the implicit rules governing how instructions vary across a dialogue, and thus tend to produce generic instructions. In this paper, we propose to explicitly capture these complex rules to help the user simulator pose diverse and in-depth instructions. Specifically, we first induce high-level instruction strategies from various real instructional dialogues, which serve as rules. Afterward, the applicable strategies are applied deductively to a newly given dialogue scenario to pose varied instructions. Experimental results show that our method generates diverse and in-depth instructions, and that chat models trained on the constructed multi-turn instructional dialogues outperform competitive baselines.
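The induce-then-apply loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `induce_strategies` and `pose_instruction` stand in for LLM prompting steps, and are replaced here by simple rule-based stubs (operating on a hypothetical `intent` field) so the control flow is runnable end to end.

```python
def induce_strategies(seed_dialogues):
    """Induction step: abstract each seed dialogue into a high-level
    instruction strategy (done by prompting an LLM in the paper).
    Stub: read an `intent` label and deduplicate, preserving order."""
    strategies = [dialogue["intent"] for dialogue in seed_dialogues]
    return list(dict.fromkeys(strategies))


def pose_instruction(strategy, scenario):
    """Deduction step: apply an induced strategy to a new dialogue
    scenario to produce a concrete instruction (again an LLM call in
    the paper; here a fixed template)."""
    return f"[{strategy}] Please {strategy.lower()} regarding: {scenario}"


def build_instructions(seed_dialogues, scenario):
    """Pose one candidate instruction per induced strategy for a new
    scenario, yielding a diverse set of user turns."""
    return [pose_instruction(s, scenario)
            for s in induce_strategies(seed_dialogues)]


if __name__ == "__main__":
    seeds = [
        {"intent": "Ask for clarification"},
        {"intent": "Request a counterexample"},
        {"intent": "Ask for clarification"},  # duplicate, collapsed by induction
    ]
    for turn in build_instructions(seeds, "gradient clipping in RNN training"):
        print(turn)
```

Separating induction (building a reusable strategy pool) from deduction (instantiating strategies in a new scenario) is what lets one small set of real dialogues seed many varied instructions for unseen scenarios.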