Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving (2311.00694v2)
Abstract: LLMs have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching over detailed, low-level reasoning chains. However, these methods remain limited in their exploration capabilities, making it hard for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem-solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy comprises a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, and a follower that executes detailed problem-solving processes following each high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups and reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves final-answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.
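The leader-follower decomposition and tournament-style selection described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released implementation: `query_llm` is a hypothetical placeholder for any chat-completion API, and the prompts, pairing scheme, and final-answer extraction are simplified guesses at the overall structure.

```python
import random

def query_llm(prompt: str, n: int = 1) -> list[str]:
    """Hypothetical placeholder for an LLM call (e.g., a chat-completion API).
    Returns n sampled completions for the given prompt."""
    raise NotImplementedError

def leader_propose_tactics(problem: str, num_tactics: int) -> list[str]:
    # The "visionary leader": sample several diverse high-level hints.
    prompt = (f"Problem: {problem}\n"
              "Propose one concise high-level strategy (a hint) "
              "without solving the problem.")
    return query_llm(prompt, n=num_tactics)

def follower_solve(problem: str, hint: str, num_chains: int) -> list[str]:
    # The "follower": sample detailed reasoning chains guided by one hint,
    # yielding one solution group per leader proposal.
    prompt = (f"Problem: {problem}\nHint: {hint}\n"
              "Solve the problem step by step, following the hint.")
    return query_llm(prompt, n=num_chains)

def tournament_select(problem: str, groups: list[list[str]]) -> str:
    # Tournament over solution groups: repeatedly ask the LLM to compare
    # two candidate groups and keep the winner (simplified random pairing).
    candidates = list(groups)
    while len(candidates) > 1:
        random.shuffle(candidates)
        next_round = []
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            verdict = query_llm(
                f"Problem: {problem}\n"
                f"Solutions A:\n{chr(10).join(a)}\n\n"
                f"Solutions B:\n{chr(10).join(b)}\n"
                "Which set of solutions is more likely correct? Answer A or B.")[0]
            next_round.append(a if verdict.strip().startswith("A") else b)
        if len(candidates) % 2 == 1:  # odd group out advances unopposed
            next_round.append(candidates[-1])
        candidates = next_round
    # In practice the winning group would be reduced to a final answer,
    # e.g. by majority vote over its chains; here we return the first chain.
    return candidates[0][0]

def hierarchical_solve(problem: str, num_tactics: int = 4,
                       num_chains: int = 4) -> str:
    hints = leader_propose_tactics(problem, num_tactics)
    groups = [follower_solve(problem, h, num_chains) for h in hints]
    return tournament_select(problem, groups)
```

Note that the final step is deliberately left thin: the paper's tournament operates over solution groups, and how each group's answer is extracted (e.g., majority vote within the group) is an assumption in this sketch.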
Authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su