Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning (2402.14963v2)

Published 22 Feb 2024 in cs.CL and cs.AI

Abstract: While LLMs have the capability to iteratively reflect on their own outputs, recent studies have observed their struggles with knowledge-rich problems without access to external resources. In addition to the inefficiency of LLMs in self-assessment, we also observe that LLMs struggle to revisit their predictions despite receiving explicit negative feedback. Therefore, we propose Mirror, a Multiple-perspective self-reflection method for knowledge-rich reasoning, to avoid getting stuck at a particular reflection iteration. Mirror enables LLMs to reflect on multiple-perspective clues through a heuristic interaction between a Navigator and a Reasoner. It guides agents toward diverse yet plausibly reliable reasoning trajectories without access to ground truth by encouraging (1) diversity in the directions generated by the Navigator and (2) agreement among strategically induced perturbations in the responses generated by the Reasoner. The experiments on five reasoning datasets demonstrate Mirror's superiority over several contemporary self-reflection approaches. Additionally, the ablation studies clearly indicate that our strategies alleviate the aforementioned challenges.
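
To make the Navigator-Reasoner interaction above concrete, here is a minimal, hypothetical Python sketch of one way such a loop could look. Everything in it is an assumption made for illustration: `call_llm` stands in for any LLM chat-completion client, the prompts are invented, and majority-vote agreement among repeated samples is used as a stand-in for the paper's "strategically induced perturbations". This is not the authors' implementation.

```python
# Minimal, hypothetical sketch of a Navigator-Reasoner loop (not the paper's code).
from collections import Counter


def call_llm(prompt: str) -> str:
    """Placeholder for any LLM chat-completion client (assumption)."""
    raise NotImplementedError


def navigator_directions(question: str, k: int = 3) -> list[str]:
    """Navigator: propose k diverse high-level directions (perspective clues)."""
    return [
        call_llm(
            f"Give reflection hint {i + 1} of {k} for the question below, "
            f"distinct from the others and without revealing any answer:\n{question}"
        )
        for i in range(k)
    ]


def reasoner_answers(question: str, direction: str, n: int = 4) -> list[str]:
    """Reasoner: answer n times under one direction; repeated sampling stands in
    for the paper's strategically induced perturbations."""
    prompt = f"Hint: {direction}\nQuestion: {question}\nAnswer concisely."
    return [call_llm(prompt) for _ in range(n)]


def mirror_answer(question: str) -> str:
    """Pick the direction whose perturbed answers agree most (no ground truth
    consulted), and return that direction's majority answer."""
    best_answer, best_agreement = "", -1.0
    for direction in navigator_directions(question):
        answers = reasoner_answers(question, direction)
        top_answer, top_count = Counter(answers).most_common(1)[0]
        agreement = top_count / len(answers)
        if agreement > best_agreement:
            best_answer, best_agreement = top_answer, agreement
    return best_answer
```

The design point mirrored here is that no ground-truth answer is ever consulted: each direction is scored purely by how consistently the Reasoner responds under it, matching the abstract's diversity-plus-agreement criterion.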

Authors (5)
  1. Hanqi Yan (18 papers)
  2. Qinglin Zhu (6 papers)
  3. Xinyu Wang (186 papers)
  4. Lin Gui (66 papers)
  5. Yulan He (113 papers)