
Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? (2404.12728v2)

Published 19 Apr 2024 in cs.CL

Abstract: Analogical reasoning is a unique human ability to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that, compared with irrelevant past experiences, recalling relevant ones helps humans better handle new tasks. Coincidentally, the NLP community has recently found that self-generating relevant examples in the context can help LLMs solve a given problem better than hand-crafted prompts. However, it is not yet clear whether relevance is the key factor eliciting this capability, i.e., can LLMs benefit more from self-generated relevant examples than from irrelevant ones? In this work, we systematically explore whether LLMs can truly perform analogical reasoning on a diverse set of reasoning tasks. With extensive experiments and analysis, we show that self-generated random examples can surprisingly achieve comparable or even better performance, e.g., a 4% performance boost on GSM8K with random biological examples. We find that the accuracy of self-generated examples is the key factor and subsequently design two improved methods with significantly reduced inference costs. Overall, we aim to advance a deeper understanding of LLM analogical reasoning and hope this work stimulates further research on the design of self-generated contexts.
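The two prompting conditions the abstract contrasts — self-generated *relevant* exemplars (analogical prompting) versus self-generated *irrelevant* exemplars from a fixed domain such as biology — can be sketched as prompt templates. The wording below is an illustrative paraphrase, not the paper's exact prompts, and the function names are hypothetical:

```python
# Sketch of the two prompting conditions compared in the paper.
# These templates are illustrative, not the authors' exact prompts.

def analogical_prompt(problem: str, n_examples: int = 3) -> str:
    """Analogical prompting: ask the model to first recall problems
    relevant to the target problem, solve them, then solve the target."""
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {n_examples} relevant problems and solve them. "
        "Then solve the original problem step by step."
    )

def random_example_prompt(problem: str, domain: str = "biology",
                          n_examples: int = 3) -> str:
    """Control condition: ask the model to first generate unrelated
    problems from a fixed domain (e.g., biology), then solve the target."""
    return (
        f"Problem: {problem}\n\n"
        f"First, generate {n_examples} {domain} problems and solve them. "
        "Then solve the original problem step by step."
    )

# Example: a GSM8K-style math word problem under both conditions.
p = ("Natalia sold clips to 48 of her friends in April, and then she sold "
     "half as many clips in May. How many clips did she sell altogether?")
print(analogical_prompt(p))
print(random_example_prompt(p))
```

The paper's finding is that the second template, despite generating exemplars irrelevant to the math problem, performs comparably or better, suggesting exemplar accuracy rather than relevance drives the benefit.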

Authors (8)
  1. Chengwei Qin (28 papers)
  2. Wenhan Xia (13 papers)
  3. Tan Wang (18 papers)
  4. Fangkai Jiao (19 papers)
  5. Yuchen Hu (60 papers)
  6. Bosheng Ding (16 papers)
  7. Ruirui Chen (12 papers)
  8. Shafiq Joty (187 papers)
Citations (2)