Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance (2403.09085v2)

Published 14 Mar 2024 in cs.CL and cs.AI

Abstract: LLMs have demonstrated impressive performance and strong explainability across various reasoning scenarios, marking a significant stride towards mimicking human-like intelligence. Despite this, when tasked with several simple questions supported by the same generic fact, LLMs often fail to abstract the fact and apply it to provide consistent and precise answers, revealing a deficiency in abstract reasoning abilities. This has sparked a vigorous debate about whether LLMs genuinely reason or merely memorize. In light of this, we design a preliminary study to quantify and delve into the abstract reasoning abilities of existing LLMs. Our findings reveal a substantial gap between their general reasoning and abstract reasoning performance. To alleviate this problem, we tailor an abstract reasoning dataset (AbsR) together with a meaningful learning paradigm that teaches LLMs to leverage generic facts for reasoning. The results show that our approach not only boosts the general reasoning performance of LLMs but also considerably improves their abstract reasoning, moving beyond simple memorization or imitation towards a more nuanced understanding and application of generic facts. The code is available at https://github.com/Waste-Wood/MeanLearn.
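
The abstract describes the core idea at a high level: pair each training question with the generic fact that supports it, so the model is taught to apply the fact rather than memorize individual answers. Below is a minimal sketch of how such fact-guided training pairs might be constructed; the field names, example, and prompt template are illustrative assumptions, not the authors' actual AbsR schema (see the linked repository for the real format).

```python
# Illustrative sketch of fact-guided training-pair construction.
# Assumption: each AbsR-style example bundles a generic fact with a
# question it supports; the exact schema is hypothetical here.

from dataclasses import dataclass


@dataclass
class AbsRExample:
    generic_fact: str  # e.g. "Metals conduct electricity."
    question: str      # a simple question supported by that fact
    answer: str        # an answer that explicitly applies the fact


def to_training_pair(ex: AbsRExample) -> dict:
    """Format one example as an instruction-tuning pair in which the
    generic fact is an explicit premise the model must apply."""
    prompt = (
        f"Generic fact: {ex.generic_fact}\n"
        f"Question: {ex.question}\n"
        "Answer by applying the generic fact:"
    )
    return {"prompt": prompt, "completion": " " + ex.answer}


example = AbsRExample(
    generic_fact="Metals conduct electricity.",
    question="Why can a copper wire carry current?",
    answer=(
        "Copper is a metal, and metals conduct electricity, "
        "so a copper wire can carry current."
    ),
)
print(to_training_pair(example))
```

Under this reading, several questions can share one generic fact, so fine-tuning on such pairs rewards consistent application of the fact across all of them rather than per-question memorization.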

Authors (8)
  1. Kai Xiong (33 papers)
  2. Xiao Ding (38 papers)
  3. Ting Liu (329 papers)
  4. Bing Qin (186 papers)
  5. Dongliang Xu (19 papers)
  6. Qing Yang (138 papers)
  7. Hongtao Liu (44 papers)
  8. Yixin Cao (138 papers)
Citations (2)