
Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization (2312.07763v1)

Published 12 Dec 2023 in cs.CL

Abstract: The meaning of complex phrases in natural language is composed of their individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of components. Previous studies trained smaller, task-specific models, which exhibited poor generalization. While LLMs exhibit impressive generalization abilities on many tasks through in-context learning (ICL), their potential for compositional generalization remains unexplored. In this paper, we first empirically investigate prevailing ICL methods in compositional generalization. We find that they struggle with complex compositional questions due to cumulative errors in long reasoning steps and intricate logic required for tool-making. Consequently, we propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools. Our method enhances the effectiveness of tool creation and usage with minimal human effort. Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.
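The abstract describes the core idea of HTM: decompose a compositional question into sub-questions, generate a small tool for each sub-question, and integrate the tools into a single answer procedure. The paper's actual prompts and tool representations are not reproduced here; the following is a minimal sketch of that decompose-and-integrate pattern for the title question "find the green circle", where the grid-world representation, tool names, and helper functions are all hypothetical illustrations rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code) of tools for sub-questions plus
# tool integration. The Grid format and all function names are assumptions.
from typing import Callable, List, Tuple

Grid = List[List[dict]]  # each cell: {"shape": str, "color": str}
Tool = Callable[[Grid], List[Tuple[int, int]]]

def make_color_filter(color: str) -> Tool:
    """Tool for the sub-question 'which cells are <color>?'."""
    def tool(grid: Grid) -> List[Tuple[int, int]]:
        return [(r, c) for r, row in enumerate(grid)
                for c, cell in enumerate(row) if cell["color"] == color]
    return tool

def make_shape_filter(shape: str) -> Tool:
    """Tool for the sub-question 'which cells contain a <shape>?'."""
    def tool(grid: Grid) -> List[Tuple[int, int]]:
        return [(r, c) for r, row in enumerate(grid)
                for c, cell in enumerate(row) if cell["shape"] == shape]
    return tool

def integrate(*tools: Tool) -> Tool:
    """Combine sub-question tools by intersecting their answers."""
    def combined(grid: Grid) -> List[Tuple[int, int]]:
        results = [set(t(grid)) for t in tools]
        return sorted(set.intersection(*results))
    return combined

# "Find the green circle" = intersection of two simpler tools.
grid = [
    [{"shape": "circle", "color": "green"}, {"shape": "square", "color": "red"}],
    [{"shape": "circle", "color": "red"},   {"shape": "square", "color": "green"}],
]
find_green_circle = integrate(make_color_filter("green"), make_shape_filter("circle"))
print(find_green_circle(grid))  # [(0, 0)]
```

Composing short, verifiable tools per sub-question sidesteps the long free-form reasoning chains that the abstract identifies as the main source of cumulative errors in plain in-context learning.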

Authors (6)
  1. Min Zhang (630 papers)
  2. Jianfeng He (32 papers)
  3. Shuo Lei (10 papers)
  4. Murong Yue (8 papers)
  5. Linhang Wang (1 paper)
  6. Chang-Tien Lu (54 papers)
Citations (4)