Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability (2407.15720v2)

Published 22 Jul 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have emerged as powerful tools for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Compositional ability, solving unseen complex tasks that combine two or more simple tasks, is an essential reasoning capability for Artificial General Intelligence. Despite the tremendous success of LLMs, how they approach composite tasks, especially those not encountered during the pretraining phase, remains an open and largely underexplored question. In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples. We develop a test suite of composite tasks, including linguistic and logical challenges, and perform empirical studies across different LLM families. We observe that models exhibit divergent behaviors: (1) for simpler composite tasks that apply distinct mapping mechanisms to different input segments, the models demonstrate decent compositional ability, and scaling up the model enhances this ability; (2) for more complex composite tasks involving multi-step reasoning, where each step represents one task, models typically underperform, and scaling up generally provides no improvement. We offer theoretical analysis in a simplified setting, showing that models exhibit compositional capability when the task handles different input parts separately. We believe our work sheds new light on the capabilities of LLMs in solving composite tasks with respect to the nature of the tasks and model scale. Our dataset and code are available at https://github.com/OliverXUZY/LLM_Compose.
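To make the setup in the abstract concrete, the sketch below shows how a composite-task ICL prompt can be assembled when only the two simple tasks appear as in-context demonstrations and the query requires applying both tasks to different segments of the input. The specific task pair (capitalizing a word, adding one to a number), the prompt template, and the `build_prompt` helper are illustrative assumptions, not the paper's exact task suite.

```python
# Illustrative sketch of the in-context setup described in the abstract:
# demonstrations cover only the two simple tasks, while the final query is a
# composite task that applies each simple task to a different input segment.
# The concrete tasks and template below are assumptions for illustration.

def capitalize_word(word: str) -> str:
    """Simple task A: map a word to its uppercase form."""
    return word.upper()

def plus_one(number: int) -> int:
    """Simple task B: map an integer to its successor."""
    return number + 1

def build_prompt(words, numbers, query):
    """Build an ICL prompt: simple-task demos only, composite query last."""
    lines = []
    for word in words:          # demonstrations of task A alone
        lines.append(f"Input: {word} -> Output: {capitalize_word(word)}")
    for number in numbers:      # demonstrations of task B alone
        lines.append(f"Input: {number} -> Output: {plus_one(number)}")
    # Composite query: task A on the word segment, task B on the number segment.
    lines.append(f"Input: {query[0]} {query[1]} -> Output:")
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_prompt(["apple", "river"], [3, 7], ("stone", 12))
    print(prompt)
    # The intended composite answer for the query line is "STONE 13".
```

Because each simple task acts on a disjoint part of the query, this example falls into the first regime described above, where the paper reports decent compositional ability that improves with model scale.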

Authors (3)
  1. Zhuoyan Xu (8 papers)
  2. Zhenmei Shi (60 papers)
  3. Yingyu Liang (107 papers)
Citations (17)