
A General Theory for Compositional Generalization (2405.11743v1)

Published 20 May 2024 in cs.LG

Abstract: Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, deep neural networks (DNNs) struggle with compositional generalization, prompting considerable research interest. However, existing theories often rely on task-specific assumptions, constraining a comprehensive understanding of CG. This study explores compositional generalization from a task-agnostic perspective, offering a complementary viewpoint to task-specific analyses. The primary challenge is to define CG without overly restricting its scope, which we achieve by identifying its fundamental characteristics and basing the definition on them. Using this definition, we seek to answer the question "what does the ultimate solution to CG look like?" through the following theoretical findings: 1) the first No Free Lunch theorem in CG, indicating the absence of general solutions; 2) a novel generalization bound applicable to any CG problem, specifying the conditions for an effective CG solution; and 3) the introduction of the generative effect to enhance understanding of CG problems and their solutions. This paper's significance lies in providing a general theory for CG problems, which, when combined with prior theorems under task-specific scenarios, can lead to a comprehensive understanding of CG.
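For context on the second finding, the display below shows the shape such results typically take. It is the classical Rademacher-complexity bound for losses bounded in [0, 1] (in the style of Bartlett and Mendelson, 2002), not the paper's new CG-specific bound, which additionally must account for the shift from seen to unseen concept combinations:

$$ R(h) \;\le\; \widehat{R}_S(h) + 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2n}}, $$

which holds with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $n$, simultaneously for all hypotheses $h \in \mathcal{H}$; here $R$ is the population risk, $\widehat{R}_S$ the empirical risk on $S$, and $\mathfrak{R}_n(\ell \circ \mathcal{H})$ the Rademacher complexity of the induced loss class. The first finding is analogous in spirit to the classical no-free-lunch results: averaged uniformly over all possible tasks, no learner dominates, so any effective CG solution must encode assumptions about how concepts compose.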


