Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory (2405.16674v2)

Published 26 May 2024 in cs.LG, cs.CC, and cs.LO

Abstract: Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. We prove that one-layer SSMs cannot efficiently perform function composition over large domains without impractically large state sizes, and even with Chain-of-Thought prompting, they require a number of steps that scale unfavorably with the complexity of the function composition. Multi-layer SSMs are constrained by log-space computational capacity, limiting their reasoning abilities. Our experiments corroborate these theoretical findings. Evaluating models on tasks including various function composition settings, multi-digit multiplication, dynamic programming, and Einstein's puzzle, we find significant performance degradation even with advanced prompting techniques. Models often resort to shortcuts, leading to compounding errors. These findings highlight fundamental barriers within current deep learning architectures rooted in their computational capacities. We underscore the need for innovative solutions to transcend these constraints and achieve reliable multi-step reasoning and compositional task-solving, which is critical for advancing toward general artificial intelligence.

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Introduction

This paper, titled "Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory," examines the inherent limitations of current deep learning models, particularly Structured State Space Models (SSMs) and Transformers, in handling tasks that require complex reasoning over sequences. Despite their success in various applications, these models struggle with function composition and compositional tasks due to architectural and training constraints. The contributions include a theoretical framework grounded in complexity theory to elucidate these limitations and extensive empirical analysis to demonstrate their practical impact.

Function Composition Challenges

Function composition is a critical task that underpins many advanced AI applications such as mathematical problem-solving, algorithmic learning, and logical reasoning. Despite their capabilities, models like GPT-4 and SSMs falter in these tasks.

The complexity of function composition can be understood through a simple problem formulation: given two functions $g: A \to B$ and $f: B \to C$, the model must compute the composition $f(g(x))$ for a queried $x \in A$. The paper establishes that both SSMs and Transformer models rely on shortcuts and fail to maintain accuracy over multiple reasoning steps, leading to performance degradation as task complexity increases.
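
To make the setup concrete, the following is a minimal sketch of how such a composition query can be posed as a lookup-table prompt. The prompt wording, domain sizes, and token names are illustrative assumptions, not the paper's exact benchmark format.

```python
# Minimal, hypothetical sketch of a function-composition query:
# sample two finite lookup tables g: A -> B and f: B -> C and ask
# for f(g(x)). Domain sizes and prompt wording are illustrative.
import random

def make_composition_instance(n=20, seed=0):
    rng = random.Random(seed)
    A = [f"a{i}" for i in range(n)]
    B = [f"b{i}" for i in range(n)]
    C = [f"c{i}" for i in range(n)]
    g = {a: rng.choice(B) for a in A}   # g: A -> B
    f = {b: rng.choice(C) for b in B}   # f: B -> C
    x = rng.choice(A)                   # the queried element
    prompt = (
        "g maps: " + ", ".join(f"{a}->{g[a]}" for a in A) + "\n"
        + "f maps: " + ", ".join(f"{b}->{f[b]}" for b in B) + "\n"
        + f"What is f(g({x}))? Answer with a single token."
    )
    return prompt, f[g[x]]              # prompt plus the ground-truth answer

prompt, answer = make_composition_instance()
print(prompt)
print("expected:", answer)
```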

Theoretical Framework

The theoretical framework built on complexity theory provides insight into why these models fail. The essence of the argument is that SSMs and Transformers belong to weak complexity classes, specifically logspace-uniform $\mathsf{TC}^0$, which intrinsically limits their ability to perform function composition. This assertion is substantiated by the following key findings:

  • Theorem 1: Shows that a one-layer SSM cannot perform function composition over a large domain unless its state size grows impractically, and that even with Chain-of-Thought (CoT) prompting the number of required steps grows polynomially with the size of the problem. Keeping the model state polynomially bounded is therefore insufficient for genuine multi-step reasoning.
  • Theorem 2: Demonstrates that solving iterated function composition requires an exponential number of CoT steps, further underscoring the impracticality of current models for deep compositional tasks; both composition problems are restated schematically after this list.
  • Theorem 3: Establishes that SSMs and Transformers cannot solve derivability, 2-SAT, Horn SAT, and circuit evaluation problems, provided $\mathbf{L} \neq \mathbf{NL}$, by virtue of their log-space limitations.
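
For concreteness, the two composition problems behind Theorems 1 and 2 can be restated schematically as follows; the problem names are illustrative, and the exact hypotheses, uniformity conditions, and bounds are in the paper.

\[
\textsf{Compose}: \text{ given } g: A \to B,\ f: B \to C,\ x \in A, \text{ output } f(g(x)).
\]
\[
\textsf{Iterate}^{(k)}: \text{ given } f: A \to A,\ x \in A, \text{ output } f^{(k)}(x) = \underbrace{f(f(\cdots f(x)))}_{k \text{ applications}}.
\]

On this reading, Theorem 1 concerns the number of CoT steps needed for a single composition as the domain grows, while Theorem 2 concerns the blow-up in CoT steps as the iteration depth $k$ grows.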

Empirical Analysis

Extensive empirical evaluations highlight the practical implications of these theoretical findings:

  • SSMs and Transformers perform poorly on function composition tasks, such as multi-digit multiplication and logical expression chaining, even when equipped with CoT prompting. For instance, SSM-Attention hybrid models like Jamba achieve only 17% accuracy on 4-by-3 digit multiplication; a hedged sketch of such an exact-match evaluation follows this list.
  • Compositional tasks tested include spatial, temporal, and relationship compositions. Across various datasets (Math-QA, BIG-Bench Hard, Temporal-NLI, SpaRTUN), no models succeeded in consistently solving these tasks, corroborating the theoretical claims.
  • Error analysis and information flow studies reveal significant performance drops as task complexity scales, with models often resorting to shortcuts rather than authentic multi-step reasoning.
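
The sketch below illustrates how an exact-match check for the 4-by-3 digit multiplication setting could be run. It is a minimal harness under stated assumptions: `query_model` is a hypothetical stand-in for whatever model is under test, not a real API, and the prompt wording is not the paper's.

```python
# Hedged sketch of an exact-match harness for 4-by-3 digit multiplication.
# `query_model` is a hypothetical placeholder for the model under test.
import random

def query_model(prompt: str) -> str:
    # Replace with a call to the model being evaluated.
    raise NotImplementedError

def multiplication_accuracy(n_trials: int = 100, seed: int = 0) -> float:
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        a = rng.randint(1000, 9999)  # 4-digit operand
        b = rng.randint(100, 999)    # 3-digit operand
        prompt = f"Compute {a} * {b}. Reply with only the final number."
        reply = query_model(prompt).strip().replace(",", "")
        correct += int(reply == str(a * b))  # exact string match against a * b
    return correct / n_trials
```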

Implications and Future Directions

The implications of this work are profound, both practically and theoretically:

  • Practical Implications: These findings call into question the efficacy of current deep learning models in tasks requiring high levels of compositional reasoning. This affects applications in mathematics, algorithmic learning, and logical reasoning where multi-step processes are essential.
  • Theoretical Implications: The demonstrated limitations rooted in complexity theory highlight the need for innovative architectural paradigms and training methodologies beyond the current deep learning frameworks to achieve reliable multi-step reasoning.

Future research should focus on developing models that can inherently handle deep compositional tasks without resorting to shortcuts, perhaps by exploring advanced complexity classes or hybrid approaches that integrate traditional AI reasoning methods with deep learning.

Conclusion

This paper provides a rigorous examination of the limits of deep learning models in sequence modeling through the lens of complexity theory. The combination of theoretical proofs and empirical evidence underscores the inherent challenges faced by models like SSMs and Transformers in performing function composition and solving compositional tasks. Addressing these limitations will be crucial for advancing the capabilities of AI in complex reasoning applications.

Authors (4)
  1. Nikola Zubić (9 papers)
  2. Aurelio Sulser (4 papers)
  3. Davide Scaramuzza (190 papers)
  4. Federico Soldá (2 papers)
Citations (3)