Representation Complexity as the Source of Intelligence

Determine whether the complexity of the solutions learned by GPT-2 models pretrained on complex elementary cellular automata (ECAs), evidenced by increased attention to historical states, causally enables intelligent behavior and the repurposing of learned reasoning to downstream tasks.

Background

Although ECAs are memoryless (each state depends only on the state immediately before it) and a trivial predictive solution therefore exists (learning the instantaneous 8-bit rule), attention analyses show that models trained on complex rules attend more to past states, indicating that they learn non-trivial solutions. This attention to past states correlates with rule complexity.
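To make the "trivial solution" concrete, here is a minimal sketch of an ECA update in Python. The function names `eca_step` and `eca_run` are illustrative, and the periodic (wrap-around) boundary is an assumption made for this example, not a detail taken from the paper. The sketch shows that the 8-bit rule number alone determines the next state from the current one, so no earlier history is needed to predict the next step.

```python
def eca_step(state, rule):
    """Apply one ECA update to a list of 0/1 cells.

    Assumes periodic boundaries (an illustrative choice, not from the paper).
    """
    n = len(state)
    # Bit k of the rule number is the output for the neighborhood whose
    # value is k = 4*left + 2*center + right (Wolfram numbering).
    table = [(rule >> k) & 1 for k in range(8)]
    return [
        table[4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n]]
        for i in range(n)
    ]


def eca_run(state, rule, steps):
    """Return the list of states [s_0, s_1, ..., s_steps]."""
    history = [list(state)]
    for _ in range(steps):
        # Only the most recent state is consulted: the process is memoryless.
        history.append(eca_step(history[-1], rule))
    return history


if __name__ == "__main__":
    for row in eca_run([0, 0, 0, 1, 0], rule=110, steps=3):
        print("".join(str(c) for c in row))
```

Because `eca_run` only ever reads `history[-1]`, a predictor that had learned the eight-entry lookup table would be exact without looking further back, which is why the observed attention to earlier states is notable.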

Based on this observation, the authors explicitly conjecture that the complexity of the learned solution itself is what makes the model intelligent and able to transfer reasoning to downstream tasks.

References

The fact that the complex models are attending to previous states indicate that they are learning a more complex solution to this simple problem, and we conjecture that this complexity is what makes the model "intelligent" and capable of repurposing learned reasoning to downstream tasks.

— Intelligence at the Edge of Chaos (2410.02536 - Zhang et al., 3 Oct 2024) in Section 5.2 (Models Learn Complex Solutions For Simple Rules)
