Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis (2505.11581v1)

Published 16 May 2025 in cs.CV, cs.LG, and cs.NE

Abstract: Much of the excitement in modern AI is driven by the observation that scaling up existing systems leads to better performance. But does better performance necessarily imply better internal representations? While the representational optimist assumes it must, this position paper challenges that view. We compare neural networks evolved through an open-ended search process to networks trained via conventional stochastic gradient descent (SGD) on the simple task of generating a single image. This minimal setup offers a unique advantage: each hidden neuron's full functional behavior can be easily visualized as an image, thus revealing how the network's output behavior is internally constructed neuron by neuron. The result is striking: while both networks produce the same output behavior, their internal representations differ dramatically. The SGD-trained networks exhibit a form of disorganization that we term fractured entangled representation (FER). Interestingly, the evolved networks largely lack FER, even approaching a unified factored representation (UFR). In large models, FER may be degrading core model capacities like generalization, creativity, and (continual) learning. Therefore, understanding and mitigating FER could be critical to the future of representation learning.

Summary

  • The paper argues that conventional SGD produces fractured, disorganized internal representations that impede generalization and continual learning.
  • It contrasts SGD with open-ended evolution using Picbreeder, revealing unified, factored representations with modular semantic structure.
  • The study warns that high external performance in deep learning may conceal flawed internal organization, affecting model creativity and adaptability.

This paper, "Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis" (2505.11581), challenges the implicit assumption in deep learning that scaling up models and achieving better performance automatically leads to better internal representations. The authors propose the concept of Fractured Entangled Representation (FER), arguing that conventional training methods like stochastic gradient descent (SGD) on fixed objectives tend to produce disorganized internal structures that can negatively impact core model capacities like generalization, creativity, and continual learning.

To illustrate this, the paper compares neural networks trained via conventional SGD with networks evolved through an open-ended search process using the Picbreeder system. They use Compositional Pattern Producing Networks (CPPNs), a type of network that generates images from spatial coordinates. The key advantage of CPPNs is that the activation of every hidden neuron across the entire input space (the image grid) can be visualized, providing a clear view of the network's internal representation.
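
The setup can be pictured with a minimal sketch (illustrative only; the layer widths, activation functions, and the radial input are assumptions, not the paper's exact architecture): a small network maps each pixel coordinate to an intensity, so every hidden neuron's activations over the coordinate grid form an image of their own.

```python
# Minimal CPPN-style sketch (illustrative; not the paper's implementation).
# A small MLP maps each pixel coordinate (x, y) to an intensity, so each
# hidden neuron's activation can itself be rendered as an image.
import numpy as np

def coordinate_grid(size=64):
    xs = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(xs, xs)
    r = np.sqrt(x**2 + y**2)               # radial input, common in CPPNs
    return np.stack([x.ravel(), y.ravel(), r.ravel()], axis=1)

def cppn_forward(coords, weights):
    """Return the output image and every hidden layer's activation maps."""
    h = coords
    hidden_maps = []
    for w, b in weights[:-1]:
        h = np.tanh(h @ w + b)              # activation choice is illustrative
        hidden_maps.append(h.copy())
    w, b = weights[-1]
    out = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # single-channel image in [0, 1]
    return out, hidden_maps

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]                         # layer widths are arbitrary here
weights = [(rng.normal(0, 1, (a, b)), np.zeros(b))
           for a, b in zip(sizes[:-1], sizes[1:])]

coords = coordinate_grid(64)
image, hidden = cppn_forward(coords, weights)
image = image.reshape(64, 64)                # the generated image
neuron_map = hidden[0][:, 0].reshape(64, 64) # functional picture of one hidden neuron
```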

The Picbreeder system, based on the NEAT evolutionary algorithm, allows humans to guide the evolution of CPPNs by selecting preferred images, leading to a non-objective, open-ended search. The authors show that CPPNs evolved through Picbreeder often exhibit what they term Unified Factored Representation (UFR). In UFR, key regularities (like symmetry) emerge early in the network layers, and semantic features (like the shape, eyes, or mouth of a skull; or the body, stem, and background of an apple) are represented in a modular, factored way, allowing for reuse of computational components. They demonstrate this UFR through visualizations of neuron activations and by showing that sweeping individual network weights results in meaningful, coherent changes to specific semantic aspects of the generated image while preserving overall structure (e.g., changing the size of a skull's mouth without distorting other features, or swinging an apple's stem independently).
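
A weight sweep of the kind described can be sketched roughly as follows (reusing the hypothetical CPPN code above; the chosen layer, indices, and sweep range are arbitrary): a single connection weight is varied over a range and the image is re-rendered at each value, so one can see whether the change stays confined to one semantic factor.

```python
# Sketch of a single-weight sweep (illustrative; builds on the CPPN sketch above).
# In a UFR network the swept frames vary one semantic factor coherently;
# in an FER network they tend to distort the image globally.
import copy

def sweep_weight(weights, layer, i, j, values, coords, size=64):
    frames = []
    for v in values:
        w_mod = copy.deepcopy(weights)
        w_mod[layer][0][i, j] = v              # overwrite one connection weight
        img, _ = cppn_forward(coords, w_mod)
        frames.append(img.reshape(size, size))
    return frames

frames = sweep_weight(weights, layer=1, i=0, j=0,
                      values=np.linspace(-2.0, 2.0, 9),
                      coords=coords)
```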

In contrast, when training a CPPN with conventional SGD to replicate the exact same output image as a Picbreeder-evolved network, the resulting internal representation is drastically different. The SGD-trained network achieves pixel-perfect accuracy but exhibits FER. Its internal layers show disorganized, patchwork-like activation patterns, with regularities like symmetry only appearing (if at all) in the final output layer. Weight sweeps in these FER networks lead to chaotic, meaningless distortions of the image, breaking symmetries and entangling unrelated features. The authors illustrate this with examples of a skull, butterfly, and apple, showing that despite identical external behavior, the internal structures learned by Picbreeder (UFR) and conventional SGD (FER) are fundamentally different.
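
The SGD side of the comparison amounts to fitting a coordinate-to-pixel network against a fixed target image; a rough PyTorch sketch under assumed sizes and hyperparameters is given below (the random target stands in for a Picbreeder-evolved image). The point it illustrates is that driving the pixel loss to near zero fixes only the output behavior, not the internal organization.

```python
# Sketch of SGD-training a CPPN to reproduce a fixed target image
# (illustrative setup; the paper's architecture and hyperparameters differ).
import torch
import torch.nn as nn

size = 64
xs = torch.linspace(-1, 1, size)
x, y = torch.meshgrid(xs, xs, indexing="ij")
r = torch.sqrt(x**2 + y**2)
coords = torch.stack([x, y, r], dim=-1).reshape(-1, 3)

target = torch.rand(size * size, 1)          # stand-in for an evolved network's output

model = nn.Sequential(
    nn.Linear(3, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), target)  # pixel-wise match
    loss.backward()
    opt.step()
# Near-zero loss means matching output behavior, yet the hidden activation maps
# can still be fractured and entangled relative to the evolved network.
```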

The paper argues that this phenomenon of "Imposter Intelligence"—where outward performance is high but internal representation is flawed—has significant implications for large deep learning models like LLMs. While current LLMs achieve impressive performance on benchmarks, the authors suggest that FER might underlie some of their known limitations, such as:

  • Inconsistent reasoning (e.g., GPT-3 failing to count animals while correctly counting objects).
  • Fragility and shortcut learning (e.g., models learning Othello board states via fragmented heuristics rather than general rules; LLMs relying on heuristics for arithmetic instead of fundamental algorithms).
  • Poor generalization to out-of-distribution or counterfactual scenarios.
  • Inability to follow simple instructions when applied to slightly different contexts (e.g., GPT-4o replacing words in a sequence but not numbers; GPT-4o generating an extra thumb on a human hand but not an ape hand).
  • Challenges in mechanistic interpretability (superposition and polysemanticity potentially being symptoms of underlying FER).

The authors speculate on factors contributing to FER vs. UFR:

  1. Order of Learning: The non-greedy, serendipitous curriculum naturally explored in open-ended search (as in Picbreeder, where foundational regularities such as symmetry are discovered early) might lead to better representations than the direct, objective-driven optimization of conventional SGD. Human learners likewise benefit from structured curricula and from the ability to ignore information they are not yet ready for.
  2. Data Quantity and Holistic Unification: While more data might lead to better coverage and potentially some degree of "grokking" or holistic unification, the paper questions if this fully resolves FER, especially in sparse data domains or at the frontiers of knowledge.
  3. Architectural and Algorithmic Choices: Different network architectures (CNNs, Transformers, MoE) and algorithmic modifications (regularization, pruning, directed updates) could potentially mitigate or exacerbate FER, but this requires further study.
  4. Open-Ended Search: The open-ended nature of the search process itself, which encourages the discovery of evolvable (adaptable) artifacts, is highlighted as a key factor promoting UFR.

In conclusion, the paper argues that although representational optimism is prevalent, the existence of FER in conventionally trained networks, even on simple tasks, suggests that high benchmark performance does not guarantee a robust or adaptable internal representation. The contrast with the representations found via open-ended search highlights the potential for alternative training paradigms to yield more structured, less fractured internal models, which the authors believe is crucial for achieving true generalization, creativity, and efficient continual learning in future AI systems. They call for increased research into observing and mitigating FER. Appendices detail the methods used (NEAT, CPPNs, Picbreeder, layerization, SGD), provide additional image examples (butterfly, apple), report experiments showing that FER persists in ReLU networks, and present PCA and multi-weight-sweep analyses addressing concerns about the basis of representation, further supporting the hypothesis.
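
The basis-of-representation concern can be illustrated, very roughly, by rendering principal components of a layer's activation maps rather than individual neurons; the sketch below is an assumed form of such an analysis, reusing the NumPy CPPN example from earlier.

```python
# Rough sketch of a PCA-style basis check (an assumed form of the analysis;
# reuses `hidden` from the NumPy CPPN sketch above). Rendering principal
# components instead of raw neurons tests whether apparent fracture is merely
# an artifact of the neuron basis.
acts = hidden[0]                                  # (pixels, neurons) for one layer
acts = acts - acts.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(acts, full_matrices=False)
pc_maps = (acts @ vt.T).reshape(64, 64, -1)       # each slice: one PC rendered as an image
```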
