- The paper shows that distributed representations derived using dictionary learning and NMF enhance interpretability over local features.
- Psychophysics experiments involving 560 participants confirm higher accuracy and comprehension with distributed features across network layers.
- Findings indicate that overcoming superposition with sparse distributed representations yields features that are central to model decisions: removing them produces a marked drop in model confidence.
Local vs Distributed Representations: What Is the Right Basis for Interpretability?
Introduction
The paper investigates the relationship between local and distributed representations in deep neural networks (DNNs) and their interpretability. The authors challenge the prevailing reliance on local representations, in which individual neurons are studied in isolation. They emphasize the limitation imposed by superposition, where a single neuron responds to several unrelated features, and instead advocate distributed representations obtained with dictionary learning methods (Figure 1).
Figure 1: A conceptual depiction of local versus distributed representations in neural networks, highlighting the challenge of superposition in local representations and the vector-based simplification in distributed representations.
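To make the superposition problem concrete, the toy sketch below (illustrative only, not taken from the paper; all directions and data are made up) packs three synthetic "features" into a two-neuron activation space. Reading out a single neuron mixes features, whereas projecting onto the right direction isolates one of them:

```python
import numpy as np

# Toy illustration of superposition: 3 sparse "features" are packed into
# only 2 neurons, so any single neuron responds to a mixture of features.
rng = np.random.default_rng(0)

# Hypothetical feature directions (columns) in a 2-neuron activation space.
directions = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.7, 0.7]]).T          # shape (2 neurons, 3 features)

# Sparse, non-negative feature coefficients for a small batch of inputs.
codes = rng.random((3, 5)) * (rng.random((3, 5)) > 0.5)

activations = directions @ codes               # shape (2 neurons, 5 inputs)

# A "local" read-out of neuron 0 mixes feature 0 and feature 2.
print("neuron 0 activations:", activations[0])

# A "distributed" read-out projects onto a known feature direction instead.
feature_2 = directions[:, 2] / np.linalg.norm(directions[:, 2])
print("projection onto feature-2 direction:", feature_2 @ activations)
```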
Methodology
Sparse Distributed Representations
To generate sparse distributed representations, the paper uses CRAFT, a dictionary learning method. Non-negative Matrix Factorization (NMF) is applied to the network's activations to re-express them in a more interpretable basis: each activation vector becomes a sparse, non-negative combination of dictionary directions, and each direction is intended to capture a single, human-interpretable feature.
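As a rough sketch of this step (the full CRAFT pipeline includes further details not shown here; the layer shapes, concept count, and placeholder data below are assumptions for illustration), NMF can be applied to a matrix of non-negative activations as follows:

```python
import numpy as np
from sklearn.decomposition import NMF

# Placeholder for non-negative activations from one layer (e.g. after a ReLU),
# flattened to shape (n_samples, n_neurons). Shapes are assumptions.
activations = np.random.default_rng(0).random((1000, 512))

n_concepts = 25                                   # size of the learned dictionary
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)

# U: sparse non-negative codes, one row per sample (the distributed representation).
# W: dictionary of concept directions, one row per concept, in neuron space.
U = nmf.fit_transform(activations)                # shape (n_samples, n_concepts)
W = nmf.components_                               # shape (n_concepts, n_neurons)

# Each activation is approximately a non-negative mixture of concept directions.
print("reconstruction error:", np.linalg.norm(activations - U @ W))
```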
Psychophysics Experiments
The core empirical analysis comprises three psychophysics experiments with a total of 560 participants, evaluating whether features derived from distributed representations are more intuitive and comprehensible to humans than those derived from local representations.
- Experiment I (Figure 2): Evaluated interpretability using trials that presented the maximally and minimally activating stimuli for both local and distributed features (a simple sketch of how such stimuli can be selected follows this list).
Figure 2: Illustration of a trial setup in Experiment I, demonstrating the task for participants to discern the query image consistent with reference images.
- Experiment II (Figure 3): Introduced semantic controls so that reference images could not be matched by semantic category alone, isolating the visual coherence attributable to the feature itself.
Figure 3: Example of a semantically controlled trial, which forces participants to rely on visual pattern recognition rather than semantic inference.
- Experiment III (Figure 4 and Figure 5): Combined feature visualizations with natural images to test whether visualizations improve interpretability, a question on which prior studies report mixed evidence.
Figure 4: Mixed trials in Experiment III integrating feature visualizations with natural images.
Figure 5: Results from Experiment III highlighting performance discrepancies across network layers between local and distributed conditions.
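To illustrate how such reference stimuli can be obtained (a simplified sketch under assumed shapes and placeholder data; the paper's exact selection procedure may differ), the snippet below ranks images by a single neuron's raw activation in the local condition and by their NMF concept code in the distributed condition:

```python
import numpy as np

# Placeholder data standing in for real per-image activations and NMF codes
# (shapes are assumptions: 1000 images, 512 neurons, 25 concepts).
rng = np.random.default_rng(0)
activations = rng.random((1000, 512))    # layer activations, one row per image
concept_codes = rng.random((1000, 25))   # NMF codes U, one row per image

def top_and_bottom(scores, k=9):
    """Indices of the k most and k least activating images for a feature."""
    order = np.argsort(scores)
    return order[-k:][::-1], order[:k]

# Local condition: rank images by the raw activation of a single neuron.
local_max, local_min = top_and_bottom(activations[:, 42])

# Distributed condition: rank images by their coefficient on one concept direction.
dist_max, dist_min = top_and_bottom(concept_codes[:, 7])
```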
Results
A pivotal takeaway from the experimental results is that distributed representations consistently led to higher interpretability scores. Participants performed better on tasks involving distributed features, with the advantage largest at deeper network layers. The authors further argue that these concept vectors not only correspond to perceivable visual features but also matter for the model's decisions, as removing them causes a noticeable drop in model confidence (Figure 6).

Figure 6: Demonstrates the superior feature importance of distributed representations over local ones across various network layers.
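A rough sketch of this kind of importance test (under assumed shapes; `classifier_head` is a hypothetical callable standing in for the rest of the network, and this is not presented as the paper's exact protocol) is to project a direction out of a layer's activations and measure the change in class probability:

```python
import numpy as np

def confidence_drop(activations, concept_dir, classifier_head, class_idx):
    """Estimate how much class confidence falls when a direction is ablated.

    activations: (n_images, n_neurons) activations at the layer of interest
    concept_dir: (n_neurons,) concept or neuron direction to remove
    classifier_head: callable mapping activations to class probabilities
    """
    d = concept_dir / np.linalg.norm(concept_dir)
    # Project the chosen direction out of every activation vector.
    ablated = activations - np.outer(activations @ d, d)
    before = classifier_head(activations)[:, class_idx]
    after = classifier_head(ablated)[:, class_idx]
    return float(np.mean(before - after))
```

A larger mean drop indicates the direction is more important to the model's decision; per Figure 6, the drops reported for distributed directions exceed those for individual neurons across layers.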
Conclusion
This inquiry supports sparse distributed representations as a superior basis for interpretability in computer vision models. Distributed features are easier for humans to interpret and align better with the features the model actually relies on for its decisions, particularly in deeper layers. Whereas local methods falter due to the superposition problem, distributed methods yield clearer, more coherent feature mappings. Future research could explore scalable methods for extracting these representations in other neural architectures, potentially fostering interpretability frameworks applicable across AI systems.