
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (2303.02536v4)

Published 5 Mar 2023 in cs.AI

Abstract: Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases (distributed representations). Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to conducting causal abstraction analyses and allows us to find conceptual structure in trained neural nets.
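To make the abstract's idea concrete, here is a minimal sketch of a distributed interchange intervention in a learned (non-standard) basis, assuming a frozen network whose hidden states have an assumed width `hidden_size`, an assumed subspace size `k` aligned with one high-level variable, and a learnable orthogonal rotation trained by gradient descent. The names and sizes are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

hidden_size = 64   # assumed width of the low-level hidden representation
k = 8              # assumed number of rotated coordinates aligned with a high-level variable

# Learnable orthogonal change of basis; its parameters are trained by gradient
# descent while the underlying neural network stays frozen.
rotation = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))

def distributed_interchange(base_hidden: torch.Tensor,
                            source_hidden: torch.Tensor) -> torch.Tensor:
    """Swap the first k coordinates of the rotated basis from a source
    (counterfactual) run into a base run, then rotate back."""
    R = rotation.weight                     # orthogonal matrix (hidden_size x hidden_size)
    base_rot = base_hidden @ R.T            # express base hidden state in the learned basis
    source_rot = source_hidden @ R.T        # express source hidden state in the learned basis
    mixed = torch.cat([source_rot[:, :k],   # take the aligned subspace from the source run
                       base_rot[:, k:]],    # keep the rest from the base run
                      dim=-1)
    return mixed @ R                        # rotate back to the standard neuron basis
```

In a training loop, the intervened hidden state would replace the original one at the chosen layer, and the rotation would be optimized so the model's intervened output matches the counterfactual prediction of the high-level causal model; only the rotation parameters receive gradients.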

Authors (5)
  1. Atticus Geiger (35 papers)
  2. Zhengxuan Wu (37 papers)
  3. Christopher Potts (113 papers)
  4. Thomas Icard (22 papers)
  5. Noah D. Goodman (83 papers)
Citations (74)
