Data-driven Circuit Discovery

Updated 3 July 2026

Data-driven Circuit Discovery is a method that identifies minimal, sparse subgraphs within neural networks by analyzing activation patterns from diverse datasets.
The approach uncovers distinct circuits through clustering and differentiable pruning, ensuring faithfulness, completeness, and minimality in model behavior.
Recent advances include position-aware attribution and federated analog circuit discovery, enhancing interpretability and robustness across various models.

Data-driven Circuit Discovery (DCD) refers to a family of methods that automatically extract and interpret "circuits"—sparse, mechanistically meaningful subgraphs within artificial neural networks—by analyzing model computation in response to diverse datasets. The guiding principle is to let the structure of the model's activations and computation on actual data, rather than manual hypotheses or global ablation, dictate which subnetworks underpin specific functions or behaviors. DCD has recently become central in mechanistic interpretability for LLMs, vision models, and even analog circuit generation, as it reveals the dynamic, context-dependent, and sometimes multi-mechanism nature of network computation.

1. Formal Problem Definition and Motivation

Data-driven Circuit Discovery aims to recover, for a given pretrained model $M$ (typically represented as a computational graph $G = (V, E)$ of nodes and directed edges), minimal subgraphs $C \subseteq G$ ("circuits") that suffice to recreate the model's behavior on a target task or dataset $D$ (Rai et al., 9 May 2026, Hsu et al., 2024, Haklay et al., 7 Feb 2025). Unlike traditional, hypothesis-driven mechanistic interpretability—where one assumes a single, human-labeled circuit is responsible for task performance—DCD is motivated by two gaps:

Computation is data-dependent: The same model may use distinct pathways or mechanisms for different input subsets, even when task semantics are unchanged (Rai et al., 9 May 2026). This invalidates the assumption of a single, universal circuit per task.
Manually specified circuits often miss essential mechanisms: Purely handcrafted or position-agnostic circuits neglect position-sensitive or context-specific computation, as well as task-overlapping or co-existing mechanisms (Haklay et al., 7 Feb 2025).

DCD seeks data-driven answers to: Which model components (weights, edges, or nodes) are functionally critical for a particular group of data examples? Can these components be grouped or clustered to reveal distinct mechanisms, and can one prescribe interpretable, minimal subgraphs explaining observed decisions?

2. Algorithmic Frameworks for Data-driven Circuit Discovery

Several algorithmic paradigms instantiate DCD, including per-example attribution and clustering (Rai et al., 9 May 2026), differentiable graph pruning (Yu et al., 2024), compositional contextual decomposition (Hsu et al., 2024), and formal verification (Hadad et al., 18 Feb 2026). A generic workflow comprises:

Attribution or Signature Construction: For each input $x_i$ in dataset $D$, compute a signature vector $s_{x_i} \in \mathbb{R}^{|E|}$ assigning, e.g., importance or attribution to each edge, node, or weight (using patching, gradients, or decompositions) (Rai et al., 9 May 2026, Haklay et al., 7 Feb 2025, Hsu et al., 2024, Yu et al., 2024).
Clustering by Functional Similarity: Reduce the dimensionality of $\{s_{x_i}\}$ (via PCA, SVD, or binarization), and cluster examples into groups $D_1, \dots, D_K$ reflecting shared computational mechanisms (Rai et al., 9 May 2026).
Circuit Discovery per Cluster: For each group $D_k$, search for a sparse subgraph $C_k \subseteq G$ such that, when only $C_k$ is active (all other components ablated), $M$'s predictions on $D_k$ are preserved. Objective functions trade off faithfulness, sparsity, and completeness (Rai et al., 9 May 2026, Yu et al., 2024).
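The three workflow steps above can be sketched on synthetic data. This is a minimal illustration, not the procedure of any cited paper: the signatures are random stand-ins for real attributions, `pca_reduce` and `kmeans` are hypothetical helpers written for this example, and the per-cluster "circuit" is obtained by thresholding the mean attribution, which is only a crude placeholder for an actual circuit search.

```python
import numpy as np

def pca_reduce(signatures, n_components):
    """Project centered signatures onto their top principal directions (via SVD)."""
    centered = signatures - signatures.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dist_to_nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center in place if its cluster is empty
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Step 1 (stand-in): synthetic per-example signatures over 12 edges.
# Examples 0-9 rely on edges 0-3, examples 10-19 rely on edges 8-11.
rng = np.random.default_rng(0)
n_edges = 12
sig_a = np.zeros(n_edges)
sig_a[:4] = 1.0
sig_b = np.zeros(n_edges)
sig_b[8:] = 1.0
signatures = np.vstack([
    sig_a + 0.05 * rng.standard_normal((10, n_edges)),
    sig_b + 0.05 * rng.standard_normal((10, n_edges)),
])

# Step 2: reduce dimensionality, then cluster examples by functional similarity.
labels = kmeans(pca_reduce(signatures, n_components=2), k=2)

# Step 3 (placeholder): one edge set per cluster, from thresholded mean attribution.
circuits = {}
for c in np.unique(labels):
    mean_sig = signatures[labels == c].mean(axis=0)
    circuits[int(c)] = set(np.flatnonzero(mean_sig > 0.5).tolist())
```

In a real pipeline the thresholding step would be replaced by a search that optimizes faithfulness and sparsity on each group, and the number of clusters would itself need to be chosen.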

Differentiable methods, such as DiscoGP, jointly optimize binary masks over parameters and edges using straight-through gradient estimators to directly enforce faithfulness and sparsity (Yu et al., 2024). Recursive decomposition (CD-T) propagates relevance through the model to identify source components, supporting aggressive layerwise pruning (Hsu et al., 2024).
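To make the straight-through idea concrete, here is a toy sketch in plain NumPy with hand-written gradients. It is not DiscoGP: the "model" is a single linear map whose per-edge weights are masked, the function `learn_mask` and its hyperparameters are assumptions made for this example, and the sparsity penalty is simply the count of active edges. The forward pass uses a hard 0/1 mask, while the backward pass treats the mask as if it were the sigmoid of its logits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_mask(weights, inputs, lam=0.05, lr=1.0, n_steps=500):
    """Learn a binary edge mask that keeps outputs faithful while staying sparse."""
    targets = inputs @ weights              # full-model outputs to be matched
    logits = np.ones_like(weights)          # start with every edge switched on
    best_mask, best_loss = None, np.inf
    for _ in range(n_steps):
        probs = sigmoid(logits)
        mask = (probs > 0.5).astype(float)  # forward pass uses the hard 0/1 mask
        err = inputs @ (mask * weights) - targets
        loss = np.mean(err ** 2) + lam * mask.sum()
        if loss < best_loss:                # hard masks can flicker near the threshold,
            best_mask, best_loss = mask.copy(), loss  # so keep the best one seen
        # Gradient of the loss with respect to the hard mask entries.
        grad_mask = 2.0 * (err[:, None] * inputs * weights[None, :]).mean(axis=0) + lam
        # Straight-through step: treat d(mask)/d(logits) as the sigmoid derivative.
        logits -= lr * grad_mask * probs * (1.0 - probs)
    return best_mask

# Toy setup: three edges carry signal, three carry none.
weights = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0])
inputs = np.eye(6)                          # each example exercises exactly one edge
best = learn_mask(weights, inputs)
```

Edges with zero weight receive only the sparsity gradient and are switched off for good, whereas switching off an edge that carries signal produces a faithfulness gradient that turns it back on. Practical methods typically add stochastic relaxations of the mask rather than relying on a deterministic threshold.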

3. Position-aware and Schema-based Advances

Early DCD efforts assumed position invariance; i.e., edge or node importance was aggregated across token positions, which obscured cross-positional or variable-length mechanisms (Haklay et al., 7 Feb 2025). Recent extensions address these gaps by:

Position-aware Attribution: Edge importance scores are computed separately for each edge as distinguished by its source and target token positions, enabling explicit modeling of phenomena such as cross-positional attention (Haklay et al., 7 Feb 2025).
Dataset Schemas for Variable-length Inputs: To align computation across semantically equivalent but structurally divergent examples, schemas are automatically generated (using LLMs) to define semantic "spans" (e.g., Subject, Object), with a mapping linking schema-level circuits to instance-level edges (Haklay et al., 7 Feb 2025). Experiments demonstrate that fully automated, LLM-generated schemas yield circuits with interpretability and faithfulness matching human-designed schemas.
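The schema-to-instance mapping can be illustrated with a small sketch. Everything here is an assumption made for illustration: spans are half-open `(start, end)` token ranges, the component names `attn.3.1` and `mlp.5` are hypothetical, and a schema-level edge is expanded to every causally valid pair of token positions within its source and target spans, which may differ from the exact rule used in the cited work.

```python
from itertools import product

def expand_schema_edge(edge, spans):
    """Map one schema-level edge to position-specific edges for a single example.

    edge  : (src_node, src_span, dst_node, dst_span)
    spans : dict mapping span name -> (start, end) token indices, end exclusive
    """
    src_node, src_span, dst_node, dst_span = edge
    src_positions = range(*spans[src_span])
    dst_positions = range(*spans[dst_span])
    return [
        ((src_node, p), (dst_node, q))
        for p, q in product(src_positions, dst_positions)
        if p <= q  # a causal model only moves information forward in position
    ]

schema_edge = ("attn.3.1", "Subject", "mlp.5", "Object")

# Two examples with the same schema but different lengths.
spans_short = {"Subject": (0, 1), "Object": (2, 3)}  # one-token subject
spans_long = {"Subject": (0, 2), "Object": (4, 5)}   # two-token subject

edges_short = expand_schema_edge(schema_edge, spans_short)
edges_long = expand_schema_edge(schema_edge, spans_long)
```

The same schema-level edge yields one instance-level edge for the short example and two for the long one, which is what allows a single circuit description to cover inputs of different lengths.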

4. Faithfulness, Completeness, and Minimality Metrics

Circuit faithfulness measures the extent to which a discovered circuit $C$ reproduces the full model's behavior on a specified dataset. Key metrics include:

Soft faithfulness: [formula missing] (Haklay et al., 7 Feb 2025, Hsu et al., 2024).
Hard faithfulness: [formula missing].
Functional completeness: Accuracy of the model when the circuit $C$ is ablated (the complement circuit operates). For an ideal circuit, this accuracy should approach random guessing (Yu et al., 2024).
Sparsity: Fraction of parameters or edges retained in the circuit subgraph.
Jaccard edge overlap: Measures cross-dataset or cross-task circuit similarity (Rai et al., 9 May 2026).
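Two of the metrics listed above, sparsity and Jaccard edge overlap, have standard set-based definitions and can be written down directly. The sketch below assumes circuits are represented as collections of hashable edge identifiers; the function names and the toy edge labels are illustrative only.

```python
def sparsity(circuit_edges, all_edges):
    """Fraction of the model's edges that are retained in the circuit."""
    all_set = set(all_edges)
    return len(set(circuit_edges) & all_set) / len(all_set)

def jaccard_overlap(edges_a, edges_b):
    """Size of the intersection divided by size of the union of two edge sets."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty circuits are identical
    return len(a & b) / len(a | b)

model_edges = [f"e{i}" for i in range(8)]
circuit_task_1 = {"e1", "e2", "e3"}
circuit_task_2 = {"e2", "e3", "e4"}

kept_fraction = sparsity(circuit_task_1, model_edges)       # 3 of 8 edges kept
overlap = jaccard_overlap(circuit_task_1, circuit_task_2)   # 2 shared, 4 in union
```

A Jaccard value near 1 indicates that two datasets or tasks are served by nearly the same edges, while a value near 0 indicates largely disjoint circuits.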

Recent work introduces provable input domain robustness and patching domain robustness: the circuit must maintain agreement with the model not just pointwise but uniformly over continuous input or patching perturbations (Hadad et al., 18 Feb 2026). Minimality is formalized as either subset-minimality (no element can be removed without loss of faithfulness) or cardinal minimality (smallest support possible under constraints) (Hadad et al., 18 Feb 2026).
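The difference between the two minimality notions can be seen in a small sketch. It assumes a black-box predicate `is_faithful` that is monotone (adding edges to a faithful circuit keeps it faithful); under that assumption a single greedy removal pass returns a subset-minimal circuit. The predicate and edge names are hypothetical, and this is not the verification-based procedure of the cited work.

```python
def greedy_subset_minimal(edges, is_faithful):
    """Drop edges one at a time, keeping each removal that preserves faithfulness."""
    circuit = list(edges)
    for edge in list(edges):
        candidate = [e for e in circuit if e != edge]
        if is_faithful(candidate):
            circuit = candidate
    return circuit

def toy_is_faithful(circuit):
    """Toy predicate: behavior is preserved by edge 'c' alone, or by 'a' and 'b' together."""
    present = set(circuit)
    return "c" in present or {"a", "b"} <= present

result_1 = greedy_subset_minimal(["a", "b", "c"], toy_is_faithful)
result_2 = greedy_subset_minimal(["c", "a", "b"], toy_is_faithful)
```

Both results are subset-minimal, since no single remaining edge can be removed without losing faithfulness. Only `result_1` is also cardinally minimal: when `c` is tried first it is discarded, and the pass ends with the two-edge circuit instead of the one-edge circuit. Greedy pruning therefore depends on removal order, which is one reason cardinal minimality is the stronger and harder guarantee.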

5. Empirical Findings and Practical Implications

Empirical studies have established several robust trends:

Multi-circuit reality: For a wide range of tasks and datasets, LLMs implement multiple distinct mechanisms, often revealed only through clustering and not identified by standard hypothesis-driven methods. DCD finds 7–11 distinct circuits per dataset on contemporary LMs, each more faithful to its data group than a single global circuit (Rai et al., 9 May 2026).
Position-awareness yields smaller, more interpretable circuits: Position-aware DCD techniques discover much smaller, mechanistically meaningful circuits (e.g., 20–30 edges suffice on Greater-Than tasks, versus 500 for position-agnostic circuits) and align better with task semantics (Haklay et al., 7 Feb 2025).
Differentiable pruning enhances both faithfulness and completeness: Joint edge- and parameter-masking via algorithms such as DiscoGP achieves faithfulness and completeness near theoretical limits with 2–3% active edges or weights on GPT-2, outperforming patching and weight-only pruning (Yu et al., 2024).
CD-T for efficient and faithful transformer circuit extraction: On benchmarks such as indirect object identification, CD-T recovers up to 97% ROC AUC of manual circuits at lower runtimes than path-patching, with circuits using only 0.04% of model heads recovering over 46% of true-class logits; unfaithful circuits (random selection) achieve near-zero faithfulness (Hsu et al., 2024).
Provable guarantees via neural network verification: Formal methods employing neural network verifiers (e.g., α,β-CROWN) yield circuits that are robust over continuous input and patching domains and can certifiably achieve minimality, markedly outperforming heuristic approaches in robustness (100% vs. ~46.5%) on vision models (Hadad et al., 18 Feb 2026).

6. Extensions to Analog Circuit Topology Discovery

DCD has also been adapted for analog circuit synthesis with generative AI and federated learning (Li et al., 20 Jul 2025). The AnalogFed framework allows collaborative discovery of novel analog circuit topologies without sharing raw proprietary data, using a lightweight transformer-based generative model. Federated aggregation, decentralized reward-driven fine-tuning (using PPO), and privacy-preserving aggregation underpin practical deployment across competing institutions. Empirical evaluation shows that AnalogFed achieves validity and novelty metrics on par with centralized baselines, with performance staying within 2–5% as the number of clients and the amount of data increase. Example discoveries include optimized op-amp, boost converter, and bandgap reference designs not present in any single participant's data.
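The source does not specify AnalogFed's aggregation rule, so the following is only a generic sketch of federated aggregation in the style of federated averaging: each client trains locally and shares parameters, never raw data, and the server combines them weighted by local dataset size. The function name, parameter layout, and numbers are assumptions for illustration.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Average each parameter across clients, weighted by local dataset size."""
    total = float(sum(client_sizes))
    return {
        name: sum(params[name] * (n / total)
                  for params, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }

# Two hypothetical clients holding 1 and 3 local training examples respectively.
client_a = {"w": np.array([1.0, 3.0])}
client_b = {"w": np.array([3.0, 5.0])}
global_params = federated_average([client_a, client_b], client_sizes=[1, 3])
```

Here the larger client contributes three quarters of the averaged value. A privacy-preserving deployment would add further protections on top of this step, which the sketch does not attempt to model.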

7. Limitations, Open Problems, and Future Directions

While DCD has demonstrated marked advances, certain caveats remain:

Dependence on attribution faithfulness: The accuracy of edge or node attributions directly constrains circuit interpretability (Rai et al., 9 May 2026). Attribution methods lacking functional ground truth can lead to spurious or incomplete circuits.
Sensitivity to clustering and preprocessing: Circuit discovery quality depends on the choice of dimensionality reduction, clustering method, and number of clusters (Rai et al., 9 May 2026). No universally optimal protocol has been established.
Interpretability of discovered clusters: Not all clusters correspond to human-interpretable mechanisms; further structural or symbolic analyses may be required.
Scalability of formal verification: Provable minimality and robustness are presently limited by verifier scalability; large models pose runtime and memory bottlenecks (Hadad et al., 18 Feb 2026).

A plausible implication is that, as DCD methodologies mature and are integrated with more scalable formal verification and automated schema induction approaches, they will yield finer-grained, robust, and human-aligned circuit decompositions across modalities and architectures. Continued advances in DCD are crucial for both theoretical understanding of model generalization and for practical tasks such as model debugging, auditing, and customization.