Composition of circuits within Transformer blocks
Determine how multiple circuits within a single Transformer block compose to produce the block’s single additive update to the residual stream, so that individual intermediate variables can be replaced precisely in counterfactual interventions.
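To make the setting concrete, the following is a minimal NumPy sketch of one attention block's additive update, written as a sum of per-head contributions, followed by a counterfactual patch of one contribution. It is illustrative only and not the method of the cited paper: attention heads stand in for "circuits", the weights are random, layer normalization, causal masking and the MLP are omitted, and all names (`head_contributions`, `x_alt`, and so on) are hypothetical. The open problem is precisely that real circuits need not align with heads, so no such clean decomposition is known for them.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, seq = 8, 2, 4, 3

# Random toy weights (assumption: no trained model is involved).
W_Q, W_K, W_V = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
W_O = rng.normal(size=(n_heads, d_head, d_model))

def head_contributions(x):
    """Return each head's write to the residual stream, shape (n_heads, seq, d_model)."""
    q = np.einsum("sd,hde->hse", x, W_Q)
    k = np.einsum("sd,hde->hse", x, W_K)
    v = np.einsum("sd,hde->hse", x, W_V)
    scores = np.einsum("hse,hte->hst", q, k) / np.sqrt(d_head)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    z = np.einsum("hst,hte->hse", attn, v)
    return np.einsum("hse,hed->hsd", z, W_O)

x = rng.normal(size=(seq, d_model))      # residual stream entering the block
contribs = head_contributions(x)
update = contribs.sum(axis=0)            # the block's single additive update
x_out = x + update

# Counterfactual: swap head 0's contribution for the one it computes on another input.
x_alt = rng.normal(size=(seq, d_model))
patched = contribs.copy()
patched[0] = head_contributions(x_alt)[0]
x_patched = x + patched.sum(axis=0)

# The output changes by exactly the difference in the patched component.
delta_matches = np.allclose(x_patched - x_out, patched[0] - contribs[0])
```

Because the components here are known and enter the update additively, replacing one of them shifts the output by exactly the difference between the new and old contribution. Performing the same replacement on an intermediate variable computed by a circuit would require knowing how that circuit's output enters the summed update, which is what remains undetermined.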
References
It is currently unknown how multiple circuits compose within a given block to create one additive update to the residual stream, so one cannot replace individual variables to elicit counterfactual behavior.
— Uncovering Intermediate Variables in Transformers using Circuit Probing
(2311.04354 - Lepori et al., 2023) in Discussion, Limitations