Understand Subnetwork Probing’s unexpected selections in the OR-gate toy model
Ascertain why Subnetwork Probing (SP) includes the key and value inputs of attention head a0.0 when run on the toy transformer designed to implement a simple OR gate, even though the ground-truth circuit requires only the outputs of the two attention heads feeding the downstream MLP, and characterize the factors that lead SP to select these additional inputs.
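As background for the task, the core SP mechanism can be sketched in miniature: learn a sigmoid mask over each candidate component, trading task loss against a sparsity penalty, and read off the circuit from which masks survive. The three-input toy below (two heads that together implement OR, plus a redundant third component) is an illustrative assumption of ours, not the paper's setup; the real SP uses hard-concrete masks over a full transformer, and here plain finite-difference gradient descent stands in for its training loop.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy masked model: pred = min(1, m_a*a + m_b*b + m_c*(a AND b)).
# "Heads" a and b suffice to compute OR; component c is redundant,
# so an ideal sparse mask keeps a and b and prunes c.
DATA = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 1.0)]
LAM = 0.05  # weight of the sparsity penalty on mask values (assumed)

def loss(theta):
    m = [sigmoid(t) for t in theta]  # masks in (0, 1)
    task = 0.0
    for (a, b), y in DATA:
        pred = min(1.0, m[0] * a + m[1] * b + m[2] * (a * b))
        task += (pred - y) ** 2
    return task / len(DATA) + LAM * sum(m)

def train(steps=500, lr=1.0, eps=1e-4):
    """Minimize loss over mask logits via central finite differences."""
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = []
        for i in range(len(theta)):
            hi = list(theta); hi[i] += eps
            lo = list(theta); lo[i] -= eps
            grad.append((loss(hi) - loss(lo)) / (2 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return [sigmoid(t) for t in theta]

masks = train()
# Masks for the two OR heads stay high; the redundant component's mask shrinks.
print([round(m, 2) for m in masks])
```

In this clean setting the penalty prunes the redundant component as expected; the open question above is why, on the actual OR-gate transformer, SP nevertheless retains a0.0's key and value inputs.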
References
We are unsure why SP finds a0.0's key and value inputs.
— Towards Automated Circuit Discovery for Mechanistic Interpretability
(Conmy et al., 2023, arXiv:2304.14997), Appendix "Automated Circuit Discovery and OR gates"