Exact mechanism for constructing the a+b helix
Determine the exact mechanism by which transformer-based large language models such as GPT-J, Pythia-6.9B, and Llama3.1-8B construct the $\mathrm{helix}(a+b)$ representation from the $\mathrm{helix}(a)$ and $\mathrm{helix}(b)$ representations during addition, and isolate the corresponding computation within these models. In particular, clarify whether and how trigonometric identities such as $\cos(a+b) = \cos(a)\cos(b) - \sin(a)\sin(b)$ are implemented by multilayer perceptrons and attention heads.
References
There are several aspects of LLM addition we still do not understand. Most notably, while we provide compelling evidence that key components create $\mathrm{helix}(a+b)$ from $\mathrm{helix}(a,b)$, we do not know the exact mechanism they use to do so. We hypothesize that LLMs use trigonometric identities like $\cos(a+b) = \cos(a)\cos(b)-\sin(a)\sin(b)$ to create $\mathrm{helix}(a+b)$. However, like \citet{nanda2023progress}, who originated the Clock algorithm, we are unable to isolate this computation in the model.
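To make the hypothesized mechanism concrete, the sketch below (plain NumPy, not extracted from any model) shows how $\mathrm{helix}(a+b)$ features would follow exactly from $\mathrm{helix}(a)$ and $\mathrm{helix}(b)$ features if the angle-addition identities were applied to each (cos, sin) pair. The choice of periods and the inclusion of a linear component are illustrative assumptions here; whether MLPs and attention heads realize an equivalent computation internally is precisely the open question.

```python
import numpy as np

def helix_features(n: int, periods=(2, 5, 10, 100)) -> np.ndarray:
    """Helix-style features for integer n: a linear component plus one
    (cos, sin) pair per period. Periods are illustrative assumptions,
    not taken from any specific model."""
    feats = [float(n)]  # linear component of the helix
    for T in periods:
        theta = 2 * np.pi * n / T
        feats.extend([np.cos(theta), np.sin(theta)])
    return np.array(feats)

def combine_via_trig_identities(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Hypothetical combination rule: build helix(a+b) from helix(a) and
    helix(b) using the angle-addition identities
        cos(x+y) = cos(x)cos(y) - sin(x)sin(y)
        sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
    applied independently to each (cos, sin) pair."""
    out = [feat_a[0] + feat_b[0]]  # linear parts add directly
    for i in range(1, len(feat_a), 2):
        ca, sa = feat_a[i], feat_a[i + 1]
        cb, sb = feat_b[i], feat_b[i + 1]
        out.append(ca * cb - sa * sb)  # cos(2*pi*(a+b)/T)
        out.append(sa * cb + ca * sb)  # sin(2*pi*(a+b)/T)
    return np.array(out)

if __name__ == "__main__":
    a, b = 23, 58
    combined = combine_via_trig_identities(helix_features(a), helix_features(b))
    target = helix_features(a + b)
    # ~0 up to floating-point error: the identities reproduce helix(a+b) exactly
    print(np.max(np.abs(combined - target)))
```

Equivalently, each (cos, sin) pair of $\mathrm{helix}(b)$ acts as a 2x2 rotation applied to the corresponding pair of $\mathrm{helix}(a)$, which is one way a bilinear interaction in an MLP or attention head could in principle implement the identity; isolating whether the models actually perform such a computation remains open.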