Turning local geometric insights into practical decomposition and steering tools
Develop practical, scalable methods that use the geometric structure of activation spaces in transformer language models for activation decomposition and causal steering, turning empirical observations of locally organized, low-dimensional subspaces into working algorithms for feature discovery and model control.
References
Yet, while recent work shows meaningful geometric structure in activation space, how to turn these insights into practical tools for decomposition and steering remains an open challenge.
— From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
(2602.02464 - Shafran et al., 2 Feb 2026) in Section 1 (Introduction)