Turning local geometric insights into practical decomposition and steering tools

Develop practical, scalable methods that leverage the geometric structure of activation spaces in transformer language models to perform activation decomposition and causal steering, converting empirical observations of locally organized, low-dimensional subspaces into operational algorithms for feature discovery and model control.

Background

A growing body of work indicates that LLM activation spaces exhibit meaningful, locally low-dimensional geometric structure rather than being well captured by a single global set of directions. However, existing activation decomposition and steering methods often rely on global directions and therefore fail to fully exploit this local geometry.
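The following toy illustration shows the distinction drawn above on synthetic data, not real LLM activations. The dimensions, number of regions, and noise level are arbitrary assumptions. Points are drawn from several regions, and each region lies near its own 2-dimensional subspace of a 20-dimensional space. The sketch counts the principal components needed to explain 95% of the variance. Each region turns out to be low-dimensional on its own, while a single global basis needs many more directions. The helper `n_components_for` is hypothetical and was written only for this sketch.

```python
import numpy as np

def n_components_for(X, frac=0.95):
    """Smallest number of principal components explaining `frac` of X's variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, frac) + 1)

# Synthetic "activations" (assumption): K regions, each near its own q-dim subspace.
rng = np.random.default_rng(0)
d, q, K, n = 20, 2, 4, 500  # ambient dim, local dim, number of regions, points per region
regions = []
for k in range(K):
    basis, _ = np.linalg.qr(rng.normal(size=(d, q)))  # orthonormal local subspace
    center = 0.5 * rng.normal(size=d)                 # region location
    Z = rng.normal(size=(n, q))                       # local coordinates
    regions.append(center + Z @ basis.T + 0.01 * rng.normal(size=(n, d)))

global_dim = n_components_for(np.vstack(regions))         # one global basis
local_dims = [n_components_for(X_k) for X_k in regions]   # one basis per region
print("global:", global_dim, "local:", local_dims)
```

Here region membership is known by construction. A practical decomposition method would have to infer it from unlabeled activations.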

The paper frames the problem of turning these geometric insights into usable, scalable tools for decomposition and steering as an open challenge. The authors propose Mixtures of Factor Analyzers (MFA) as one approach, but they explicitly leave the general task of developing practical methodologies open.
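The sketch below shows how a fitted Mixture of Factor Analyzers could serve both decomposition and steering. It assumes the standard MFA model, in which component k generates x = mu_k + W_k z + eps, with z ~ N(0, I) and diagonal noise eps ~ N(0, Psi_k). This is not the paper's implementation. The parameters are synthetic rather than fitted, and the names `mfa_decompose` and `local_steer` are hypothetical.

- **Decomposition.** The activation is assigned to its most responsible component. Its posterior factor coordinates are then read off as E[z | x, k] = W_k^T (W_k W_k^T + Psi_k)^{-1} (x - mu_k).
- **Steering.** As an illustrative assumption, the activation is moved along one column of that component's loading matrix. The edit therefore stays inside the local subspace instead of following a single global direction.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of the multivariate normal N(mean, cov) evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2 * np.pi) + logdet + maha)

def mfa_decompose(x, pis, mus, Ws, psis):
    """Assign x to MFA components and infer its local factor coordinates.

    pis: (K,) mixture weights; mus: (K, d) means;
    Ws: (K, d, q) factor loadings; psis: (K, d) diagonal noise variances.
    Returns (responsibilities, index of most responsible component, E[z | x, k]).
    """
    covs = [W @ W.T + np.diag(psi) for W, psi in zip(Ws, psis)]  # marginal covariances
    log_r = np.array([np.log(p) + gaussian_logpdf(x, mu, C)
                      for p, mu, C in zip(pis, mus, covs)])
    log_r -= log_r.max()  # subtract the max for numerical stability
    resp = np.exp(log_r) / np.exp(log_r).sum()
    k = int(np.argmax(resp))
    z = Ws[k].T @ np.linalg.solve(covs[k], x - mus[k])  # posterior mean of z
    return resp, k, z

def local_steer(x, W, factor, alpha):
    """Hypothetical steering rule: shift x by alpha along one local loading column."""
    return x + alpha * W[:, factor]

# Synthetic MFA parameters (assumed, not fitted to any model).
rng = np.random.default_rng(0)
K, d, q = 3, 8, 2
pis = np.full(K, 1.0 / K)
mus = 10.0 * rng.normal(size=(K, d))
Ws = np.stack([np.linalg.qr(rng.normal(size=(d, q)))[0] for _ in range(K)])
psis = np.full((K, d), 0.01)

# An "activation" generated from component 1 with known factor coordinates.
z_true = np.array([1.5, -0.5])
x = mus[1] + Ws[1] @ z_true + 0.01 * rng.normal(size=d)

resp, k_star, z = mfa_decompose(x, pis, mus, Ws, psis)
x_steered = local_steer(x, Ws[k_star], factor=0, alpha=2.0)
_, k_steered, z_steered = mfa_decompose(x_steered, pis, mus, Ws, psis)
print(k_star, np.round(z, 2), np.round(z_steered - z, 2))
```

With these parameters, the expected behavior is as follows:

- The activation should be assigned to component 1.
- Its recovered coordinates should be close to `z_true`, shrunk slightly toward zero by the noise term.
- The steered point should move by about 2/1.01 along factor 0 while staying in the same region.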

References

Yet, while recent work shows meaningful geometric structure in activation space, how to turn these insights into practical tools for decomposition and steering remains an open challenge.

— From Directions to Regions: Decomposing Activations in Language Models via Local Geometry (2602.02464 - Shafran et al., 2 Feb 2026) in Section 1 (Introduction)