Understanding contributions in composite steering of large language models
Characterize how much each individual steering intervention contributes to the final generated output when multiple steering controls are composed within a single large language model inference run. Model the non-linear interactions among controls acting on the input, structural, state, and output surfaces (e.g., activation addition, post-hoc attention steering, fine-tuning, and decoding-time alignment) so that attribution and ordering-effects analysis become reliable.
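To make the attribution question concrete, the sketch below scores a toy composition of four controls and splits the combined effect among them by averaging each control's marginal contribution over all application orders (a Shapley-style decomposition). This is an illustrative assumption, not a method taken from the cited paper or toolkit: the control names, the numeric effects, the interaction penalty, and the functions `toy_score` and `shapley` are all hypothetical stand-ins for a real steered-generation metric.

```python
from itertools import permutations

# Hypothetical control names, one per steering surface named in the text.
CONTROLS = ("input", "structural", "state", "output")

# Made-up individual effects on a scalar output score (illustration only).
SOLO_EFFECT = {"input": 0.10, "structural": 0.30, "state": 0.20, "output": 0.40}


def toy_score(active):
    """Toy non-additive score for a collection of active controls."""
    active = frozenset(active)
    score = sum(SOLO_EFFECT[c] for c in active)
    # Invented interaction: the state and output controls partly cancel.
    if {"state", "output"} <= active:
        score -= 0.30
    return score


def shapley(controls, value_fn):
    """Average each control's marginal contribution over all orderings."""
    totals = {c: 0.0 for c in controls}
    orderings = list(permutations(controls))
    for order in orderings:
        active = []
        for c in order:
            before = value_fn(active)
            active.append(c)
            totals[c] += value_fn(active) - before
    return {c: total / len(orderings) for c, total in totals.items()}


attribution = shapley(CONTROLS, toy_score)
for name, share in attribution.items():
    print(f"{name}: {share:.3f}")
```

Under these toy numbers, the per-control shares sum to the score of the full composition, and the invented interaction penalty is divided equally between the state and output controls. In a real setting the value function would need one model run per subset or ordering of controls, and it may itself depend on application order, which is part of what makes the problem hard.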
References
In general, the contribution of each intervention on the final output is not well understood, largely due to non-linear interactions.
— AI Steerability 360: A Toolkit for Steering Large Language Models
(2603.07837 - Miehling et al., 8 Mar 2026) in Section: Additional toolkit features, paragraph “Composite steering”.