Actionability of Interpretation for LLM Safety
Ascertain how actionable interpretation methods are for autoregressive, Transformer-based generative large language models. This requires formally defining what counts as an actionable interpretation output, determining evaluation criteria for diverse stakeholder groups, and establishing procedures that operationalize these outputs in support of concrete safety decisions.
References
We also highlight tools that facilitate understanding and use of interpretation results, recognizing that notions of practicality can vary across stakeholders and that actionability of interpretation remains an actively researched open question.
Source: Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety (arXiv:2506.05451, Lee et al., 5 Jun 2025), Limitations section.