Flexible Model Interpretability through Natural Language Model Editing (2311.10905v1)
Published 17 Nov 2023 in cs.CL and cs.AI
Abstract: Model interpretability and model editing are crucial goals in the age of LLMs. Interestingly, there exists a link between these two goals: if a method is able to systematically edit model behavior with regard to a human concept of interest, this editor method can help make internal representations more interpretable by pointing towards relevant representations and systematically manipulating them.