Mapping model parameters to encoded information in deep neural networks
Identify which specific parameters (weights and biases) of a trained deep neural network encode which pieces of information from its training data, including in large language models such as GPT-4, so that stored information can be selectively removed or verified.
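One reason this mapping is hard to recover shows up even in a toy setting. The sketch below is an illustration only, not a method from the cited paper: it assumes a three-weight linear model with squared-error loss, and the names `per_example_gradient` and `touched_parameters` are hypothetical. It computes the gradient each training example contributes and lists which weights that example would change.

```python
# Toy illustration (an assumption for exposition, not from the cited paper):
# a linear model y_hat = w . x trained with squared error.

def per_example_gradient(weights, x, y):
    """Gradient of (w . x - y)^2 with respect to each weight."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    residual = prediction - y
    return [2.0 * residual * xi for xi in x]


def touched_parameters(gradient, tol=1e-12):
    """Indices of the weights that this example's gradient would change."""
    return [i for i, g in enumerate(gradient) if abs(g) > tol]


# Hypothetical data: two training examples with dense (all non-zero) features.
weights = [0.0, 0.0, 0.0]
examples = [
    ([1.0, 2.0, 3.0], 1.0),
    ([0.5, -1.0, 2.0], -2.0),
]

# For each example, record which weights a gradient step would modify.
touched = [
    touched_parameters(per_example_gradient(weights, x, y))
    for x, y in examples
]
print(touched)
```

In this toy case both examples update every weight, so no single weight belongs to a single example. Deep networks add nonlinearity and many more update steps on top of this, which is consistent with the difficulty the problem statement describes.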
References
The problem is that we do not know which parameter in reality captures which information.
— Eternal Sunshine of the Mechanical Mind: The Irreconcilability of Machine Learning and the Right to be Forgotten
(2403.05592 - Manab, 6 Mar 2024) in Section 3: Welcome to the Machine