Mapping model parameters to encoded information in deep neural networks

Identify which specific parameters (weights and biases) of a trained deep neural network, including large language models such as GPT-4, capture which pieces of information from the training data, so that stored information can be removed or verified in a targeted way.

Background

The paper argues that modern machine learning systems, particularly deep neural networks and LLMs, store learned information in vast numbers of parameters. While in principle one might delete particular information by adjusting or removing the corresponding parameters, the authors note that it is not known which parameters encode which information.

This lack of interpretability hinders precise machine unlearning and complicates compliance with legal rights such as the Right to be Forgotten, where specific personal data must be erased without collateral effects on unrelated information.
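The difficulty shows up even in a model with only two parameters. The sketch below is an illustrative toy, not the paper's method: the data points and the helper names `fit_line` and `parameter_shift` are assumptions made for this example. It fits a least-squares line, refits it without one training point, and reports how far each parameter moved.

```python
# Toy illustration (hypothetical data and helper names, not from the paper):
# "forgetting" one training point by retraining changes every parameter,
# so no single parameter can be said to hold that point's information.

def fit_line(points):
    """Ordinary least-squares fit of y = w * x + b; returns (w, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def parameter_shift(points, forget_index):
    """Return the per-parameter change (w, b) after dropping one point and refitting."""
    full = fit_line(points)
    retained = points[:forget_index] + points[forget_index + 1:]
    retrained = fit_line(retained)
    return [after - before for before, after in zip(full, retrained)]


# Made-up training set; the last point plays the role of the record to erase.
data = [(0.0, 1.0), (1.0, 3.2), (2.0, 4.9), (3.0, 7.1), (4.0, 12.0)]
shift = parameter_shift(data, forget_index=4)
print("change in w, change in b:", shift)
```

Both the slope and the intercept move when the point is dropped, so its influence is spread across the whole parameter set. In a deep network with billions of parameters, retraining from scratch is far more expensive, and no comparable closed-form attribution is available.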

References

The problem is that we do not know which parameter in reality captures which information.

— Eternal Sunshine of the Mechanical Mind: The Irreconcilability of Machine Learning and the Right to be Forgotten (2403.05592 - Manab, 6 Mar 2024) in Section 3: Welcome to the Machine
