TOFU: A Task of Fictitious Unlearning for LLMs (2401.06121v1)
Abstract: LLMs trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at deepening our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles called the forget set that serves as the target for unlearning. We compile a suite of metrics that together provide a holistic picture of unlearning efficacy. Finally, we provide a set of baseline results from existing unlearning algorithms. Importantly, none of the baselines we consider shows effective unlearning, motivating continued efforts to develop approaches that tune models so they truly behave as if they were never trained on the forget data at all.
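The forget/retain structure described in the abstract translates naturally into a small data-loading sketch. The snippet below is a minimal illustration only: the Hugging Face repository id (`locuslab/TOFU`), the configuration names (`full`, `forget10`, `retain90`), and the field names (`question`, `answer`) are assumptions about how the released data is packaged, not details stated in the abstract.

```python
# Minimal sketch: loading the TOFU QA data and its forget/retain split.
# The dataset id, config names, and field names below are assumptions
# (see lead-in); adjust them to match the actually released artifacts.
from datasets import load_dataset

# 200 fictitious author profiles x 20 question-answer pairs = 4,000 examples.
full = load_dataset("locuslab/TOFU", "full", split="train")

# A subset of profiles is designated as the forget set (here, 10% of authors);
# the remaining profiles form the retain set the model should keep intact.
forget_set = load_dataset("locuslab/TOFU", "forget10", split="train")
retain_set = load_dataset("locuslab/TOFU", "retain90", split="train")

print(len(full), len(forget_set), len(retain_set))
print(full[0]["question"], "->", full[0]["answer"])
```

An unlearning method is then applied to a model fine-tuned on the full set, using the forget set (and optionally the retain set) as input, and is judged by how closely the result matches a model that was never trained on the forget data at all.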
Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, J. Zico Kolter, Zachary C. Lipton