Hey, That's My Data! Label-Only Dataset Inference in Large Language Models

Published 6 Jun 2025 in cs.CL and cs.AI | (2506.06057v1)

Abstract: LLMs have revolutionized Natural Language Processing by excelling at interpreting, reasoning about, and generating human language. However, their reliance on large-scale, often proprietary datasets poses a critical challenge: unauthorized usage of such data can lead to copyright infringement and significant financial harm. Existing dataset-inference methods typically depend on log probabilities to detect suspicious training material, yet many leading LLMs have begun withholding or obfuscating these signals. This reality underscores the pressing need for label-only approaches capable of identifying dataset membership without relying on internal model logits. We address this gap by introducing CatShift, a label-only dataset-inference framework that capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data. If a suspicious dataset was previously seen by the model, fine-tuning on a portion of it triggers a pronounced post-tuning shift in the model's outputs; conversely, truly novel data elicits more modest changes. By comparing the model's output shifts for a suspicious dataset against those for a known non-member validation set, we statistically determine whether the suspicious set is likely to have been part of the model's original training corpus. Extensive experiments on both open-source and API-based LLMs validate CatShift's effectiveness in logit-inaccessible settings, offering a robust and practical solution for safeguarding proprietary data.

Authors (8)

Summary

The paper introduces CatShift, a framework that uses catastrophic forgetting to determine if a dataset was used during LLM training.
It employs a three-step process including prompt-completion construction, fine-tuning, and output distribution analysis to reveal data membership.
Empirical evaluations report strong performance with an AUC of 0.979 and an F1 score of 0.863, demonstrating effective detection in label-only settings.

CatShift: Leveraging Catastrophic Forgetting for Label-Only Dataset Inference in LLMs

LLMs have achieved remarkable success in a variety of NLP tasks, owing substantially to the enormous datasets used in their training. However, the proprietary nature of many such datasets raises significant concerns regarding copyright infringement and unauthorized usage. The paper, "Hey, That's My Data! Label-Only Dataset Inference in Large Language Models," introduces an approach to dataset inference that specifically targets logit-inaccessible settings, where the model exposes only its generated text.

Overview of the Research

The authors present CatShift, a framework that works through a label-only interface to identify whether a suspicious dataset was used during the training of an LLM. CatShift relies on the phenomenon of catastrophic forgetting (CF), whereby models overwrite learned information when trained on new datasets, to infer dataset membership robustly and efficiently without access to log probabilities.

Methodology

The paper proposes a three-step process to facilitate dataset inference:

Completion Prompt Construction: Converts data samples into prompt-completion pairs usable in label-only settings. This step ensures compatibility with standard LLM APIs.
Target Model Fine-tuning: Fine-tunes the LLM on a portion of the suspicious dataset. This step is crucial because it is what separates member from non-member datasets: if the data was part of the original training set, fine-tuning on it triggers a pronounced shift in the model's outputs, whereas truly novel data elicits a more modest change.
Output Distribution Analysis: Measures how much the model's outputs change between the pre- and post-fine-tuning versions using statistical metrics (e.g., BERT-based similarity scores), and compares the shifts for the suspicious dataset against those for a known non-member validation set. A hypothesis test on these shifts then assesses the likelihood of dataset membership based solely on output changes.
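The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a token-overlap (Jaccard) score stands in for the BERT-based similarity, and a one-sided permutation test stands in for whatever hypothesis test the paper uses. The fine-tuning step itself is omitted; the sketch assumes the pre- and post-tuning completions have already been collected as strings.

```python
import random
from statistics import mean


def make_pair(text, prompt_fraction=0.5):
    """Step 1 (assumed form): split a sample into a (prompt, completion) pair."""
    words = text.split()
    cut = max(1, int(len(words) * prompt_fraction))
    return " ".join(words[:cut]), " ".join(words[cut:])


def jaccard_similarity(a, b):
    """Stand-in for a BERT-based score: token-set overlap in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def shift_scores(before, after, similarity=jaccard_similarity):
    """Step 3a: per-prompt shift = 1 - similarity(pre-tuning, post-tuning output)."""
    return [1.0 - similarity(b, a) for b, a in zip(before, after)]


def permutation_p_value(suspect, reference, n_perm=10000, seed=0):
    """Step 3b: one-sided permutation test for mean(suspect) > mean(reference).

    `suspect` holds shift scores for the suspicious dataset, `reference`
    holds shift scores for the known non-member validation set.
    """
    rng = random.Random(seed)
    observed = mean(suspect) - mean(reference)
    pooled = list(suspect) + list(reference)
    k = len(suspect)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

In use, one would call `shift_scores` once on the suspicious prompts and once on the validation prompts, then pass both lists to `permutation_p_value`; a small p-value suggests the suspicious set shifted more than the non-member baseline, which is the signal CatShift treats as evidence of membership.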

Empirical Evaluations

Extensive evaluations across various LLMs demonstrate the effectiveness of CatShift. On open-source models such as Pythia and GPT-Neo, as well as API-based models such as GPT-3.5, CatShift reliably distinguishes between member and non-member datasets. On the Pythia model it reaches an AUC of 0.979 and an F1 score of 0.863, significantly outperforming baseline methods. On commercial models like GPT-3.5, CatShift achieves a p-value of $6.44 \times 10^{-5}$ for member datasets, indicating strong evidence of membership.
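For readers unfamiliar with the two reported metrics, the sketch below shows how AUC and F1 could be computed from per-dataset membership scores. It assumes each dataset receives one score where higher means "more likely a member" (for instance, one minus the test's p-value) and one binary decision; this summary does not state how the paper aggregates its results, so the setup and function names here are hypothetical.

```python
def auc(member_scores, non_member_scores):
    """AUC as the probability that a random member outscores a random
    non-member, with ties counted as half a win."""
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


def f1_score(predicted, actual):
    """F1 = 2*TP / (2*TP + FP + FN) over boolean membership decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def decide(p_values, alpha=0.05):
    """Flag a dataset as a member when its p-value falls below alpha
    (alpha is an illustrative threshold, not a value from the paper)."""
    return [p < alpha for p in p_values]
```

AUC is threshold-free and summarizes how well the scores rank members above non-members, while F1 depends on the chosen significance level, which is why the two numbers are reported together.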

Implications and Future Directions

The implications of CatShift are noteworthy for safeguarding proprietary data in AI frameworks. Deploying this approach allows content owners to detect potential copyright infringement in LLMs used commercially, even when detailed model outputs are obscured. It underscores the necessity for continued exploration into robust dataset inference mechanisms, especially considering potential adversarial behaviors by model providers.

Nevertheless, several limitations persist, mainly the practical difficulty of detecting CF when datasets overlap and of choosing good fine-tuning hyperparameters. Future research may address partial membership inference, enhanced statistical testing techniques, and adversarial robustness strategies to further refine the framework.

Conclusion

The CatShift paper advances data rights in AI by using catastrophic forgetting to perform label-only dataset inference for LLMs. It shows that a central obstacle in LLM dataset inference, the lack of logit access, can be overcome, and it lays out a methodology that future work can build on.
