What's New in My Data? Novelty Exploration via Contrastive Generation (2410.14765v1)
Abstract: Fine-tuning is widely used to adapt LLMs for specific goals, often leveraging real-world data such as patient records, customer-service interactions, or web content in languages not covered in pre-training. These datasets are typically massive, noisy, and often confidential, making their direct inspection challenging. However, understanding them is essential for guiding model deployment and informing decisions about data cleaning or suppressing any harmful behaviors learned during fine-tuning. In this study, we introduce the task of novelty discovery through generation, which aims to identify novel properties of a fine-tuning dataset by generating examples that illustrate these properties. Our approach, Contrastive Generative Exploration (CGE), assumes no direct access to the data but instead relies on a pre-trained model and the same model after fine-tuning. By contrasting the predictions of these two models, CGE can generate examples that highlight novel characteristics of the fine-tuning data. However, this simple approach may produce examples that are too similar to one another, failing to capture the full range of novel phenomena present in the dataset. We address this by introducing an iterative version of CGE, where the previously generated examples are used to update the pre-trained model, and this updated model is then contrasted with the fully fine-tuned model to generate the next example, promoting diversity in the generated outputs. Our experiments demonstrate the effectiveness of CGE in detecting novel content, such as toxic language, as well as new natural and programming languages. Furthermore, we show that CGE remains effective even when models are fine-tuned using differential privacy techniques.
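The core contrastive step can be illustrated with a minimal sketch. The toy logit vectors and vocabulary below are illustrative assumptions, not the paper's actual models or code: two "models" score the same next-token candidates, and the sampled token is the one whose probability grows most under fine-tuning, i.e. argmax over log p_ft(t) - log p_pre(t).

```python
import math

# Toy vocabulary standing in for a real tokenizer's vocabulary.
VOCAB = ["hello", "world", "toxic", "<eos>"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def contrastive_next_token(logits_pre, logits_ft):
    """Pick the token whose fine-tuned log-probability most exceeds
    its pre-trained log-probability: argmax_t [log p_ft(t) - log p_pre(t)].
    Tokens favored only by the pre-trained model score low, so the
    output surfaces what is *novel* in the fine-tuning data."""
    p_pre = softmax(logits_pre)
    p_ft = softmax(logits_ft)
    scores = [math.log(f) - math.log(p) for f, p in zip(p_ft, p_pre)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical logits: "hello" is the top token under both models,
# but the fine-tuned model gives "toxic" a large relative boost.
logits_pre = [3.0, 2.0, -2.0, 0.0]
logits_ft  = [3.0, 2.0,  1.5, 0.0]
print(VOCAB[contrastive_next_token(logits_pre, logits_ft)])  # -> toxic
```

In the iterative variant described above, each generated example would additionally be used to update the pre-trained model before the next contrastive step, so already-discovered phenomena stop scoring highly and later generations cover different parts of the dataset.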