Purifying Large Language Models by Ensembling a Small Language Model (2402.14845v1)
Abstract: The emerging success of LLMs relies heavily on collecting abundant training data from external (untrusted) sources. Despite substantial efforts devoted to data cleaning and curation, well-constructed LLMs have been reported to suffer from copyright infringement, data poisoning, and/or privacy violations, which impede their practical deployment. In this study, we propose a simple and easily implementable method for purifying LLMs of the negative effects caused by uncurated data: ensembling LLMs with benign, small language models (SLMs). Aside from theoretical guarantees, we perform comprehensive experiments to empirically confirm the efficacy of ensembling LLMs with SLMs, which effectively preserves the performance of LLMs while mitigating issues such as copyright infringement, data poisoning, and privacy violations.
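The core idea can be sketched as interpolating the next-token distributions of the large model and the benign small model at decoding time. The mixing weight `alpha`, the helper `ensemble_next_token`, and the toy vocabularies below are illustrative assumptions, not the paper's exact formulation.

```python
def ensemble_next_token(p_llm, p_slm, alpha=0.5):
    """Mix next-token distributions from a large and a small LM.

    p_llm, p_slm: dicts mapping token -> probability.
    alpha: interpolation weight on the large model (hypothetical parameter).
    Returns the renormalized mixture distribution.
    """
    vocab = set(p_llm) | set(p_slm)
    mixed = {t: alpha * p_llm.get(t, 0.0) + (1 - alpha) * p_slm.get(t, 0.0)
             for t in vocab}
    z = sum(mixed.values())  # renormalize (guards against missing mass)
    return {t: p / z for t, p in mixed.items()}

# Toy example: the LLM assigns high probability to a memorized continuation
# ("secret"), while the benign SLM assigns it negligible mass.
p_llm = {"secret": 0.7, "the": 0.2, "a": 0.1}
p_slm = {"secret": 0.01, "the": 0.6, "a": 0.39}

mixture = ensemble_next_token(p_llm, p_slm, alpha=0.5)
best = max(mixture, key=mixture.get)  # greedy pick from the mixture
```

In this toy setting the mixture demotes the memorized token ("secret" drops from 0.7 to 0.355) and the greedy choice becomes the benign token "the", illustrating how the SLM's distribution can dilute undesirable behavior while the large model still dominates ordinary predictions.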