
Model Leeching: An Extraction Attack Targeting LLMs (2309.10544v1)

Published 19 Sep 2023 in cs.LG, cs.AI, cs.CL, and cs.CR

Abstract: Model Leeching is a novel extraction attack targeting LLMs, capable of distilling task-specific knowledge from a target LLM into a reduced-parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively, for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability by using a model extracted via Model Leeching to stage ML attacks against a target LLM, resulting in an 11% increase in attack success rate when applied to ChatGPT-3.5-Turbo.
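The EM and F1 figures quoted in the abstract follow the standard SQuAD evaluation convention: answers are normalized (lowercased, with articles and punctuation stripped) before comparison, EM checks for an exact normalized match, and F1 measures token overlap. A minimal sketch of these metrics, written here for illustration and not taken from the paper's code:

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop articles and punctuation."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)   # remove English articles
    text = re.sub(r"[^\w\s]", "", text)           # strip punctuation
    return " ".join(text.split())                  # collapse whitespace


def exact_match(prediction: str, reference: str) -> bool:
    """EM: normalized strings must be identical."""
    return normalize(prediction) == normalize(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` is true after normalization, while `f1_score("eiffel tower paris", "eiffel tower")` gives partial credit for the extra token. The paper's 73% EM similarity figure would be the fraction of queries where the extracted model's answer exactly matches the target LLM's answer under this comparison.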
