Logits of API-Protected LLMs Leak Proprietary Information (2403.09539v3)
Abstract: LLM providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API. In this work we show that, with only a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1000 USD for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We exploit this fact to unlock several capabilities, including (but not limited to) obtaining cheap full-vocabulary outputs, auditing for specific types of model updates, identifying the source LLM given a single full LLM output, and even efficiently discovering the LLM's hidden size. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.
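The key observation above — that a softmax bottleneck confines a model's outputs to a low-dimensional linear subspace whose dimension reveals the hidden size — can be illustrated with a toy simulation. The sketch below is not the paper's exact procedure (names, sizes, and the synthetic model are illustrative assumptions): it fakes an API that returns full-vocabulary log-probabilities from a hidden state of size `d`, stacks many such outputs, and recovers `d` from the numerical rank of the stack.

```python
import numpy as np

rng = np.random.default_rng(0)
d, v, n = 64, 1000, 200  # hidden size, vocab size, number of collected outputs (toy values)

# Toy stand-in for an API-protected LLM with a softmax bottleneck:
# every logit vector is W @ h for a fixed output embedding matrix W (v x d),
# so all logits live in the d-dimensional column space of W.
W = rng.standard_normal((v, d))

def full_logprob_output(h):
    """Simulated full-vocabulary API output: log-softmax of W @ h."""
    logits = W @ h
    return logits - np.logaddexp.reduce(logits)

# Collect n full-vocabulary outputs (the paper obtains these via API queries).
L = np.stack([full_logprob_output(rng.standard_normal(d)) for _ in range(n)])

# Each log-prob vector is (W @ h) minus a constant times the all-ones vector,
# so the stack has numerical rank at most d + 1 even though v = 1000.
s = np.linalg.svd(L, compute_uv=False)
rank = int((s > s[0] * 1e-9).sum())
print(rank)  # prints 65, i.e. d + 1
```

Because the log-softmax only shifts each logit vector along the all-ones direction, the recovered rank is `d + 1` rather than `d`; either way, far fewer than `d + 2` linearly independent full outputs exist, which is what makes the hidden-size estimate (≈4096 for gpt-3.5-turbo) possible from a modest number of queries.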