Logits of API-Protected LLMs Leak Proprietary Information (2403.09539v3)

Published 14 Mar 2024 in cs.CL, cs.AI, cs.CR, and cs.LG

Abstract: LLM providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API. In this work we show that, with only a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1000 USD for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We exploit this fact to unlock several capabilities, including (but not limited to) obtaining cheap full-vocabulary outputs, auditing for specific types of model updates, identifying the source LLM given a single full LLM output, and even efficiently discovering the LLM's hidden size. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.


Summary

  • The paper introduces a method that exploits the softmax bottleneck to extract proprietary model characteristics from API-protected LLMs with minimal queries.
  • It empirically estimates key parameters, including the embedding size of OpenAI's gpt-3.5-turbo (approximately 4,096), and shows that a model's output image acts as a unique signature.
  • The study highlights critical security risks and urges providers to enhance API safeguards while promoting transparency for LLM users.

Unraveling the Secrets of API-Protected LLMs through Their Softmax Bottleneck

Introduction to the Softmax Bottleneck Phenomenon

Recent advancements have propelled the development and commercialization of LLMs, which are now accessible primarily through high-level application programming interfaces (APIs). This paper presents a novel approach for uncovering a considerable amount of non-public information about an API-protected LLM by exploiting an inherent limitation of modern LLM architectures: the softmax bottleneck. Because the output layer projects a low-dimensional hidden state up to the full vocabulary, model outputs are confined to a linear subspace of the full output space. Analyzing this subspace reveals the model's image, a signature that enables applications ranging from identifying the LLM's hidden size to detecting model updates and estimating output-layer parameters, all at modest query cost.
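These applications all require full-vocabulary outputs, which most APIs do not expose directly; the paper recovers them by exploiting the API's logit-bias parameter. Below is a minimal sketch of the single-token case, assuming an OpenAI-style chat completions endpoint (`logit_bias`, `logprobs`, and `max_tokens` are real API parameters; the helper name and the choice of bias are illustrative). The inversion formula follows directly from the softmax definition.

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def unbiased_logprob(token_id: int, prompt: str, bias: float = 30.0) -> float:
    """Bias `token_id` upward so the model emits it, read its biased
    logprob, then invert the softmax shift to recover the unbiased value."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
        logit_bias={str(token_id): bias},  # keys are string token IDs
        temperature=0.0,
    )
    # With a large enough bias the sampled token is the biased one, and
    # its reported logprob is log q, where
    #   q = p * e^b / (1 + p * (e^b - 1))  for unbiased probability p.
    lq = resp.choices[0].logprobs.content[0].logprob
    # Invert for p in log space: log p = lq - log(e^b * (1 - q) + q).
    log_one_minus_q = math.log1p(-math.exp(lq))
    return lq - math.log(math.exp(bias + log_one_minus_q) + math.exp(lq))
```

One query per token is wasteful; in practice the cost can be amortized by biasing several tokens per query and reading them all from the returned top logprobs. Very large biases also saturate the returned logprob near zero and lose precision, which the paper addresses with a more careful construction.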

Theoretical Underpinnings and Empirical Validation

  • Theoretical Framework: At the core of the analysis is the observation that the output layer of an LLM, constrained by the softmax bottleneck, projects model outputs onto a low-dimensional subspace. In particular, the paper demonstrates that, given a limited number of API queries, it is possible to recover the dimension of this subspace, thereby gaining insight into architectural aspects of the model hidden behind the API.
  • Empirical Results: The empirical analysis underscores the method's efficacy: the authors estimate the embedding size of OpenAI's gpt-3.5-turbo to be approximately 4,096, from a remarkably small number of queries, demonstrating the approach's practical applicability (a toy version of the rank computation appears after this list). Moreover, the research shows that the image of an LLM can serve as a unique identifier, offering a novel way to distinguish between models or detect updates with a high degree of accuracy.
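The hidden-size estimate rests on simple linear algebra: if the output layer computes logits as W h with W of shape (V, d), then every full-vocabulary output lies in the d-dimensional column space of W, so a matrix of collected outputs stops gaining rank once more than d of them are stacked. Here is a toy numpy sketch with synthetic logits standing in for API-collected ones (V, d, and the query count are arbitrary demo values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for API data: a vocabulary-size-V model whose
# logits equal W @ h, so every output lies in a d-dimensional subspace.
V, d, n_queries = 5000, 128, 300         # small d so the demo runs fast
W = rng.standard_normal((V, d))          # output embedding (softmax) matrix
H = rng.standard_normal((d, n_queries))  # one hidden state per prompt
outputs = W @ H                          # (V, n_queries) full-vocab logits

# The numerical rank plateaus at the hidden size once n_queries > d.
print(np.linalg.matrix_rank(outputs))    # -> 128, the planted dimension
```

With a real API one observes logprobs rather than raw logits; the per-query log-normalizer shifts each output along the all-ones direction, so the observed rank can be d + 1 rather than d, a detail the analysis accounts for. In practice the authors inspect the singular values of the stacked outputs and look for the sharp drop that marks the edge of the model's image.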

Practical Implications and Future Prospects

  • LLM Users and Providers: The findings bridge a critical gap between LLM providers and users, suggesting that users can leverage this method to obtain greater accountability and transparency from providers, for instance by verifying which model is actually serving an API (see the sketch after this list). Conversely, it alerts providers to potential vulnerabilities, guiding them towards more secure deployment practices.
  • Security and Privacy: Highlighting the potential risks associated with the exposure of proprietary model details, the paper prompts a reevaluation of current API security measures. It proposes measures to mitigate these vulnerabilities without compromising the utility of the API features.
  • Future Research Directions: While the immediate applications of uncovering an LLM's image are profoundly insightful, the paper also lays the groundwork for future explorations. It opens avenues for more in-depth studies on model identification, update detection, and potentially developing more robust, privacy-preserving LLM architectures.
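As a concrete illustration of the accountability angle, a user who has already collected enough outputs to span a model's image can test whether a later API response still comes from the same model by projecting it onto that subspace. This is a hypothetical numpy sketch, not the paper's exact procedure; `matches_image` and the tolerance are illustrative:

```python
import numpy as np

def matches_image(known_outputs: np.ndarray, new_output: np.ndarray,
                  rel_tol: float = 1e-6) -> bool:
    """Check whether `new_output` (shape (V,)) lies numerically in the
    subspace spanned by the columns of `known_outputs` (shape (V, n))."""
    q, _ = np.linalg.qr(known_outputs)  # orthonormal basis for the image
    residual = new_output - q @ (q.T @ new_output)
    return np.linalg.norm(residual) <= rel_tol * np.linalg.norm(new_output)
```

A large residual proves the response was not produced by the audited output layer, for example after a silent model swap, while a near-zero residual is strong evidence of a match, since a generic vector in V dimensions almost never falls inside a d-dimensional subspace when d is much smaller than V.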

Conclusion

In conclusion, the investigation into the softmax bottleneck's impact on LLM outputs offers a compelling perspective on recovering proprietary information from API-protected models. While the technique may shift the current dynamics between LLM users and providers, the paper calls for a balanced approach to its use: one that ensures accountability and transparency while safeguarding the proprietary interests of LLM providers. As the field of generative AI continues to evolve, the discourse around ethical considerations, security, and privacy will benefit from the insights and methodologies proposed in this paper.
