Practical risk of proprietary information extraction from output logits
Determine how feasible and how severe it is, in practice, to extract proprietary model architecture information, such as the hidden dimension, from language model output logits under typical access interfaces, and identify mitigating measures.
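To make the concern concrete, here is a minimal sketch of why full logit vectors can reveal the hidden dimension. It assumes the logits are a linear map of a final hidden state (logits = hidden state times an unembedding matrix), so a stack of logit vectors from many queries has rank at most the hidden dimension. Everything here is simulated with random matrices: the names `simulate_logits`, `estimate_hidden_dim`, and `rel_tol` are hypothetical, no real API is queried, and this is not claimed to be the exact procedure used in the work the problem statement alludes to.

```python
import numpy as np


def simulate_logits(n_queries, hidden_dim, vocab_size, seed=0):
    """Stand-in for querying a model: random hidden states times a random
    unembedding matrix. A real attack would collect logits from an API."""
    rng = np.random.default_rng(seed)
    hidden_states = rng.standard_normal((n_queries, hidden_dim))
    unembedding = rng.standard_normal((hidden_dim, vocab_size))
    return hidden_states @ unembedding  # shape: (n_queries, vocab_size)


def estimate_hidden_dim(logits, rel_tol=1e-8):
    """Estimate the numerical rank of the stacked logit matrix by counting
    singular values above a relative threshold (rel_tol is an assumption
    tuned for this noiseless float64 simulation)."""
    singular_values = np.linalg.svd(logits, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


# The number of queries must exceed the hidden dimension for the rank to show.
logits = simulate_logits(n_queries=256, hidden_dim=64, vocab_size=1000)
print(estimate_hidden_dim(logits))  # recovers 64 in this simulation
```

How well this carries over to deployed systems is exactly the open question: real interfaces often expose only top-k log-probabilities, use reduced precision, or add noise, all of which blur the singular-value gap. If an interface returns log-probabilities rather than raw logits, each row is shifted by a constant, which can raise the observed rank by at most one.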
References
Meanwhile, the ability to view LLM output logits has been shown to be sufficient for extracting proprietary system information, including the model's hidden dimension, though it is unclear the extent to which this is a practical threat.
Source: Open Problems in Technical AI Governance (Reuel et al., arXiv:2407.14981, 20 Jul 2024), Section 4.3.1, “Facilitation of Third-Party Access to Models”