Practical risk of proprietary information extraction from output logits

Ascertain the real-world feasibility and severity of extracting proprietary model architecture information, such as the hidden dimension, from language model output logits under typical access interfaces, and identify mitigating measures.

Background

Grey-box access can enable deeper audits but may introduce risks, such as revealing architectural details via output logits.

Understanding the practical threat level of such extraction is necessary to balance research access against model providers’ security and IP concerns.
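
To make the concern concrete, the sketch below illustrates one way architectural details can leak in principle. It assumes (this is not stated in the source) that the model's final layer is a linear projection from a hidden state of width h to vocabulary logits, and that the interface returns full logit vectors. Under those assumptions, logit vectors stacked across many queries form a matrix whose rank is at most h, so counting its significant singular values estimates the hidden dimension. The function names, the simulated model, and the tolerance are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_logits(n_queries: int, vocab_size: int, hidden_dim: int, seed: int = 0) -> np.ndarray:
    """Simulate full logit vectors from a toy model (hypothetical stand-in for an API).

    Assumption: logits = hidden_state @ W.T, with W of shape (vocab_size, hidden_dim).
    """
    rng = np.random.default_rng(seed)
    unembedding = rng.standard_normal((vocab_size, hidden_dim))
    hidden_states = rng.standard_normal((n_queries, hidden_dim))
    return hidden_states @ unembedding.T  # shape: (n_queries, vocab_size)


def estimate_hidden_dim(logits: np.ndarray, rel_tol: float = 1e-8) -> int:
    """Estimate the hidden dimension as the numerical rank of the logit matrix.

    Counts singular values above rel_tol times the largest one. The tolerance
    is an illustrative choice; noisy or quantized logits would need a
    different threshold.
    """
    singular_values = np.linalg.svd(logits, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))
```

Calling `estimate_hidden_dim(simulate_logits(200, 1000, 64))` recovers the simulated width, provided the number of queries exceeds the hidden dimension. Real interfaces typically expose less than this (for example, only top-k log-probabilities, possibly with rate limits), which is part of why the practical threat level remains an open question.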

References

Meanwhile, the ability to view LLM output logits has been shown to be sufficient for extracting proprietary system information, including the model's hidden dimension, though it is unclear the extent to which this is a practical threat.

— Open Problems in Technical AI Governance (2407.14981 - Reuel et al., 2024) in Section 4.3.1 “Facilitation of Third-Party Access to Models”
