Reliability of identifying misuse in closed foundation model monitoring for vulnerability detection

Establish whether monitoring and moderation of closed foundation model services can reliably identify illicit use of such models for automated vulnerability detection given the dual-use nature of security testing.

Background

The authors compare open and closed release paradigms and note that even with better monitoring capabilities in closed systems, distinguishing legitimate security testing from malicious automated vulnerability detection is nontrivial.

This reliability question affects the comparative assessment of marginal risk between open and closed model releases and highlights the need for evidence about the effectiveness of monitoring systems deployed by closed model providers.

References

In considering marginal risks relative to closed foundations, while closed foundation models can be better monitored for misuse, it is not clear if such uses will be reliably identified.

— On the Societal Impact of Open Foundation Models (2403.07918 - Kapoor et al., 27 Feb 2024) in Section: Risks of Open Foundation Models; Table: Instantiation of our risk analysis framework (Cybersecurity — Evidence of marginal risk)
