Latent Harmful Knowledge in Filtered Bio-Foundation Models
Determine whether the hidden-layer representations of open-weight bio-foundation models, pretrained on data from which eukaryotic viral sequences were excluded, still encode misuse-enabling biological knowledge, and whether that knowledge can be elicited with simple probing methods such as linear classifiers.
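A "simple probing method" here means training a lightweight classifier on frozen representations without updating the model itself. The minimal sketch below assumes hidden-layer activations have already been extracted into fixed-length vectors and that each vector carries a binary label for some property of interest. It uses a logistic-regression probe trained by SGD. The helper names (`train_linear_probe`, `make_synthetic_activations`) and the Gaussian data are hypothetical stand-ins. No real model, dataset, or label set from the source is used. Held-out accuracy well above chance would suggest the property is linearly decodable from the representations.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_probe(features, labels, lr=0.1, epochs=50, seed=0):
    """Fit a logistic-regression probe (w, b) by SGD on frozen feature vectors.

    The model that produced the features is never updated; only the probe is.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            err = sigmoid(dot(w, x) + b) - y  # d(log loss)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of examples where the probe's sign matches the label."""
    hits = sum((dot(w, x) + b > 0) == (y == 1) for x, y in zip(features, labels))
    return hits / len(labels)


def make_synthetic_activations(n, dim, seed):
    """HYPOTHETICAL stand-in for extracted hidden states: two Gaussian
    clusters whose means differ only along dimension 0."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 2.0 if label == 1 else -2.0
        xs.append(x)
        ys.append(label)
    return xs, ys


if __name__ == "__main__":
    X_train, y_train = make_synthetic_activations(200, 8, seed=0)
    X_test, y_test = make_synthetic_activations(200, 8, seed=1)
    w, b = train_linear_probe(X_train, y_train)
    print(f"held-out probe accuracy: {accuracy(w, b, X_test, y_test):.2f}")
```

In practice, a probe's accuracy is usually compared against a control, such as the same probe trained on shuffled labels or on a randomly initialized model. High probe accuracy alone does not show that the model itself uses the decoded information.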
References
Simple methods including probing have not yet been tried on these models, leaving open the possibility that latent representations still encode the necessary knowledge to enable misuse.
— Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
(arXiv:2510.27629, Wei et al., 31 Oct 2025), Section 1 (Introduction), paragraph on elicitation practices, page 2