Effectiveness of Pretraining Data Filtering Against Adversarial Fine-Tuning
Determine how effectively pretraining data filtering, that is, excluding dual-use biological sequences (such as eukaryotic viral data) from the training data of open-weight bio-foundation models, mitigates dual-use risks, particularly under a threat model in which adversaries can fine-tune the released model weights for malicious use.
References
However, the effectiveness of such an approach remains unclear, particularly against determined actors who might fine-tune these models for malicious use.
— Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
(arXiv:2510.27629, Wei et al., 31 Oct 2025), Abstract (page 1)