Preventing harmful fine‑tuning while allowing benign adaptation
Develop technical methods that make models resistant to fine‑tuning or other modifications for harmful tasks while preserving the ability to fine‑tune for legitimate uses.
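One direction explored in this research area is meta-learning-style "tamper-resistant" training: the model is optimized so that simulated fine-tuning on harmful data fails to reduce harmful-task loss, while simulated fine-tuning on benign data still succeeds. The sketch below illustrates that idea on a toy classifier under strong simplifying assumptions; it is a minimal illustration of the general pattern, not a method from the cited paper, and all names (`inner_sgd_step`, `synthetic_batch`, the clamp threshold, the task shifts) are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # requires PyTorch >= 2.0

def inner_sgd_step(model, params, batch, lr):
    """One differentiable simulated fine-tuning (SGD) step on `batch`."""
    x, y = batch
    loss = F.cross_entropy(functional_call(model, params, (x,)), y)
    grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
    return {n: p - lr * g for (n, p), g in zip(params.items(), grads)}

def synthetic_batch(task_shift):
    """Stand-in for real harmful / benign fine-tuning data (hypothetical)."""
    x = torch.randn(32, 16) + task_shift
    y = (x.sum(dim=1) > 0).long()
    return x, y

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    params = dict(model.named_parameters())

    # Simulate an adversary fine-tuning toward the "harmful" task:
    # we want their post-update loss to stay HIGH (resistance).
    adapted_h = inner_sgd_step(model, params, synthetic_batch(2.0), lr=1e-2)
    x_h, y_h = synthetic_batch(2.0)
    harm_loss_after = F.cross_entropy(
        functional_call(model, adapted_h, (x_h,)), y_h)

    # Simulate a legitimate user fine-tuning on a benign task:
    # their post-update loss should be LOW (adaptability preserved).
    adapted_b = inner_sgd_step(model, params, synthetic_batch(-2.0), lr=1e-2)
    x_b, y_b = synthetic_batch(-2.0)
    benign_loss_after = F.cross_entropy(
        functional_call(model, adapted_b, (x_b,)), y_b)

    # Outer objective: reward benign adaptability, penalize harmful
    # adaptability. Clamping keeps the harmful term from dominating by
    # blowing the loss up without bound (an assumed stabilizer).
    meta_loss = benign_loss_after - torch.clamp(harm_loss_after, max=4.0)
    opt.zero_grad()
    meta_loss.backward()
    opt.step()
```

Whether this kind of scheme scales to large models, and whether it survives stronger or longer adaptation attacks than the one-step inner loop simulated here, is precisely the open question this entry describes.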
References
An open question is whether there exist technical methods that restrict a model's amenability to being fine-tuned (or modified through other methods) for harmful uses, while retaining the ability to be modified for benign uses.
— Open Problems in Technical AI Governance
(Reuel et al., arXiv:2407.14981, 20 Jul 2024), Section 6.4.2, “Modification-Resistant Models”