Scope of conditions under which detectable finetuning biases appear or disappear
Characterize the conditions under which narrow finetuning produces or suppresses detectable activation differences on early tokens. Factors to examine include dataset composition and homogeneity, mixing with unrelated pretraining data, finetuning modality, and model architecture. The goal is to establish when these biases persist and when they vanish.
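As a rough illustration of the quantity in question, the sketch below averages the per-position difference between finetuned and base activations over the first k token positions, then reports the norm of that difference at each position. It assumes the hidden states at one layer have already been extracted for both models on the same inputs. The function names, the nested-list layout, and the toy values are hypothetical; this is not the cited paper's implementation.

```python
from math import sqrt
from typing import List

Vector = List[float]
# Assumed layout: activations[sample][position] is a hidden-state vector
# taken from one layer of the model.
Activations = List[List[Vector]]


def mean_early_token_difference(
    base: Activations, finetuned: Activations, k: int
) -> List[Vector]:
    """Average (finetuned - base) over samples for each of the first k positions."""
    if len(base) != len(finetuned) or not base:
        raise ValueError("need the same non-zero number of samples from both models")
    n = len(base)
    dim = len(base[0][0])
    mean_diff = [[0.0] * dim for _ in range(k)]
    for b_sample, f_sample in zip(base, finetuned):
        # Every sample is assumed to contain at least k positions.
        for pos in range(k):
            for j in range(dim):
                mean_diff[pos][j] += (f_sample[pos][j] - b_sample[pos][j]) / n
    return mean_diff


def l2_norm(v: Vector) -> float:
    """Euclidean norm of a vector."""
    return sqrt(sum(x * x for x in v))


# Toy data (hypothetical): 2 samples, 2 positions, 2 hidden dimensions.
base = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 2.0], [1.0, 1.0]]]
finetuned = [[[3.0, 4.0], [1.0, 1.0]], [[3.0, 6.0], [1.0, 1.0]]]

diff = mean_early_token_difference(base, finetuned, k=2)
norms = [l2_norm(d) for d in diff]
print(norms)
```

In this toy case the two models differ only at the first position, so the norm is nonzero there and zero at the second. Comparing such per-position norms across finetuning setups (for example, with and without unrelated data mixed in) is one simple way to ask whether a bias persists or vanishes, though a norm alone says nothing about what the difference encodes.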
References
Additionally, the underlying mechanisms that produce these detectable biases remain unclear, as does the scope of conditions under which they appear or disappear.
— Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
(2510.13900 - Minder et al., 14 Oct 2025) in Limitations and Future Work