Residual issues after mixing pretraining data into narrow finetuning corpora
Determine whether mixing unrelated pretraining data into a narrow finetuning corpus removes all finetuning artifacts, not only the measured activation-difference bias, and identify any finetuning-induced issues that persist under such mixing.
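For concreteness, the kind of mixing in question could be sketched as follows. This is a minimal illustration, not the procedure used in the cited paper: the function name `mix_corpora` and the `pretrain_per_finetune` parameter are hypothetical, and the mixing ratio is left to the caller because no ratio is stated here.

```python
import random


def mix_corpora(finetune_docs, pretrain_docs, pretrain_per_finetune=1.0, seed=0):
    """Return a shuffled list of all finetuning documents plus a random
    sample of unrelated pretraining documents.

    pretrain_per_finetune is a hypothetical knob: the number of pretraining
    documents added per finetuning document. The sample is capped at the
    number of pretraining documents available.
    """
    if pretrain_per_finetune < 0:
        raise ValueError("pretrain_per_finetune must be non-negative")

    rng = random.Random(seed)  # seeded so the mixture is reproducible
    finetune_docs = list(finetune_docs)
    pretrain_docs = list(pretrain_docs)

    n_extra = min(len(pretrain_docs),
                  int(len(finetune_docs) * pretrain_per_finetune))

    # Keep every finetuning document; sample pretraining ones without replacement.
    mixed = finetune_docs + rng.sample(pretrain_docs, n_extra)
    rng.shuffle(mixed)
    return mixed
```

Sweeping `pretrain_per_finetune` over several values and re-measuring the activation differences at each setting would be one way to probe whether the bias, or any other artifact, persists as the share of pretraining data grows.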
References
We suspect that these biases are a form of overfitting and find that mixing pretraining data into the finetuning corpus is enough to mostly remove this bias, but cannot be sure that there are no further issues.
— Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
(2510.13900 - Minder et al., 14 Oct 2025) in Abstract