Characterizing factors that influence the benefits of continual pre-training
Determine how the performance gains from continual pre-training of pre-trained language models vary with (i) the amount of unlabeled data used for continual pre-training, (ii) the source domain of that unlabeled data, (iii) the downstream evaluation task, (iv) the resource richness of the target languages, and (v) the specific target model being adapted, particularly in the context of domain-adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT).
References
Moreover, it is unknown how the benefit of continual pre-training may vary with factors like the amount of unlabeled corpus, the source domain itself, the evaluation task, the resource richness of the target languages, and the trained target model.
— AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text
(2503.18247 - Belay et al., 24 Mar 2025) in Section 1 (Introduction)
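The intuition behind the question — that the gain from continual pre-training depends on how much in-domain data is mixed in and how close it is to the evaluation text — can be illustrated with a deliberately minimal toy sketch. The corpora, the unigram language model, and the perplexity comparison below are all hypothetical illustrations, not the setup used in the cited paper (which adapts transformer models such as AfroXLMR with masked-language-modeling objectives):

```python
import math
from collections import Counter

def train_unigram(tokens, vocab, alpha=1.0):
    """Laplace-smoothed unigram LM: maps each vocab token to a probability."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def perplexity(model, tokens):
    """Perplexity of the model on a held-out token sequence (lower is better)."""
    return math.exp(-sum(math.log(model[w]) for w in tokens) / len(tokens))

# Toy corpora (hypothetical): generic "pre-training" text vs. target-domain text.
general = "the cat sat on the mat and the dog ran home".split()
domain = "model adapts to social media text model learns informal text".split()
vocab = set(general) | set(domain)

# "Pre-trained" model: fit on general text only.
base = train_unigram(general, vocab)

# "Continual pre-training": continue fitting with domain data mixed in;
# repeating the domain corpus mimics using more unlabeled in-domain data.
adapted = train_unigram(general + domain * 3, vocab)

# Evaluate both on held-out in-domain text: the adapted model should win,
# and the margin grows with the amount and closeness of the domain data.
held_out = "model adapts to informal text".split()
print(f"base ppl:    {perplexity(base, held_out):.2f}")
print(f"adapted ppl: {perplexity(adapted, held_out):.2f}")
```

Varying the repetition factor (data amount) or swapping `domain` for a less related corpus (source domain) changes the size of the perplexity gap, mirroring two of the factors the question asks about.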