Characterizing factors that influence the benefits of continual pre-training

Determine how the performance gains from continual pre-training of pre-trained language models vary with (i) the amount of unlabeled data used for continual pre-training, (ii) the source domain of that unlabeled data, (iii) the downstream evaluation task, (iv) the resource richness of the target languages, and (v) the specific target model being adapted, particularly for domain-adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT).

Background

The paper investigates continual pre-training strategies—Domain-Adaptive Pre-Training (DAPT) and Task-Adaptive Pre-Training (TAPT)—for African languages and social media text, noting that prior work demonstrates benefits but leaves open how those benefits depend on key variables.

In the introduction, the authors explicitly state that it is unknown how the magnitude of benefits from continual pre-training varies as a function of factors such as unlabeled corpus size, domain choice, downstream task, language resource level, and the target model. This open question motivates their empirical study using the AfriSocial corpus and the AfroXLMR-Social model.

References

Moreover, it is unknown how the benefit of continual pre-training may vary with factors like the amount of unlabeled corpus, the source domain itself, the evaluation task, the resource richness of the target languages, and the trained target model.

AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text (2503.18247 - Belay et al., 24 Mar 2025) in Section 1 (Introduction)