How Useful is Continued Pre-Training for Generative Unsupervised Domain Adaptation? (2401.17514v2)

Published 31 Jan 2024 in cs.CL

Abstract: Recent breakthroughs in scale have enabled the emergence of powerful generative LLMs, and the ability to fine-tune these models on various tasks by casting them into prompts or instructions. In this landscape, the problem of Unsupervised Domain Adaptation (UDA), or the problem of leveraging knowledge from a labeled source domain to an unlabeled target domain, has been left behind, with recent UDA methods still addressing discriminative classification. In particular, two popular UDA approaches, involving Continued Pre-Training (CPT) and learning domain invariant representations, have been under-explored in the generative setting, signaling a gap. In this work, we evaluate the utility of CPT for generative UDA. We first perform an empirical evaluation to measure the trade-offs between CPT and strong methods promoting domain invariance. We further evaluate how well the benefits of CPT extend to different architectures, tuning methods and data regimes. We then motivate the use of CPT by studying to what degree it benefits classification performance on the target domain. Finally, we attempt to understand the mechanism by which CPT improves classification performance on the unlabeled target domain. Our findings suggest that the model implicitly learns the downstream task while predicting masked words informative to that task. Our work connects the body of UDA research with that of instruction tuning, enabling an initial step towards a wider applicability of modern LLMs.

Introduction

To address the challenge of domain adaptation in language models (LMs), a new paradigm within unsupervised domain adaptation (UDA) has emerged: prompt-based UDA. This approach uses prompt templates to recast discriminative predictions as generative tasks, enabling adaptation to the target domain without relying on domain-invariant representations or extended pre-training.
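To make the prompt-based setup concrete, the sketch below shows how a discriminative sentiment label can be recast as text generation through a template and a verbalizer. This is a minimal illustration; the template wording, label verbalizer, and the `generate` callable are assumptions, not the paper's exact prompts.

```python
# Minimal sketch of prompt-based classification: a discriminative label
# prediction is recast as conditional text generation. The template wording
# and label verbalizer are illustrative, not the paper's exact prompts.

def build_prompt(review: str) -> str:
    """Wrap an input text in an instruction-style template."""
    return (
        "Review: " + review + "\n"
        "Question: Is the sentiment of this review positive or negative?\n"
        "Answer:"
    )

VERBALIZER = {"positive": 1, "negative": 0}

def predict_label(generate, review: str) -> int:
    """`generate` is any text-generation callable (e.g., an instruction-tuned LM)."""
    answer = generate(build_prompt(review)).strip().lower()
    # Map the generated verbalizer token back to a class id; -1 means no match.
    return VERBALIZER.get(answer, -1)
```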

Methodology

The paper introduces FEUDA (Frustratingly Easy UDA), a method comprising two instruction-tuning tasks. The first task performs masked language modeling (MLM) on unlabeled data from both the source and target domains. The second task applies supervised instruction-tuning with labeled source data for classification. Together, these tasks bridge the gap between pre-training and adaptation, improving the LM's performance on the target domain.
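The sketch below illustrates how the two instruction-tuning tasks could be constructed as input-output pairs for a generative LM. The instruction wording, the `<mask>` sentinel, and the 15% masking rate are assumptions for illustration; they follow the paper's setup only in spirit.

```python
import random

# Minimal sketch of FEUDA's two instruction-tuning tasks. Instruction wording,
# mask sentinel, and masking rate are illustrative assumptions.

MASK_RATE = 0.15

def make_mlm_example(text: str) -> dict:
    """Task 1: masked-word prediction on unlabeled source/target text."""
    tokens = text.split()
    masked, targets = [], []
    for tok in tokens:
        if random.random() < MASK_RATE:
            masked.append("<mask>")
            targets.append(tok)
        else:
            masked.append(tok)
    return {
        "input": "Fill in the masked words: " + " ".join(masked),
        "output": " ".join(targets),
    }

def make_classification_example(text: str, label: str) -> dict:
    """Task 2: supervised classification instruction on labeled source text."""
    return {
        "input": "Classify the sentiment of this review as positive or negative: " + text,
        "output": label,
    }
```

Both kinds of examples are cast in the same instruction format, so a single generative LM can be tuned on the unlabeled MLM task and the labeled classification task without any architectural changes.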

Results

Extensive experiments on 24 real-world domain pairs demonstrate FEUDA's superiority over traditional domain-invariant methods. A noteworthy finding is that MLM within FEUDA augments the model's semantic and background knowledge of a domain, contributing positively to downstream classification tasks. The research reveals significant improvements in target-domain classification performance, even in few-shot learning scenarios and across various models and adaptation techniques.

Analysis and Extensions

The authors examine how MLM affects UDA by analyzing the selection of masked words and varying the masking rate. They find that masking both informative and uninformative words, identified through pointwise mutual information (PMI), is crucial for achieving high classification accuracy. They also explore the impact of different masking rates, finding that performance peaks at a 15% masking rate, while higher rates degrade target-domain classification.
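As a rough illustration of the word-selection analysis, the sketch below scores words by PMI with the class label, a common way to operationalize "informative" words. The document-level counting and add-one smoothing are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import math
from collections import Counter

# Illustrative PMI scoring: how strongly a word co-occurs with a class label,
# relative to what independence would predict. Add-one smoothing and
# document-level counts are assumptions made for this sketch.

def pmi_scores(docs, labels):
    """docs: list of token lists; labels: parallel list of class labels.
    Returns a dict mapping (word, label) -> PMI score."""
    word_counts, label_counts, joint_counts = Counter(), Counter(), Counter()
    for tokens, y in zip(docs, labels):
        label_counts[y] += 1
        for w in set(tokens):          # count each word once per document
            word_counts[w] += 1
            joint_counts[(w, y)] += 1
    n = len(docs)
    scores = {}
    for (w, y), c_wy in joint_counts.items():
        p_wy = (c_wy + 1) / (n + 1)
        p_w = (word_counts[w] + 1) / (n + 1)
        p_y = (label_counts[y] + 1) / (n + 1)
        scores[(w, y)] = math.log(p_wy / (p_w * p_y))
    return scores
```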

Conclusion

The paper concludes that domain invariance is not a necessity for prompt-based UDA, an insight that sets the stage for future exploration. FEUDA is a robust, competitive method that offers a simple yet effective solution to UDA challenges in LMs, and a promising direction for researchers and practitioners seeking better adaptability in real-world applications.

Authors
  1. Rheeya Uppaal
  2. Yixuan Li
  3. Junjie Hu