Language Model Adaptation to Specialized Domains through Selective Masking based on Genre and Topical Characteristics (2402.12036v2)
Abstract: Recent advances in pre-trained language modeling have facilitated significant progress across various NLP tasks. Word masking during model training constitutes a pivotal component of language modeling in architectures like BERT. However, the prevalent method of word masking relies on random selection, potentially disregarding domain-specific linguistic attributes. In this article, we introduce an innovative masking approach leveraging genre and topicality information to tailor language models to specialized domains. Our method incorporates a ranking process that prioritizes words based on their significance, subsequently guiding the masking procedure. Experiments conducted using continual pre-training within the legal domain have underscored the efficacy of our approach on the LegalGLUE benchmark in the English language. Pre-trained language models and code are freely available for use.
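As a minimal sketch of the idea described above, the snippet below replaces uniform random masking with importance-guided masking: tokens are ranked by a per-word significance score and the top-ranked positions within the masking budget are masked. The scoring shown here (a hypothetical dictionary of TF-IDF-like weights) and the names `selective_mask` and `MASK_RATIO` are illustrative assumptions, not the paper's exact genre- and topicality-based ranking.

```python
import random

MASK_TOKEN = "[MASK]"   # BERT-style mask token
MASK_RATIO = 0.15       # standard masking budget

def selective_mask(tokens, importance, mask_ratio=MASK_RATIO, mask_token=MASK_TOKEN):
    """Mask the highest-ranked tokens instead of a uniform random sample.

    `importance` maps each token to a significance score; how that score is
    computed (e.g. TF-IDF-style statistics over a domain corpus) is an
    assumption here, not the paper's exact genre/topicality formula.
    """
    n_to_mask = max(1, int(len(tokens) * mask_ratio))
    # Rank token positions by descending importance; break ties randomly so
    # repeated epochs do not always mask the identical positions.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: (-importance.get(tokens[i], 0.0), random.random()))
    to_mask = set(ranked[:n_to_mask])
    masked = [mask_token if i in to_mask else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in to_mask else None for i, tok in enumerate(tokens)]
    return masked, labels

# Example: domain-salient words ("plaintiff", "statute") outrank generic ones.
tokens = "the plaintiff filed a motion under the statute".split()
scores = {"plaintiff": 8.2, "statute": 7.9, "motion": 5.1, "filed": 2.3}
print(selective_mask(tokens, scores))
```

With a larger budget the same ranking simply extends further down the list, so generic function words are only masked once the domain-salient vocabulary is exhausted.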
- Linguistically informed masking for representation learning in the patent domain. In 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech2021), New York, NY, USA. Association for Computing Machinery.
- UniLMv2: Pseudo-masked language models for unified language model pre-training.
- Douglas Biber and Susan Conrad. 2019. Register, Genre, and Style. Cambridge University Press.
- LEGAL-BERT: The Muppets straight out of Law School. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898–2904, Online. Association for Computational Linguistics.
- LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15513–15535, Toronto, Canada. Association for Computational Linguistics.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4310–4330, Dublin, Ireland. Association for Computational Linguistics.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- REALM: Retrieval-augmented language model pre-training.
- Nicolas Hernandez and Brigitte Grau. 2003. Automatic extraction of meta-descriptors for text description. In International Conference on Recent Advances In Natural Language Processing (RANLP), Borovets, Bulgaria.
- Ken Hyland. 1998. Persuasion and context: The pragmatics of academic metadiscourse. Journal of pragmatics, 30(4):437–455.
- Karen Spärck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28:11–21.
- SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77.
- Continual pre-training of language models. In The Eleventh International Conference on Learning Representations.
- DrBERT: A robust pre-trained model in French for biomedical and clinical domains. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16207–16221, Toronto, Canada. Association for Computational Linguistics.
- PMI-Masking: Principled masking of correlated spans.
- Yian Li and Hai Zhao. 2021. Pre-training universal language representation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5122–5133, Online. Association for Computational Linguistics.
- RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
- ERNIE: Enhanced Representation through Knowledge Integration. arXiv:1904.09223.
- Pretrained language model in continual learning: A comparative study. In International Conference on Learning Representations.
- Learning Better Masking for Better Language Model Pre-training. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7255–7267, Toronto, Canada. Association for Computational Linguistics.
- OPT: Open pre-trained transformer language models.