DSG-KD: Knowledge Distillation from Domain-Specific to General Language Models

Published 23 Sep 2024 in cs.CL and cs.AI | (2409.14904v1)

Abstract: The use of pre-trained LLMs fine-tuned to address specific downstream tasks is a common approach in NLP. However, acquiring domain-specific knowledge via fine-tuning is challenging. Traditional methods involve pretraining LLMs using vast amounts of domain-specific data before fine-tuning for particular tasks. This study investigates emergency/non-emergency classification tasks based on electronic medical record (EMR) data obtained from pediatric emergency departments (PEDs) in Korea. Our findings reveal that existing domain-specific pre-trained LLMs underperform compared to general LLMs in handling N-lingual free-text data characteristics of non-English-speaking regions. To address these limitations, we propose a domain knowledge transfer methodology that leverages knowledge distillation to infuse general LLMs with domain-specific knowledge via fine-tuning. This study demonstrates the effective transfer of specialized knowledge between models by defining a general LLM as the student model and a domain-specific pre-trained model as the teacher model. In particular, we address the complexities of EMR data obtained from PEDs in non-English-speaking regions, such as Korea, and demonstrate that the proposed method enhances classification performance in such contexts. The proposed methodology not only outperforms baseline models on Korean PED EMR data, but also promises broader applicability in various professional and technical domains. In future works, we intend to extend this methodology to include diverse non-English-speaking regions and address additional downstream tasks, with the aim of developing advanced model architectures using state-of-the-art KD techniques. The code is available in https://github.com/JoSangYeon/DSG-KD.