
Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

Published 14 Mar 2024 in cs.CL (arXiv:2403.09167v1)

Abstract: The efficacy of LLMs is heavily dependent on the quality of the underlying data, particularly within specialized domains. A common challenge when fine-tuning LLMs for domain-specific applications is the potential degradation of the model's generalization capabilities. To address these issues, we propose a two-stage approach for the construction of production prompts designed to yield high-quality data. This method involves the generation of a diverse array of prompts that encompass a broad spectrum of tasks and exhibit a rich variety of expressions. Furthermore, we introduce a cost-effective, multi-dimensional quality assessment framework to ensure the integrity of the generated labeling data. Utilizing a dataset comprising service provider and customer interactions from the real estate sector, we demonstrate a positive correlation between data quality and model performance. Notably, our findings indicate that the domain-specific proficiency of general LLMs can be enhanced through fine-tuning with data produced via our proposed method, without compromising their overall generalization abilities, even when exclusively domain-specific data is employed for fine-tuning.


Summary

  • The paper introduces a two-stage evolutionary prompt data production method that enhances domain-specific fine-tuning while preventing capability collapse.
  • A weighted random sampling strategy and multifaceted quality evaluation framework ensure balanced data and robust model optimization.
  • Experimental results demonstrate significant task performance gains using domain-specific data, offering a scalable solution for LLM fine-tuning.


Introduction

The paper "Dial-insight: Fine-tuning LLMs with High-Quality Domain-Specific Data Preventing Capability Collapse" (2403.09167) addresses a critical challenge in applying LLMs to specialized domains: maintaining generalization capabilities while enhancing domain-specific proficiencies. This is particularly relevant for applications such as real estate services, where models must handle diverse, complex tasks without degrading their broader linguistic competencies. The authors propose a novel two-stage methodology for generating high-quality data prompts and a robust quality evaluation framework. This helps in fine-tuning LLMs with domain-specific data without the typical loss of generalization abilities.

Two-Stage Evolutionary Prompt Data Production Method

The proposed two-stage method for prompt generation is designed to construct complex, high-quality instructions that enhance model training in specialized domains.

  • Stage 1. Task Instruction Evolution: This stage generates seed instructions with a self-instruct approach, followed by manual curation to raise their quality. The instructions are then evolved semi-automatically, combining LLMs with evolutionary operators and refining the results through automated and human-guided passes (Figure 1). A code sketch covering both stages follows this list.

    Figure 1: The two-stage prompt evolution method.

  • Stage 2. Prompt Evolution: This stage further refines the prompts by integrating detailed format stipulations and task directives inspired by real-world professional scenarios. The evolution method balances complexity against practicality, yielding prompts that are both comprehensive and applicable to business contexts (Figure 2).

    Figure 2: Stage 2, the prompt evolution method.
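
The paper does not publish code, so the following is a minimal sketch of how the two-stage pipeline might be wired together. The helper `llm_complete`, the evolution templates, and the format stipulations are all illustrative assumptions: Stage 1 evolves task instructions from curated seeds, and Stage 2 wraps them with output-format and task directives.

```python
import random

def llm_complete(prompt: str) -> str:
    """Stand-in for a real LLM client (hypothetical helper).

    Returns the prompt's last line so the sketch runs offline; swap in an
    actual API call for real use.
    """
    return prompt.splitlines()[-1]

# Evolution operators in the spirit of WizardLM-style instruction evolution.
# The template wording is illustrative, not taken from the paper.
EVOLVE_TEMPLATES = [
    "Rewrite this instruction to require deeper domain reasoning:\n{inst}",
    "Add a realistic constraint from a real-estate service scenario:\n{inst}",
    "Rephrase this instruction with richer, more varied expression:\n{inst}",
]

def evolve_instructions(seeds, rounds=2):
    """Stage 1: evolve manually curated seed instructions semi-automatically."""
    pool = list(seeds)
    for _ in range(rounds):
        evolved = [
            llm_complete(random.choice(EVOLVE_TEMPLATES).format(inst=inst))
            for inst in pool
        ]
        # The paper interleaves automated and human-guided refinement here;
        # keeping only non-empty outputs is a placeholder for that curation.
        pool.extend(e for e in evolved if e.strip())
    return pool

# Assumed output-format stipulations; the paper derives these from real
# business scenarios rather than a fixed list.
FORMAT_STIPULATIONS = [
    "Answer as a JSON object with keys 'summary' and 'action_items'.",
    "Answer as a numbered list of at most five items.",
]

def compose_prompts(instructions, dialogues):
    """Stage 2: wrap evolved instructions with format and task directives."""
    return [
        f"{inst}\nOutput format: {random.choice(FORMAT_STIPULATIONS)}\nDialogue:\n{dlg}"
        for inst in instructions
        for dlg in dialogues
    ]

seeds = ["Summarize the customer's main request in one sentence."]
print(compose_prompts(evolve_instructions(seeds), ["Agent: Hello ..."])[0])
```

In a real pipeline, `llm_complete` would call an actual model, and the filter inside `evolve_instructions` would be replaced by the paper's combination of automated and human-guided curation.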

Data Label Generation and Quality Evaluation

To ensure balanced data distribution and targeted capability development, the authors employ weighted random sampling for task instruction selection. This enables diverse task representation and scene-specific enhancement, which is crucial for robust model development. The accompanying quality evaluation system assesses data along multiple dimensions (complexity, richness, and label quality) and provides actionable insight into the model's optimization direction, validated by a positive correlation with model training outcomes.
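
As a concrete illustration, here is a minimal sketch of weighted task sampling and multi-dimensional scoring, assuming each task type carries a target weight and each dimension is rated on a shared 0-1 scale. The task names, weights, and the `score_fn` interface are assumptions for illustration, not the paper's specification.

```python
import random

# Assumed target mix of task types; real weights would come from the
# desired capability coverage, which the paper does not enumerate.
TASK_WEIGHTS = {"summarization": 0.4, "intent_extraction": 0.35, "qa": 0.25}

def sample_task_instructions(instructions_by_task, n):
    """Weighted random sampling so the dataset matches the target task mix."""
    tasks = list(TASK_WEIGHTS)
    weights = [TASK_WEIGHTS[t] for t in tasks]
    picked = random.choices(tasks, weights=weights, k=n)
    return [random.choice(instructions_by_task[t]) for t in picked]

QUALITY_DIMENSIONS = ("complexity", "richness", "label_quality")

def quality_score(sample, score_fn):
    """Multi-dimensional assessment: average of per-dimension scores in [0, 1].

    `score_fn(dimension, sample)` stands in for however each dimension is
    rated (an LLM judge, heuristics, ...); the paper's rubric may differ.
    """
    scores = {d: score_fn(d, sample) for d in QUALITY_DIMENSIONS}
    scores["overall"] = sum(scores.values()) / len(QUALITY_DIMENSIONS)
    return scores

def filter_by_quality(samples, score_fn, threshold=0.7):
    """Keep only samples whose overall score clears a chosen threshold."""
    return [s for s in samples if quality_score(s, score_fn)["overall"] >= threshold]
```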

Experimental Results

Experiments demonstrate that fine-tuning with data produced by this method significantly improves domain-specific task performance without sacrificing general linguistic ability. Controlled experiments show that as quality metrics such as richness and label quality improve, so does the domain model's task performance. Unlike previous methodologies that require mixing in general data to prevent capability collapse, this approach maintains robust general capabilities using only domain-specific data (Figure 3).

Figure 3: Distribution of task types (a) and task output formats (b) in the test dataset; the vertical axis counts test samples.
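
The reported quality-performance relationship can be checked on one's own runs with a plain Pearson correlation between a per-subset quality metric (e.g. richness) and the resulting fine-tuned model's task score. The numbers below are made up for illustration; only the computation is general.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative, made-up measurements: richness score per training subset
# vs. task score of the model fine-tuned on that subset.
richness = [0.52, 0.61, 0.70, 0.78]
task_acc = [0.63, 0.66, 0.71, 0.74]
print(f"r = {pearson(richness, task_acc):.3f}")  # positive r supports the claim
```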

Conclusion

The paper contributes significantly to the domain adaptation of LLMs by offering a comprehensive framework for prompt generation and data quality evaluation. The two-stage prompt evolution method combined with cost-effective quality assessment metrics leads to enhanced domain-specific model performance while preserving general capabilities. This work has profound implications for real-world applications where domain-specific proficiency must coexist with general language competencies, demonstrating a scalable solution to a long-standing challenge in LLM fine-tuning. Future work could explore broader domain applications and further optimization of quality metrics to achieve even more efficient and effective fine-tuning.
