
Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

Published 14 Mar 2024 in cs.CL (arXiv:2403.09167v1)

Abstract: The efficacy of LLMs is heavily dependent on the quality of the underlying data, particularly within specialized domains. A common challenge when fine-tuning LLMs for domain-specific applications is the potential degradation of the model's generalization capabilities. To address these issues, we propose a two-stage approach for the construction of production prompts designed to yield high-quality data. This method involves the generation of a diverse array of prompts that encompass a broad spectrum of tasks and exhibit a rich variety of expressions. Furthermore, we introduce a cost-effective, multi-dimensional quality assessment framework to ensure the integrity of the generated labeling data. Utilizing a dataset comprising service provider and customer interactions from the real estate sector, we demonstrate a positive correlation between data quality and model performance. Notably, our findings indicate that the domain-specific proficiency of general LLMs can be enhanced through fine-tuning with data produced via our proposed method, without compromising their overall generalization abilities, even when exclusively domain-specific data is employed for fine-tuning.


Summary

  • The paper introduces a two-stage evolutionary prompt data production method that enhances domain-specific fine-tuning while preventing capability collapse.
  • A weighted random sampling strategy and multifaceted quality evaluation framework ensure balanced data and robust model optimization.
  • Experimental results demonstrate significant task performance gains using domain-specific data, offering a scalable solution for LLM fine-tuning.


Introduction

The paper "Dial-insight: Fine-tuning LLMs with High-Quality Domain-Specific Data Preventing Capability Collapse" (2403.09167) addresses a critical challenge in applying LLMs to specialized domains: maintaining generalization capabilities while enhancing domain-specific proficiencies. This is particularly relevant for applications such as real estate services, where models must handle diverse, complex tasks without degrading their broader linguistic competencies. The authors propose a novel two-stage methodology for generating high-quality data prompts and a robust quality evaluation framework. This helps in fine-tuning LLMs with domain-specific data without the typical loss of generalization abilities.

Two-Stage Evolutionary Prompt Data Production Method

The proposed two-stage method for prompt generation is designed to construct complex, high-quality instructions that enhance model training in specialized domains.

  • Stage 1. Task Instruction Evolution: This stage generates seed instructions with a self-instruct approach, followed by manual curation to raise their quality. The instructions are then evolved semi-automatically, combining LLMs with evolutionary operators and refining the results through automated and human-guided passes (Figure 1). A code sketch covering both stages follows this list.

    Figure 1: The two-stage prompt evolution method.

  • Stage 2. Prompt Evolution: This stage further refines the prompts by integrating detailed format stipulations and task directives inspired by real-world professional scenarios. The evolution method balances complexity against practicality, yielding prompts that are both comprehensive and applicable to business contexts (Figure 2).

    Figure 2: Stage 2, the prompt evolution method.
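
The paper does not publish code, so the following is a minimal sketch of how the two-stage pipeline might be wired together. The helper `llm_complete`, the evolution templates, and the format stipulations are all illustrative assumptions: Stage 1 evolves task instructions from curated seeds, and Stage 2 wraps them with output-format and task directives.

```python
import random

def llm_complete(prompt: str) -> str:
    """Stand-in for a real LLM client (hypothetical helper).

    Returns the prompt's last line so the sketch runs offline; swap in an
    actual API call for real use.
    """
    return prompt.splitlines()[-1]

# Evolution operators in the spirit of WizardLM-style instruction evolution.
# The template wording is illustrative, not taken from the paper.
EVOLVE_TEMPLATES = [
    "Rewrite this instruction to require deeper domain reasoning:\n{inst}",
    "Add a realistic constraint from a real-estate service scenario:\n{inst}",
    "Rephrase this instruction with richer, more varied expression:\n{inst}",
]

def evolve_instructions(seeds, rounds=2):
    """Stage 1: evolve manually curated seed instructions semi-automatically."""
    pool = list(seeds)
    for _ in range(rounds):
        evolved = [
            llm_complete(random.choice(EVOLVE_TEMPLATES).format(inst=inst))
            for inst in pool
        ]
        # The paper interleaves automated and human-guided refinement here;
        # keeping only non-empty outputs is a placeholder for that curation.
        pool.extend(e for e in evolved if e.strip())
    return pool

# Assumed output-format stipulations; the paper derives these from real
# business scenarios rather than a fixed list.
FORMAT_STIPULATIONS = [
    "Answer as a JSON object with keys 'summary' and 'action_items'.",
    "Answer as a numbered list of at most five items.",
]

def compose_prompts(instructions, dialogues):
    """Stage 2: wrap evolved instructions with format and task directives."""
    return [
        f"{inst}\nOutput format: {random.choice(FORMAT_STIPULATIONS)}\nDialogue:\n{dlg}"
        for inst in instructions
        for dlg in dialogues
    ]

seeds = ["Summarize the customer's main request in one sentence."]
print(compose_prompts(evolve_instructions(seeds), ["Agent: Hello ..."])[0])
```

In a real pipeline, `llm_complete` would call an actual model, and the filter inside `evolve_instructions` would be replaced by the paper's combination of automated and human-guided curation.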

Data Label Generation and Quality Evaluation

To ensure balanced data distribution and targeted capability development, the authors employ weighted random sampling for task instruction selection. This enables diverse task representation and scene-specific enhancement, which is crucial for robust model development. The accompanying quality evaluation system assesses data along multiple dimensions (complexity, richness, and label quality) and provides actionable insight into the model's optimization direction, validated by a positive correlation with model training outcomes.
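
As a concrete illustration, here is a minimal sketch of weighted task sampling and multi-dimensional scoring, assuming each task type carries a target weight and each dimension is rated on a shared 0-1 scale. The task names, weights, and the `score_fn` interface are assumptions for illustration, not the paper's specification.

```python
import random

# Assumed target mix of task types; real weights would come from the
# desired capability coverage, which the paper does not enumerate.
TASK_WEIGHTS = {"summarization": 0.4, "intent_extraction": 0.35, "qa": 0.25}

def sample_task_instructions(instructions_by_task, n):
    """Weighted random sampling so the dataset matches the target task mix."""
    tasks = list(TASK_WEIGHTS)
    weights = [TASK_WEIGHTS[t] for t in tasks]
    picked = random.choices(tasks, weights=weights, k=n)
    return [random.choice(instructions_by_task[t]) for t in picked]

QUALITY_DIMENSIONS = ("complexity", "richness", "label_quality")

def quality_score(sample, score_fn):
    """Multi-dimensional assessment: average of per-dimension scores in [0, 1].

    `score_fn(dimension, sample)` stands in for however each dimension is
    rated (an LLM judge, heuristics, ...); the paper's rubric may differ.
    """
    scores = {d: score_fn(d, sample) for d in QUALITY_DIMENSIONS}
    scores["overall"] = sum(scores.values()) / len(QUALITY_DIMENSIONS)
    return scores

def filter_by_quality(samples, score_fn, threshold=0.7):
    """Keep only samples whose overall score clears a chosen threshold."""
    return [s for s in samples if quality_score(s, score_fn)["overall"] >= threshold]
```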

Experimental Results

Experiments demonstrate that fine-tuning with data produced by this method significantly improves domain-specific task performance without sacrificing general linguistic ability. Controlled experiments show that as quality metrics such as richness and label quality improve, so does the domain model's task performance. Unlike previous methodologies that require mixing in general data to prevent capability collapse, this approach maintains robust general capabilities using only domain-specific data (Figure 3).

Figure 3: Distribution of task types (a) and task output formats (b) in the test dataset; the vertical axis counts test samples.
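
The reported quality-performance relationship can be checked on one's own runs with a plain Pearson correlation between a per-subset quality metric (e.g. richness) and the resulting fine-tuned model's task score. The numbers below are made up for illustration; only the computation is general.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative, made-up measurements: richness score per training subset
# vs. task score of the model fine-tuned on that subset.
richness = [0.52, 0.61, 0.70, 0.78]
task_acc = [0.63, 0.66, 0.71, 0.74]
print(f"r = {pearson(richness, task_acc):.3f}")  # positive r supports the claim
```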

Conclusion

The paper contributes significantly to the domain adaptation of LLMs by offering a comprehensive framework for prompt generation and data quality evaluation. The two-stage prompt evolution method combined with cost-effective quality assessment metrics leads to enhanced domain-specific model performance while preserving general capabilities. This work has profound implications for real-world applications where domain-specific proficiency must coexist with general language competencies, demonstrating a scalable solution to a long-standing challenge in LLM fine-tuning. Future work could explore broader domain applications and further optimization of quality metrics to achieve even more efficient and effective fine-tuning.
