Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
The paper introduces Kun, a methodology for improving the instruction tuning of LLMs on Chinese text without the manually annotated datasets that are typically resource-intensive to produce. The authors propose a self-training approach that combines instruction back-translation with answer polishment (AP) to generate a high-quality instruction-following dataset from unlabelled sources such as Wudao, Wanjuan, and SkyPile. The primary aim is to automatically curate and refine large datasets that improve LLMs' instruction-following performance.
Methodology
Kun employs a self-curation strategy that adapts a self-training algorithm, integrating the processes of instruction back-translation and answer polishment. Back-translation infers a plausible instruction from an unlabelled passage; AP then rewrites the passage so that it directly answers that instruction, bridging the gap between raw text and its corresponding output and yielding more contextually coherent instruction-response pairs. By reducing reliance on manual annotation, the method offers a scalable route to building instruction-following capabilities.
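The two core operations can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm` stands in for any text-generation callable, and the prompt wording is hypothetical rather than the authors' actual templates.

```python
from typing import Callable, List, Tuple

def back_translate(llm: Callable[[str], str], passage: str) -> str:
    # Instruction back-translation: infer an instruction that the
    # unlabelled passage would plausibly answer.
    prompt = f"Write the instruction that the following text best answers:\n\n{passage}"
    return llm(prompt)

def polish_answer(llm: Callable[[str], str], instruction: str, passage: str) -> str:
    # Answer polishment (AP): rewrite the raw passage so it directly
    # and coherently answers the inferred instruction.
    prompt = (
        f"Instruction: {instruction}\n"
        f"Draft answer: {passage}\n\n"
        "Rewrite the draft so it directly and fully answers the instruction."
    )
    return llm(prompt)

def build_pairs(llm: Callable[[str], str], corpus: List[str]) -> List[Tuple[str, str]]:
    # Convert unlabelled passages into (instruction, polished answer) pairs.
    pairs = []
    for passage in corpus:
        instruction = back_translate(llm, passage)
        pairs.append((instruction, polish_answer(llm, instruction, passage)))
    return pairs
```

In practice `llm` would wrap a model such as Yi-6B; here it is left abstract so the pipeline shape is visible independently of any model API.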
Experiments and Results
Empirical evaluations were conducted with the 6-billion-parameter Yi model, chosen for its open-source accessibility and reliable performance. The experiments span standard benchmarks such as C-EVAL and CMMLU, focusing on the effectiveness of the instruction datasets produced by Kun. Human evaluation used 500 prompts from ShareGPT-zh covering a variety of tasks, comparing model outputs against those of other LLMs.
Notably, the Kun-52k variant outperformed the other models, producing higher-quality outputs as judged by human evaluators. A key finding behind the methodology's success was that scoring the instruction component alone predicted final data quality better than scoring the instruction and response jointly.
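Instruction-only curation under that finding can be sketched as below; the scoring function and keep fraction are illustrative stand-ins, not the paper's actual quality scorer or thresholds.

```python
from typing import Callable, List, Tuple

def curate(
    pairs: List[Tuple[str, str]],
    score_instruction: Callable[[str], float],
    keep_fraction: float = 0.5,
) -> List[Tuple[str, str]]:
    # Score each pair by its instruction alone (the component the paper
    # found most predictive of final data quality) and keep the top slice.
    ranked = sorted(pairs, key=lambda p: score_instruction(p[0]), reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]
```

In a real pipeline `score_instruction` would itself be an LLM-based judge; ranking by the instruction alone avoids spending a second scoring pass on every response.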
Contributions
The notable contributions of this paper include:
- Algorithmic Advancement: The introduction of the answer polishment (AP) process improves data coherence and clarity, yielding a larger, higher-quality dataset for fine-tuning.
- Scalable Data Generation: Over one million Chinese instructional data points were curated from unlabelled data, challenging the traditional need for extensive human labor in data annotation processes.
Implications and Future Directions
Practically, the development of Kun suggests a scalable and efficient route for enhancing the instruction-following capabilities of LLMs, with wide-ranging applicability across fields that rely on them. Theoretically, it motivates further research into data generation methods that operate without costly manual annotation. Future work could apply Kun-like strategies to other languages and domains, further expanding the methodological reach of AI in global contexts.
Overall, Kun represents a significant shift in the methodology of training LLMs, presenting a potentially impactful alternative to current data annotation practices. It opens avenues for broader application and scalability in AI, providing a useful template for similar challenges in the ever-expanding field of language processing technologies.