
Reformatted Alignment (2402.12219v2)

Published 19 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: The quality of finetuning data is crucial for aligning LLMs with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper explores elevating the quality of existing instruction data to better align with human values, introducing a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence. This approach minimizes human annotation, hallucination, and the difficulty in scaling, remaining orthogonal to existing alignment techniques. Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs. Encouragingly, without introducing any additional data or advanced training techniques, and merely by reformatting the response, LLaMA-2-13B's mathematical reasoning ability on GSM8K can be improved from 46.77% to 56.63% in accuracy. Additionally, a mere 5% of ReAlign data yields a 67% boost in general alignment ability measured by the Alpaca dataset. This work highlights the need for further research into the science and mechanistic interpretability of LLMs. We have made the associated code and data publicly accessible to support future studies at https://github.com/GAIR-NLP/ReAlign.

Enhancing LLM Alignment with Reformatted Instruction Data

Introduction

The endeavor to align LLMs with human values and intentions has garnered significant interest within the artificial intelligence research community. Traditional methods, while effective, suffer from scalability challenges and potential factual inaccuracies. This paper introduces a novel approach, termed Reformatted Alignment (ReAlign), which aims to refine the quality of existing instruction data so that it better reflects human values and expectations, without the need for extensive human annotation or the risk of incorporating errors from LLM-generated data.

ReAlign Methodology

The ReAlign technique is a threefold process designed to enhance the alignment of LLMs through improved instruction data quality (a code sketch of the full pipeline follows the list):

  1. Criteria Definition: The desired response format for each scenario is first specified in natural language. The authors establish criteria for 46 distinct scenarios, enabling varied and comprehensive enhancement of instruction data.
  2. Retrieval Augmentation: For knowledge-intensive tasks, relevant external information is retrieved and incorporated, improving the factuality and informativeness of the responses.
  3. Reformatting: The original responses are rewritten to match the predefined criteria and integrate the collected evidence, ensuring the outputs are both well structured and substantiated. This pairing of explicitly articulated human preferences with LLM generative capabilities yields instruction data closely aligned with human values.
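
The following minimal Python sketch shows how these three steps could compose into a single data-rewriting pass. All names here (`CRITERIA`, `KNOWLEDGE_INTENSIVE`, `retrieve_evidence`, `rewrite_with_llm`) are illustrative assumptions rather than the authors' released API, and the example criteria are paraphrases, not quotations from the paper's criteria set.

```python
from typing import Callable

# Step 1: hand-written, natural-language criteria per task scenario.
# The paper defines criteria for 46 scenarios; the two below are
# illustrative assumptions only.
CRITERIA: dict[str, str] = {
    "math_reasoning": (
        "Solve the problem step by step, number each step, and end "
        "with a clearly marked final answer."
    ),
    "open_qa": (
        "Give a direct answer first, then supporting details grounded "
        "in the provided evidence."
    ),
}

# Scenarios treated as knowledge-intensive (an assumption for this sketch).
KNOWLEDGE_INTENSIVE = {"open_qa"}


def realign(
    instruction: str,
    response: str,
    scenario: str,
    retrieve_evidence: Callable[[str], list[str]],
    rewrite_with_llm: Callable[[str], str],
) -> str:
    """Rewrite one response so it matches the scenario's format criteria."""
    # Step 2: retrieval augmentation, applied only to knowledge-intensive tasks.
    evidence = (
        retrieve_evidence(instruction) if scenario in KNOWLEDGE_INTENSIVE else []
    )

    # Step 3: reformat the original response against the criteria and evidence.
    prompt = (
        f"Criteria: {CRITERIA[scenario]}\n"
        + (f"Evidence: {' '.join(evidence)}\n" if evidence else "")
        + f"Instruction: {instruction}\n"
        + f"Original response: {response}\n"
        + "Rewrite the response so that it satisfies the criteria while "
        "preserving its original content."
    )
    return rewrite_with_llm(prompt)
```

Passing the retriever and the rewriting model in as plain callables keeps the sketch independent of any particular search backend or LLM provider; in practice each would wrap whatever service the pipeline uses.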

Experimental Validation

ReAlign was empirically validated on both general and specialized datasets, demonstrating notable improvements in LLM performance across various benchmarks:

  • Mathematical reasoning accuracy on the GSM8K dataset rose from 46.77% to 56.63% for LLaMA-2-13B, underscoring the method's effectiveness at strengthening specific capabilities of LLMs (a hypothetical before/after example follows this list).
  • General alignment ability also improved markedly: a mere 5% of ReAlign data yielded a 67% boost as measured by the Alpaca dataset, indicating substantial gains from minimal data alteration.
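
To make the "merely reformatting the response" claim concrete, here is a hypothetical before/after pair in the spirit of the method. The question is a standard GSM8K training example; both response strings are illustrative assumptions, not entries from the released data.

```python
# Hypothetical illustration of the reformat-only change behind the GSM8K
# gains: the answer's content is preserved, but it is restructured into
# explicit, numbered steps with a clearly marked final answer.
original = {
    "question": (
        "Natalia sold clips to 48 of her friends in April, and then she "
        "sold half as many clips in May. How many clips did Natalia sell "
        "altogether in April and May?"
    ),
    "response": "She sold 48/2 = 24 clips in May, so 48 + 24 = 72 in total.",
}

reformatted = {
    "question": original["question"],
    "response": (
        "Step 1. Clips sold in April: 48.\n"
        "Step 2. Clips sold in May: 48 / 2 = 24.\n"
        "Step 3. Total clips sold: 48 + 24 = 72.\n"
        "Final answer: 72"
    ),
}
```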

Implications and Future Directions

This research underscores the critical role of data quality in aligning LLMs with human values, and the potential of ReAlign as a scalable and effective way to improve it. The significant improvements observed across diverse datasets and benchmarks highlight not only the method's effectiveness but also its adaptability to various types of instruction data.

The practical implications of this research are several. By improving the alignment of LLMs without extensive human intervention or the incorporation of potentially erroneous LLM-generated data, ReAlign paves the way for more efficient and reliable development of aligned models. The approach holds promise for a wide range of applications, from tailored content generation to advanced problem solving, wherever alignment with human intent and values is paramount.

Looking ahead, this paper proposes several avenues for future research, including expanding the scope of task categories covered by the Reformatted Alignment method and exploring its applicability to multi-turn conversation scenarios. The open-source availability of the associated code and datasets further facilitates the exploration of these and other research trajectories, contributing to the advancement of the field.

Conclusion

The Reformatted Alignment method represents a substantial step forward in the effort to align LLMs with human values through improved instruction data quality. Its simple yet effective approach yields significant gains in both general and task-specific alignment, marking meaningful progress toward human-aligned artificial intelligence.

Authors (8)
  1. Run-Ze Fan (9 papers)
  2. Xuefeng Li (36 papers)
  3. Haoyang Zou (9 papers)
  4. Junlong Li (22 papers)
  5. Shwai He (23 papers)
  6. Ethan Chern (11 papers)
  7. Jiewen Hu (5 papers)
  8. Pengfei Liu (191 papers)
Citations (7)