LOGO -- Long cOntext aliGnment via efficient preference Optimization (2410.18533v1)

Published 24 Oct 2024 in cs.CL and cs.AI

Abstract: Long-context models (LCMs) have shown great potential in processing long input sequences (even more than 100M tokens) conveniently and effectively. With significant progress, recent research has pointed out that LCMs can accurately locate token-level salient information within the context. Yet, the generation performance of these LCMs is far from satisfactory and may result in misaligned responses, such as hallucinations. To enhance the generation capability of LCMs, existing works have investigated the effects of data size and quality for both pre-training and instruction tuning. Though achieving meaningful improvement, previous methods fall short in either effectiveness or efficiency. In this paper, we introduce LOGO (Long cOntext aliGnment via efficient preference Optimization), a training strategy that first introduces preference optimization for long-context alignment. To overcome the GPU memory-bound issue caused by long sequences, LOGO employs a reference-free preference optimization strategy and adopts a position synthesis method to construct the training data. By training with only 0.3B tokens of data on a single 8×A800 GPU machine for 16 hours, LOGO allows the Llama-3-8B-Instruct-80K model to achieve performance comparable with GPT-4 on real-world long-context tasks while preserving the model's original capabilities on other tasks, e.g., language modeling and MMLU. Moreover, LOGO can extend the model's context window size while enhancing its generation performance.

Authors (5)
  1. Zecheng Tang (19 papers)
  2. Zechen Sun (2 papers)
  3. Juntao Li (89 papers)
  4. Qiaoming Zhu (15 papers)
  5. Min Zhang (630 papers)

Summary

LOGO — Long Context Alignment via Efficient Preference Optimization

In this paper, the authors introduce a training strategy called LOGO (Long cOntext aliGnment via efficient preference Optimization), aimed at improving the ability of long-context models (LCMs) to generate aligned responses for extensive input sequences. Despite their capability to locate salient information within lengthy contexts, LCMs often falter at generating coherent and accurate outputs, producing misaligned responses such as hallucinations or failures to follow instructions. This research proposes a systematic approach to enhancing the generative capabilities of LCMs without compromising their existing strengths.

Contribution and Methodology

The primary contribution of the paper is a method that incorporates an efficient preference optimization strategy into the training of LLMs for better long-context alignment. The authors highlight two main challenges of learning with long contexts: (1) the context dominates the prediction portion during training, diluting the effectiveness of the cross-entropy (CE) loss as a signal for optimizing generation capability; (2) processing such extensive inputs requires substantial GPU memory.
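As a back-of-the-envelope illustration of the first challenge (a sketch with made-up lengths, not the paper's analysis): when the CE loss is averaged uniformly over all tokens of a sample, the response tokens' share of the training signal shrinks toward zero as the context grows.

```python
def response_loss_share(context_len: int, response_len: int) -> float:
    """Fraction of a uniformly token-averaged CE loss that falls on the response."""
    return response_len / (context_len + response_len)

# Hypothetical lengths: a 200-token response appended to growing contexts.
for ctx in (2_000, 32_000, 80_000):
    print(f"context={ctx:>6}: response share of loss = "
          f"{response_loss_share(ctx, 200):.4f}")
```

At an 80K-token context, the response contributes well under 1% of the averaged loss, which is why a generation-focused objective is attractive.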

To address these, LOGO introduces the following:

  1. Preference Optimization Strategy: LOGO employs a preference optimization strategy that requires no reference model, maximizing the model's likelihood of preferred responses over dis-preferred (misaligned) ones by adjusting reward-based scores during learning.
  2. Modified Training Objective: The training objective emphasizes distinguishing correct outputs from misaligned responses such as hallucinations, and uses multiple dis-preference instances to refine the model further.
  3. Data Construction and Positional Index Synthesis: The data pipeline constructs training samples by chunking long input contexts to keep memory usage manageable. It then applies positional index synthesis, which lets the model simulate the effects of longer inputs without actually lengthening the training sequences, thereby conserving memory.
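A minimal sketch of what a reference-free, multi-negative preference loss can look like. It uses a length-normalized sequence log-likelihood as the reward (in the style of SimPO's reference-free reward); the function names, the `beta` value, and the exact form of the objective are illustrative assumptions, not LOGO's published formulation.

```python
import math

def avg_logprob(token_logprobs):
    """Length-normalized sequence log-likelihood, used as a reference-free reward."""
    return sum(token_logprobs) / len(token_logprobs)

def preference_loss(preferred, dispreferred_list, beta=2.0):
    """Sketch of a reference-free preference loss with multiple negatives.

    preferred: per-token log-probs of the aligned response.
    dispreferred_list: per-token log-prob lists for misaligned responses.
    Minimizing this pushes the preferred reward above every dis-preferred one.
    """
    r_w = avg_logprob(preferred)
    loss = 0.0
    for neg in dispreferred_list:
        r_l = avg_logprob(neg)
        # -log(sigmoid(beta * (r_w - r_l))), written via log1p for stability
        loss += math.log1p(math.exp(-beta * (r_w - r_l)))
    return loss / len(dispreferred_list)

good = [-0.1, -0.2, -0.15]       # aligned answer: high average log-prob
bad1 = [-1.0, -0.9, -1.2, -1.1]  # e.g., a hallucinated answer
bad2 = [-0.8, -1.5]              # e.g., an off-instruction answer
print(preference_loss(good, [bad1, bad2]))
```

Because no reference model's log-probs appear in the loss, only one model needs to be held in GPU memory, which is the point of going reference-free at long sequence lengths.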

Experimental Results and Evaluation

Extensive evaluation of LOGO across various tasks demonstrates its efficacy. Key findings include:

  • Performance Parity with Closed-source Models: The LOGO-trained Llama-3-8B-Instruct-80K model achieved performance comparable to proprietary models such as GPT-4 on real-world long-context tasks, a substantial achievement for open-source initiatives.
  • Efficient Resource Utilization: Training required only 0.3B tokens over 16 hours on a single 8×A800 GPU machine, demonstrating LOGO's efficiency compared to approaches that demand significantly more resources.
  • Improvement across a Variety of Tasks: Alongside long-context comprehension and generation, LOGO maintained performance on language modeling and knowledge benchmarks (e.g., MMLU) without hindering short-context tasks, avoiding the alignment tax often imposed by long-context training.
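The positional index synthesis from the data-construction step above is what makes this training budget possible: short chunks of real text are assigned scattered position ids so the model sees position ranges of a long window without the memory cost of a long sequence. The sketch below is a PoSE-style skip-wise scheme; the function name and the exact gap-sampling rule are illustrative assumptions, not LOGO's exact recipe.

```python
import random

def synthesize_positions(chunk_lens, target_len, rng):
    """Assign position ids to short chunks so they emulate a long context.

    Each chunk keeps contiguous positions internally, but random gaps are
    skipped between chunks so that ids fall anywhere in [0, target_len).
    """
    total = sum(chunk_lens)
    assert total <= target_len, "chunks must fit inside the target window"
    slack = target_len - total
    # Sorted random offsets decide how much slack is consumed before each chunk.
    offsets = sorted(rng.randint(0, slack) for _ in chunk_lens)
    positions, cursor, prev = [], 0, 0
    for length, off in zip(chunk_lens, offsets):
        cursor += off - prev          # skip ahead by this chunk's gap
        prev = off
        positions.append(list(range(cursor, cursor + length)))
        cursor += length
    return positions

# Three 4-token chunks pretend to span a 64-position context window.
print(synthesize_positions([4, 4, 4], 64, random.Random(0)))
```

Only 12 tokens are attended to in this toy example, yet the model is exposed to position ids drawn from a window roughly five times longer.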

Broader Implications and Future Work

LOGO sets a new paradigm for training LCMs by mitigating misalignment through its training methodology rather than by merely scaling context length or increasing the volume and quality of training data. The implications for AI research are significant: it points to training strategies that do not rely heavily on computational and data resources.

Future work may further refine models' error-pattern recognition and context comprehension, or extend LOGO-like strategies to other domains and architectures. Understanding and optimizing long-context processing could unlock new capabilities for LLMs, such as more advanced summarization and processing of large-scale inputs in domains like bioinformatics or legal document analysis.

In conclusion, this paper presents a substantive advance in aligning LCMs for long-context tasks, offering insights and methodology relevant to both academic exploration and practical deployment of LLMs.