CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment (2403.16649v2)
Abstract: Reinforcement learning from human feedback (RLHF) is a crucial technique for aligning LLMs with human preferences, ensuring these models behave in ways that are beneficial and comprehensible to users. However, a longstanding challenge of reinforcement-learning-based human alignment lies in its inherent complexity and difficulty of training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) that aligns LLMs with human preferences directly. CLHA employs a novel rescoring strategy to assess the noise in the preference data based on its inherent quality and to dynamically adjust the training process. Simultaneously, CLHA combines a pairwise contrastive loss with an adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, yielding stronger alignment with human preferences. CLHA surpasses competing algorithms, showing superior performance in reward model scores, automatic evaluations, and human assessments on the widely used "Helpful and Harmless" dataset.
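The abstract describes CLHA as combining three ingredients: a rescoring of noisy preference pairs, a pairwise contrastive loss over preferred/dispreferred responses, and an adaptive supervised fine-tuning loss. The exact formulation is not given here, so the PyTorch sketch below is only an illustration under assumed definitions: `logp_chosen` and `logp_rejected` are length-normalized sequence log-probabilities from the policy model, `reward_gap` is a hypothetical reward-model score gap used as the rescoring signal, and the sigmoid weighting and margin term are choices made for illustration rather than the authors' actual losses.

```python
# Illustrative sketch only: the exact CLHA losses are not specified in the abstract.
import torch
import torch.nn.functional as F


def clha_style_loss(
    logp_chosen: torch.Tensor,    # (batch,) log p(chosen | prompt), length-normalized
    logp_rejected: torch.Tensor,  # (batch,) log p(rejected | prompt), length-normalized
    reward_gap: torch.Tensor,     # (batch,) reward-model score gap, rescoring signal (assumed)
    margin: float = 1.0,          # contrastive margin (assumed)
    sft_coef: float = 1.0,        # weight of the supervised fine-tuning term (assumed)
) -> torch.Tensor:
    # Rescoring: down-weight pairs whose reward gap is small (likely noisy labels).
    # The sigmoid mapping is an assumption for illustration.
    pair_weight = torch.sigmoid(reward_gap).detach()

    # Pairwise contrastive loss: push the chosen response's likelihood above the
    # rejected one's by at least `margin`.
    contrastive = F.relu(margin - (logp_chosen - logp_rejected))

    # Adaptive SFT loss: maximize the likelihood of the chosen response, modulated
    # by the same per-pair weight.
    sft = -logp_chosen

    return (pair_weight * (contrastive + sft_coef * sft)).mean()


if __name__ == "__main__":
    # Dummy usage with random tensors.
    b = 4
    loss = clha_style_loss(
        logp_chosen=torch.randn(b) - 1.0,
        logp_rejected=torch.randn(b) - 2.0,
        reward_gap=torch.rand(b),
    )
    print(loss.item())
```

The design point the abstract emphasizes is that both loss terms operate directly on the policy's likelihoods, avoiding the full RLHF pipeline (reward model rollouts plus PPO-style optimization) while still exploiting preference data.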
Authors:
- Feiteng Fang
- Liang Zhu
- Min Yang
- Xi Feng
- Jinchang Hou
- Qixuan Zhao
- Chengming Li
- Xiping Hu
- Ruifeng Xu