CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment (2403.16649v2)

Published 25 Mar 2024 in cs.AI

Abstract: Reinforcement learning from human feedback (RLHF) is a crucial technique in aligning LLMs with human preferences, ensuring these LLMs behave in beneficial and comprehensible ways to users. However, a longstanding challenge in human alignment techniques based on reinforcement learning lies in their inherent complexity and difficulty in training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) to align LLMs with human preferences directly. CLHA employs a novel rescoring strategy to evaluate the noise within the data by considering its inherent quality and dynamically adjusting the training process. Simultaneously, CLHA utilizes pairwise contrastive loss and adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, ensuring enhanced alignment with human preferences. Using advanced methods, CLHA surpasses other algorithms, showcasing superior performance in terms of reward model scores, automatic evaluations, and human assessments on the widely used "Helpful and Harmless" dataset.
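The abstract describes CLHA as combining a data-rescoring step with a pairwise contrastive loss and an adaptive supervised fine-tuning (SFT) loss over preferred/dispreferred response pairs. The sketch below shows one way such an objective could be wired together in PyTorch; the hinge-style contrastive term, the `quality_weight` rescoring factor, and the `beta`/`margin` hyperparameters are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def clha_style_loss(logp_chosen, logp_rejected, quality_weight,
                    beta=1.0, margin=1.0):
    """Hedged sketch of a pairwise-contrastive + adaptive SFT objective.

    logp_chosen / logp_rejected: length-normalized sequence log-probabilities
        of the preferred and dispreferred responses under the policy model,
        shape [batch].
    quality_weight: a per-pair rescoring weight (e.g. derived from a reward
        model) used here to down-weight noisy pairs -- an assumption standing
        in for the paper's rescoring strategy.
    """
    # Pairwise contrastive term: push the chosen response's likelihood
    # above the rejected one's by at least `margin`.
    contrastive = F.relu(margin - (logp_chosen - logp_rejected))

    # Adaptive SFT term: maximize likelihood of the chosen response,
    # scaled by the (assumed) per-pair quality weight.
    sft = -quality_weight * logp_chosen

    return (contrastive + beta * sft).mean()


# Toy usage with fake log-probabilities for a batch of two pairs.
logp_c = torch.tensor([-12.0, -8.5])
logp_r = torch.tensor([-11.0, -15.0])
weights = torch.tensor([0.3, 0.9])
print(clha_style_loss(logp_c, logp_r, weights))
```

In this sketch the contrastive term only fires when the rejected response is nearly as likely as the chosen one, while the quality weight scales how strongly each pair contributes to the SFT term.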

Authors (9)
  1. Feiteng Fang (12 papers)
  2. Liang Zhu (22 papers)
  3. Min Yang (239 papers)
  4. Xi Feng (17 papers)
  5. Jinchang Hou (3 papers)
  6. Qixuan Zhao (4 papers)
  7. Chengming Li (28 papers)
  8. Xiping Hu (46 papers)
  9. Ruifeng Xu (66 papers)