ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization (2402.09320v1)

Published 14 Feb 2024 in cs.CL and cs.AI

Abstract: LLMs rely on Human Preference Alignment (HPA) to ensure the generation of safe content. Due to the heavy cost associated with fine-tuning, fine-tuning-free methods have emerged, typically modifying LLM decoding with external auxiliary methods. However, these methods do not essentially enhance the LLM itself. In this paper, we rethink the derivation procedures of DPO, based on which we conversely build an instant scorer using the states of the LLM before and after In-context Learning (ICL). Accordingly, we propose a novel approach called In-Context Direct Preference Optimization (ICDPO). It enables LLMs to borrow the HPA capabilities from superior LLMs with ICL, generating well-aligned responses as estimated by the aforementioned instant scorer, thereby enhancing the final performance. ICDPO can be further enhanced with a two-stage retriever and an upgraded scorer, both offering benefits. Extensive experiments show its effectiveness, particularly in outperforming two fine-tuning-free baselines, and it exhibits competitiveness with SFT + LoRA. We also conduct detailed analyses to offer comprehensive insights into ICDPO.
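
The core mechanism described in the abstract — a DPO-style "instant scorer" built from the LLM's states before and after In-context Learning — can be illustrated with a short sketch. The snippet below is not the authors' implementation: the model name, prompt layout, helper names, and `beta` are illustrative assumptions. It scores a candidate response by the gap between its log-probability when the model is conditioned on aligned demonstrations (the "after ICL", expert state) and on the bare prompt (the "before ICL", amateur state), mirroring DPO's implicit reward β·log(π/π_ref), and then keeps the best-scoring candidate.

```python
# Minimal sketch (not the authors' code) of a contrastive "instant scorer":
# rank candidate responses by how much more likely they become when the LLM
# is conditioned on aligned in-context demonstrations versus the bare prompt.
# Model name, beta, and the prompt layout are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM works here
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.eval()


@torch.no_grad()
def response_logprob(context: str, response: str) -> float:
    """Sum of token log-probs of `response` given `context`.

    Assumes the tokenization of `context` is a prefix of the tokenization
    of `context + response`, which holds for typical prompt formats.
    """
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(context + response, return_tensors="pt").input_ids
    logits = lm(full_ids).logits[:, :-1, :]          # position i predicts token i+1
    targets = full_ids[:, 1:]
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = logps.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_logps[:, ctx_len - 1:].sum().item()  # only the response span


def icdpo_score(prompt: str, response: str, demos: str, beta: float = 1.0) -> float:
    """DPO-style implicit reward: beta * (log p_after_ICL - log p_before_ICL)."""
    after_icl = response_logprob(demos + prompt, response)   # "expert" state
    before_icl = response_logprob(prompt, response)          # "amateur" state
    return beta * (after_icl - before_icl)


def best_of_n(prompt: str, demos: str, candidates: list[str]) -> str:
    """Generate candidates elsewhere, then keep the one the scorer ranks highest."""
    return max(candidates, key=lambda y: icdpo_score(prompt, y, demos))
```

In the paper's full pipeline, the demonstrations would be supplied by the proposed two-stage retriever from responses of a superior, already-aligned LLM; here `demos` is simply a pre-formatted string.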

Authors (5)
  1. Feifan Song (14 papers)
  2. Yuxuan Fan (10 papers)
  3. Xin Zhang (904 papers)
  4. Peiyi Wang (48 papers)
  5. Houfeng Wang (43 papers)