Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging (2310.11564v1)

Published 17 Oct 2023 in cs.CL

Abstract: While Reinforcement Learning from Human Feedback (RLHF) aligns LLMs with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong single-objective baselines, we show that we can achieve personalized alignment by decomposing preferences into multiple dimensions. These dimensions are defined based on personalizations that are declared as desirable by the user. In this work, we show that they can be efficiently trained independently in a distributed manner and combined effectively post-hoc through parameter merging. The code is available at https://github.com/joeljang/RLPHF.

Analyzing Personalized Alignment of LLMs via Parameter Merging

The paper "Personalized Soups: Personalized LLM Alignment via Post-hoc Parameter Merging" proposes a novel approach to align LLMs with diverse individual preferences. Unlike standard Reinforcement Learning from Human Feedback (RLHF), which generally optimizes for aggregate human preferences, this paper addresses the Reinforcement Learning from Personalized Human Feedback (RLP\mathcal{P}HF) by modeling it as a Multi-Objective Reinforcement Learning (MORL) problem. The significance of this work lies in its innovative methodology, termed "Personalized Soups," and its implications for the future interplay between AI systems and human nuances.

Core Contributions

  1. MORL for Personalized Alignment: The authors handle alignment to individual preferences as a MORL problem, which allows the model to dynamically adjust the weighting of multiple, sometimes conflicting, human preferences. This multi-objective approach contrasts with single-objective models and points to a more nuanced, user-driven interaction with LLMs.
  2. Personalized Soups: A key novelty of this paper is Personalized Soups, a method for post-hoc parameter merging. The model parameters are not optimized simultaneously for all preferences; instead, a separate policy is trained independently for each preference dimension using Proximal Policy Optimization (PPO), and the parameters are merged at inference time (see the sketch after this list). This modular approach reduces the training cost from exponential in the number of preference combinations to linear in the number of preference dimensions.
  3. Empirical Validation: The paper presents empirical results demonstrating the efficacy of framing the alignment of LLMs to human preferences as a MORL problem, achieving more personalized and adaptable model outputs than traditional approaches such as fine-tuning, standard RLHF, and simple prompting.
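
To make the merging step concrete, here is a minimal sketch of post-hoc parameter averaging in PyTorch. It assumes each preference expert has already been trained (e.g., with PPO) and saved as a state dict; `merge_soup`, `expert_paths`, and the checkpoint names are illustrative, not the authors' actual API.

```python
# Minimal sketch of post-hoc parameter merging (a "personalized soup").
# Assumptions: each preference expert was trained separately (e.g., with PPO)
# and saved as a PyTorch state_dict with identical parameter names.
from typing import Dict, List, Optional
import torch


def merge_soup(state_dicts: List[Dict[str, torch.Tensor]],
               weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Weighted average of independently trained preference experts."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)  # uniform merge
    assert abs(sum(weights) - 1.0) < 1e-6, "merge weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged


# Example: a user who declares a preference for "concise" and "friendly" responses.
expert_paths = ["expert_concise.pt", "expert_friendly.pt"]  # hypothetical checkpoints
experts = [torch.load(p, map_location="cpu") for p in expert_paths]
merged_params = merge_soup(experts)          # done post-hoc, no joint retraining
# policy.load_state_dict(merged_params)      # then load into the base policy model
```

The same averaging principle applies whether the experts are full models or parameter-efficient modules such as LoRA adapters; only the set of tensors being averaged changes.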

Theoretical and Practical Implications

  • Scalability and Flexibility: One of the standout implications of this research is its scalability. Traditional methods require substantial retraining when new preferences or combinations of preferences are introduced. By contrast, Personalized Soups offers a dynamic and flexible framework in which integrating a new preference only requires training a new expert model for that preference and merging it parameter-wise, avoiding full retraining (a concrete illustration follows this list).
  • Future Directions in Personalization: The outlined approach could drive substantial progress in how LLMs are employed in personalized settings. For instance, customer support AI, educational tools, and interactive learning platforms could be tailored dynamically to individual learning styles and preferences.
  • Challenge of Fairness and Bias: While promising, the approach raises concerns and opens avenues for research into fairness and bias. As models become more personalized, it becomes crucial to ensure that personalization neither amplifies biases inherent in the training data nor oversimplifies complex user interactions.
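
Continuing the hypothetical `merge_soup` sketch above (assuming that helper and the `experts` list are in scope), accommodating a new preference dimension amounts to training one additional expert and re-running the cheap parameter average; the checkpoint name below is illustrative.

```python
# Adding a newly trained "formal" expert (hypothetical checkpoint): only this
# expert is trained; the existing experts are reused unchanged.
experts.append(torch.load("expert_formal.pt", map_location="cpu"))
merged_params = merge_soup(experts)  # re-merging is a cheap parameter average
```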

Conclusion

This paper takes a foundational step towards tailoring AI systems to humans' multi-faceted needs by extending current reinforcement learning paradigms to accommodate personalized feedback. Future work could explore broader applications, additional modalities of personalization, and deeper integration with human feedback loops. The scalability of the proposed approach suggests substantial potential to transform how AI systems interface with human preferences, but it also calls for vigilance in ethical AI practice to ensure equitable outcomes across user groups. As AI continues to evolve, methodologies like the one proposed here will be pivotal in crafting systems that genuinely reflect the diverse fabric of human experience.

Authors (9)
  1. Joel Jang
  2. Seungone Kim
  3. Bill Yuchen Lin
  4. Yizhong Wang
  5. Jack Hessel
  6. Luke Zettlemoyer
  7. Hannaneh Hajishirzi
  8. Yejin Choi
  9. Prithviraj Ammanabrolu