Align on the Fly: Adapting Chatbot Behavior to Established Norms (2312.15907v1)

Published 26 Dec 2023 in cs.CL

Abstract: In this paper, we aim to align LLMs with the ever-changing, complex, and diverse human values (e.g., social norms) across time and locations. This presents a challenge to existing alignment techniques, such as supervised fine-tuning, which internalize values within model parameters. To overcome this, we propose an On-the-fly Preference Optimization (OPO) method, a real-time alignment approach that works in a streaming way. It employs an external memory to store established rules for alignment, which can constrain LLMs' behaviors without further training, allowing for convenient updates and customization of human values. We also introduce a scalable evaluation to assess the proposed method more effectively. Experimental results on both human-annotated and auto-generated questions from the legal and moral domains indicate the effectiveness of the proposed OPO method. Our code and data are released at https://github.com/GAIR-NLP/OPO.
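The mechanism the abstract describes, keeping alignment rules in an editable external memory and retrieving the relevant ones at query time rather than baking values into model weights, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of that retrieve-and-constrain flow: the term-overlap retriever, prompt template, and function names are placeholders invented for this example and are not taken from the OPO repository, which should be consulted for the authors' actual implementation.

```python
# Toy sketch of retrieval-constrained generation: an external "rule memory"
# is searched at query time and the retrieved rules are injected into the
# prompt, so behavior can be updated by editing the memory, not the model.
# The overlap-based retriever and the prompt template are illustrative
# placeholders, not the paper's method (which would use a real LLM backend).
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased bag-of-words; stands in for a real embedding model."""
    return Counter(text.lower().split())


def score(query: Counter, rule: Counter) -> float:
    """Simple term-overlap score between a query and a stored rule."""
    return float(sum(min(query[t], rule[t]) for t in query))


def retrieve_rules(question: str, memory: list[str], k: int = 2) -> list[str]:
    """Return the k rules from the external memory most relevant to the question."""
    q = tokenize(question)
    ranked = sorted(memory, key=lambda r: score(q, tokenize(r)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, rules: list[str]) -> str:
    """Prepend retrieved rules as constraints; no fine-tuning is involved."""
    rule_block = "\n".join(f"- {r}" for r in rules)
    return (
        "Answer the question while strictly following these rules:\n"
        f"{rule_block}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    rule_memory = [
        "Tipping is customary in restaurants in the United States.",
        "Crossing the street against a red light is a traffic violation in many jurisdictions.",
        "Gifts of clocks are considered inauspicious in some cultures.",
    ]
    question = "Is it acceptable to cross the street against a red light?"
    prompt = build_prompt(question, retrieve_rules(question, rule_memory))
    print(prompt)  # this prompt would be sent to an LLM of choice
```

Because the rules live outside the model, adapting behavior to a new jurisdiction or norm set amounts to editing rule_memory rather than retraining, which is the convenience the abstract highlights.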

Authors (9)
  1. Chunpu Xu
  2. Steffi Chern
  3. Ethan Chern
  4. Ge Zhang
  5. Zekun Wang
  6. Ruibo Liu
  7. Jing Li
  8. Jie Fu
  9. Pengfei Liu
Citations (15)

GitHub: https://github.com/GAIR-NLP/OPO