ARGS: Alignment as Reward-Guided Search (2402.01694v1)

Published 23 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Aligning LLMs with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided Search, a novel framework that integrates alignment into the decoding process, eliminating the need for expensive RL training. By adjusting the model's probabilistic predictions using a reward signal, ARGS generates texts with semantic diversity while being aligned with human preferences, offering a promising and flexible solution for aligning LLMs. Notably, ARGS demonstrates consistent enhancements in average reward compared to baselines across diverse alignment tasks and various model dimensions. For example, under the same greedy-based decoding strategy, our method improves the average reward by 19.56% relative to the baseline and secures a preference or tie score of 64.33% in GPT-4 evaluation. We believe that our framework, emphasizing decoding-time alignment, paves the way for more responsive LLMs in the future. Code is publicly available at: https://github.com/deeplearning-wisc/args.

Authors (3)
  1. Maxim Khanov (2 papers)
  2. Jirayu Burapacheep (4 papers)
  3. Yixuan Li (183 papers)
Citations (18)

Summary

Insightful Overview of "ARGS: Alignment as Reward-Guided Search"

The paper "ARGS: Alignment as Reward-Guided Search" introduces a framework for aligning LLMs with human objectives, a central challenge in deploying these models. Unlike training-based methods such as Reinforcement Learning from Human Feedback (RLHF), which are often resource-intensive and unstable, ARGS integrates alignment directly into the decoding phase of text generation, bypassing the expensive retraining that RL-based methodologies require.

Motivation and Background

LLMs, trained on extensive and varied datasets, can inadvertently generate inappropriate or misinformed content, which makes effective alignment strategies necessary. While RLHF has been widely adopted, including in top-tier models such as GPT-4, it poses challenges in training stability and in adapting to changing reward models. ARGS addresses these issues by applying the reward signal at decoding time, so alignment can be adjusted during text generation itself, offering a flexible alternative to training-based solutions.

Methodology

ARGS operates by modifying the model's probabilistic predictions using a reward signal during the text generation process. Specifically, the framework introduces a reward-guided scoring function that combines the model's predictions with a reward model's feedback. This mechanism allows the model to produce outputs that are semantically coherent and aligned with human preferences. The framework supports both greedy and stochastic token selection strategies, enhancing its applicability across different tasks and model architectures.
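
As a concrete illustration, the Python sketch below shows one reward-guided decoding step under stated assumptions: it is not the authors' released implementation, it presumes a HuggingFace-style causal LM whose forward pass returns next-token logits and a reward model exposed as a callable that scores a partial token sequence, and it restricts scoring to the top-k candidates proposed by the LM. The weight `w`, the candidate count `k`, and the function names are illustrative choices.

```python
import torch
import torch.nn.functional as F

def reward_guided_step(lm, reward_model, input_ids, k=10, w=1.0, greedy=True):
    """One reward-guided decoding step (illustrative sketch, not the official ARGS code).

    lm           -- causal LM whose forward pass returns logits of shape [1, seq_len, vocab]
    reward_model -- callable mapping a token sequence to a scalar reward (assumed interface)
    input_ids    -- tensor of shape [1, seq_len] holding the context so far
    """
    with torch.no_grad():
        logits = lm(input_ids).logits[:, -1, :]          # next-token logits
        log_probs = F.log_softmax(logits, dim=-1)
        topk_logp, topk_ids = log_probs.topk(k, dim=-1)  # restrict scoring to k candidates

        scores = []
        for logp, tok in zip(topk_logp[0], topk_ids[0]):
            candidate = torch.cat([input_ids, tok.view(1, 1)], dim=-1)
            r = reward_model(candidate)                  # reward for the partial continuation
            scores.append(logp + w * r)                  # combine LM signal with reward signal
        scores = torch.stack(scores)

    if greedy:                                           # greedy selection over re-scored candidates
        choice = scores.argmax()
    else:                                                # stochastic selection: sample from the scores
        choice = torch.multinomial(F.softmax(scores, dim=-1), num_samples=1).squeeze()
    next_token = topk_ids[0, choice].view(1, 1)
    return torch.cat([input_ids, next_token], dim=-1)
```

Both selection modes leave the language model's weights untouched; alignment comes entirely from how candidate tokens are re-scored and chosen at each step.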

Empirical Evaluation

The evaluation of ARGS was conducted on the HH-RLHF dataset, demonstrating notable improvements in model alignment. Specifically, ARGS achieved a 19.56% enhancement in average reward over baseline greedy decoding. The framework also preserved lexical diversity without sacrificing contextual relevance, which suggests that it effectively balances alignment and coherence objectives.
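
The paper's exact diversity and coherence metrics are not reproduced here; as a rough stand-in, a common lexical-diversity proxy for open-ended generation is the distinct-n ratio, sketched below over a list of generated strings.

```python
def distinct_n(texts, n=2):
    """Fraction of unique n-grams across a set of generations (a common
    lexical-diversity proxy; not necessarily the metric used in the paper)."""
    total, unique = 0, set()
    for text in texts:
        tokens = text.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / max(total, 1)

# Example: compare the diversity of baseline vs. reward-guided generations.
print(distinct_n(["the cat sat on the mat", "the dog sat on the rug"], n=2))
```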

Additionally, the versatility of ARGS was highlighted through its application across various model architectures and alignment tasks, such as those within the Stanford Human Preferences dataset. This adaptability underscores ARGS's potential as a model- and task-agnostic solution for achieving alignment in LLMs.

Comparison and Evaluation

A comparative analysis with RL-based methods such as Proximal Policy Optimization (PPO) showed that ARGS matches their alignment performance while producing more diverse and contextually relevant outputs. GPT-4-based pairwise evaluations corroborated these findings, with ARGS achieving a preference or tie rate of 64.33% against the baseline.
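
As an illustration of this style of pairwise LLM-as-judge evaluation (the paper's exact prompt and rubric are not reproduced here), a minimal judging call might look like the following; the instruction wording and helper name are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_pair(prompt, response_a, response_b, model="gpt-4"):
    """Ask a judge model which of two responses better serves the prompt.
    Illustrative pairwise protocol, not the paper's exact evaluation setup."""
    instruction = (
        "You are comparing two responses to the same prompt. "
        "Answer with 'A', 'B', or 'tie' for whichever is more helpful and harmless.\n\n"
        f"Prompt: {prompt}\n\nResponse A: {response_a}\n\nResponse B: {response_b}"
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": instruction}],
        temperature=0,
    )
    return reply.choices[0].message.content.strip()
```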

Implications and Future Work

ARGS marks a paradigm shift from training-time to decoding-time alignment, which holds substantial promise for the future development of LLMs. By enabling models to adapt quickly to new reward signals and user preferences without retraining, ARGS presents a scalable, efficient, and adaptable approach to model alignment.

Future work could extend ARGS to more demanding settings, including multi-step reasoning tasks, and refine the underlying reward models; both directions would further test and broaden the framework's utility.

In summary, ARGS sets a new direction in the field of AI alignment, providing a robust, efficient, and versatile framework that may inspire further research and application in safer and more aligned AI systems.
