ARGS: Alignment as Reward-Guided Search (2402.01694v1)
Abstract: Aligning LLMs with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided Search, a novel framework that integrates alignment into the decoding process, eliminating the need for expensive RL training. By adjusting the model's probabilistic predictions using a reward signal, ARGS generates texts with semantic diversity while being aligned with human preferences, offering a promising and flexible solution for aligning LLMs. Notably, ARGS demonstrates consistent enhancements in average reward compared to baselines across diverse alignment tasks and various model dimensions. For example, under the same greedy-based decoding strategy, our method improves the average reward by 19.56% relative to the baseline and secures a preference or tie score of 64.33% in GPT-4 evaluation. We believe that our framework, emphasizing decoding-time alignment, paves the way for more responsive LLMs in the future. Code is publicly available at: \url{https://github.com/deeplearning-wisc/args}.
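The abstract describes decoding-time alignment: at each generation step, candidate next tokens are scored by a weighted combination of the language model's log-probability and a reward model's score for the partial continuation, and the highest-scoring token is kept. The sketch below is a minimal illustration of that idea under assumed interfaces, not the authors' implementation; `lm_topk_logprobs`, `reward_score`, and the weight `w` are hypothetical placeholders for a language model's next-token distribution, a trained reward model, and the reward weighting.

```python
# Minimal sketch of reward-guided greedy decoding (illustrative only).
# Assumed interfaces, not taken from the paper's codebase:
#   lm_topk_logprobs(tokens) -> list[(token, logprob)]  # top-k next-token candidates
#   reward_score(tokens)     -> float                   # reward model score for a sequence

def args_greedy_decode(prompt_tokens, lm_topk_logprobs, reward_score,
                       w=1.0, max_new_tokens=128, eos_token=None):
    """At every step, pick the candidate token maximizing
    logprob(token | context) + w * reward(context + token)."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        best_token, best_score = None, float("-inf")
        for token, logprob in lm_topk_logprobs(tokens):
            score = logprob + w * reward_score(tokens + [token])
            if score > best_score:
                best_token, best_score = token, score
        tokens.append(best_token)
        if eos_token is not None and best_token == eos_token:
            break
    return tokens
```

In this sketch, setting `w = 0` recovers ordinary greedy decoding, while larger values trade likelihood under the language model for higher reward.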
Authors: Maxim Khanov, Jirayu Burapacheep, Yixuan Li