Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling (2407.02446v1)
Abstract: RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base LMs that RLHF adapts. Besides empirically demonstrating this trade-off, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints. In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt, serving as textual scaffolding but also limiting a model's ability to generate documents that do not include these spans. We study this trade-off on the most effective current agent models, those aligned with RLHF, while exploring why this may remain a fundamental trade-off between models that act and those that predict, even as alignment techniques improve.
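To make the "anchor span" idea concrete, here is a minimal sketch (not the paper's code) of how one might surface such spans: sample several generations from an RLHF-tuned model for a single prompt, then collect the n-grams that recur across most of them. The model name, the sample count `K`, the n-gram length `N`, and the 75% co-occurrence threshold are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch: find candidate "anchor spans" as n-grams shared across
# many independent generations for the same prompt. All hyperparameters
# (model, K, N, threshold) are illustrative assumptions.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any RLHF-tuned chat model; Llama-2-chat requires gated access.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a short essay about the ocean."
inputs = tok(prompt, return_tensors="pt").to(model.device)

K, N = 8, 5  # number of sampled generations; n-gram length (both arbitrary)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    max_new_tokens=200,
    num_return_sequences=K,
    pad_token_id=tok.eos_token_id,  # Llama has no pad token
)

def ngrams(token_ids, n):
    """Set of all n-grams (as tuples of token ids) in one generation."""
    return {tuple(token_ids[i : i + n]) for i in range(len(token_ids) - n + 1)}

# For each n-gram, count how many of the K generations contain it.
prompt_len = inputs["input_ids"].shape[1]
doc_freq = Counter()
for seq in outputs:
    gen = seq[prompt_len:].tolist()  # strip the shared prompt tokens
    doc_freq.update(ngrams(gen, N))

# Candidate anchor spans: n-grams present in >= 75% of the generations.
anchors = [g for g, c in doc_freq.items() if c >= 0.75 * K]
for g in sorted(anchors, key=doc_freq.get, reverse=True)[:10]:
    print(doc_freq[g], tok.decode(list(g)))
```

Spans surviving the threshold are candidates for the textual scaffolding the abstract describes. The complementary world-modeling test would score the same model's per-token negative log-likelihood on arbitrary held-out documents and compare it against the base checkpoint it was adapted from.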