Learning to Plan for Language Modeling from Unlabeled Data (2404.00614v2)
Abstract: By training to predict the next token in an unlabeled corpus, LLMs learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module for planning the future writing process via a self-supervised learning objective. Given the textual context, this planning module learns to predict future abstract writing actions, which correspond to centroids in a clustered text embedding space. By conditioning on these actions, our model extends the successful LLM formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance in general, particularly with respect to text structure. Because our framework uses a planner module that is unsupervised and external to the LLM, new planner modules can be trained at large scale and easily be shared with the community.
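The abstract describes the recipe at a high level: sentence embeddings are clustered so that each centroid serves as an abstract "writing action", and a planner is trained to predict the action of the upcoming text from the preceding context. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation: the MPNet-based sentence encoder, the scikit-learn k-means clustering, the logistic-regression planner, and all hyperparameters are illustrative choices.

```python
# Sketch of the planner pipeline suggested by the abstract (illustrative only).
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy corpus in document order; in practice this would be a large unlabeled corpus.
sentences = [
    "The city council met on Monday.",
    "It debated the new housing budget.",
    "The vote was postponed until spring.",
    "The local team won its opening match.",
    "The striker scored twice in the second half.",
    "Fans celebrated in the main square.",
    "A storm is expected over the weekend.",
    "Forecasters warn of heavy rain and wind.",
    "Residents are advised to stay indoors.",
]

# Sentence embeddings (assumed encoder choice).
encoder = SentenceTransformer("all-mpnet-base-v2")
emb = encoder.encode(sentences)  # shape: (n_sentences, embedding_dim)

# 1) Abstract "writing actions" = k-means centroids of the text embedding space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
actions = kmeans.labels_  # action id assigned to each sentence

# 2) Planner: predict the NEXT sentence's action from the current context
#    (represented here, for simplicity, by the current sentence's embedding).
X, y = emb[:-1], actions[1:]
planner = LogisticRegression(max_iter=1000).fit(X, y)

# 3) At generation time, the predicted action (a centroid) would condition the
#    language model, e.g. as an extra input it attends to.
next_action = planner.predict(emb[-1:])[0]
action_vector = kmeans.cluster_centers_[next_action]
print("planned action id:", next_action, "centroid shape:", action_vector.shape)
```

Because the planner in this framing is external to the language model and trained without labels, the conditioning step (3) is the only point of contact with the LLM, which is what allows planner modules to be trained and shared independently.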