OpenR: Enhancing Reasoning Capabilities in LLMs
The paper "OpenR: An Open Source Framework for Advanced Reasoning with LLMs" introduces a comprehensive open-source platform aimed at advancing the reasoning abilities of LLMs through the integration of test-time computation and process supervision with reinforcement learning. This framework, termed OpenR, seeks to foster collaboration in the AI research community by providing resources that address various components necessary for improving reasoning in LLMs.
Overview of OpenR Framework
OpenR is designed to enhance LLM reasoning by adopting the methodologies believed to underpin OpenAI's o1 model, notably non-autoregressive decoding, reinforcement learning (RL) for policy training, and process reward models (PRMs) for guided search. The framework shifts emphasis from scale-focused training toward inference-time computation, enabling models to carry out more deliberate, step-by-step analysis akin to the human cognitive mode commonly described as System 2 thinking.
Technical Contributions
- Data Augmentation: OpenR presents the MATH-APS dataset, which builds on established datasets such as PRM800K and Math-Shepherd. Its process-supervision labels are collected through automated methods, reducing dependence on costly human annotation.
- Process Reward Models (PRMs): PRMs provide granular feedback on intermediate reasoning steps rather than only on final answers. OpenR trains a high-quality PRM, Math-psa, which markedly improves the guidance of LLMs toward accurate reasoning paths.
- Reinforcement Learning Integration: The framework casts multi-step reasoning as a Markov Decision Process (MDP) and trains the LLM policy with RL algorithms such as PPO, using PRM feedback as the reward signal so that language generation is aligned with correct reasoning (a minimal sketch of this framing appears after the list).
- Test-Time Computation: OpenR employs search strategies such as beam search and best-of-N sampling during inference. These searches are guided by PRM scores to make the best use of a given computation budget, yielding notable accuracy gains on the MATH dataset (see the best-of-N sketch below).
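
To make the MDP framing concrete, the sketch below treats the state as the question plus the reasoning steps generated so far, an action as the next reasoning step, and the per-step reward as a PRM score. The helpers `policy_propose` and `prm_score` are hypothetical placeholders rather than OpenR's actual interfaces, and the PPO update itself is omitted.

```python
# Illustrative sketch of casting step-by-step reasoning as an MDP.
# `policy_propose` and `prm_score` are hypothetical placeholders, not OpenR APIs;
# the PPO update itself is omitted.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReasoningState:
    question: str
    steps: List[str] = field(default_factory=list)  # reasoning steps produced so far


def rollout(
    state: ReasoningState,
    policy_propose: Callable[[ReasoningState], str],  # LLM policy: samples the next step
    prm_score: Callable[[str, List[str]], float],     # PRM: scores the partial trajectory
    max_steps: int = 10,
) -> Tuple[ReasoningState, List[float]]:
    """Roll out one trajectory; per-step PRM scores act as the reward signal."""
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy_propose(state)                # action = next reasoning step (text)
        state.steps.append(action)
        rewards.append(prm_score(state.question, state.steps))
        if action.strip().lower().startswith("answer:"):  # simple terminal condition
            break
    return state, rewards
```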
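
Likewise, the following sketch shows one way a PRM can guide best-of-N selection at inference time: sample several candidate solutions, score every intermediate step, and return the candidate the PRM ranks highest. Again, `generate_solutions` and `prm_score` are assumed helpers, and the aggregation rule (minimum over steps) is one of several reasonable choices rather than the paper's prescribed method.

```python
# Illustrative sketch of PRM-guided best-of-N selection (hypothetical helpers,
# not the OpenR API). `generate_solutions` samples n candidate step-by-step
# solutions from the policy LLM; `prm_score` returns one score per step.
from typing import Callable, List, Optional


def best_of_n(
    question: str,
    generate_solutions: Callable[[str, int], List[List[str]]],
    prm_score: Callable[[str, List[str]], List[float]],
    n: int = 8,
) -> Optional[List[str]]:
    """Sample n candidate solutions and return the one the PRM ranks highest."""
    best, best_value = None, float("-inf")
    for steps in generate_solutions(question, n):
        step_scores = prm_score(question, steps)
        # Aggregate per-step scores; using the minimum favours chains whose
        # weakest step still looks sound (alternatives: last step, product).
        value = min(step_scores) if step_scores else float("-inf")
        if value > best_value:
            best, best_value = steps, value
    return best
```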
Experimental Insights
Experiments conducted using OpenR reveal that both beam search and best-of-N methodologies substantially outperform simpler approaches like majority voting in terms of reasoning accuracy. The Math-psa PRM, in particular, demonstrates superior performance across varying computational constraints, thereby validating the efficacy of the framework's approach to process supervision. Moreover, reinforcement learning within the OpenR setting shows promise, though results indicate that more complex datasets may necessitate additional enhancements to achieve broader generalization.
Implications and Future Directions
OpenR's contributions have significant implications for developing models with improved autonomous reasoning capabilities. As LLMs become more proficient in reasoning tasks, their applicability across fields such as science, mathematics, and coding is poised to expand. This framework serves as a foundation for researchers to explore reasoning enhancements in models, potentially leading to broader insights into cognitive modeling and AI alignment.
Future research may focus on expanding datasets for process supervision, refining PRM methodologies, and scaling the framework to accommodate a wider array of reasoning tasks. Further exploration into more complex inference-time strategies such as Monte Carlo Tree Search (MCTS) may also yield valuable advancements in test-time computation.
In conclusion, OpenR presents a robust foundation for advancing LLM reasoning, providing researchers with valuable tools and benchmarks to drive forward the field of AI reasoning. Through its open nature, it encourages collaboration and innovation, aligning with the ongoing pursuit of AI systems capable of complex and reliable reasoning.