Unknown architecture and mechanisms of OpenAI’s o1 Large Reasoning Model

Characterize the architecture, operational mechanisms, and internal capabilities of OpenAI’s o1 (Strawberry) Large Reasoning Model that differentiate it from standard autoregressive Large Language Models, including the structures employed during pretraining and inference.

Background

The paper distinguishes Large Reasoning Models (LRMs) such as OpenAI’s o1 from prior LLMs, noting that o1’s architecture, operation, and capabilities all appear fundamentally different from those of vanilla LLMs. However, OpenAI has not disclosed technical details about the model’s architecture or operational mechanisms, which hinders rigorous evaluation, interpretability, and the development of appropriate benchmarking tools.

This lack of disclosure limits researchers’ ability to assess how o1 achieves its reasoning capabilities, including whether and how reinforcement learning, rollout-style inference, or private Chain-of-Thought traces are integrated. It also complicates comparisons with classical planners and LLM-Modulo systems.

References

we draw a distinction between previous LLMs and o1, a Large Reasoning Model (or LRM), as its new (unknown) architecture, operation, and capabilities all seem to be fundamentally different from those of vanilla LLMs, both at pretraining phase and at inference time.

— LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench (2409.13373 - Valmeekam et al., 20 Sep 2024) in Introduction, footnote discussing Large Reasoning Models (following OpenAI), Section 1
