When and how to incorporate additional Recursive Reward Modeling steps beyond critique

Determine when further Recursive Reward Modeling (RRM) steps should be introduced beyond a single critique stage when supervising large language models, and whether the critique method can be used effectively within RRM.

Background

The authors position critique as a first step toward scalable oversight and relate it to Recursive Reward Modeling (RRM), which iteratively decomposes evaluation and supervision tasks. They state that it is not known when additional RRM steps should be added or whether critique is suitable for RRM, which leaves a gap in guidance for multi-step oversight workflows.

Resolving this question would inform the design of scalable oversight systems that leverage critiques as building blocks within RRM, potentially improving training signal quality for advanced models.

References

“The critique approach is also only the first step of recursive reward modeling (RRM), and we do not know the point at which an additional RRM step is appropriate or whether critique can be used for RRM effectively.”

Source: LLM Critics Help Catch LLM Bugs (2407.00215, McAleese et al., 2024), Section “Discussion and Limitations”
