Emma
Summary:
-
A new model uses process supervision to improve mathematical reasoning and achieve state-of-the-art performance.
-
Process supervision directly trains models to produce human-endorsed thought chains and has alignment benefits over outcome supervision.