Emma

Summary:

  • A new model uses process supervision to improve mathematical reasoning and achieve state-of-the-art performance.
  • Process supervision directly trains models to produce human-endorsed thought chains and has alignment benefits over outcome supervision.