Integrating Multiple Confidence Signals for Self-Reward
Develop a unified self-reward mechanism for reinforcement learning on unlabeled data, one that integrates multiple complementary confidence signals to construct more reliable and fine-grained rewards for large language models.
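To make the phrase "integrating multiple confidence signals" concrete, here is a minimal illustrative sketch. It is not the method of the cited paper and not a proposed answer to the open question. It assumes two hypothetical signals per sampled response: agreement with the other samples (self-consistency) and the geometric-mean token probability. It also assumes a simple weighted average as the combination rule. The function names, the weights, and the choice of signals are all assumptions made for illustration.

```python
import math
from collections import Counter


def agreement_confidence(answers):
    """Fraction of sampled answers that match each answer (self-consistency)."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


def likelihood_confidence(token_logprobs):
    """Geometric-mean token probability of one response, in (0, 1].

    Assumes a non-empty list of per-token log-probabilities.
    """
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def composite_reward(answers, logprobs_per_response, weights=(0.5, 0.5)):
    """Weighted average of the two signals (an assumed combination rule)."""
    w_agree, w_lik = weights
    agree = agreement_confidence(answers)
    lik = [likelihood_confidence(lp) for lp in logprobs_per_response]
    return [w_agree * a + w_lik * l for a, l in zip(agree, lik)]


# Hypothetical samples: three responses to one unlabeled prompt.
answers = ["42", "42", "17"]
logprobs = [[-0.1, -0.2], [-0.5, -0.7], [-0.05, -0.05]]
rewards = composite_reward(answers, logprobs)
```

In this toy setup, the first two responses share the same final answer and therefore the same agreement score, yet they receive different rewards because their likelihood scores differ. That is one sense in which a combined signal can be more fine-grained than a single vote-based signal. How to choose, weight, and calibrate the signals is exactly what the question leaves open.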
References
This leaves open an important question: how can multiple complementary confidence signals be integrated to construct more reliable and fine-grained self-reward mechanisms?
Tang et al., "Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning," arXiv:2510.17923, 20 Oct 2025, Section 2.2 (Confidence-Based Reward).