Integrating RPC-theoretic insights into LLM training
Determine how to leverage two insights from sampling-based test-time scaling in large language model reasoning: the error-decomposition framework that separates estimation error from model error, and the Reasoning-Pruning Perplexity Consistency (RPC) principle, which combines internal LLM probabilities with self-consistency. The goal is to use these ideas to improve the training process of large language models rather than applying them only as a post-hoc inference-time procedure; an inference-time sketch of the RPC idea is given below.
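As a point of reference for what would be moved into training, the following is a minimal, hedged sketch of an RPC-style inference-time aggregation: each sampled reasoning path is scored by its internal sequence probability (via perplexity), low-probability paths are pruned, and the surviving answers receive a probability-weighted consistency vote. The function name `rpc_aggregate`, the `prune_fraction` rule, and the data layout are illustrative assumptions, not the paper's exact algorithm.

```python
import math
from collections import defaultdict

def rpc_aggregate(samples, prune_fraction=0.5):
    """Hedged sketch of a Reasoning-Pruning Perplexity Consistency-style
    aggregation over sampled reasoning paths.

    `samples` is a list of (answer, token_logprobs) pairs; the pruning rule
    and weighting below are illustrative, not the paper's exact formulation.
    """
    scored = []
    for answer, token_logprobs in samples:
        # Perplexity-based confidence: exp of the mean token log-probability.
        avg_logprob = sum(token_logprobs) / max(len(token_logprobs), 1)
        confidence = math.exp(avg_logprob)
        scored.append((answer, confidence))

    # Reasoning pruning: drop the lowest-confidence paths before voting.
    scored.sort(key=lambda x: x[1], reverse=True)
    kept = scored[: max(1, int(len(scored) * (1 - prune_fraction)))]

    # Probability-weighted self-consistency vote over the surviving answers.
    votes = defaultdict(float)
    for answer, confidence in kept:
        votes[answer] += confidence
    return max(votes, key=votes.get)


# Example: three sampled reasoning paths for the same question.
paths = [
    ("42", [-0.1, -0.2, -0.05]),  # high internal probability
    ("42", [-0.3, -0.4, -0.2]),
    ("17", [-2.0, -1.5, -1.8]),   # low internal probability, likely pruned
]
print(rpc_aggregate(paths))  # -> "42"
```

The open question raised in the source is how signals like these, currently used only to select answers at inference time, could instead shape the training objective or data selection of the model itself.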
References
Although integrating our method with trained LLMs may provide additional performance gains, it remains an open question how to utilize our insights to improve the training process of LLMs.
— A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
(arXiv:2510.15444, Zhou et al., 17 Oct 2025), Appendix, Section "Limitations and Future Work", Item 2