
Formulating a setup that jointly studies demonstrations and evaluative feedback

Formulate a learning setup that jointly models demonstration-based supervision and evaluative feedback (e.g., whether a generated response is good). The setup should make it possible to analyze how the overlap among hypotheses consistent with the demonstrations interacts with the additional feedback, and to characterize how much feedback is required to guarantee performance.


Background

After showing that maximum likelihood estimation (MLE) over a restricted policy class can attain non-trivial overlap with the true supports, the authors suggest that this overlap could be converted into strong performance if the learner could also leverage evaluative feedback (such as the correctness of generated responses).

They suggest capturing this via a parameter that quantifies the overlap together with the amount of feedback needed, but explicitly leave open how to formulate a suitable setup for studying both feedback sources together in their problem.
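As a purely illustrative toy, and not the authors' formulation (which they leave open), suppose the learned policy for a single prompt is a finite distribution over responses and the learner may query a binary evaluator on its own samples. If the policy places mass p on correct responses, each independent sample is correct with probability p, so the number of evaluator queries until the first correct response is geometric with mean 1/p, and k queries succeed with probability 1 - (1 - p)^k. The names `overlap_mass`, `sample_until_correct`, and `queries_for_confidence` are hypothetical, and using probability mass on correct responses as the "overlap" is an assumption made only for this sketch.

```python
import math
import random
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical toy model (assumption): the learned policy for ONE prompt is a
# finite distribution over candidate responses, mapping response -> probability.
Policy = Dict[str, float]


def overlap_mass(policy: Policy, correct: Set[str]) -> float:
    """Probability mass the policy places on correct responses.

    Illustrative stand-in for "overlap with the true support"; not the
    paper's definition.
    """
    return sum(p for response, p in policy.items() if response in correct)


def sample_until_correct(
    policy: Policy,
    is_correct: Callable[[str], bool],
    max_queries: int,
    rng: random.Random,
) -> Tuple[Optional[str], int]:
    """Sample from the policy, querying a binary evaluator after each sample.

    Returns (first correct response or None, number of evaluator queries used).
    """
    responses = list(policy)
    weights = [policy[r] for r in responses]
    for query in range(1, max_queries + 1):
        response = rng.choices(responses, weights=weights, k=1)[0]
        if is_correct(response):
            return response, query
    return None, max_queries


def queries_for_confidence(p: float, delta: float) -> int:
    """Smallest k with (1 - p)**k <= delta, i.e. success with prob. >= 1 - delta."""
    if not 0.0 < p <= 1.0:
        raise ValueError("need positive overlap mass p in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("need failure probability delta in (0, 1)")
    if p == 1.0:
        return 1  # every sample is correct
    return math.ceil(math.log(delta) / math.log(1.0 - p))


if __name__ == "__main__":
    policy = {"a": 0.5, "b": 0.3, "c": 0.2}
    p = overlap_mass(policy, {"c"})
    print(p)  # 0.2
    print(queries_for_confidence(p, delta=0.01))  # 21
    print(sample_until_correct(policy, lambda r: r == "c", 100, random.Random(0)))
```

In this toy model the overlap mass alone fixes the feedback budget; it does not capture how feedback might interact with the set of hypotheses consistent with the demonstrations, which is part of what the open problem asks for.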

References

However, we leave it open to formulate an interesting setup that enables a study of both types of feedbacks together for our problem, and we do not attempt to investigate this any further.

Learning to Answer from Correct Demonstrations (2510.15464 - Joshi et al., 17 Oct 2025) in Appendix A.3 (Overlap of MLE), end of subsection