Value-suboptimality guarantees under suboptimal demonstrators for general bounded rewards
Determine whether value-suboptimality guarantees can be achieved for general bounded reward model classes when the demonstrator is suboptimal, within the learning-from-correct-demonstrations framework that relies on low-cardinality reward model classes rather than policy-class assumptions.
References
We leave it as an interesting and important direction for future work whether we can achieve $\varepsilon$-value suboptimality for general bounded reward model classes even if the demonstrator is suboptimal.
— Learning to Answer from Correct Demonstrations
(2510.15464 - Joshi et al., 17 Oct 2025) in Remark (Value Suboptimality), Section 6 (Learning from Suboptimal Demonstrator)