Generalize reward normalizations beyond linear policy-indexed constraints
Extend the identification and inference machinery developed for linear reward normalizations indexed by a reference policy ν to accommodate affine or nonlinear reward normalizations, establishing how to recover normalized rewards and conduct efficient inference under these generalized constraints.
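In dynamic discrete choice and IRL models, rewards are identified only up to a normalization, and a common linear normalization indexed by a reference policy pins the reward down along that policy's actions. The generalization asked for here can be sketched as follows; the symbols \(r\), \(c\), and \(g\) are illustrative notation, not taken from the source:

```latex
% Linear normalization indexed by a reference policy \nu:
% the reward is fixed along the actions that \nu prescribes.
r\bigl(s,\nu(s)\bigr) = 0 \qquad \text{for all states } s.

% Affine generalization: shift the constraint by a known function c,
r\bigl(s,\nu(s)\bigr) = c(s) \qquad \text{for all states } s.

% Nonlinear generalization: impose a known constraint map g,
g\bigl(r(s,\cdot),\,\nu\bigr) = 0,

% and ask whether the identification and efficient-inference
% arguments for the linear case carry over to these constraints.
```

The linear case is the anchor around which the open problem is posed; the affine and nonlinear displays merely make explicit what "generalized constraints" would look like.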
References
Several directions remain open. While we focus on linear normalizations indexed by a reference policy ν, the same machinery should extend to affine or nonlinear normalizations.
— Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models
(2512.24407 - Laan et al., 30 Dec 2025) in Conclusion