Generalize reward normalizations beyond linear policy-indexed constraints

Extend the identification and inference machinery developed for linear reward normalizations indexed by a reference policy ν to accommodate affine or nonlinear reward normalizations, establishing how to recover normalized rewards and conduct efficient inference under these generalized constraints.

Background

A central component of the paper is reward normalization, which selects a representative from the equivalence class of rewards consistent with observed softmax behavior. The authors focus on linear normalizations defined by a reference policy ν, which enable closed-form identification via fitted Q-iteration and support efficient estimation.
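For concreteness, a minimal formal sketch follows; the specific ν-mean-zero constraint shown here is an illustrative assumption about the form of the linear normalization, not necessarily the exact constraint used in the paper.

In the static softmax case, observed behavior identifies rewards only up to a state-dependent shift, $r(s,a) \mapsto r(s,a) + f(s)$. A linear normalization indexed by $\nu$ selects the representative $r_\nu$ satisfying
\[
\mathbb{E}_{a \sim \nu(\cdot \mid s)}\big[r_\nu(s,a)\big] = 0 \quad \text{for all } s,
\]
which pins down $f(s)$ uniquely because the constraint is linear in the unknown shift. An affine normalization replaces the zero with a known baseline $b(s)$; a nonlinear normalization imposes $g\big(r_\nu(s,\cdot)\big) = 0$ for a smooth constraint map $g$, where identification would typically rely on an implicit-function-type argument rather than a closed-form projection.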

Many practical applications impose affine or nonlinear constraints on rewards (e.g., anchoring rewards to known baselines or to nonlinear state potentials). Extending the framework to these broader normalizations would enhance its flexibility while preserving rigorous identification and inference guarantees, as illustrated by the sketch below.
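The toy script below illustrates the idea in a static softmax setting: it recovers one reward representative under a ν-linear (mean-zero) constraint and another under an affine constraint that anchors the ν-average to a known baseline b(s). The constraint forms, the uniform reference policy, and the baseline b are all assumptions made for illustration; the paper's dynamic setting and fitted Q-iteration machinery are not reproduced here.

```python
import numpy as np

# Toy static softmax setting: the agent plays pi(a|s) = softmax(R[s, :]),
# so behavior identifies rewards only up to a per-state additive shift.
# A normalization picks one representative from each equivalence class.
# The nu-linear and affine constraints below are illustrative assumptions,
# not necessarily the exact normalizations used in the paper.

rng = np.random.default_rng(0)
S, A = 4, 3
R = rng.normal(size=(S, A))                             # true (unobserved) rewards
pi = np.exp(R) / np.exp(R).sum(axis=1, keepdims=True)   # observed softmax policy

log_pi = np.log(pi)                 # equals R[s, a] minus a state-dependent constant
nu = np.full((S, A), 1.0 / A)       # reference policy nu (uniform, for illustration)

# Linear normalization: shift log_pi per state so its nu-average is zero.
r_lin = log_pi - (nu * log_pi).sum(axis=1, keepdims=True)

# Affine normalization: anchor the nu-average to a known baseline b(s).
b = rng.normal(size=(S, 1))         # hypothetical known per-state baseline
r_aff = log_pi + b - (nu * log_pi).sum(axis=1, keepdims=True)

# Both representatives reproduce the observed behavior exactly ...
for r in (r_lin, r_aff):
    assert np.allclose(np.exp(r) / np.exp(r).sum(axis=1, keepdims=True), pi)
# ... and each satisfies its own normalization constraint.
assert np.allclose((nu * r_lin).sum(axis=1), 0.0)
assert np.allclose((nu * r_aff).sum(axis=1), b.ravel())
```

A nonlinear normalization would replace the closed-form per-state shift with a root-finding step for the constraint map g, which is precisely where new identification and inference arguments would be needed.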

References

Several directions remain open. In particular, while we focus on linear normalizations indexed by a reference policy ν, the same machinery should extend to affine or nonlinear normalizations.

Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models (2512.24407 - Laan et al., 30 Dec 2025) in Conclusion