Selecting suitable reward features for IRL in continuous state spaces

Determine which feature functions should be used to represent reward functions in inverse reinforcement learning for continuous state spaces under the standard linear reward model, in which the reward is a weighted sum of features. The chosen feature set should be expressive enough to capture and reproduce expert policies.

Background

Inverse reinforcement learning (IRL) seeks to infer a reward function from expert demonstrations. A common approach represents the reward as a linear combination of features, which requires selecting appropriate feature functions. In continuous and high-dimensional state spaces, raw state variables are often insufficient, making the identification of suitable features critical.
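The linear reward model described above can be sketched concretely. In this minimal example (all feature choices and weights are illustrative, not taken from the paper), the reward is computed as R(s) = w · φ(s) over a small set of hand-picked features of a 2-D continuous state:

```python
import numpy as np

def linear_reward(state, weights, features):
    """Reward as a weighted sum of feature values: R(s) = w . phi(s)."""
    phi = np.array([f(state) for f in features])
    return weights @ phi

# Hand-picked feature functions for a 2-D continuous state (illustrative only).
features = [
    lambda s: s[0],       # raw state variable, e.g. position
    lambda s: s[1],       # raw state variable, e.g. velocity
    lambda s: s[0] ** 2,  # a simple nonlinear feature
]
weights = np.array([0.5, -0.2, 1.0])  # hypothetical learned weights

state = np.array([2.0, 1.0])
print(linear_reward(state, weights, features))  # 0.5*2.0 - 0.2*1.0 + 1.0*4.0 = 4.8
```

The example makes the core difficulty visible: the quality of the inferred reward hinges entirely on whether the chosen feature functions can express the expert's true objective.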

The paper proposes polynomial basis functions as candidate features and introduces a correlation-based feature selection mechanism. While this method addresses the challenge in the settings studied, the authors explicitly identify the broader question, namely which features are generally suitable for IRL in continuous settings, as an open challenge.
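The two ingredients named above, polynomial candidate features and correlation-based selection, can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the greedy redundancy filter below (drop any candidate whose absolute correlation with an already-kept feature exceeds a threshold) is an assumed, simplified selection rule.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_features(states, degree):
    """All monomials of the state variables up to the given degree."""
    n, d = states.shape
    cols = []
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(states[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)

def select_by_correlation(features, threshold=0.95):
    """Greedy filter: keep a feature only if it is not highly correlated
    with any feature already kept. (Assumed rule; the paper's mechanism
    may differ.)"""
    corr = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(features.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
states = rng.standard_normal((200, 2))       # stand-in demonstration states
phi = polynomial_features(states, degree=2)  # x1, x2, x1^2, x1*x2, x2^2
phi = np.hstack([phi, 2.0 * phi[:, :1]])     # append a redundant copy of x1
print(select_by_correlation(phi))            # redundant column is filtered out
```

The redundant sixth column is perfectly correlated with the first and is dropped, while the genuinely distinct polynomial terms survive, which is the behavior a correlation-based selector is meant to provide.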

References

A central open challenge in inverse reinforcement learning is the choice of suitable features to represent the reward.

Baimukashev et al., 2024. "Automated Feature Selection for Inverse Reinforcement Learning" (arXiv:2403.15079), Figure 1 caption (Introduction).