Determining decomposability in standard RL benchmarks

Determine whether the decomposability condition (namely, that every pair of states is linked via a finite chain of shared-predecessor relationships) holds under the transition dynamics of widely used reinforcement learning benchmarks, such as MuJoCo continuous control tasks and Atari games. Establishing this would validate the assumption Adversarial IRL needs to recover state-only rewards up to a constant.

Background

Adversarial Inverse Reinforcement Learning (AIRL) provides theoretical guarantees for recovering state-only rewards up to a constant when the environment’s transition dynamics satisfy the decomposability condition. Informally, this condition requires that every pair of states be connected through a finite chain of states that share common predecessors, so that pointwise equalities between reward functions propagate across the entire state space.

The paper notes that while the decomposability condition holds in some constructed examples (e.g., gridworlds with a ‘stay’ action), it can fail in others (e.g., cyclic or checkerboard-structured dynamics, where disjoint sets of states never share a predecessor). Whether popular benchmark environments satisfy this condition matters because AIRL’s reward recovery guarantees depend on it; without decomposability, AIRL may not recover state-only rewards up to a constant as claimed.
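For a finite MDP with known dynamics, checking the condition reduces to a connectivity question: call two states 1-step linked if some state can transition to both of them, and ask whether the transitive closure of this relation covers every pair of states. A minimal sketch of such a check, with the examples mentioned above (the function name, the tabular `(s, a, s_next)` encoding, and the specific toy environments are illustrative, not taken from the source):

```python
def is_decomposable(transitions, states):
    """Check the decomposability condition for a finite MDP.

    transitions: iterable of (s, a, s_next) triples with nonzero probability.
    Two states are 1-step linked if they share a possible predecessor;
    decomposability holds iff the transitive closure of this relation
    connects every pair of states.
    """
    parent = {s: s for s in states}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # All successors of a common predecessor p are mutually 1-step linked,
    # so merge each predecessor's successor set into one component.
    successors = {}
    for s, _a, s_next in transitions:
        successors.setdefault(s, set()).add(s_next)
    for group in successors.values():
        group = list(group)
        for u in group[1:]:
            union(group[0], u)

    return len({find(s) for s in states}) == 1


# 1D gridworld with a 'stay' action: adjacent cells share a predecessor,
# so the whole state space is linked -> decomposable.
n = 4
gridworld = {(i, "stay", i) for i in range(n)}
gridworld |= {(i, "right", i + 1) for i in range(n - 1)}
gridworld |= {(i + 1, "left", i) for i in range(n - 1)}
print(is_decomposable(gridworld, set(range(n))))   # True

# Deterministic 2-cycle: each state's only predecessor is the other state,
# so no two distinct states share a predecessor -> not decomposable.
cycle = {(0, "a", 1), (1, "a", 0)}
print(is_decomposable(cycle, {0, 1}))              # False
```

A check of this kind only applies directly to small tabular dynamics; the open question above concerns continuous (MuJoCo) and high-dimensional (Atari) state spaces, where the condition would have to be argued analytically or over a suitable discretization.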

References

“We have not been able to determine whether the decomposability condition is satisfied in standard RL benchmarks, such as MuJoCo tasks or Atari games.”

A Primer on Maximum Causal Entropy Inverse Reinforcement Learning (2203.11409 - Gleave et al., 2022), Subsection “Recovering rewards” (following the definition and examples of the decomposability condition)