Interference from Cross-Task Experience Replay in Lifelong RL

Determine whether replaying samples from previously encountered tasks interferes with online gradient updates when learning a new task in lifelong reinforcement learning settings that employ experience replay (e.g., a reservoir-sampled long-term memory).
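For concreteness, below is a minimal sketch of the kind of reservoir-sampled long-term memory such a replay baseline might use; the class and method names (`ReservoirBuffer`, `add`, `sample`) are illustrative and not taken from the paper. Reservoir sampling retains every transition seen so far with equal probability, so the buffer stays an unbiased mixture over all past tasks:

```python
import random

class ReservoirBuffer:
    """Fixed-capacity long-term memory maintained by reservoir sampling.

    After n transitions have been offered, each one is held in the
    buffer with equal probability capacity / n, regardless of task.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0  # total transitions offered so far

    def add(self, transition) -> None:
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Keep the new transition with probability capacity / n_seen,
            # overwriting a uniformly chosen existing slot.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = transition

    def sample(self, batch_size: int):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Because the buffer mixes tasks uniformly, any batch drawn from it interleaves data from old tasks with the current task's online updates, which is exactly where the conjectured interference would arise.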

Background

In the experiments on the 2D navigation task, the authors compare several baselines, including a replay-based approach that uses reservoir sampling to manage a long-term memory. They observe that this baseline achieves slightly higher average return than simple fine-tuning but exhibits more oscillatory returns during lifelong training.

Based on this observation, they explicitly conjecture that cross-task replay may interfere with the online updates for the current task. This raises a concrete question about the stability of replay-based continual reinforcement learning: can interleaving samples from prior tasks degrade adaptation to new tasks?
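One way to probe the conjecture empirically is to measure the alignment between the gradient induced by a current-task batch and the gradient induced by a replayed cross-task batch; persistently negative cosine similarity would indicate interference. The sketch below is a hypothetical PyTorch diagnostic, not a procedure from the paper, and `policy`, `loss_fn`, `current_batch`, and `replay_batch` are assumed placeholders:

```python
import torch

def gradient_cosine(policy, loss_fn, current_batch, replay_batch):
    """Cosine similarity between the current-task gradient and the
    cross-task replay gradient. Values near -1 suggest the replayed
    batch pushes the parameters against the current task's update."""

    def flat_grad(batch):
        # Compute the loss gradient for one batch and flatten it.
        policy.zero_grad()
        loss_fn(policy, batch).backward()
        return torch.cat([p.grad.reshape(-1)
                          for p in policy.parameters()
                          if p.grad is not None])

    g_current = flat_grad(current_batch).clone()  # clone before grads reset
    g_replay = flat_grad(replay_batch)
    return torch.nn.functional.cosine_similarity(g_current, g_replay, dim=0)
```

Tracking this quantity across task boundaries during lifelong training would directly test whether the oscillatory returns observed for the replay baseline coincide with conflicting gradient directions.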

References

We conjecture that replaying samples from other tasks can impose some interference on the online updates when learning a new task.

Wang et al. (2022). A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning. arXiv:2205.10787, Section 4.1 (Simple 2D Navigation).