
Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation (2403.07261v1)

Published 12 Mar 2024 in cs.LG and cs.AI

Abstract: Offline meta-reinforcement learning (OMRL) enables an agent to tackle novel tasks while relying solely on a static dataset. For precise and efficient task identification, existing OMRL research suggests learning separate task representations that can be incorporated into the policy input, thus forming a context-based meta-policy. A major approach to training task representations is to adopt contrastive learning using multi-task offline data. The dataset typically encompasses interactions from various policies (i.e., the behavior policies), thus providing a plethora of contextual information regarding different tasks. Nonetheless, amassing data from a substantial number of policies is not only impractical but often unattainable in realistic settings. Instead, we resort to a more constrained yet practical scenario, where multi-task data collection occurs with a limited number of policies. We observe that task representations learned by previous OMRL methods tend to correlate spuriously with the behavior policy instead of reflecting the essential characteristics of the task, resulting in unfavorable out-of-distribution generalization. To alleviate this issue, we introduce a novel algorithm that disentangles the impact of the behavior policy from task representation learning through a process called adversarial data augmentation. Specifically, the objective of adversarial data augmentation is not merely to generate data analogous to the offline data distribution; instead, it aims to create adversarial examples designed to confound learned task representations and lead to incorrect task identification. Our experiments show that learning from such adversarial samples significantly enhances the robustness and effectiveness of the task identification process and achieves satisfactory out-of-distribution generalization.
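
The abstract describes the approach only at a high level. Purely as an illustration of the general idea it names (contrastive task-representation learning over multi-task offline transitions, plus adversarial perturbations chosen to confuse task identification), here is a minimal, hypothetical PyTorch sketch. The encoder architecture, the InfoNCE-style loss, and the FGSM-style perturbation are assumptions made for illustration; they are not taken from the paper's actual algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch (not the paper's implementation): a context encoder maps a
# flattened transition (s, a, r, s') to a task embedding z; an InfoNCE-style
# contrastive loss pulls together embeddings from the same task and pushes apart
# different tasks; an adversarial perturbation of the input transitions is chosen
# to MAXIMIZE that loss (i.e., to confound task identification), and the encoder
# is then trained on the perturbed data as well.

class ContextEncoder(nn.Module):
    def __init__(self, transition_dim: int, z_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, z_dim),
        )

    def forward(self, x):                      # x: (batch, transition_dim)
        return F.normalize(self.net(x), dim=-1)

def info_nce(z, task_ids, temperature: float = 0.1):
    """Contrastive loss: same task_id -> positive pair, different -> negative."""
    sim = z @ z.t() / temperature                           # (B, B) similarities
    pos = task_ids.unsqueeze(0) == task_ids.unsqueeze(1)    # positives mask
    pos.fill_diagonal_(False)                                # exclude self-pairs
    logits = sim - torch.eye(len(z), device=z.device) * 1e9  # mask self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

def adversarial_step(encoder, x, task_ids, eps: float = 0.05):
    """One FGSM-style perturbation that makes task identification harder."""
    x_adv = x.clone().requires_grad_(True)
    info_nce(encoder(x_adv), task_ids).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()        # ascend the loss

# Toy usage with random data standing in for an offline multi-task dataset.
B, D = 64, 20
x = torch.randn(B, D)
task_ids = torch.randint(0, 4, (B,))
enc = ContextEncoder(D)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)

x_adv = adversarial_step(enc, x, task_ids)
loss = info_nce(enc(torch.cat([x, x_adv])), torch.cat([task_ids, task_ids]))
opt.zero_grad()
loss.backward()
opt.step()
```

In this toy setup, training on the adversarially perturbed transitions alongside the originals is what pushes the embedding to depend on task-relevant features rather than incidental ones; how the actual method constructs and uses such examples is detailed in the paper itself.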

Authors (8)
  1. Chengxing Jia (9 papers)
  2. Fuxiang Zhang (9 papers)
  3. Yi-Chen Li (10 papers)
  4. Chen-Xiao Gao (7 papers)
  5. Xu-Hui Liu (6 papers)
  6. Lei Yuan (34 papers)
  7. Zongzhang Zhang (33 papers)
  8. Yang Yu (385 papers)
Citations (4)