Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning (2405.11740v1)

Published 20 May 2024 in cs.LG and cs.AI

Abstract: In visual Reinforcement Learning (RL), upstream representation learning largely determines the effectiveness of downstream policy learning. Employing auxiliary tasks allows the agent to enhance visual representations in a targeted manner, thereby improving the sample efficiency and performance of downstream RL. Prior advanced auxiliary tasks focus on extracting as much information as possible from limited experience (observations, actions, and rewards) through different auxiliary objectives. In this article, we instead start from another perspective: the auxiliary training data itself. We improve auxiliary representation learning for RL by enriching the auxiliary training data, proposing Learning Future representation with Synthetic observations (LFS), a novel self-supervised RL approach. Specifically, we propose a training-free method to synthesize observations that may contain future information, together with a data-selection approach that eliminates unqualified synthetic noise. The remaining synthetic observations, along with real observations, then serve as auxiliary data for a clustering-based temporal association task that drives representation learning. LFS allows the agent to access and learn from observations that have not yet appeared, so that it can quickly understand and exploit them when they occur later. In addition, LFS relies on neither rewards nor actions, which gives it a wider scope of application (e.g., learning from video) than recent advanced auxiliary tasks. Extensive experiments demonstrate that LFS achieves state-of-the-art RL sample efficiency on challenging continuous control and enables advanced visual pre-training from action-free video demonstrations.
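The abstract outlines a three-stage pipeline: training-free synthesis of candidate "future" observations, selection to discard unqualified synthetic noise, and a clustering-based temporal association objective over both real and synthetic data. The sketch below illustrates that pipeline shape only; the actual synthesis rule, selection criterion, and loss in LFS are not specified by the abstract, so the blending-based synthesis, embedding-distance filter, and SwAV-style assignment loss used here are illustrative assumptions, as are all function names and parameters.

```python
import torch
import torch.nn.functional as F

def synthesize_candidates(obs_batch, num_candidates=8, alpha_range=(0.2, 0.8)):
    """Training-free synthesis sketch: blend pairs of real observations to
    produce candidates that may resemble not-yet-seen observations.
    Blending is a stand-in; the paper's actual synthesis rule may differ."""
    idx_a = torch.randint(0, obs_batch.size(0), (num_candidates,))
    idx_b = torch.randint(0, obs_batch.size(0), (num_candidates,))
    alpha = torch.empty(num_candidates, 1, 1, 1).uniform_(*alpha_range)
    return alpha * obs_batch[idx_a] + (1 - alpha) * obs_batch[idx_b]

def select_candidates(candidates, real_batch, encoder, keep_ratio=0.5):
    """Selection sketch: keep synthetic observations whose embeddings stay
    close to the real data, one plausible notion of 'qualified' candidates."""
    with torch.no_grad():
        z_syn = F.normalize(encoder(candidates), dim=-1)
        z_real = F.normalize(encoder(real_batch), dim=-1)
        scores = (z_syn @ z_real.t()).max(dim=1).values  # best cosine match
    k = max(1, int(keep_ratio * candidates.size(0)))
    return candidates[scores.topk(k).indices]

def temporal_association_loss(encoder, prototypes, obs_t, obs_tk, temperature=0.1):
    """Clustering-based temporal association sketch: the soft cluster
    assignment of an observation should predict that of a temporally nearby
    one. Prototype count and target computation are assumptions."""
    z_t = F.normalize(encoder(obs_t), dim=-1)
    z_tk = F.normalize(encoder(obs_tk), dim=-1)
    logits_t = z_t @ prototypes.t() / temperature
    with torch.no_grad():
        targets = F.softmax(z_tk @ prototypes.t() / temperature, dim=-1)
    return -(targets * F.log_softmax(logits_t, dim=-1)).sum(dim=-1).mean()
```

Because none of these pieces touch rewards or actions, the same loop could in principle be run over action-free video frames, consistent with the pre-training use case the abstract describes.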
