Self-supervised network distillation: an effective approach to exploration in sparse reward environments (2302.11563v4)
Abstract: Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, this approach becomes problematic when the reward is so sparse that the agent never encounters it while exploring the environment. One solution is to equip the agent with intrinsic motivation, which provides informed exploration during which the agent is also likely to encounter the external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms that use the distillation error as a novelty indicator, where both the predictor model and the target model are trained. We adapted three existing self-supervised methods for this purpose and tested them experimentally on a set of ten environments that are considered difficult to explore. The results show that our approach achieves faster growth and a higher external reward for the same training time compared to the baseline models, which implies improved exploration in very sparse reward environments. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.
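To make the core idea concrete, the sketch below (not the authors' code) illustrates the principle the abstract describes: a target encoder is trained with a self-supervised objective, a predictor is distilled to match it, and the distillation (prediction) error serves as the intrinsic reward. The module names, the simple MLP encoders, and the invariance loss between two augmented views are illustrative assumptions; the paper adapts full self-supervised objectives (contrastive, Barlow-Twins-style, VICReg-style) on pixel observations.

```python
# Minimal sketch of the SND principle, assuming PyTorch and toy MLP encoders.
# Not the paper's implementation: the self-supervised loss here is a simplified
# invariance term; real objectives add regularizers to prevent representation collapse.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small MLP encoder; the paper uses CNNs for pixel observations."""
    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

obs_dim, feat_dim = 16, 64
target = Encoder(obs_dim, feat_dim)      # trained with a self-supervised loss
predictor = Encoder(obs_dim, feat_dim)   # distilled to imitate the target
opt_target = torch.optim.Adam(target.parameters(), lr=1e-4)
opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):
    """Per-state novelty = distillation error between predictor and target."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

def update(obs, obs_augmented):
    # 1) Self-supervised update of the target encoder: pull together the
    #    representations of two views of the same observation (stand-in for
    #    the self-supervised objectives adapted in the paper).
    z1, z2 = target(obs), target(obs_augmented)
    ssl_loss = F.mse_loss(F.normalize(z1, dim=-1), F.normalize(z2, dim=-1))
    opt_target.zero_grad(); ssl_loss.backward(); opt_target.step()

    # 2) Distillation update: the predictor regresses the (frozen) target output.
    with torch.no_grad():
        tgt = target(obs)
    distill_loss = F.mse_loss(predictor(obs), tgt)
    opt_pred.zero_grad(); distill_loss.backward(); opt_pred.step()
    return ssl_loss.item(), distill_loss.item()

# Usage: novelty is high for unfamiliar states and decays as the predictor
# catches up, pushing the agent toward unexplored parts of the environment.
batch = torch.randn(32, obs_dim)
augmented = batch + 0.05 * torch.randn_like(batch)
update(batch, augmented)
print(intrinsic_reward(batch).shape)  # torch.Size([32])
```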